-
Notifications
You must be signed in to change notification settings - Fork 3
Index strings #19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Index strings #19
Conversation
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a string indexing mechanism to optimize storage, which should reduce file sizes and memory consumption. The implementation is centered around a new MStrings manager for string-to-ID mapping, and the data classes are updated to use these IDs. The changes are well-implemented, but I found a critical copy-paste error in MGenerator.java that would cause database insertion to fail. With that fix, this is a solid improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements string indexing to optimize storage by replacing frequently used strings with integer references. The changes introduce a new MStrings object that maintains a bidirectional mapping between strings and integer IDs, significantly reducing data size (from 3,089 to 64 strings for STM data).
Key changes:
- New
MStringsandMStringclasses to manage string-to-integer mappings - Updated data classes to convert strings to IDs via
.toStringIds()extension function - Migration from
Constants.COLUMN_SEPARATOR_toSQLUtils.COLUMN_SEPARATORfor consistency
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| MStrings.kt | New singleton object managing bidirectional string-to-ID mappings with thread-safe operations |
| MString.kt | New data class representing a string with its integer ID and serialization logic |
| MStop.kt | Updated to index stop names and use consistent column separator |
| MRoute.kt | Updated to index route short/long names and use consistent column separator |
| MSchedule.kt | Updated to index headsign values, replace Constants.EMPTY with .orEmpty(), and use consistent separator |
| MDirection.kt | Updated to index headsign values, introduce HEADSIGN_DEFAULT_VALUE constant |
| MServiceIds.kt | Added count() method for consistency with MStrings |
| MServiceId.kt | Updated to use unquotesUnescape() and consistent separator |
| MServiceDate.kt | Updated to use consistent column separator |
| MFrequency.kt | Updated to use consistent column separator |
| MDirectionStop.kt | Updated to use consistent column separator |
| MReader.kt | Added loadStrings() method to load persisted string mappings |
| MGenerator.java | Added dumpStrings() method and logging for string count |
| GSpec.java | Replaced separator usage with string literals for toString() |
| SQLUtils.kt | Exposed COLUMN_SEPARATOR constant |
| DumpDbUtils.kt | Added string table creation/dropping in database setup |
| DefaultAgencyTools.java | Added loading of last strings on startup |
| Constants.java | Removed now-unused separator constants |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a string indexing mechanism to optimize storage, which is a great improvement. The implementation in MStrings and its usage across data classes are well done. I've found a couple of areas for improvement:
- One file uses a hardcoded separator instead of the new constant.
- The new
dumpStringsmethod could be improved for security and performance by usingPreparedStatementand for better resource management.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Strings count:
643,089