Add digit separators to Jsonnet #760
Conversation
31c770b to f10caa0 · Compare
internal/parser/lexer.go (Outdated)
```go
// Run the postprocessor if the token kind has one defined.
if pp, ok := tokenKindPostprocessors[kind]; ok {
	data = pp(data)
}
```
To be honest I think this is an unnecessary generalisation. There are various other tokens that already put edited/processed content into the token data field, but they just have the processing inline in the lexer code, and call emitFullToken directly.
Examples:
- Chomping newlines from a text block (go-jsonnet/internal/parser/lexer.go, lines 786 to 794 in 604f4d7):

```go
var str string = cb.String()
if chompTrailingNl {
	str = str[:len(str)-1]
}
l.emitFullToken(tokenStringBlock, str, stringBlockIndent, stringBlockTermIndent)
l.resetTokenStart()
return nil
```

- Removing the quotes from a string literal (go-jsonnet/internal/parser/lexer.go, lines 900 to 905 in 604f4d7):

```go
if r == '"' {
	// Don't include the quotes in the token data
	l.emitFullToken(tokenStringDouble, l.input[l.tokenStart+1:l.pos.byteNo-1], "", "")
	l.resetTokenStart()
	break
}
```
I think we can just do the same for lexNumber - currently it calls emitToken just before returning; it can process the token data and call emitFullToken instead.
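A minimal sketch of that approach (illustrative only, not the actual go-jsonnet code: `stripSeparators` is a hypothetical helper, and the `emitFullToken` call shown in the comment just follows the signatures quoted above):

```go
package main

import (
	"fmt"
	"strings"
)

// stripSeparators is a hypothetical helper: it removes the digit
// separators from a lexed number literal, so the token data stored by
// the lexer is a plain numeric string.
func stripSeparators(literal string) string {
	return strings.ReplaceAll(literal, "_", "")
}

func main() {
	// In the real lexer this would happen at the end of lexNumber,
	// replacing the emitToken call with something like:
	//   l.emitFullToken(tokenNumber, stripSeparators(raw), "", "")
	fmt.Println(stripSeparators("1_000_000")) // 1000000
}
```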
Sounds good. I remember being surprised that this would be the first instance of needing to post-process a lexed token. I guess I didn't look hard enough.
Ok, I can go ahead and rebase+add-adjustments+merge this, unless you have further changes in flight.

👍 Thanks John, I welcome the assist. I have nothing in flight.
…onents

See also the corresponding C++ jsonnet commit: google/jsonnet@82ebe7d

There are some cases which are a little strange but lexically valid:

- `1.2.3.4` lexically tokenises as `1.2` DOT `3.4`, because a dot in the fractional or exponent part of a number is simply treated the same as any other possible terminating character (any character that isn't part of the valid number lexical syntax)
- `1e2.34` lexically is `1e2` DOT `34` (same as the first case)
- `1e2e34` lexically is `1e2` (number) `e34` (identifier)

These behaviours are basically preserved/extrapolated in the case of digit separators, so for example `1_2.3_4.5_6` is lexically parsed as `12.34` DOT `56`, and `1e2_3e4` is lexically parsed as `1e23` (number) `e4` (identifier). These both look very confusing, but it probably doesn't matter because those token sequences are, I think, not valid syntactically, so they'll just be rejected by the parser.

Note that in JSON (and Jsonnet), leading zeros are not allowed in numeric literals. This behaviour is explicitly kept with digit separators, so `0_5` is explicitly rejected. The alternatives are:

- Treat an underscore after an initial zero the same as any terminator character, so `0_5` lexes as the number `0` followed by the identifier `_5`.
- Allow the underscore, thereby breaking the no-leading-zeros rule, so `0_5` tokenises as `05`.

Either option seems confusing, hence it seems better to explicitly reject an underscore after an initial zero.
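To make the lexing rules above concrete, here is a small standalone sketch of a number scanner that follows the behaviour described. This is illustrative, not the real go-jsonnet lexer, and it is simplified (for example, it omits exponent signs and does not check underscore placement beyond the initial-zero rule):

```go
package main

import (
	"errors"
	"fmt"
)

// lexNumberToken consumes one number token from the front of s,
// stripping digit separators, and returns the token data plus the
// unconsumed remainder. Lexing simply stops at any character that
// cannot continue the number, which is what produces the
// `1_2.3_4.5_6` -> `12.34` DOT `5_6` behaviour described above.
func lexNumberToken(s string) (string, string, error) {
	var out []byte
	i := 0
	digits := func() {
		for i < len(s) && (s[i] == '_' || (s[i] >= '0' && s[i] <= '9')) {
			if s[i] != '_' {
				out = append(out, s[i])
			}
			i++
		}
	}
	if i >= len(s) || s[i] < '0' || s[i] > '9' {
		return "", s, errors.New("not a number")
	}
	if s[i] == '0' {
		// Leading zeros are not allowed; a digit or underscore right
		// after an initial zero is rejected, matching the commit message.
		if i+1 < len(s) && (s[i+1] == '_' || (s[i+1] >= '0' && s[i+1] <= '9')) {
			return "", s, errors.New("leading zeros not allowed")
		}
		out = append(out, '0')
		i++
	} else {
		digits()
	}
	// Optional fractional part: a dot only continues the number if a
	// digit follows it; a second dot terminates the token.
	if i+1 < len(s) && s[i] == '.' && s[i+1] >= '0' && s[i+1] <= '9' {
		out = append(out, '.')
		i++
		digits()
	}
	// Optional exponent part (sign handling omitted for brevity).
	if i+1 < len(s) && (s[i] == 'e' || s[i] == 'E') &&
		s[i+1] >= '0' && s[i+1] <= '9' {
		out = append(out, s[i])
		i++
		digits()
	}
	return string(out), s[i:], nil
}

func main() {
	tok, rest, _ := lexNumberToken("1_2.3_4.5_6")
	fmt.Println(tok, rest) // 12.34 .5_6
	tok, rest, _ = lexNumberToken("1e2_3e4")
	fmt.Println(tok, rest) // 1e23 e4
	_, _, err := lexNumberToken("0_5")
	fmt.Println(err) // leading zeros not allowed
}
```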
f10caa0 to a52ac8d · Compare
This adds digit separators (`1_000`) to Jsonnet's numeric literals.

Companion to the same PR in the C++ repo: google/jsonnet#1160
Reference issue with spec proposal: google/jsonnet#1155