Fast JSON lexer (tokenizer) with a minimal memory footprint and no garbage collector pressure (zero heap allocations).
```
go get -u github.com/dtgorski/jsonlex
```
Using an `io.Reader` that issues system calls directly (e.g. `os.File`) will result in poor performance. Wrap your input reader in a `bufio.Reader` or, better, a `bytes.Reader` to achieve the best results, as sketched below.
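A minimal sketch of such a wrapper, assuming the input comes from a file; the file name is made up, and the no-op callback uses the lexer API shown further below:

```go
package main

import (
	"bufio"
	"os"

	"github.com/dtgorski/jsonlex"
)

func main() {
	// Hypothetical input file; any io.Reader works.
	file, err := os.Open("document.json")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	lexer := jsonlex.NewLexer(
		func(kind jsonlex.TokenKind, load []byte, pos uint) bool {
			return true // discard tokens, keep scanning
		},
	)

	// bufio.Reader batches the underlying read(2) system calls,
	// so the lexer does not hit the kernel for every single byte.
	lexer.Scan(bufio.NewReader(file))
}
```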
The cursor pulls tokens from the underlying stream on demand:

```go
package main

import (
	"bytes"

	"github.com/dtgorski/jsonlex"
)

func main() {
	reader := bytes.NewReader(
		[]byte(`{ "foo": "bar", "baz": 42 }`),
	)

	cursor := jsonlex.NewCursor(reader, nil)

	println(cursor.Curr().String()) // current token
	println(cursor.Next().String()) // advances and yields the next token

	if !cursor.Next().Is(jsonlex.TokenEOF) {
		println("there is more ...")
	}
}
```
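Continuing from the example above, the whole token stream can be drained by advancing the cursor until it reports `TokenEOF` (a sketch that uses only the `Curr()`, `Next()` and `Is()` calls shown here):

```go
// Sketch: iterate the complete token stream with the cursor
// from the previous example until the end of input is reached.
for token := cursor.Curr(); !token.Is(jsonlex.TokenEOF); token = cursor.Next() {
	println(token.String())
}
```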
Alternatively, the lower-level lexer pushes each token to a callback:

```go
package main

import (
	"bytes"

	"github.com/dtgorski/jsonlex"
)

func main() {
	reader := bytes.NewReader(
		[]byte(`{ "foo": "bar", "baz": 42 }`),
	)

	lexer := jsonlex.NewLexer(
		func(kind jsonlex.TokenKind, load []byte, pos uint) bool {
			// The load buffer is reused between invocations, so copy
			// the payload when it must outlive this callback.
			save := make([]byte, len(load))
			copy(save, load)
			println(pos, kind, string(save))
			return true
		},
	)

	lexer.Scan(reader)
}
```
Please note that the `Scan()` function is reentrant: subsequent invocations will continue to consume the available byte stream, as long as you provide a reader that implements an `UnreadByte() error` method and you configure the lexer with the `LexerOptEnableUnreadBuffer` option.
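A sketch of what a resumed scan might look like. Two details here are assumptions rather than documented facts: that the option is passed as an extra argument to `NewLexer()`, and that returning `false` from the callback stops the current scan early:

```go
package main

import (
	"bufio"
	"strings"

	"github.com/dtgorski/jsonlex"
)

func main() {
	// bufio.Reader provides the required UnreadByte() error method.
	reader := bufio.NewReader(
		strings.NewReader(`{ "foo": "bar", "baz": 42 }`),
	)

	count := 0
	lexer := jsonlex.NewLexer(
		func(kind jsonlex.TokenKind, load []byte, pos uint) bool {
			count++
			return count%3 != 0 // assumption: false stops the scan
		},
		jsonlex.LexerOptEnableUnreadBuffer, // assumption: option passed here
	)

	lexer.Scan(reader) // consumes the first three tokens
	lexer.Scan(reader) // resumes where the first call stopped
}
```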
| jsonlex | Representation |
|---|---|
| `TokenEOF` | signals end of file/stream |
| `TokenERR` | error string (other than EOF) |
| `TokenLIT` | literal (`true`, `false`, `null`) |
| `TokenNUM` | float number |
| `TokenSTR` | `"...\"..."` |
| `TokenCOL` | `:` colon |
| `TokenCOM` | `,` comma |
| `TokenLSB` | `[` left square bracket |
| `TokenRSB` | `]` right square bracket |
| `TokenLCB` | `{` left curly brace |
| `TokenRCB` | `}` right curly brace |
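Inside the lexer callback, these kinds can be dispatched on with a plain switch. A sketch, with a made-up `emit` function and input; only the token kind names come from the table above:

```go
package main

import (
	"strings"

	"github.com/dtgorski/jsonlex"
)

// emit dispatches on the token kinds listed in the table above.
func emit(kind jsonlex.TokenKind, load []byte, pos uint) bool {
	switch kind {
	case jsonlex.TokenSTR:
		_ = load // raw string payload
	case jsonlex.TokenNUM:
		_ = load // number literal, e.g. "42"
	case jsonlex.TokenLIT:
		_ = load // true, false or null
	case jsonlex.TokenERR:
		return false // assumption: returning false aborts the scan
	}
	return true // structural tokens: { } [ ] : ,
}

func main() {
	jsonlex.NewLexer(emit).Scan(strings.NewReader(`[true, 42, "x"]`))
}
```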
Each benchmark consists of complete tokenization of a JSON document of a given size (2kB, 20kB, 200kB and 2000kB) using one CPU core. The unit doc/s means tokenized documents per second, so more is better.
The comparison candidate is Go's encoding/json.Decoder.Token() implementation.
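For reference, this is what the stdlib side of the comparison looks like; plain `encoding/json`, nothing jsonlex-specific:

```go
package main

import (
	"encoding/json"
	"io"
	"strings"
)

func main() {
	dec := json.NewDecoder(strings.NewReader(`{ "foo": "bar", "baz": 42 }`))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		_ = tok // json.Token is an interface value, one source of the allocation counts below
	}
}
```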
| | 2kB | 20kB | 200kB | 2000kB |
|---|---|---|---|---|
| `encoding/json` | 9910 doc/s | 1152 doc/s | 126 doc/s | 14 doc/s |
| `dtgorski/jsonlex` | 71880 doc/s | 7341 doc/s | 753 doc/s | 85 doc/s |
```
cpus: 1 core (~8000 BogoMIPS)
goos: linux
goarch: amd64
pkg: github.com/dtgorski/jsonlex/bench

Benchmark_encjson_2kB              9910    120475 ns/op    36528 B/op     1963 allocs/op
Benchmark_encjson_20kB             1152   1040771 ns/op   318432 B/op    18231 allocs/op
Benchmark_encjson_200kB             126   9494534 ns/op  2877968 B/op   164401 allocs/op
Benchmark_encjson_2000kB             14  77593586 ns/op 23355856 B/op  1319126 allocs/op

Benchmark_jsonlex_lexer_2kB       71880     16691 ns/op        0 B/op        0 allocs/op
Benchmark_jsonlex_lexer_20kB       7341    163210 ns/op        0 B/op        0 allocs/op
Benchmark_jsonlex_lexer_200kB       753   1594025 ns/op        0 B/op        0 allocs/op
Benchmark_jsonlex_lexer_2000kB       85  14107866 ns/op        0 B/op        0 allocs/op

Benchmark_jsonlex_cursor_2kB      38002     31776 ns/op     3680 B/op      592 allocs/op
Benchmark_jsonlex_cursor_20kB      4058    300490 ns/op    25168 B/op     5446 allocs/op
Benchmark_jsonlex_cursor_200kB      422   2777058 ns/op   248816 B/op    49141 allocs/op
Benchmark_jsonlex_cursor_2000kB      50  23559879 ns/op  2254896 B/op   396298 allocs/op
```
The implementation and features of jsonlex follow the YAGNI principle.
There is no claim for completeness or reliability.
Try `make`:

```
$ make

 make help       Displays this list
 make clean      Removes build/test artifacts
 make test       Runs integrity test with -race
 make bench      Executes artificial benchmarks
 make prof-cpu   Creates CPU profiler output
 make prof-mem   Creates memory profiler output
 make sniff      Checks format and runs linter (void on success)
 make tidy       Formats source files, cleans go.mod
```
MIT - © dtg [at] lengo [dot] org