HAMT with SIMD (swiss table like) #131
Sorry for throwing out half-baked comments, as I haven't had time to try anything myself. But I think you can separate out the small and big nodes to save both some space and some branching. The big node would always be in "pure HAMT" mode and the small node would always be in SIMD mode, so it would look like:

```rust
struct Node<A, P, ...> {
    data: SparseChunk<Entry<...>>,
}

struct SmallNode<A> {
    control: wide::u8x16,
    // You don't need to reserve the zero hash value: this is a dense chunk, so
    // you can mask out empty slots by and-ing the 16-bit match bitfield with
    // 0xFFFF >> (16 - data.len())
    data: Chunk<A>,
}
```
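To make the masking idea concrete, here is a minimal sketch, not code from this PR. It assumes the control bytes are a plain `[u8; 16]` and replaces the SIMD compare-and-movemask with a scalar loop, so it needs no external crate. The names `match_byte`, `occupied_mask` and `candidates` are hypothetical. A 16-lane match produces a 16-bit mask, so the sketch uses `0xFFFF` as the all-lanes constant.

```rust
/// Hypothetical scalar stand-in for a 16-lane SIMD compare + movemask:
/// bit `i` of the result is set when `control[i] == needle`.
fn match_byte(control: &[u8; 16], needle: u8) -> u16 {
    let mut mask = 0u16;
    for (i, &c) in control.iter().enumerate() {
        if c == needle {
            mask |= 1 << i;
        }
    }
    mask
}

/// Mask with the low `len` bits set, where `len` is the number of
/// occupied slots in the dense chunk (0..=16).
fn occupied_mask(len: usize) -> u16 {
    debug_assert!(len <= 16);
    // Shift in u32 so that len == 0 (a shift by 16) is well-defined.
    (0xFFFFu32 >> (16 - len as u32)) as u16
}

/// Slots whose control byte equals `partial_hash`, ignoring unused slots.
/// Because unused slots are masked out by position, a partial hash of 0
/// does not need to be reserved as an "empty" marker.
fn candidates(control: &[u8; 16], len: usize, partial_hash: u8) -> u16 {
    match_byte(control, partial_hash) & occupied_mask(len)
}
```

With three occupied slots holding control bytes `7, 0, 7`, looking up partial hash `0` only reports slot 1, even though the thirteen unused control bytes are also zero.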
That's an interesting idea. Something else that could be helpful is to have more APIs in SparseChunk, such as being able to find an empty index or insert into the first free index and return the index.
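As a rough illustration of what such an API could look like, here is a sketch over a hypothetical `TinySparse<T>` type with a 32-bit occupancy bitmap. It is not the real `SparseChunk`, and the method names `first_free_index` and `insert_first_free` are assumptions made up for this example.

```rust
/// Hypothetical 32-slot sparse container; bit `i` of `bitmap` is set
/// exactly when `slots[i]` is occupied.
struct TinySparse<T> {
    bitmap: u32,
    slots: [Option<T>; 32],
}

impl<T> TinySparse<T> {
    fn new() -> Self {
        TinySparse {
            bitmap: 0,
            slots: std::array::from_fn(|_| None),
        }
    }

    /// Index of the lowest unoccupied slot, or `None` when full.
    fn first_free_index(&self) -> Option<usize> {
        // The lowest zero bit of `bitmap` is the lowest set bit of `!bitmap`.
        let idx = (!self.bitmap).trailing_zeros() as usize;
        if idx < 32 { Some(idx) } else { None }
    }

    /// Stores `value` in the first free slot and returns its index,
    /// or hands the value back when the container is full.
    fn insert_first_free(&mut self, value: T) -> Result<usize, T> {
        match self.first_free_index() {
            Some(idx) => {
                self.bitmap |= 1 << idx;
                self.slots[idx] = Some(value);
                Ok(idx)
            }
            None => Err(value),
        }
    }

    /// Removes and returns the value at `idx`, if any.
    fn remove(&mut self, idx: usize) -> Option<T> {
        if idx >= 32 {
            return None;
        }
        self.bitmap &= !(1 << idx);
        self.slots[idx].take()
    }
}
```

Finding the free slot is a single bit operation on the bitmap, so the caller does not need to scan the slots itself.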
Ready for review now.
jneem
left a comment
Very nice, thanks for following up on this!
.gitignore
Outdated
```
target/
dist/
**/*.rs.bk
Cargo.lock
```
Was this intentional? I think I prefer not to ignore it.
Probably accidental, removed now.
Remove Cargo.lock from .gitignore
This PR changes both the small and non-small nodes to hold partial hashes (u8x16) that can be used to quickly find items. This also allows small nodes to grow to 100% occupancy and non-small nodes to grow until one of their groups overflows, which produces very dense tries.
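The following is a minimal sketch of the general idea, not the PR's actual layout. It assumes a group of 16 slots where each entry has a one-byte partial hash stored alongside it, lookups compare the partial hash before the full key, and an insert into a full group reports overflow to the caller. The `Group` type, the choice of the top hash byte as the partial hash, and `group_index` are all hypothetical. Duplicate keys are not handled.

```rust
const GROUP_SIZE: usize = 16;

/// Hypothetical group: 16 partial-hash bytes plus up to 16 densely
/// packed entries, where `control[i]` belongs to `entries[i]`.
struct Group<K, V> {
    control: [u8; GROUP_SIZE],
    entries: Vec<(K, V)>,
}

impl<K: PartialEq, V> Group<K, V> {
    fn new() -> Self {
        Group {
            control: [0; GROUP_SIZE],
            entries: Vec::with_capacity(GROUP_SIZE),
        }
    }

    /// Appends the entry, or hands it back when the group is already at
    /// 100% occupancy so the caller can react to the overflow.
    fn insert(&mut self, partial: u8, key: K, value: V) -> Result<(), (K, V)> {
        if self.entries.len() == GROUP_SIZE {
            return Err((key, value));
        }
        self.control[self.entries.len()] = partial;
        self.entries.push((key, value));
        Ok(())
    }

    /// Compares the cheap partial hash first and only then the full key.
    /// A SIMD version would test all 16 control bytes in one compare.
    fn get(&self, partial: u8, key: &K) -> Option<&V> {
        for (i, (k, v)) in self.entries.iter().enumerate() {
            if self.control[i] == partial && k == key {
                return Some(v);
            }
        }
        None
    }
}

/// Assumption for this sketch: the top byte of a 64-bit hash is the partial hash.
fn partial_hash(hash: u64) -> u8 {
    (hash >> 56) as u8
}

/// Assumption for this sketch: the low bits of the hash select the group.
fn group_index(hash: u64, groups: usize) -> usize {
    (hash as usize) % groups
}
```

In this sketch a group accepts entries until all 16 slots are used, and only then returns `Err`, which is the point where a real trie would have to split or push entries down a level.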
Took a long time to flesh this one out. The benchmarks are sensitive to various aspects, including the data type and how the trie looks. It's hard to isolate cache effects, so I suspect the denser trie will look even better in practice.
- The `insert_mut` and `remove_mut` benchmarks show substantial improvements across the board.
- The `lookup_ne` benchmarks also show substantial improvements.
- The `iter` benchmark shows significant improvement, due to a more compact trie shape.