
Conversation

@arthurprs
Contributor

@arthurprs arthurprs commented Aug 16, 2025

This PR changes the small and non-small nodes to hold partial hashes (u8x16) that can be used to quickly find items. This also allows small nodes to grow up to 100% occupancy and non-small nodes to grow until one of their groups overflows, which produces very dense tries.

Took a long time to flesh this one out. The benchmarks are sensitive to various aspects, including the data type and the shape of the trie. It's hard to isolate cache effects, so I suspect the denser trie will look even better in practice.
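For intuition, here is a minimal sketch of the swiss-table-style probing idea (not the PR's actual code; the tag derivation and helper names are illustrative): an 8-bit partial hash ("tag") is stored per slot, and a single SIMD compare narrows a 16-slot group down to the few candidates that need a full key comparison.

use wide::u8x16;

// Derive an 8-bit tag from the full 64-bit hash; the remaining bits keep
// steering the trie descent. (Illustrative choice of bits.)
fn tag(hash: u64) -> u8 {
    (hash >> 56) as u8
}

// Compare a query tag against 16 stored tags at once: matching lanes become
// 0xFF, then collapse them into a 16-bit candidate mask (scalar collapse
// here for portability).
fn candidate_mask(tags: u8x16, query: u8) -> u16 {
    let eq = tags.cmp_eq(u8x16::splat(query)).to_array();
    eq.iter()
        .enumerate()
        .fold(0u16, |m, (i, &lane)| m | (((lane & 1) as u16) << i))
}

Most non-matching slots are rejected by that single compare, so the expensive full key equality check only runs on tag collisions (roughly 1/256 per slot on average).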

| Benchmark | Baseline | New | Change |
| --- | ---: | ---: | ---: |
| hashmap_i64/insert_100 | 36.32 µs | 37.31 µs | +2.7% |
| hashmap_i64/insert_1000 | 680.60 µs | 632.24 µs | -7.1% |
| hashmap_i64/insert_10000 | 10.25 ms | 10.17 ms | -0.8% |
| hashmap_i64/insert_50000 | 64.11 ms | 64.21 ms | +0.2% |
| hashmap_i64/insert_mut_100 | 4.92 µs | 4.25 µs | -13.6% |
| hashmap_i64/insert_mut_1000 | 71.50 µs | 58.50 µs | -18.2% |
| hashmap_i64/insert_mut_5000 | 276.75 µs | 238.43 µs | -13.8% |
| hashmap_i64/insert_mut_10000 | 628.02 µs | 406.68 µs | -35.2% |
| hashmap_i64/insert_mut_50000 | 4.20 ms | 3.84 ms | -8.6% |
| hashmap_i64/insert_mut_100000 | 8.22 ms | 7.24 ms | -11.9% |
| hashmap_i64/iter_1000 | 6.95 µs | 6.48 µs | -6.8% |
| hashmap_i64/iter_10000 | 56.16 µs | 58.17 µs | +3.6% |
| hashmap_i64/iter_100000 | 1.48 ms | 1.25 ms | -15.5% |
| hashmap_i64/lookup_100 | 1.32 µs | 1.38 µs | +4.5% |
| hashmap_i64/lookup_1000 | 16.95 µs | 14.49 µs | -14.5% |
| hashmap_i64/lookup_5000 | 93.02 µs | 79.98 µs | -14.0% |
| hashmap_i64/lookup_10000 | 204.87 µs | 169.65 µs | -17.2% |
| hashmap_i64/lookup_50000 | 1.62 ms | 1.62 ms | 0.0% |
| hashmap_i64/lookup_100000 | 3.25 ms | 3.29 ms | +1.2% |
| hashmap_i64/lookup_500000 | 58.68 ms | 24.94 ms | -57.5% |
| hashmap_i64/lookup_ne_10000 | 215.75 µs | 143.57 µs | -33.5% |
| hashmap_i64/lookup_ne_100000 | 3.42 ms | 2.86 ms | -16.4% |
| hashmap_i64/remove_100 | 33.34 µs | 34.39 µs | +3.1% |
| hashmap_i64/remove_1000 | 675.75 µs | 671.72 µs | -0.6% |
| hashmap_i64/remove_10000 | 10.31 ms | 10.33 ms | +0.2% |
| hashmap_i64/remove_50000 | 64.78 ms | 66.75 ms | +3.0% |
| hashmap_i64/remove_mut_100 | 5.26 µs | 4.41 µs | -16.2% |
| hashmap_i64/remove_mut_1000 | 63.75 µs | 51.62 µs | -19.0% |
| hashmap_i64/remove_mut_10000 | 645.17 µs | 484.52 µs | -24.9% |
| hashmap_str/insert_100 | 63.60 µs | 62.45 µs | -1.8% |
| hashmap_str/insert_1000 | 977.86 µs | 1042.10 µs | +6.6% |
| hashmap_str/insert_10000 | 12.92 ms | 12.83 ms | -0.7% |
| hashmap_str/insert_50000 | 80.28 ms | 82.95 ms | +3.3% |
| hashmap_str/insert_mut_100 | 8.41 µs | 7.97 µs | -5.2% |
| hashmap_str/insert_mut_1000 | 116.37 µs | 104.64 µs | -10.1% |
| hashmap_str/insert_mut_5000 | 510.01 µs | 482.34 µs | -5.4% |
| hashmap_str/insert_mut_10000 | 1127.10 µs | 901.49 µs | -20.0% |
| hashmap_str/insert_mut_50000 | 6.68 ms | 6.58 ms | -1.5% |
| hashmap_str/insert_mut_100000 | 15.86 ms | 14.79 ms | -6.7% |
| hashmap_str/iter_1000 | 7.04 µs | 6.25 µs | -11.2% |
| hashmap_str/iter_10000 | 58.57 µs | 57.42 µs | -2.0% |
| hashmap_str/iter_100000 | 1.56 ms | 1.28 ms | -17.9% |
| hashmap_str/lookup_100 | 1.84 µs | 1.89 µs | +2.7% |
| hashmap_str/lookup_1000 | 21.93 µs | 21.99 µs | +0.3% |
| hashmap_str/lookup_5000 | 144.90 µs | 145.87 µs | +0.7% |
| hashmap_str/lookup_10000 | 486.16 µs | 418.74 µs | -13.9% |
| hashmap_str/lookup_50000 | 3.10 ms | 3.05 ms | -1.6% |
| hashmap_str/lookup_100000 | 6.80 ms | 6.51 ms | -4.3% |
| hashmap_str/lookup_500000 | 169.22 ms | 136.83 ms | -19.1% |
| hashmap_str/lookup_ne_10000 | 549.76 µs | 397.95 µs | -27.6% |
| hashmap_str/lookup_ne_100000 | 7.69 ms | 6.96 ms | -9.5% |
| hashmap_str/remove_100 | 62.93 µs | 61.80 µs | -1.8% |
| hashmap_str/remove_1000 | 949.88 µs | 966.02 µs | +1.7% |
| hashmap_str/remove_10000 | 13.01 ms | 12.78 ms | -1.8% |
| hashmap_str/remove_50000 | 80.64 ms | 79.35 ms | -1.6% |
| hashmap_str/remove_mut_100 | 7.67 µs | 7.10 µs | -7.4% |
| hashmap_str/remove_mut_1000 | 86.58 µs | 75.54 µs | -12.7% |
| hashmap_str/remove_mut_10000 | 1103.80 µs | 923.78 µs | -16.3% |
  • 42 benchmarks (72%) showed improvements.
  • 11 benchmarks (19%) showed regressions.
  • 5 benchmarks (9%) remained neutral.
  1. Bulk operations (insert_mut and remove_mut) show substantial improvements across the board.
  2. Negative lookups (lookup_ne) also show substantial improvements.
  3. Iteration (iter) shows a significant improvement, due to the more compact trie shape.
  4. Regular lookups took some extra effort to get the right codegen, but they are now mostly a win, and a big one at larger sizes.
  5. Regressions are few and mostly at smaller sizes.

@arthurprs arthurprs marked this pull request as ready for review August 16, 2025 09:01
@arthurprs arthurprs force-pushed the hashmap-optimization-simd branch 2 times, most recently from 146aa21 to 7ed594d on August 18, 2025 20:40
@arthurprs arthurprs changed the title from "HAMT with a smaller node variant and SIMD lookups" to "HAMT with SIMD (swiss table like)" on Aug 18, 2025
@jneem
Copy link
Owner

jneem commented Aug 19, 2025

Sorry for throwing out half-baked comments; I haven't had time to try anything myself. But I think you can separate out the small and big nodes to save both some space and some branching. The big node would always be in "pure HAMT" mode and the small node would always be in SIMD mode, so it would look like

struct Node<A, P, ...> {
    data: SparseChunk<Entry<...>>
}

struct SmallNode<A> {
    control: wide::u8x16,
    // you don't need to reserve the zero hash value, because this is a dense chunk and so
    // you can mask out empty slots by and-ing the bitfield with 0xFFFF >> (16 - data.len())
    data: Chunk<A>,
}
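
To make the masking concrete, here is a hedged sketch of a lookup over such a SmallNode, with Vec standing in for the dense Chunk and an invented eq_mask helper (real types and names may differ):

use wide::u8x16;

struct SmallNode<A> {
    control: u8x16,
    data: Vec<A>, // stand-in for the dense Chunk<A>
}

// Collapse a SIMD tag comparison into one bit per lane.
fn eq_mask(control: u8x16, tag: u8) -> u16 {
    let eq = control.cmp_eq(u8x16::splat(tag)).to_array();
    eq.iter()
        .enumerate()
        .fold(0u16, |m, (i, &lane)| m | (((lane & 1) as u16) << i))
}

impl<A> SmallNode<A> {
    // Find an entry whose control byte matches `tag` and whose payload
    // passes `is_match` (the full key comparison).
    fn find(&self, tag: u8, is_match: impl Fn(&A) -> bool) -> Option<&A> {
        // Dense chunk: lanes >= data.len() are empty, so mask them out
        // instead of reserving a sentinel tag value (the trick described
        // in the comment above). Shifting in u32 keeps len == 0 safe.
        let live = (0xFFFFu32 >> (16 - self.data.len())) as u16;
        let mut candidates = eq_mask(self.control, tag) & live;
        while candidates != 0 {
            let i = candidates.trailing_zeros() as usize;
            if is_match(&self.data[i]) {
                return Some(&self.data[i]);
            }
            candidates &= candidates - 1; // clear the lowest set bit
        }
        None
    }
}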

@arthurprs
Contributor Author

That's an interesting idea.

Something else that could be helpful is more APIs on SparseChunk, such as finding an empty index, or inserting into the first free index and returning that index.
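
For illustration, those additions could look roughly like this, modeled on a plain u64 occupancy bitmap (the real SparseChunk is generic over capacity and uses a Bitmap; the names here are invented):

struct SparseChunk64<A> {
    occupied: u64,          // one bit per slot
    slots: [Option<A>; 64], // stand-in for the sparse storage
}

impl<A> SparseChunk64<A> {
    // Index of the first free slot, if any: one bit-scan on the bitmap.
    fn first_free_index(&self) -> Option<usize> {
        let i = (!self.occupied).trailing_zeros() as usize;
        (i < 64).then_some(i)
    }

    // Insert into the first free slot and return its index.
    fn insert_first_free(&mut self, value: A) -> Option<usize> {
        let i = self.first_free_index()?;
        self.occupied |= 1 << i;
        self.slots[i] = Some(value);
        Some(i)
    }
}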

@arthurprs arthurprs marked this pull request as draft September 29, 2025 16:20
@arthurprs arthurprs force-pushed the hashmap-optimization-simd branch 2 times, most recently from 56d3d7a to 37a59b6 on October 6, 2025 22:22
@arthurprs arthurprs marked this pull request as ready for review October 7, 2025 17:07
@arthurprs
Contributor Author

arthurprs commented Oct 7, 2025

Ready for review now. The structure could probably be improved a bit more (e.g. a separate type for non-SIMD nodes), but the results are already substantial. Edit: Fixed

CI is failing because of the minimum supported Rust version, but I can fix it by downgrading wide to 0.7 later. Edit: Fixed

@arthurprs arthurprs force-pushed the hashmap-optimization-simd branch 4 times, most recently from 16c4c16 to e8a83eb on October 12, 2025 22:14
@arthurprs arthurprs force-pushed the hashmap-optimization-simd branch from e8a83eb to 61f6fa8 on October 12, 2025 22:27
Owner

@jneem jneem left a comment

Very nice, thanks for following up on this!

.gitignore Outdated
target/
dist/
**/*.rs.bk
Cargo.lock
Owner

Was this intentional? I think I prefer not to ignore it.

Contributor Author

Probably accidental, removed now.

Remove Cargo.lock from .gitignore
@jneem jneem merged commit c1e5e0d into jneem:main Dec 9, 2025
19 checks passed