
Conversation

@arthurprs
Contributor

@arthurprs arthurprs commented Aug 16, 2025

This PR changes the small and non-small nodes to hold partial hashes (u8x16) that can be used to quickly find items. This also allows small nodes to grow up to 100% occupancy and non-small nodes to grow until one of their groups overflows, which produces very dense tries.

Took a long time to flesh this one out. The benchmarks are sensitive to various aspects, including the data type and the shape of the trie. It's hard to isolate cache effects, so I suspect the denser trie will look even better in practice.
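For intuition, here is a minimal sketch of the swiss-table-style probing idea (not the PR's actual code; the tag derivation and helper names are illustrative): an 8-bit partial hash ("tag") is stored per slot, and a single SIMD compare narrows a 16-slot group down to the few candidates that need a full key comparison.

use wide::u8x16;

// Derive an 8-bit tag from the full 64-bit hash; the remaining bits keep
// steering the trie descent. (Illustrative choice of bits.)
fn tag(hash: u64) -> u8 {
    (hash >> 56) as u8
}

// Compare a query tag against 16 stored tags at once: matching lanes become
// 0xFF, then collapse them into a 16-bit candidate mask (scalar collapse
// here for portability).
fn candidate_mask(tags: u8x16, query: u8) -> u16 {
    let eq = tags.cmp_eq(u8x16::splat(query)).to_array();
    eq.iter()
        .enumerate()
        .fold(0u16, |m, (i, &lane)| m | (((lane & 1) as u16) << i))
}

Most non-matching slots are rejected by that single compare, so the expensive full key equality check only runs on tag collisions (roughly 1/256 per slot on average).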

| Benchmark | Baseline | New | Change |
| --- | ---: | ---: | ---: |
| hashmap_i64/insert_100 | 36.32 µs | 37.31 µs | +2.7% |
| hashmap_i64/insert_1000 | 680.60 µs | 632.24 µs | -7.1% |
| hashmap_i64/insert_10000 | 10.25 ms | 10.17 ms | -0.8% |
| hashmap_i64/insert_50000 | 64.11 ms | 64.21 ms | +0.2% |
| hashmap_i64/insert_mut_100 | 4.92 µs | 4.25 µs | -13.6% |
| hashmap_i64/insert_mut_1000 | 71.50 µs | 58.50 µs | -18.2% |
| hashmap_i64/insert_mut_5000 | 276.75 µs | 238.43 µs | -13.8% |
| hashmap_i64/insert_mut_10000 | 628.02 µs | 406.68 µs | -35.2% |
| hashmap_i64/insert_mut_50000 | 4.20 ms | 3.84 ms | -8.6% |
| hashmap_i64/insert_mut_100000 | 8.22 ms | 7.24 ms | -11.9% |
| hashmap_i64/iter_1000 | 6.95 µs | 6.48 µs | -6.8% |
| hashmap_i64/iter_10000 | 56.16 µs | 58.17 µs | +3.6% |
| hashmap_i64/iter_100000 | 1.48 ms | 1.25 ms | -15.5% |
| hashmap_i64/lookup_100 | 1.32 µs | 1.38 µs | +4.5% |
| hashmap_i64/lookup_1000 | 16.95 µs | 14.49 µs | -14.5% |
| hashmap_i64/lookup_5000 | 93.02 µs | 79.98 µs | -14.0% |
| hashmap_i64/lookup_10000 | 204.87 µs | 169.65 µs | -17.2% |
| hashmap_i64/lookup_50000 | 1.62 ms | 1.62 ms | 0.0% |
| hashmap_i64/lookup_100000 | 3.25 ms | 3.29 ms | +1.2% |
| hashmap_i64/lookup_500000 | 58.68 ms | 24.94 ms | -57.5% |
| hashmap_i64/lookup_ne_10000 | 215.75 µs | 143.57 µs | -33.5% |
| hashmap_i64/lookup_ne_100000 | 3.42 ms | 2.86 ms | -16.4% |
| hashmap_i64/remove_100 | 33.34 µs | 34.39 µs | +3.1% |
| hashmap_i64/remove_1000 | 675.75 µs | 671.72 µs | -0.6% |
| hashmap_i64/remove_10000 | 10.31 ms | 10.33 ms | +0.2% |
| hashmap_i64/remove_50000 | 64.78 ms | 66.75 ms | +3.0% |
| hashmap_i64/remove_mut_100 | 5.26 µs | 4.41 µs | -16.2% |
| hashmap_i64/remove_mut_1000 | 63.75 µs | 51.62 µs | -19.0% |
| hashmap_i64/remove_mut_10000 | 645.17 µs | 484.52 µs | -24.9% |
| hashmap_str/insert_100 | 63.60 µs | 62.45 µs | -1.8% |
| hashmap_str/insert_1000 | 977.86 µs | 1042.10 µs | +6.6% |
| hashmap_str/insert_10000 | 12.92 ms | 12.83 ms | -0.7% |
| hashmap_str/insert_50000 | 80.28 ms | 82.95 ms | +3.3% |
| hashmap_str/insert_mut_100 | 8.41 µs | 7.97 µs | -5.2% |
| hashmap_str/insert_mut_1000 | 116.37 µs | 104.64 µs | -10.1% |
| hashmap_str/insert_mut_5000 | 510.01 µs | 482.34 µs | -5.4% |
| hashmap_str/insert_mut_10000 | 1127.10 µs | 901.49 µs | -20.0% |
| hashmap_str/insert_mut_50000 | 6.68 ms | 6.58 ms | -1.5% |
| hashmap_str/insert_mut_100000 | 15.86 ms | 14.79 ms | -6.7% |
| hashmap_str/iter_1000 | 7.04 µs | 6.25 µs | -11.2% |
| hashmap_str/iter_10000 | 58.57 µs | 57.42 µs | -2.0% |
| hashmap_str/iter_100000 | 1.56 ms | 1.28 ms | -17.9% |
| hashmap_str/lookup_100 | 1.84 µs | 1.89 µs | +2.7% |
| hashmap_str/lookup_1000 | 21.93 µs | 21.99 µs | +0.3% |
| hashmap_str/lookup_5000 | 144.90 µs | 145.87 µs | +0.7% |
| hashmap_str/lookup_10000 | 486.16 µs | 418.74 µs | -13.9% |
| hashmap_str/lookup_50000 | 3.10 ms | 3.05 ms | -1.6% |
| hashmap_str/lookup_100000 | 6.80 ms | 6.51 ms | -4.3% |
| hashmap_str/lookup_500000 | 169.22 ms | 136.83 ms | -19.1% |
| hashmap_str/lookup_ne_10000 | 549.76 µs | 397.95 µs | -27.6% |
| hashmap_str/lookup_ne_100000 | 7.69 ms | 6.96 ms | -9.5% |
| hashmap_str/remove_100 | 62.93 µs | 61.80 µs | -1.8% |
| hashmap_str/remove_1000 | 949.88 µs | 966.02 µs | +1.7% |
| hashmap_str/remove_10000 | 13.01 ms | 12.78 ms | -1.8% |
| hashmap_str/remove_50000 | 80.64 ms | 79.35 ms | -1.6% |
| hashmap_str/remove_mut_100 | 7.67 µs | 7.10 µs | -7.4% |
| hashmap_str/remove_mut_1000 | 86.58 µs | 75.54 µs | -12.7% |
| hashmap_str/remove_mut_10000 | 1103.80 µs | 923.78 µs | -16.3% |
  • 42 benchmarks (72%) showed improvements.
  • 11 benchmarks (19%) showed regressions.
  • 5 benchmarks (9%) remained neutral.
  1. Bulk operations (insert_mut and remove_mut) show substantial improvements across the board.
  2. Negative lookups (lookup_ne) also show substantial improvements.
  3. Iteration (iter) shows a significant improvement, due to the more compact trie shape.
  4. Regular lookups took some extra effort to get the right codegen, but they are now mostly a win, and a big one at larger sizes.
  5. Regressions are few and mostly at smaller sizes.

@arthurprs arthurprs marked this pull request as ready for review August 16, 2025 09:01
@arthurprs arthurprs force-pushed the hashmap-optimization-simd branch 2 times, most recently from 146aa21 to 7ed594d on August 18, 2025 20:40
@arthurprs arthurprs changed the title from "HAMT with a smaller node variant and SIMD lookups" to "HAMT with SIMD (swiss table like)" on Aug 18, 2025
@jneem
Copy link
Owner

jneem commented Aug 19, 2025

Sorry for throwing out half-baked comments; I haven't had time to try anything myself. But I think you can separate out the small and big nodes to save both some space and some branching. The big node would always be in "pure HAMT" mode and the small node would always be in SIMD mode, so it would look like

struct Node<A, P, ...> {
    data: SparseChunk<Entry<...>>
}

struct SmallNode<A> {
    control: wide::u8x16,
    // you don't need to reserve the zero hash value, because this is a dense chunk and so
    // you can mask out empty slots by and-ing the bitfield with 0xFFFF >> (16 - data.len())
    data: Chunk<A>,
}
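
To make the masking concrete, here is a hedged sketch of a lookup over such a SmallNode, with Vec standing in for the dense Chunk and an invented eq_mask helper (real types and names may differ):

use wide::u8x16;

struct SmallNode<A> {
    control: u8x16,
    data: Vec<A>, // stand-in for the dense Chunk<A>
}

// Collapse a SIMD tag comparison into one bit per lane.
fn eq_mask(control: u8x16, tag: u8) -> u16 {
    let eq = control.cmp_eq(u8x16::splat(tag)).to_array();
    eq.iter()
        .enumerate()
        .fold(0u16, |m, (i, &lane)| m | (((lane & 1) as u16) << i))
}

impl<A> SmallNode<A> {
    // Find an entry whose control byte matches `tag` and whose payload
    // passes `is_match` (the full key comparison).
    fn find(&self, tag: u8, is_match: impl Fn(&A) -> bool) -> Option<&A> {
        // Dense chunk: lanes >= data.len() are empty, so mask them out
        // instead of reserving a sentinel tag value (the trick described
        // in the comment above). Shifting in u32 keeps len == 0 safe.
        let live = (0xFFFFu32 >> (16 - self.data.len())) as u16;
        let mut candidates = eq_mask(self.control, tag) & live;
        while candidates != 0 {
            let i = candidates.trailing_zeros() as usize;
            if is_match(&self.data[i]) {
                return Some(&self.data[i]);
            }
            candidates &= candidates - 1; // clear the lowest set bit
        }
        None
    }
}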

@arthurprs
Contributor Author

That's an interesting idea.

Something else that could be helpful is more APIs on SparseChunk, such as finding an empty index, or inserting into the first free index and returning that index.
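
For illustration, those additions could look roughly like this, modeled on a plain u64 occupancy bitmap (the real SparseChunk is generic over capacity and uses a Bitmap; the names here are invented):

struct SparseChunk64<A> {
    occupied: u64,          // one bit per slot
    slots: [Option<A>; 64], // stand-in for the sparse storage
}

impl<A> SparseChunk64<A> {
    // Index of the first free slot, if any: one bit-scan on the bitmap.
    fn first_free_index(&self) -> Option<usize> {
        let i = (!self.occupied).trailing_zeros() as usize;
        (i < 64).then_some(i)
    }

    // Insert into the first free slot and return its index.
    fn insert_first_free(&mut self, value: A) -> Option<usize> {
        let i = self.first_free_index()?;
        self.occupied |= 1 << i;
        self.slots[i] = Some(value);
        Some(i)
    }
}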

@arthurprs arthurprs marked this pull request as draft September 29, 2025 16:20
@arthurprs arthurprs force-pushed the hashmap-optimization-simd branch 2 times, most recently from 56d3d7a to 37a59b6 on October 6, 2025 22:22
@arthurprs arthurprs marked this pull request as ready for review October 7, 2025 17:07
@arthurprs
Contributor Author

arthurprs commented Oct 7, 2025

Ready for review now. The structure could probably be improved a bit more (e.g. a separate type for non-SIMD nodes), but the results are already substantial. Edit: Fixed

CI is failing because of the minimum supported Rust version, but I can fix it by downgrading wide to 0.7 later. Edit: Fixed

@arthurprs arthurprs force-pushed the hashmap-optimization-simd branch 4 times, most recently from 16c4c16 to e8a83eb on October 12, 2025 22:14
@arthurprs arthurprs force-pushed the hashmap-optimization-simd branch from e8a83eb to 61f6fa8 on October 12, 2025 22:27
Owner

@jneem jneem left a comment

Very nice, thanks for following up on this!

.gitignore Outdated
target/
dist/
**/*.rs.bk
Cargo.lock
Owner

Was this intentional? I think I prefer not to ignore it.

Contributor Author

Probably accidental, removed now.

Remove Cargo.lock from .gitignore
@jneem jneem merged commit c1e5e0d into jneem:main Dec 9, 2025
19 checks passed