@gendx

@gendx gendx commented Oct 11, 2025

Paralight is a lightweight parallelism library tuned for indexed structures such as slices. Since hashbrown's hash tables are internally represented as a slice of buckets (each of which optionally contains a value), they are a good fit for a Paralight integration (gendx/paralight#5).

This pull request is here to iterate on the design. As the integration needs access to the raw hash table representation, it's done here in the hashbrown crate (similarly to Rayon's integration).

@clarfonthey
Contributor

I think that it's fair to do this given the presence of a Rayon implementation also (although, IMHO, we shouldn't be including these implementations in this crate…) but I also think that regardless of what is done, any primitives needed to make this work should be added to HashTable directly so that people can code their own versions of this.

For example, at one point I was contemplating offering an API that directly provided access to the &[MaybeUninit<T>] and &[Tag] slices in the table, and I think such an API might be helpful here too. Note that the tags and items in the hash table are actually in reverse order to each other, since one uses negative offsets from the central pointer and the other one uses positive offsets.
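
To make the reversed ordering concrete, here is a toy model of the layout described above. It is only a sketch under stated assumptions: `ToyTable`, `EMPTY` and all the methods are hypothetical names, two `Vec`s stand in for the single allocation around the central pointer, and using `0xFF` as the empty marker is an assumption of this model rather than a statement about hashbrown's internals.

```rust
/// Toy layout for `n` buckets (hypothetical, not hashbrown's real types):
///
///   [ item n-1, ..., item 1, item 0 | tag 0, tag 1, ..., tag n-1 ]
///                                   ^ central pointer
///
/// Items sit at negative offsets from the central pointer, tags at
/// positive offsets, so the two sequences run in opposite directions.
pub const EMPTY: u8 = 0xFF; // assumed marker for an empty bucket

pub struct ToyTable<T> {
    /// Items in reverse order: bucket `i` lives at `items[n - 1 - i]`.
    items: Vec<Option<T>>,
    /// Tags in forward order: bucket `i` has tag `tags[i]`.
    tags: Vec<u8>,
}

impl<T> ToyTable<T> {
    pub fn with_buckets(n: usize) -> Self {
        ToyTable {
            items: (0..n).map(|_| None).collect(),
            tags: vec![EMPTY; n],
        }
    }

    /// Translates a bucket index into a position in the reversed item storage.
    fn item_slot(&self, bucket: usize) -> usize {
        self.items.len() - 1 - bucket
    }

    pub fn set(&mut self, bucket: usize, tag: u8, value: T) {
        let slot = self.item_slot(bucket);
        self.items[slot] = Some(value);
        self.tags[bucket] = tag;
    }

    pub fn get(&self, bucket: usize) -> Option<&T> {
        if self.tags[bucket] == EMPTY {
            return None;
        }
        self.items[self.item_slot(bucket)].as_ref()
    }

    /// Walks the full buckets in bucket order by pairing the tags with the
    /// item storage read backwards.
    pub fn iter(&self) -> impl Iterator<Item = &T> + '_ {
        self.tags
            .iter()
            .zip(self.items.iter().rev())
            .filter(|(tag, _)| **tag != EMPTY)
            .filter_map(|(_, item)| item.as_ref())
    }
}
```

Any API that hands out both slices would need to document this index translation (or perform it for the caller), which is the step `item_slot` and the `.rev()` in `iter` make explicit here.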

@Amanieu
Member

Amanieu commented Oct 20, 2025

I talked about this in person with @gendx at EuroRust. The main issue with the current implementation is that it returns an iterator of Option<&(mut) T> instead of &(mut) T. I think this should be addressed before Paralight support is added to this crate.
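
For illustration, the difference between the two item types can be sketched with plain slices and standard-library threads. This is not Paralight's or hashbrown's API: `iter_with_options`, `iter_items` and `par_sum` are hypothetical helpers, and a slice of `Option<T>` stands in for the table's buckets.

```rust
use std::thread;

/// One item per bucket: callers must handle `None` for every empty bucket.
pub fn iter_with_options<T>(buckets: &[Option<T>]) -> impl Iterator<Item = Option<&T>> + '_ {
    buckets.iter().map(Option::as_ref)
}

/// One item per full bucket: empty buckets are skipped inside the iterator.
pub fn iter_items<T>(buckets: &[Option<T>]) -> impl Iterator<Item = &T> + '_ {
    buckets.iter().flatten()
}

/// Sums the full buckets in parallel. The work is split by bucket index,
/// and each worker skips the empty buckets of its own chunk, so the caller
/// never sees an `Option`.
pub fn par_sum(buckets: &[Option<u64>], num_threads: usize) -> u64 {
    // Chunk length of at least 1, since `chunks(0)` would panic.
    let chunk_len = buckets.len().div_ceil(num_threads.max(1)).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = buckets
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().flatten().sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}
```

The point of the sketch is that splitting work by bucket index does not force the item type to be an `Option`: the filtering can happen per bucket inside the worker.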

A mid-level API that just exposes buckets like #613 might work for par_iter but would be insufficient for par_iter_mut and into_par_iter. You would need a more complete low-level API as proposed in #545 for that, but it would fundamentally be unstable since it exposes too much of the internal layout of the hash table.

@gendx
Author

gendx commented Jan 14, 2026

With Paralight v0.0.10, it's now possible to define parallel iterators where each bucket produces an Option<Self::Item> without needing Item to contain an option. This addresses @Amanieu's comment and makes the API much nicer to use. This change is therefore now ready for review.

Some general remarks:

  • Regarding Send/Sync bounds, I've created dedicated wrapper types around RawTable, because the default bounds were not suitable:
    • into_par_iter() requires Send items but the wrapper needs to be Sync to be shared with worker threads.
    • HashMap::par_iter_mut() requires a Sync key (to produce &K) and a Send value (to produce &mut V) and again the wrapper needs to be Sync.
    • In all cases, the Allocator shouldn't have any particular Send nor Sync bounds, as Paralight iterators don't deallocate (nor allocate) the table on other threads. Only the Drop implementation will deallocate in the into_par_iter() case, and none of the wrappers have new Send implementations anyway.
  • I've added a new support function RawTable::deallocate_cleared_table() to directly deallocate the table without clearing the control bytes when the iterator is dropped (after having fetched all the items*). This should be more efficient than using the pre-existing clear_no_drop() (although using that would work too).
  • Paralight is still in the 0.0.x version. It's already usable but the API changes often between versions as more iterator adaptors become supported, hence I haven't committed to publishing a 0.x version yet. That said I don't envision further changes to the IntoParallelSource traits for the time being. Happy to leave this pull request open and keep updating it until Paralight v0.1 is ready (but feedback on the design is valuable to progress towards version 0.1).
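
The Send/Sync reasoning for the par_iter_mut() case can be sketched outside of hashbrown as follows. This is a minimal illustration under assumptions, not the wrapper from this pull request: `SharedPairs`, `fetch` and `par_for_each_mut` are hypothetical names, a slice of pairs stands in for RawTable, and standard-library scoped threads stand in for Paralight's worker threads.

```rust
use std::marker::PhantomData;
use std::thread;

/// Hypothetical wrapper giving worker threads access to `(K, V)` pairs.
/// The raw pointer makes it neither `Send` nor `Sync` by default.
pub struct SharedPairs<'a, K, V> {
    ptr: *mut (K, V),
    len: usize,
    _marker: PhantomData<&'a mut [(K, V)]>,
}

// SAFETY (sketch): workers only obtain `&K` (hence `K: Sync`) and `&mut V`
// (hence `V: Send`), and `fetch` requires that each index is handed to at
// most one thread. No `Send` implementation is added for the wrapper.
unsafe impl<K: Sync, V: Send> Sync for SharedPairs<'_, K, V> {}

impl<'a, K, V> SharedPairs<'a, K, V> {
    pub fn new(pairs: &'a mut [(K, V)]) -> Self {
        SharedPairs {
            ptr: pairs.as_mut_ptr(),
            len: pairs.len(),
            _marker: PhantomData,
        }
    }

    /// # Safety
    /// `index` must be below the length, and each index must be fetched at
    /// most once across all threads while the wrapper is alive.
    pub unsafe fn fetch(&self, index: usize) -> (&K, &mut V) {
        let pair = unsafe { &mut *self.ptr.add(index) };
        (&pair.0, &mut pair.1)
    }
}

/// Applies `f` to every pair, with the indices split across `num_threads`.
pub fn par_for_each_mut<K: Sync, V: Send>(
    pairs: &mut [(K, V)],
    num_threads: usize,
    f: impl Fn(&K, &mut V) + Sync,
) {
    let shared = SharedPairs::new(pairs);
    let num_threads = num_threads.max(1);
    thread::scope(|s| {
        for t in 0..num_threads {
            let (shared, f) = (&shared, &f);
            s.spawn(move || {
                // Thread `t` handles indices t, t + n, t + 2n, ... so the
                // index sets of different threads are disjoint.
                for i in (t..shared.len).step_by(num_threads) {
                    // SAFETY: `i < len` and `i` belongs to exactly one thread.
                    let (k, v) = unsafe { shared.fetch(i) };
                    f(k, v);
                }
            });
        }
    });
}
```

Sharing `&SharedPairs` with the spawned threads only compiles because of the manual `Sync` implementation, and that implementation is only justified by the `K: Sync` and `V: Send` bounds, which mirrors the bounds listed above for HashMap::par_iter_mut().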

*Internally, the Paralight execution engine ensures that each bucket index is passed exactly once across the fetch_item() and cleanup_item_range() calls. While it's possible to bypass the execution engine and directly manipulate and drop a SourceDescriptor with safe code (definitely not the intended use of the API), the worst outcome is a memory leak (some items in the map are never dropped), which is not unsound.
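
The "leak at worst" argument can be illustrated with a toy owning source. This sketch does not use Paralight's traits: `ToySource`, `take` and `Tracked` are hypothetical names, and a `Vec<ManuallyDrop<T>>` stands in for the table's buckets.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Test helper that counts how many times its instances are dropped.
pub struct Tracked(pub Arc<AtomicUsize>);

impl Drop for Tracked {
    fn drop(&mut self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical owning source: items are moved out one index at a time.
pub struct ToySource<T> {
    slots: Vec<ManuallyDrop<T>>,
}

impl<T> ToySource<T> {
    pub fn new(items: Vec<T>) -> Self {
        ToySource {
            slots: items.into_iter().map(ManuallyDrop::new).collect(),
        }
    }

    /// Moves the item at `index` out of the source.
    ///
    /// # Safety
    /// Each index must be taken at most once, otherwise the same item
    /// would be duplicated and eventually dropped twice.
    pub unsafe fn take(&self, index: usize) -> T {
        unsafe { std::ptr::read(&*self.slots[index]) }
    }
}

// No `Drop` implementation on purpose: dropping the source frees the
// buffer but never runs item destructors. Items that were not taken are
// leaked, and items that were taken are never dropped a second time.
```

With three tracked items, taking and dropping one of them and then dropping the source leaves the drop counter at 1: the two remaining items leak, but nothing is dropped twice.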

@gendx gendx marked this pull request as ready for review January 14, 2026 10:42
