I noticed what appear to be data races in the code.
For example, in line 45 of src/concurrent_map/warp:
    uint32_t src_unit_data = (next == SlabHashT::A_INDEX_POINTER)
        ? *(getPointerFromBucket(src_bucket, laneId))
        : *(getPointerFromSlab(next, laneId));
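Here a weak (plain, non-atomic) load is used to read data from the bucket.
Later on (line 65/line 83), an atomicCAS is performed on the same location.
Nothing orders the two accesses, so one thread may be doing the weak read while another thread performs the atomicCAS on the same location. Mixing a weak access with a concurrent atomic write to the same address is a data race.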
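To make the pattern concrete, here is a minimal standalone sketch of one possible fix. It is not taken from the repository: the kernel claimSlot, the helper strongLoad and the EMPTY sentinel are hypothetical names invented for illustration. It assumes that a volatile load is an acceptable strong load for this purpose; an atomic read-modify-write such as atomicAdd(ptr, 0) would be a heavier alternative.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical sentinel marking an unclaimed slot (not from the original code).
constexpr uint32_t EMPTY = 0xFFFFFFFFu;

// Hypothetical helper: read through a volatile pointer so the compiler must
// issue a real load that it cannot cache in a register or elide.
__device__ uint32_t strongLoad(const uint32_t* p) {
  return *reinterpret_cast<const volatile uint32_t*>(p);
}

// Every thread reads the slot, then tries to claim it with atomicCAS.
// This mirrors the read-then-atomicCAS shape described above, with the
// plain dereference replaced by strongLoad.
__global__ void claimSlot(uint32_t* slot, uint32_t* winners) {
  uint32_t tid = blockIdx.x * blockDim.x + threadIdx.x;
  uint32_t seen = strongLoad(slot);
  if (seen == EMPTY) {
    // Only the thread whose CAS observes EMPTY actually claims the slot.
    uint32_t old = atomicCAS(slot, EMPTY, tid);
    if (old == EMPTY) {
      atomicAdd(winners, 1u);
    }
  }
}

int main() {
  uint32_t* slot = nullptr;
  uint32_t* winners = nullptr;
  cudaMalloc(&slot, sizeof(uint32_t));
  cudaMalloc(&winners, sizeof(uint32_t));

  uint32_t hSlot = EMPTY;
  uint32_t hWinners = 0;
  cudaMemcpy(slot, &hSlot, sizeof(uint32_t), cudaMemcpyHostToDevice);
  cudaMemcpy(winners, &hWinners, sizeof(uint32_t), cudaMemcpyHostToDevice);

  claimSlot<<<4, 64>>>(slot, winners);
  cudaDeviceSynchronize();

  cudaMemcpy(&hSlot, slot, sizeof(uint32_t), cudaMemcpyDeviceToHost);
  cudaMemcpy(&hWinners, winners, sizeof(uint32_t), cudaMemcpyDeviceToHost);
  printf("winners=%u claimed_by=%u\n", hWinners, hSlot);

  cudaFree(slot);
  cudaFree(winners);
  // Exactly one thread should have won the CAS.
  return hWinners == 1 ? 0 : 1;
}
```

In the snippet quoted above, the analogous change would be to route the two dereferences of getPointerFromBucket(...) and getPointerFromSlab(...) through a strong load of this kind. Whether that is enough depends on how src_unit_data is used afterwards, which I have not checked.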