Skip to content

Conversation

@hhwyt
Copy link

@hhwyt hhwyt commented Nov 17, 2025

Close tikv/tikv#19106
Fix data race in MemTable::GetBloomFilter() on ARM weak memory model introduced in #305 (commit 405de0e)

Use release-acquire memory ordering instead of relaxed to ensure
DynamicBloom object is fully initialized before pointer publication.

Problem:

  • On ARM (weak memory model), using relaxed ordering allows the pointer
    to be published before DynamicBloom::data_ is initialized
  • Readers may see non-null pointer but access uninitialized data_,
    causing crashes

Solution:

  • Use memory_order_acquire for load and memory_order_release for store
  • This establishes synchronizes-with relationship, guaranteeing readers
    only see fully initialized objects

Performance impact:

  • The acquire load executes on every call (hot path in Get/Add operations)
  • Overhead is minimal: typically 0-2 cycles on x86 (same as relaxed),
    ~1-3 cycles on ARM (may add memory barrier)
  • This overhead is <1% of total latency compared to subsequent bloom
    filter operations (hash computation + memory access)

The issue was verified using Relacy formal verification tool, which confirmed the data race exists when using relaxed memory ordering.

The bloom_filter_ptr_ was published using memory_order_relaxed, which
does not establish a proper happens-before relationship between the
pointer store and the DynamicBloom object initialization. This can
lead to readers observing a partially constructed DynamicBloom on weak
memory models (e.g., ARM), causing crashes when accessing uninitialized
data.

The issue was verified using Relacy formal verification tool, which
confirmed the data race exists when using relaxed memory ordering.

Fix:
- Change load from memory_order_relaxed to memory_order_acquire
- Change store from memory_order_relaxed to memory_order_release

This establishes a proper release-acquire synchronization, ensuring
that readers only observe a fully constructed DynamicBloom object after
the pointer is published.

Fixes: facebook#305 (introduced in commit 405de0e)
@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 17, 2025
@ti-chi-bot
Copy link

ti-chi-bot bot commented Nov 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign little-wallace for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hhwyt
Copy link
Author

hhwyt commented Nov 17, 2025

close because this is duplicated to #424.

@hhwyt hhwyt closed this Nov 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dco-signoff: yes Indicates the PR's author has signed the dco. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TiKV Crash during RocksDB write operation

1 participant