docs: proposal for thread safe thread manager #2739

irozzo-1A · 2025-12-05T15:57:22Z

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind test

/kind feature

/kind sync

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

ref: falcosecurity/falco#3749

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

poiana · 2025-12-05T15:57:41Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: irozzo-1A
Once this PR has been reviewed and has the lgtm label, please assign leogr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

github-actions · 2025-12-05T16:03:38Z

Perf diff from master - unit tests

     4.25%    +13.19%  [.] std::__shared_ptr<sinsp_threadinfo, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__weak_ptr<sinsp_threadinfo, (__gnu_cxx::_Lock_policy)2> const&, std::nothrow_t)
     3.40%    +11.27%  [.] std::__shared_count<(__gnu_cxx::_Lock_policy)2>::_M_get_use_count() const
     3.97%     +8.85%  [.] sinsp_threadinfo::get_main_thread()
     3.33%     +7.63%  [.] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_lock_nothrow()
     3.18%     +7.15%  [.] sinsp_threadinfo::update_main_fdtable()
     5.70%     +6.87%  [.] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
     2.99%     +6.67%  [.] sinsp_threadinfo::get_fd_table()
     1.99%     +5.87%  [.] std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__weak_count<(__gnu_cxx::_Lock_policy)2> const&, std::nothrow_t)
    10.86%     -5.42%  [.] sinsp::next(sinsp_evt**)
     1.74%     +3.49%  [.] thread_group_info::get_first_thread() const

Heap diff from master - unit tests

peak heap memory consumption: 27.64K
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

peak heap memory consumption: -24B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Benchmarks diff from master

Comparing gbench_data.json to /root/actions-runner/_work/libs/libs/build/gbench_data.json
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
BM_sinsp_split_mean                                            -0.0522         -0.0523           248           235           247           234
BM_sinsp_split_median                                          -0.0557         -0.0556           248           234           247           234
BM_sinsp_split_stddev                                          +0.2094         +0.1640             1             2             1             2
BM_sinsp_split_cv                                              +0.2760         +0.2283             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_mean                  +0.0392         +0.0391            70            73            70            73
BM_sinsp_concatenate_paths_relative_path_median                +0.0455         +0.0455            70            73            69            73
BM_sinsp_concatenate_paths_relative_path_stddev                -0.7716         -0.7707             1             0             1             0
BM_sinsp_concatenate_paths_relative_path_cv                    -0.7803         -0.7793             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_mean                     -0.0565         -0.0568            44            41            44            41
BM_sinsp_concatenate_paths_empty_path_median                   -0.0672         -0.0675            44            41            44            41
BM_sinsp_concatenate_paths_empty_path_stddev                   +1.1334         +1.1354             1             2             1             2
BM_sinsp_concatenate_paths_empty_path_cv                       +1.2612         +1.2639             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_mean                  +0.0253         +0.0252            71            73            71            73
BM_sinsp_concatenate_paths_absolute_path_median                +0.0219         +0.0219            71            73            71            73
BM_sinsp_concatenate_paths_absolute_path_stddev                +2.3836         +2.4716             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_cv                    +2.3002         +2.3862             0             0             0             0

codecov · 2025-12-05T16:07:37Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.57%. Comparing base (e65bb86) to head (0d6c9b0).
⚠️ Report is 33 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2739      +/-   ##
==========================================
- Coverage   77.02%   74.57%   -2.45%     
==========================================
  Files         296      292       -4     
  Lines       30818    30025     -793     
  Branches     4670     4657      -13     
==========================================
- Hits        23738    22392    -1346     
- Misses       7080     7633     +553

Flag	Coverage Δ
libsinsp	`74.57% <ø> (-2.45%)`	⬇️

Flags with carried forward coverage won't be shown.

ekoops

I read the proposal, and overall it looks good. I just added some minor comments.
The critical point I would like to discuss is the global std::mutex protecting any modification to the topological structure in sinsp_thread_manager. This has a big impact, as it serializes every clone/clone3/fork/procexit or, in other terms, every thread addition/replacement. Worse, this also applies to unrelated topological thread operations.
In theory, if we adopt some form of partitioning, each thread could belong to the same partition for its entire lifetime, and thus always be managed by the same worker thread. However, there are topological structures, both inside sinsp_thread_manager and sinsp_threadinfo, that link threads belonging to different partitions. For example, adding one thread requires inserting a new element in sinsp_thread_manager::table_, sinsp_thread_manager::list_head_, and, on the parent side, sinsp_threadinfo::m_children. I'm wondering if we could constrain these accesses so that the same writer is always designated for changes to the same threads/topological sectors... Following the above example, if we partition by thread id or, even better, by thread group id, we would solve the problem for sinsp_thread_manager::table_, but not for the other ones...
Whatever the mechanism, if we find a way of doing so, we could get rid of the global lock and gain big benefits in terms of both performance and code simplicity.
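To make the partitioning trade-off concrete, here is a minimal sketch, assuming threads are assigned to partitions by thread group id modulo a fixed partition count. `thread_node`, `partition_of`, `k_num_partitions`, and `partitions_touched_by_add` are hypothetical names, not libsinsp APIs, and the sketch only models the child's table slot and the parent's children list (not `list_head_`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified stand-in for sinsp_threadinfo.
struct thread_node {
    std::int64_t tid;
    std::int64_t tgid;                   // thread group id
    std::int64_t ptid;                   // parent thread id
    std::vector<std::int64_t> children;  // stand-in for m_children
};

// Hypothetical partition count and mapping: all threads of a group land in
// the same partition, so one worker could own the group for its lifetime.
constexpr std::size_t k_num_partitions = 4;

std::size_t partition_of(std::int64_t tgid) {
    assert(tgid >= 0);  // kernel-assigned ids are non-negative
    return static_cast<std::size_t>(tgid) % k_num_partitions;
}

// Partitions written when `child` is added under `parent`:
//  - the child's own partition (its slot in the per-partition table), and
//  - the parent's partition (appending to the parent's children list).
std::vector<std::size_t> partitions_touched_by_add(const thread_node& child,
                                                   const thread_node& parent) {
    std::vector<std::size_t> touched{partition_of(child.tgid)};
    const std::size_t parent_part = partition_of(parent.tgid);
    if(parent_part != touched.front()) {
        touched.push_back(parent_part);  // cross-partition write
    }
    return touched;
}
```

With four partitions, adding a thread that shares its parent's group (tgid 100) touches only partition 0, whereas adding a child that starts a new group (tgid 101, as after a fork) under the same parent touches partitions 1 and 0: the table insertion stays local, but the write to the parent's children list crosses partitions, which is exactly the case that partitioning by thread group id does not solve.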

proposals/20251127-thread-safe-sinsp-thread-manager.md

irozzo-1A · 2025-12-09T17:27:44Z


Yes, as pointed out, the synchronization cost of writes is the weakest part of this approach, and I would like to find a partitioning strategy that avoids the global writer lock.
Unfortunately, a linked structure like this makes it difficult to find a partitioning scheme that guarantees each sinsp_threadinfo is updated by a single writer, which would drop the need for the writer lock.
There are probably more sophisticated approaches we could try, but I suggest starting with something simple, such as the global lock. If the global writer lock turns out to be a show-stopper (which is not that unlikely 😄), we can try out something else.

This proposal should be considered a starting point for experimentation, and it may need several iterations.
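A minimal sketch of the global writer lock discussed above, assuming a single `std::mutex` guards both the table insertion and the link into the parent's children list. `topology_manager`, `thread_node`, `add_thread`, `child_count`, and `size` are hypothetical names that do not reflect the actual sinsp_thread_manager API, and the read path used by the proposal is not modeled (the accessors here simply take the same lock).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for sinsp_threadinfo.
struct thread_node {
    std::int64_t tid;
    std::int64_t ptid;
    std::vector<std::weak_ptr<thread_node>> children;  // stand-in for m_children
};

// Hypothetical manager: one mutex serializes every topology change,
// regardless of which thread group the change concerns.
class topology_manager {
public:
    std::shared_ptr<thread_node> add_thread(std::int64_t tid, std::int64_t ptid) {
        assert(tid != ptid && "a thread cannot be its own parent");
        // Allocate outside the critical section to keep it short.
        auto node = std::make_shared<thread_node>(thread_node{tid, ptid, {}});

        std::lock_guard<std::mutex> guard(m_topology_mtx);
        m_table[tid] = node;                // table insertion
        auto parent = m_table.find(ptid);
        if(parent != m_table.end()) {
            parent->second->children.push_back(node);  // parent-side link
        }
        return node;
    }

    // Illustration-only accessors; they take the same lock as writers.
    std::size_t child_count(std::int64_t tid) const {
        std::lock_guard<std::mutex> guard(m_topology_mtx);
        auto it = m_table.find(tid);
        return it == m_table.end() ? 0 : it->second->children.size();
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(m_topology_mtx);
        return m_table.size();
    }

private:
    mutable std::mutex m_topology_mtx;
    std::unordered_map<std::int64_t, std::shared_ptr<thread_node>> m_table;
};
```

The appeal of this design is that both writes performed by an insertion happen atomically with respect to other topology changes, with no partition ownership rules to maintain. The cost is the one raised in the review: two clone events for unrelated thread groups still contend on `m_topology_mtx`.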

Signed-off-by: irozzo-1A <iacopo@sysdig.com>

github-project-automation bot added this to Falco Roadmap Dec 5, 2025

poiana added release-note-none do-not-merge/work-in-progress labels Dec 5, 2025

github-project-automation bot moved this to Todo in Falco Roadmap Dec 5, 2025

poiana added kind/design dco-signoff: no area/libsinsp area/proposals size/L labels Dec 5, 2025

poiana requested review from hbrueckner and terror96 December 5, 2025 15:57

ekoops reviewed Dec 9, 2025


irozzo-1A force-pushed the proposal/thread-safe-thread-manager branch from 6d27c93 to 352a971 December 9, 2025 17:15

irozzo-1A force-pushed the proposal/thread-safe-thread-manager branch from 352a971 to d407b67 December 9, 2025 18:37

poiana added dco-signoff: yes and removed dco-signoff: no labels Dec 9, 2025

irozzo-1A marked this pull request as ready for review December 10, 2025 08:27

poiana removed the do-not-merge/work-in-progress label Dec 10, 2025

docs: proposal for thread safe thread manager

0d6c9b0

Signed-off-by: irozzo-1A <iacopo@sysdig.com>

irozzo-1A force-pushed the proposal/thread-safe-thread-manager branch from d407b67 to 0d6c9b0 December 18, 2025 15:30

ekoops added this to the 0.24.0 milestone Dec 19, 2025
