@irozzo-1A
Contributor

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind test

/kind feature

/kind sync

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

ref: falcosecurity/falco#3749

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@poiana
Contributor

poiana commented Dec 5, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: irozzo-1A
Once this PR has been reviewed and has the lgtm label, please assign leogr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana requested review from hbrueckner and terror96 December 5, 2025 15:57
@github-actions

github-actions bot commented Dec 5, 2025

Perf diff from master - unit tests

     4.25%    +13.19%  [.] std::__shared_ptr<sinsp_threadinfo, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__weak_ptr<sinsp_threadinfo, (__gnu_cxx::_Lock_policy)2> const&, std::nothrow_t)
     3.40%    +11.27%  [.] std::__shared_count<(__gnu_cxx::_Lock_policy)2>::_M_get_use_count() const
     3.97%     +8.85%  [.] sinsp_threadinfo::get_main_thread()
     3.33%     +7.63%  [.] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_lock_nothrow()
     3.18%     +7.15%  [.] sinsp_threadinfo::update_main_fdtable()
     5.70%     +6.87%  [.] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
     2.99%     +6.67%  [.] sinsp_threadinfo::get_fd_table()
     1.99%     +5.87%  [.] std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__weak_count<(__gnu_cxx::_Lock_policy)2> const&, std::nothrow_t)
    10.86%     -5.42%  [.] sinsp::next(sinsp_evt**)
     1.74%     +3.49%  [.] thread_group_info::get_first_thread() const

Heap diff from master - unit tests

peak heap memory consumption: 27.64K
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

peak heap memory consumption: -24B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Benchmarks diff from master

Comparing gbench_data.json to /root/actions-runner/_work/libs/libs/build/gbench_data.json
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
BM_sinsp_split_mean                                            -0.0522         -0.0523           248           235           247           234
BM_sinsp_split_median                                          -0.0557         -0.0556           248           234           247           234
BM_sinsp_split_stddev                                          +0.2094         +0.1640             1             2             1             2
BM_sinsp_split_cv                                              +0.2760         +0.2283             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_mean                  +0.0392         +0.0391            70            73            70            73
BM_sinsp_concatenate_paths_relative_path_median                +0.0455         +0.0455            70            73            69            73
BM_sinsp_concatenate_paths_relative_path_stddev                -0.7716         -0.7707             1             0             1             0
BM_sinsp_concatenate_paths_relative_path_cv                    -0.7803         -0.7793             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_mean                     -0.0565         -0.0568            44            41            44            41
BM_sinsp_concatenate_paths_empty_path_median                   -0.0672         -0.0675            44            41            44            41
BM_sinsp_concatenate_paths_empty_path_stddev                   +1.1334         +1.1354             1             2             1             2
BM_sinsp_concatenate_paths_empty_path_cv                       +1.2612         +1.2639             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_mean                  +0.0253         +0.0252            71            73            71            73
BM_sinsp_concatenate_paths_absolute_path_median                +0.0219         +0.0219            71            73            71            73
BM_sinsp_concatenate_paths_absolute_path_stddev                +2.3836         +2.4716             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_cv                    +2.3002         +2.3862             0             0             0             0

@codecov

codecov bot commented Dec 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.57%. Comparing base (e65bb86) to head (0d6c9b0).
⚠️ Report is 33 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2739      +/-   ##
==========================================
- Coverage   77.02%   74.57%   -2.45%     
==========================================
  Files         296      292       -4     
  Lines       30818    30025     -793     
  Branches     4670     4657      -13     
==========================================
- Hits        23738    22392    -1346     
- Misses       7080     7633     +553     
Flag Coverage Δ
libsinsp 74.57% <ø> (-2.45%) ⬇️


Contributor

@ekoops ekoops left a comment

I read the proposal, and overall it looks good. I just added some minor comments.
The critical point I would like to discuss is the global std::mutex protecting any modification to the topological structure in sinsp_thread_manager. This has a big impact, as it serializes any clone/clone3/fork/procexit or, in other words, any thread addition/replacement. Worse, it also applies to topological operations on unrelated threads.
In theory, if we adopt some form of partitioning, each thread could belong to the same partition for its entire lifetime, and so be managed by the same worker thread. However, there are topological structures, both inside sinsp_thread_manager and sinsp_threadinfo, that link threads belonging to different partitions. For example, adding one thread requires inserting a new element in sinsp_thread_manager::table_, sinsp_thread_manager::list_head_, and sinsp_threadinfo::m_children on the parent side. I'm wondering if we could constrain this kind of access so that the same writer is always designated for changes on the same threads/topological sectors... Following the above example, if we partition by thread id or, even better, by thread group id, we would solve the problem for sinsp_thread_manager::table_, but not for the other ones...
Whatever the mechanism, if we find a way of doing so, we could get rid of the global lock and gain big benefits in terms of both performance and code simplicity.
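
To make the partitioning concern concrete, here is a minimal sketch, not code from the proposal. It assumes a table sharded by thread group id, with one lock per shard. `thread_node`, `partitioned_table`, and `crosses_partitions` are hypothetical names invented for illustration; only the idea of a per-thread children list mirrors the `m_children` member mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical, heavily simplified stand-in for sinsp_threadinfo.
struct thread_node {
    int64_t tid = 0;
    int64_t tgid = 0;  // thread group id: the partitioning key in this sketch
    int64_t ptid = 0;  // parent thread id
    std::vector<std::weak_ptr<thread_node>> children;  // parent-side links
};

// Hypothetical table split into N shards, each with its own lock.
template <std::size_t N>
class partitioned_table {
public:
    static std::size_t shard_of(int64_t tgid) {
        return static_cast<std::size_t>(static_cast<uint64_t>(tgid) % N);
    }

    // Inserting into the table only touches the shard owning the thread group.
    void add(const std::shared_ptr<thread_node>& t) {
        assert(t != nullptr);
        shard& s = shards_[shard_of(t->tgid)];
        std::lock_guard<std::mutex> lk(s.mtx);
        s.threads[t->tid] = t;
    }

    // Assumption: the caller knows the tgid, so the right shard can be picked.
    std::shared_ptr<thread_node> find(int64_t tid, int64_t tgid) {
        shard& s = shards_[shard_of(tgid)];
        std::lock_guard<std::mutex> lk(s.mtx);
        auto it = s.threads.find(tid);
        return it == s.threads.end() ? nullptr : it->second;
    }

private:
    struct shard {
        std::mutex mtx;
        std::unordered_map<int64_t, std::shared_ptr<thread_node>> threads;
    };
    std::array<shard, N> shards_;
};

// True when linking `child` under `parent` would write to a node owned by a
// different shard, which is the case per-shard locking alone does not cover.
template <std::size_t N>
bool crosses_partitions(const thread_node& parent, const thread_node& child) {
    return partitioned_table<N>::shard_of(parent.tgid) !=
           partitioned_table<N>::shard_of(child.tgid);
}
```

In this sketch the table insertion stays local to one shard, but a fork whose child lands in another shard still has to update the parent's children list, so some cross-partition coordination would remain.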

@irozzo-1A irozzo-1A force-pushed the proposal/thread-safe-thread-manager branch from 6d27c93 to 352a971 Compare December 9, 2025 17:15
@irozzo-1A
Contributor Author

> I read the proposal, and overall it looks good. I just added some minor comments. The critical point I would like to discuss is the global std::mutex protecting any modification to the topological structure in sinsp_thread_manager. This has a big impact, as it serializes any clone/clone3/fork/procexit or, in other words, any thread addition/replacement. Worse, it also applies to topological operations on unrelated threads. In theory, if we adopt some form of partitioning, each thread could belong to the same partition for its entire lifetime, and so be managed by the same worker thread. However, there are topological structures, both inside sinsp_thread_manager and sinsp_threadinfo, that link threads belonging to different partitions. For example, adding one thread requires inserting a new element in sinsp_thread_manager::table_, sinsp_thread_manager::list_head_, and sinsp_threadinfo::m_children on the parent side. I'm wondering if we could constrain this kind of access so that the same writer is always designated for changes on the same threads/topological sectors... Following the above example, if we partition by thread id or, even better, by thread group id, we would solve the problem for sinsp_thread_manager::table_, but not for the other ones... Whatever the mechanism, if we find a way of doing so, we could get rid of the global lock and gain big benefits in terms of both performance and code simplicity.

Yes, as pointed out, the synchronization cost of writes is the weakest part of this approach, and I would like to find a partitioning strategy that avoids the global writer lock.
Unfortunately, a linked structure like this makes it difficult to find a partitioning scheme in which each sinsp_threadinfo is guaranteed to be updated by a single writer, which would remove the need for the writer lock.
There are probably more sophisticated approaches we could try, but I suggest starting with something as simple as the global lock. If we see that the global writer lock is a show-stopper (which is not that unlikely 😄), we can try out something else.

This proposal has to be considered a starting point for experimentation, and we may need several iterations.
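
For reference, the "simple" baseline discussed here could look roughly like the sketch below. It is an illustration under assumptions, not the proposal's implementation: `thread_entry` and `locked_thread_table` are hypothetical names. Readers take the same mutex here purely to keep the example short; the proposal may handle reads differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical, heavily simplified stand-in for sinsp_threadinfo.
struct thread_entry {
    int64_t tid = 0;
    int64_t ptid = 0;
    std::vector<std::weak_ptr<thread_entry>> children;
};

// Hypothetical table where every topological change takes one global lock.
class locked_thread_table {
public:
    // Inserts the thread and links it under its parent in one critical section.
    bool add(int64_t tid, int64_t ptid) {
        std::lock_guard<std::mutex> lk(topology_mtx_);
        auto node = std::make_shared<thread_entry>();
        node->tid = tid;
        node->ptid = ptid;
        if (!table_.emplace(tid, node).second) {
            return false;  // tid already present
        }
        auto parent = table_.find(ptid);
        if (ptid != tid && parent != table_.end()) {
            parent->second->children.push_back(node);
        }
        return true;
    }

    // Removal is serialized with additions by the same lock.
    bool remove(int64_t tid) {
        std::lock_guard<std::mutex> lk(topology_mtx_);
        return table_.erase(tid) > 0;
    }

    // Counts children that are still alive (weak links may have expired).
    std::size_t live_children(int64_t tid) {
        std::lock_guard<std::mutex> lk(topology_mtx_);
        auto it = table_.find(tid);
        if (it == table_.end()) {
            return 0;
        }
        std::size_t n = 0;
        for (const auto& c : it->second->children) {
            if (!c.expired()) {
                ++n;
            }
        }
        return n;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lk(topology_mtx_);
        return table_.size();
    }

private:
    std::mutex topology_mtx_;  // the single global writer lock
    std::unordered_map<int64_t, std::shared_ptr<thread_entry>> table_;
};
```

The trade-off is visible in `add`: the table insertion and the parent-side link happen atomically with no partitioning logic, at the cost of serializing every addition and removal, including those on unrelated threads.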

@irozzo-1A irozzo-1A force-pushed the proposal/thread-safe-thread-manager branch from 352a971 to d407b67 Compare December 9, 2025 18:37
@irozzo-1A irozzo-1A marked this pull request as ready for review December 10, 2025 08:27
Signed-off-by: irozzo-1A <iacopo@sysdig.com>
@irozzo-1A irozzo-1A force-pushed the proposal/thread-safe-thread-manager branch from d407b67 to 0d6c9b0 Compare December 18, 2025 15:30
@ekoops ekoops added this to the 0.24.0 milestone Dec 19, 2025
Projects

Status: Todo

3 participants