@fremmi
Contributor

@fremmi fremmi commented Nov 11, 2025

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind test

/kind feature

/kind sync

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Add definitive thread detection in parse_clone_exit_child() using the Linux kernel's absolute thread definition: pid != tid. When a thread is detected but missing the CLONE_FILES flag, force the flag to ensure proper FD table sharing between threads.

Which issue(s) this PR fixes:

Fixes #2761

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@poiana
Contributor

poiana commented Nov 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fremmi
Once this PR has been reviewed and has the lgtm label, please assign mstemm for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana requested a review from terror96 November 11, 2025 16:16
@codecov

codecov bot commented Nov 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.90%. Comparing base (a8fb3d2) to head (7c80fc9).
⚠️ Report is 44 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2715   +/-   ##
=======================================
  Coverage   76.90%   76.90%           
=======================================
  Files         296      296           
  Lines       30875    30879    +4     
  Branches     4693     4694    +1     
=======================================
+ Hits        23745    23749    +4     
  Misses       7130     7130           
Flag Coverage Δ
libsinsp 76.90% <100.00%> (+<0.01%) ⬆️

…river

Add definitive thread detection in parse_clone_exit_child() using the
Linux kernel's absolute thread definition: pid != tid. When a thread is
detected but missing the CLONE_FILES flag, force the flag to ensure
proper FD table sharing between threads.

Signed-off-by: Francesco Emmi <francesco.as@gmail.com>
@terror96
Contributor

Hi @fremmi, is this a fix for an existing issue? If not, why is this change needed?

@fremmi
Contributor Author

fremmi commented Nov 12, 2025

Yes, @terror96, this is a fix for an issue I encountered with file descriptors.

The problem happens when PPM_CL_CLONE_THREAD is not set on thread creation events arriving from the driver. When that flag is missing, PPM_CL_CLONE_FILES doesn't get set either at

child_tinfo->m_flags |= PPM_CL_CLONE_FILES;

This causes threads to use the wrong FD table at

inline sinsp_fdtable* get_fd_table() {
which breaks FD lookups when threads try to access file descriptors created by other threads in the same process.
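
To make the failure mode concrete, here is a minimal sketch of the lookup path described above. It is not the libsinsp code: `ThreadInfo`, `lookup`, and the `kCloneFiles` bit value are hypothetical stand-ins for `sinsp_threadinfo`, the FD lookup, and `PPM_CL_CLONE_FILES`. It only assumes that a thread without the files flag consults its own, private table.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Illustrative stand-in for PPM_CL_CLONE_FILES; the bit value is an assumption.
constexpr uint32_t kCloneFiles = 1u << 0;

// Hypothetical, simplified thread record (not the real sinsp_threadinfo).
struct ThreadInfo {
    int64_t pid = 0;
    int64_t tid = 0;
    uint32_t flags = 0;
    ThreadInfo* main_thread = nullptr;                 // group leader, if known
    std::unordered_map<int64_t, std::string> fdtable;  // fd -> path

    // Tasks flagged as sharing files use the leader's table;
    // everything else falls back to its own private table.
    std::unordered_map<int64_t, std::string>* get_fd_table() {
        if (!(flags & kCloneFiles)) {
            return &fdtable;
        }
        return main_thread ? &main_thread->fdtable : nullptr;
    }

    // Returns the path for fd, or nullptr if the selected table has no entry.
    const std::string* lookup(int64_t fd) {
        auto* table = get_fd_table();
        if (table == nullptr) {
            return nullptr;
        }
        auto it = table->find(fd);
        return it == table->end() ? nullptr : &it->second;
    }
};
```

In this model, a worker thread created without the files flag cannot see an FD that the leader opened, because the lookup never reaches the leader's table.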

The fix uses pid != tid to detect threads in userspace (which is always correct in Linux) and forces PPM_CL_CLONE_FILES regardless of what the driver provided. This ensures threads share the correct FD table.
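
A rough sketch of the check, for illustration only. The helper names `is_thread` and `fix_child_flags` and the `kCloneFiles` bit value are hypothetical; the actual change lives inside parse_clone_exit_child() and operates on child_tinfo->m_flags.

```cpp
#include <cstdint>

// Illustrative stand-in for PPM_CL_CLONE_FILES; the bit value is an assumption.
constexpr uint32_t kCloneFiles = 1u << 0;

// Hypothetical helper: a task whose tid differs from its pid is a
// non-leader thread of that process.
inline bool is_thread(int64_t pid, int64_t tid) {
    return pid != tid;
}

// Hypothetical helper: returns the flags to store for the child, forcing the
// files-sharing bit when the task is a thread and the driver omitted it.
inline uint32_t fix_child_flags(int64_t pid, int64_t tid, uint32_t driver_flags) {
    uint32_t flags = driver_flags;
    if (is_thread(pid, tid) && !(flags & kCloneFiles)) {
        flags |= kCloneFiles;
    }
    return flags;
}
```

Flags for a process (pid == tid) pass through unchanged, so the correction only affects events that were already inconsistent.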

I can't rule out other consequences that I'm missing. I'm also investigating why events sometimes arrive from the driver without the CLONE_THREAD flag in the first place.

Co-authored-by: Tero Kauppinen <tero.kauppinen@est.tech>
Signed-off-by: Francesco Emmi <francesco.as@gmail.com>
@fremmi fremmi requested a review from terror96 November 12, 2025 11:22
@terror96
Contributor

terror96 commented Nov 12, 2025

Great, thanks for the clarification. Could this also be related to the problem that was posted on Slack regarding a flood of:
Tue Nov 04 16:15:37 2025: [libs]: Unable to determine path for file descriptor. messages? I guess no issue was created on that, though.

I don't know this part of the code, but I'll re-trigger the CI tests.

@fremmi
Contributor Author

fremmi commented Nov 12, 2025

Could this also be related to the problem that was posted on Slack regarding a flood of:
Tue Nov 04 16:15:37 2025: [libs]: Unable to determine path for file descriptor. messages? I guess no issue was created on that, though.

I think there's a good chance the two are related.

@terror96
Contributor

terror96 commented Nov 12, 2025

ping @irozzo-1A, you at least reacted to the message on Slack.

@fremmi
Contributor Author

fremmi commented Dec 15, 2025

Temporarily converting this to a draft. I'd like to check whether there is a better solution in kernel space.

@fremmi fremmi marked this pull request as draft December 15, 2025 15:40
@irozzo-1A
Contributor

irozzo-1A commented Dec 15, 2025

/milestone TBD

@poiana
Contributor

poiana commented Dec 15, 2025

@irozzo-1A: The provided milestone is not valid for this repository. Milestones in this repository: [0.23.0, TBD, next-driver]

Use /milestone clear to clear the milestone.

Details

In response to this:

/milestone 0.24.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@irozzo-1A irozzo-1A added this to the TBD milestone Dec 15, 2025
@irozzo-1A
Contributor

Temporarily converting this to a draft. I'd like to check whether there is a better solution in kernel space.

@fremmi If I recall correctly from our offline discussion, the problem is that the flags we get from the clone exit child event are not reliable. At the moment we do not check whether bpf_probe_read_user succeeds or fails, but experiments show that the flag values can be incorrect even when the return code is 0. This seems related to a race condition in the kernel: the parent process (which owns that memory) may have already modified or freed the args memory location while the child is waking up. Feel free to link any reference.

I agree it would be better to address the issue on the kernel side, even if it represents more work. We could either:

  1. Use a heuristic similar to the one used for the tp_btf/sched_process_fork tracepoint, as we don't have access to the flags there.
  2. Just use the tp_btf/sched_process_fork tracepoint for all architectures, and skip clone child exit events.
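
For option 1, the thread does not spell out the heuristic, so the following is only one hypothetical shape it could take, written as plain C++ rather than BPF code. The idea is to derive the flags from task state the kernel already holds instead of reading them from user memory. `TaskSnapshot`, `derive_clone_flags`, the `files_id` field, and both flag values are assumptions made for illustration; this is not the driver's actual logic.

```cpp
#include <cstdint>

// Illustrative stand-ins for PPM_CL_CLONE_FILES and PPM_CL_CLONE_THREAD;
// the bit values are assumptions.
constexpr uint32_t kCloneFiles = 1u << 0;
constexpr uint32_t kCloneThread = 1u << 1;

// Hypothetical snapshot of what a fork-time probe could observe about a task.
struct TaskSnapshot {
    int64_t tgid;       // thread group id (the userspace "pid")
    int64_t pid;        // kernel task id (the userspace "tid")
    uint64_t files_id;  // opaque identity of the task's file table
};

// Derives clone-like flags from observable state only.
inline uint32_t derive_clone_flags(const TaskSnapshot& parent, const TaskSnapshot& child) {
    uint32_t flags = 0;
    if (child.pid != child.tgid) {
        flags |= kCloneThread;  // non-leader task in an existing thread group
    }
    if (child.files_id == parent.files_id) {
        flags |= kCloneFiles;  // child shares the parent's file table
    }
    return flags;
}
```

Because nothing here depends on the caller's argument memory, the race described above would not affect the result. Whether the same information is cheaply available in every supported driver is an open question.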

AFAIU, the tp_btf/sched_process_fork tracepoint was introduced to overcome limitations with sys_exit_tracepoint on arm64 and s390x architectures regarding clone/clone3 child exit.

Feel free to add color if I missed some important detail.

cc @leogr @ekoops

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

The flags extracted on clone/clone3 child exit are sometimes wrong

4 participants