-
Notifications
You must be signed in to change notification settings - Fork 176
new(userspace/plugin) Add support for start and end field offsets #2322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Related changes can be found at: falcosecurity/plugin-sdk-go#103 |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2322 +/- ##
==========================================
+ Coverage 77.17% 77.19% +0.01%
==========================================
Files 227 227
Lines 30192 30250 +58
Branches 4607 4625 +18
==========================================
+ Hits 23302 23351 +49
- Misses 6890 6899 +9
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Perf diff from master - unit testsHeap diff from master - unit testsHeap diff from master - scap fileBenchmarks diff from master |
gnosek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the use case you're aiming for but I'm not sure this is the right approach. You cannot rely on knowing a plugin's "extract metadata" extract field, so you still need explicit per-plugin support. At that point, you don't need the new capability, since you know the plugin supports it.
To me it looks like you could simply:
- extend
ss_plugin_extract_fieldwith:bool extract_offsets; u32 start_offset; u32 end_offset; - set extract_offsets = true if you want the actual byte offsets (so we don't pay the cost unnecessarily)
- if extract_offsets == true, fill these fields in the plugin and reset extract_offsets to false (to mark that the offsets are valid; we could also have an enum here)
So then every extraction will also include the byte offsets without introducing new capabilities or new data types and you don't need to support plugins explicitly.
(cc @jasondellaluce for the stuff below)
Though I just realized the comment for ss_plugin_extract_field is wrong and adding new fields at the end will break the ABI, since we pass them as a pointer-to-array, so changing the size will make things explode when num_fields>1.
We never seem to actually submit multiple extract requests (sinsp_filter_check_plugin::extract_nocache just sets num_fields=1) so I guess we might as well deprecate the functionality (and possibly reintroduce it later once we actually have a use case for it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: I did not notice the above comment from @gnosek. We are proposing a similar thing in the end.
Hey @geraldcombs,
Thank you for this PR. Although I totally agree with fixing the issue of providing the location of each field, I think the approach used in this PR does not fully align with the design of the plugin API. I've also discussed this privately with @jasondellaluce, and we mostly agree.
In details:
- Adding these metadata looks to me like an extension of the current capability and not a distinct capability (which would also introduce unnecessary cross-capability dependency)
- Creating a kind of polymorphism for the extraction function is too cumbersome and has a non-negligible performance impact (the consumer will have to call it twice)
- Fields types are intended to map field types in libs, not for providing metadata. So adding
FTYPE_METADATA_SLICEfor this purpose seems misleading to us. - More straightforward solutions that align with the API's original design are available.
So, I want to propose a feasible alternative:
ss_plugin_extract_field allows us to add members in a backward-compatible way. So we can either:
- Add two u32 members to hold the positions
- Or a more complex datatype to allow generic metadata (I'd not go for this path unless we have a compelling use case since this would introduce another opaque format which both the plugin and the framework need to deal with)
This will not require introducing a new FTYPE. Moreover, plugins that do not support this feature will not populate those members, so there's no need to introduce a new capability either.
In the same struct, we can add another member that signals to the plugin when the consumer requires these metadata (e.g., ss_plugin_bool request_metadata) so that the plugin will produce that result only when requested.
Finally, this will require some additions to sinsp_filter_check_plugin (or perhaps in some other sinsp API) to expose these data to the final consumer.
@gnosek You are right. I did not consider that. If we deprecate batch extraction, we will be fine. But I'm not sure of the consequences. @jasondellaluce wdyt? |
29aeaa1 to
7b19369
Compare
|
@gnosek @leogr thanks for the review! I initially started with the idea that it would be useful to be able to extract multiple types of metadata, but as you point out just adding offsets is cleaner and more efficient. This and the related PRs shoudl be more in line with your suggestions.
Stratoshark uses it. It's much faster than extracting each field individually, at least for the cloudtrail plugin. |
I'm not sure it's something we should draw inspiration from, but the Win32 OPENFILENAME struct has an lStructSize member, presumably to solve the same sort of problem. https://learn.microsoft.com/en-us/windows/win32/api/commdlg/ns-commdlg-openfilenamew |
Out of pure ignorance, does performance matter (that much) for Stratoshark? I.e. do you do mass field extraction on entire captures (where I can see how perf would be critical), or only on the current one selected in the GUI (where I'd expect it to be lost in the noise)?
Yup, embedding the struct size as a member is a classic solution to this problem. This is going to be painful for SDKs, since you can no longer treat How about we try replacing with a union (and clean it up properly for v4): This way we can keep the ABI compatible: new plugins will know that if size_of_field==0, then it's whatever we have now and the plugin framework can enforce size_of_field==0 for legacy plugins and num_fields==1 after we change the struct layout. For v4, I'd consider using an array of pointers to ss_plugin_extract_field, rather than just an array of objects Also, as a hardening measure against future changes:
Also, it feels like we should start working on API v4 soon, there are enough EDIT: we'll have to handle the uint16_t ordering based on arch endianness :| |
|
Hey @geraldcombs and @gnosek Although I generally like embedding the size, I'm afraid that introducing it now would be a hybrid solution in the APIs. Thus, I'd propose postponing it to By rethinking it a bit (and apologizing for going back and forth), I came up with a simpler idea (possibly 😅 ). Appending members to
By doing so, we can retain full backward compatibility. wdyt? |
Big +1 from me, though you just did the same thing as we originally did with We can make this less painful by embedding the size from day 1, but it will still involve pointer arithmetic in the plugin. Doing this the next_batch way (ss_plugin_extract_field_metadata***) might be overkill for a pair of |
|
I like @leogr idea but, as pointed out by @gnosek , this will introduce the same problem we are having now with the exposure of array of structs. If the solution to this problem involves including the size of the struct, at this point there are no big advantages with respect to @gnosek 's solution in #2322 (comment) , so I would go with this one. |
|
@gnosek and @ekoops you're totally right and thank you for helping with brainstorming this. So, considering your point, I'd keep it as simple as possible by going very specific for the offsets use case, thus something like: If we need to add new metadata before v4, we will continue with this pattern since I don't expect we will need too many extensions. |
|
I can work with this :) (but please let's consider v4 some time this century) How will the plugin indicate that the offsets were actually generated? Setting ->request_offsets to false is one option, or the caller could just zero-init the array and treat 0..0 ranges as explicitly missing |
I believe 0..0 is ok to signal offsets weren't generated. |
We do both. Although we try to avoid mass field extraction (full dissection) as much as possible, we have to do so when initially loading a file so that we can properly render event columns and correlate events. Full redissection also happens any time the user applies a display filter, changes profiles, and opens various analysis windows. We dissect single events whenever the user selects an event so that we can build the detail tree, and in the background in order to colorize the event list. |
Is |
Stratoshark and Wireshark use the term "generated" to refer to a field that is present but doesn't correspond to event or packet bytes. Generated fields are shown in the UI with square brackets, e.g. the next TCP sequence number which is computed from the current sequence number + the segment length: For libs field extraction, I'm currently interpreting 0..0 to mean "generated" in the Stratoshark/Wireshark sense. |
7b19369 to
25de39f
Compare
Stratoshark, Wireshark, and tcpdump all use start+length as a convention for offsets, so I switched to that. |
|
/milestone 0.21.0 |
Add ss_plugin_extract_field_offsets as a companion struct to ss_plugin_extract_field. Signed-off-by: Gerald Combs <gerald@wireshark.org>
Remove field_offsets from ss_plugin_field_extract_input. We can just check to see if field_offsets is set. Update some comments. Signed-off-by: Gerald Combs <gerald@wireshark.org>
Add extraction offsets to the filter cache. Add an offset parameter to the various extract_nocache functions. Implement offset extraction in sinsp_filter_check_plugin::extract_nocache, and ignore offsets elsewhere. Add sinsp_filter_check::extract_with_offsets. Add an offsets test to plugins.ut.cpp. Signed-off-by: Gerald Combs <gerald@wireshark.org>
Co-authored-by: Federico Di Pierro <nierro92@gmail.com> Signed-off-by: Gerald Combs <gerald@wireshark.org>
Wireshark and tcpdump both handle offsets using start+length pairs, so use that convention here. Signed-off-by: Gerald Combs <gerald@wireshark.org>
Signed-off-by: Gerald Combs <gerald@wireshark.org>
bf8f202 to
256bada
Compare
Add support for extracting offsets for each value instead of just the first one. Signed-off-by: Gerald Combs <gerald@wireshark.org>
256bada to
ddaad8b
Compare
sinsp_filter_extract_cache::offset() was unused, so remove it. Signed-off-by: Gerald Combs <gerald@wireshark.org>
|
LGTM label has been added. DetailsGit tree hash: 732b14f0b7b277c03f1fd6f4128dc484038014e1 |
FedeDP
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: FedeDP, geraldcombs The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Add FTYPE_METADATA_SLICE, CAP_EXTRACT_METADATA, and associated routines so that plugins can provide the location of each field.
What type of PR is this?
/kind feature
Any specific area of the project related to this PR?
/area API-version
Does this PR require a change in the driver versions?
What this PR does / why we need it:
This makes it possible to extend extractor plugins so that they can provide the location of each field in each event or log message.
Which issue(s) this PR fixes:
This is required to fix https://gitlab.com/wireshark/wireshark/-/issues/20449
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: