perf(vello_common): use SIMD dispatch in flattening codegen #1336
This is a ~10% reduction in flattening time.
Flattening was already dispatched to have access to the SIMD witness,
but codegen did not yet reliably make use of the enabled target features,
because the functions weren't forced to be inlined.
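A minimal sketch of the underlying mechanism follows. It uses only `std`, assumes x86_64 with AVX2 as the accelerated path, and the `sum_squares*` names are hypothetical; this is not vello's code. Because the kernel is marked `#[inline(always)]`, it is compiled into each `#[target_feature]` caller. Without forced inlining, the compiler may instead emit the kernel once as a standalone function compiled for the baseline target and merely call it from the AVX2 entry point, which loses the benefit of the dispatch.

```rust
/// Hypothetical kernel. `#[inline(always)]` forces it into every caller,
/// so it is compiled with whatever target features that caller enables.
#[inline(always)]
fn sum_squares_kernel(xs: &[f32]) -> f32 {
    xs.iter().map(|x| x * x).sum()
}

/// AVX2 entry point: the inlined kernel body may be codegen'd with AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_squares_avx2(xs: &[f32]) -> f32 {
    sum_squares_kernel(xs)
}

/// Baseline entry point, compiled for the default target only.
fn sum_squares_baseline(xs: &[f32]) -> f32 {
    sum_squares_kernel(xs)
}

/// Runtime dispatch: pick the AVX2 path when the CPU supports it.
pub fn sum_squares(xs: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { sum_squares_avx2(xs) };
        }
    }
    sum_squares_baseline(xs)
}
```

Both paths produce the same result; only the instructions the compiler is allowed to use differ.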
```
flatten/Ghostscript_Tiger
time: [208.76 µs 209.06 µs 209.42 µs]
change: [-12.177% -11.979% -11.768%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
flatten/paris-30k time: [13.157 ms 13.202 ms 13.253 ms]
change: [-10.728% -10.307% -9.8772%] (p = 0.00 < 0.05)
Performance has improved.
```
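As a rough sanity check on what "~10%" means in absolute terms, the previous timings can be back-computed from the point estimates, assuming `new = old * (1 + change)`. This is only approximate: Criterion's `time` and `change` lines are separate estimates, so the figure below is a back-of-the-envelope number, not a measured baseline. The helper name `implied_previous` is hypothetical.

```rust
/// Approximates the pre-change timing from a new timing and a relative
/// change (e.g. -0.11979 for -11.979%), assuming new = old * (1 + change).
fn implied_previous(new_time: f64, relative_change: f64) -> f64 {
    new_time / (1.0 + relative_change)
}
```

`implied_previous(209.06, -0.11979)` gives about 237.5 (µs) for Ghostscript_Tiger, and `implied_previous(13.202, -0.10307)` about 14.72 (ms) for paris-30k.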
```diff
-let max = simd.vectorize(
-    #[inline(always)]
-    || {
-        flatten_cubic_simd(
-            simd,
-            c,
-            flatten_ctx,
-            tolerance as f32,
-            &mut flattened_cubics,
-        )
-    },
+let max = flatten_cubic_simd(
+    simd,
+    c,
+    flatten_ctx,
+    tolerance as f32,
+    &mut flattened_cubics,
```
This `vectorize` call is no longer necessary, as the `flatten_cubic_simd` call gets inlined into `flatten`, which is itself vectorized.
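To illustrate why a nested `vectorize` adds nothing once the outer function already runs in a target-feature context, here is a minimal `std`-only sketch. It assumes x86_64 with AVX2 as the single accelerated target; `Avx2Token`, `vectorize`, `sum_cubes` and `process` are hypothetical names loosely modeled on the witness-plus-`vectorize` pattern, not fearless_simd's actual API or implementation. The outer function enters the AVX2 context once, and the `#[inline(always)]` helper is compiled inside it, so wrapping the helper in a second `vectorize` would only re-enter a context that is already active.

```rust
/// Hypothetical witness that AVX2 is available (not fearless_simd's API).
#[derive(Clone, Copy)]
struct Avx2Token(());

impl Avx2Token {
    fn try_new() -> Option<Self> {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2") {
                return Some(Avx2Token(()));
            }
        }
        None
    }

    /// Runs `f` inside a function compiled with AVX2 enabled. The body of `f`
    /// only gets AVX2 codegen if it is inlined into that function.
    #[cfg(target_arch = "x86_64")]
    #[inline(always)]
    fn vectorize<R>(self, f: impl FnOnce() -> R) -> R {
        #[target_feature(enable = "avx2")]
        #[inline]
        unsafe fn with_avx2<T>(f: impl FnOnce() -> T) -> T {
            f()
        }
        // SAFETY: an `Avx2Token` is only created after AVX2 was detected.
        unsafe { with_avx2(f) }
    }

    /// Non-x86_64 fallback: just call the closure.
    #[cfg(not(target_arch = "x86_64"))]
    #[inline(always)]
    fn vectorize<R>(self, f: impl FnOnce() -> R) -> R {
        f()
    }
}

/// Hypothetical stand-in for `flatten_cubic_simd`: forced inline, so it is
/// compiled as part of whichever function calls it.
#[inline(always)]
fn sum_cubes(xs: &[f32]) -> f32 {
    xs.iter().map(|x| x * x * x).sum()
}

/// Hypothetical stand-in for `flatten`: enters the AVX2 context once and
/// calls the helper directly, without a nested `vectorize`.
fn process(token: Avx2Token, xs: &[f32]) -> f32 {
    token.vectorize(
        #[inline(always)]
        || sum_cubes(xs),
    )
}
```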
LaurenzV left a comment:
No change on ARM in my benchmarks.
```rust
let iter = path.into_iter().map(
    #[inline(always)]
    |el| affine * el,
);
```
I'm curious whether this has any effect? I would expect this to be inlined basically always, given how small the closure is.
> I'm curious whether this has any effect?
This one most likely has no effect, as the closure very likely gets inlined without the attribute, too, but the attribute makes it as unambiguous as Rust allows. This follows the suggestion in https://docs.rs/fearless_simd/0.3.0/fearless_simd/#inlining.
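A self-contained sketch of the annotated-closure pattern, assuming hypothetical `Affine` and `Point` types that stand in for the real path types (they are not kurbo's API). As discussed, for a closure this small the attribute most likely changes nothing; it only removes any ambiguity about whether the closure body is inlined into its caller, and therefore compiled with the caller's target features.

```rust
use std::ops::Mul;

/// Hypothetical affine transform [a, b, c, d, e, f] mapping (x, y) to
/// (a*x + c*y + e, b*x + d*y + f).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Affine([f64; 6]);

#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

impl Mul<Point> for Affine {
    type Output = Point;

    #[inline(always)]
    fn mul(self, p: Point) -> Point {
        let [a, b, c, d, e, f] = self.0;
        Point {
            x: a * p.x + c * p.y + e,
            y: b * p.x + d * p.y + f,
        }
    }
}

/// Applies the transform to every point; the closure is marked
/// `#[inline(always)]` so it is inlined into this function's loop.
fn transform_all(affine: Affine, points: &[Point]) -> Vec<Point> {
    points
        .iter()
        .map(
            #[inline(always)]
            |&p| affine * p,
        )
        .collect()
}
```

For example, scaling by 2 and translating by (1, 0) maps (1, 1) to (3, 2).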
Thanks for checking; the compiler made better inlining decisions on ARM, then!

Unfortunately, it's even stupider than that :) Essentially, all of the relevant …
This is a ~10% reduction in flattening time on x86; I haven't measured AArch64.