Skip to content

Conversation

@cfallin
Copy link
Member

@cfallin cfallin commented Apr 6, 2025

When exerting additional pressure on regalloc with bytecodealliance/wasmtime#10502, which can lead to call instructions that have significantly more (any-constrained) defs, we hit panics in RA2 where (i) a bundle merged several LiveRanges, (ii) one of these LiveRanges had a fixed-reg constraint on an early use, (iii) this fixed-reg constraint conflicted with a clobber (which always happens at the late-point), (iv) the bundle merged in another LiveRange of some arbitrary def at the late point.

This would make a bundle (which is the atomic unit of allocation) that covers the whole inst, including the late point; and is required to be in the fixed reg; this is unallocatable because the clobber is also at the late point in that reg.

Our allocate-or-split-and-retry logic does not split if a bundle is "minimal". This is meant to give a base case to the retries: when bundles break down into their minimal pieces, any solvable set of constraints should result in allocations.

Unfortunately the "is minimal" predicate definition did not account for multiple LiveRanges, but rather only tested whether the total program-point range of the bundle was over one instruction. If there are multiple LiveRanges, we can still split them off, and the resulting split bundles may cover only half the instruction, avoiding the clobbers.

When exerting additional pressure on regalloc with
bytecodealliance/wasmtime#10502, which can lead to call instructions
that have significantly more (`any`-constrained) defs, we hit panics
in RA2 where (i) a bundle merged several LiveRanges, (ii) one of these
LiveRanges had a fixed-reg constraint on an early use, (iii) this
fixed-reg constraint conflicted with a clobber (which always happens
at the late-point), (iv) the bundle merged in another LiveRange of
some arbitrary def at the late point.

This would make a bundle (which is the atomic unit of allocation) that
covers the whole inst, including the late point; and is required to be
in the fixed reg; this is unallocatable because the clobber is also at
the late point in that reg.

Our allocate-or-split-and-retry logic does not split if a bundle is
"minimal". This is meant to give a base case to the retries: when
bundles break down into their minimal pieces, any solvable set of
constraints should result in allocations.

Unfortunately the "is minimal" predicate definition did not account
for multiple LiveRanges, but rather only tested whether the total
program-point range of the bundle was over one instruction. If there
are multiple LiveRanges, we can still split them off, and the
resulting split bundles may cover only half the instruction, avoiding
the clobbers.
@cfallin cfallin requested review from alexcrichton and fitzgen April 6, 2025 03:06
@cfallin
Copy link
Member Author

cfallin commented Apr 6, 2025

This fixes one of the fuzzbugs that came in (https://issues.oss-fuzz.com/issues/408523840) -- likely the others too, but I haven't checked yet. I'm fuzzing locally with ion_checker and will leave it running on 32 threads over the weekend to validate this.

@cfallin
Copy link
Member Author

cfallin commented Apr 7, 2025

I fuzzed this (with the ion_checker target, -s none for more speed, 16c/32t on a Ryzen 9950x system) for 25h36m with no checker failures found; ran up to:

#540595696: cov: 27746 ft: 26948 corp: 3367 exec/s 117 oom/timeout/crash: 0/50/0 time: 92203s job: 9713 dft_time: 0

@alexcrichton
Copy link
Member

Would it make sense to beef up the fuzzer in this repo to catch this as well? (assuming it isn't possible for the preexisting fuzzer to eventually catch this)

@cfallin
Copy link
Member Author

cfallin commented Apr 7, 2025

Yes definitely, I need to look into that -- it won't be super-simple though, as there isn't really a separate oracle for "is this set of constraints allocatable", so I need to do some thinking about how to generate more interesting sets of constraints by construction that work. In other words I don't want to block merging this fix on that as it will be a bit of a deeper task.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on trying to cover these cases in the fuzz target when possible

@cfallin cfallin merged commit 01904ff into bytecodealliance:main Apr 7, 2025
6 checks passed
@cfallin cfallin deleted the minimal-bundles-are-actually-really-minimal branch April 7, 2025 17:35
@cfallin cfallin mentioned this pull request Apr 7, 2025
cfallin added a commit that referenced this pull request Apr 7, 2025
Includes fix from #214.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Apr 7, 2025
This pulls in a fix for a fuzzbug found after bytecodealliance#10502 started generating
more challenging constraints for regalloc. The fix in
bytecodealliance/regalloc2#214 updates bundle-splitting logic to
properly handle bundles with multiple live-ranges all covering one
instruction.
github-merge-queue bot pushed a commit to bytecodealliance/wasmtime that referenced this pull request Apr 7, 2025
* Cranelift: update to regalloc2 0.11.3.

This pulls in a fix for a fuzzbug found after #10502 started generating
more challenging constraints for regalloc. The fix in
bytecodealliance/regalloc2#214 updates bundle-splitting logic to
properly handle bundles with multiple live-ranges all covering one
instruction.

* Update test expectations after regalloc perturbation.
cfallin added a commit to cfallin/regalloc2 that referenced this pull request Apr 7, 2025
This is a followup from bytecodealliance#214. We have use-cases in Cranelift now where
we generate constraints on call instructions that have:

- A number of fixed-reg uses at "early";
- A number of fixed-reg defs at "late";
- A number of register clobbers, potentially overlapping with uses;
- A number of "any"-constrained defs at "late", to add register
  pressure.

The existing fuzzing function generator would create at most 3 operands,
and would add *either* fixed-reg constraints *or* clobbers, not both.

This PR adds a "callsite-ish constraints" branch as a subcase of the
fixed-reg choice that also adds the "any" defs to add register pressure,
and adds new clobbers that don't interfere with existing constraints. It
also loosens the logic around constructing fixed-reg constraints: if a
conflict is found with some arbitrarily-chosen constraint, it keeps
going to the next operand and tries to make it fixed, rather than giving
up on further operands.
cfallin added a commit to cfallin/regalloc2 that referenced this pull request Apr 7, 2025
This is a followup from bytecodealliance#214. We have use-cases in Cranelift now where
we generate constraints on call instructions that have:

- A number of fixed-reg uses at "early";
- A number of fixed-reg defs at "late";
- A number of register clobbers, potentially overlapping with uses;
- A number of "any"-constrained defs at "late", to add register
  pressure.

The existing fuzzing function generator would create at most 3 operands,
and would add *either* fixed-reg constraints *or* clobbers, not both.

This PR adds a "callsite-ish constraints" branch as a subcase of the
fixed-reg choice that also adds the "any" defs to add register pressure,
and adds new clobbers that don't interfere with existing constraints. It
also loosens the logic around constructing fixed-reg constraints: if a
conflict is found with some arbitrarily-chosen constraint, it keeps
going to the next operand and tries to make it fixed, rather than giving
up on further operands.
cfallin added a commit to cfallin/regalloc2 that referenced this pull request Apr 7, 2025
This is a followup from bytecodealliance#214. We have use-cases in Cranelift now where
we generate constraints on call instructions that have:

- A number of fixed-reg uses at "early";
- A number of fixed-reg defs at "late";
- A number of register clobbers, potentially overlapping with uses;
- A number of "any"-constrained defs at "late", to add register
  pressure.

The existing fuzzing function generator would create at most 3 operands,
and would add *either* fixed-reg constraints *or* clobbers, not both.

This PR adds a "callsite-ish constraints" branch as a subcase of the
fixed-reg choice that also adds the "any" defs to add register pressure,
and adds new clobbers that don't interfere with existing constraints. It
also loosens the logic around constructing fixed-reg constraints: if a
conflict is found with some arbitrarily-chosen constraint, it keeps
going to the next operand and tries to make it fixed, rather than giving
up on further operands.
cfallin added a commit that referenced this pull request Apr 7, 2025
This is a followup from #214. We have use-cases in Cranelift now where
we generate constraints on call instructions that have:

- A number of fixed-reg uses at "early";
- A number of fixed-reg defs at "late";
- A number of register clobbers, potentially overlapping with uses;
- A number of "any"-constrained defs at "late", to add register
pressure.

The existing fuzzing function generator would create at most 3 operands,
and would add *either* fixed-reg constraints *or* clobbers, not both.

This PR adds a "callsite-ish constraints" branch as a subcase of the
fixed-reg choice that also adds the "any" defs to add register pressure,
and adds new clobbers that don't interfere with existing constraints. It
also loosens the logic around constructing fixed-reg constraints: if a
conflict is found with some arbitrarily-chosen constraint, it keeps
going to the next operand and tries to make it fixed, rather than giving
up on further operands.
cfallin added a commit to cfallin/regalloc2 that referenced this pull request Apr 11, 2025
This PR rewords minimal-bundle logic to be consistent between property
computation and splitting, and overall simpler.

To recap: a "minimal bundle" is an allocation bundle that is as small as
possible. Because RA2 does backtracking, i.e., can kick a bundle out of
an allocation to make way for another, and because it can split, i.e.
create more work for itself, we need to ensure the algorithm "bottoms
out" somewhere to ensure termination. The way we do this is by defining
a "minimal bundle": this is a bundle that cannot be split further. A
minimal bundle is never evicted, and any allocatable input (set of
uses/defs with constraints), when split into minimal bundles, should
result in no conflicts. We thus want to define minimal bundles as
*minimally* as possible so that we have the maximal solving capacity.

For a long time, we defined minimal bundles as those spanning a single
instruction (before- and after- progpoints) -- because we cannot insert
moves in the middle of an instruction. In bytecodealliance#214, we updated
minimal-bundle splitting to avoid putting two different unrelated uses
in a single-instruction bundle, beacuse doing so and calling it
"minimal" (unsplittable) can artificially extend the liverange of a use
with a fixed-reg constraint at the before-point into the after-point,
causing an unsolveable conflict. This was triggered by new and tighter
constraints on callsites in Cranelift after
bytecodealliance/wasmtime#10502 (merging retval defs into calls) landed.

Unfortunately this also resulted in an infinite allocation loop, because
the definition of "minimal bundle" did not agree between the
split-into-minimal-bundles fallback/last-ditch code, and the bundle
property computation. The splitter was splitting as far as it was
willing to go, but our predicate didn't consider those bundles minimal,
so we continued to re-attempt splitting indefinitely.

While investigating this, I found that the minimal-bundle concept had
accumulated significant cruft ("the detritus of dead fuzzbugs") and this
tech-debt was making things more confusing than not -- so I started by
clearly defining what a minimal bundle *is*. Precisely:

- A single use, within a single LiveRange;
- With that LiveRange having a program-point span consistent with the
  use:
  - Early def: whole instruction (must live past Late point so it can
    reach its uses; moves not possible within inst);
  - Late def: Late point only;
  - Early use: Early point only;
  - Late use: whole instruction (must be live starting at Early so the
    value can reach this use; moves not possible within inst).

This is made easier and simpler than what we have before largely
because the minimal-bundle splitter aggressively puts spans of LiveRange
without uses into the spill bundle, and because we support overlapping
LiveRanges for a vreg now (thanks Trevor!), so we can rely on having
*some* connectivity between the def and its uses even if we aggressively
trim LiveRanges in the minimal bundles down to just their defs/uses.

Fixes bytecodealliance#218, bytecodealliance#219.
cfallin added a commit to cfallin/regalloc2 that referenced this pull request Apr 11, 2025
This PR rewords minimal-bundle logic to be consistent between property
computation and splitting, and overall simpler.

To recap: a "minimal bundle" is an allocation bundle that is as small as
possible. Because RA2 does backtracking, i.e., can kick a bundle out of
an allocation to make way for another, and because it can split, i.e.
create more work for itself, we need to ensure the algorithm "bottoms
out" somewhere to ensure termination. The way we do this is by defining
a "minimal bundle": this is a bundle that cannot be split further. A
minimal bundle is never evicted, and any allocatable input (set of
uses/defs with constraints), when split into minimal bundles, should
result in no conflicts. We thus want to define minimal bundles as
*minimally* as possible so that we have the maximal solving capacity.

For a long time, we defined minimal bundles as those spanning a single
instruction (before- and after- progpoints) -- because we cannot insert
moves in the middle of an instruction. In bytecodealliance#214, we updated
minimal-bundle splitting to avoid putting two different unrelated uses
in a single-instruction bundle, beacuse doing so and calling it
"minimal" (unsplittable) can artificially extend the liverange of a use
with a fixed-reg constraint at the before-point into the after-point,
causing an unsolveable conflict. This was triggered by new and tighter
constraints on callsites in Cranelift after
bytecodealliance/wasmtime#10502 (merging retval defs into calls) landed.

Unfortunately this also resulted in an infinite allocation loop, because
the definition of "minimal bundle" did not agree between the
split-into-minimal-bundles fallback/last-ditch code, and the bundle
property computation. The splitter was splitting as far as it was
willing to go, but our predicate didn't consider those bundles minimal,
so we continued to re-attempt splitting indefinitely.

While investigating this, I found that the minimal-bundle concept had
accumulated significant cruft ("the detritus of dead fuzzbugs") and this
tech-debt was making things more confusing than not -- so I started by
clearly defining what a minimal bundle *is*. Precisely:

- A single use, within a single LiveRange;
- With that LiveRange having a program-point span consistent with the
  use:
  - Early def: whole instruction (must live past Late point so it can
    reach its uses; moves not possible within inst);
  - Late def: Late point only;
  - Early use: Early point only;
  - Late use: whole instruction (must be live starting at Early so the
    value can reach this use; moves not possible within inst).

This is made easier and simpler than what we have before largely
because the minimal-bundle splitter aggressively puts spans of LiveRange
without uses into the spill bundle, and because we support overlapping
LiveRanges for a vreg now (thanks Trevor!), so we can rely on having
*some* connectivity between the def and its uses even if we aggressively
trim LiveRanges in the minimal bundles down to just their defs/uses.

The split-at-program-point splitter (i.e., not the fallback
split-into-minimal-bundles splitter) also got a small fix related to
this: it has a mode that was intended to "split off one use" if we enter
with a split-point at the start of the bundle, but this was really
splitting off all uses at the program point (if there are multiple of
the same vreg at the same program point). In the case that we still need
to split these apart, this just falls back to the minimal-bundle
splitter now.

Fixes bytecodealliance#218, bytecodealliance#219.
cfallin added a commit to cfallin/regalloc2 that referenced this pull request Apr 11, 2025
This PR reworks minimal-bundle logic to be consistent between property
computation and splitting, and overall simpler.

To recap: a "minimal bundle" is an allocation bundle that is as small as
possible. Because RA2 does backtracking, i.e., can kick a bundle out of
an allocation to make way for another, and because it can split, i.e.
create more work for itself, we need to ensure the algorithm "bottoms
out" somewhere to ensure termination. The way we do this is by defining
a "minimal bundle": this is a bundle that cannot be split further. A
minimal bundle is never evicted, and any allocatable input (set of
uses/defs with constraints), when split into minimal bundles, should
result in no conflicts. We thus want to define minimal bundles as
*minimally* as possible so that we have the maximal solving capacity.

For a long time, we defined minimal bundles as those spanning a single
instruction (before- and after- progpoints) -- because we cannot insert
moves in the middle of an instruction. In bytecodealliance#214, we updated
minimal-bundle splitting to avoid putting two different unrelated uses
in a single-instruction bundle, beacuse doing so and calling it
"minimal" (unsplittable) can artificially extend the liverange of a use
with a fixed-reg constraint at the before-point into the after-point,
causing an unsolveable conflict. This was triggered by new and tighter
constraints on callsites in Cranelift after
bytecodealliance/wasmtime#10502 (merging retval defs into calls) landed.

Unfortunately this also resulted in an infinite allocation loop, because
the definition of "minimal bundle" did not agree between the
split-into-minimal-bundles fallback/last-ditch code, and the bundle
property computation. The splitter was splitting as far as it was
willing to go, but our predicate didn't consider those bundles minimal,
so we continued to re-attempt splitting indefinitely.

While investigating this, I found that the minimal-bundle concept had
accumulated significant cruft ("the detritus of dead fuzzbugs") and this
tech-debt was making things more confusing than not -- so I started by
clearly defining what a minimal bundle *is*. Precisely:

- A single use, within a single LiveRange;
- With that LiveRange having a program-point span consistent with the
  use:
  - Early def: whole instruction (must live past Late point so it can
    reach its uses; moves not possible within inst);
  - Late def: Late point only;
  - Early use: Early point only;
  - Late use: whole instruction (must be live starting at Early so the
    value can reach this use; moves not possible within inst).

This is made easier and simpler than what we have before largely
because the minimal-bundle splitter aggressively puts spans of LiveRange
without uses into the spill bundle, and because we support overlapping
LiveRanges for a vreg now (thanks Trevor!), so we can rely on having
*some* connectivity between the def and its uses even if we aggressively
trim LiveRanges in the minimal bundles down to just their defs/uses.

The split-at-program-point splitter (i.e., not the fallback
split-into-minimal-bundles splitter) also got a small fix related to
this: it has a mode that was intended to "split off one use" if we enter
with a split-point at the start of the bundle, but this was really
splitting off all uses at the program point (if there are multiple of
the same vreg at the same program point). In the case that we still need
to split these apart, this just falls back to the minimal-bundle
splitter now.

Fixes bytecodealliance#218, bytecodealliance#219.
cfallin added a commit that referenced this pull request Apr 11, 2025
This PR reworks minimal-bundle logic to be consistent between property
computation and splitting, and overall simpler.

To recap: a "minimal bundle" is an allocation bundle that is as small as
possible. Because RA2 does backtracking, i.e., can kick a bundle out of
an allocation to make way for another, and because it can split, i.e.
create more work for itself, we need to ensure the algorithm "bottoms
out" somewhere to ensure termination. The way we do this is by defining
a "minimal bundle": this is a bundle that cannot be split further. A
minimal bundle is never evicted, and any allocatable input (set of
uses/defs with constraints), when split into minimal bundles, should
result in no conflicts. We thus want to define minimal bundles as
*minimally* as possible so that we have the maximal solving capacity.

For a long time, we defined minimal bundles as those spanning a single
instruction (before- and after- progpoints) -- because we cannot insert
moves in the middle of an instruction. In #214, we updated
minimal-bundle splitting to avoid putting two different unrelated uses
in a single-instruction bundle, beacuse doing so and calling it
"minimal" (unsplittable) can artificially extend the liverange of a use
with a fixed-reg constraint at the before-point into the after-point,
causing an unsolveable conflict. This was triggered by new and tighter
constraints on callsites in Cranelift after
bytecodealliance/wasmtime#10502 (merging retval defs into calls) landed.

Unfortunately this also resulted in an infinite allocation loop, because
the definition of "minimal bundle" did not agree between the
split-into-minimal-bundles fallback/last-ditch code, and the bundle
property computation. The splitter was splitting as far as it was
willing to go, but our predicate didn't consider those bundles minimal,
so we continued to re-attempt splitting indefinitely.

While investigating this, I found that the minimal-bundle concept had
accumulated significant cruft ("the detritus of dead fuzzbugs") and this
tech-debt was making things more confusing than not -- so I started by
clearly defining what a minimal bundle *is*. Precisely:

- A single use, within a single LiveRange;
- With that LiveRange having a program-point span consistent with the
use:
- Early def: whole instruction (must live past Late point so it can
reach its uses; moves not possible within inst);
  - Late def: Late point only;
  - Early use: Early point only;
- Late use: whole instruction (must be live starting at Early so the
value can reach this use; moves not possible within inst).

This is made easier and simpler than what we have before largely because
the minimal-bundle splitter aggressively puts spans of LiveRange without
uses into the spill bundle, and because we support overlapping
LiveRanges for a vreg now (thanks Trevor!), so we can rely on having
*some* connectivity between the def and its uses even if we aggressively
trim LiveRanges in the minimal bundles down to just their defs/uses.

The split-at-program-point splitter (i.e., not the fallback
split-into-minimal-bundles splitter) also got a small fix related to
this: it has a mode that was intended to "split off one use" if we enter
with a split-point at the start of the bundle, but this was really
splitting off all uses at the program point (if there are multiple of
the same vreg at the same program point). In the case that we still need
to split these apart, this just falls back to the minimal-bundle
splitter now.

Fixes #218, #219.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants