Description
Until now, our choice of specializations in the specializing adaptive interpreter (SAI) has been driven by the performance of the interpreter alone: https://github.com/python/cpython/blob/main/InternalDocs/interpreter.md#performance-analysis.
However, we now expect any further performance improvements to be provided by the JIT, not the interpreter.
This means that specialization's other role, that of gathering type and branching information for the JIT, is at least as important as pure interpreter performance.
We should therefore seek to broaden specialization to gather more information, as long as it does not make interpreter performance worse, or at least not significantly so.
Using some old stats, by fraction of unspecialized bytecode executed, the top 10 were:
BINARY_OP 31.3%
FOR_ITER 19.4%
LOAD_ATTR 10.9%
STORE_SUBSCR 9.2%
BINARY_SLICE 7.3%
COMPARE_OP 7.0%
TO_BOOL 5.8%
CALL 2.5%
CONTAINS_OP 2.4%
SEND 1.7%
We should fully specialize most, if not all, of these.
In general, the above instructions have a matching __dunder__ method which determines the behavior of the operation. Recording the type of the operand(s) allows us to know what __dunder__ method is to be called.
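As a rough illustration of why recording the operand type is enough, the sketch below resolves the special method from the type alone. The DUNDER_FOR table and the resolve_dunder helper are hypothetical names made up for this example; the real specializer works on type slots in C, and COMPARE_OP is left out because it maps to several methods.

```python
# Hypothetical mapping from instruction family to special method name,
# for illustration only (not taken from the CPython sources).
DUNDER_FOR = {
    "FOR_ITER": "__next__",
    "LOAD_ATTR": "__getattribute__",
    "STORE_SUBSCR": "__setitem__",
    "TO_BOOL": "__bool__",
    "CALL": "__call__",
    "CONTAINS_OP": "__contains__",
}


def resolve_dunder(opname, operand):
    """Return the special method the operation would use, or None."""
    # Special methods are looked up on the type, never on the instance,
    # so knowing type(operand) is enough to know what gets called.
    return getattr(type(operand), DUNDER_FOR[opname], None)


class Flag:
    def __bool__(self):
        return False


flag = Flag()
# An instance attribute with a dunder name is ignored by bool().
flag.__bool__ = lambda: True

truthiness = bool(flag)
resolved = resolve_dunder("TO_BOOL", flag)
```

Here bool(flag) still uses Flag.__bool__, which is the same function that resolve_dunder finds from the type.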
We cannot specialize for all possible types, but we can ensure we have good inputs and type information for the JIT by adding the following two specializations for all families of instructions:
- __dunder__ implemented in Python. Most of the above instructions have a matching __dunder__ method. These specializations should jump directly into the method. LOAD_ATTR_GETATTRIBUTE_OVERRIDDEN already does this for LOAD_ATTR. Other families should follow this template.
- __dunder__ implemented in C. In practice, this is just the generic instruction with a bit more information recorded.
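A minimal sketch of how the two cases could be told apart from Python, assuming that a plain function on the type means "implemented in Python" and anything else means "implemented in C". The dunder_kind helper and the Countdown class are hypothetical and only approximate what a specializer would check at the C level.

```python
import types


def dunder_kind(obj, name):
    """Classify how type(obj) implements the special method `name`."""
    method = getattr(type(obj), name, None)
    if method is None:
        return "missing"
    # A def inside a class body is a plain function object on the type.
    if isinstance(method, types.FunctionType):
        return "python"
    # Slot wrappers and other descriptors are treated as C implementations.
    return "c"


class Countdown:
    """Iterator whose __next__ is written in Python."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n


kinds = {
    "Countdown": dunder_kind(Countdown(3), "__next__"),
    "list_iterator": dunder_kind(iter([1, 2]), "__next__"),
    "int": dunder_kind(1, "__next__"),
}
```

In this sketch, a FOR_ITER over Countdown would be a candidate for the first specialization, while one over a list iterator would take the C fallback.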
Three instructions need special casing:
- BINARY_OP. Because the behavior depends on two types, we will need a table-driven approach; see "Specialize long tail of binary operations using a table" (#100239).
- BINARY_SLICE. This is supposed to avoid creating temporary slice objects for expressions like a[b:c] but has yet to be implemented properly. There is no corresponding __dunder__ method, so we would need to expose slicing methods to use.
- SEND. There is no __send__ method. For iterators, __next__ is called if the value is None, otherwise .send() is called. Rather than try to replicate the specializations of FOR_ITER, we should maybe look to combine SEND and FOR_ITER, much like we did for CALL and CALL_METHOD.
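The SEND behavior described above can be sketched in Python as follows. The send_step function is a hypothetical stand-in for the instruction, written only to show the dispatch on None; it is not how the interpreter implements it.

```python
def send_step(receiver, value):
    """Approximate the dispatch performed by SEND."""
    if value is None:
        # Same path as FOR_ITER: advance via __next__.
        return next(receiver)
    # Any other value has to go through the .send() method.
    return receiver.send(value)


def echo():
    """Generator that yields back whatever was last sent to it."""
    received = None
    while True:
        received = yield received


gen = echo()
first = send_step(gen, None)      # starts the generator via __next__
second = send_step(gen, "hello")  # delivered via .send()

# A plain iterator has no .send(), but the None path still works.
plain = send_step(iter([10, 20]), None)
```

The None path is exactly what FOR_ITER does, which is one way to see why merging the two families could avoid duplicating specializations.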
First step
Add the two specializations, one for a __dunder__ implemented in Python and the fallback for a __dunder__ implemented in C, for:
- FOR_ITER
- LOAD_ATTR
- STORE_SUBSCR
- COMPARE_OP
- TO_BOOL
- CALL
- CONTAINS_OP
This gives a total of 12 new instructions, as LOAD_ATTR already has the specialization for the Python __getattribute__ and CALL already has the generic fallback.
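The count can be checked with a short enumeration. The kind labels python_dunder and c_dunder are placeholder names for this sketch, not proposed instruction names.

```python
FAMILIES = [
    "FOR_ITER",
    "LOAD_ATTR",
    "STORE_SUBSCR",
    "COMPARE_OP",
    "TO_BOOL",
    "CALL",
    "CONTAINS_OP",
]
KINDS = ["python_dunder", "c_dunder"]  # placeholder labels

# Specializations the text says already exist.
ALREADY_PRESENT = {
    ("LOAD_ATTR", "python_dunder"),  # LOAD_ATTR_GETATTRIBUTE_OVERRIDDEN
    ("CALL", "c_dunder"),            # the generic CALL fallback
}

needed = [
    (family, kind)
    for family in FAMILIES
    for kind in KINDS
    if (family, kind) not in ALREADY_PRESENT
]
new_instruction_count = len(needed)  # 7 * 2 - 2
```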
Second step
Implement #100239
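To make the table-driven idea concrete, here is a rough Python sketch of dispatching on the pair of operand types. It shows only the general technique and is an assumption about its shape, not the design in #100239; BINARY_OP_TABLE, GENERIC, and binary_op are hypothetical names.

```python
import operator

# Hypothetical table keyed on (operator, left type, right type).
BINARY_OP_TABLE = {
    ("+", int, float): operator.add,
    ("+", float, int): operator.add,
    ("*", int, float): operator.mul,
    ("*", float, int): operator.mul,
}

# Generic fallbacks, standing in for the unspecialized instruction.
GENERIC = {"+": operator.add, "*": operator.mul}


def binary_op(op, left, right):
    """Return (result, path), where path records which route was taken."""
    # Exact type match only: subclasses deliberately miss the table.
    handler = BINARY_OP_TABLE.get((op, type(left), type(right)))
    if handler is None:
        return GENERIC[op](left, right), "generic"
    return handler(left, right), "specialized"


mixed = binary_op("+", 1, 2.5)
strings = binary_op("+", "a", "b")
```

A hit in the table also tells the JIT both operand types, which is the information this issue is mainly after.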
Third step
Handle BINARY_SLICE and SEND