
Broader specialization in the Specializing Adaptive Interpreter for better JIT performance #143732

@markshannon

Until now, our choice of specializations in the SAI has been driven solely by the performance of the interpreter; see https://github.com/python/cpython/blob/main/InternalDocs/interpreter.md#performance-analysis.

However, we now expect any further performance improvements to be provided by the JIT, not the interpreter.
This means that specialization's other role, that of gathering type and branching information for the JIT, is at least as important as pure interpreter performance.

We should therefore seek to broaden specialization to gather more information, as long as it does not make interpreter performance worse, or at least not significantly so.

According to some old stats, the top 10 instructions by fraction of unspecialized bytecode executed were:
BINARY_OP 31.3%
FOR_ITER 19.4%
LOAD_ATTR 10.9%
STORE_SUBSCR 9.2%
BINARY_SLICE 7.3%
COMPARE_OP 7.0%
TO_BOOL 5.8%
CALL 2.5%
CONTAINS_OP 2.4%
SEND 1.7%

We should fully specialize most, if not all, of these.

In general, each of the above instructions has a matching __dunder__ method which determines the behavior of the operation. Recording the type of the operand(s) allows us to know which __dunder__ method will be called.
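To illustrate why the operand type is enough, here is a minimal Python-level sketch. It relies on the documented rule that implicit special method lookup goes through the type and ignores the instance dictionary. The `Bag` class and the `dunder_for` helper are hypothetical names used only for illustration; they are not part of CPython.

```python
class Bag:
    """Hypothetical container used only for this illustration."""

    def __init__(self, items):
        self.items = list(items)

    def __contains__(self, item):
        return item in self.items


def dunder_for(obj, name):
    """Look up a special method on the type of obj, not on the instance.

    Simplification: getattr on the type also consults the metaclass,
    which the interpreter's own lookup does not do.
    """
    return getattr(type(obj), name, None)


bag = Bag([1, 2, 3])

# An attribute set on the instance does not change what `in` does,
# because the operation is resolved through type(bag).
bag.__contains__ = lambda item: True

found = 99 in bag                           # resolved via Bag.__contains__
method = dunder_for(bag, "__contains__")    # the function defined on Bag
```

Since the instance cannot override the method, knowing `type(bag)` pins down which `__contains__` will run.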

We cannot specialize for all possible types, but we can ensure we have good inputs and type information for the JIT by adding the following two specializations for all families of instructions:

  • __dunder__ implemented in Python. Most of the above instructions have a matching __dunder__ method. These specializations should jump directly into the method. LOAD_ATTR_GETATTRIBUTE_OVERRIDDEN already does this for LOAD_ATTR. Other families should follow this template.
  • __dunder__ implemented in C. In practice, this is just the generic instruction with a bit more information recorded.
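The two cases above can be told apart at the Python level, as a rough sketch. The `classify_dunder` helper and its return labels are hypothetical and only approximate the check a specializer would perform in C; in particular, anything that is not a plain Python function is lumped into the "c" bucket here.

```python
import types


def classify_dunder(obj, name):
    """Classify the special method `name` for type(obj).

    Returns "python" for a plain Python function, "missing" if the type
    does not define the method, and "c" otherwise (a simplification:
    other descriptor kinds also land in this bucket).
    """
    method = getattr(type(obj), name, None)
    if method is None:
        return "missing"
    if isinstance(method, types.FunctionType):
        return "python"
    return "c"


class Truthy:
    """Hypothetical class whose __bool__ is written in Python."""

    def __bool__(self):
        return True


python_case = classify_dunder(Truthy(), "__bool__")
c_case = classify_dunder([], "__contains__")
missing_case = classify_dunder(object(), "__bool__")
```

In this sketch the "python" case is the one where a specialization could jump directly into the method, and the "c" case is the one where the generic instruction runs with the extra type information recorded.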

Three instructions need special casing:

  • BINARY_OP. Because the behavior depends on two types, we will need a table-driven approach; see "Specialize long tail of binary operations using a table" (#100239).
  • BINARY_SLICE. This is supposed to avoid creating temporary slice objects for expressions like a[b:c] but has yet to be implemented properly. There is no corresponding __dunder__ method, so we would need to expose slicing methods to use.
  • SEND. There is no __send__ method. For iterators, __next__ is called if the value is None, otherwise .send() is called. Rather than trying to replicate the specializations of FOR_ITER, we should perhaps look to combine SEND and FOR_ITER, much as we did for CALL and CALL_METHOD.
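As a rough illustration of what "table-driven" could mean for BINARY_OP, the sketch below keys a dictionary on the operator and the exact types of both operands, falling back to the generic operation on a miss. This is an assumption made for illustration only: `TABLE`, `GENERIC`, `add_int_float`, `repeat_str` and `binary_op` are hypothetical names, and the actual design in #100239 may differ.

```python
import operator

# Generic fallbacks, one per operator symbol (only a few shown).
GENERIC = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def add_int_float(lhs, rhs):
    """Hypothetical specialized handler for int + float."""
    return float(lhs) + rhs


def repeat_str(lhs, rhs):
    """Hypothetical specialized handler for str * int."""
    return lhs * rhs


# Keyed on (operator, exact left type, exact right type).
TABLE = {
    ("+", int, float): add_int_float,
    ("*", str, int): repeat_str,
}


def binary_op(op, lhs, rhs):
    """Return (result, path), where path names the route that was taken."""
    handler = TABLE.get((op, type(lhs), type(rhs)))
    if handler is not None:
        return handler(lhs, rhs), "specialized"
    return GENERIC[op](lhs, rhs), "generic"


hit = binary_op("+", 1, 2.5)
miss = binary_op("+", 1, 2)
```

Matching on exact types means that, for example, `bool` operands do not hit the `int` entry and take the generic path instead.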

First step

Add the two specializations, one for a __dunder__ implemented in Python and one fallback for a __dunder__ implemented in C, for:

  • FOR_ITER
  • LOAD_ATTR
  • STORE_SUBSCR
  • COMPARE_OP
  • TO_BOOL
  • CALL
  • CONTAINS_OP

This gives a total of 12 new instructions (7 families × 2, minus 2), as LOAD_ATTR already has the specialization for a Python __getattribute__ and CALL already has the generic fallback.
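The count can be checked with a short script. The "python" and "c" labels are placeholders for the two kinds of specialization, not proposed instruction names, and the percentages are the old stats quoted earlier in this issue.

```python
# Families covered by the first step, with their share of unspecialized
# bytecode executed (percentages from the old stats quoted above).
FIRST_STEP = {
    "FOR_ITER": 19.4,
    "LOAD_ATTR": 10.9,
    "STORE_SUBSCR": 9.2,
    "COMPARE_OP": 7.0,
    "TO_BOOL": 5.8,
    "CALL": 2.5,
    "CONTAINS_OP": 2.4,
}

KINDS = ("python", "c")  # placeholder labels for the two specializations

# Specializations that already exist according to the text above.
ALREADY_PRESENT = {("LOAD_ATTR", "python"), ("CALL", "c")}

new_instructions = [
    (family, kind)
    for family in FIRST_STEP
    for kind in KINDS
    if (family, kind) not in ALREADY_PRESENT
]

new_count = len(new_instructions)
first_step_share = round(sum(FIRST_STEP.values()), 1)
```

By those stats, the seven first-step families account for about 57% of the unspecialized bytecode executed.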

Second step

Implement #100239

Third step

Handle BINARY_SLICE and SEND
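For reference, the Python-level behavior behind these two instructions can be sketched as follows. `Recorder`, `send_like` and `echo` are hypothetical names used for illustration; `send_like` follows the description of SEND given above rather than the interpreter's actual C code.

```python
class Recorder:
    """Returns whatever key is passed to __getitem__."""

    def __getitem__(self, key):
        return key


def send_like(receiver, value):
    """Mimic the described SEND behavior: __next__ for None, else .send()."""
    if value is None:
        return next(receiver)
    return receiver.send(value)


def echo():
    """Generator that yields back whatever is sent to it."""
    received = yield "ready"
    while True:
        received = yield received


# a[b:c] reaches __getitem__ as a slice object, so at the Python level
# there is no slice-free __dunder__ for BINARY_SLICE to call.
key = Recorder()[1:3]

gen = echo()
first = send_like(gen, None)              # goes through __next__
second = send_like(gen, 42)               # goes through .send()
plain = send_like(iter([10, 20]), None)   # plain iterator, __next__ only
```

The `None` path is the same operation that FOR_ITER performs, which is the overlap that motivates combining the two families.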

Labels: 3.15 (new features, bugs and security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)
