Conversation

@rich7420
Contributor

Purpose of PR

Currently the PyTorch import path makes multiple copies: torch.Tensor -> tolist() -> Vec.
This PR replaces the tolist() approach with a zero-copy conversion through the PyO3 NumPy interface:

  • Convert PyTorch tensor to NumPy view via tensor.detach().numpy() (zero-copy when C-contiguous)
  • Extract &[f64] slice directly using PyReadonlyArrayDyn::as_slice()
  • Eliminates intermediate Python list and Rust Vec allocations
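The zero-copy claim above can be checked from Python: `detach().numpy()` returns a NumPy view over the tensor's existing buffer rather than a copy, which is what lets the Rust side borrow the data as a slice. A minimal sketch (this is illustration, not the PR's Rust code; it assumes a CPU, C-contiguous float64 tensor):

```python
import numpy as np
import torch

t = torch.arange(8, dtype=torch.float64)
a = t.detach().numpy()  # NumPy view over the tensor's buffer: no copy

# Both objects point at the same memory...
assert a.__array_interface__["data"][0] == t.data_ptr()

# ...so a mutation of the tensor is visible through the array.
t[0] = 42.0
assert a[0] == 42.0

# A transposed view is not C-contiguous; PyO3's as_slice() rejects such
# arrays, so those inputs would still need a .contiguous() copy first.
nc = torch.arange(8, dtype=torch.float64).reshape(2, 4).t()
assert not nc.is_contiguous()
```

Note that `as_slice()` on the Rust side only succeeds for contiguous arrays, so the zero-copy path applies exactly when the tensor is already C-contiguous.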

In addition, this PR provides a benchmark that measures PyTorch input latency.
We can remove the benchmark if you think it's unnecessary.
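The cost being eliminated can be illustrated without the Rust extension at all: converting a 16-qubit state vector (65536 doubles, matching the benchmark below) via `tolist()` materializes one Python float object per element, while taking a view touches no data. A hedged NumPy-only sketch (hypothetical timings, not the PR's benchmark script):

```python
import time
import numpy as np

a = np.random.rand(2 ** 16)  # one 16-qubit vector: 65536 doubles

t0 = time.perf_counter()
for _ in range(50):
    a.tolist()               # old path: builds 65536 Python floats per call
tolist_s = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(50):
    memoryview(a)            # zero-copy: just a view over the existing buffer
view_s = time.perf_counter() - t0

print(f"tolist: {tolist_s:.4f}s  view: {view_s:.6f}s")
```

On any machine the view path should be orders of magnitude cheaper per call, which is the overhead the PyO3 `as_slice()` route avoids.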

Related Issues or PRs

Changes Made

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Test
  • CI/CD pipeline
  • Other

Breaking Changes

  • Yes
  • No

Checklist

  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes
  • Successfully built and ran all unit tests or manual tests locally
  • PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
  • Code follows ASF guidelines

@rich7420
Contributor Author

rich7420 commented Jan 13, 2026

before

uv run python benchmark_latency_pytorch.py --qubits 16 --batches 100 --batch-size 32 2>&1
Uninstalled 1 package in 0.40ms
Installed 1 package in 2ms
PyTorch Tensor Encoding Benchmark: 16 Qubits, 3200 Samples
  Batch size   : 32
  Vector length: 65536
  Batches      : 100
  Prefetch     : 16

======================================================================
PYTORCH TENSOR LATENCY BENCHMARK: 16 Qubits, 3200 Samples
======================================================================

[Mahout-PyTorch] PyTorch Tensor Input (Zero-Copy Optimization)...
  Total Time: 2.8140 s (0.879 ms/vector)

[Mahout-NumPy] NumPy Array Input (Baseline)...
  Total Time: 2.3389 s (0.731 ms/vector)

======================================================================
LATENCY COMPARISON (Lower is Better)
Samples: 3200, Qubits: 16
======================================================================
PyTorch Tensor          0.879 ms/vector
NumPy Array             0.731 ms/vector
----------------------------------------------------------------------
Speedup: 0.83x
Improvement: -20.3%

@rich7420
Contributor Author

after

uv run python benchmark_latency_pytorch.py --qubits 16 --batches 100 --batch-size 32 2>&1
Uninstalled 1 package in 0.40ms
Installed 1 package in 4ms
PyTorch Tensor Encoding Benchmark: 16 Qubits, 3200 Samples
  Batch size   : 32
  Vector length: 65536
  Batches      : 100
  Prefetch     : 16

======================================================================
PYTORCH TENSOR LATENCY BENCHMARK: 16 Qubits, 3200 Samples
======================================================================

[Mahout-PyTorch] PyTorch Tensor Input (Zero-Copy Optimization)...
  Total Time: 2.3464 s (0.733 ms/vector)

[Mahout-NumPy] NumPy Array Input (Baseline)...
  Total Time: 2.3175 s (0.724 ms/vector)

======================================================================
LATENCY COMPARISON (Lower is Better)
Samples: 3200, Qubits: 16
======================================================================
PyTorch Tensor          0.733 ms/vector
NumPy Array             0.724 ms/vector
----------------------------------------------------------------------
Speedup: 0.99x
Improvement: -1.2%

Member

@guan404ming left a comment

LGTM

@guan404ming
Member

PyTorch Tensor Encoding Benchmark: 16 Qubits, 3200 Samples            
Batch size   : 32
Vector length: 65536
Batches      : 100
Prefetch     : 16

======================================================================
PYTORCH TENSOR LATENCY BENCHMARK: 16 Qubits, 3200 Samples
======================================================================

[Mahout-PyTorch] PyTorch Tensor Input (Zero-Copy Optimization)...
Total Time: 1.4448 s (0.451 ms/vector)

[Mahout-NumPy] NumPy Array Input (Baseline)...
Total Time: 1.4269 s (0.446 ms/vector)

======================================================================
LATENCY COMPARISON (Lower is Better)
Samples: 3200, Qubits: 16
======================================================================
PyTorch Tensor          0.451 ms/vector
NumPy Array             0.446 ms/vector
----------------------------------------------------------------------
Speedup: 0.99x
Improvement: -1.3%

@guan404ming guan404ming merged commit 011e851 into apache:main Jan 13, 2026
4 checks passed