Skip to content

Conversation

@guan404ming
Copy link
Member

@guan404ming guan404ming commented Jan 12, 2026

Purpose of PR

Why

  • API had 6 different encoding methods (encode, encode_batch, encode_tensor, encode_from_parquet, encode_from_arrow_ipc, encode_from_numpy)
  • Users needed to know which method to call for each input type

How

  • Unified encode() method - auto-detects lists, NumPy arrays, PyTorch tensors, and file paths
  • Default encoding method - encoding_method="amplitude" is now the default

Changes Made

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Test
  • CI/CD pipeline
  • Other

Breaking Changes

  • Yes
  • No

Checklist

  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes
  • Successfully built and ran all unit tests or manual tests locally
  • PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
  • Code follows ASF guidelines

@guan404ming guan404ming requested a review from rich7420 January 12, 2026 03:28
@guan404ming
Copy link
Member Author

cc @ryankert01 @400Ping

@guan404ming guan404ming force-pushed the simplify-qdp-python-api branch from 2d28941 to fe5206a Compare January 12, 2026 08:12
@rich7420
Copy link
Contributor

rich7420 commented Jan 12, 2026

@guan404ming thanks for the patch!
I think we need to change encode() functions in all benchmark files as well. (the rest qdp/qdp-python/benchmark/benchmark_numpy_io.py and qdp/qdp-python/benchmark/benchmark_latency.py)
And it would be better to check the tensor's shape. like, if it's 1D, use the single-sample encode path and then if it's 2D, use the batch encoding path and for anything else, throw a clear error explaining what shapes are supported.
On the other hand , maybe we could support for pathlib.Path objects by using os.fspath() to convert them to strings before processing.

@guan404ming guan404ming force-pushed the simplify-qdp-python-api branch from fe5206a to 76a80a1 Compare January 12, 2026 08:51
@guan404ming
Copy link
Member Author

guan404ming commented Jan 12, 2026

I've updated with three additions. Please take another look, thanks!

  1. Updated benchmark files to use unified encode() API:
  2. Added tensor shape validation with clear error messages (lib.rs):
  3. Added pathlib.Path support

@rich7420
Copy link
Contributor

Thanks for the update!
I think some comments been removed unnecessarily.
maybe shape validation in pytorch hasn't done yet. but it could be follow-up.

Copy link
Member

@ryankert01 ryankert01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
Overall lg, I agree with @rich7420 that it definitely needs refactor and can be a follow up!

Copy link
Contributor

@400Ping 400Ping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM

@guan404ming
Copy link
Member Author

Thanks, I'll send a follow up for it.

@guan404ming guan404ming merged commit 10233ee into apache:main Jan 12, 2026
4 checks passed
@guan404ming guan404ming deleted the simplify-qdp-python-api branch January 12, 2026 13:06
@guan404ming guan404ming mentioned this pull request Jan 12, 2026
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants