Export Meta AI's Segment Anything 3 (SAM3) model to ONNX, then build a TensorRT engine for real-time segmentation. This repo includes a CUDA inference library and demo apps for semantic and instance segmentation.
- Project Overview
- Benchmarks
- Demos
- Repo Layout
- Quickstart
- Extensions
- Troubleshooting
- Development guide
- Disclaimer
## Project Overview

- Python tooling to export SAM3 to a clean ONNX graph.
- TensorRT-ready workflows for building optimized engines.
- A C++/CUDA library for high-performance inference with demo apps.
- Support for promptable concept segmentation (PCS), the latest feature in SAM3.
- Zero-copy support on unified-memory platforms (Jetson, DGX Spark), great for robotics and real-time interaction (see the sketch after this list).
- Everything runs inside a reproducible Docker environment (x86, Jetson, Spark).
- MIT license for the love of everything nice :)
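For context, here is a minimal sketch of the zero-copy idea using the standard CUDA runtime API: on platforms where CPU and GPU share DRAM, a pinned host buffer can be mapped into the device address space so inference reads input frames without an explicit copy. Buffer names and sizes are illustrative, not this repo's API.

```cpp
// Zero-copy sketch for unified-memory platforms (illustrative only, not this
// repo's API). A pinned, mapped host buffer gives the GPU a direct alias of
// the same physical memory, so no cudaMemcpy is needed before inference.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    // Allow mapped pinned allocations (must be set before context creation).
    cudaSetDeviceFlags(cudaDeviceMapHost);

    const size_t frameBytes = 3840 * 2160 * 3;  // hypothetical 4K RGB frame

    // Pinned host allocation, mapped into the device address space.
    unsigned char* hostFrame = nullptr;
    cudaHostAlloc(reinterpret_cast<void**>(&hostFrame), frameBytes,
                  cudaHostAllocMapped);

    // Device-side pointer to the same memory; a camera frame written to
    // hostFrame is immediately visible to the GPU on Jetson-class hardware.
    unsigned char* devFrame = nullptr;
    cudaHostGetDevicePointer(reinterpret_cast<void**>(&devFrame), hostFrame, 0);

    std::printf("host=%p device=%p\n", static_cast<void*>(hostFrame),
                static_cast<void*>(devFrame));
    cudaFreeHost(hostFrame);
    return 0;
}
```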
## Benchmarks

The numbers below show end-to-end processing latency per image (4K resolution) in milliseconds, excluding image load/save time.
| Hardware | HF+PyTorch | TensorRT+CUDA | Speedup | Notes |
|---|---|---|---|---|
| Jetson Orin NX | 6600 ms | 950 ms | 6.95x | Uses zero-copy |
| Jetson Thor | Please contribute | | | |
| DGX Spark | Please contribute | | | |
| RTX 3090 | 438 ms | 75 ms | 5.82x | |
| A10 | 545.3 ms | 161.1 ms | 3.38x | GPU hits 100% utilization |
| A100 | 314.1 ms | 48.8 ms | 6.43x | 40GB SXM4 variant |
| H100 | 265.3 ms | 34.6 ms | 7.66x | PCIe variant |
| H100 | 213.2 ms | 24.9 ms | 8.56x | SXM5 variant |
| GH200 | 142.3 ms | 23.3 ms | 6.11x | arm64+H100 iGPU, without zero-copy |
| GH200 | 142.3 ms | 26.4 ms | 5.39x | using zero-copy |
| B200 | 160.0 ms | 17.7 ms | 9.03x | SXM6 variant |
Note: the HF+PyTorch path is GPU-backed too, so these numbers compare two GPU implementations rather than CPU vs. GPU.

Please contribute your results and I will be happy to add them here. See "Running the benchmarks" under the Development guide below to run them yourself.
## Demos

Semantic segmentation produced by the C++ demo app (prompt='dog')

Instance segmentation results (prompt='box')
## Repo Layout

- `python/`: ONNX export and visualization scripts.
- `cpp/`: C++/CUDA library and apps (TensorRT inference).
- `docker/`: Container setup (`Dockerfile.x86` and `Dockerfile.aarch64`).
- `demo/`: Example outputs from the C++ demo app.
## Quickstart

- **Request access to the gated model**
  - Visit https://huggingface.co/facebook/sam3 and request access.
  - Ensure your `HF_TOKEN` has permission.
  - Set `HF_TOKEN` as an environment variable on the host; Docker will pick it up from there.
- **Build the Docker container for your platform** (all commands below run inside it)

```bash
docker build -t sam3-trt -f docker/Dockerfile.x86 .
```

For aarch64 platforms with shared CPU/GPU memory, the C++ library in this repo supports zero-copy inference paths. Build and run the aarch64 container:

```bash
docker build -t sam3-trt-aarch64 -f docker/Dockerfile.aarch64 .
```

- **Export `HF_TOKEN` and run the Docker container**

```bash
export HF_TOKEN=<YOUR TOKEN>
docker run -it --rm \
--network=host \
--gpus all \
--ipc=host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--runtime=nvidia \
--env HF_TOKEN \
-v "$PWD":/workspace \
-w /workspace \
sam3-trt bash
```

- **Export to ONNX**

```bash
python python/onnxexport.py
```

This produces `onnx_weights/sam3_static.onnx` plus external weight shards.
- **Build a TensorRT engine**

```bash
trtexec --onnx=onnx_weights/sam3_static.onnx --saveEngine=sam3_fp16.plan --fp16 --verbose
```

- **Build the C++/CUDA library and sample app**

```bash
mkdir cpp/build && cd cpp/build
cmake ..
make- Run the demo app
```

- **Run the demo app**

```bash
./sam3_pcs_app <image_dir> <engine_path.engine>
```

Results are written to a `results/` folder.
## Extensions

This is a deliberately raw project: it provides the crucial TensorRT/CUDA backend pieces that most applications will need. From here, please feel free to fan out into any application you like. Pull requests are very welcome! Here are some ideas I can think of:
- ROS2 wrapper for real-time robotics pipelines.
- Interactive voice-based segmentation app. Have someone speak into a microphone, use a speech-to-text model to transcribe the prompt, and feed it into the engine, which then produces the segmentation mask live. I don't have the time to build it but I hope you can.
- Live camera input and overlays. You will need a beefy GPU; SAM3 doesn't run in real time on a Jetson Nano.
## Troubleshooting

- Access errors: make sure your `HF_TOKEN` has access to `facebook/sam3`.
- ONNX export fails: install `transformers` from source if SAM3 is missing.
- TensorRT parse errors: ensure the full `onnx_weights/` directory is copied (external data is required).
- C++ build errors: confirm CUDA, TensorRT, and OpenCV are installed and discoverable via `pkg-config`.
## Development guide

- The shared library target is `sam3_trt`.
- Demo app: `sam3_pcs_app` (semantic/instance visualization modes).
- Outputs include semantic segmentation and instance segmentation mask logits. If you choose `SAM3_VISUALIZATION::VIS_NONE` in your application, you need to apply sigmoid yourself (see the sketch after this list).
- The library does not support building engines; use `trtexec` instead.
### Running the benchmarks

Use the same image directory and prompt for all runs. Both paths time the model pipeline and exclude image load/save.

Hugging Face + PyTorch:

```bash
python python/basic_script.py <image_dir>
```

TensorRT + CUDA (benchmark mode disables output writes):

```bash
./sam3_pcs_app <image_dir> <engine_path.engine> 1
```

### ONNX export notes

- The default export runs on CPU for compatibility (switch `device` to `cuda` if desired).
- SAM3 is large and exports with external weight shards; keep the entire `onnx_weights/` directory together.
### TensorRT notes

- Use `trtexec` for quick engine builds and benchmarking; a sketch of loading the resulting engine from C++ follows below.
- FP16 is the usual starting point; INT8/FP8/INT4 require calibration or compatible tooling.
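For reference, loading a `trtexec`-built plan from your own C++ code follows the standard TensorRT 8+ pattern. This is a generic sketch, not this repo's `sam3_trt` wrapper; the file name matches the engine built in the Quickstart.

```cpp
// Generic TensorRT 8+ engine-loading sketch (not this repo's sam3_trt API).
#include <NvInfer.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <memory>
#include <vector>

// TensorRT requires a logger; this one just prints warnings and errors.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
};

int main() {
    Logger logger;

    // Read the serialized engine produced by trtexec in the Quickstart.
    std::ifstream file("sam3_fp16.plan", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    // Deserialize and create an execution context (TensorRT 8+ supports
    // plain `delete`, so unique_ptr with the default deleter is fine).
    std::unique_ptr<nvinfer1::IRuntime> runtime(
        nvinfer1::createInferRuntime(logger));
    std::unique_ptr<nvinfer1::ICudaEngine> engine(
        runtime->deserializeCudaEngine(blob.data(), blob.size()));
    std::unique_ptr<nvinfer1::IExecutionContext> context(
        engine->createExecutionContext());

    // From here: set tensor addresses and enqueue inference on a CUDA stream.
    return 0;
}
```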
## License

MIT (see `LICENSE`).
If this saved you time, drop a ⭐ so others can find it and ship SAM3 faster.
## Disclaimer

All views expressed here are my own. This project is not affiliated with my employer.


