AR-RAG: Autoregressive Retrieval Augmentation for Image Generation

This repository contains the official implementation of AR-RAG: Autoregressive Retrieval Augmentation for Image Generation.

Overview

AR-RAG introduces a novel retrieval augmentation paradigm that enhances modern photorealistic image generation by augmenting image predictions with k-nearest neighbor (k-NN) retrievals at the patch level. Unlike existing approaches that rely on full-image retrieval conditioned on textual captions, AR-RAG retrieves locally similar patches based on their surrounding visual context, enabling caption-free retrieval while enforcing spatial coherence and semantic consistency for higher-quality image generation.
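The idea of retrieving a patch from its surrounding visual context can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the raster-order neighbourhood, the zero padding, the Euclidean distance, and the names `context_key` and `knn` are all assumptions made for illustration.

```python
import math

def context_key(grid, row, col, dim):
    """Concatenate the features of neighbours that would already exist
    under raster-order generation (an assumption of this sketch)."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]  # top-left, top, top-right, left
    key = []
    for dr, dc in offsets:
        r, c = row + dr, col + dc
        inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
        # Missing or not-yet-generated neighbours are zero-padded.
        feat = grid[r][c] if inside and grid[r][c] is not None else [0.0] * dim
        key.extend(feat)
    return key

def knn(database, query, k):
    """Return the k (distance, token_id) pairs whose context keys are
    closest to the query, using brute-force Euclidean search."""
    scored = [(math.dist(key, query), token) for key, token in database]
    return sorted(scored)[:k]

# Toy 2x2 grid of 1-dimensional patch features; the bottom-right patch is pending.
grid = [[[1.0], [2.0]],
        [[3.0], None]]
query = context_key(grid, row=1, col=1, dim=1)

# Hypothetical database of (context key, patch token id) pairs.
database = [
    ([1.0, 2.0, 0.0, 3.0], 7),
    ([0.0, 0.0, 0.0, 0.0], 3),
    ([1.0, 2.0, 0.0, 4.0], 9),
]
neighbours = knn(database, query, k=2)
```

No caption is involved at any point: the query is built only from neighbouring patches, which is what makes the retrieval caption-free. A real database at the scale listed later in this README would need an approximate nearest-neighbour index rather than a brute-force scan.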

We propose two parallel frameworks:

Distribution-Augmentation in Decoding (DAiD): A training-free decoding strategy that directly merges the distribution of model-predicted patches with the distribution of retrieved patches.
Feature-Augmentation in Decoding (FAiD): A parameter-efficient fine-tuning method that smoothly integrates retrieved patches into the generation process via convolution operations.
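To make the DAiD idea concrete, the following sketch merges a model distribution with a distribution built from retrieved patches. The README does not state the merge rule, so the linear interpolation, the distance-based softmax weighting, and the parameters `lam` and `temperature` are assumptions here, not the authors' formulation.

```python
import math

def retrieval_distribution(neighbours, vocab_size, temperature=1.0):
    """Turn (distance, token_id) neighbours into a distribution over the
    patch codebook; closer neighbours receive more probability mass."""
    if not neighbours:
        raise ValueError("at least one retrieved neighbour is required")
    weights = [0.0] * vocab_size
    for distance, token in neighbours:
        weights[token] += math.exp(-distance / temperature)
    total = sum(weights)
    return [w / total for w in weights]

def merge_distributions(model_probs, retrieved_probs, lam=0.5):
    """Interpolate the two distributions; lam is a hypothetical mixing
    weight (0 keeps the model prediction, 1 keeps the retrieval only)."""
    return [(1 - lam) * p + lam * q for p, q in zip(model_probs, retrieved_probs)]

# Toy codebook of 4 patch tokens with a uniform model prediction.
model_probs = [0.25, 0.25, 0.25, 0.25]
retrieved = retrieval_distribution([(0.0, 1), (0.0, 2)], vocab_size=4)
merged = merge_distributions(model_probs, retrieved, lam=0.5)
```

Because the merge happens only at decoding time, a scheme of this kind needs no gradient updates, which is consistent with DAiD being described as training-free.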

Performance Highlights

Both methods improve on the Janus-Pro baseline in overall score on GenEval and DPG-Bench and in FID on MSCOCO and Midjourney:

GenEval Benchmark

Method	Single Obj.	Two Obj.	Counting	Colors	Position	Color Attr.	Overall ↑
Janus-Pro	0.98	0.77	0.52	0.84	0.61	0.55	0.71
DAiD (ours)	0.98	0.82	0.54	0.87	0.63	0.49	0.72
FAiD (ours)	1.00	0.92	0.41	0.87	0.71	0.60	0.75

DPG-Bench

Method	Global	Entity	Attribute	Relation	Other	Overall ↑
Janus-Pro	81.76	84.53	84.34	92.22	75.20	77.26
DAiD (ours)	83.58	84.46	84.76	91.49	76.40	77.88
FAiD (ours)	82.67	85.80	85.38	92.30	76.80	79.36

MSCOCO and Midjourney Benchmarks (FID ↓)

Model	MSCOCO FID	Midjourney FID
Janus-Pro	19.59	12.81
DAiD (ours)	18.02	11.93
FAiD (ours)	17.60	9.31
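Since lower FID is better, the gap to the baseline can be read as a relative reduction. The snippet below recomputes that figure from the table above. The percentages are derived here for illustration and are not reported in the original results.

```python
# FID values copied from the table above (lower is better).
fid = {
    "Janus-Pro": {"MSCOCO": 19.59, "Midjourney": 12.81},
    "DAiD": {"MSCOCO": 18.02, "Midjourney": 11.93},
    "FAiD": {"MSCOCO": 17.60, "Midjourney": 9.31},
}

def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline

for method in ("DAiD", "FAiD"):
    for benchmark, baseline in fid["Janus-Pro"].items():
        drop = relative_reduction(baseline, fid[method][benchmark])
        print(f"{method} on {benchmark}: {drop:.1f}% lower FID than Janus-Pro")
```

With these numbers, FAiD lowers FID by roughly 10.2% on MSCOCO and 27.3% on Midjourney relative to Janus-Pro, while DAiD lowers it by roughly 8.0% and 6.9%.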

Model Zoo

Model	Description	Size	HF Link
AR-RAG-FAiD	Fine-tuned model with smooth feature blending	1.2B	🤗 Model

Patch-level Retrieval Database

Data Source	Number of Images	Suggested GPU Memory	HF Link
JourneyDB	1M	12 GB	ZIP
CC12M	12M	96 GB	ZIP
DataCamp	70M	-	🤗 Coming soon

Installation

git clone https://github.com/PLUM-Lab/AR-RAG.git
cd AR-RAG

# Create the conda environment, then activate it under the name defined in arrag.yml
conda env create -f arrag.yml

Patch-level Retrieval Database & Retriever Construction

Download the VQ-VAE checkpoint from LlamaGen

wget -P arrag/Janus/janus https://huggingface.co/peizesun/llamagen_t2i/resolve/main/vq_ds16_t2i.pt

Construct Retriever from Image Data

bash arrag/build_retriever/build_retriever.sh

The output FAISS index is written to data/retriever/index_L.
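Before moving on to training or generation, it can help to confirm that the two artifacts produced so far are in place. The helper below is a convenience sketch and not part of the repository. The function name `missing_artifacts` is hypothetical, and the paths are taken from the commands above on the assumption that they were run from the repository root.

```python
from pathlib import Path

# Paths implied by the download and build steps above, relative to the repo root.
EXPECTED = [
    Path("arrag/Janus/janus/vq_ds16_t2i.pt"),  # VQ-VAE checkpoint from LlamaGen
    Path("data/retriever/index_L"),            # output of build_retriever.sh
]

def missing_artifacts(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p.as_posix() for p in EXPECTED if not (root / p).exists()]
```

Calling `missing_artifacts()` from the repository root returns an empty list once both steps have completed; any path it returns points to the step that still needs to be run.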

Download Pre-built Retrieval Database

# Download pre-built retrieval database
wget http://nlplab1.cs.vt.edu/~jingyuan/AR-RAG/retrieval_db.zip

Training

FAiD Model Training

bash ./arrag/train/train_FAiD.sh

The default output checkpoint path: result/ckpts/ckpts_FAiD_bx_hx.

Text to Image Generation

DAiD

bash arrag/t2i_example/t2i_daid_L.sh

The default output image path: result/generated_imgs/example_t2i_daid.jpg.

FAiD

bash arrag/t2i_example/t2i_faid_L.sh

The default output image path: result/generated_imgs/example_t2i_faid.jpg.

License

This project is licensed under the MIT License - see the LICENSE file for details.
