The emergence of large language models (LLMs) has transformed recommendation paradigms from conventional matching to generative frameworks. Although prior research has successfully formulated recommendations as end-to-end generative tasks, these methods typically function as direct predictors without incorporating explicit reasoning mechanisms.
To bridge this gap, we propose OneRec-Think, a unified framework that seamlessly integrates dialogue, reasoning, and personalized recommendation. By generating high-quality reasoning paths, our model not only improves recommendation precision but also maintains its native conversational ability.
The framework consists of three components:
- Itemic Alignment, which projects itemic tokens into the LLM's textual space to establish semantic grounding.
- Reasoning Activation, which constructs simple yet useful chain-of-thought (CoT) fine-tuning examples to stimulate reasoning capabilities within the recommendation context.
- Reasoning Enhancement, where we design a recommendation-specific reward function that accounts for the multi-validity nature of user preferences (a toy illustration follows this list).
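As a toy illustration of multi-validity (not the paper's actual reward): several items can be "correct" continuations of the same user state, so the reward scores a generated item against a set of valid targets rather than a single ground truth.

```python
def multi_validity_reward(predicted_sid: str, valid_sids: set[str]) -> float:
    """Toy reward: 1.0 if the generated item id hits any valid future item.

    Assumption-heavy sketch: the actual reward may be richer (e.g., graded
    by engagement); the point is only that correctness is set-valued,
    not single-valued.
    """
    return 1.0 if predicted_sid in valid_sids else 0.0


# Example: two future interactions are both acceptable answers.
print(multi_validity_reward("<a_7>", {"<a_7>", "<a_1>"}))  # 1.0
```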
We validate our model's effectiveness on multiple public datasets, with its deployment on an industrial-scale short-video platform yielding a further online gain of 0.159% in APP Stay Time. Additionally, we conduct extensive case studies that provide qualitative evidence for the role of reasoning in recommendation.
Run the environment setup script before proceeding:

```bash
bash setup_conda_env.sh
```

- Download Qwen3-1.7B from Hugging Face

```bash
cd basemodel
python3 download_basemodel.py
```

The model is saved under `basemodel/Qwen3-1-7B/`.
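If you prefer to fetch the weights manually, a minimal sketch using `huggingface_hub` (the script itself may resolve paths differently):

```python
from huggingface_hub import snapshot_download

# Fetch Qwen3-1.7B into the directory the later steps expect; the local
# path mirrors what download_basemodel.py is described to produce.
snapshot_download(
    repo_id="Qwen/Qwen3-1.7B",
    local_dir="basemodel/Qwen3-1-7B",
)
```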
- Extend the vocabulary to support SID tokens

```bash
python3 expand_vocab.py
```

The script reads `basemodel/Qwen3-1-7B/` and writes the extended model to `basemodel/Qwen3-1-7B-expand/`.
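Conceptually, the expansion step boils down to registering the itemic (SID) tokens and resizing the embedding table. A hedged sketch with `transformers` — the token naming scheme below is illustrative, not the repository's actual format:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("basemodel/Qwen3-1-7B")
model = AutoModelForCausalLM.from_pretrained("basemodel/Qwen3-1-7B")

# Hypothetical SID layout: three codebook levels with 256 codes each.
sid_tokens = [f"<{lvl}_{i}>" for lvl in ("a", "b", "c") for i in range(256)]
tokenizer.add_tokens(sid_tokens)

# Grow the embedding matrix so the new token ids have rows to train.
model.resize_token_embeddings(len(tokenizer))

tokenizer.save_pretrained("basemodel/Qwen3-1-7B-expand")
model.save_pretrained("basemodel/Qwen3-1-7B-expand")
```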
- Generate the alignment training data

```bash
cd data
python3 generate_training_data.py
```

The script consumes `data/sequential_data_processed.txt` and `data/Beauty.pretrain.json`, producing train/validation/test parquet files (`training_data_train.parquet`, `training_data_val.parquet`, `training_data_test.parquet`) for the alignment stage.
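A quick way to sanity-check the output before training (column names vary; inspect `df.columns` for the actual schema):

```python
import pandas as pd

# Peek at the alignment split to confirm the generation step succeeded.
df = pd.read_parquet("data/training_data_train.parquet")
print(df.shape)
print(df.head())
```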
- Launch the alignment stage

```bash
cd train
bash run_training_stage1.sh
```

This launches the LoRA-based alignment training with the parquet files generated above; adjust the script variables as needed for your environment.
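For orientation, LoRA training of this kind typically wraps the expanded base model with `peft`; the rank and target modules below are assumptions, not the script's actual settings:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("basemodel/Qwen3-1-7B-expand")

# Illustrative configuration; the real values live in run_training_stage1.sh.
lora_config = LoraConfig(
    r=16,                      # assumed rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```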
- Merge the best LoRA checkpoint into the expanded base model

```bash
cd basemodel
python3 merge_model.py
```

Edit `lora_model_path` inside `basemodel/merge_model.py` so it targets the checkpoint you want to merge. The script combines the LoRA weights with `basemodel/Qwen3-1-7B-expand/` and saves the full model to `basemodel/merged_beauty_model_1-1/`.
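The merge itself usually amounts to `peft`'s `merge_and_unload`; a sketch under that assumption (the checkpoint path is a placeholder you would replace, just as with `lora_model_path` in the script):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("basemodel/Qwen3-1-7B-expand")

# Placeholder: point this at the LoRA checkpoint you chose.
lora_model_path = "train/results/alignment/checkpoint-XXXX"

# Fold the LoRA deltas into the base weights and save a plain full model.
merged = PeftModel.from_pretrained(base, lora_model_path).merge_and_unload()
merged.save_pretrained("basemodel/merged_beauty_model_1-1")

# Ship the tokenizer alongside the merged weights.
tokenizer = AutoTokenizer.from_pretrained("basemodel/Qwen3-1-7B-expand")
tokenizer.save_pretrained("basemodel/merged_beauty_model_1-1")
```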
- Generate SID-only recommendation data

```bash
cd data
python3 generate_sid_prediction_data.py
python3 generate_RA_data.py
```

These scripts consume the sequential data and Beauty metadata, producing `training_prediction_sid_data_{train,val,test}.parquet` for recommendation training and `training_RA_{train,val,test}.parquet` for the reasoning activation stage.
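To make "reasoning activation data" concrete, one row plausibly pairs a history prompt with a short chain of thought and a target SID. The field names and token format below are guesses for exposition only; check `training_RA_train.parquet` for the real schema:

```python
# Illustrative shape of one reasoning-activation example (all fields assumed).
example = {
    "prompt": "User recently interacted with: <a_12><b_87><c_3>, <a_5><b_9><c_44>",
    "reasoning": "The user's history skews toward skincare; recent items ...",
    "target": "<a_7><b_21><c_190>",
}
```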
- Run the combined stage-2 pipeline

```bash
cd train
bash run_training_stage2.sh
```

This helper first executes the recommendation training (via `scripts/run_training_rec.sh`), waits for it to finish, captures the latest `checkpoint-*` under `results/beauty_sid_rec/`, and then launches the reasoning activation training (via `scripts/run_training_RA.sh`) with that checkpoint. After it completes, you will have both the recommendation checkpoints and the CoT-enhanced checkpoints under `results/ReasoningActivation/`.
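The "capture the latest checkpoint" step is equivalent to picking the highest-numbered `checkpoint-*` directory. A Python sketch of that logic (the helper does this in shell, and its exact commands may differ):

```python
import re
from pathlib import Path

# Select the checkpoint with the highest global step, mirroring what
# run_training_stage2.sh does between the two training stages.
ckpts = sorted(
    Path("train/results/beauty_sid_rec").glob("checkpoint-*"),
    key=lambda p: int(re.search(r"\d+", p.name).group()),
)
latest = ckpts[-1]
print(f"Would launch: bash scripts/run_training_RA.sh {latest}")
```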
- Standalone recommendation training (alternative to the combined pipeline)

```bash
cd train
bash scripts/run_training_rec.sh
```

Ensure `MODEL_DIR` points to `basemodel/merged_beauty_model_1-1/` (or your merged output) and that the train/val parquet paths reference the freshly generated SID prediction files. The script writes checkpoints to `train/results/beauty_sid_rec/`. (Skip this step if you already ran the combined pipeline above.)
- Manual two-step execution
  - Identify the best recommendation checkpoint (e.g., `train/results/beauty_sid_rec/checkpoint-XXXX`).
  - Pass that directory to the RA trainer:

```bash
cd train
bash scripts/run_training_RA.sh /path/to/beauty_sid_rec/checkpoint-XXXX
```
- Direct recommendation model (no CoT)

```bash
cd test
bash eval_parallel_8gpu.sh
```

Update `MERGED_MODEL_PATH` and `TEST_PARQUET` to target the checkpoint produced by the recommendation training (either from the combined pipeline or the standalone run).
- CoT-enhanced reasoning model

```bash
cd test
bash eval_parallel_8gpu_cot.sh
```

Point `MERGED_MODEL_PATH` to the directory output by the reasoning activation training (typically under `train/results/ReasoningActivation/epoch_*/`; generated automatically when running the combined pipeline). This script evaluates the pipeline that generates a reasoning trace first and then produces the recommendation.
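For a single-example view of what that evaluation does, a hedged sketch with `transformers` (the prompt format, path, and SID tokens are assumptions; the script itself handles batching across 8 GPUs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: substitute the epoch directory the RA training produced.
model_path = "train/results/ReasoningActivation/epoch_1"

tok = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

# Hypothetical prompt: the model is expected to reason first, then emit
# the SID tokens of the recommended item.
prompt = "User history: <a_12><b_87><c_3>, <a_5><b_9><c_44>. Reason, then recommend."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Print only the newly generated tokens (reasoning trace + recommended SID).
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```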
