Thank you for your impressive work! Is the code used to generate the training data Windy0822/ultrainteract_math_rollout available? I am interested in the following:
- What are the generation hyperparameters (e.g. temperature, top_p, max_new_tokens)? What instruction (e.g. "reason step by step") is given to the model?
- How are the steps split? Is there code or an explicit set of rules available?
- How do you evaluate whether a reasoning path is correct? Is it the same as in your prior work https://github.com/OpenBMB/Eurus/blob/main/eval/Math/math/evaluate_math_cot.py ?
Thank you and wish you all the best!