Request for evaluation code

Hi, thank you for your great work and the open-source release!

How can I evaluate the model's performance on each dataset?
Could you please provide the scripts or commands used for evaluation?
Alternatively, could you share the code repository you used for evaluation?

Thanks!