Hello, I tried to reproduce your method by training on my local machine.
While reviewing the metrics on wandb, I noticed that prompt_length keeps increasing during training.
I assumed the training data was shuffled, so I didn't expect to see a distinct trend. Could you share any insight into why this happens?
