Hello, I am working toward the same goal as you: completing text generation by calling the GPT ONNX model through the C++ API. By the way, I wonder where the top_k_logits and sample_sequence functions from sample.py (the original GPT-2 source) are handled in your post-processing code. I couldn't find anything in your code that touches the "logits" variable or any of the torch functions used in sample.py.
Also, the ONNX model you used has 13 outputs, but the ORT session in your main function specifies only one output. It looks like a combined output, but I wonder whether this differs from the original text generation algorithm used in PyTorch (sample_sequence + top_k_logits). To summarize, I would like to understand how your text generation code works.
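To make the question concrete, this is the logic I could not locate in your post-processing: a minimal NumPy sketch of the top-k filtering and sampling step, paraphrased from my reading of sample.py (not your code; the function names, k, and the example logits here are purely illustrative):

```python
import numpy as np

def top_k_logits(logits, k):
    """Mask every logit outside the k largest to a large negative value."""
    if k == 0:  # k == 0 means no truncation in the original code
        return logits
    kth_largest = np.sort(logits)[-k]
    return np.where(logits < kth_largest, -1e10, logits)

def sample_next_token(logits, k=40, temperature=1.0, rng=None):
    """Softmax-sample the next token id from the top-k filtered logits."""
    rng = rng or np.random.default_rng()
    filtered = top_k_logits(logits / temperature, k)
    probs = np.exp(filtered - filtered.max())  # stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Illustrative logits over a tiny 5-token vocabulary
logits = np.array([2.0, 0.5, -1.0, 3.0, 0.0])
next_id = sample_next_token(logits, k=2)
assert next_id in (0, 3)  # with k=2, only the two largest logits can win
```

In the full sample_sequence loop, this step would run once per generated token, appending next_id to the context before the next model call. I am asking whether your single combined ONNX output already performs an equivalent step, or whether it is omitted.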
Your code inspires me a lot, so thank you very much for your work. I look forward to your answer.