In language models, this refers to the dependency of future token predictions on the tokens previously generated by the model, not just the initial input.
In language models, this refers to the dependency of future token predictions on the tokens previously generated by the model, not just the initial input.