A technique for aligning models by using human feedback to train a reward model, subsequently fine-tuning a large language model.
A technique for aligning models by using human feedback to train a reward model, subsequently fine-tuning a large language model.