A technique for aligning models in which human preference feedback is used to train a reward model, which then guides reinforcement-learning fine-tuning of a large language model.
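The two stages can be illustrated with a deliberately tiny sketch: a linear reward model fit to simulated pairwise preferences with the Bradley-Terry loss, followed by gradient ascent on the learned reward with a penalty that stands in for the KL term keeping the policy near its starting point. Everything here (the feature vectors, the simulated "human" preference rule, the learning rates) is a hypothetical toy setup, not any particular library's RLHF implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each "response" is a 3-dim feature vector, and the
# simulated human prefers whichever response has the larger first feature.
A = rng.normal(size=(200, 3))
B = rng.normal(size=(200, 3))
prefer_A = (A[:, 0] > B[:, 0]).astype(float)

# Stage 1: fit a linear reward model r(x) = w . x to the preference
# pairs by gradient descent on the Bradley-Terry (pairwise logistic) loss.
w = np.zeros(3)
for _ in range(500):
    logits = A @ w - B @ w                  # r(A) - r(B)
    p = 1.0 / (1.0 + np.exp(-logits))       # model's P(A preferred over B)
    grad = ((p - prefer_A)[:, None] * (A - B)).mean(axis=0)
    w -= 0.5 * grad

# Stage 2: "fine-tune" a policy -- reduced here to a single mean response
# vector -- by gradient ascent on the learned reward, with an L2 penalty
# standing in for the KL penalty against the reference policy.
policy = np.zeros(3)
for _ in range(200):
    grad = w - 0.2 * policy                 # gradient of w . policy - 0.1 * ||policy||^2
    policy += 0.1 * grad
```

After stage 1, `w` assigns most weight to the first feature, since that is all the simulated preferences depend on; after stage 2, the policy has shifted in that same direction, which is the essential RLHF dynamic in miniature.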

Related Terms