Direct Preference Optimization: Your Language Model is Secretly a Reward Model

(2023-05) DPO by Stanford


