Direct Preference Optimization: Your Language Model is Secretly a Reward Model | LLMWay – The Way To LLM

(2023-05) DPO by Stanford
