Instruction Tuning & RLHF

Post-training methods that adapt pretrained language models to follow natural-language instructions and human preferences, using supervised demonstrations, human preference rankings, and learned reward models.
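The following sketch shows how rankings become a reward model. In the 2020 summarization paper and in InstructGPT (2022), a reward model r(x, y) is trained so that a human-preferred response y_w scores higher than a rejected response y_l, using the pairwise loss -log σ(r(x, y_w) - r(x, y_l)). InstructGPT averages this loss over every pair drawn from K ranked responses to the same prompt. The sketch is minimal. It assumes the reward scores are already computed and leaves out the model, batching, and gradients. The function names `log_sigmoid` and `pairwise_ranking_loss` are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    # For negative z, use log(sigmoid(z)) = z - log(1 + e^z) to avoid overflow.
    return z - math.log1p(math.exp(z))


def pairwise_ranking_loss(ranked_rewards: Sequence[float]) -> float:
    """Average pairwise (Bradley-Terry style) loss over a ranked list.

    ranked_rewards[i] is the reward-model score r(x, y_i) for the i-th
    response to one prompt, ordered from most preferred (index 0) to
    least preferred. Hypothetical helper, not a library API.
    """
    pairs = list(combinations(range(len(ranked_rewards)), 2))
    if not pairs:
        raise ValueError("need at least two ranked responses")
    total = 0.0
    # combinations() yields (i, j) with i < j, so response i is the preferred one.
    for better, worse in pairs:
        total += -log_sigmoid(ranked_rewards[better] - ranked_rewards[worse])
    return total / len(pairs)
```

For example, `pairwise_ranking_loss([1.5, 0.2, -0.7])` averages over the pairs (0, 1), (0, 2), and (1, 2). The loss decreases as the scores agree more strongly with the human ranking.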

Core metadata

ID: instruction_tuning_rlhf
Era: Modern
First known date: 2020 (exact)
Region: Global AI research community
Review status: source_checked
Maturity: established

Prerequisites

Dependents

AI Safety & Alignment Methods (ai_safety_alignment_methods)

Fields

Artificial Intelligence & Machine Learning

Field lanes

Artificial Intelligence & Machine Learning: Safety & Governance

Node sources

Learning to summarize with human feedback (arXiv / OpenAI, 2020, primary_paper) • Supports: node, maturity
Finetuned Language Models Are Zero-Shot Learners (arXiv / Google Research, 2021, primary_paper) • Supports: node, maturity
Training language models to follow instructions with human feedback (arXiv / OpenAI, 2022, primary_paper) • Supports: node, maturity

Prerequisite edge evidence

Edge/source evidence summary:

Prerequisite edges: 3
Average edge confidence: 80%
Prerequisite source citations: 7
Edges by evidence level: primary_source: 3

Prerequisite	Type	Confidence	Evidence level	Note	Sources
Large Language Models (large_language_models)	enabling	82%	primary_source	Instruction tuning and RLHF are post-training methods applied to pretrained language models. Large language models are the main modern substrate, but they are an enabling dependency rather than a strict prerequisite.	Learning to summarize with human feedback (arXiv / OpenAI, 2020, primary_paper) • Supports: edge; Finetuned Language Models Are Zero-Shot Learners (arXiv / Google Research, 2021, primary_paper) • Supports: edge; Training language models to follow instructions with human feedback (arXiv / OpenAI, 2022, primary_paper) • Supports: edge
Reinforcement Learning (reinforcement_learning)	enabling	78%	primary_source	Reinforcement learning and reward modeling underpin the RLHF component of this combined node. Instruction tuning, by contrast, can be done with supervised demonstrations alone.	Learning to summarize with human feedback (arXiv / OpenAI, 2020, primary_paper) • Supports: edge; Training language models to follow instructions with human feedback (arXiv / OpenAI, 2022, primary_paper) • Supports: edge
Supervised Learning Pipelines (supervised_learning_pipelines)	enabling	80%	primary_source	Instruction tuning and preference-model pipelines rely on curated demonstrations, preference rankings, and supervised fine-tuning workflows.	Finetuned Language Models Are Zero-Shot Learners (arXiv / Google Research, 2021, primary_paper) • Supports: edge; Training language models to follow instructions with human feedback (arXiv / OpenAI, 2022, primary_paper) • Supports: edge
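The reinforcement-learning edge can be illustrated with the reward used during RL fine-tuning. Both OpenAI primary papers optimize the policy against the learned reward model while penalizing divergence from the supervised (SFT) model: R(x, y) = r(x, y) - β log[π_RL(y|x) / π_SFT(y|x)]. The sketch below computes that sequence-level reward under stated assumptions. It assumes per-token log-probabilities for one sampled response are already available from both models. The function name and the default β = 0.1 are illustrative choices, not values taken from the papers. The PPO update itself is not shown.

```python
from typing import Sequence


def kl_penalized_reward(
    rm_score: float,
    policy_logprobs: Sequence[float],
    reference_logprobs: Sequence[float],
    beta: float = 0.1,  # hypothetical default; the papers tune this coefficient
) -> float:
    """Sequence-level reward R = r(x, y) - beta * log(pi_RL(y|x) / pi_ref(y|x)).

    rm_score: reward-model score r(x, y) for the sampled response y.
    policy_logprobs / reference_logprobs: per-token log-probabilities of the
    same response y under the RL policy and the frozen reference (SFT) model.
    Hypothetical helper, not a library API.
    """
    if len(policy_logprobs) != len(reference_logprobs):
        raise ValueError("log-prob sequences must align token by token")
    # log pi(y|x) is a sum of per-token log-probs, so the log-ratio is a sum of differences.
    log_ratio = sum(p - q for p, q in zip(policy_logprobs, reference_logprobs))
    return rm_score - beta * log_ratio
```

The penalty term lowers the reward when the policy assigns a response much higher probability than the reference model does. This discourages the policy from drifting toward outputs that exploit the reward model outside the distribution it was trained on. A larger β keeps the tuned model closer to the SFT model.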

This page is generated from canonical era JSON and is indexable by URL.