
Instructor is an open-source Python library designed to extract structured, validated data from Large Language Models (LLMs). Built on top of Pydantic, it provides a type-safe interface for defining output schemas and ensuring that LLM responses conform to those schemas — with automatic validation and retry logic when outputs fall short.
At its core, Instructor patches existing LLM client libraries (such as the OpenAI SDK) to accept a response_model parameter. Developers define their expected output as a Pydantic model, pass it to the client, and Instructor handles the prompt engineering, parsing, and validation loop behind the scenes. If the LLM returns malformed or invalid data, Instructor automatically retries with contextual feedback until a valid response is produced or a retry limit is reached.
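The pattern described above can be sketched as follows. The model name, field names, and helper function are illustrative; running the call itself assumes `pip install instructor openai` and an OPENAI_API_KEY in the environment:

```python
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int


def extract_user(text: str) -> UserInfo:
    # Assumed setup: `pip install instructor openai` and OPENAI_API_KEY set.
    # instructor.from_openai wraps the client so create() accepts response_model.
    import instructor
    from openai import OpenAI

    client = instructor.from_openai(OpenAI())
    return client.chat.completions.create(
        model="gpt-4o-mini",       # illustrative model choice
        response_model=UserInfo,   # the target schema
        max_retries=3,             # re-prompt with validation feedback up to 3 times
        messages=[{"role": "user", "content": text}],
    )
```

The return value is a validated UserInfo instance rather than a raw JSON string, so downstream code works with typed attributes instead of dictionary lookups.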
The library supports over 15 LLM providers, including OpenAI, Anthropic Claude, Google Gemini, Mistral, Cohere, Ollama (for local/open-source models), DeepSeek, and others. Beyond Python, Instructor has ports available in TypeScript, Go, Ruby, Elixir, and Rust — making it one of the few structured output libraries with genuine multi-language coverage.
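Switching providers changes only the client construction, not the schema. A hedged sketch using Anthropic Claude (model name and fields are illustrative; requires `pip install instructor anthropic` and an ANTHROPIC_API_KEY):

```python
from pydantic import BaseModel


class Sentiment(BaseModel):
    label: str
    score: float


def classify_with_claude(text: str) -> Sentiment:
    # Assumed setup: `pip install instructor anthropic` and ANTHROPIC_API_KEY set.
    import instructor
    from anthropic import Anthropic

    client = instructor.from_anthropic(Anthropic())
    return client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=256,                    # required by the Anthropic API
        response_model=Sentiment,          # same schema pattern as other providers
        messages=[{"role": "user", "content": f"Classify sentiment: {text}"}],
    )
```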
Instructor occupies a specific niche: fast, schema-first data extraction without the overhead of a full agent runtime. It is positioned as the tool to reach for when the goal is reliable extraction — pulling structured entities, classifications, or data from unstructured text. For teams that need agent orchestration, observability dashboards, or typed tool-use primitives, the Instructor documentation explicitly points users toward PydanticAI as a complementary runtime.
Compared to alternatives like LangChain's output parsers or raw JSON mode from OpenAI, Instructor's primary differentiator is the depth of Pydantic integration. Validation is not superficial — it leverages Pydantic's full validator ecosystem, including field-level constraints, custom validators, and nested model validation. This means extraction pipelines catch logical errors (e.g., a date range where the end precedes the start), not just schema mismatches.
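The date-range check mentioned above can be expressed with a standard Pydantic model validator. When such a model is used as a response_model, a failure like this is the kind of error Instructor feeds back to the LLM on retry (model and field names below are illustrative):

```python
from datetime import date
from pydantic import BaseModel, Field, ValidationError, model_validator


class DateRange(BaseModel):
    label: str = Field(min_length=1)  # field-level constraint
    start: date
    end: date

    @model_validator(mode="after")
    def end_after_start(self) -> "DateRange":
        # A logical check that plain JSON-schema validation cannot express.
        if self.end < self.start:
            raise ValueError("end date precedes start date")
        return self


# A well-ordered range validates normally.
ok = DateRange(label="Q1", start=date(2024, 1, 1), end=date(2024, 3, 31))

# A reversed range raises a ValidationError with a readable message.
failed = False
try:
    DateRange(label="Q1", start=date(2024, 3, 31), end=date(2024, 1, 1))
except ValidationError:
    failed = True
```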
Instructor also supports streaming, allowing partial structured objects to be consumed as they are generated — useful for progressive UI rendering or large extractions. Hooks allow developers to observe request and response cycles for logging and debugging without modifying core extraction logic.
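Both features can be sketched together. The generator below is a hedged example, not a definitive recipe: it assumes `pip install instructor openai` and an API key, and the model and field names are illustrative. Declaring fields optional is what lets partially generated objects validate mid-stream:

```python
from typing import Optional
from pydantic import BaseModel


class ArticleSummary(BaseModel):
    # All fields optional so partially generated objects still validate.
    title: Optional[str] = None
    body: Optional[str] = None


def stream_summary(text: str):
    # Assumed setup: `pip install instructor openai` and OPENAI_API_KEY set.
    import instructor
    from openai import OpenAI

    client = instructor.from_openai(OpenAI())
    # Hook: observe outgoing request kwargs without touching extraction logic.
    client.on("completion:kwargs", lambda *args, **kwargs: print("request sent"))

    # create_partial yields progressively more complete ArticleSummary objects,
    # suitable for progressive UI rendering.
    for partial in client.chat.completions.create_partial(
        model="gpt-4o-mini",
        response_model=ArticleSummary,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    ):
        yield partial
```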
With over 3 million monthly downloads, 11,000+ GitHub stars, and 100+ contributors, Instructor has become the de facto standard for structured LLM output in the Python ecosystem. It is well-documented with an extensive cookbook of real-world examples covering use cases from entity extraction and classification to complex nested schema generation.
Instructor is free and open-source under the MIT license. There is no paid tier or hosted service — it is a library installed via pip and used within the developer's own infrastructure.
Instructor is best suited for Python developers and data engineers who need to reliably extract structured information from LLM responses — such as entity extraction, document parsing, classification pipelines, or any workflow where unstructured text must be converted into typed, validated data objects. It is particularly well-matched for teams already using Pydantic who want to extend that validation discipline into their LLM integration layer.