
Instructor is an open-source Python library designed to extract structured, validated data from Large Language Models (LLMs). Built on top of Pydantic, it provides a type-safe interface for defining output schemas and ensuring that LLM responses conform to those schemas — with automatic validation and retry logic when outputs fall short.
At its core, Instructor patches existing LLM client libraries (such as the OpenAI SDK) to accept a response_model parameter. Developers define their expected output as a Pydantic model, pass it to the client, and Instructor handles the prompt engineering, parsing, and validation loop behind the scenes. If the LLM returns malformed or invalid data, Instructor automatically retries with contextual feedback until a valid response is produced or a retry limit is reached.
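The pattern described above can be sketched as follows. The model name, field names, and helper function are illustrative; running the call itself assumes `pip install instructor openai` and an OPENAI_API_KEY in the environment:

```python
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int


def extract_user(text: str) -> UserInfo:
    # Assumed setup: `pip install instructor openai` and OPENAI_API_KEY set.
    # instructor.from_openai wraps the client so create() accepts response_model.
    import instructor
    from openai import OpenAI

    client = instructor.from_openai(OpenAI())
    return client.chat.completions.create(
        model="gpt-4o-mini",       # illustrative model choice
        response_model=UserInfo,   # the target schema
        max_retries=3,             # re-prompt with validation feedback up to 3 times
        messages=[{"role": "user", "content": text}],
    )
```

The return value is a validated UserInfo instance rather than a raw JSON string, so downstream code works with typed attributes instead of dictionary lookups.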
The library supports over 15 LLM providers, including OpenAI, Anthropic Claude, Google Gemini, Mistral, Cohere, Ollama (for local/open-source models), DeepSeek, and others. Beyond Python, Instructor has ports available in TypeScript, Go, Ruby, Elixir, and Rust — making it one of the few structured output libraries with genuine multi-language coverage.
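Switching providers changes only the client construction, not the schema. A hedged sketch using Anthropic Claude (model name and fields are illustrative; requires `pip install instructor anthropic` and an ANTHROPIC_API_KEY):

```python
from pydantic import BaseModel


class Sentiment(BaseModel):
    label: str
    score: float


def classify_with_claude(text: str) -> Sentiment:
    # Assumed setup: `pip install instructor anthropic` and ANTHROPIC_API_KEY set.
    import instructor
    from anthropic import Anthropic

    client = instructor.from_anthropic(Anthropic())
    return client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=256,                    # required by the Anthropic API
        response_model=Sentiment,          # same schema pattern as other providers
        messages=[{"role": "user", "content": f"Classify sentiment: {text}"}],
    )
```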
Instructor occupies a specific niche: fast, schema-first data extraction without the overhead of a full agent runtime. It is positioned as the tool to reach for when the goal is reliable extraction — pulling structured entities, classifications, or data from unstructured text. For teams that need agent orchestration, observability dashboards, or typed tool-use primitives, the Instructor documentation explicitly points users toward PydanticAI as a complementary runtime.
Compared to alternatives like LangChain's output parsers or raw JSON mode from OpenAI, Instructor's primary differentiator is the depth of Pydantic integration. Validation is not superficial — it leverages Pydantic's full validator ecosystem, including field-level constraints, custom validators, and nested model validation. This means extraction pipelines catch logical errors (e.g., a date range where the end precedes the start), not just schema mismatches.
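The date-range check mentioned above can be expressed with a standard Pydantic model validator. When such a model is used as a response_model, a failure like this is the kind of error Instructor feeds back to the LLM on retry (model and field names below are illustrative):

```python
from datetime import date
from pydantic import BaseModel, Field, ValidationError, model_validator


class DateRange(BaseModel):
    label: str = Field(min_length=1)  # field-level constraint
    start: date
    end: date

    @model_validator(mode="after")
    def end_after_start(self) -> "DateRange":
        # A logical check that plain JSON-schema validation cannot express.
        if self.end < self.start:
            raise ValueError("end date precedes start date")
        return self


# A well-ordered range validates normally.
ok = DateRange(label="Q1", start=date(2024, 1, 1), end=date(2024, 3, 31))

# A reversed range raises a ValidationError with a readable message.
failed = False
try:
    DateRange(label="Q1", start=date(2024, 3, 31), end=date(2024, 1, 1))
except ValidationError:
    failed = True
```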
Instructor also supports streaming, allowing partial structured objects to be consumed as they are generated — useful for progressive UI rendering or large extractions. Hooks allow developers to observe request and response cycles for logging and debugging without modifying core extraction logic.
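Both features can be sketched together. The generator below is a hedged example, not a definitive recipe: it assumes `pip install instructor openai` and an API key, and the model and field names are illustrative. Declaring fields optional is what lets partially generated objects validate mid-stream:

```python
from typing import Optional
from pydantic import BaseModel


class ArticleSummary(BaseModel):
    # All fields optional so partially generated objects still validate.
    title: Optional[str] = None
    body: Optional[str] = None


def stream_summary(text: str):
    # Assumed setup: `pip install instructor openai` and OPENAI_API_KEY set.
    import instructor
    from openai import OpenAI

    client = instructor.from_openai(OpenAI())
    # Hook: observe outgoing request kwargs without touching extraction logic.
    client.on("completion:kwargs", lambda *args, **kwargs: print("request sent"))

    # create_partial yields progressively more complete ArticleSummary objects,
    # suitable for progressive UI rendering.
    for partial in client.chat.completions.create_partial(
        model="gpt-4o-mini",
        response_model=ArticleSummary,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    ):
        yield partial
```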
With over 3 million monthly downloads, 11,000+ GitHub stars, and 100+ contributors, Instructor has become the de facto standard for structured LLM output in the Python ecosystem. It is well-documented with an extensive cookbook of real-world examples covering use cases from entity extraction and classification to complex nested schema generation.
Instructor is free and open-source under the MIT license. There is no paid tier or hosted service — it is a library installed via pip and used within the developer's own infrastructure.
Instructor is best suited for Python developers and data engineers who need to reliably extract structured information from LLM responses — such as entity extraction, document parsing, classification pipelines, or any workflow where unstructured text must be converted into typed, validated data objects. It is particularly well-matched for teams already using Pydantic who want to extend that validation discipline into their LLM integration layer.