Question 1

What are LLM observability and monitoring tools?

Accepted Answer

LLM observability tools give you visibility into what your AI application is doing at runtime — capturing traces of individual LLM calls, latency, token usage, costs, and evaluation scores. Unlike traditional APM, they understand prompt/response pairs as first-class objects, letting you replay sessions, debug chain failures, and track model drift over time. This directory lists 8 tools in this category, ranging from open-source platforms like Langfuse and Phoenix to managed cloud services like Arize AI and LangSmith.
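To make "prompt/response pairs as first-class objects" concrete, here is a minimal sketch of what a trace record and a tracing wrapper might look like. It follows no particular tool's SDK: `LLMTrace`, `TraceStore`, `traced`, and `fake_llm` are all hypothetical names, and the whitespace word count is only a placeholder for the token usage a real provider would report.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LLMTrace:
    """One recorded LLM call: the prompt/response pair plus runtime metadata."""
    prompt: str
    response: str
    latency_s: float
    prompt_tokens: int
    completion_tokens: int


@dataclass
class TraceStore:
    """In-memory stand-in for an observability backend (hypothetical)."""
    traces: List[LLMTrace] = field(default_factory=list)

    def record(self, trace: LLMTrace) -> None:
        self.traces.append(trace)


def traced(store: TraceStore, llm_call: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap an LLM call so that every invocation is stored as a trace."""
    def wrapper(prompt: str) -> str:
        start = time.perf_counter()
        response = llm_call(prompt)
        latency = time.perf_counter() - start
        store.record(LLMTrace(
            prompt=prompt,
            response=response,
            latency_s=latency,
            # Crude placeholder: real token counts normally come from the
            # provider's usage metadata, not from splitting on whitespace.
            prompt_tokens=len(prompt.split()),
            completion_tokens=len(response.split()),
        ))
        return response
    return wrapper


def fake_llm(prompt: str) -> str:
    """Stand-in model so the sketch runs without network access."""
    return "echo: " + prompt


store = TraceStore()
ask = traced(store, fake_llm)
ask("What is observability?")
```

Because each trace keeps the full prompt and response next to latency and token counts, a stored session can later be replayed or scored, which is the capability that separates these tools from generic APM.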

Question 2

How do I choose the right LLM monitoring tool for my project?

Accepted Answer

Four criteria separate tools quickly: (1) Self-hosted vs. managed: Langfuse and Phoenix by Arize both offer self-hostable deployments if data residency matters; (2) Framework coupling: LangSmith is tightest with LangChain/LangGraph, while Helicone works as a proxy in front of any OpenAI-compatible endpoint, so integration is a one-line change rather than new instrumentation code; (3) Evaluation depth: Braintrust and Arize AI have stronger built-in eval frameworks for scoring outputs, whereas AgentOps focuses more on agent session replay and action traces; (4) Cost: Helicone and Langfuse have generous free tiers, while Weights & Biases pricing scales with team seats and data volume.
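Proxy-style integration usually means pointing an existing client at a different base URL and attaching an extra auth header. The sketch below shows that pattern in generic form. The URLs, the `X-Proxy-Auth` header name, `ClientConfig`, and `route_through_proxy` are illustrative assumptions and are not taken from any vendor's documentation, so check the tool's own docs for the real values.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class ClientConfig:
    """Settings an OpenAI-compatible client typically needs (hypothetical type)."""
    base_url: str
    api_key: str
    headers: Dict[str, str] = field(default_factory=dict)


def route_through_proxy(config: ClientConfig, proxy_base_url: str,
                        proxy_api_key: str) -> ClientConfig:
    """Return a copy of the config that sends traffic via a logging proxy.

    The provider API key is left untouched; the proxy gets its own credential
    in a separate header (the header name here is an assumption).
    """
    return replace(
        config,
        base_url=proxy_base_url,
        headers={**config.headers, "X-Proxy-Auth": f"Bearer {proxy_api_key}"},
    )


# Placeholder endpoints, not real services.
direct = ClientConfig(base_url="https://api.provider.example/v1",
                      api_key="provider-key")
proxied = route_through_proxy(direct, "https://proxy.example/v1", "proxy-key")
```

The application code that builds prompts and reads responses stays the same. Only the connection settings change, which is why proxy-based tools are quick to adopt and equally quick to remove.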

Question 3

What is the difference between LangSmith and Langfuse?

Accepted Answer

LangSmith is LangChain's native observability layer: it integrates with minimal configuration if you're already using LangChain/LangGraph, but it is primarily a managed cloud service (self-hosting is generally reserved for enterprise plans) and it is vendor-tied. Langfuse is framework-agnostic, supports self-hosting on your own infrastructure, and has a fully open-source core (MIT license) with an optional managed cloud. If you're not using LangChain or need on-prem deployment, Langfuse is the more flexible choice; if you're deep in the LangChain ecosystem and want the path of least resistance, LangSmith has the tighter integration.

Question 4

Are there free or open-source LLM monitoring tools?

Accepted Answer

Yes — several options in this list are open-source or have substantial free tiers. Langfuse is fully open-source (MIT) and self-hostable with no seat limits. Phoenix by Arize is also open-source (Apache 2.0) and runs locally or in your own cloud, with no data leaving your environment. Helicone offers a free tier up to 10,000 requests/month on its managed cloud. Weights & Biases has a free tier for individuals. AgentOps and Braintrust are closed-source SaaS but offer free tiers for low-volume usage.

Question 5

How do LLM monitoring tools calculate and track AI inference costs?

Accepted Answer

Most tools infer cost by multiplying token counts (prompt + completion tokens, reported by the model provider) against a per-token pricing table they maintain internally. Helicone and Langfuse both maintain up-to-date pricing tables for major providers (OpenAI, Anthropic, Mistral, etc.) and surface per-request and aggregated cost dashboards automatically. The catch: if you're using fine-tuned models, batch APIs, or providers with custom pricing, you may need to configure custom cost rates manually. Both platforms support this, but it requires setup.
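The arithmetic is simple enough to sketch. The example below assumes prices quoted in USD per one million tokens and uses made-up model names and rates. `PRICING_PER_MILLION`, `Usage`, and `request_cost` are hypothetical and do not reflect any tool's internal implementation or any provider's real prices.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical rates in USD per 1M tokens. These are NOT real provider prices.
PRICING_PER_MILLION: Dict[str, Dict[str, float]] = {
    "example-model-small": {"prompt": 0.50, "completion": 1.50},
    "example-model-large": {"prompt": 5.00, "completion": 15.00},
}


@dataclass
class Usage:
    """Token counts for one request, as reported by the model provider."""
    model: str
    prompt_tokens: int
    completion_tokens: int


def request_cost(usage: Usage,
                 overrides: Optional[Dict[str, Dict[str, float]]] = None) -> float:
    """Cost of one request in USD: tokens times the per-token rate.

    `overrides` plays the role of manually configured rates for fine-tuned
    models or custom contracts; it takes precedence over the built-in table.
    """
    rates = (overrides or {}).get(usage.model) or PRICING_PER_MILLION.get(usage.model)
    if rates is None:
        raise KeyError(f"no pricing configured for model {usage.model!r}")
    total = (usage.prompt_tokens * rates["prompt"]
             + usage.completion_tokens * rates["completion"])
    return total / 1_000_000


# A model found in the built-in table.
standard = request_cost(Usage("example-model-small", 1000, 500))

# A fine-tuned model that needs a manually supplied rate.
custom_rates = {"my-fine-tune": {"prompt": 2.00, "completion": 4.00}}
custom = request_cost(Usage("my-fine-tune", 2000, 1000), overrides=custom_rates)

# Aggregated cost, as a dashboard would sum it across requests.
total_cost = standard + custom
```

A model missing from both tables raises an error here instead of silently costing zero. Whether a given tool fails loudly or reports zero for unknown models is worth checking, since unpriced models are a common reason dashboards undercount spend.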

Name	Best For	Pricing	Key Differentiator
AgentOps	Production agents, cost tracking, compliance	Free (5k events/mo), Pro ($40/mo), Enterprise	Session replay and compliance designed for agents
Arize AI	Enterprise teams, LLM pipelines, regression detection	See website	Enterprise-grade audit trails and systematic evaluation
Braintrust	Multi-model evaluation, regulated industries	See website	Integrated eval framework and prompt management
Helicone	Multi-provider setups, data privacy, cost tracking	Free trial, then paid	Open-source, one-line integration, strong privacy focus
Langfuse	Data residency, complex agent workflows	Free tier, self-hosted available	Open-source with evaluation and debugging tooling
LangSmith	LangChain/LangGraph users, prompt workflows	Free tier, then paid	Tight framework integration with LangChain ecosystem
Phoenix by Arize	RAG pipelines, open-source preference	Free cloud, self-hosted	Open-source, no feature gatekeeping, OpenTelemetry native
Weights & Biases	LLM training, experiment tracking	See website	Unified ML ops with LLM monitoring for research teams

8 Best LLM Monitoring & Observability Tools
