Introduction
Kyro is a programmable evaluation layer for LLM applications. Define multi-agent pipelines in YAML, run judges as a DAG, and integrate AI behavioral testing into CI/CD.
What is Kyro?
Kyro is an open-source evaluation framework for LLM applications. It lets you define evaluation pipelines as YAML configs, execute them with intelligent DAG orchestration, and collect structured pass/fail results — all from your existing Node.js or Python codebase.
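As a rough sketch of what such a YAML pipeline config might look like (the file name, key names, and judge names below are illustrative assumptions, not Kyro's documented schema):

```yaml
# evals.yaml — hypothetical pipeline definition (all key names are assumptions)
pipeline:
  judges:
    - name: relevance          # runs first
      prompt: "Does the answer address the user's question?"
    - name: tone
      prompt: "Is the reply professional and on-brand?"
    - name: final-verdict      # depends on the two judges above (a DAG edge)
      depends_on: [relevance, tone]
provider:
  vendor: openai
  model: gpt-4o
```

The `depends_on` field here stands in for whatever mechanism Kyro uses to express DAG ordering between judges.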
💡 Quick install

Get started in seconds: `npm install @kyro/judge`
Core packages
| Package | Description |
|---|---|
| `@kyro/judge` | Core evaluation engine — define and run judge pipelines |
| `@kyro/actor` | Human conversation simulator for end-to-end agent testing |
| `@kyro/batch` | Async batch evaluation at scale (50% cheaper via provider Batch APIs) |
| `@kyro/core` | Convenience re-export of `judge` + `shared` |
| `@kyro/shared` | AI provider abstractions (OpenAI, Anthropic, Gemini, Azure, Ollama) |
How it works
1. Define your evaluation pipeline in a YAML config file
2. Configure which AI provider and model to use as judges
3. Run against a conversation transcript
4. Receive structured results with pass/fail status, root cause, and token usage
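The structured results in the last step might look like the following TypeScript sketch. The field names (`passed`, `rootCause`, `usage`, and so on) are assumptions chosen to illustrate the shape of pass/fail results, not Kyro's exact output schema:

```typescript
// Hypothetical shape of one judge's result (field names are assumptions,
// not Kyro's documented schema).
interface JudgeResult {
  judge: string;                  // which judge in the pipeline produced this
  passed: boolean;                // pass/fail status
  rootCause: string | null;       // explanation when the judge fails
  usage: { inputTokens: number; outputTokens: number }; // token usage
}

const results: JudgeResult[] = [
  {
    judge: "relevance",
    passed: true,
    rootCause: null,
    usage: { inputTokens: 812, outputTokens: 64 },
  },
  {
    judge: "tone",
    passed: false,
    rootCause: "Assistant used informal slang in a support context",
    usage: { inputTokens: 790, outputTokens: 71 },
  },
];

// A run passes only if every judge passes; token usage sums across judges.
const overallPass = results.every((r) => r.passed);
const totalTokens = results.reduce(
  (sum, r) => sum + r.usage.inputTokens + r.usage.outputTokens,
  0,
);
console.log(overallPass, totalTokens); // false 1737
```

Consuming results this way makes it straightforward to fail a CI job when `overallPass` is false and to budget token spend per run.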
Next steps
- Installation — add Kyro to your project
- Quick Start — run your first evaluation in 5 minutes
- KyroJudge — deep dive into the evaluation engine