Introduction
Kyro is a programmable evaluation layer for LLM applications. Define multi-agent pipelines in YAML, run judges as a DAG, and integrate AI behavioral testing into CI/CD.
What is Kyro?
Kyro is an open-source evaluation framework for LLM applications. It lets you define evaluation pipelines as YAML configs, execute them with intelligent DAG orchestration, and collect structured pass/fail results — all from your existing Node.js or Python codebase.
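As a rough sketch of what such a YAML pipeline config might look like (the file name, key names, and judge names below are illustrative assumptions, not Kyro's documented schema):

```yaml
# evals.yaml — hypothetical pipeline definition (all key names are assumptions)
pipeline:
  judges:
    - name: relevance          # runs first
      prompt: "Does the answer address the user's question?"
    - name: tone
      prompt: "Is the reply professional and on-brand?"
    - name: final-verdict      # depends on the two judges above (a DAG edge)
      depends_on: [relevance, tone]
provider:
  vendor: openai
  model: gpt-4o
```

The `depends_on` field here stands in for whatever mechanism Kyro uses to express DAG ordering between judges.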
💡 Quick install

Get started in seconds: `npm install @kyro/judge`
Core packages
| Package | Description |
|---|---|
| `@kyro/judge` | Core evaluation engine — define and run judge pipelines |
| `@kyro/actor` | Human conversation simulator for end-to-end agent testing |
| `@kyro/batch` | Async batch evaluation at scale (50% cheaper via provider Batch APIs) |
| `@kyro/core` | Convenience re-export of `judge` + `shared` |
| `@kyro/shared` | AI provider abstractions (OpenAI, Anthropic, Gemini, Azure, Ollama) |
How it works
1. Define your evaluation pipeline in a YAML config file
2. Configure which AI provider and model to use as judges
3. Run against a conversation transcript
4. Receive structured results with pass/fail status, root cause, and token usage
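The structured results in the last step might look like the following TypeScript sketch. The field names (`passed`, `rootCause`, `usage`, and so on) are assumptions chosen to illustrate the shape of pass/fail results, not Kyro's exact output schema:

```typescript
// Hypothetical shape of one judge's result (field names are assumptions,
// not Kyro's documented schema).
interface JudgeResult {
  judge: string;                  // which judge in the pipeline produced this
  passed: boolean;                // pass/fail status
  rootCause: string | null;       // explanation when the judge fails
  usage: { inputTokens: number; outputTokens: number }; // token usage
}

const results: JudgeResult[] = [
  {
    judge: "relevance",
    passed: true,
    rootCause: null,
    usage: { inputTokens: 812, outputTokens: 64 },
  },
  {
    judge: "tone",
    passed: false,
    rootCause: "Assistant used informal slang in a support context",
    usage: { inputTokens: 790, outputTokens: 71 },
  },
];

// A run passes only if every judge passes; token usage sums across judges.
const overallPass = results.every((r) => r.passed);
const totalTokens = results.reduce(
  (sum, r) => sum + r.usage.inputTokens + r.usage.outputTokens,
  0,
);
console.log(overallPass, totalTokens); // false 1737
```

Consuming results this way makes it straightforward to fail a CI job when `overallPass` is false and to budget token spend per run.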
Next steps
- Installation — add Kyro to your project
- Quick Start — run your first evaluation in 5 minutes
- KyroJudge — deep dive into the evaluation engine