Kyro

Introduction

Kyro is a programmable evaluation layer for LLM applications. Define multi-agent pipelines in YAML, run judges as a DAG, and integrate AI behavioral testing into CI/CD.

What is Kyro?

Kyro is an open-source evaluation framework for LLM applications. It lets you define evaluation pipelines as YAML configs, execute them with intelligent DAG orchestration, and collect structured pass/fail results — all from your existing Node.js or Python codebase.

💡 Quick install

Get started in seconds: npm install @kyro/judge

Core packages

Package        Description
@kyro/judge    Core evaluation engine — define and run judge pipelines
@kyro/actor    Human conversation simulator for end-to-end agent testing
@kyro/batch    Async batch evaluation at scale (50% cheaper via provider Batch APIs)
@kyro/core     Convenience re-export of judge + shared
@kyro/shared   AI provider abstractions (OpenAI, Anthropic, Gemini, Azure, Ollama)

How it works

  1. Define your evaluation pipeline in a YAML config file
  2. Configure which AI provider and model to use as judges
  3. Run against a conversation transcript
  4. Receive structured results with pass/fail status, root cause, and token usage
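The steps above start from a YAML pipeline definition. As a hedged sketch (the keys shown here, such as `judges`, `prompt`, and `depends_on`, are illustrative assumptions, not Kyro's documented schema), such a file might look like:

```yaml
# Illustrative only: these field names are assumptions for the sake of
# example, not Kyro's documented config schema.
name: support-bot-eval
judges:
  relevance:
    prompt: "Did the assistant's answers stay on topic?"
  accuracy:
    prompt: "Were the assistant's factual claims correct?"
    depends_on: [relevance]   # judges can depend on each other, forming a DAG
```

The code below then loads a config like this and runs it against a transcript.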
import { Judge } from '@kyro/judge';
import { ProviderFactory } from '@kyro/shared';
 
const provider = ProviderFactory.create({ provider: 'openai', model: 'gpt-4o' });
const judge = new Judge('./pipeline.yaml', provider);
 
const result = await judge.run(transcript);
// { status: 'SUCCESS', data: { relevance: { status: 'SUCCESS', ... } } }
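In CI/CD, the structured result makes it straightforward to gate a build on the evaluation outcome. Below is a minimal sketch, assuming only the result shape shown in the comment above (a top-level `status` plus per-judge statuses under `data`); it is not part of Kyro's API:

```typescript
// Hedged sketch, not Kyro API: helpers for gating CI on an evaluation result.
// The shape mirrors the example result above.
type JudgeStatus = 'SUCCESS' | 'FAILURE';

interface PipelineResult {
  status: JudgeStatus;
  data: Record<string, { status: JudgeStatus }>;
}

// Names of judges that did not pass; an empty array means all judges passed.
function failedJudges(result: PipelineResult): string[] {
  return Object.entries(result.data)
    .filter(([, judge]) => judge.status !== 'SUCCESS')
    .map(([name]) => name);
}

// True only when the pipeline and every individual judge succeeded.
function passed(result: PipelineResult): boolean {
  return result.status === 'SUCCESS' && failedJudges(result).length === 0;
}
```

A CI step could call `passed(result)` and exit non-zero when it returns `false`, failing the build with the names from `failedJudges(result)` in the log.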
