EvalKit
EvalKit is an open-source SDK for adding LLM tracing and evaluation to your AI applications. Instrument your code in minutes — traces appear in the Syntropylabs dashboard automatically.
Distributed Tracing
Every LLM call, tool use, and HTTP request, as a span.
LLM Evaluation
Run custom prompt-based judges on any trace or dataset.
Auto-Instrument
Zero-config patching for OpenAI, Anthropic, Axios, and more.
Python SDK
Installation
Once published (coming soon):
pip install evalkit
Or install from the repository directly:
pip install git+https://github.com/syntropylabs/evalkit-py.git
Quick Start
import evalkit

client = evalkit.init(
    subscription_key="tk_live_...",  # from Settings → Tracing
    service_name="my-app",
    environment="production",
    debug=True,
)

# All OpenAI / Anthropic calls are now traced automatically
from openai import OpenAI

openai_client = OpenAI()
response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
Manual spans
with evalkit.start_span("my-operation") as span:
    span.set_attribute("custom.key", "value")
    result = do_something()  # placeholder for your own code
    span.set_attribute("result.length", len(result))
Configuration options
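One way to keep the subscription key out of source code is to build the init() arguments from environment variables. The helper below is a minimal sketch and not part of EvalKit: the function name is hypothetical, and only EVALKIT_SUBSCRIPTION_KEY appears elsewhere on this page; the other variable names are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def init_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for evalkit.init() from environment variables."""
    env = os.environ if env is None else env
    key = env.get("EVALKIT_SUBSCRIPTION_KEY")
    if not key:
        raise RuntimeError("EVALKIT_SUBSCRIPTION_KEY is not set")
    return {
        "subscription_key": key,
        # The variable names below are hypothetical, chosen for this sketch.
        "service_name": env.get("EVALKIT_SERVICE_NAME", "my-app"),
        "environment": env.get("EVALKIT_ENVIRONMENT", "development"),
        "debug": env.get("EVALKIT_DEBUG", "").lower() in ("1", "true", "yes"),
    }


# Usage, assuming the evalkit package is installed:
#   import evalkit
#   client = evalkit.init(**init_kwargs_from_env())
```

Failing fast on a missing key keeps a misconfigured deployment from starting up without tracing and going unnoticed.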
evalkit.init(
    subscription_key="tk_live_...",
    base_url="https://api.syntropylabs.ai",  # default
    service_name="my-service",
    environment="production",  # production | staging | development
    debug=False,  # log exports to stdout
    scheduled_delay_millis=5000,  # batch export delay (ms)
)
TypeScript / Node.js SDK
Installation
Once published (coming soon):
npm install @evalkit/sdk # or yarn add @evalkit/sdk
Quick Start
import * as evalkit from '@evalkit/sdk';

const client = evalkit.init({
  subscriptionKey: 'tk_live_...', // from Settings → Tracing
  serviceName: 'my-app',
  environment: 'production',
  debug: true,
});

// OpenAI is auto-patched; just use it normally
import OpenAI from 'openai';

const openai = new OpenAI();
const res = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(res.choices[0].message.content);
Manual spans
import { startSpan } from '@evalkit/sdk';

const result = await startSpan('my-operation', async (span) => {
  span.setAttribute('custom.key', 'value');
  const data = await fetchData(); // placeholder for your own code
  span.setAttribute('result.count', data.length);
  return data;
});
NestJS / Express
The SDK auto-instruments all incoming HTTP requests — no manual middleware needed. Just call evalkit.init() before your app bootstraps:
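// main.ts (NestJS)
import * as evalkit from '@evalkit/sdk';
import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module'; // path assumed; adjust to your project

evalkit.init({
  subscriptionKey: process.env.EVALKIT_SUBSCRIPTION_KEY!,
  serviceName: 'my-nestjs-app',
  environment: process.env.NODE_ENV ?? 'development',
});

// Then bootstrap normally
async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  await app.listen(3000);
}
bootstrap();
Configuration options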
evalkit.init({
  subscriptionKey: 'tk_live_...',
  baseUrl: 'https://api.syntropylabs.ai', // default
  serviceName: 'my-service',
  environment: 'production',
  debug: false,
  scheduledDelayMillis: 5000, // batch export delay (ms)
});
Tracing
What gets traced automatically
W3C traceparent propagation
Pass the traceparent header from your frontend to backend to stitch spans into a single trace across services. EvalKit reads the header automatically and creates a child span.
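For reference, the traceparent value follows the W3C Trace Context format: version, trace ID, parent span ID, and flags, written as lowercase hex and separated by hyphens. The sketch below shows how such a header can be validated and split on the receiving side. It is an illustration of the standard, not EvalKit's internal parser; the names TraceParent and parse_traceparent are hypothetical, and it accepts only the version 00 layout.

```python
import re
from typing import NamedTuple, Optional

# Version 00 layout: 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags.
_TRACEPARENT_RE = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # This sketch handles version 00 only; all-zero IDs are invalid per the spec.
    if version != "00" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # The lowest flag bit marks the trace as sampled.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

A child span created from this header keeps the same trace_id and records parent_id as its parent, which is what joins frontend and backend spans into one trace.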
// Frontend: propagate traceparent to your API
fetch('/api/chat', {
  method: 'POST', // a request with a body cannot use the default GET method
  headers: {
    'Content-Type': 'application/json',
    traceparent: evalkit.getTraceparent(), // from @evalkit/sdk
  },
  body: JSON.stringify({ message }),
});
Viewing traces
Go to Dashboard → Tracing. Select a Trace Project to see all traces. Click any trace to open the waterfall view — spans are colour-coded by type (LLM, tool, HTTP, DB).
Evaluation
Offline evaluation (manual)
Select one or more traces in the Tracing dashboard, click Evaluate, choose your evaluation rules and a judge model, then click Run. Results are saved and shown in the Evaluation tab of each trace.
Online evaluation (automatic)
Enable online evaluation per Trace Project to automatically evaluate every new trace as it arrives. Click Online Eval in the tracing toolbar, choose rules, a judge model, and a polling interval.
Evaluation rules
Rules are prompt templates that a judge LLM uses to score a trace. Go to Dashboard → Evaluators to create rules. Group related rules into Collections so you can apply them all at once.
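Because a rule is a prompt template, applying one comes down to two steps: filling in the trace, then parsing the judge's reply. The sketch below illustrates both under stated assumptions. It is not EvalKit's implementation; the names Verdict, render_rule, and parse_verdict are hypothetical, and it assumes a {{trace}} placeholder and a JSON reply with score, passed, and reasoning fields, as in the example prompt on this page.

```python
import json
from dataclasses import dataclass


@dataclass
class Verdict:
    score: float
    passed: bool
    reasoning: str


def render_rule(template: str, trace_text: str) -> str:
    """Fill the {{trace}} placeholder with the serialized trace."""
    return template.replace("{{trace}}", trace_text)


def parse_verdict(raw: str) -> Verdict:
    """Parse the judge's JSON reply and reject out-of-range scores."""
    data = json.loads(raw)
    score = float(data["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return Verdict(score, bool(data["passed"]), str(data["reasoning"]))
```

Validating the score range at parse time surfaces a judge that ignored the requested format, instead of storing a misleading result.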
Example rule prompt:
"Given the conversation below, score from 0.0 to 1.0 how well the
assistant stayed on topic and did not hallucinate.
Conversation:
{{trace}}
Return JSON: { "score": <float>, "passed": <bool>, "reasoning": "<str>" }"
Key Concepts
EvalKit is built by Syntropylabs. SDK packages will be published soon.