Policy Experiments Kit

Safely test policy changes in controlled environments before full deployment.

Policy Experiments Dashboard

Experiments: 0/4
Positive: 0
Rolled Back: 0

Strict Refund Policy: +8%
  Baseline: Auto-approve <$50
  New: Auto-approve <$30

Escalation Threshold: +12%
  Baseline: Human at 3 retries
  New: Human at 2 retries

Timeout Policy: -3%
  Baseline: 30s timeout
  New: 45s timeout

Retry Strategy: +5%
  Baseline: Exponential backoff
  New: Linear backoff
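
The four experiments listed above can also be written as plain data, which makes the dashboard counters easy to reason about. The page does not say which metric the percentages measure or how the Positive counter is computed, so the sketch below is only an illustration: the PolicyExperiment type, the deltaPercent field, and the rule "positive means a delta above zero" are all assumptions, not part of the kit.

```typescript
// Hypothetical record type; field names are illustrative, not the kit's API.
interface PolicyExperiment {
  name: string;
  baseline: string;
  candidate: string;
  deltaPercent: number; // the percentage shown next to each experiment
}

// Values copied from the experiment list above.
const experiments: PolicyExperiment[] = [
  { name: 'Strict Refund Policy', baseline: 'Auto-approve <$50', candidate: 'Auto-approve <$30', deltaPercent: 8 },
  { name: 'Escalation Threshold', baseline: 'Human at 3 retries', candidate: 'Human at 2 retries', deltaPercent: 12 },
  { name: 'Timeout Policy', baseline: '30s timeout', candidate: '45s timeout', deltaPercent: -3 },
  { name: 'Retry Strategy', baseline: 'Exponential backoff', candidate: 'Linear backoff', deltaPercent: 5 },
];

// Assumption: an experiment counts as "positive" when its delta is above zero.
function countPositive(list: PolicyExperiment[]): number {
  return list.filter((e) => e.deltaPercent > 0).length;
}

console.log(`Positive: ${countPositive(experiments)}/${experiments.length}`);
```

Under that assumption, three of the four experiments would count as positive once they have all run; Timeout Policy would not.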

Integration Code

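import { createPolicyExperiment } from 'agent-tools-kit/experimentation'

// newPolicy and currentPolicy are assumed to be defined elsewhere in your code.

// Configure a canary experiment with automatic rollback.
const exp = createPolicyExperiment({
  rollout: 'canary',
  autoRollback: true,
  metrics: ['accuracy', 'latency', 'cost'],
  minConfidence: 0.90,
})

// Send 5% of traffic to the new policy and compare it against the baseline for 48 hours.
await exp.deploy(newPolicy, {
  trafficPercent: 5,
  comparisonWindow: '48h',
  baseline: currentPolicy,
})

// Wait for the experiment to finish, then inspect the results.
const results = await exp.waitForResults()
console.log('Improvement:', results.improvement)
console.log('Confidence:', results.confidence)

// Explicit rollback on regression, in addition to the autoRollback option set above.
if (results.regression) {
  await exp.rollback()
}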
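
The snippet above logs the improvement and confidence but only acts on a regression. One way to turn the three result fields into a single decision is sketched below. It is a standalone illustration, not part of agent-tools-kit: the ExperimentResults shape is inferred from the fields the snippet reads, the value ranges in the comments are assumptions, and decide is a hypothetical helper.

```typescript
// Hypothetical shape, inferred from the fields read in the integration snippet.
interface ExperimentResults {
  improvement: number; // assumed: relative change vs. baseline, e.g. 0.08 for +8%
  confidence: number;  // assumed: value between 0 and 1
  regression: boolean;
}

type Decision = 'rollback' | 'promote' | 'keep-running';

// Hypothetical helper: regression wins over everything else; promotion requires
// both a positive improvement and confidence at or above the threshold.
function decide(results: ExperimentResults, minConfidence: number): Decision {
  if (results.regression) return 'rollback';
  if (results.improvement > 0 && results.confidence >= minConfidence) return 'promote';
  return 'keep-running';
}

const example: ExperimentResults = { improvement: 0.08, confidence: 0.95, regression: false };
console.log(decide(example, 0.90));
```

Checking regression first means that a run flagged as a regression is rolled back even when its confidence is high, and the same 0.90 threshold passed as minConfidence in the configuration can be reused for the promotion check.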