Policy Experiments Kit
Safely test policy changes in controlled environments before full deployment.
Configuration
Policy Experiments Dashboard
Experiments
0/4
Positive
0
Rolled Back
0
Strict Refund Policy
Baseline: Auto-approve <$50New: Auto-approve <$30
+8%
Escalation Threshold
Baseline: Human at 3 retriesNew: Human at 2 retries
+12%
Timeout Policy
Baseline: 30s timeoutNew: 45s timeout
-3%
Retry Strategy
Baseline: Exponential backoffNew: Linear backoff
+5%
Integration Code
import { createPolicyExperiment } from 'agent-tools-kit/experimentation'
const exp = createPolicyExperiment({
rollout: 'canary',
autoRollback: true,
metrics: ['accuracy', 'latency', 'cost'],
minConfidence: 0.90,
})
await exp.deploy(newPolicy, {
trafficPercent: 5,
comparisonWindow: '48h',
baseline: currentPolicy,
})
// Monitor experiment results
const results = await exp.waitForResults()
console.log('Improvement:', results.improvement)
console.log('Confidence:', results.confidence)
if (results.regression) {
await exp.rollback()
}