benchmark
Compare Calor vs C# across evaluation metrics.
Bash
calor benchmark [project] [options]
Overview
The benchmark command measures and compares Calor against C# across ten evaluation categories designed to assess AI agent effectiveness:
- Token Economics - Token count and density
- Generation Accuracy - Code correctness
- Comprehension - Understandability
- Edit Precision - Targeted modification accuracy
- Error Detection - Bug identification
- Information Density - Meaning per token
- Task Completion - End-to-end success
- Contract Verification - Z3 static verification (Calor only)
- Effect Soundness - Effect declaration accuracy (Calor only)
- Interop Coverage - BCL effect manifest coverage (Calor only)
Quick Start
Bash
# Compare two files
calor benchmark --calor Calculator.calr --csharp Calculator.cs
# Benchmark entire project
calor benchmark ./src
# Quick token-only comparison
calor benchmark --calor file.calr --csharp file.cs --quick
# Generate markdown report
calor benchmark ./src --format markdown --output report.md
Options
| Option | Short | Default | Description |
|---|---|---|---|
| --calor | | None | Calor file to benchmark |
| --csharp, --cs | | None | C# file to benchmark |
| --category | -c | All | Filter by category |
| --format | -f | console | Output format: console, markdown, json |
| --output | -o | stdout | Save results to file |
| --verbose | -v | false | Show detailed per-metric breakdown |
| --quick | -q | false | Quick token-only benchmark |
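As a rough illustration of how these options combine, the sketch below assembles a command line from a project path, a format, and an optional output file, and prints it rather than running it. `benchmark_cmd` is a hypothetical helper, not part of the calor CLI; it uses only the `--format` and `--output` flags listed in the table and assumes paths without spaces.

```shell
# Hypothetical dry-run helper: prints the `calor benchmark` command it would run.
# $1 = project directory
# $2 = output format (console, markdown, or json); defaults to console
# $3 = optional file to save results to; omitted means stdout
# Assumption: arguments contain no spaces, since the command is built as a string.
benchmark_cmd() {
  cmd="calor benchmark $1 --format ${2:-console}"
  if [ -n "${3:-}" ]; then
    cmd="$cmd --output $3"
  fi
  printf '%s\n' "$cmd"
}

# Example: a JSON report for the whole project.
benchmark_cmd ./src json results.json
```

Printing the command first makes it easy to review what a script will run before wiring it into a build step.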
File-Level Benchmark
Compare a specific Calor file against its C# equivalent:
Bash
calor benchmark --calor PaymentService.calr --csharp PaymentService.cs
Output:
Plain Text
=== Calor vs C# Benchmark ===
┌─────────────────────┬────────┬────────┬───────────┐
│ Category            │ Calor  │ C#     │ Advantage │
├─────────────────────┼────────┼────────┼───────────┤
│ Token Economics     │ 82.4   │ 58.2   │ 1.42x     │
│ Generation Accuracy │ 91.2   │ 76.5   │ 1.19x     │
│ Overall             │ 87.4   │ 68.9   │ 1.27x     │
└─────────────────────┴────────┴────────┴───────────┘
Calor shows 1.27x overall advantage for AI agent tasks.
Quick Benchmark
For a fast token and line count comparison without the full ten-category evaluation:
Bash
calor benchmark --calor file.calr --csharp file.cs --quick
Output:
Plain Text
┌─────────────────┬────────┬────────┬──────────┐
│ Metric          │ Calor  │ C#     │ Savings  │
├─────────────────┼────────┼────────┼──────────┤
│ Tokens          │ 842    │ 1,245  │ 32.4%    │
│ Lines           │ 98     │ 156    │ 37.2%    │
└─────────────────┴────────┴────────┴──────────┘
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Benchmark completed successfully |
| 1 | Benchmark completed but Calor showed no advantage |
| 2 | Error - files not found, invalid arguments, etc. |
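In a script or CI job, these codes can be used to tell a genuine failure apart from a run that finished without showing an advantage. The following is a minimal sketch under that assumption: `describe_exit` is a hypothetical helper (not part of calor), the file names are placeholders, and the CLI is only invoked if `calor` is found on the `PATH`.

```shell
# Hypothetical helper: map a `calor benchmark` exit code to a short status line.
# The meanings mirror the exit code table; any other code is reported as unknown.
describe_exit() {
  case "$1" in
    0) echo "ok: benchmark completed successfully" ;;
    1) echo "warn: benchmark completed but Calor showed no advantage" ;;
    2) echo "error: files not found, invalid arguments, etc." ;;
    *) echo "unknown: unexpected exit code $1" ;;
  esac
}

# Run the quick benchmark only when the CLI is installed.
# file.calr and file.cs are placeholder names.
if command -v calor >/dev/null 2>&1; then
  status=0
  calor benchmark --calor file.calr --csharp file.cs --quick || status=$?
  describe_exit "$status"
fi
```

Capturing the status with `|| status=$?` keeps the script running under `set -e`, so a code of 1 can be treated as a warning while 2 still fails the job if desired.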
See Also
- calor analyze - Score files for migration potential
- Benchmarking Methodology - Detailed methodology
- Benchmark Results - Published results