benchmark

Compare Calor vs C# across evaluation metrics.

Bash

calor benchmark [project] [options]

Overview

The benchmark command scores Calor and C# on ten evaluation categories designed to assess how effectively AI agents can work with each language. The first seven categories compare both languages; the last three apply to Calor only:

Token Economics - Token count and density
Generation Accuracy - Code correctness
Comprehension - Understandability
Edit Precision - Targeted modification accuracy
Error Detection - Bug identification
Information Density - Meaning per token
Task Completion - End-to-end success
Contract Verification - Z3 static verification (Calor only)
Effect Soundness - Effect declaration accuracy (Calor only)
Interop Coverage - BCL effect manifest coverage (Calor only)

Quick Start

Bash

# Compare two files
calor benchmark --calor Calculator.calr --csharp Calculator.cs

# Benchmark entire project
calor benchmark ./src

# Quick token-only comparison
calor benchmark --calor file.calr --csharp file.cs --quick

# Generate markdown report
calor benchmark ./src --format markdown --output report.md

Options

Option	Short	Default	Description
`--calor`		None	Calor file to benchmark
`--csharp`, `--cs`		None	C# file to benchmark
`--category`	`-c`	All	Filter by category
`--format`	`-f`	`console`	Output format: `console`, `markdown`, `json`
`--output`	`-o`	stdout	Save results to file
`--verbose`	`-v`	`false`	Show detailed per-metric breakdown
`--quick`	`-q`	`false`	Quick token-only benchmark

File-Level Benchmark

Compare a specific Calor file against its C# equivalent:

Bash

calor benchmark --calor PaymentService.calr --csharp PaymentService.cs

Output:

Plain Text

=== Calor vs C# Benchmark ===

┌─────────────────────┬────────┬────────┬───────────┐
│ Category            │ Calor  │ C#     │ Advantage │
├─────────────────────┼────────┼────────┼───────────┤
│ Token Economics     │ 82.4   │ 58.2   │ 1.42x     │
│ Generation Accuracy │ 91.2   │ 76.5   │ 1.19x     │
│ Overall             │ 87.4   │ 68.9   │ 1.27x     │
└─────────────────────┴────────┴────────┴───────────┘

Calor shows 1.27x overall advantage for AI agent tasks.
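The page does not state how the Advantage column is derived. The sample figures are consistent with a simple ratio of the Calor score to the C# score, rounded to two decimals. The sketch below reproduces the table under that assumption; `advantage_ratio` is a hypothetical helper, not part of the `calor` CLI.

```shell
# Hypothetical helper: Calor score divided by C# score, two decimals.
# Assumes the Advantage column is a plain ratio (not confirmed by the docs).
advantage_ratio() {
  awk -v calor="$1" -v csharp="$2" 'BEGIN { printf "%.2fx\n", calor / csharp }'
}

advantage_ratio 82.4 58.2   # Token Economics
advantage_ratio 91.2 76.5   # Generation Accuracy
advantage_ratio 87.4 68.9   # Overall
```

Run against the sample scores, this prints `1.42x`, `1.19x`, and `1.27x`, matching the table.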

Quick Benchmark

For a fast token and line count comparison that skips the full ten-category evaluation:

Bash

calor benchmark --calor file.calr --csharp file.cs --quick

Output:

Plain Text

┌─────────────────┬────────┬────────┬──────────┐
│ Metric          │ Calor  │ C#     │ Savings  │
├─────────────────┼────────┼────────┼──────────┤
│ Tokens          │ 842    │ 1,245  │ 32.4%    │
│ Lines           │ 98     │ 156    │ 37.2%    │
└─────────────────┴────────┴────────┴──────────┘
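The Savings figures in the sample output match the reduction relative to the C# value, that is, (C# − Calor) / C# × 100. The formula is inferred from the numbers shown rather than documented, and `savings_pct` is a hypothetical helper for checking results by hand.

```shell
# Hypothetical helper: percentage reduction of the Calor value relative to C#.
# Pass plain integers (write 1245, not 1,245).
savings_pct() {
  awk -v calor="$1" -v csharp="$2" 'BEGIN { printf "%.1f\n", (csharp - calor) / csharp * 100 }'
}

savings_pct 842 1245   # tokens
savings_pct 98 156     # lines
```

With the sample values this prints `32.4` and `37.2`, in line with the Savings column.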

Exit Codes

Code	Meaning
`0`	Benchmark completed successfully
`1`	Benchmark completed but Calor showed no advantage
`2`	Error - files not found, invalid arguments, etc.
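In a script or CI job, the three exit codes can be told apart with a `case` statement. The following is a minimal sketch; `describe_benchmark_exit` is a hypothetical helper name, and the messages simply restate the table above.

```shell
# Hypothetical helper: map a `calor benchmark` exit code to a short message.
describe_benchmark_exit() {
  case "$1" in
    0) echo "ok: benchmark completed successfully" ;;
    1) echo "warn: benchmark completed but Calor showed no advantage" ;;
    2) echo "error: files not found, invalid arguments, etc." ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

# Typical use (commented out so the snippet runs without calor installed):
# calor benchmark --calor file.calr --csharp file.cs --quick
# describe_benchmark_exit "$?"
```

Because exit code `1` signals a completed run rather than a failure, a pipeline that should only stop on real errors may want to treat `1` as a warning and fail on `2` alone.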
