benchmark

Compare Calor vs C# across evaluation metrics.

Bash
calor benchmark [project] [options]

Overview

The benchmark command measures and compares Calor against C# across ten evaluation categories designed to assess AI agent effectiveness:

  1. Token Economics - Token count and density
  2. Generation Accuracy - Code correctness
  3. Comprehension - Understandability
  4. Edit Precision - Targeted modification accuracy
  5. Error Detection - Bug identification
  6. Information Density - Meaning per token
  7. Task Completion - End-to-end success
  8. Contract Verification - Z3 static verification (Calor only)
  9. Effect Soundness - Effect declaration accuracy (Calor only)
  10. Interop Coverage - BCL effect manifest coverage (Calor only)

Quick Start

Bash
# Compare two files
calor benchmark --calor Calculator.calr --csharp Calculator.cs

# Benchmark entire project
calor benchmark ./src

# Quick token-only comparison
calor benchmark --calor file.calr --csharp file.cs --quick

# Generate markdown report
calor benchmark ./src --format markdown --output report.md

Options

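┌────────────────┬───────┬─────────┬────────────────────────────────────────┐
│ Option         │ Short │ Default │ Description                            │
├────────────────┼───────┼─────────┼────────────────────────────────────────┤
│ --calor        │       │ None    │ Calor file to benchmark                │
│ --csharp, --cs │       │ None    │ C# file to benchmark                   │
│ --category     │ -c    │ All     │ Filter by category                     │
│ --format       │ -f    │ console │ Output format: console, markdown, json │
│ --output       │ -o    │ stdout  │ Save results to file                   │
│ --verbose      │ -v    │ false   │ Show detailed per-metric breakdown     │
│ --quick        │ -q    │ false   │ Quick token-only benchmark             │
└────────────────┴───────┴─────────┴────────────────────────────────────────┘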

File-Level Benchmark

Compare a specific Calor file against its C# equivalent:

Bash
calor benchmark --calor PaymentService.calr --csharp PaymentService.cs

Output:

Plain Text
=== Calor vs C# Benchmark ===

┌─────────────────────┬────────┬────────┬───────────┐
│ Category            │ Calor  │ C#     │ Advantage │
├─────────────────────┼────────┼────────┼───────────┤
│ Token Economics     │ 82.4   │ 58.2   │ 1.42x     │
│ Generation Accuracy │ 91.2   │ 76.5   │ 1.19x     │
│ Overall             │ 87.4   │ 68.9   │ 1.27x     │
└─────────────────────┴────────┴────────┴───────────┘

Calor shows 1.27x overall advantage for AI agent tasks.
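
The Advantage column in the sample output is consistent with a simple ratio of the Calor score to the C# score. The following is a minimal sketch of that arithmetic, assuming that is how the column is derived; the source does not state the formula, and the `advantage` helper is hypothetical and not part of the `calor` CLI.

```shell
# Hypothetical helper, not part of the calor CLI.
# Assumption: Advantage = Calor score / C# score, rounded to two decimals.
# This matches the sample rows above but is not confirmed by the tool.
advantage() {
  awk -v calor="$1" -v csharp="$2" 'BEGIN { printf "%.2fx\n", calor / csharp }'
}

advantage 82.4 58.2   # Token Economics row
advantage 91.2 76.5   # Generation Accuracy row
advantage 87.4 68.9   # Overall row
```

Under this assumption, a value above 1.00x means Calor scored higher than C# in that category.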

Quick Benchmark

For a fast token and line count comparison that skips the full ten-category evaluation:

Bash
calor benchmark --calor file.calr --csharp file.cs --quick

Output:

Plain Text
┌─────────────────┬────────┬────────┬──────────┐
│ Metric          │ Calor  │ C#     │ Savings  │
├─────────────────┼────────┼────────┼──────────┤
│ Tokens          │ 842    │ 1,245  │ 32.4%    │
│ Lines           │ 98     │ 156    │ 37.2%    │
└─────────────────┴────────┴────────┴──────────┘
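
The Savings column in the sample output agrees with the relative reduction of the Calor count against the C# count. A minimal sketch of that calculation follows, assuming Savings = (1 - Calor / C#) × 100; the formula is inferred from the sample numbers rather than documented, and the `savings` helper is hypothetical.

```shell
# Hypothetical helper, not part of the calor CLI.
# Assumption: Savings = (1 - Calor / C#) * 100, rounded to one decimal.
# Pass counts without thousands separators (1245, not 1,245).
savings() {
  awk -v calor="$1" -v csharp="$2" 'BEGIN { printf "%.1f%%\n", (1 - calor / csharp) * 100 }'
}

savings 842 1245   # Tokens row
savings 98 156     # Lines row
```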

Exit Codes

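┌──────┬───────────────────────────────────────────────────┐
│ Code │ Meaning                                           │
├──────┼───────────────────────────────────────────────────┤
│ 0    │ Benchmark completed successfully                  │
│ 1    │ Benchmark completed but Calor showed no advantage │
│ 2    │ Error - files not found, invalid arguments, etc.  │
└──────┴───────────────────────────────────────────────────┘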
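
In a script or CI job, the exit code can be used to decide how to proceed. The sketch below maps each documented code to a message; the function name `interpret_benchmark_exit` is hypothetical, and treating code 1 as a warning rather than a failure is a policy assumption, not something the tool prescribes.

```shell
# Hypothetical helper: translate a `calor benchmark` exit code into a
# pass / warn / fail label, using the meanings from the table above.
interpret_benchmark_exit() {
  case "$1" in
    0) echo "pass: benchmark completed successfully" ;;
    1) echo "warn: benchmark completed but Calor showed no advantage" ;;
    2) echo "fail: error (files not found, invalid arguments, etc.)" ;;
    *) echo "fail: unexpected exit code $1" ;;
  esac
}

# Typical use, assuming `calor` is on PATH (left commented out so the
# snippet can be sourced without running a benchmark):
#   calor benchmark ./src --format markdown --output report.md
#   interpret_benchmark_exit "$?"
```

Capturing `$?` immediately after the benchmark command matters, because any intervening command would overwrite it.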

See Also