Results
Live Benchmark Results
Evaluated across 207 programs with 8 metrics
Agent Refactoring Benchmark
Measures Claude Code agent success rates on real refactoring tasks (rename, extract, inline, move, add contracts, change signature).
Metric Breakdown
- Comprehension: explicit structure aids understanding
- Error detection: contracts surface invariant violations
- Refactoring: more stable during refactoring
- Edit precision: unique IDs enable targeted changes
- Correctness: contracts help prevent edge case bugs
- Information density: more semantic content per token
- Generation accuracy: better code generation from prompts
- Tokens: Calor's explicit syntax uses more tokens
Current Status
Calor leads in 7 of 8 metrics, showing an advantage wherever explicitness matters. The exception is token usage, where Calor's explicit syntax costs more tokens.
Per-Program Results
| Program | Lvl | Adv | Tokens | Gen Acc | Comp | Edit | Err Det | Info Den | Refactor | Correct |
|---|---|---|---|---|---|---|---|---|---|---|
| Abs | 1 | 1.31 | 0.43 | 1.00 | 1.72 | 1.34 | 2.00 | 0.97 | 1.44 | 1.60 | |
| AbsoluteContracts | 2 | 1.68 | 0.39 | 1.14 | 4.04 | 1.42 | 2.43 | 0.90 | 1.52 | 1.58 | |
| AbstractClass | 2 | 1.13 | 0.83 | 1.00 | 1.54 | 1.36 | 1.14 | 0.71 | 1.49 | 1.00 | |
| Adapter | 2 | 1.48 | 0.91 | 1.00 | 1.87 | 1.70 | 2.17 | 1.11 | 1.69 | 1.40 | |
| AreaOfCircle | 1 | 1.46 | 0.72 | 1.00 | 2.38 | 1.34 | 2.43 | 1.06 | 1.44 | 1.33 | |
| ArrayContracts | 3 | 1.51 | 1.03 | 1.03 | 1.72 | 1.38 | 2.43 | 1.15 | 1.55 | 1.80 | |
| ArraySlice | 2 | 1.57 | 0.63 | 1.00 | 3.20 | 1.46 | 2.43 | 0.94 | 1.44 | 1.50 | |
| ArraySum | 2 | 1.23 | 0.57 | 1.00 | 1.98 | 1.25 | 1.33 | 1.43 | 1.46 | 0.83 | |
| AsyncChain | 3 | 1.19 | 0.83 | 1.00 | 1.44 | 1.46 | 1.33 | 0.98 | 1.49 | 1.00 | |
| AsyncEffect | 3 | 1.29 | 0.94 | 1.00 | 1.55 | 1.04 | 1.67 | 1.49 | 1.62 | 1.00 | |
| AsyncErrorHandling | 3 | 1.43 | 0.81 | 1.00 | 2.10 | 1.51 | 1.67 | 1.86 | 1.62 | 0.91 | |
| AsyncLoop | 3 | 1.43 | 0.93 | 1.00 | 2.49 | 1.46 | 1.33 | 1.79 | 1.46 | 1.00 | |
| AsyncPure | 3 | 1.06 | 0.79 | 1.00 | 1.01 | 1.00 | 1.33 | 0.92 | 1.46 | 1.00 | |
| AsyncReturn | 3 | 1.17 | 0.70 | 1.00 | 1.58 | 1.34 | 1.33 | 0.99 | 1.44 | 1.00 | |
| AutoProps | 2 | 1.27 | 0.53 | 1.00 | 2.17 | 1.36 | 1.33 | 1.25 | 1.49 | 1.00 | |
| Average | 2 | 1.42 | 0.66 | 1.00 | 2.32 | 1.34 | 1.86 | 1.73 | 1.44 | 1.00 | |
| BankAccount | 3 | 1.69 | 0.72 | 1.00 | 3.31 | 1.68 | 2.43 | 1.37 | 1.52 | 1.50 | |
| BasicTryCatch | 2 | 1.14 | 0.55 | 1.00 | 1.41 | 1.34 | 1.33 | 0.88 | 1.49 | 1.08 | |
| BclCoverage | 2 | 1.35 | 1.25 | 1.14 | 1.78 | 1.46 | 1.43 | 1.12 | 1.62 | 1.00 | |
| BinarySearch | 3 | 1.17 | 0.96 | 1.00 | 1.49 | 1.38 | 1.33 | 0.78 | 1.55 | 0.86 | |
| BinaryTree | 4 | 1.35 | 1.90 | 1.00 | 1.39 | 1.40 | 1.14 | 1.00 | 2.04 | 0.92 | |
| BitContracts | 3 | 1.51 | 0.51 | 1.14 | 2.97 | 1.42 | 2.17 | 0.72 | 1.52 | 1.60 | |
| BitSet | 3 | 1.62 | 0.36 | 1.00 | 3.19 | 1.34 | 2.83 | 0.74 | 1.56 | 1.90 | |
| BMICalculator | 2 | 1.51 | 0.50 | 1.00 | 2.41 | 1.34 | 2.83 | 0.66 | 1.44 | 1.90 | |
| BreadthFirstSearch | 3 | 1.36 | 1.47 | 1.00 | 1.85 | 1.38 | 1.33 | 1.43 | 1.55 | 0.86 | |
| BubbleSort | 2 | 1.53 | 0.81 | 1.00 | 2.94 | 1.29 | 2.17 | 1.30 | 1.52 | 1.17 | |
| BuggyContracts | 2 | 1.38 | 0.66 | 1.14 | 1.76 | 1.42 | 2.00 | 1.23 | 1.61 | 1.25 | |
| Builder | 3 | 1.26 | 1.12 | 1.00 | 1.38 | 1.40 | 1.14 | 1.22 | 1.65 | 1.20 | |
| Calculator | 2 | 1.13 | 0.48 | 1.00 | 1.73 | 1.38 | 1.33 | 0.61 | 1.52 | 1.00 | |
| Calendar | 3 | 1.54 | 0.36 | 1.00 | 2.65 | 1.34 | 2.83 | 0.73 | 1.49 | 1.90 | |
| CancellableTask | 3 | 1.29 | 0.79 | 1.00 | 2.07 | 1.46 | 1.33 | 1.22 | 1.46 | 1.00 | |
| Capitalize | 2 | 1.38 | 0.47 | 1.00 | 2.18 | 1.34 | 2.17 | 0.81 | 1.44 | 1.60 | |
| CelsiusToKelvin | 1 | 1.47 | 0.71 | 1.00 | 2.38 | 1.34 | 2.43 | 1.09 | 1.44 | 1.33 | |
| ChainOfResponsibility | 3 | 1.28 | 0.95 | 1.00 | 1.57 | 1.40 | 1.86 | 0.78 | 1.58 | 1.08 | |
| CircularBuffer | 3 | 1.57 | 0.90 | 1.00 | 2.72 | 1.36 | 2.43 | 1.38 | 1.49 | 1.29 | |
| Clamp | 2 | 1.39 | 1.12 | 1.00 | 1.42 | 1.38 | 1.89 | 0.99 | 1.38 | 1.90 | |
| CollectionLib | 2 | 1.49 | 0.69 | 1.10 | 2.81 | 1.56 | 2.00 | 0.97 | 1.55 | 1.25 | |
| Command | 3 | 1.34 | 1.12 | 1.00 | 2.06 | 1.71 | 1.33 | 1.13 | 1.55 | 0.83 | |
| CompactClass | 2 | 1.49 | 0.43 | 1.00 | 2.82 | 1.34 | 2.33 | 0.89 | 1.53 | 1.60 | |
| ComposedEffects | 3 | 1.44 | 0.91 | 1.14 | 2.33 | 1.42 | 1.67 | 1.64 | 1.72 | 0.71 | |
| Composite | 3 | 1.69 | 1.23 | 1.00 | 2.06 | 1.71 | 2.83 | 1.22 | 1.65 | 1.80 | |
| Composition | 3 | 1.48 | 1.06 | 1.00 | 1.85 | 1.40 | 2.17 | 1.23 | 1.72 | 1.40 | |
| CompoundInterest | 2 | 1.48 | 0.67 | 1.00 | 2.51 | 1.34 | 2.43 | 0.94 | 1.46 | 1.50 | |
| ConstructorInit | 2 | 1.59 | 0.69 | 1.00 | 3.17 | 1.36 | 2.43 | 1.27 | 1.49 | 1.33 | |
| Contains | 2 | 1.20 | 0.47 | 1.00 | 1.88 | 1.34 | 1.33 | 0.87 | 1.46 | 1.20 | |
| ContractedDivide | 3 | 1.26 | 0.93 | 1.00 | 0.99 | 1.34 | 1.89 | 0.82 | 1.31 | 1.80 | |
| CorrectEffects | 2 | 1.36 | 0.76 | 1.14 | 2.13 | 1.42 | 1.67 | 1.04 | 1.75 | 1.00 | |
| CountOccurrences | 2 | 1.41 | 0.44 | 1.00 | 2.25 | 1.34 | 2.00 | 1.15 | 1.46 | 1.60 | |
| CountVowels | 2 | 1.12 | 0.61 | 1.00 | 1.50 | 1.25 | 1.33 | 0.75 | 1.49 | 1.00 | |
| CsvParser | 2 | 1.60 | 0.69 | 1.00 | 2.48 | 1.34 | 2.00 | 2.86 | 1.46 | 1.00 | |
| CurrencyConverter | 2 | 1.55 | 0.34 | 1.00 | 3.13 | 1.34 | 2.83 | 0.77 | 1.43 | 1.60 | |
| CustomException | 2 | 1.38 | 0.76 | 1.00 | 1.79 | 1.38 | 1.86 | 1.05 | 1.58 | 1.60 | |
| DatabaseEffect | 3 | 1.44 | 0.70 | 1.14 | 2.67 | 1.42 | 1.67 | 1.27 | 1.65 | 1.00 | |
| DateDiff | 2 | 1.66 | 0.46 | 1.00 | 3.63 | 1.34 | 2.83 | 0.71 | 1.43 | 1.90 | |
| Decorator | 3 | 1.64 | 0.82 | 1.00 | 2.40 | 1.75 | 2.83 | 0.94 | 1.58 | 1.80 | |
| DelayedResult | 3 | 1.44 | 0.62 | 1.00 | 2.09 | 1.34 | 2.50 | 1.05 | 1.53 | 1.40 | |
| DepthFirstSearch | 3 | 1.37 | 1.30 | 1.00 | 2.02 | 1.34 | 1.33 | 1.26 | 1.49 | 1.20 | |
| Deque | 3 | 1.61 | 1.06 | 1.00 | 2.76 | 1.36 | 2.43 | 1.33 | 1.46 | 1.50 | |
| DictOps | 2 | 1.55 | 0.88 | 1.00 | 2.94 | 1.46 | 2.00 | 1.38 | 1.41 | 1.33 | |
| DigitCount | 2 | 1.62 | 0.24 | 1.00 | 3.10 | 1.34 | 2.83 | 0.98 | 1.56 | 1.90 | |
| Dijkstra | 4 | 1.80 | 1.39 | 1.00 | 2.96 | 1.37 | 2.83 | 1.61 | 1.46 | 1.80 | |
| DisjointSet | 3 | 1.32 | 1.30 | 1.00 | 1.62 | 1.55 | 1.33 | 1.04 | 1.52 | 1.20 | |
| DistanceCalculator | 3 | 1.25 | 0.37 | 1.00 | 1.78 | 1.34 | 2.00 | 0.55 | 1.49 | 1.50 | |
| DivisionContracts | 3 | 1.58 | 0.65 | 1.14 | 2.78 | 1.42 | 2.43 | 0.82 | 1.61 | 1.80 | |
| EditDistance | 3 | 1.29 | 1.31 | 1.00 | 1.80 | 1.25 | 1.33 | 1.16 | 1.49 | 1.00 | |
| EmailValidator | 2 | 1.18 | 0.54 | 1.00 | 2.05 | 1.34 | 1.33 | 0.71 | 1.44 | 1.00 | |
| Encapsulation | 3 | 1.20 | 1.22 | 1.00 | 1.31 | 1.40 | 1.33 | 0.82 | 1.52 | 1.00 | |
| EnumMatch | 2 | 1.36 | 0.62 | 1.00 | 2.07 | 1.34 | 1.33 | 1.88 | 1.44 | 1.20 | |
| EnumType | 3 | 1.34 | 0.66 | 1.00 | 2.37 | 1.34 | 1.33 | 1.18 | 1.46 | 1.40 | |
| EnumWithMethods | 3 | 1.29 | 0.48 | 1.00 | 2.37 | 1.34 | 1.33 | 0.94 | 1.44 | 1.40 | |
| ErrorPropagation | 2 | 1.31 | 0.56 | 1.00 | 2.21 | 1.34 | 1.86 | 0.92 | 1.44 | 1.17 | |
| ExceptionChain | 2 | 1.44 | 0.77 | 1.00 | 2.11 | 1.34 | 1.71 | 1.72 | 1.49 | 1.36 | |
| ExhaustiveMatch | 2 | 1.37 | 0.68 | 1.00 | 2.41 | 1.34 | 1.14 | 1.75 | 1.46 | 1.17 | |
| ExpressionBodied | 1 | 1.26 | 0.64 | 1.00 | 2.31 | 1.34 | 1.33 | 1.02 | 1.41 | 1.00 | |
| Factorial | 2 | 1.13 | 0.50 | 1.00 | 1.31 | 1.34 | 1.33 | 0.82 | 1.51 | 1.20 | |
| Factory | 3 | 1.57 | 1.00 | 1.00 | 2.38 | 1.38 | 2.43 | 0.97 | 2.10 | 1.33 | |
| Fibonacci | 2 | 1.14 | 0.38 | 1.00 | 1.41 | 1.34 | 1.33 | 0.77 | 1.51 | 1.40 | |
| FileEffects | 3 | 1.47 | 0.92 | 1.10 | 2.72 | 1.42 | 1.67 | 1.30 | 1.65 | 1.00 | |
| Filter | 2 | 1.35 | 0.94 | 1.00 | 2.69 | 1.46 | 1.33 | 0.73 | 1.41 | 1.20 | |
| FizzBuzz | 2 | 1.16 | 0.96 | 1.00 | 1.06 | 1.28 | 1.50 | 0.36 | 1.72 | 1.40 | |
| Flyweight | 3 | 1.62 | 0.88 | 1.00 | 2.25 | 1.40 | 2.83 | 1.52 | 1.61 | 1.50 | |
| GCD | 2 | 1.10 | 0.49 | 1.00 | 1.32 | 1.34 | 1.33 | 0.78 | 1.51 | 1.00 | |
| GenericClass | 2 | 1.44 | 1.15 | 1.00 | 1.98 | 1.36 | 1.71 | 1.38 | 1.46 | 1.50 | |
| GenericConstraints | 3 | 1.40 | 0.79 | 1.00 | 2.82 | 1.46 | 1.33 | 0.96 | 1.41 | 1.40 | |
| GenericFunction | 2 | 1.36 | 0.67 | 1.00 | 2.90 | 1.46 | 1.33 | 1.10 | 1.41 | 1.00 | |
| GradeCalculator | 2 | 1.34 | 0.47 | 1.00 | 1.98 | 1.34 | 2.17 | 0.62 | 1.44 | 1.70 | |
| Graph | 3 | 1.71 | 0.90 | 1.00 | 3.15 | 1.64 | 2.83 | 1.18 | 1.46 | 1.50 | |
| GroupBy | 2 | 1.56 | 1.23 | 1.00 | 2.86 | 1.46 | 2.17 | 0.96 | 1.41 | 1.40 | |
| GuardClause | 2 | 1.38 | 0.71 | 1.00 | 2.50 | 1.34 | 1.86 | 0.82 | 1.49 | 1.33 | |
| GuardMatch | 2 | 1.35 | 0.80 | 1.00 | 1.97 | 1.34 | 1.33 | 1.53 | 1.44 | 1.40 | |
| HashMap | 3 | 1.67 | 1.34 | 1.00 | 2.28 | 1.30 | 2.83 | 1.51 | 1.57 | 1.50 | |
| HelloWorld | 1 | 1.37 | 0.72 | 1.00 | 2.61 | 1.33 | 1.50 | 1.28 | 1.53 | 1.00 | |
| HiddenNetworkEffect | 3 | 1.56 | 1.29 | 1.14 | 2.35 | 1.46 | 1.43 | 2.08 | 1.75 | 1.00 | |
| Hypotenuse | 2 | 1.49 | 0.44 | 1.00 | 3.04 | 1.34 | 2.43 | 0.92 | 1.44 | 1.33 | |
| Inheritance | 3 | 1.33 | 0.69 | 1.00 | 1.74 | 1.51 | 1.33 | 1.49 | 1.52 | 1.40 | |
| InlineSigs | 2 | 1.22 | 0.46 | 1.00 | 2.30 | 1.34 | 1.33 | 0.71 | 1.41 | 1.20 | |
| InsertionSort | 2 | 1.33 | 0.83 | 1.00 | 2.61 | 1.25 | 1.33 | 1.28 | 1.49 | 0.83 | |
| InterfaceImpl | 3 | 1.56 | 1.06 | 1.00 | 2.16 | 1.71 | 2.17 | 1.21 | 1.76 | 1.40 | |
| InterpolationSearch | 3 | 1.37 | 0.89 | 1.00 | 1.92 | 1.34 | 2.17 | 0.98 | 1.51 | 1.14 | |
| Inventory | 3 | 1.61 | 0.87 | 1.00 | 3.21 | 1.36 | 2.43 | 1.08 | 1.46 | 1.50 | |
| IsAlpha | 1 | 1.21 | 0.50 | 1.00 | 2.38 | 1.34 | 1.33 | 0.53 | 1.41 | 1.20 | |
| IsEven | 1 | 1.13 | 0.60 | 1.00 | 1.45 | 1.34 | 1.33 | 0.68 | 1.44 | 1.20 | |
| IsOdd | 1 | 1.10 | 0.60 | 1.00 | 1.45 | 1.34 | 1.33 | 0.68 | 1.44 | 1.00 | |
| IsPrime | 3 | 1.19 | 0.65 | 1.00 | 1.18 | 1.38 | 1.44 | 0.70 | 1.44 | 1.70 | |
| Iterator | 2 | 1.47 | 0.70 | 1.00 | 2.25 | 1.37 | 2.17 | 1.43 | 1.46 | 1.40 | |
| Knapsack | 4 | 1.68 | 0.90 | 1.00 | 3.15 | 1.25 | 2.83 | 1.24 | 1.49 | 1.58 | |
| LCS | 3 | 1.31 | 1.02 | 1.00 | 2.03 | 1.25 | 1.33 | 1.21 | 1.44 | 1.17 | |
| LeapYear | 2 | 1.15 | 0.71 | 1.00 | 1.36 | 1.38 | 1.33 | 0.49 | 1.55 | 1.40 | |
| LinearSearch | 2 | 1.18 | 0.52 | 1.00 | 2.01 | 1.25 | 1.33 | 0.97 | 1.46 | 0.86 | |
| LinkedList | 4 | 1.43 | 2.14 | 1.00 | 1.25 | 1.40 | 1.14 | 1.23 | 1.84 | 1.40 | |
| LinqPipeline | 2 | 1.27 | 0.83 | 1.00 | 2.13 | 1.34 | 1.33 | 0.99 | 1.41 | 1.17 | |
| ListInvariant | 3 | 1.71 | 0.35 | 1.14 | 4.08 | 1.42 | 2.43 | 0.91 | 1.52 | 1.80 | |
| ListOps | 2 | 1.44 | 0.58 | 1.00 | 2.61 | 1.34 | 2.00 | 1.22 | 1.41 | 1.33 | |
| Map | 2 | 1.49 | 1.04 | 1.00 | 3.69 | 1.46 | 1.33 | 1.01 | 1.41 | 1.00 | |
| MathLib | 2 | 1.49 | 0.48 | 1.10 | 3.14 | 1.42 | 2.00 | 0.83 | 1.52 | 1.40 | |
| MathOperations | 3 | 1.25 | 1.14 | 1.00 | 1.02 | 1.38 | 1.33 | 1.00 | 1.52 | 1.60 | |
| MatrixMultiply | 3 | 1.49 | 0.99 | 1.00 | 2.45 | 1.37 | 1.86 | 1.49 | 1.44 | 1.33 | |
| MaxHeap | 3 | 1.53 | 1.59 | 1.00 | 2.14 | 1.40 | 1.86 | 1.50 | 1.58 | 1.17 | |
| MaxTwo | 1 | 1.27 | 0.39 | 1.00 | 1.79 | 1.34 | 2.00 | 0.81 | 1.44 | 1.40 | |
| MaxValue | 2 | 1.12 | 0.53 | 1.00 | 1.63 | 1.25 | 0.89 | 1.31 | 1.36 | 1.00 | |
| Mediator | 3 | 1.31 | 1.45 | 1.00 | 1.78 | 1.40 | 1.14 | 1.36 | 1.52 | 0.83 | |
| MergeSort | 3 | 1.53 | 1.90 | 1.00 | 1.82 | 1.38 | 2.17 | 1.15 | 1.52 | 1.33 | |
| MethodOverloading | 3 | 1.17 | 0.47 | 1.00 | 2.18 | 1.34 | 1.33 | 0.66 | 1.41 | 1.00 | |
| MethodOverriding | 3 | 1.32 | 0.81 | 1.00 | 1.67 | 1.45 | 1.17 | 1.98 | 1.46 | 1.00 | |
| MinStack | 3 | 1.33 | 1.25 | 1.00 | 1.76 | 1.64 | 1.14 | 1.21 | 1.46 | 1.17 | |
| MinTwo | 1 | 1.27 | 0.39 | 1.00 | 1.79 | 1.34 | 2.00 | 0.81 | 1.44 | 1.40 | |
| MissingEffects | 2 | 1.42 | 0.89 | 1.14 | 2.38 | 1.42 | 1.67 | 1.18 | 1.72 | 1.00 | |
| MixedContracts | 3 | 1.52 | 0.85 | 1.14 | 1.97 | 1.42 | 2.43 | 1.20 | 1.65 | 1.50 | |
| MixedSyntax | 2 | 1.44 | 0.63 | 1.00 | 2.45 | 1.34 | 2.43 | 0.88 | 1.46 | 1.33 | |
| ModuloContracts | 3 | 1.76 | 0.29 | 1.14 | 4.19 | 1.42 | 2.83 | 0.74 | 1.54 | 1.90 | |
| MultipleCatch | 2 | 1.19 | 0.58 | 1.00 | 1.62 | 1.38 | 1.33 | 0.96 | 1.52 | 1.09 | |
| NestedMatch | 2 | 1.35 | 0.91 | 1.00 | 1.97 | 1.34 | 1.33 | 1.65 | 1.44 | 1.20 | |
| NetworkEffect | 3 | 1.45 | 0.70 | 1.14 | 2.64 | 1.42 | 1.67 | 1.37 | 1.68 | 1.00 | |
| NullCheck | 2 | 1.15 | 0.66 | 1.00 | 2.01 | 1.34 | 1.14 | 0.79 | 1.41 | 0.80 | |
| Observer | 3 | 1.30 | 1.10 | 1.00 | 1.79 | 1.40 | 1.33 | 1.07 | 1.55 | 1.20 | |
| OptionalIds | 2 | 1.17 | 0.47 | 1.00 | 2.15 | 1.34 | 1.33 | 0.69 | 1.41 | 1.00 | |
| OptionType | 2 | 1.58 | 0.69 | 1.10 | 3.05 | 1.44 | 1.86 | 1.41 | 1.58 | 1.50 | |
| OverflowSafe | 3 | 1.56 | 1.13 | 1.03 | 1.76 | 1.38 | 2.43 | 1.40 | 1.55 | 1.80 | |
| OverflowUnsafe | 3 | 1.69 | 0.98 | 1.03 | 1.96 | 1.34 | 2.83 | 2.20 | 1.38 | 1.80 | |
| Palindrome | 2 | 1.21 | 0.54 | 1.00 | 1.83 | 1.34 | 1.33 | 1.14 | 1.49 | 1.00 | |
| ParallelTasks | 3 | 1.38 | 0.67 | 1.00 | 2.83 | 1.46 | 1.33 | 1.29 | 1.44 | 1.00 | |
| PasswordValidator | 2 | 1.20 | 0.67 | 1.00 | 2.00 | 1.34 | 1.33 | 0.81 | 1.46 | 1.00 | |
| PhoneBook | 2 | 1.59 | 0.75 | 1.00 | 2.83 | 1.36 | 2.83 | 1.04 | 1.41 | 1.50 | |
| Polymorphism | 3 | 1.31 | 1.28 | 1.00 | 1.31 | 1.71 | 1.33 | 0.88 | 1.58 | 1.40 | |
| Power | 2 | 1.21 | 0.82 | 1.00 | 1.13 | 1.38 | 1.44 | 0.94 | 1.41 | 1.60 | |
| PriorityQueue | 3 | 1.53 | 1.73 | 1.00 | 2.12 | 1.40 | 1.86 | 1.19 | 1.65 | 1.33 | |
| Properties | 2 | 1.44 | 0.72 | 1.10 | 2.22 | 1.51 | 1.86 | 1.22 | 1.55 | 1.33 | |
| PropertyAccess | 1 | 1.99 | 0.78 | 1.10 | 2.09 | 1.51 | 2.17 | 5.09 | 1.58 | 1.60 | |
| PropertyMatch | 2 | 1.37 | 0.80 | 1.00 | 1.89 | 1.68 | 1.33 | 1.54 | 1.52 | 1.20 | |
| ProvableContracts | 2 | 1.55 | 0.59 | 1.14 | 2.52 | 1.42 | 2.43 | 1.05 | 1.65 | 1.58 | |
| Proxy | 3 | 1.33 | 0.83 | 1.00 | 1.74 | 1.71 | 1.43 | 1.05 | 1.68 | 1.20 | |
| PureComputation | 2 | 1.46 | 0.50 | 1.14 | 2.73 | 1.42 | 2.00 | 0.88 | 1.52 | 1.50 | |
| PureFunctions | 1 | 1.44 | 0.46 | 1.14 | 1.98 | 1.46 | 2.17 | 0.83 | 1.75 | 1.70 | |
| Queue | 3 | 1.26 | 1.09 | 1.00 | 1.52 | 1.40 | 1.14 | 1.25 | 1.55 | 1.17 | |
| QuickSort | 4 | 1.51 | 2.16 | 1.00 | 1.85 | 1.29 | 1.86 | 1.07 | 1.72 | 1.14 | |
| RangeContracts | 3 | 1.59 | 0.47 | 1.14 | 2.85 | 1.42 | 2.83 | 0.79 | 1.61 | 1.58 | |
| RangeMatch | 2 | 1.56 | 0.60 | 1.00 | 2.32 | 1.34 | 2.17 | 1.95 | 1.44 | 1.70 | |
| RecordType | 3 | 1.56 | 0.28 | 1.10 | 3.61 | 1.74 | 2.00 | 0.87 | 1.57 | 1.30 | |
| Reduce | 2 | 1.32 | 0.80 | 1.00 | 2.58 | 1.34 | 1.33 | 0.90 | 1.41 | 1.20 | |
| ResultType | 2 | 1.28 | 1.10 | 1.00 | 1.85 | 1.40 | 1.33 | 1.15 | 1.52 | 0.92 | |
| ReverseArray | 2 | 1.64 | 0.51 | 1.00 | 3.50 | 1.34 | 2.17 | 1.81 | 1.46 | 1.33 | |
| ReverseString | 2 | 1.23 | 0.75 | 1.00 | 1.79 | 1.34 | 1.33 | 1.31 | 1.46 | 0.83 | |
| ScoreBoard | 2 | 1.65 | 0.81 | 1.00 | 2.92 | 1.36 | 2.83 | 1.10 | 1.41 | 1.80 | |
| SealedClass | 2 | 1.28 | 1.30 | 1.00 | 1.70 | 1.36 | 1.33 | 1.03 | 1.49 | 1.00 | |
| SearchContracts | 3 | 1.67 | 0.39 | 1.14 | 3.39 | 1.42 | 2.83 | 0.83 | 1.55 | 1.80 | |
| SelectionSort | 2 | 1.52 | 0.88 | 1.00 | 2.73 | 1.25 | 2.17 | 1.49 | 1.49 | 1.17 | |
| SetOps | 2 | 1.60 | 1.18 | 1.00 | 2.77 | 1.46 | 2.00 | 1.38 | 1.49 | 1.50 | |
| ShoppingCart | 4 | 1.64 | 1.77 | 1.00 | 2.25 | 1.30 | 2.43 | 1.09 | 2.13 | 1.19 | |
| Sign | 1 | 1.09 | 0.56 | 1.00 | 1.38 | 1.34 | 1.33 | 0.68 | 1.44 | 1.00 | |
| SimpleAsync | 3 | 1.34 | 0.78 | 1.00 | 1.96 | 1.33 | 1.50 | 1.59 | 1.53 | 1.00 | |
| SimpleClass | 2 | 1.68 | 0.74 | 1.00 | 2.80 | 1.68 | 2.83 | 1.03 | 1.58 | 1.80 | |
| SimpleMatch | 2 | 1.42 | 0.60 | 1.00 | 2.15 | 1.34 | 1.33 | 2.09 | 1.44 | 1.40 | |
| Singleton | 2 | 1.18 | 0.72 | 1.00 | 1.68 | 1.40 | 1.33 | 1.01 | 1.52 | 0.77 | |
| Sort | 2 | 1.24 | 0.98 | 1.00 | 2.10 | 1.25 | 1.33 | 0.99 | 1.44 | 0.83 | |
| SortedList | 2 | 1.43 | 0.86 | 1.00 | 2.07 | 1.63 | 2.00 | 1.18 | 1.44 | 1.25 | |
| SortingContracts | 3 | 1.69 | 0.43 | 1.14 | 3.69 | 1.42 | 2.83 | 0.66 | 1.52 | 1.80 | |
| Stack | 3 | 1.28 | 1.18 | 1.00 | 1.44 | 1.40 | 1.14 | 1.14 | 1.55 | 1.40 | |
| State | 3 | 1.55 | 0.59 | 1.00 | 2.60 | 1.40 | 1.33 | 2.54 | 1.52 | 1.40 | |
| StateEffect | 3 | 1.47 | 0.65 | 1.14 | 2.30 | 1.44 | 1.67 | 1.88 | 1.65 | 1.00 | |
| StaticMembers | 3 | 1.30 | 0.74 | 1.10 | 1.69 | 1.46 | 1.33 | 1.59 | 1.46 | 1.00 | |
| Strategy | 3 | 1.27 | 1.14 | 1.00 | 1.74 | 1.53 | 1.33 | 0.91 | 1.55 | 1.00 | |
| StringContracts | 3 | 1.43 | 0.89 | 1.03 | 1.76 | 1.34 | 2.43 | 1.22 | 1.46 | 1.29 | |
| StringLib | 2 | 1.43 | 0.46 | 1.10 | 2.87 | 1.42 | 2.00 | 0.78 | 1.52 | 1.25 | |
| StringUtils | 2 | 1.11 | 0.52 | 1.00 | 1.69 | 1.38 | 1.33 | 0.65 | 1.52 | 0.83 | |
| SumDigits | 2 | 1.65 | 0.19 | 1.00 | 3.37 | 1.34 | 2.83 | 1.02 | 1.56 | 1.90 | |
| SumRange | 2 | 1.28 | 0.88 | 1.00 | 1.51 | 1.25 | 1.86 | 1.10 | 1.49 | 1.17 | |
| SwitchExpression | 2 | 1.64 | 0.72 | 1.00 | 2.26 | 1.34 | 2.17 | 2.45 | 1.49 | 1.70 | |
| TaxCalculator | 3 | 1.48 | 0.38 | 1.00 | 2.63 | 1.34 | 2.83 | 0.66 | 1.44 | 1.58 | |
| TemperatureConverter | 1 | 1.13 | 0.54 | 1.00 | 1.59 | 1.34 | 1.33 | 0.73 | 1.49 | 1.00 | |
| TemplateMethod | 3 | 1.03 | 0.83 | 0.76 | 0.54 | 0.91 | 1.14 | 1.02 | 1.92 | 1.10 | |
| TernaryChain | 2 | 1.27 | 0.62 | 1.00 | 1.94 | 1.34 | 1.33 | 1.15 | 1.41 | 1.40 | |
| Timer | 2 | 1.62 | 0.66 | 1.00 | 3.03 | 1.40 | 2.43 | 1.37 | 1.55 | 1.50 | |
| TimerEffect | 3 | 1.71 | 0.42 | 1.10 | 3.71 | 1.42 | 2.83 | 0.78 | 1.54 | 1.90 | |
| TodoList | 2 | 1.61 | 0.92 | 1.00 | 2.51 | 1.63 | 2.43 | 1.44 | 1.46 | 1.50 | |
| TowerOfHanoi | 3 | 1.56 | 0.81 | 1.00 | 2.65 | 1.38 | 2.43 | 0.97 | 1.65 | 1.58 | |
| Trie | 4 | 1.62 | 1.36 | 1.00 | 2.09 | 1.40 | 2.83 | 1.30 | 1.61 | 1.39 | |
| TruncateString | 2 | 1.31 | 0.53 | 1.00 | 2.17 | 1.34 | 1.86 | 1.05 | 1.44 | 1.07 | |
| TryFinally | 2 | 1.22 | 0.56 | 1.00 | 1.82 | 1.34 | 1.33 | 1.20 | 1.49 | 1.00 | |
| TupleMatch | 2 | 1.31 | 0.61 | 1.00 | 2.02 | 1.34 | 1.33 | 1.31 | 1.44 | 1.40 | |
| TypeAlias | 3 | 1.53 | 0.75 | 1.10 | 3.08 | 1.44 | 1.63 | 1.63 | 1.44 | 1.17 | |
| TypeMatch | 2 | 1.44 | 0.78 | 1.00 | 1.75 | 1.38 | 1.33 | 2.39 | 1.52 | 1.40 | |
| UnitConverter | 2 | 1.42 | 0.38 | 1.00 | 2.94 | 1.34 | 2.17 | 0.75 | 1.41 | 1.40 | |
| UrlParser | 2 | 1.25 | 0.64 | 1.00 | 1.97 | 1.34 | 1.33 | 0.91 | 1.41 | 1.40 | |
| Visitor | 3 | 1.08 | 0.94 | 0.76 | 0.61 | 0.94 | 1.33 | 0.81 | 2.23 | 1.00 | |
| VotingSystem | 2 | 1.47 | 0.64 | 1.00 | 2.50 | 1.36 | 2.17 | 1.09 | 1.44 | 1.60 | |
| WildcardMatch | 2 | 1.27 | 0.46 | 1.00 | 2.09 | 1.34 | 1.33 | 1.11 | 1.44 | 1.40 | |
| Zip | 2 | 1.39 | 0.94 | 1.00 | 3.14 | 1.34 | 1.33 | 0.96 | 1.41 | 1.00 |
Showing 207 of 207 programs. Values above 1.0 favor Calor (highlighted in pink); values below 1.0 favor C# (highlighted in teal).
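The Adv value in each row appears to be the unweighted mean of that row's eight metric ratios. This is inferred from the table values (it reproduces the Abs, AbstractClass, and Adapter rows) and is not stated by the benchmark itself. A minimal Python sketch of that reading, using the Abs row, follows; the names `METRICS`, `advantage`, and `metrics_favoring_calor` are illustrative only.

```python
# Metric column names as they appear in the table header.
METRICS = ["Tokens", "Gen Acc", "Comp", "Edit",
           "Err Det", "Info Den", "Refactor", "Correct"]

# Metric ratios copied from the Abs row of the table.
abs_row = dict(zip(METRICS, [0.43, 1.00, 1.72, 1.34, 2.00, 0.97, 1.44, 1.60]))


def advantage(row):
    """Unweighted mean of the metric ratios (assumed definition of Adv)."""
    return sum(row.values()) / len(row)


def metrics_favoring_calor(row):
    """Names of the metrics whose ratio is strictly above 1.0."""
    return [name for name, ratio in row.items() if ratio > 1.0]


print(round(advantage(abs_row), 2))     # 1.31, matching the Adv cell for Abs
print(metrics_favoring_calor(abs_row))
```

A ratio of exactly 1.00, as in the Gen Acc cell for Abs, counts as a tie here rather than a Calor lead.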
The Tradeoff
The benchmark results reveal a fundamental tradeoff in language design for AI agents:
Explicitness vs. Efficiency
Calor's design prioritizes explicit semantics—contracts, effect annotations, unique IDs—that enable better reasoning about program invariants. This comes at the cost of token efficiency, as explicit syntax requires more tokens than implicit conventions.
C#'s ecosystem maturity gives it advantages in areas like task completion and generation accuracy, where LLMs benefit from extensive training data. However, Calor's explicit contracts provide measurable benefits for error detection.
When to Use Calor
Based on these results, Calor is most valuable when:
- Contract verification matters: the error detection advantage validates explicit contracts
- Edit precision is important: unique IDs enable targeted modifications
- Agent comprehension is critical: explicit structure provides clear signals
- Token budget is flexible: you can afford the overhead of explicitness
Use C# when:
- Token efficiency is paramount: the context window is limited
- Ecosystem libraries are needed: you can leverage existing tooling
- Human readability is a priority: familiar syntax helps human developers
Methodology
All benchmarks are automated and reproducible. Each program is evaluated across 8 metrics comparing Calor and C# implementations.
The evaluation framework:
- Generates prompts for each program/metric combination
- Records LLM responses for both Calor and C# versions
- Evaluates responses against ground truth
- Computes ratios (greater than 1.0 favors Calor, less than 1.0 favors C#)
See Methodology for full details on the evaluation framework.
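The ratio step can be sketched as follows. The exact formula is not given on this page, so this assumes a plain quotient of the two per-metric scores, inverted for lower-is-better quantities such as token count so that values above 1.0 still favor Calor. `metric_ratio`, `favored`, and the sample scores are hypothetical.

```python
def metric_ratio(calor, csharp, higher_is_better=True):
    """Ratio where > 1.0 favors Calor and < 1.0 favors C# (assumed formula)."""
    if calor <= 0 or csharp <= 0:
        raise ValueError("scores must be positive")
    # Invert the quotient when a smaller raw score is the better one.
    return calor / csharp if higher_is_better else csharp / calor


def favored(ratio):
    """Name the language a ratio favors, treating exactly 1.0 as a tie."""
    if ratio > 1.0:
        return "Calor"
    if ratio < 1.0:
        return "C#"
    return "tie"


# Hypothetical raw scores, for illustration only.
accuracy = metric_ratio(0.9, 0.6)                        # higher is better
tokens = metric_ratio(200, 100, higher_is_better=False)  # lower is better
print(favored(accuracy), favored(tokens))
```

Inverting the quotient for lower-is-better metrics keeps a single reading of the table: every column can be compared against 1.0 in the same direction.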
Running Benchmarks
To regenerate benchmark results:
    # Run the evaluation framework
    dotnet run --project tests/Calor.Evaluation -- run -f website -o website/public/data/benchmark-results.json
    # Start the website to view results
    cd website && npm run dev

Results are automatically loaded from benchmark-results.json and displayed in the dashboard above.
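To confirm that the run produced a readable file before starting the website, a small check like the one below may help. The schema of benchmark-results.json is not described here, so this Python sketch assumes nothing about its fields and only reports the top-level structure. The helper name `summarize_json` is hypothetical; the path is the one passed to `-o` in the command above.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Describe the top-level structure of a JSON file without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {"type": "object", "keys": sorted(data)}
    if isinstance(data, list):
        return {"type": "array", "length": len(data)}
    return {"type": type(data).__name__}


if __name__ == "__main__":
    # Output path taken from the -o flag of the dotnet command.
    target = Path("website/public/data/benchmark-results.json")
    if target.exists():
        print(summarize_json(target))
    else:
        print(f"{target} not found; run the evaluation first")
```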