Effect Discipline Benchmark

The Effect Discipline benchmark measures how well code prevents real-world bugs caused by hidden side effects.


Fair Outcome-Based Scoring

This benchmark uses outcome-based scoring to ensure fair comparison between Calor and C#:

Scoring Principle       | Description
------------------------|------------
Tests Pass = Full Score | If all tests pass, both languages score 1.0
No Syntax Bias          | No bonus for §E{} or [Pure]; only outcomes matter
Equal Opportunity       | Both languages CAN achieve determinism with good practices

Why This Matters

These are actual bugs that have caused production incidents:

Bug Type            | Real-World Impact                         | How Often
--------------------|-------------------------------------------|----------
Flaky Tests         | Random CI failures, wasted developer time | Daily
Security Violations | Data leaks, compliance failures           | Critical
Hidden Side Effects | "Why did this happen?" debugging          | Common
Cache Bugs          | Wrong results, hard to reproduce          | Subtle

The Four Categories

1. Flaky Test Prevention

Real Bug: Tests pass locally but fail in CI because they use DateTime.Now.

C#
// BAD: Non-deterministic
public string GenerateReport(string title) {
    return $"Report: {title} (Generated: {DateTime.Now})";
}

// GOOD: Deterministic
public string GenerateReport(string title, long timestamp) {
    return $"Report: {title} (Generated: {timestamp})";
}

How Both Languages Prevent It:

  • Calor: Effect signature prevents time effects at compile time
  • C#: Pass dependencies as parameters (DI pattern), use ITimeProvider

2. Security Boundaries

Real Bug: A config file parser secretly phones home to validate licenses.

C#
// BAD: Hidden network call
public Config ParseConfig(string json) {
    ValidateLicenseOnline(); // SECURITY VIOLATION
    return JsonParse(json);
}

// GOOD: Offline only
public Config ParseConfig(string json) {
    return JsonParse(json);
}

How Both Languages Prevent It:

  • Calor: Effect signature prevents network effects
  • C#: Interface boundaries for I/O, code review
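One way to make the boundary explicit in C# is to route all network-capable work through an injected interface, so a parser with no such dependency cannot phone home without it showing up in its constructor and at every call site. The types below are an illustrative sketch, not the benchmark's code:

```csharp
using System;
using System.Text.Json;

// Illustrative: the only path to the network is behind this interface.
public interface ILicenseValidator
{
    void Validate();
}

public sealed record Config(string Name, int Version);

public sealed class ConfigParser
{
    // No ILicenseValidator field: the type itself documents that parsing
    // is offline. A caller that wants validation must invoke it separately,
    // which makes the network call visible in code review.
    public Config Parse(string json) =>
        JsonSerializer.Deserialize<Config>(json)
            ?? throw new FormatException("Invalid config JSON.");
}
```

The design choice here is that the absence of a dependency is itself the guarantee: there is simply no object through which `Parse` could reach the network.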

3. Side Effect Transparency

Real Bug: A "utility" function writes to a log file, causing disk space issues.

C#
// BAD: Hidden logging
public string Sanitize(string input) {
    Logger.Log($"Sanitizing: {input}"); // SIDE EFFECT
    return input.Replace("<", "");
}

// GOOD: Pure function
public string Sanitize(string input) {
    return input.Replace("<", "");
}

How Both Languages Prevent It:

  • Calor: Pure functions cannot have I/O effects
  • C#: [Pure] attribute, static methods with no dependencies
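The `[Pure]` attribute mentioned above comes from `System.Diagnostics.Contracts`. It is advisory rather than compiler-enforced: it documents the contract and lets analyzers flag callers that ignore the return value. A minimal sketch:

```csharp
using System.Diagnostics.Contracts;

public static class Sanitizer
{
    // [Pure] declares: no side effects, same input always yields the same
    // output. The C# compiler does not verify this; it is a documented
    // contract that tools and reviewers can check.
    [Pure]
    public static string Sanitize(string input) =>
        input.Replace("<", "");
}
```

This is the enforcement gap the document refers to: in C# purity is a convention backed by tooling, while Calor rejects an impure body at compile time.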

4. Cache Safety

Real Bug: Memoized function returns stale exchange rates because it secretly fetched live data.

C#
// BAD: Not actually cacheable
[Memoized]
public decimal CalculatePrice(decimal basePrice, int qty) {
    var rate = FetchExchangeRate(); // BREAKS MEMOIZATION
    return basePrice * qty * rate;
}

// GOOD: Safe to memoize
[Memoized]
public decimal CalculatePrice(decimal basePrice, int qty, decimal rate) {
    return basePrice * qty * rate;
}

How Both Languages Prevent It:

  • Calor: Pure functions are safe to memoize by definition
  • C#: Careful parameter design, all inputs explicit
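Once every input is an explicit parameter, memoization in C# reduces to a dictionary keyed on the full argument tuple. A hedged sketch (names and types illustrative; a real `[Memoized]` implementation may differ):

```csharp
using System.Collections.Concurrent;

public sealed class PriceCalculator
{
    // Cache keyed on ALL inputs: the key fully determines the output,
    // so a cached entry can never go stale.
    private readonly ConcurrentDictionary<(decimal BasePrice, int Qty, decimal Rate), decimal> _cache = new();

    public decimal CalculatePrice(decimal basePrice, int qty, decimal rate) =>
        _cache.GetOrAdd((basePrice, qty, rate),
            key => key.BasePrice * key.Qty * key.Rate);
}
```

If the function had secretly fetched the exchange rate, the cache key would be incomplete and stale results would be returned silently; with all inputs explicit, that failure mode is impossible by construction.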

Scoring Methodology

Component              | Weight | What It Measures
-----------------------|--------|-----------------
Functional Correctness | 50%    | Do all tests pass?
Bug Prevention         | 50%    | Does the code produce deterministic results?
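The weighting reduces to a simple average of the two components. An illustrative formula, not the benchmark's actual implementation:

```csharp
// Illustrative: combine the two 50% components into a final score.
public static double Score(double functionalCorrectness, double bugPrevention) =>
    0.5 * functionalCorrectness + 0.5 * bugPrevention;

// All tests pass, results deterministic: Score(1.0, 1.0) == 1.0
// All tests fail:                        Score(0.0, 0.0) == 0.0
```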

Outcome-Based Bug Prevention

Both languages achieve the same score when tests pass:

  • Tests Pass = Bug Prevention 1.0 for BOTH languages
  • Tests Fail = Bug Prevention 0.0

No bonus for:

  • Calor effect annotations (§E{})
  • C# [Pure] attributes
  • Any syntax-based patterns

What matters is the OUTCOME, not the mechanism.


Expected Results

Category                 | Result     | Analysis
-------------------------|------------|---------
Flaky Test Prevention    | Tie (1.0x) | Both solve with good practices
Security Boundaries      | Tie (1.0x) | Both can use interface boundaries
Side Effect Transparency | Tie (1.0x) | Both can write pure functions
Cache Safety             | Tie (1.0x) | Both can design deterministic APIs
Overall                  | Tie (1.0x) | Fair outcome-based comparison

Why Equal?

  • Both languages CAN write deterministic, side-effect-free code
  • The benchmark tests OUTCOMES (do tests pass?), not mechanisms
  • Calor's advantage is enforcement (compile-time errors), but both can achieve the same results

Non-Determinism Warnings

While scoring is outcome-based, the benchmark detects and warns about patterns that could cause issues:

Pattern                | Warning
-----------------------|--------
DateTime.Now/UtcNow    | Time-dependent code detected
new Random()           | Unseeded random detected
Guid.NewGuid()         | Non-deterministic GUID usage
HttpClient, WebRequest | Network call detected
File.Read/Write        | File I/O detected
Console.Write          | Console output detected

These are informational only and do not affect scores.
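Detection of this kind is typically a lightweight source scan. The sketch below is hypothetical (the benchmark's real detector may work differently) and shows the idea with a few of the patterns from the table:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NonDeterminismScanner
{
    // Illustrative subset of the warning table above.
    private static readonly (string Pattern, string Warning)[] Rules =
    {
        (@"DateTime\.(Now|UtcNow)", "Time-dependent code detected"),
        (@"new\s+Random\s*\(\s*\)", "Unseeded random detected"),
        (@"Guid\.NewGuid\(\)",      "Non-deterministic GUID usage"),
    };

    // Returns one informational warning per matched rule; scores unaffected.
    public static IEnumerable<string> Scan(string source) =>
        Rules.Where(r => Regex.IsMatch(source, r.Pattern))
             .Select(r => r.Warning);
}
```

A textual scan like this can produce false positives (e.g. a match inside a comment), which is another reason to keep such warnings informational rather than score-affecting.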


Running the Benchmark

Bash
# Run all effect discipline tasks
dotnet run --project tests/Calor.Evaluation -- effect-discipline

# Run specific category
dotnet run --project tests/Calor.Evaluation -- effect-discipline \
  --category flaky-test-prevention

# Sample 10 tasks
dotnet run --project tests/Calor.Evaluation -- effect-discipline \
  --sample 10 --verbose

Learn More