Effect Discipline Benchmark

The Effect Discipline benchmark measures how well code prevents real-world bugs caused by hidden side effects.


Fair Outcome-Based Scoring

This benchmark uses outcome-based scoring to ensure fair comparison between Calor and C#:

Scoring Principle       | Description
------------------------|------------
Tests Pass = Full Score | If all tests pass, both languages score 1.0
No Syntax Bias          | No bonus for §E{} or [Pure]; only outcomes matter
Equal Opportunity       | Both languages CAN achieve determinism with good practices

Why This Matters

These are actual bugs that have caused production incidents:

Bug Type            | Real-World Impact                         | How Often
--------------------|-------------------------------------------|----------
Flaky Tests         | Random CI failures, wasted developer time | Daily
Security Violations | Data leaks, compliance failures           | Critical
Hidden Side Effects | "Why did this happen?" debugging          | Common
Cache Bugs          | Wrong results, hard to reproduce          | Subtle

The Four Categories

1. Flaky Test Prevention

Real Bug: Tests pass locally but fail in CI because they use DateTime.Now.

C#
// BAD: Non-deterministic
public string GenerateReport(string title) {
    return $"Report: {title} (Generated: {DateTime.Now})";
}

// GOOD: Deterministic
public string GenerateReport(string title, long timestamp) {
    return $"Report: {title} (Generated: {timestamp})";
}

How Both Languages Prevent It:

  • Calor: Effect signature prevents time effects at compile time
  • C#: Pass dependencies as parameters (DI pattern), use ITimeProvider

2. Security Boundaries

Real Bug: A config file parser secretly phones home to validate licenses.

C#
// BAD: Hidden network call
public Config ParseConfig(string json) {
    ValidateLicenseOnline(); // SECURITY VIOLATION
    return JsonParse(json);
}

// GOOD: Offline only
public Config ParseConfig(string json) {
    return JsonParse(json);
}

How Both Languages Prevent It:

  • Calor: Effect signature prevents network effects
  • C#: Interface boundaries for I/O, code review
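One way to make the boundary explicit in C# is to route all network-capable work through an injected interface, so a parser with no such dependency cannot phone home without it showing up in its constructor and at every call site. The types below are an illustrative sketch, not the benchmark's code:

```csharp
using System;
using System.Text.Json;

// Illustrative: the only path to the network is behind this interface.
public interface ILicenseValidator
{
    void Validate();
}

public sealed record Config(string Name, int Version);

public sealed class ConfigParser
{
    // No ILicenseValidator field: the type itself documents that parsing
    // is offline. A caller that wants validation must invoke it separately,
    // which makes the network call visible in code review.
    public Config Parse(string json) =>
        JsonSerializer.Deserialize<Config>(json)
            ?? throw new FormatException("Invalid config JSON.");
}
```

The design choice here is that the absence of a dependency is itself the guarantee: there is simply no object through which `Parse` could reach the network.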

3. Side Effect Transparency

Real Bug: A "utility" function writes to a log file, causing disk space issues.

C#
// BAD: Hidden logging
public string Sanitize(string input) {
    Logger.Log($"Sanitizing: {input}"); // SIDE EFFECT
    return input.Replace("<", "");
}

// GOOD: Pure function
public string Sanitize(string input) {
    return input.Replace("<", "");
}

How Both Languages Prevent It:

  • Calor: Pure functions cannot have I/O effects
  • C#: [Pure] attribute, static methods with no dependencies
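The `[Pure]` attribute mentioned above comes from `System.Diagnostics.Contracts`. It is advisory rather than compiler-enforced: it documents the contract and lets analyzers flag callers that ignore the return value. A minimal sketch:

```csharp
using System.Diagnostics.Contracts;

public static class Sanitizer
{
    // [Pure] declares: no side effects, same input always yields the same
    // output. The C# compiler does not verify this; it is a documented
    // contract that tools and reviewers can check.
    [Pure]
    public static string Sanitize(string input) =>
        input.Replace("<", "");
}
```

This is the enforcement gap the document refers to: in C# purity is a convention backed by tooling, while Calor rejects an impure body at compile time.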

4. Cache Safety

Real Bug: Memoized function returns stale exchange rates because it secretly fetched live data.

C#
// BAD: Not actually cacheable
[Memoized]
public decimal CalculatePrice(decimal basePrice, int qty) {
    var rate = FetchExchangeRate(); // BREAKS MEMOIZATION
    return basePrice * qty * rate;
}

// GOOD: Safe to memoize
[Memoized]
public decimal CalculatePrice(decimal basePrice, int qty, decimal rate) {
    return basePrice * qty * rate;
}

How Both Languages Prevent It:

  • Calor: Pure functions are safe to memoize by definition
  • C#: Careful parameter design, all inputs explicit
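Once every input is an explicit parameter, memoization in C# reduces to a dictionary keyed on the full argument tuple. A hedged sketch (names and types illustrative; a real `[Memoized]` implementation may differ):

```csharp
using System.Collections.Concurrent;

public sealed class PriceCalculator
{
    // Cache keyed on ALL inputs: the key fully determines the output,
    // so a cached entry can never go stale.
    private readonly ConcurrentDictionary<(decimal BasePrice, int Qty, decimal Rate), decimal> _cache = new();

    public decimal CalculatePrice(decimal basePrice, int qty, decimal rate) =>
        _cache.GetOrAdd((basePrice, qty, rate),
            key => key.BasePrice * key.Qty * key.Rate);
}
```

If the function had secretly fetched the exchange rate, the cache key would be incomplete and stale results would be returned silently; with all inputs explicit, that failure mode is impossible by construction.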

Scoring Methodology

Component              | Weight | What It Measures
-----------------------|--------|-----------------
Functional Correctness | 50%    | Do all tests pass?
Bug Prevention         | 50%    | Does the code produce deterministic results?
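The weighting reduces to a simple average of the two components. An illustrative formula, not the benchmark's actual implementation:

```csharp
// Illustrative: combine the two 50% components into a final score.
public static double Score(double functionalCorrectness, double bugPrevention) =>
    0.5 * functionalCorrectness + 0.5 * bugPrevention;

// All tests pass, results deterministic: Score(1.0, 1.0) == 1.0
// All tests fail:                        Score(0.0, 0.0) == 0.0
```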

Outcome-Based Bug Prevention

Both languages achieve the same score when tests pass:

  • Tests Pass = Bug Prevention 1.0 for BOTH languages
  • Tests Fail = Bug Prevention 0.0

No bonus for:

  • Calor effect annotations (§E{})
  • C# [Pure] attributes
  • Any syntax-based patterns

What matters is the OUTCOME, not the mechanism.


Expected Results

Category                 | Result     | Analysis
-------------------------|------------|---------
Flaky Test Prevention    | Tie (1.0x) | Both solve with good practices
Security Boundaries      | Tie (1.0x) | Both can use interface boundaries
Side Effect Transparency | Tie (1.0x) | Both can write pure functions
Cache Safety             | Tie (1.0x) | Both can design deterministic APIs
Overall                  | Tie (1.0x) | Fair outcome-based comparison

Why Equal?

  • Both languages CAN write deterministic, side-effect-free code
  • The benchmark tests OUTCOMES (do tests pass?), not mechanisms
  • Calor's advantage is enforcement (compile-time errors), but both can achieve the same results

Non-Determinism Warnings

While scoring is outcome-based, the benchmark detects and warns about patterns that could cause issues:

Pattern                | Warning
-----------------------|--------
DateTime.Now/UtcNow    | Time-dependent code detected
new Random()           | Unseeded random detected
Guid.NewGuid()         | Non-deterministic GUID usage
HttpClient, WebRequest | Network call detected
File.Read/Write        | File I/O detected
Console.Write          | Console output detected

These are informational only and do not affect scores.
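Detection of this kind is typically a lightweight source scan. The sketch below is hypothetical (the benchmark's real detector may work differently) and shows the idea with a few of the patterns from the table:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NonDeterminismScanner
{
    // Illustrative subset of the warning table above.
    private static readonly (string Pattern, string Warning)[] Rules =
    {
        (@"DateTime\.(Now|UtcNow)", "Time-dependent code detected"),
        (@"new\s+Random\s*\(\s*\)", "Unseeded random detected"),
        (@"Guid\.NewGuid\(\)",      "Non-deterministic GUID usage"),
    };

    // Returns one informational warning per matched rule; scores unaffected.
    public static IEnumerable<string> Scan(string source) =>
        Rules.Where(r => Regex.IsMatch(source, r.Pattern))
             .Select(r => r.Warning);
}
```

A textual scan like this can produce false positives (e.g. a match inside a comment), which is another reason to keep such warnings informational rather than score-affecting.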


Running the Benchmark

Bash
# Run all effect discipline tasks
dotnet run --project tests/Calor.Evaluation -- effect-discipline

# Run specific category
dotnet run --project tests/Calor.Evaluation -- effect-discipline \
  --category flaky-test-prevention

# Sample 10 tasks
dotnet run --project tests/Calor.Evaluation -- effect-discipline \
  --sample 10 --verbose

Learn More