# Effect Discipline Benchmark
The Effect Discipline benchmark measures how well code prevents real-world bugs caused by hidden side effects.
## Fair Outcome-Based Scoring
This benchmark uses outcome-based scoring to ensure fair comparison between Calor and C#:
| Scoring Principle | Description |
|---|---|
| Tests Pass = Full Score | If all tests pass, both languages score 1.0 |
| No Syntax Bias | No bonus for `§E{}` or `[Pure]`; only outcomes matter |
| Equal Opportunity | Both languages CAN achieve determinism with good practices |
## Why This Matters
These are actual bugs that have caused production incidents:
| Bug Type | Real-World Impact | How Often |
|---|---|---|
| Flaky Tests | Random CI failures, wasted developer time | Daily |
| Security Violations | Data leaks, compliance failures | Critical |
| Hidden Side Effects | "Why did this happen?" debugging | Common |
| Cache Bugs | Wrong results, hard to reproduce | Subtle |
## The Four Categories

### 1. Flaky Test Prevention
**Real Bug:** Tests pass locally but fail in CI because they use `DateTime.Now`.
```csharp
// BAD: Non-deterministic
public string GenerateReport(string title) {
    return $"Report: {title} (Generated: {DateTime.Now})";
}

// GOOD: Deterministic
public string GenerateReport(string title, long timestamp) {
    return $"Report: {title} (Generated: {timestamp})";
}
```

**How Both Languages Prevent It:**
- **Calor:** Effect signature prevents time effects at compile time
- **C#:** Pass dependencies as parameters (DI pattern), use `ITimeProvider`
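The C# side of this fix can be sketched as a small clock abstraction injected into the report generator. `ITimeProvider`, `SystemTimeProvider`, and `FixedTimeProvider` are illustrative names for this sketch, not types the benchmark defines (newer .NET versions ship a similar built-in `TimeProvider` abstraction):

```csharp
using System;

public interface ITimeProvider
{
    DateTimeOffset UtcNow { get; }
}

// Production implementation reads the real clock.
public sealed class SystemTimeProvider : ITimeProvider
{
    public DateTimeOffset UtcNow => DateTimeOffset.UtcNow;
}

// Test implementation returns a fixed instant, so assertions never flake.
public sealed class FixedTimeProvider : ITimeProvider
{
    private readonly DateTimeOffset _instant;
    public FixedTimeProvider(DateTimeOffset instant) => _instant = instant;
    public DateTimeOffset UtcNow => _instant;
}

public static class Reports
{
    // The clock is an explicit dependency: tests pass a FixedTimeProvider,
    // production passes a SystemTimeProvider.
    public static string GenerateReport(string title, ITimeProvider clock) =>
        $"Report: {title} (Generated: {clock.UtcNow:O})";
}
```

Because the test controls the clock, the generated string is byte-for-byte reproducible in CI.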
### 2. Security Boundaries
**Real Bug:** A config file parser secretly phones home to validate licenses.
```csharp
// BAD: Hidden network call
public Config ParseConfig(string json) {
    ValidateLicenseOnline(); // SECURITY VIOLATION
    return JsonParse(json);
}

// GOOD: Offline only
public Config ParseConfig(string json) {
    return JsonParse(json);
}
```

**How Both Languages Prevent It:**
- **Calor:** Effect signature prevents network effects
- **C#:** Interface boundaries for I/O, code review
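One way to realize the interface-boundary idea in C# is to keep all networked license checks behind an explicit interface that the parser never receives, so the parser physically cannot phone home. `Config`, `ILicenseValidator`, and the trivial parsing body are illustrative stand-ins for this sketch:

```csharp
// Hypothetical config type; real parsing would use a JSON library.
public sealed record Config(string Name);

// The ONLY place networked license checks are allowed to live.
// Nothing in the parsing path takes a dependency on it.
public interface ILicenseValidator
{
    void Validate();
}

public static class ConfigParser
{
    // ParseConfig receives no I/O capability: string in, Config out.
    public static Config ParseConfig(string json)
    {
        // Minimal stand-in for real JSON parsing.
        return new Config(json.Trim());
    }
}
```

A code reviewer then only has to audit the few call sites that hold an `ILicenseValidator`, rather than every function that touches config data.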
### 3. Side Effect Transparency
**Real Bug:** A "utility" function writes to a log file, causing disk space issues.
```csharp
// BAD: Hidden logging
public string Sanitize(string input) {
    Logger.Log($"Sanitizing: {input}"); // SIDE EFFECT
    return input.Replace("<", "");
}

// GOOD: Pure function
public string Sanitize(string input) {
    return input.Replace("<", "");
}
```

**How Both Languages Prevent It:**
- **Calor:** Pure functions cannot have I/O effects
- **C#:** `[Pure]` attribute, static methods with no dependencies
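In C# the `[Pure]` convention uses the real `System.Diagnostics.Contracts.PureAttribute`; it documents the no-side-effect contract and lets Code Contracts tooling and some analyzers flag violations, though the compiler itself does not enforce it. A minimal sketch:

```csharp
using System.Diagnostics.Contracts;

public static class Sanitizers
{
    // [Pure] declares: no observable state changes, result depends
    // only on the arguments. Enforcement relies on tooling/review,
    // not the compiler.
    [Pure]
    public static string Sanitize(string input) => input.Replace("<", "");
}
```

This is the key asymmetry the benchmark tolerates: Calor rejects an impure body at compile time, while C# relies on the attribute plus discipline, yet both can reach the same pure outcome.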
### 4. Cache Safety
**Real Bug:** A memoized function returns stale exchange rates because it secretly fetched live data.
```csharp
// BAD: Not actually cacheable
[Memoized]
public decimal CalculatePrice(decimal basePrice, int qty) {
    var rate = FetchExchangeRate(); // BREAKS MEMOIZATION
    return basePrice * qty * rate;
}

// GOOD: Safe to memoize
[Memoized]
public decimal CalculatePrice(decimal basePrice, int qty, decimal rate) {
    return basePrice * qty * rate;
}
```

**How Both Languages Prevent It:**
- **Calor:** Pure functions are safe to memoize by definition
- **C#:** Careful parameter design, all inputs explicit
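With all inputs explicit, the "good" version above becomes safely memoizable in plain C#, e.g. with a `ConcurrentDictionary` keyed on the full argument tuple. The `Pricing` class and cache layout are illustrative for this sketch, not the benchmark's `[Memoized]` implementation:

```csharp
using System.Collections.Concurrent;

public static class Pricing
{
    // Key = every input; since the function reads nothing else,
    // a cached result can never go stale.
    private static readonly ConcurrentDictionary<(decimal BasePrice, int Qty, decimal Rate), decimal> Cache = new();

    public static decimal CalculatePrice(decimal basePrice, int qty, decimal rate) =>
        Cache.GetOrAdd((basePrice, qty, rate),
            key => key.BasePrice * key.Qty * key.Rate);
}
```

If `rate` were fetched inside the body instead, the cache key would no longer capture all inputs, which is exactly the stale-exchange-rate bug described above.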
## Scoring Methodology
| Component | Weight | What It Measures |
|---|---|---|
| Functional Correctness | 50% | Do all tests pass? |
| Bug Prevention | 50% | Does the code produce deterministic results? |
### Outcome-Based Bug Prevention
Both languages achieve the same score when tests pass:
- Tests Pass = Bug Prevention 1.0 for BOTH languages
- Tests Fail = Bug Prevention 0.0
No bonus for:

- Calor effect annotations (`§E{}`)
- C# `[Pure]` attributes
- Any syntax-based patterns
What matters is the OUTCOME, not the mechanism.
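The 50/50 weighting reduces to a single outcome signal, which can be stated as a tiny sketch (`EffectScore` is an illustrative name, not the benchmark's actual API):

```csharp
public static class EffectScore
{
    public static double Score(bool allTestsPass)
    {
        double functional = allTestsPass ? 1.0 : 0.0;    // 50%: do all tests pass?
        double bugPrevention = allTestsPass ? 1.0 : 0.0; // 50%: outcome-based, same signal
        return 0.5 * functional + 0.5 * bugPrevention;
    }
}
```

Because both components key off the same outcome, a passing submission scores 1.0 and a failing one 0.0 in either language, with no syntax-based adjustment in between.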
## Expected Results
| Category | Result | Analysis |
|---|---|---|
| Flaky Test Prevention | Tie (1.0x) | Both solve with good practices |
| Security Boundaries | Tie (1.0x) | Both can use interface boundaries |
| Side Effect Transparency | Tie (1.0x) | Both can write pure functions |
| Cache Safety | Tie (1.0x) | Both can design deterministic APIs |
| Overall | Tie (1.0x) | Fair outcome-based comparison |
### Why Equal?
- Both languages CAN write deterministic, side-effect-free code
- The benchmark tests OUTCOMES (do tests pass?), not mechanisms
- Calor's advantage is enforcement (compile-time errors), but both can achieve the same results
## Non-Determinism Warnings
While scoring is outcome-based, the benchmark detects and warns about patterns that could cause issues:
| Pattern | Warning |
|---|---|
| `DateTime.Now`/`UtcNow` | Time-dependent code detected |
| `new Random()` | Unseeded random detected |
| `Guid.NewGuid()` | Non-deterministic GUID usage |
| `HttpClient`, `WebRequest` | Network call detected |
| `File.Read`/`Write` | File I/O detected |
| `Console.Write` | Console output detected |
These are informational only and do not affect scores.
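A detector for these patterns can be as simple as a regex scan over the submitted source. This is a sketch of the idea only; `NonDeterminismScan` and its rule table are illustrative, and the benchmark's real detector may work differently:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NonDeterminismScan
{
    // Illustrative rules mirroring the warning table above.
    private static readonly (string Pattern, string Warning)[] Rules =
    {
        (@"DateTime\.(Now|UtcNow)",  "Time-dependent code detected"),
        (@"new\s+Random\s*\(\s*\)",  "Unseeded random detected"),
        (@"Guid\.NewGuid\s*\(\s*\)", "Non-deterministic GUID usage"),
        (@"HttpClient|WebRequest",   "Network call detected"),
    };

    public static IEnumerable<string> Warnings(string source)
    {
        foreach (var (pattern, warning) in Rules)
            if (Regex.IsMatch(source, pattern))
                yield return warning; // informational only; never affects scores
    }
}
```

Note that text matching is deliberately coarse: it flags possible issues for a human to review rather than proving non-determinism, which is why the warnings stay advisory.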
## Running the Benchmark
```shell
# Run all effect discipline tasks
dotnet run --project tests/Calor.Evaluation -- effect-discipline

# Run specific category
dotnet run --project tests/Calor.Evaluation -- effect-discipline \
  --category flaky-test-prevention

# Sample 10 tasks
dotnet run --project tests/Calor.Evaluation -- effect-discipline \
  --sample 10 --verbose
```

## Learn More
- Safety Benchmark - Contract enforcement
- Correctness Benchmark - Edge case handling
- Methodology - How benchmarks work