
Test Runner Specification

Status: Canonical Reference
Scope: runner/test_runner.py - Main test execution engine
Last Updated: January 9, 2026 12:55:38
Created: December 24, 2025 12:39:00
Related: Runner README (Quick Reference)

The Test Runner is the core testing engine for executing solutions against test cases. It supports multi-solution benchmarking, random test generation, and complexity estimation.


Quick Start

# Run default solution
python runner/test_runner.py 0001_two_sum

# Run specific solution method
python runner/test_runner.py 0023 --method heap

# Compare all solutions with timing
python runner/test_runner.py 0023 --all --benchmark

Command Reference

python runner/test_runner.py <problem> [OPTIONS]

Solution Selection

| Option | Description |
| --- | --- |
| (none) | Run "default" solution |
| --method NAME | Run specific solution |
| --all | Run all solutions in SOLUTIONS |
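For orientation, the method names come from the SOLUTIONS dict in the solution file. A minimal sketch with hypothetical class names and a hypothetical solve() signature — the authoritative format is the Solution Contract:

# solutions/0023_merge_k_sorted_lists.py (illustrative sketch; see Solution Contract)
class Solution:
    """Approach: Divide and Conquer
    Complexity: O(N log k) time, O(1) space
    """
    def solve(self, lists):
        ...

class HeapSolution:
    """Approach: Heap-Based Merge
    Complexity: O(N log k) time, O(k) space
    """
    def solve(self, lists):
        ...

# Shorthand names used by --method; "default" runs when no method is given.
SOLUTIONS = {
    "default": Solution,
    "heap": HeapSolution,
}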

Test Generation

| Option | Description |
| --- | --- |
| --generate N | Static tests + N generated cases |
| --generate-only N | Skip static tests; generate N cases only |
| --seed N | Reproducible generation |
| --save-failed | Save failed cases to tests/ |

πŸ“– Requires generator file. See Generator Contract.
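The generator interface itself is specified by the Generator Contract; the sketch below is only illustrative, and its signature (a seeded random.Random in, raw .in text out) is an assumption:

# generators/0215_kth_largest_element_in_an_array.py (sketch; signature assumed)
import random

def generate(rng: random.Random) -> str:
    """Produce one random test input in the problem's .in format."""
    n = rng.randint(1, 1000)
    nums = [rng.randint(-10_000, 10_000) for _ in range(n)]
    k = rng.randint(1, n)
    return f"{nums}\n{k}\n"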

Analysis

| Option | Description |
| --- | --- |
| --benchmark | Show execution time per case (includes memory metrics if psutil is installed) |
| --estimate | Estimate time complexity |

πŸ“– --estimate requires a generate_for_complexity(n) function in the generator and the big-O package (pip install big-O).
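generate_for_complexity(n) must return an input whose size scales with n so the estimator can time growing workloads. A minimal sketch, assuming the same return convention as generate():

import random

def generate_for_complexity(n: int) -> str:
    """Return one input of exactly size n for timing runs."""
    rng = random.Random(n)  # deterministic per size keeps repeated runs comparable
    nums = [rng.randint(-10_000, 10_000) for _ in range(n)]
    return f"{nums}\n{max(1, n // 2)}\n"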

Memory Profiling

| Option | Description |
| --- | --- |
| --memory-trace | Show run-level memory traces (sparklines) per method |
| --trace-compare | Multi-method memory comparison with ranking table |
| --memory-per-case | Debug: top-K cases by peak RSS |

πŸ“– Memory profiling requires pip install psutil. Without it, memory columns show "Unavailable".
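For context, RSS for static/generated tests is measured on the solution subprocess (see Memory Measurement Types below). A simplified sketch of that idea, not the runner's actual code:

import subprocess
import time

import psutil

def peak_rss(cmd: list[str], poll_s: float = 0.01) -> int:
    """Run cmd as a child process and return its peak RSS in bytes."""
    child = subprocess.Popen(cmd)
    handle = psutil.Process(child.pid)
    peak = 0
    while child.poll() is None:
        try:
            peak = max(peak, handle.memory_info().rss)
        except psutil.NoSuchProcess:
            break  # child exited between poll() and memory_info()
        time.sleep(poll_s)
    return peak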

Other

| Option | Description |
| --- | --- |
| --tests-dir DIR | Custom tests directory (default: tests) |

Usage Examples

Basic Testing

python runner/test_runner.py 0001_two_sum
python runner/test_runner.py 0023 --method heap
python runner/test_runner.py 0023 --all --benchmark

Random Testing

python runner/test_runner.py 0215 --generate 10
python runner/test_runner.py 0215 --generate 10 --seed 12345
python runner/test_runner.py 0215 --generate 100 --save-failed

Complexity Estimation

python runner/test_runner.py 0322 --estimate
python runner/test_runner.py 0215 --all --estimate

Example Outputs

This section shows actual output from various test runs to help you understand how to interpret results.

Example 1: Multi-Solution Benchmark (Trapping Rain Water)

Command:

python runner/test_runner.py 0042_trapping --all --benchmark

Output:

╔═════════════════════════════════════════╗
β•‘ 0042_trapping_rain_water - Performance  β•‘
╠═════════════════════════════════════════╣
β•‘ default:    β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  106ms β•‘
β•‘ stack:      β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘  104ms β•‘
β•‘ twopointer: β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘  102ms β•‘
β•‘ dp:         β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘  100ms β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

Method        Avg Time   Pass Rate  Complexity              Peak RSS
----------  ----------  ----------  --------------------  ----------
default       106.07ms         2/2  O(n) time, O(n) space      4.8MB
stack         103.54ms         2/2  O(n) time, O(n) space      4.7MB
twopointer    102.15ms         2/2  O(n) time, O(1) space      4.6MB
dp            100.35ms         2/2  O(n) time, O(n) space      4.6MB

How to Interpret:

- Bar length is proportional to execution time (longest = full bar)
- twopointer uses O(1) space while the others use O(n) β€” a key insight for interviews
- All approaches are O(n) time but have different constant factors


Example 2: Four Solutions Comparison (3Sum)

Command:

python runner/test_runner.py 0015_3sum --all --benchmark

Output:

╔═══════════════════════════════════════════╗
β•‘          0015_3sum - Performance          β•‘
╠═══════════════════════════════════════════╣
β•‘ default:      β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘  103ms β•‘
β•‘ two_pointers: β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  109ms β•‘
β•‘ hashset:      β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘  102ms β•‘
β•‘ hash:         β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘  102ms β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

Method          Avg Time   Pass Rate  Complexity
------------  ----------  ----------  ---------------------------------
default         102.81ms         3/3  O(nΒ²) time, O(1) extra space
two_pointers    108.52ms         3/3  O(nΒ²) time, O(1) extra space
hashset         102.21ms         3/3  O(nΒ²) time, O(n) space for set
hash            102.39ms         3/3  O(nΒ²) time, O(n) space

How to Interpret:

- All four approaches have similar O(nΒ²) time complexity
- hashset and hash trade space for simpler deduplication logic
- Similar times indicate the test cases may be small β€” use --generate for stress testing


Example 3: Random Test Generation

Command:

python runner/test_runner.py 0215_kth_largest --generate 5 --seed 42

Output:

🎲 Generator: 5 cases, seed: 42

   --- tests/ (static) ---
   0215_kth_largest_element_in_an_array_1: βœ… PASS [judge]
   0215_kth_largest_element_in_an_array_2: βœ… PASS [judge]
   0215_kth_largest_element_in_an_array_3: βœ… PASS [judge]

   --- generators/ (5 cases, seed: 42) ---
   gen_1: βœ… PASS [generated]
   gen_2: βœ… PASS [generated]
   gen_3: βœ… PASS [generated]
   gen_4: βœ… PASS [generated]
   gen_5: βœ… PASS [generated]

   Result: 8 / 8 cases passed.
      β”œβ”€ Static: 3/3
      └─ Generated: 5/5

How to Interpret:

- Static tests run first (from the tests/ directory)
- Generated tests use JUDGE_FUNC for validation (no expected output file)
- The seed 42 makes tests reproducible β€” same seed = same test cases (demonstrated in the snippet below)
- Use --save-failed to capture failing generated cases for debugging
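The reproducibility is plain Python PRNG behavior: two generators seeded identically yield identical sequences.

import random

a = random.Random(42)
b = random.Random(42)
assert [a.randint(0, 99) for _ in range(5)] == [b.randint(0, 99) for _ in range(5)]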


Example 4: Memory Trace Visualization

Command:

python runner/test_runner.py 0042_trapping --memory-trace

Output:

Memory Trace (Run-level RSS)

default:
β–β–‚β–ƒβ–ƒβ–„β–…β–†β–†β–‡β–ˆ
Peak 4.8MB | P95 4.8MB

How to Interpret:

- The sparkline shows memory usage progression over test cases (a rendering sketch follows this list)
- Peak RSS is the maximum memory used across all runs
- P95 RSS is the 95th percentile β€” useful for identifying outliers
- Compare across methods with --trace-compare (requires multiple solutions)
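A sketch of how such a sparkline can be rendered, assuming the sparklines package's sparklines() helper, with an ASCII fallback in the spirit of the Graceful Degradation table below:

try:
    from sparklines import sparklines
    def render(samples):
        return sparklines(samples)[0]  # Unicode block characters
except ImportError:
    def render(samples):
        levels = ".:-=+*#"  # coarse ASCII fallback
        peak = max(samples) or 1
        return "".join(levels[int(s / peak * (len(levels) - 1))] for s in samples)

rss_mb = [4.1, 4.2, 4.3, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8, 4.8]
print(render(rss_mb), f"| Peak {max(rss_mb)}MB")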


Example 5: Complexity Estimation (O(n) vs O(nΒ²))

This is the most striking demonstration: it shows the dramatic difference between O(n) and O(nΒ²) algorithms.

Command:

python runner/test_runner.py 0011_container --all --estimate

Output (O(n) Two Pointers):

πŸ“Œ Estimating: two_pointers
   n=  500: 0.34ms
   n= 1000: 0.51ms
   n= 2000: 1.24ms
   n= 5000: 2.78ms

βœ… Estimated: O(n)
   Confidence: 1.00

Output (O(nΒ²) Brute Force):

πŸ“Œ Estimating: bruteforce
   n=  500: 554ms
   n= 1000: 2,544ms
   n= 2000: 10,697ms
   n= 5000: 68,291ms  ← 68 seconds!

βœ… Estimated: O(nΒ²)
   Confidence: 1.00

The Dramatic Difference:

| n | O(n) Two Pointers | O(nΒ²) Brute Force | Ratio |
| --- | --- | --- | --- |
| 500 | 0.34ms | 554ms | 1,629x |
| 1000 | 0.51ms | 2,544ms | 4,988x |
| 5000 | 2.78ms | 68,291ms | 24,565x |

How to Interpret:

- O(n): time roughly doubles when n doubles (linear growth)
- O(nΒ²): time roughly quadruples when n doubles (quadratic growth)
- At n=5000, the O(nΒ²) algorithm is ~24,565x slower
- This is why algorithm complexity matters for large inputs!

Estimation Tips:

- Works best when algorithm time dominates constant overhead
- For fast algorithms, use larger n values (5000+) for accurate estimation
- If the estimated complexity differs from the declared one, the algorithm may have optimizations or the test sizes may be too small (the doubling check below is a quick way to verify by hand)
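As a quick manual check, apply the doubling rule to the bruteforce timings from Example 5: when n doubles, O(n) time should roughly double while O(nΒ²) should roughly quadruple.

# Growth ratios from Example 5's bruteforce output (ms)
timings = {500: 554, 1000: 2544, 2000: 10697}

for n in (500, 1000):
    ratio = timings[2 * n] / timings[n]
    print(f"n={n} -> {2 * n}: x{ratio:.1f}")  # ~4.6x and ~4.2x, consistent with O(n^2)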


Output Format

Test Results

   case_1: βœ… PASS [exact]
   case_2: βœ… PASS (12.34ms) [judge]
   case_3: ❌ FAIL [exact]
      Expected: [0, 1]...
      Actual:   [1, 0]...
   case_4: ⚠️ SKIP (missing .out, no JUDGE_FUNC)

Multi-Solution Comparison

When running --all --benchmark, the test runner displays a visual bar chart followed by a detailed comparison table:

Visual Bar Chart with Approach Legend:

   ╔═══════════════════════════════════════════════════════════════════════════════╗
   β•‘                  0131_palindrome_partitioning - Performance                   β•‘
   ╠═══════════════════════════════════════════════════════════════════════════════╣
   β•‘ default: β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  158ms                                          β•‘
   β•‘ naive:   β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘  152ms                                          β•‘
   β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

   default  β†’ Backtracking with DP-Precomputed Palindrome Table
   naive    β†’ Backtracking with On-the-Fly Checking

The bar length is proportional to execution time (longest time = full bar). The approach descriptions are shown in a legend below the chart, parsed from class header comments.
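A sketch of that proportional scaling (illustrative only, not the runner's exact code):

def bar(avg_ms: float, slowest_ms: float, width: int = 20) -> str:
    """Scale a method's bar against the slowest method's time."""
    filled = max(1, round(avg_ms / slowest_ms * width))
    return "β–ˆ" * filled + "β–‘" * (width - filled)

times = {"default": 158.17, "naive": 152.00}
slowest = max(times.values())
for name, ms in times.items():
    print(f"{name:8} {bar(ms, slowest)}  {ms:.0f}ms")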

Enhanced Method Header:

──────────────────────────────────────────────────
πŸ“Œ Shorthand: default
   Approach: Backtracking with DP-Precomputed Palindrome Table
   Complexity: O(n Γ— 2^n) time, O(n^2) space
──────────────────────────────────────────────────

Note: On terminals that don't support Unicode, ASCII fallback characters are used.

Detailed Table:

======================================================================
Performance Comparison (Details)
======================================================================

Method         Avg Time   Pass Rate  Complexity
-----------  ----------  ----------  --------------------
default        158.17ms         2/2  O(n Γ— 2^n) time, O(n^2) space
naive          152.00ms         2/2  O(n Γ— 2^n Γ— n) time, O(n) space

default      β†’ Backtracking with DP-Precomputed Palindrome Table
naive        β†’ Backtracking with On-the-Fly Checking

======================================================================

The approach descriptions are shown in a legend below the table, matching the format used in the visual bar chart.

Multi-Solution Benchmark with Visual Charts

Use --all --benchmark to compare all solutions with visual performance charts:

python runner/test_runner.py 0215 --all --benchmark

This displays:

  1. Visual bar chart with execution times
  2. Approach legend (method β†’ approach name)
  3. Detailed table showing pass rate and complexity

Example Output:

   ╔════════════════════════════════════════════════════╗
   β•‘ 0215_kth_largest_element_in_an_array - Performance β•‘
   ╠════════════════════════════════════════════════════╣
   β•‘ default:     β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  114ms           β•‘
   β•‘ quickselect: β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘β–‘β–‘   96ms           β•‘
   β•‘ heap:        β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘  107ms           β•‘
   β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

   default      β†’ Quickselect Algorithm
   quickselect  β†’ Quickselect Algorithm
   heap         β†’ Heap-Based Solution

======================================================================
Performance Comparison (Details)
======================================================================

Method         Avg Time   Pass Rate  Complexity
-----------  ----------  ----------  --------------------
default        113.51ms         3/3  O(n) average time, O(1) space
quickselect     96.06ms         3/3  O(n) average time, O(1) space
heap           107.34ms         3/3  O(n log k) time, O(k) space

default      β†’ Quickselect Algorithm
quickselect  β†’ Quickselect Algorithm
heap         β†’ Heap-Based Solution

======================================================================

Requirements for Complexity Estimation:

  • Generator must provide generate_for_complexity(n) function
  • Install pip install big-O package

Standalone Complexity Estimation

For complexity estimation without benchmark comparison:

python runner/test_runner.py 0239_sliding_window --estimate
πŸ“Œ Estimating: default

   πŸ“ˆ Running complexity estimation...
      Mode: Direct call (Mock stdin, no subprocess overhead)
      Sizes: [10, 20, 50, 100, 200, 500, 1000, 2000, 5000]
      Runs per size: 3
      n=  500: 0.32ms (avg of 3 runs)
      n= 1000: 0.69ms (avg of 3 runs)
      n= 2000: 1.12ms (avg of 3 runs)
      n= 5000: 2.78ms (avg of 3 runs)

   βœ… Estimated: O(n)
      Confidence: 1.00
      Details: Linear: time = 0.059 + 0.00054*n (sec)

More Complexity Examples

| Problem | Algorithm | Declared | Estimated | Confidence |
| --- | --- | --- | --- | --- |
| 0239_sliding_window | Monotonic Deque | O(n) | O(n) | 1.00 |
| 0011_container (two_pointers) | Two Pointers | O(n) | O(n) | 1.00 |
| 0011_container (bruteforce) | Brute Force | O(nΒ²) | O(nΒ²) | 1.00 |
| 0042_trapping (twopointer) | Two Pointers | O(n) | O(n) | 1.00 |

Note: The estimator uses sizes up to n=5000, which is enough to distinguish O(n) from O(nΒ²): in Example 5 above, the O(nΒ²) algorithm took roughly 24,565x longer than O(n) at n=5000!


Validation Modes

| Mode | When Used |
| --- | --- |
| [judge] | JUDGE_FUNC + .out exists |
| [judge-only] | JUDGE_FUNC, no .out (generated tests) |
| [exact] | Default string comparison |
| [sorted] | COMPARE_MODE="sorted" |
| [set] | COMPARE_MODE="set" |
| [skip] | No .out, no JUDGE_FUNC |

πŸ“– See Solution Contract Β§ Validation for JUDGE_FUNC and COMPARE_MODE details.
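As a flavor of what a judge looks like, here is a sketch for 0215 (kth largest); the actual JUDGE_FUNC signature lives in the Solution Contract, and the (input_text, actual_output) -> bool shape below is an assumption:

import ast

def judge(input_text: str, actual_output: str) -> bool:
    """Accept any output equal to the true kth-largest value."""
    lines = input_text.strip().splitlines()
    nums = ast.literal_eval(lines[0])  # e.g. "[3, 2, 1, 5, 6, 4]"
    k = int(lines[1])
    return int(actual_output.strip()) == sorted(nums, reverse=True)[k - 1]

JUDGE_FUNC = judge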


Troubleshooting

| Error | Fix |
| --- | --- |
| No test input files found | Add tests/{problem}_*.in or use --generate |
| Solution method 'X' not found | Check the SOLUTIONS dict in the solution file |
| Generator requires JUDGE_FUNC | Add JUDGE_FUNC to the solution |
| No generator found | Create generators/{problem}.py |
| big-O package not installed | pip install big-O |

Complete Reference

All Options

| Option | Short | Description |
| --- | --- | --- |
| --method NAME | -m | Run specific solution |
| --all | -a | Run all solutions in SOLUTIONS |
| --benchmark | -b | Show execution time per case |
| --tests-dir DIR | -t | Custom tests directory (default: tests) |
| --generate N | -g | Static tests + N generated cases |
| --generate-only N | β€” | Skip static, generate N cases only |
| --seed N | -s | Reproducible generation |
| --save-failed | β€” | Save failed cases to tests/ |
| --estimate | -e | Estimate time complexity |

The memory profiling flags (--memory-trace, --trace-compare, --memory-per-case) are described under Memory Profiling above.

Advanced Combinations

# Full comparison: all methods, benchmarked, with generated tests
python runner/test_runner.py 0023 -a -b -g 50 -s 12345

# Stress test only (skip static tests)
python runner/test_runner.py 0023 --generate-only 100 --all

# Estimate complexity for all solutions
python runner/test_runner.py 0023 --all --estimate

# Full benchmark with complexity estimation (visual charts)
python runner/test_runner.py 0215 --all --benchmark --estimate

# Debug failed case with saved input
python runner/test_runner.py 0023 --generate 100 --save-failed

Output Details

Failed Generated Case Box:

   gen_3: ❌ FAIL [generated]
      β”Œβ”€ Input ─────────────────────────────────
      β”‚ [1,3,5,7]
      β”‚ [2,4,6,8]
      β”œβ”€ Actual ────────────────────────────────
      β”‚ 4.5
      └─────────────────────────────────────────
      πŸ’Ύ Saved to: tests/0004_failed_1.in

Reproduction Hint (when using --seed):

πŸ’‘ To reproduce: python runner/test_runner.py 0004 --generate 10 --seed 12345

Summary Breakdown (static + generated):

Summary: 15 / 15 cases passed.
   β”œβ”€ Static (tests/): 5/5
   └─ Generated: 10/10

Internal Behaviors

| Behavior | Description |
| --- | --- |
| Failed-file exclusion | Files matching *_failed_*.in are excluded from normal test runs |
| Legacy mode | When no SOLUTIONS dict exists, runs a single default solution |
| Exit codes | Exits with code 1 on missing tests, an invalid method, or a missing generator |
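The failed-file exclusion amounts to a glob filter; a sketch of the idea (assumed, not verbatim from the runner):

from pathlib import Path

def discover_static_cases(tests_dir: str, problem: str) -> list[Path]:
    """Collect static .in files, skipping saved failure artifacts."""
    return sorted(
        p for p in Path(tests_dir).glob(f"{problem}_*.in")
        if "_failed_" not in p.name
    )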

Case Runner

case_runner.py runs a single test case without comparison β€” ideal for debugging.

python runner/case_runner.py <problem> <case_number>

Example:

python runner/case_runner.py 0001_two_sum 1

This runs solutions/0001_two_sum.py with input from tests/0001_two_sum_1.in and displays output directly (no pass/fail comparison).


VSCode Integration

Pre-configured tasks and debug configurations are provided in .vscode/.

  • Ctrl+Shift+B: Run all tests for current problem (default build task)
  • F5: Debug with breakpoints

πŸ“– See VSCode Setup Guide for complete task/debug configuration reference.


Architecture

test_runner.py (CLI)
β”œβ”€β”€ module_loader.py      # Load solution/generator modules
β”œβ”€β”€ executor.py           # Execute test cases
β”œβ”€β”€ reporter.py           # Format results
β”œβ”€β”€ compare.py            # Output validation
└── complexity_estimator.py  # Big-O estimation

Execution Methods

The test runner supports two execution methods:

Method 1: Virtual Environment

Use the project's virtual environment for isolated dependencies:

# Windows (PowerShell/CMD)
leetcode\Scripts\python.exe runner/test_runner.py 0023 --all --benchmark

# Linux/macOS
./leetcode/bin/python runner/test_runner.py 0023 --all --benchmark

Method 2: System Python

Use system Python directly (requires dependencies installed globally):

python runner/test_runner.py 0023 --all --benchmark

Dependencies

Required

  • Python 3.11 (matching LeetCode official environment)
  • Solution files in solutions/
  • Test files in tests/ (or use generators)

Optional Packages

| Package | Feature | Install |
| --- | --- | --- |
| big-O | Complexity estimation (--estimate) | pip install big-O |
| psutil | RSS memory profiling (--memory-trace, --trace-compare, --memory-per-case) | pip install psutil |
| sparklines | Memory trace visualization (sparkline charts) | pip install sparklines |
| tabulate | CLI table formatting | pip install tabulate |

Install all optional packages:

pip install big-O psutil sparklines tabulate

Memory Measurement Types

| Type | Source | Method | Description |
| --- | --- | --- | --- |
| RSS | Static/generated tests | psutil (subprocess) | Full process memory, including the interpreter |
| Alloc | --estimate runs | tracemalloc (in-process) | Python allocations only |
Note: RSS and Alloc metrics are displayed separately in --memory-per-case output because they measure different things and are not directly comparable.
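To see why the two diverge, compare what tracemalloc reports for a pure-Python allocation against a whole process's footprint:

import tracemalloc

tracemalloc.start()
data = list(range(1_000_000))
current, peak = tracemalloc.get_traced_memory()  # Python allocations only, in bytes
tracemalloc.stop()
print(f"Alloc peak: {peak / 1e6:.1f}MB")  # well below RSS, which also counts the interpreter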

Graceful Degradation

| Missing Package | Behavior |
| --- | --- |
| big-O | --estimate is ignored; complexity shown as "Unknown" |
| psutil | RSS memory columns show "Unavailable"; a warning is displayed |
| sparklines | Falls back to simple ASCII visualization |
| tabulate | Falls back to manual column formatting |
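This degradation typically rests on a guarded import; a minimal sketch of the pattern:

try:
    import psutil  # optional dependency
except ImportError:
    psutil = None

def rss_label(rss_bytes: int | None) -> str:
    """Format a peak-RSS cell, degrading gracefully without psutil."""
    if psutil is None or rss_bytes is None:
        return "Unavailable"
    return f"{rss_bytes / 1e6:.1f}MB"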

Related Documentation

| Document | Content |
| --- | --- |
| Test File Format | Canonical .in/.out format specification |
| Solution Contract | SOLUTIONS, JUDGE_FUNC, COMPARE_MODE, file structure |
| Generator Contract | generate(), generate_for_complexity(), edge cases |
| Runner README | Quick reference (in-module) |
| VSCode Setup Guide | Tasks, debug configurations, workflow examples |

Documentation Maintenance

When modifying test_runner.py:

  1. Update this spec (docs/runner/README.md)
  2. Update quick reference (runner/README.md)
  3. Update docstring (runner/test_runner.py)

Maintainer: See Contributors