
CodeGen

Status: Canonical Reference
Scope: src/codegen/ - Solution skeleton generation with test extraction
Related: Package README

CodeGen generates solution and practice skeleton files for LeetCode problems, providing the infrastructure needed for a LeetCode-like practice experience. It also extracts example test cases from problem descriptions and validates test file consistency.


Table of Contents

  1. Overview
  2. Scope
  3. Interfaces
  4. CLI Reference
  5. How It Fits in the System
  6. Typical Workflows
  7. Test Generation
  8. IO Schema Inference
  9. Format Migration
  10. Key Design Decisions
  11. Configuration
  12. Failure Modes and Constraints
  13. Related Documentation

Overview

CodeGen serves as the code generation engine for the NeetCode practice framework. Its primary purpose is to:

  • Generate reference skeleton files to solutions/
  • Generate practice skeleton files to practices/
  • Extract example test cases from LeetCode problem descriptions
  • Provide a consistent structure that integrates with runner/ for testing

The module is designed as a stateless generator: it produces output from its inputs and maintains no internal state.

Goals

| Goal | Description |
| --- | --- |
| Reference Generation | Generate solution skeletons conforming to Solution Contract |
| Practice Generation | Generate practice skeletons that reuse reference infrastructure |
| Test Extraction | Extract example input/output from LeetCode HTML |
| solve() Inference | Auto-generate solve() based on method signature |
| Focus on Solution | Users only write class Solution; infrastructure is provided |
| Reusable Components | solution_header, Helper Catalog available for other modules |

Non-Goals

| Non-Goal | Reason |
| --- | --- |
| ❌ Auto-generate complete solutions | Only generates skeleton; users implement solutions |
| ❌ Execute tests | Handled by runner/ |
| ❌ Manage practice history | Handled by practice_workspace |
| ❌ Fetch problem data | Uses leetcode_datasource |

Scope

What this module handles

  • ✅ Rendering file-level docstrings (solution_header)
  • ✅ Parsing LeetCode code stubs
  • ✅ Detecting and emitting helper classes (ListNode, TreeNode, etc.)
  • ✅ Assembling complete module files
  • ✅ Generating SOLUTIONS dict structure
  • ✅ Creating solve() interface (placeholder or inferred)
  • ✅ Extracting examples from HTML (example_parser)
  • ✅ Inferring IO schema from signatures (io_schema)
  • ✅ Generating test files (test_generator)
  • ✅ Checking test consistency (checker)
  • ✅ Migrating test formats (migrator)

What this module explicitly avoids

  • ❌ Test execution (handled by runner/)
  • ❌ Practice file versioning (handled by practice_workspace)
  • ❌ Network requests for problem data (handled by leetcode_datasource)
  • ❌ CLI argument parsing for tools (handled by tools/)

Interfaces

High-level summary of public APIs. For complete API reference, see Package README.

Core Generation

| Interface | Purpose |
| --- | --- |
| generate_reference_skeleton() | Generate skeleton to solutions/ |
| generate_practice_skeleton() | Generate skeleton to practices/ |
| render_solution_header() | Render file-level docstring |
| parse_code_stub() | Parse LeetCode code stub |
| assemble_module() | Assemble complete file from parts |
| detect_required_helpers() | Detect needed helper classes |

Test Generation

| Interface | Purpose |
| --- | --- |
| generate_tests_from_datasource() | Generate .in/.out files from examples |
| parse_examples() | Extract examples from HTML |
| infer_io_schema() | Infer IO format from signature |
| generate_solve_function() | Auto-generate solve() code |

Validation & Migration

| Interface | Purpose |
| --- | --- |
| TestChecker | Check test consistency |
| migrate_problem() | Migrate single problem's tests |
| migrate_all() | Migrate all tests |

CLI Reference

Generate Reference Skeleton

# Basic generation
python -m codegen new <problem_id>

# With test files from examples
python -m codegen new <problem_id> --with-tests

# With auto-generated solve()
python -m codegen new <problem_id> --solve-mode infer

# Combined
python -m codegen new <problem_id> --with-tests --solve-mode infer --force

# Preview without writing
python -m codegen new <problem_id> --dry-run

| Flag | Description |
| --- | --- |
| --with-tests | Generate .in/.out files from LeetCode examples |
| --solve-mode | placeholder (default), infer (auto-generate), or tiered (for Tier-1/1.5 problems) |
| --force | Overwrite existing test files |
| --dry-run | Preview without writing files |
| --header-level | minimal, standard, or full |

Generate Practice Skeleton

python -m codegen practice <problem_id>
python -m codegen practice <problem_id> --all-solutions

Check Test Consistency

# Check single problem
python -m codegen check <problem_id>
python -m codegen check <problem_id> -v

# Check all problems
python -m codegen check --all
python -m codegen check --all --limit 10

# JSON output
python -m codegen check --all --report json

| Status | Meaning |
| --- | --- |
| match | Test files match examples |
| mismatch | Test files differ from parsed examples |
| missing_tests | No test files exist |
| parse_error | Could not parse examples from HTML |
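
The four statuses can be read as the outcome of a simple comparison. The sketch below is illustrative only and is not the TestChecker implementation: the function name classify, the order in which the conditions are tested, and the whitespace normalization are all assumptions.

```python
from typing import List, Optional, Tuple

# (content of a .in file, content of the matching .out file)
Example = Tuple[str, str]


def classify(parsed: Optional[List[Example]], on_disk: List[Example]) -> str:
    """Return one of: match, mismatch, missing_tests, parse_error."""
    if parsed is None:
        # Assumption: None signals that the HTML examples could not be parsed.
        return "parse_error"
    if not on_disk:
        # No .in/.out pairs were found for the problem.
        return "missing_tests"

    def normalize(pairs: List[Example]) -> List[Example]:
        # Assumption: surrounding whitespace is not significant.
        return [(i.strip(), o.strip()) for i, o in pairs]

    return "match" if normalize(parsed) == normalize(on_disk) else "mismatch"


status = classify([("[2,7,11,15]\n9\n", "[0,1]\n")], [("[2,7,11,15]\n9", "[0,1]")])
print(status)
```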

Migrate Test Format

# Preview migration
python -m codegen migrate <problem_id> --dry-run -v

# Migrate single problem
python -m codegen migrate <problem_id>

# Migrate all problems
python -m codegen migrate --all --dry-run

# Migrate without backup
python -m codegen migrate --all --no-backup

How It Fits in the System

┌────────────────────────┐
│  leetcode_datasource   │  ← Problem metadata + HTML
└───────────┬────────────┘
            │
            ▼
┌────────────────────────┐     ┌───────────────────────┐
│       codegen          │ ──► │   practice_workspace  │
│                        │     └───────────────────────┘
│  ┌──────────────────┐  │              │
│  │ test_generator   │  │              │ manages history
│  │ solve_generator  │  │              ▼
│  │ checker          │  │         practices/_history/
│  │ migrator         │  │
│  └──────────────────┘  │
└───────────┬────────────┘
            │ generates
            ▼
       solutions/
       practices/
       tests/
            │
            ▼
┌────────────────────────┐
│        runner          │  ← Executes tests
└────────────────────────┘

Module Relationships

| Module | Relationship |
| --- | --- |
| leetcode_datasource | Uses - Fetches problem metadata and HTML |
| practice_workspace | Uses - Calls save_to_history() when practice exists |
| runner | Used by - Runs generated files and tests |
| tools/ | Used by - CLI wrappers invoke codegen |

Typical Workflows

Workflow: Generate Reference with Tests

When codegen new <problem_id> --with-tests is invoked:

  1. Check existence - If solutions/<id>_<slug>.py exists, stop
  2. Fetch metadata - Get problem info and HTML from leetcode_datasource
  3. Parse stub - Extract method signature, parameters, return type
  4. Detect helpers - Determine if ListNode, TreeNode, etc. are needed
  5. Infer IO schema - Map parameter types to input formats
  6. Generate solve() - Based on --solve-mode (placeholder, infer, or tiered)
  7. Assemble module - Combine header, imports, helpers, SOLUTIONS, Solution, solve()
  8. Write solution - Output to solutions/<id>_<slug>.py
  9. Parse examples - Extract examples from HTML
  10. Generate tests - Create .in/.out files for each example
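
Steps 9 and 10 can be sketched as follows. This is a minimal illustration, not the module's example_parser or test_generator: the function names parse_example and render_test_pair are hypothetical, the sample HTML is a hand-written approximation of a LeetCode example block, and the trailing newline in each file is an assumption.

```python
import html
import re
from typing import List, Tuple


def parse_example(pre_html: str) -> Tuple[List[str], str]:
    """Extract the input values and the expected output from one example block."""
    # Drop tags, then decode entities such as &quot;.
    text = html.unescape(re.sub(r"<[^>]+>", "", pre_html))
    m_in = re.search(r"Input:\s*(.*)", text)
    m_out = re.search(r"Output:\s*(.*)", text)
    if not (m_in and m_out):
        raise ValueError("no Input/Output pair found")
    # Split "a = 1, b = [2,3]" at each "name =" boundary, so commas inside
    # array literals are left alone. String values that themselves contain
    # ", name = " are not handled by this sketch.
    parts = re.split(r"(?:^|,\s*)\w+\s*=\s*", m_in.group(1).strip())
    values = [p.strip() for p in parts if p.strip()]
    return values, m_out.group(1).strip()


def render_test_pair(values: List[str], expected: str) -> Tuple[str, str]:
    """Build .in content (one value per line) and .out content."""
    return "\n".join(values) + "\n", expected + "\n"


sample = (
    "<pre><strong>Input:</strong> nums = [2,7,11,15], target = 9\n"
    "<strong>Output:</strong> [0,1]\n"
    "<strong>Explanation:</strong> nums[0] + nums[1] equals 9.</pre>"
)
values, expected = parse_example(sample)
in_text, out_text = render_test_pair(values, expected)
print(in_text, out_text, sep="")
```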

Workflow: Check and Migrate

# 1. Check current state
python -m codegen check --all

# 2. Preview migration
python -m codegen migrate --all --dry-run

# 3. Migrate with backup
python -m codegen migrate --all

# 4. Verify
python -m codegen check --all

Test Generation

Canonical Test Format

All generated test files use the JSON literal format, with one value per line:

Input File (.in):

[2,7,11,15]
9

Output File (.out):

[0,1]

Format Rules

| Type | Format | Example |
| --- | --- | --- |
| Integer | Plain number | 42 |
| Float | Plain number | 3.14 |
| Boolean | Lowercase | true, false |
| String | Quoted | "hello" |
| Array | JSON literal | [1,2,3] |
| 2D Array | JSON literal | [[1,2],[3,4]] |
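
Every row in the table is what Python's standard json module produces when asked for compact separators. The helper names to_canonical and from_canonical below are hypothetical and only illustrate the format rules.

```python
import json
from typing import Any


def to_canonical(value: Any) -> str:
    """Serialize a Python value as a compact JSON literal (no spaces)."""
    return json.dumps(value, separators=(",", ":"))


def from_canonical(line: str) -> Any:
    """Parse one line of a .in or .out file back into a Python value."""
    return json.loads(line)


for value in (42, 3.14, True, "hello", [1, 2, 3], [[1, 2], [3, 4]]):
    print(to_canonical(value))
```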

Type Support Tiers

| Tier | Types | solve() Generation |
| --- | --- | --- |
| Tier 0 | int, str, List[int], List[str] | ✅ Fully auto-generated |
| Tier 1 | List[List[int]], float | ✅ Fully auto-generated |
| Tier 2 | ListNode, TreeNode | ⚠️ Placeholder with TODOs |

IO Schema Inference

Data Flow

Question.Code (stub)
  → parse_code_stub() → StubInfo
  → infer_io_schema() → IOSchema
  → generate_solve_function() → solve() code
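
The first stage of this flow, turning a code stub into structured data, might look like the sketch below. It is not the real parse_code_stub(): the StubSketch class is a hypothetical stand-in for StubInfo, the regex handles only a single typed method, and it raises ValueError rather than the module's ParseError.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StubSketch:  # hypothetical stand-in for StubInfo
    method_name: str
    params: List[Tuple[str, str]]  # (name, type hint)
    return_type: str


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside square brackets."""
    parts: List[str] = []
    depth, current = 0, ""
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def parse_stub(code: str) -> StubSketch:
    """Extract method name, typed parameters, and return type from a stub."""
    pattern = r"def\s+(\w+)\s*\(\s*self\s*,?\s*(.*?)\)\s*->\s*([^:]+):"
    m = re.search(pattern, code, re.S)
    if m is None:
        raise ValueError("no typed method found in stub")
    name, raw_params, ret = m.groups()
    params = []
    for item in split_top_level(raw_params):
        pname, _, ptype = item.partition(":")
        params.append((pname.strip(), ptype.strip()))
    return StubSketch(name, params, ret.strip())


stub = parse_stub(
    "class Solution:\n"
    "    def twoSum(self, nums: List[int], target: int) -> List[int]:\n"
    "        "
)
print(stub)
```

A regex is used here instead of the ast module because a raw stub has an empty method body and is therefore not valid Python on its own.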

IOSchema Structure

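from dataclasses import dataclass
from typing import List, Set

# ParamSchema and ParamFormat are the module's own types (not shown here).
@dataclass
class IOSchema:
    method_name: str
    params: List[ParamSchema]  # [(name, type, format, separators)]
    return_type: str
    return_format: ParamFormat  # SCALAR, ARRAY_1D, ARRAY_2D, etc.
    needs_helpers: Set[str]     # {"ListNode", "TreeNode"}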

ParamFormat Types

| Format | Type Hints | Description |
| --- | --- | --- |
| SCALAR | int, float, bool | Single value |
| STRING | str | String value |
| ARRAY_1D | List[int], List[str] | 1D array |
| ARRAY_2D | List[List[int]] | 2D matrix |
| LINKED_LIST | Optional[ListNode] | Linked list |
| TREE | Optional[TreeNode] | Binary tree |
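
One way to picture the mapping in this table is a small classifier over type-hint strings. The sketch below is an assumption about how such a mapping could be written, not the logic in infer_io_schema(): the enum values and the function name classify_hint are hypothetical, and only the member names come from the table.

```python
from enum import Enum


class ParamFormat(Enum):
    # Member names follow the table; the string values are placeholders.
    SCALAR = "scalar"
    STRING = "string"
    ARRAY_1D = "array_1d"
    ARRAY_2D = "array_2d"
    LINKED_LIST = "linked_list"
    TREE = "tree"


def classify_hint(hint: str) -> ParamFormat:
    """Map a type-hint string to a format, checking helper types first."""
    h = hint.replace(" ", "")
    if "ListNode" in h:
        return ParamFormat.LINKED_LIST
    if "TreeNode" in h:
        return ParamFormat.TREE
    if h.startswith("List[List["):
        return ParamFormat.ARRAY_2D
    if h.startswith("List["):
        return ParamFormat.ARRAY_1D
    if h == "str":
        return ParamFormat.STRING
    if h in {"int", "float", "bool"}:
        return ParamFormat.SCALAR
    raise ValueError(f"unsupported type hint: {hint}")


for hint in ("int", "str", "List[int]", "List[List[int]]", "Optional[ListNode]"):
    print(hint, "->", classify_hint(hint).name)
```

Nested combinations such as a list of linked lists are deliberately out of scope for this sketch.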

Format Migration

Purpose

The migrator converts existing test files from legacy formats (space-separated, comma-separated) to the canonical JSON literal format.

Detected Formats

| Format | Example | Converted To |
| --- | --- | --- |
| space_sep | 1 2 3 4 | [1,2,3,4] |
| comma_sep | 1,2,3,4 | [1,2,3,4] |
| canonical | [1,2,3,4] | (no change) |
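
The detection and conversion in this table can be approximated per line as shown below. This is a sketch under stated assumptions rather than the migrator's code: detect_format and to_canonical_line are hypothetical names, a line counts as canonical whenever it parses as JSON, and legacy tokens are assumed to be JSON-parseable scalars such as numbers.

```python
import json


def detect_format(line: str) -> str:
    """Classify one line as canonical, comma_sep, or space_sep."""
    s = line.strip()
    try:
        json.loads(s)
        return "canonical"
    except json.JSONDecodeError:
        pass
    return "comma_sep" if "," in s else "space_sep"


def to_canonical_line(line: str) -> str:
    """Convert a legacy line to a compact JSON array; leave canonical lines alone."""
    s = line.strip()
    fmt = detect_format(s)
    if fmt == "canonical":
        return s
    tokens = s.split(",") if fmt == "comma_sep" else s.split()
    values = [json.loads(tok.strip()) for tok in tokens]
    return json.dumps(values, separators=(",", ":"))


for legacy in ("1 2 3 4", "1,2,3,4", "[1,2,3,4]"):
    print(detect_format(legacy), "->", to_canonical_line(legacy))
```

Note that a single scalar such as 9 already parses as JSON, so it is treated as canonical and left unchanged.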

Migration Report

============================================================
MIGRATION REPORT
============================================================
Problems processed: 45
Total files: 218
  Migrated: 93
  Skipped (already canonical): 125
  Errors: 0

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Stateless design | CodeGen has no internal state; outputs depend purely on inputs |
| Parser doesn't guess | stub_parser.py only parses; detection logic is separate |
| Centralized assembly | assemble.py handles file composition to avoid duplication |
| Inline helpers by default | Helper classes embedded in file for portability |
| No template engine | Pure Python string composition; no Jinja2 dependency |
| Reuse over regenerate | Practice skeletons reuse reference infrastructure when available |
| JSON literal format | Unambiguous, parseable, compatible with LeetCode examples |
| Tiered type support | Start with simple types, add complex types incrementally |

Design Philosophy

┌─────────────────────────────────────────────────────────┐
│  codegen = stateless      → Only generates, no state    │
│  workspace = stateful     → Manages history/restore     │
│  runner = execution       → Runs tests, no generation   │
│                                                         │
│  stub_parser: parse only  → Separation of concerns      │
│  io_schema: infer format  → Type-driven generation      │
│  helpers: centralized     → Single source of truth      │
│  assemble.py: unified     → Avoid duplication           │
└─────────────────────────────────────────────────────────┘

Configuration

Config File Location

.neetcode/codegen.toml

Configuration Options

| Setting | Default | Description |
| --- | --- | --- |
| header.level | "full" | Header detail: minimal, standard, full |
| helpers.mode | "inline" | Helper emit: inline, import, none |
| skeleton.solve_mode | "placeholder" | solve() mode: placeholder, infer, tiered |
| practice.multi_solution_mode | "single" | Practice mode: single, all |

Priority Order

CLI flag > .neetcode/codegen.toml > package defaults
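
The priority order amounts to a layered lookup. The sketch below is illustrative and does not reflect how config.py is written: the resolve function is hypothetical, and settings are shown as flat dotted keys for brevity, whereas the TOML file presumably nests them in tables. The default values are taken from the table above.

```python
from typing import Any, Dict

DEFAULTS: Dict[str, Any] = {
    "header.level": "full",
    "helpers.mode": "inline",
    "skeleton.solve_mode": "placeholder",
    "practice.multi_solution_mode": "single",
}


def resolve(key: str, cli: Dict[str, Any], file_cfg: Dict[str, Any]) -> Any:
    """Return the CLI value if set, else the file value, else the default."""
    for layer in (cli, file_cfg):
        # Assumption: a flag that was not passed is absent or None.
        if layer.get(key) is not None:
            return layer[key]
    return DEFAULTS[key]


cli_flags = {"skeleton.solve_mode": "infer", "header.level": None}
file_settings = {"skeleton.solve_mode": "tiered", "header.level": "minimal"}

print(resolve("skeleton.solve_mode", cli_flags, file_settings))
print(resolve("header.level", cli_flags, file_settings))
print(resolve("helpers.mode", cli_flags, file_settings))
```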

Failure Modes and Constraints

| Constraint | Behavior |
| --- | --- |
| Problem not found | Raises exception from leetcode_datasource |
| Reference already exists | Returns early with message (for codegen new) |
| Invalid code stub | Raises ParseError with details |
| Missing TOML config | Uses defaults |
| Example parse failure | Skips example, logs warning, continues |
| Test file exists | Skips (unless --force specified) |
| Unsupported type | Generates placeholder solve() with TODOs |
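
The "Test file exists" row describes a skip-unless-forced write. A minimal sketch of that behavior follows; the function name write_test_file, its return values, and the file name example_1.in are hypothetical and are not taken from test_generator.

```python
import tempfile
from pathlib import Path


def write_test_file(path: Path, content: str, force: bool = False) -> str:
    """Write content unless the file already exists and force is False."""
    if path.exists() and not force:
        return "skipped"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return "written"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "tests" / "example_1.in"  # hypothetical file name
    first = write_test_file(target, "[2,7,11,15]\n9\n")
    second = write_test_file(target, "changed\n")              # file exists
    third = write_test_file(target, "changed\n", force=True)   # like --force
    final_text = target.read_text(encoding="utf-8")

print(first, second, third)
```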

Exit Codes

| Code | Condition |
| --- | --- |
| 0 | Success |
| 1 | Metadata fetch failed or validation error |
| 2 | --strict-tests enabled + 0 tests generated (reserved) |

Related Documentation

| Document | Content |
| --- | --- |
| Package README | Quick reference, API details |
| Solution Contract | Output file requirements |
| LeetCode DataSource | Problem data source |
| Practice Workspace | History management |
| Test Generation Spec | Feature specification |

Appendix: Output File Structure

Reference skeleton output follows the Solution Contract:

"""
Problem: Two Sum
Link: https://leetcode.com/problems/two-sum/
...
"""
from typing import List, Optional
from _runner import get_solver

# Helper classes (if detected)
class ListNode:
    ...

# SOLUTIONS dict
SOLUTIONS = {
    "default": {
        "class": "Solution",
        "method": "twoSum",
        "complexity": "TODO: O(?)",
        "description": "TODO: describe your approach",
    },
}

# Solution class
class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        # TODO: Implement your solution
        pass

# solve() interface (auto-generated with --solve-mode infer)
def solve():
    """
    Input format (JSON literal, one per line):
        nums: List[int]
        target: int

    Output: List[int]
    """
    import sys
    import json

    data = sys.stdin.read().strip().split('\n')

    nums = json.loads(data[0].strip())
    target = int(data[1].strip())

    solver = get_solver(SOLUTIONS)
    result = solver.twoSum(nums, target)

    print(json.dumps(result, separators=(',', ':')))


if __name__ == "__main__":
    solve()

Appendix: Module Structure

codegen/
├── __init__.py              # Public API re-exports
├── __main__.py              # python -m codegen
├── cli.py                   # CLI: new / practice / check / migrate
├── checker.py               # Test consistency checker
├── analyzer.py              # Mismatch analysis and reporting
├── migrator.py              # Format migration tool
├── core/
│   ├── __init__.py
│   ├── solution_header.py   # Header rendering
│   ├── stub_parser.py       # LeetCode stub parsing
│   ├── assemble.py          # Module assembly
│   ├── config.py            # Configuration management
│   ├── io_schema.py         # IO format inference
│   ├── example_parser.py    # HTML example extraction
│   ├── solve_generator.py   # solve() auto-generation
│   ├── tiered_solve_generator.py  # Tiered solve() for Tier-1/1.5 problems
│   ├── problem_support.py   # Problem-specific config loading
│   ├── test_generator.py    # Test file generation
│   ├── catalog/             # Problem catalog utilities
│   │   └── __init__.py
│   └── helpers/
│       ├── __init__.py
│       ├── catalog.py       # Canonical helper definitions
│       ├── detect.py        # Helper detection logic
│       └── emit.py          # Helper code emission
├── reference/
│   ├── __init__.py
│   └── generator.py         # Reference skeleton generation
└── practice/
    ├── __init__.py
    ├── generator.py         # Practice skeleton generation
    └── reuse.py             # Reuse from reference

Tiered Solve Generation

For problems involving complex types (TreeNode, ListNode), use --solve-mode tiered:

python -m codegen new 104 --solve-mode tiered

This generates solve() functions with codec support for serialization/deserialization of complex types. See Problem Support Boundary for tier definitions.
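
To give a sense of what codec support means for a type such as ListNode, the sketch below converts between a JSON array and a linked list. It is an assumed illustration, not the code emitted by tiered_solve_generator.py: the function names list_to_linked and linked_to_list are hypothetical, and the ListNode class follows the usual LeetCode definition.

```python
import json
from typing import List, Optional


class ListNode:
    def __init__(self, val: int = 0, next: "Optional[ListNode]" = None):
        self.val = val
        self.next = next


def list_to_linked(values: List[int]) -> Optional[ListNode]:
    """Deserialize: build a linked list from a Python list."""
    dummy = ListNode()
    tail = dummy
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def linked_to_list(head: Optional[ListNode]) -> List[int]:
    """Serialize: walk the linked list and collect its values."""
    out: List[int] = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


# Round trip through the canonical JSON literal format.
head = list_to_linked(json.loads("[1,2,3]"))
encoded = json.dumps(linked_to_list(head), separators=(",", ":"))
print(encoded)
```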