LeetCode DataSource

Status: Canonical Reference
Scope: src/leetcode_datasource/ - LeetCode problem data layer
Related: Package README

LeetCode DataSource provides a unified data layer for accessing LeetCode problem metadata with caching, persistent storage, and pluggable network fetching.


Table of Contents

  1. Overview
  2. Scope
  3. Interfaces
  4. How It Fits in the System
  5. Architecture
  6. Typical Workflows
  7. Key Design Decisions
  8. Data Directory Strategy
  9. Configuration
  10. Failure Modes and Constraints
  11. Related Documentation

Overview

LeetCode DataSource is the data foundation for the NeetCode practice framework. It provides:

  • Clean API for accessing LeetCode problem data
  • Multi-layer caching (memory → SQLite → network)
  • Problem index for ID resolution (frontend_id ↔ slug)
  • Structured data models for consistent access

Goals

Goal                    Description
Unified Access          Single API for all problem data needs
Global Importability    from leetcode_datasource import ... anywhere in repo
No sys.path Hacks       Proper package setup via pyproject.toml
Clear Dependencies      tools/ → packages/ only, never reverse
Incremental Migration   Works alongside existing tools

Non-Goals

Non-Goal                                     Reason
❌ Replace tools/leetcode-api/ immediately   Gradual migration planned
❌ Generate solution files                   Handled by codegen
❌ Execute tests                             Handled by runner/
❌ Implement testgen                         Future work

Scope

What this module handles

  • ✅ Fetching LeetCode question metadata (title, description, examples)
  • ✅ Caching for fast repeated access
  • ✅ SQLite storage for persistence
  • ✅ ID resolution (frontend_id ↔ slug)
  • ✅ Problem index synchronization
  • ✅ Structured data models

What this module explicitly avoids

  • ❌ Solution generation (handled by codegen)
  • ❌ Test execution (handled by runner/)
  • ❌ History management (handled by practice_workspace)
  • ❌ CLI interfaces (handled by tools/)

Interfaces

High-level summary of public APIs. For complete API reference, see Package README.

Interface                        Purpose
LeetCodeDataSource               Main data source class
get_by_slug()                    Get question by URL slug
get_by_frontend_id()             Get question by problem number
sync_problem_list()              Sync problem index from LeetCode API
get_slug() / get_frontend_id()   Quick ID lookups
Question                         Question data model
ProblemInfo                      Minimal problem metadata

How It Fits in the System

┌──────────────────────────────────────────────────────────────┐
│                    Dependency Direction                      │
│                                                              │
│   ┌──────────┐         ┌──────────────────────────┐          │
│   │  tools/  │ ──────► │       src/               │          │
│   └──────────┘         │  └─ leetcode_datasource  │          │
│        │               └──────────────────────────┘          │
│        │                      │                              │
│        │                      ▼                              │
│        │               ┌──────────────┐                      │
│        └─────────────► │   runner/    │                      │
│                        └──────────────┘                      │
│                                                              │
│   ✅ tools → leetcode_datasource                             │
│   ✅ codegen → leetcode_datasource                           │
│   ❌ leetcode_datasource → tools  (FORBIDDEN)                │
│   ❌ leetcode_datasource → runner (FORBIDDEN)                │
└──────────────────────────────────────────────────────────────┘

Module Relationships

Module               Relationship
codegen              Consumer: fetches problem metadata for generation
practice_workspace   No direct dependency
runner               No direct dependency

Tool                  Relationship
tools/leetcode-api/   Legacy API layer (uses SQLite cache)
tools/docstring/      HTML parser for docstrings (consumes this package)
tools/review-code/    Docstring fixer (consumes this package)

Architecture

Module Structure

src/leetcode_datasource/
├── __init__.py              # Public API exports
├── datasource.py            # LeetCodeDataSource main class
├── config.py                # DataSourceConfig
├── exceptions.py            # Custom exceptions
│
├── models/                  # Data models
│   ├── question.py          # Question dataclass
│   ├── problem_info.py      # ProblemInfo dataclass
│   └── schema.py            # Schema versioning
│
├── storage/                 # Storage layer
│   ├── cache.py             # Memory/JSON cache
│   └── store.py             # SQLite persistent store
│
├── serialization/           # Serialization
│   └── question_serializer.py
│
└── fetchers/                # Network layer (pluggable)
    └── leetscrape_fetcher.py

Data Flow

┌──────────────────────────────────────────────────────────────┐
│                    LeetCodeDataSource                        │
│                                                              │
│   get_by_slug("two-sum")                                     │
│         │                                                    │
│         ▼                                                    │
│   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐    │
│   │   Cache     │ ──► │   Store     │ ──► │   Fetcher   │    │
│   │  (memory)   │     │  (SQLite)   │     │  (network)  │    │
│   └─────────────┘     └─────────────┘     └─────────────┘    │
│         │                   │                   │            │
│         └───────────────────┴───────────────────┘            │
│                             │                                │
│                             ▼                                │
│                      Question object                         │
└──────────────────────────────────────────────────────────────┘

Database Schema

leetcode.sqlite3
├── questions          # Full question data (~2224 rows)
│   ├── id (PK)
│   ├── qid            # frontend_question_id
│   ├── titleSlug
│   ├── title
│   ├── Body, Code, Hints...
│   └── ...
│
└── problem_index      # ID mappings (~3778 rows)
    ├── frontend_question_id (PK)
    ├── title_slug (UNIQUE)
    ├── title
    ├── difficulty
    ├── paid_only
    └── ...

Note: problem_index contains minimal metadata for all problems; questions contains full details for fetched problems only.


Typical Workflows

Workflow: Fetch Question by Slug

ds.get_by_slug("two-sum")
         │
         ▼
┌─────────────────────────────────────────────────────────┐
│  1. Check cache (memory)                                │
│     └── Hit? Return cached Question                     │
│                                                         │
│  2. Check store (SQLite)                                │
│     └── Hit? Cache it, return Question                  │
│                                                         │
│  3. Fetch from network (LeetScrape)                     │
│     └── Store to SQLite, cache it, return Question      │
└─────────────────────────────────────────────────────────┘
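
The three steps above amount to a read-through lookup. The following is a minimal sketch, not the package's implementation: TieredLookup, its dict-based cache and store, and fake_fetch are hypothetical stand-ins for the memory cache, the SQLite store, and the LeetScrape fetcher.

```python
from typing import Callable, Dict, Optional


class TieredLookup:
    """Read-through lookup: memory cache, then store, then fetcher.

    Hypothetical stand-in for illustration only; the real classes live in
    storage/cache.py, storage/store.py, and fetchers/.
    """

    def __init__(self, fetch: Callable[[str], Optional[dict]]) -> None:
        self.cache: Dict[str, dict] = {}  # stands in for the memory cache
        self.store: Dict[str, dict] = {}  # stands in for the SQLite store
        self.fetch = fetch                # stands in for the network fetcher

    def get_by_slug(self, slug: str) -> dict:
        # 1. Memory cache hit: return immediately.
        if slug in self.cache:
            return self.cache[slug]
        # 2. Store hit: promote to the cache, then return.
        if slug in self.store:
            self.cache[slug] = self.store[slug]
            return self.cache[slug]
        # 3. Miss in both layers: fetch, persist, cache, return.
        question = self.fetch(slug)
        if question is None:
            raise KeyError(f"question not found: {slug}")
        self.store[slug] = question
        self.cache[slug] = question
        return question


calls = []


def fake_fetch(slug: str) -> Optional[dict]:
    calls.append(slug)  # record every simulated network round trip
    if slug == "two-sum":
        return {"titleSlug": slug, "title": "Two Sum"}
    return None


lookup = TieredLookup(fake_fetch)
first = lookup.get_by_slug("two-sum")   # reaches the fetcher
second = lookup.get_by_slug("two-sum")  # served from the memory cache
```

In this sketch the second call never reaches the fetcher, which is the property the cache and store layers are meant to provide.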

Workflow: Resolve Frontend ID

ds.get_slug(frontend_id=1)
         │
         ▼
┌─────────────────────────────────────────────────────────┐
│  1. Query problem_index table by frontend_question_id   │
│     └── Return title_slug                               │
└─────────────────────────────────────────────────────────┘
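
A minimal sketch of this lookup against an in-memory SQLite database. The column names follow the schema shown earlier; the column types, the standalone functions, and the sample row are assumptions made for illustration, not the package's Store API.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE problem_index (
        frontend_question_id INTEGER PRIMARY KEY,
        title_slug           TEXT UNIQUE NOT NULL,
        title                TEXT,
        difficulty           TEXT,
        paid_only            INTEGER
    )
    """
)
conn.execute(
    "INSERT INTO problem_index VALUES (?, ?, ?, ?, ?)",
    (1, "two-sum", "Two Sum", "Easy", 0),
)


def get_slug(conn: sqlite3.Connection, frontend_id: int) -> Optional[str]:
    """Resolve a problem number to its URL slug, or None if unknown."""
    row = conn.execute(
        "SELECT title_slug FROM problem_index WHERE frontend_question_id = ?",
        (frontend_id,),
    ).fetchone()
    return row[0] if row else None


def get_frontend_id(conn: sqlite3.Connection, slug: str) -> Optional[int]:
    """Resolve a URL slug to its problem number, or None if unknown."""
    row = conn.execute(
        "SELECT frontend_question_id FROM problem_index WHERE title_slug = ?",
        (slug,),
    ).fetchone()
    return row[0] if row else None
```

Both directions hit an index (the primary key and the UNIQUE constraint), so neither lookup needs a table scan.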

Workflow: Sync Problem Index

ds.sync_problem_list()
         │
         ▼
┌─────────────────────────────────────────────────────────┐
│  1. Fetch from https://leetcode.com/api/problems/all/   │
│  2. UPSERT all problems to problem_index table          │
│  3. Update cache/problem_list.json (fallback)           │
│  4. Return count of problems synced                     │
└─────────────────────────────────────────────────────────┘
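
Step 2 can be sketched with SQLite's ON CONFLICT ... DO UPDATE clause (available from SQLite 3.24). This sketch skips the network call and assumes the fetched payload has already been normalized into dicts keyed by the problem_index column names; sync_rows and that dict shape are hypothetical.

```python
import sqlite3
from typing import Iterable, Mapping

UPSERT_SQL = """
INSERT INTO problem_index
    (frontend_question_id, title_slug, title, difficulty, paid_only)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT(frontend_question_id) DO UPDATE SET
    title_slug = excluded.title_slug,
    title      = excluded.title,
    difficulty = excluded.difficulty,
    paid_only  = excluded.paid_only
"""


def sync_rows(conn: sqlite3.Connection, problems: Iterable[Mapping]) -> int:
    """Insert new problems, update existing ones, return the count synced."""
    rows = [
        (
            p["frontend_question_id"],
            p["title_slug"],
            p["title"],
            p["difficulty"],
            int(p["paid_only"]),
        )
        for p in problems
    ]
    with conn:  # single transaction: all rows land or none do
        conn.executemany(UPSERT_SQL, rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE problem_index ("
    "frontend_question_id INTEGER PRIMARY KEY, title_slug TEXT UNIQUE, "
    "title TEXT, difficulty TEXT, paid_only INTEGER)"
)

two_sum = {"frontend_question_id": 1, "title_slug": "two-sum",
           "title": "Two Sum", "difficulty": "Easy", "paid_only": False}
add_two = {"frontend_question_id": 2, "title_slug": "add-two-numbers",
           "title": "Add Two Numbers", "difficulty": "Medium", "paid_only": False}

first = sync_rows(conn, [two_sum])
# A later sync re-sends problem 1 with a changed title and adds problem 2.
second = sync_rows(conn, [dict(two_sum, title="Two Sum (updated)"), add_two])
```

Re-running the sync is idempotent with respect to row count: existing IDs are updated in place rather than duplicated.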

Key Design Decisions

Decision                      Rationale
SQLite as canonical store     Single source of truth, indexed, transactional
No standalone mapping files   All ID resolution through Store API
Pluggable fetcher design      Allow future replacement without core changes
LeetScrape field names        Compatibility with existing data
Schema versioning             Forward/backward compatible data evolution
Lazy loading for Body         Large fields loaded on access only
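
To illustrate the pluggable fetcher decision, here is a minimal sketch using a structural interface. QuestionFetcher, StaticFetcher, and DataSourceSketch are hypothetical names; the package's actual fetcher contract may differ.

```python
from typing import Dict, Optional, Protocol


class QuestionFetcher(Protocol):
    """Hypothetical fetcher interface: anything with a matching fetch()."""

    def fetch(self, slug: str) -> Optional[dict]:
        ...


class StaticFetcher:
    """Serves canned data, e.g. as a test double instead of the network."""

    def __init__(self, data: Dict[str, dict]) -> None:
        self._data = data

    def fetch(self, slug: str) -> Optional[dict]:
        return self._data.get(slug)


class DataSourceSketch:
    """Depends only on the interface, never on a concrete fetcher."""

    def __init__(self, fetcher: QuestionFetcher) -> None:
        self._fetcher = fetcher

    def get_by_slug(self, slug: str) -> Optional[dict]:
        return self._fetcher.fetch(slug)


ds = DataSourceSketch(StaticFetcher({"two-sum": {"title": "Two Sum"}}))
```

Swapping the network implementation then means passing a different object to the constructor; the core class is untouched.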

Problem Index Design

No standalone mapping files are considered canonical.
All identifier resolution must go through the Store-backed problem_index.

Decision                        Rationale
❌ No standalone JSON mapping   Would bypass the Store and make consistency hard to maintain
✅ Unified in SQLite            Single source of truth, indexable
✅ Consistent API               All ID resolution through Store

Data Directory Strategy

Principle

Runtime data stays outside repo and package directories

Data Type   Location                               Git Status   Can Delete?
Cache       .neetcode/leetcode_datasource/cache/   gitignored   ✅ Yes
Store       .neetcode/leetcode_datasource/store/   optional     ⚠️ Careful

Directory Structure

.neetcode/
└── leetcode_datasource/
    ├── cache/                        # Ephemeral, rebuildable
    │   ├── problem_list.json         # Internal-only (sync fallback)
    │   └── leetcode_cache_meta.json
    │
    └── store/                        # Persistent (canonical)
        └── leetcode.sqlite3

Configuration

DataSourceConfig Options

Option             Default         Description
data_dir           Auto-detected   Root directory for data storage
cache_enabled      True            Enable memory/file cache
cache_ttl_hours    168 (1 week)    Cache time-to-live
fetch_timeout      30              Network timeout in seconds
rate_limit_delay   0.5             Delay between requests
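
The options above map naturally onto a dataclass. This is a sketch under assumptions, not the real DataSourceConfig: the class name, the use of None for "auto-detected", the field types, and the cache_ttl_seconds helper are all hypothetical, and the unit of rate_limit_delay is assumed to be seconds.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class DataSourceConfigSketch:
    """Mirrors the documented defaults; field types are assumptions."""

    data_dir: Optional[Path] = None  # None stands for "auto-detected"
    cache_enabled: bool = True
    cache_ttl_hours: int = 168       # 1 week
    fetch_timeout: int = 30          # seconds
    rate_limit_delay: float = 0.5    # assumed to be seconds

    @property
    def cache_ttl_seconds(self) -> int:
        """Hypothetical helper: TTL converted for timestamp comparisons."""
        return self.cache_ttl_hours * 3600


default_config = DataSourceConfigSketch()
short_lived = DataSourceConfigSketch(cache_ttl_hours=1, data_dir=Path("/custom"))
```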

Data Directory Resolution Priority

  1. Explicit: DataSourceConfig(data_dir=Path("/custom"))
  2. Environment: NEETCODE_DATA_DIR=/path/to/data
  3. Repo Local: .neetcode/ in repo root
  4. platformdirs: ~/.local/share/neetcode/ (Linux)
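
The priority order above can be sketched as a single function. Everything here is an assumption except the order itself and the NEETCODE_DATA_DIR variable: resolve_data_dir and its parameters are hypothetical, the repo root is passed in rather than detected, the repo-local step is assumed to apply only when .neetcode/ already exists, and the last step hard-codes the Linux path from the list instead of calling platformdirs.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(
    explicit: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
    repo_root: Optional[Path] = None,
) -> Path:
    """Return the data directory using the documented priority order."""
    env = os.environ if env is None else env

    # 1. Explicit: a value passed by the caller always wins.
    if explicit is not None:
        return Path(explicit)

    # 2. Environment: NEETCODE_DATA_DIR, if set and non-empty.
    env_dir = env.get("NEETCODE_DATA_DIR")
    if env_dir:
        return Path(env_dir)

    # 3. Repo local: .neetcode/ under the repo root, if it exists.
    if repo_root is not None and (repo_root / ".neetcode").is_dir():
        return repo_root / ".neetcode"

    # 4. Fallback: the Linux per-user data path listed above.
    return Path.home() / ".local" / "share" / "neetcode"
```

Passing the environment as a mapping keeps the function easy to test without mutating os.environ.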

Failure Modes and Constraints

Constraint                             Behavior
Question not found (cache + network)   Raises QuestionNotFoundError
Network failure                        Raises NetworkError
Data parsing failure                   Raises ParseError
Configuration error                    Raises ConfigError
Cache error                            Non-fatal (logged, not raised)

Exception Hierarchy

LeetCodeDataSourceError (base)
├── QuestionNotFoundError
├── NetworkError
├── ParseError
└── ConfigError
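
A sketch of how this hierarchy looks in Python and how a caller might use it. The class names match the tree above, but the bodies are placeholders (the real definitions live in exceptions.py and may carry extra fields); title_or_none and the two lookup functions are hypothetical.

```python
class LeetCodeDataSourceError(Exception):
    """Base class: catching this catches every error below."""


class QuestionNotFoundError(LeetCodeDataSourceError):
    """No question matches the requested slug or ID."""


class NetworkError(LeetCodeDataSourceError):
    """The network fetch failed."""


class ParseError(LeetCodeDataSourceError):
    """Fetched data could not be parsed."""


class ConfigError(LeetCodeDataSourceError):
    """The configuration is invalid."""


def title_or_none(lookup, slug):
    """Treat a missing question as None; let other failures propagate."""
    try:
        return lookup(slug)["title"]
    except QuestionNotFoundError:
        return None


def always_missing(slug):
    raise QuestionNotFoundError(slug)


def always_offline(slug):
    raise NetworkError("simulated outage")
```

Callers that only need a coarse distinction can catch LeetCodeDataSourceError once instead of listing each subclass.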

Related Documentation

Document                Content
Package README          Quick reference, API details
CodeGen Spec            Consumer of this data
Architecture Overview   System architecture

Appendix: Data Model

Question Fields

Field                  Type        Description
frontend_question_id   int         Problem number (1, 2, 922...)
titleSlug              str         URL slug ("two-sum")
title                  str         Display title ("Two Sum")
difficulty             str         "Easy", "Medium", "Hard"
Body                   str         HTML problem description
Code                   str         Code template/stubs
Hints                  List[str]   Hint strings
topicTags              str         Comma-separated tags
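
Because topicTags is stored as a comma-separated string, consumers typically need to split it. The sketch below assumes a plain dataclass with the fields listed above; QuestionSketch, its defaults, and the tags() helper are hypothetical and not part of the documented model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuestionSketch:
    """Field names follow the table above; defaults are assumptions."""

    frontend_question_id: int
    titleSlug: str
    title: str
    difficulty: str
    Body: str = ""
    Code: str = ""
    Hints: List[str] = field(default_factory=list)
    topicTags: str = ""

    def tags(self) -> List[str]:
        """Hypothetical helper: split topicTags into a clean list."""
        return [t.strip() for t in self.topicTags.split(",") if t.strip()]


q = QuestionSketch(
    frontend_question_id=1,
    titleSlug="two-sum",
    title="Two Sum",
    difficulty="Easy",
    topicTags="array, hash-table",
)
```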

ProblemInfo Fields

Field                  Type   Description
frontend_question_id   int    Problem number
title_slug             str    URL slug
title                  str    Display title
difficulty             str    Difficulty level
paid_only              bool   Premium flag