LeetCode DataSource

Status: Canonical Reference
Scope: src/leetcode_datasource/ - LeetCode problem data layer
Related: Package README

LeetCode DataSource provides a unified data layer for accessing LeetCode problem metadata with caching, persistent storage, and pluggable network fetching.

Table of Contents

- Overview
- Scope
- Interfaces
- How It Fits in the System
- Architecture
- Typical Workflows
- Key Design Decisions
- Data Directory Strategy
- Configuration
- Failure Modes and Constraints
- Related Documentation

Overview

LeetCode DataSource is the data foundation for the NeetCode practice framework. It provides:

- Clean API for accessing LeetCode problem data
- Multi-layer caching (memory → SQLite → network)
- Problem index for ID resolution (frontend_id ↔ slug)
- Structured data models for consistent access

Goals

| Goal | Description |
| --- | --- |
| Unified Access | Single API for all problem data needs |
| Global Importability | from leetcode_datasource import ... anywhere in repo |
| No sys.path Hacks | Proper package setup via pyproject.toml |
| Clear Dependencies | tools/ → packages/ only, never the reverse |
| Incremental Migration | Works alongside existing tools |

Non-Goals

| Non-Goal | Reason |
| --- | --- |
| ❌ Replace tools/leetcode-api/ immediately | Gradual migration planned |
| ❌ Generate solution files | Handled by codegen |
| ❌ Execute tests | Handled by runner/ |
| ❌ Implement testgen | Future work |

Scope

What this module handles

- ✅ Fetching LeetCode question metadata (title, description, examples)
- ✅ Caching for fast repeated access
- ✅ SQLite storage for persistence
- ✅ ID resolution (frontend_id ↔ slug)
- ✅ Problem index synchronization
- ✅ Structured data models

What this module explicitly avoids

- ❌ Solution generation (handled by codegen)
- ❌ Test execution (handled by runner/)
- ❌ History management (handled by practice_workspace)
- ❌ CLI interfaces (handled by tools/)

Interfaces

High-level summary of public APIs. For the complete API reference, see the Package README.

| Interface | Purpose |
| --- | --- |
| LeetCodeDataSource | Main data source class |
| get_by_slug() | Get question by URL slug |
| get_by_frontend_id() | Get question by problem number |
| sync_problem_list() | Sync problem index from LeetCode API |
| get_slug() / get_frontend_id() | Quick ID lookups |
| Question | Question data model |
| ProblemInfo | Minimal problem metadata |

How It Fits in the System

┌─────────────────────────────────────────────────────────────┐
│                    Dependency Direction                     │
│                                                             │
│   ┌──────────┐         ┌──────────────────────────┐         │
│   │  tools/  │ ──────► │  src/                    │         │
│   └──────────┘         │  └─ leetcode_datasource  │         │
│        │               └──────────────────────────┘         │
│        │                      │                             │
│        │                      ▼                             │
│        │               ┌──────────────┐                     │
│        └─────────────► │   runner/    │                     │
│                        └──────────────┘                     │
└─────────────────────────────────────────────────────────────┘

- ✅ tools → leetcode_datasource
- ✅ codegen → leetcode_datasource
- ❌ leetcode_datasource → tools (FORBIDDEN)
- ❌ leetcode_datasource → runner (FORBIDDEN)

Module Relationships

| Module | Relationship |
| --- | --- |
| codegen | Used by - Fetches problem metadata for generation |
| practice_workspace | No direct dependency |
| runner | No direct dependency |

| Tool | Relationship |
| --- | --- |
| tools/leetcode-api/ | Legacy API layer (uses SQLite cache) |
| tools/docstring/ | HTML parser for docstrings (consumes this package) |
| tools/review-code/ | Docstring fixer (consumes this package) |

Architecture

Module Structure

src/leetcode_datasource/
├── __init__.py              # Public API exports
├── datasource.py            # LeetCodeDataSource main class
├── config.py                # DataSourceConfig
├── exceptions.py            # Custom exceptions
│
├── models/                  # Data models
│   ├── question.py          # Question dataclass
│   ├── problem_info.py      # ProblemInfo dataclass
│   └── schema.py            # Schema versioning
│
├── storage/                 # Storage layer
│   ├── cache.py             # Memory/JSON cache
│   └── store.py             # SQLite persistent store
│
├── serialization/           # Serialization
│   └── question_serializer.py
│
└── fetchers/                # Network layer (pluggable)
    └── leetscrape_fetcher.py

Data Flow

┌─────────────────────────────────────────────────────────────┐
│                     LeetCodeDataSource                      │
│                                                             │
│  get_by_slug("two-sum")                                     │
│         │                                                   │
│         ▼                                                   │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐    │
│  │    Cache    │ ──► │    Store    │ ──► │   Fetcher   │    │
│  │  (memory)   │     │  (SQLite)   │     │  (network)  │    │
│  └─────────────┘     └─────────────┘     └─────────────┘    │
│         │                   │                   │           │
│         └───────────────────┴───────────────────┘           │
│                             │                               │
│                             ▼                               │
│                      Question object                        │
└─────────────────────────────────────────────────────────────┘

Database Schema

leetcode.sqlite3
├── questions                    # Full question data (~2224 rows)
│   ├── id (PK)
│   ├── qid                      # frontend_question_id
│   ├── titleSlug
│   ├── title
│   ├── Body, Code, Hints...
│   └── ...
│
└── problem_index                # ID mappings (~3778 rows)
    ├── frontend_question_id (PK)
    ├── title_slug (UNIQUE)
    ├── title
    ├── difficulty
    ├── paid_only
    └── ...

Note: problem_index contains minimal metadata for all problems; questions contains full details for fetched problems only.

Typical Workflows

Workflow: Fetch Question by Slug

ds.get_by_slug("two-sum")
         │
         ▼
┌─────────────────────────────────────────────────────────┐
│  1. Check cache (memory)                                │
│     └── Hit? Return cached Question                     │
│                                                         │
│  2. Check store (SQLite)                                │
│     └── Hit? Cache it, return Question                  │
│                                                         │
│  3. Fetch from network (LeetScrape)                     │
│     └── Store to SQLite, cache it, return Question      │
└─────────────────────────────────────────────────────────┘
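
The three steps above amount to a read-through lookup. The following is a minimal sketch of that pattern, not the package's actual implementation: the class name `ReadThroughSource`, the simplified two-column table, and the `fetch` callable are all assumptions made for illustration, and a plain `dict` stands in for the `Question` model.

```python
import json
import sqlite3
from typing import Callable, Dict, Optional


class ReadThroughSource:
    """Illustrative three-tier lookup: memory cache -> SQLite store -> fetcher."""

    def __init__(self, conn: sqlite3.Connection,
                 fetch: Callable[[str], Optional[dict]]):
        self._cache: Dict[str, dict] = {}  # tier 1: in-process memory
        self._conn = conn                  # tier 2: SQLite
        self._fetch = fetch                # tier 3: pluggable fetcher (hypothetical signature)
        # Simplified stand-in table, NOT the real `questions` schema.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS questions "
            "(titleSlug TEXT PRIMARY KEY, payload TEXT)"
        )

    def get_by_slug(self, slug: str) -> dict:
        # 1. Memory cache hit: return immediately.
        if slug in self._cache:
            return self._cache[slug]

        # 2. Store hit: promote to the cache, then return.
        row = self._conn.execute(
            "SELECT payload FROM questions WHERE titleSlug = ?", (slug,)
        ).fetchone()
        if row is not None:
            question = json.loads(row[0])
            self._cache[slug] = question
            return question

        # 3. Miss in both tiers: fetch, persist, cache, return.
        question = self._fetch(slug)
        if question is None:
            # This document lists QuestionNotFoundError for this case;
            # KeyError is used here only to keep the sketch standalone.
            raise KeyError(slug)
        with self._conn:  # commits on success
            self._conn.execute(
                "INSERT OR REPLACE INTO questions (titleSlug, payload) VALUES (?, ?)",
                (slug, json.dumps(question)),
            )
        self._cache[slug] = question
        return question
```

With this shape, a second call for the same slug never reaches the fetcher, and a fresh process that shares the SQLite file is served from the store rather than the network.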

Workflow: Resolve Frontend ID

ds.get_slug(frontend_id=1)
         │
         ▼
┌─────────────────────────────────────────────────────────┐
│  1. Query problem_index table by frontend_question_id   │
│     └── Return title_slug                               │
└─────────────────────────────────────────────────────────┘
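
As a rough illustration of that lookup, the sketch below runs the query against an in-memory database. The column names come from the problem_index listing in this document; the column types, the `SCHEMA` constant, and the free-function form of `get_slug` / `get_frontend_id` are assumptions (the package exposes these as methods).

```python
import sqlite3
from typing import Optional

# Column names follow this document; the column types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS problem_index (
    frontend_question_id INTEGER PRIMARY KEY,
    title_slug           TEXT NOT NULL UNIQUE,
    title                TEXT,
    difficulty           TEXT,
    paid_only            INTEGER
)
"""


def get_slug(conn: sqlite3.Connection, frontend_id: int) -> Optional[str]:
    """Resolve a problem number to its URL slug, or None if unknown."""
    row = conn.execute(
        "SELECT title_slug FROM problem_index WHERE frontend_question_id = ?",
        (frontend_id,),
    ).fetchone()
    return row[0] if row else None


def get_frontend_id(conn: sqlite3.Connection, slug: str) -> Optional[int]:
    """Resolve a URL slug back to its problem number, or None if unknown."""
    row = conn.execute(
        "SELECT frontend_question_id FROM problem_index WHERE title_slug = ?",
        (slug,),
    ).fetchone()
    return row[0] if row else None
```

Both directions are served by an index: the primary key covers lookups by number, and the UNIQUE constraint on title_slug covers lookups by slug.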

Workflow: Sync Problem Index

ds.sync_problem_list()
         │
         ▼
┌─────────────────────────────────────────────────────────┐
│  1. Fetch from https://leetcode.com/api/problems/all/   │
│  2. UPSERT all problems to problem_index table          │
│  3. Update cache/problem_list.json (fallback)           │
│  4. Return count of problems synced                     │
└─────────────────────────────────────────────────────────┘
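
Steps 2 and 4 can be sketched with SQLite's `INSERT ... ON CONFLICT DO UPDATE` clause (available in SQLite 3.24 and later). This is an illustration only: the function name `sync_problem_index`, the column types, and the assumption that the API response has already been normalized into dicts keyed by column name are all hypothetical. The network fetch (step 1) and the JSON fallback file (step 3) are left out.

```python
import sqlite3
from typing import Iterable, Mapping

# Column names follow this document; the column types are assumptions.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS problem_index (
    frontend_question_id INTEGER PRIMARY KEY,
    title_slug           TEXT NOT NULL UNIQUE,
    title                TEXT,
    difficulty           TEXT,
    paid_only            INTEGER
)
"""

UPSERT_SQL = """
INSERT INTO problem_index
    (frontend_question_id, title_slug, title, difficulty, paid_only)
VALUES
    (:frontend_question_id, :title_slug, :title, :difficulty, :paid_only)
ON CONFLICT(frontend_question_id) DO UPDATE SET
    title_slug = excluded.title_slug,
    title      = excluded.title,
    difficulty = excluded.difficulty,
    paid_only  = excluded.paid_only
"""


def sync_problem_index(conn: sqlite3.Connection,
                       problems: Iterable[Mapping[str, object]]) -> int:
    """Insert new problems, update existing ones, return the number processed."""
    rows = [dict(p) for p in problems]
    conn.execute(CREATE_SQL)
    with conn:  # one transaction: all rows land, or none do
        conn.executemany(UPSERT_SQL, rows)
    return len(rows)
```

Running the sync twice with overlapping data leaves one row per problem number, with the later values winning, so re-syncing is safe to repeat.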

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| SQLite as canonical store | Single source of truth, indexed, transactional |
| No standalone mapping files | All ID resolution goes through the Store API |
| Pluggable fetcher design | Allows future replacement without core changes |
| LeetScrape field names | Compatibility with existing data |
| Schema versioning | Forward/backward compatible data evolution |
| Lazy loading for Body | Large fields are loaded only on access |

Problem Index Design

No standalone mapping files are considered canonical. All identifier resolution must go through the Store-backed problem_index.

| Decision | Rationale |
| --- | --- |
| ❌ No standalone JSON mapping | Standalone files invite bypassing the Store and are hard to keep consistent |
| ✅ Unified in SQLite | Single source of truth, indexable |
| ✅ Consistent API | All ID resolution goes through the Store |

Data Directory Strategy

Principle: Runtime data stays outside repo and package directories.

| Data Type | Location | Git Status | Can Delete? |
| --- | --- | --- | --- |
| Cache | .neetcode/leetcode_datasource/cache/ | gitignored | ✅ Yes |
| Store | .neetcode/leetcode_datasource/store/ | optional | ⚠️ Careful |

Directory Structure

.neetcode/
└── leetcode_datasource/
    ├── cache/                        # Ephemeral, rebuildable
    │   ├── problem_list.json         # Internal-only (sync fallback)
    │   └── leetcode_cache_meta.json
    │
    └── store/                        # Persistent (canonical)
        └── leetcode.sqlite3

Configuration

DataSourceConfig Options

| Option | Default | Description |
| --- | --- | --- |
| data_dir | Auto-detected | Root directory for data storage |
| cache_enabled | True | Enable memory/file cache |
| cache_ttl_hours | 168 (1 week) | Cache time-to-live |
| fetch_timeout | 30 | Network timeout in seconds |
| rate_limit_delay | 0.5 | Delay between requests |

Data Directory Resolution Priority

1. Explicit: DataSourceConfig(data_dir=Path("/custom"))
2. Environment: NEETCODE_DATA_DIR=/path/to/data
3. Repo Local: .neetcode/ in repo root
4. platformdirs: ~/.local/share/neetcode/ (Linux)

Failure Modes and Constraints

| Constraint | Behavior |
| --- | --- |
| Question not found (cache + network) | Raises QuestionNotFoundError |
| Network failure | Raises NetworkError |
| Data parsing failure | Raises ParseError |
| Configuration error | Raises ConfigError |
| Cache error | Non-fatal (logged, not raised) |
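
Returning to the data directory resolution priority listed under Configuration, the four-level fallback can be sketched as a single function. This is an assumption-laden illustration: `resolve_data_dir` and its parameters are hypothetical names, the caller is assumed to supply the repo root rather than having it auto-detected, and the final fallback hardcodes the Linux path shown in this document in place of a real platformdirs call.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(
    explicit: Optional[Path] = None,
    env: Mapping[str, str] = os.environ,
    repo_root: Optional[Path] = None,
) -> Path:
    """Pick the data directory using the documented priority order."""
    # 1. Explicit configuration always wins.
    if explicit is not None:
        return Path(explicit)

    # 2. Environment variable override.
    env_value = env.get("NEETCODE_DATA_DIR")
    if env_value:
        return Path(env_value)

    # 3. Repo-local directory (assumes the caller already knows the repo root).
    if repo_root is not None:
        return Path(repo_root) / ".neetcode"

    # 4. Per-user fallback. Stand-in for platformdirs: this is the Linux
    #    location named in this document, hardcoded for illustration.
    return Path.home() / ".local" / "share" / "neetcode"
```

Passing `env` as a parameter keeps the function easy to test without mutating the real process environment.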

Exception Hierarchy

LeetCodeDataSourceError (base)
├── QuestionNotFoundError
├── NetworkError
├── ParseError
└── ConfigError
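
A hierarchy like this lets callers catch one specific failure or everything at once. The sketch below redeclares the classes locally so it runs on its own; in the package they live in exceptions.py, and their constructors may differ. The helper `load_or_none` and its `loader` argument are hypothetical.

```python
from typing import Callable, Optional


class LeetCodeDataSourceError(Exception):
    """Base class for all data source errors."""


class QuestionNotFoundError(LeetCodeDataSourceError):
    """The question is in neither the local tiers nor the network."""


class NetworkError(LeetCodeDataSourceError):
    """The network fetch failed."""


class ParseError(LeetCodeDataSourceError):
    """Fetched data could not be parsed."""


class ConfigError(LeetCodeDataSourceError):
    """The configuration is invalid."""


def load_or_none(loader: Callable[[str], dict], slug: str) -> Optional[dict]:
    """Treat 'not found' as an expected outcome; let other failures propagate."""
    try:
        return loader(slug)
    except QuestionNotFoundError:
        return None
```

Code that only needs a pass/fail answer can catch `LeetCodeDataSourceError` instead, which covers all four subclasses.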

Appendix: Data Model

Question Fields

| Field | Type | Description |
| --- | --- | --- |
| frontend_question_id | int | Problem number (1, 2, 922...) |
| titleSlug | str | URL slug ("two-sum") |
| title | str | Display title ("Two Sum") |
| difficulty | str | "Easy", "Medium", "Hard" |
| Body | str | HTML problem description |
| Code | str | Code template/stubs |
| Hints | List[str] | Hint strings |
| topicTags | str | Comma-separated tags |

ProblemInfo Fields

| Field | Type | Description |
| --- | --- | --- |
| frontend_question_id | int | Problem number |
| title_slug | str | URL slug |
| title | str | Display title |
| difficulty | str | Difficulty level |
| paid_only | bool | Premium flag |
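
For orientation, the two field tables map onto dataclasses roughly as follows. Field names and types are taken from the tables; the default values and the `topic_list()` helper are assumptions added for illustration, and the lazy loading of Body mentioned under Key Design Decisions is not modeled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProblemInfo:
    """Minimal metadata, one row of problem_index."""
    frontend_question_id: int
    title_slug: str
    title: str
    difficulty: str
    paid_only: bool


@dataclass
class Question:
    """Full question data; field names keep the LeetScrape spelling."""
    frontend_question_id: int
    titleSlug: str
    title: str
    difficulty: str
    Body: str = ""                                   # default is an assumption
    Code: str = ""                                   # default is an assumption
    Hints: List[str] = field(default_factory=list)
    topicTags: str = ""                              # comma-separated tags

    def topic_list(self) -> List[str]:
        """Hypothetical helper: split topicTags into a clean list."""
        return [t.strip() for t in self.topicTags.split(",") if t.strip()]
```

Note the naming difference between the two models: ProblemInfo uses title_slug, while Question uses titleSlug.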
January 9, 2026 19:05:52 December 31, 2025 11:05:44