Docstring Formatter
Status: Canonical Reference
Scope: Docstring formatter module in tools/docstring/
Last Updated: December 28, 2025 20:39:00
Created: December 28, 2025 19:58:33
Purpose: Transform LeetCode Question objects into structured docstring specifications
Location: tools/docstring/formatter.py
This module extracts and normalizes docstring-relevant content from LeetCode problems, including descriptions, examples, constraints, topics, hints, and follow-up questions. It provides a unified API for generating file-level docstrings according to the project specification.
Features
- HTML Parsing: Extracts structured content from LeetCode HTML problem descriptions
- Multiple Formats: Handles both new and legacy LeetCode HTML formats
- Complete Data: Extracts all docstring components (description, examples, constraints, topics, hints, follow-ups, notes)
- Normalized Output: Produces a consistent, spec-aligned format for docstring generation
- Cache Integration: Uses the SQLite-backed cache from `tools/leetcode-api` for performance
- Backward Compatible: Maintains the legacy API for existing code
Quick Start
Basic Usage
from tools.docstring.formatter import get_full_docstring_data
# Get all docstring data for a problem
data = get_full_docstring_data("two-sum")
# Access structured components
print(data['title']) # "Two Sum"
print(data['description']) # List of description lines
print(data['examples']) # List of example dicts
print(data['constraints']) # List of constraint lines
print(data['topics']) # "Array, Hash Table"
print(data['hints']) # ["Hint 1: ...", "Hint 2: ..."]
print(data['follow_ups']) # ["Follow-up question text"]
print(data['note']) # Optional note string or None
Legacy API (Backward Compatible)
from tools.docstring.formatter import get_description_and_constraints
# Get only description and constraints
desc, constraints = get_description_and_constraints("two-sum")
Direct Question Access
from tools.docstring.formatter import get_question_data
# Get complete Question object
question = get_question_data("two-sum")
print(question.title)
print(question.difficulty)
print(question.Body) # Raw HTML
API Reference
get_full_docstring_data(slug: str) -> dict
Get all data needed for generating a complete file-level docstring.
Parameters:
- slug (str): LeetCode problem slug (e.g., "two-sum")
Returns: Dictionary with the following structure:
{
'title': str, # Problem title (e.g., "Two Sum")
'url': str, # LeetCode problem URL
'description': List[str], # List of description lines (plain text)
'examples': List[dict], # List of example dictionaries
'constraints': List[str], # List of constraint lines (each starts with "- ")
'topics': str, # Formatted topics (e.g., "Array, Hash Table")
'hints': List[str], # Formatted hints (e.g., ["Hint 1: ...", "Hint 2: ..."])
'follow_ups': List[str], # List of follow-up question strings
'note': Optional[str], # Note text or None
}
Example:
data = get_full_docstring_data("two-sum")
# Example structure
example = data['examples'][0]
# {
# 'number': 1,
# 'img': None or '<img src="...">',
# 'input': 'nums = [2,7,11,15], target = 9',
# 'output': '[0,1]',
# 'explanation': 'Because nums[0] + nums[1] == 9, we return [0, 1].'
# }
Returns empty structure if question not found:
{
'title': '',
'url': '',
'description': [],
'examples': [],
'constraints': [],
'topics': '',
'hints': [],
'follow_ups': [],
'note': None,
}
get_description_and_constraints(slug: str) -> Tuple[List[str], List[str]]
Legacy backward-compatible API for fetching description and constraints only.
Parameters:
- slug (str): LeetCode problem slug (e.g., "combinations")
Returns: Tuple of (description_lines, constraint_lines) where:
- description_lines: List of description strings (plain text, no examples/constraints)
- constraint_lines: List of constraint strings (each starts with "- ")
Example:
desc, constraints = get_description_and_constraints("two-sum")
# desc: ["Given an array of integers nums...", "You may assume that..."]
# constraints: ["- 2 <= nums.length <= 10^4", "- -10^9 <= nums[i] <= 10^9", ...]
Returns: ([], []) if question not found or has no Body
get_question_data(slug: str, force_refresh: bool = False) -> Optional[Question]
Get complete Question object for direct access to all question data.
Parameters:
- slug (str): LeetCode problem slug (e.g., "two-sum")
- force_refresh (bool): If True, bypass cache and fetch fresh data from network
Returns: Question object or None if not found
Question Object Attributes:
| Attribute | Type | Description |
|---|---|---|
| `QID` | int | Question ID |
| `title` | str | Problem title |
| `titleSlug` | str | URL slug |
| `difficulty` | str | "Easy", "Medium", or "Hard" |
| `topicTags` | str | Comma-separated tags (e.g., "array,hash-table") |
| `Body` | str | HTML problem description |
| `Hints` | List[str] | List of hint strings |
| `Code` | str | Code template |
| `SimilarQuestions` | List[int] | Related question IDs |
Example:
q = get_question_data("two-sum")
if q:
    print(q.title) # "Two Sum"
    print(q.difficulty) # "Easy"
    print(q.topicTags) # "array,hash-table"
    print(q.Body) # HTML content
    print(q.Hints) # ["hint1", "hint2"]
Data Format Details
Description
- Format: List of plain text lines
- Content: Problem statement only (excludes Examples, Constraints, Follow-up, Note sections)
- Stops at: First occurrence of "Example", "Constraints", "Follow-up", "Note", or "Custom Judge"
Examples
- Format: List of dictionaries with keys: `number`, `img`, `input`, `output`, `explanation`
- Number: Example number (1, 2, 3, ...)
- Image: Preserved `<img>` tag string or `None`
- Input/Output: Plain text (HTML tags removed)
- Explanation: Plain text, may span multiple lines
- Supports: Both new format (`<pre>` blocks) and old format (separate `<p>` elements)
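As a rough illustration of the new format, the text of a `<pre>` block (after tags are removed) can be split into the dictionary keys listed above. This is a minimal sketch, not the module's `_extract_examples()` code: the function name `parse_example_text_sketch` and the regular expression are hypothetical, `img` is always left as `None` here, and returning `None` for a missing explanation is an assumption.

```python
import re
from typing import Any, Dict, Optional

# Labels are assumed to appear in the order Input, Output, Explanation.
_EXAMPLE_RE = re.compile(
    r"Input:\s*(?P<input>.*?)\s*"
    r"Output:\s*(?P<output>.*?)\s*"
    r"(?:Explanation:\s*(?P<explanation>.*))?$",
    re.DOTALL,
)


def parse_example_text_sketch(number: int, text: str) -> Optional[Dict[str, Any]]:
    """Split plain example text into the documented example keys."""
    match = _EXAMPLE_RE.search(text.strip())
    if match is None:
        return None  # text does not look like an example block
    return {
        "number": number,
        "img": None,  # image handling is out of scope for this sketch
        "input": match.group("input"),
        "output": match.group("output"),
        "explanation": match.group("explanation"),  # None when absent
    }


sample_text = (
    "Input: nums = [2,7,11,15], target = 9\n"
    "Output: [0,1]\n"
    "Explanation: Because nums[0] + nums[1] == 9, we return [0, 1]."
)
print(parse_example_text_sketch(1, sample_text))
```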
Constraints
- Format: List of strings, each starting with "- "
- Source: Extracted from `<ul><li>` tags in HTML
- Superscript: `<sup>n</sup>` converted to `^n` notation
- Example: `["- 2 <= nums.length <= 10^4", "- -10^9 <= nums[i] <= 10^9"]`
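The constraint rules above can be approximated with a few regular expressions. The sketch below is illustrative only and is not taken from `_extract_constraints()`; the function name `extract_constraints_sketch` and the sample HTML are hypothetical.

```python
import html
import re
from typing import List


def extract_constraints_sketch(constraints_html: str) -> List[str]:
    """Turn <li> items into "- ..." lines, converting <sup>n</sup> to ^n."""
    lines = []
    items = re.findall(
        r"<li[^>]*>(.*?)</li>", constraints_html, flags=re.DOTALL | re.IGNORECASE
    )
    for item in items:
        # Convert superscripts before the remaining tags are dropped.
        item = re.sub(r"<sup>(.*?)</sup>", r"^\1", item, flags=re.DOTALL | re.IGNORECASE)
        item = re.sub(r"<[^>]+>", "", item)  # remove remaining tags
        item = html.unescape(item)           # &lt; becomes <
        item = " ".join(item.split())        # collapse whitespace
        if item:
            lines.append(f"- {item}")
    return lines


sample_html = (
    "<ul>"
    "<li><code>2 &lt;= nums.length &lt;= 10<sup>4</sup></code></li>"
    "<li><code>-10<sup>9</sup> &lt;= nums[i] &lt;= 10<sup>9</sup></code></li>"
    "</ul>"
)
print(extract_constraints_sketch(sample_html))
```

Tags are stripped before entities are unescaped so that a decoded `<` in a constraint is never mistaken for the start of a tag.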
Topics
- Format: Comma-separated string with capitalized words
- Input: `"array,hash-table"` (from `Question.topicTags`)
- Output: `"Array, Hash Table"`
- Transformation: Hyphens replaced with spaces, words capitalized
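The topic transformation is small enough to sketch directly. This is a minimal illustration of the input/output pair above, not the actual `_format_topics()` implementation; the name `format_topics_sketch` is hypothetical.

```python
def format_topics_sketch(topic_tags: str) -> str:
    """Turn "array,hash-table" into "Array, Hash Table" (illustrative only)."""
    topics = []
    for tag in topic_tags.split(","):
        tag = tag.strip()
        if not tag:
            continue  # skip empty entries such as a trailing comma
        # Replace hyphens with spaces, then capitalize each word.
        words = tag.replace("-", " ").split()
        topics.append(" ".join(word.capitalize() for word in words))
    return ", ".join(topics)


print(format_topics_sketch("array,hash-table"))
```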
Hints
- Format: List of strings with numbered format
- Input: `["Try a hash table", "A really brute force way..."]`
- Output: `["Hint 1: Try a hash table", "Hint 2: A really brute force way..."]`
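Hint numbering amounts to a 1-based enumeration. A minimal sketch, assuming hints arrive as plain strings; the name `format_hints_sketch` is hypothetical and the real `_format_hints()` may differ:

```python
from typing import List


def format_hints_sketch(hints: List[str]) -> List[str]:
    """Prefix each hint with "Hint N: ", counting from 1."""
    return [f"Hint {i}: {hint.strip()}" for i, hint in enumerate(hints, start=1)]


print(format_hints_sketch(["Try a hash table", "A really brute force way..."]))
```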
Follow-ups
- Format: List of plain text strings
- Handles: Multiple HTML formats (with/without `<p>` tags, with )
- Content: Full follow-up question text (HTML tags removed)
Note
- Format: Optional string or `None`
- Content: Note section text (if present in problem)
Architecture
tools/docstring/
└── formatter.py                              # Main module
    ├── Public API
    │   ├── get_full_docstring_data()         # Complete docstring data
    │   ├── get_description_and_constraints() # Legacy API
    │   └── get_question_data()               # Direct Question access
    │
    └── Internal Extractors
        ├── _extract_brief_description()      # Description only
        ├── _extract_examples()               # All examples
        ├── _extract_constraints()            # Constraints list
        ├── _extract_follow_up()              # Follow-up questions
        ├── _extract_note()                   # Note section
        ├── _format_topics()                  # Topic formatting
        ├── _format_hints()                   # Hint formatting
        └── _extract_text_from_html()         # HTML → text utility
Data Flow
┌─────────────────────────────────────────────────────────────┐
│                get_full_docstring_data(slug)                │
│                              │                              │
│                              ▼                              │
│           get_question(slug)  [from leetcode-api]           │
│                              │                              │
│                              ▼                              │
│                       Question Object                       │
│             (with Body, Hints, topicTags, etc.)             │
│                              │                              │
│            ┌─────────────────┼─────────────────┐            │
│            ▼                 ▼                 ▼            │
│      _extract_*()    _format_topics()   _format_hints()     │
│            │                 │                 │            │
│            └─────────────────┼─────────────────┘            │
│                              ▼                              │
│                    Structured Dictionary                    │
│    {title, url, description, examples, constraints, ...}    │
└─────────────────────────────────────────────────────────────┘
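The flow in the diagram can be sketched end to end with a stand-in for the Question object. Everything here is an assumption made for illustration: `QuestionStub`, `empty_docstring_data`, and `build_docstring_data_sketch` are hypothetical names, the URL pattern is inferred from the sample output elsewhere in this README, and the Body-derived fields are left empty rather than parsed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class QuestionStub:
    """Hypothetical stand-in for the leetcode-api Question object."""
    title: str
    titleSlug: str
    topicTags: str = ""
    Body: str = ""
    Hints: List[str] = field(default_factory=list)


def empty_docstring_data() -> Dict[str, Any]:
    """The documented empty structure, returned when no question is found."""
    return {
        "title": "", "url": "", "description": [], "examples": [],
        "constraints": [], "topics": "", "hints": [], "follow_ups": [],
        "note": None,
    }


def build_docstring_data_sketch(question: Optional[QuestionStub]) -> Dict[str, Any]:
    data = empty_docstring_data()
    if question is None:
        return data  # missing question: hand back the empty structure
    data["title"] = question.title
    data["url"] = f"https://leetcode.com/problems/{question.titleSlug}/"
    data["topics"] = ", ".join(
        tag.replace("-", " ").title() for tag in question.topicTags.split(",") if tag
    )
    data["hints"] = [f"Hint {i}: {h}" for i, h in enumerate(question.Hints, start=1)]
    # description, examples, constraints, follow_ups and note would come from
    # parsing question.Body; that step is omitted in this sketch.
    return data


stub = QuestionStub("Two Sum", "two-sum", "array,hash-table", Hints=["Try a hash table"])
print(build_docstring_data_sketch(stub))
```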
Dependencies
- `tools/leetcode-api`: Provides `get_question()` and the `Question` object
- Standard Library: `re`, `html`, `typing`, `pathlib`
Usage Examples
Generate Complete Docstring Data
from tools.docstring.formatter import get_full_docstring_data
data = get_full_docstring_data("two-sum")
# Build docstring from components
docstring = f"""
{data['title']}
{chr(10).join(data['description'])}
Examples:
"""
for ex in data['examples']:
    docstring += f"""
Example {ex['number']}:
Input: {ex['input']}
Output: {ex['output']}
Explanation: {ex['explanation']}
"""
docstring += f"""
Constraints:
{chr(10).join(data['constraints'])}
Topics: {data['topics']}
"""
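A variant of the same idea, written as a function that skips sections with no data, may be easier to reuse. This is a sketch under assumptions: `render_docstring_sketch` is a hypothetical helper, the layout is illustrative rather than the project specification, and it only relies on the dictionary keys documented in the API reference.

```python
from typing import Any, Dict, List


def render_docstring_sketch(data: Dict[str, Any]) -> str:
    """Assemble docstring text, omitting sections that have no content."""
    lines: List[str] = [data["title"], ""]
    lines.extend(data["description"])
    for ex in data["examples"]:
        lines.append("")
        lines.append(f"Example {ex['number']}:")
        lines.append(f"    Input: {ex['input']}")
        lines.append(f"    Output: {ex['output']}")
        if ex.get("explanation"):  # skip a missing or empty explanation
            lines.append(f"    Explanation: {ex['explanation']}")
    if data["constraints"]:
        lines.append("")
        lines.append("Constraints:")
        lines.extend(data["constraints"])
    if data["topics"]:
        lines.append("")
        lines.append(f"Topics: {data['topics']}")
    return "\n".join(lines)


sample_data = {
    "title": "Two Sum",
    "description": ["Given an array of integers nums..."],
    "examples": [
        {"number": 1, "img": None, "input": "nums = [2,7,11,15], target = 9",
         "output": "[0,1]", "explanation": None},
    ],
    "constraints": ["- 2 <= nums.length <= 10^4"],
    "topics": "Array, Hash Table",
}
print(render_docstring_sketch(sample_data))
```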
Extract Only Description and Constraints
from tools.docstring.formatter import get_description_and_constraints
desc, constraints = get_description_and_constraints("two-sum")
print("Description:")
for line in desc:
    print(f" {line}")
print("\nConstraints:")
for constraint in constraints:
    print(f" {constraint}")
Access Raw Question Data
from tools.docstring.formatter import get_question_data
# Get from cache (default)
q = get_question_data("two-sum")
# Force refresh from network
q = get_question_data("two-sum", force_refresh=True)
if q:
    # Access raw HTML
    print(q.Body)
    # Access all metadata
    print(f"ID: {q.QID}")
    print(f"Difficulty: {q.difficulty}")
    print(f"Acceptance: {q.acceptanceRate}%")
    print(f"Similar: {q.SimilarQuestions}")
Interactive Testing
Run the module directly (tools/docstring/formatter.py) for interactive testing, then enter a problem slug when prompted to see both API outputs.
HTML Parsing Details
Supported Formats
The module handles multiple LeetCode HTML formats:
- New Format (example in a `<pre>` block)
- Old Format (separate elements)
HTML Cleaning
- Removes: `<script>`, `<style>`, all HTML tags
- Preserves: Text content, `<img>` tags (in examples)
- Converts: `<sup>n</sup>` → `^n` (for constraints)
- Normalizes: Multiple spaces/newlines → single space/double newline
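The cleaning rules can be approximated with the standard library alone. The sketch below is not the module's `_extract_text_from_html()`; the function name `html_to_text_sketch`, the exact regular expressions, and the order of steps are assumptions, and `<img>` preservation is left out.

```python
import html
import re


def html_to_text_sketch(raw_html: str) -> str:
    """Strip scripts, styles and tags, then normalize whitespace."""
    # Drop <script> and <style> elements together with their contents.
    text = re.sub(
        r"<(script|style)\b[^>]*>.*?</\1>", "", raw_html,
        flags=re.DOTALL | re.IGNORECASE,
    )
    # Convert superscripts to caret notation before removing other tags.
    text = re.sub(r"<sup>(.*?)</sup>", r"^\1", text, flags=re.DOTALL | re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", text)      # remove remaining tags
    text = html.unescape(text)               # decode entities such as &lt;
    text = re.sub(r"[ \t\xa0]+", " ", text)  # runs of spaces become one space
    text = re.sub(r"\n\s*\n", "\n\n", text)  # runs of blank lines become one
    return text.strip()


sample = (
    "<p>Given   an array.</p>\n\n\n"
    "<p>Return 10<sup>4</sup>.</p><script>x()</script>"
)
print(html_to_text_sketch(sample))
```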
Stop Keywords
Description extraction stops at:
- "example" (case-insensitive)
- "constraints:"
- "follow-up:" or "follow-up"
- "note:"
- "custom judge"
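One way to apply these keywords is to walk the plain-text lines and stop at the first line that begins with any of them. This is a sketch under assumptions: the name `take_description_lines_sketch` is hypothetical, and matching at the start of a line (rather than anywhere in the text) is a guess about how the module detects section headers.

```python
from typing import List

# Lowercase prefixes taken from the stop keyword list.
STOP_PREFIXES = ("example", "constraints:", "follow-up", "note:", "custom judge")


def take_description_lines_sketch(lines: List[str]) -> List[str]:
    """Keep non-empty lines until the first line that starts a later section."""
    description = []
    for line in lines:
        stripped = line.strip()
        if stripped.lower().startswith(STOP_PREFIXES):
            break  # everything from here on belongs to another section
        if stripped:
            description.append(stripped)
    return description


text_lines = [
    "Given an array of integers nums and an integer target.",
    "",
    "You may assume that each input has exactly one solution.",
    "Example 1:",
    "Input: nums = [2,7,11,15], target = 9",
    "Constraints:",
]
print(take_description_lines_sketch(text_lines))
```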
Integration with Other Tools
fix_docstring.py
The tools/review-code/fix_docstring.py tool uses this module to generate docstrings:
from tools.docstring.formatter import get_full_docstring_data
data = get_full_docstring_data(slug)
# ... format into docstring and write to file
leetcode-api Module
This module depends on tools/leetcode-api for:
- Caching: SQLite-backed cache reduces network requests
- Unified API: Same Question interface whether from cache or network
- Performance: Fast batch operations using local cache
See leetcode-api README for details.
Error Handling
Question Not Found
All functions handle missing questions gracefully:
# Returns empty structure
data = get_full_docstring_data("non-existent-slug")
# data = {'title': '', 'url': '', 'description': [], ...}
# Returns empty tuple
desc, constraints = get_description_and_constraints("non-existent-slug")
# desc = [], constraints = []
# Returns None
q = get_question_data("non-existent-slug")
# q = None
Missing Data
- Empty Body: Returns empty lists/None for all components
- No Examples: `examples = []`
- No Constraints: `constraints = []`
- No Hints: `hints = []`
- No Follow-ups: `follow_ups = []`
- No Note: `note = None`
Related Documentation
- LeetCode API - SQLite cache and Question API
- Review Code Tool - File-level docstring generator
- Main Tools README - Tools overview
Module Structure
tools/docstring/
├── formatter.py              # Main module (503 lines)
│   ├── Public Functions
│   │   ├── get_full_docstring_data()
│   │   ├── get_description_and_constraints()
│   │   └── get_question_data()
│   │
│   └── Internal Functions
│       ├── _extract_text_from_html()
│       ├── _extract_brief_description()
│       ├── _extract_constraints()
│       ├── _extract_examples()
│       ├── _extract_follow_up()
│       ├── _extract_note()
│       ├── _format_topics()
│       └── _format_hints()
│
└── README.md                 # This file
Testing
Run the module interactively and enter a problem slug to test extraction. A sample session:
Enter LeetCode problem slug (e.g. two-sum): two-sum
=== Using get_description_and_constraints (backward-compatible) ===
Description:
Given an array of integers nums and an integer target...
...
=== Using get_full_docstring_data (new API) ===
Title: Two Sum
URL: https://leetcode.com/problems/two-sum/
Examples: 2
Example 1:
Input: nums = [2,7,11,15], target = 9
Output: [0,1]
...
Notes
- HTML Format Changes: LeetCode occasionally updates its HTML structure; the module handles both old and new formats
- Performance: Uses the SQLite cache from `leetcode-api` for fast repeated access
- Offline Support: Works with cached data without a network connection
- Backward Compatibility: `get_description_and_constraints()` is maintained for legacy code