TechCyclopedia
High-Fidelity Agent Data Pipeline

An intelligent MCP server that provides advanced web crawling capabilities with interactive configuration, live progress tracking, and background task management for technical documentation.


60-80% Token Reduction

500+ Max Pages

5 Crawl Depth

TechCyclopedia Server

    # Start crawling documentation
    await crawl_tech_docs(
        urls=["https://docs.python.org/"],
        output_dir="results",
    )
    # ✅ Crawled 47 pages successfully

Powerful Features

TechCyclopedia provides comprehensive web crawling capabilities with intelligent content processing

Interactive Configuration

Smart user preference system with persistent storage and "don't ask again" functionality

Live Progress Tracking

Real-time updates on crawling progress with detailed status and background task management

Organized Output

Clean, structured file organization by domain with intelligent content filtering

Content Filtering

Automatic boilerplate removal and content optimization for clean, LLM-ready markdown

Persistent Preferences

SQLite-based user preference storage with customizable crawl strategies

Deep Crawling

BFS strategy for comprehensive documentation extraction with configurable depth limits
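
The breadth-first strategy with a depth limit can be illustrated with a minimal sketch. This is not the project's crawler: `bfs_crawl`, `get_links`, and the in-memory `SITE` graph are hypothetical names, and the defaults of 5 and 500 simply echo the figures quoted on this page, which are assumed (not confirmed) to be the real limits.

```python
from collections import deque
from typing import Callable, Iterable


def bfs_crawl(
    start_urls: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 5,    # assumed default, taken from the "5 Crawl Depth" figure
    max_pages: int = 500,  # assumed default, taken from the "500+ Max Pages" figure
) -> list[str]:
    """Visit pages breadth-first, stopping at max_depth or max_pages."""
    visited: list[str] = []
    seen: set[str] = set()
    queue: deque[tuple[str, int]] = deque()
    for url in start_urls:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # pages at the depth limit are visited but not expanded
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical link graph standing in for real HTTP fetches and link extraction.
SITE = {
    "/": ["/tutorial", "/library"],
    "/tutorial": ["/tutorial/intro", "/"],
    "/library": ["/library/os"],
    "/tutorial/intro": [],
    "/library/os": ["/library/os/path"],
    "/library/os/path": [],
}

pages = bfs_crawl(["/"], lambda url: SITE.get(url, []), max_depth=2)
print(pages)
# ['/', '/tutorial', '/library', '/tutorial/intro', '/library/os']
```

Because the queue is first-in, first-out, every page at depth 1 is visited before any page at depth 2, and `/library/os/path` (depth 3) is never reached when `max_depth=2`.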

How TechCyclopedia Works

An intelligent workflow that transforms web documentation into clean, organized content

User Request

User provides URLs through MCP client

Interactive Config

Smart configuration with user preferences

Deep Crawling

BFS strategy with intelligent link discovery

Content Processing

Boilerplate removal and content optimization

Organized Output

Clean markdown files organized by domain
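
The last two steps above can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions, not the project's implementation: the `BOILERPLATE_TAGS` set, `strip_boilerplate`, and `output_path` are hypothetical, the sketch emits plain text rather than markdown, and it uses the full hostname as the domain directory (the page's own demo shows `results/python/`, whose derivation is not stated).

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed set of tags treated as boilerplate; the real filter may differ.
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class TextExtractor(HTMLParser):
    """Collect text that is not nested inside a boilerplate tag."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def strip_boilerplate(html: str) -> str:
    """Return the page text with navigation, footers, scripts, etc. removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def output_path(url: str, output_dir: str = "results") -> str:
    """Map a page URL to <output_dir>/<hostname>/<page>.md (assumed layout)."""
    parsed = urlparse(url)
    stem = PurePosixPath(parsed.path).stem or "index"
    return str(PurePosixPath(output_dir) / parsed.netloc / f"{stem}.md")


sample = (
    "<html><nav>Home | Docs</nav><main><h1>Tutorial</h1>"
    "<p>Use the REPL.</p></main><footer>Copyright</footer></html>"
)
print(strip_boilerplate(sample))
print(output_path("https://docs.python.org/3/tutorial.html"))
# results/docs.python.org/tutorial.md
```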

User → MCP Client → TechCyclopedia → Crawler → Markdown

Live Progress Tracking (Active)

Crawling https://docs.python.org/3/: 75%
Processing tutorial.html: 100%
Saving to results/python/tutorial.md: 90%

Quick Start

Get up and running with TechCyclopedia in minutes

Install Dependencies

pip install -r requirements.txt

Start the Server

python server/server.py

Configure MCP Client

{
  "mcpServers": {
    "techcyclopedia": {
      "command": "python",
      "args": ["server/server.py"]
    }
  }
}

TechCyclopedia Server

$ python server/server.py

TechCyclopedia: High-Fidelity Agent Data Pipeline

Server running on stdio transport...

Ready for MCP tool calls!


MCP Tools

TechCyclopedia exposes five MCP tools for crawling technical documentation

crawl_tech_docs

Interactive

Crawl technical documentation with interactive configuration and progress tracking

Parameters: urls (List[str]), output_dir (str), user_id (str)

start_background_crawl

Background

Start a background crawling task that keeps running while you continue to chat

Parameters: urls (List[str]), output_dir (str)

check_task_status

Monitor

Check the status of a background crawling task with detailed progress information

Parameters: task_id (str)

get_all_tasks

Management

Get all crawling tasks and their status information
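
How the three task tools could fit together is sketched below with `asyncio` and an in-memory registry. The function names mirror the tool names on this page, but the bodies, the `TASKS` record layout, and the `status` / `progress` fields are assumptions made for illustration; the real server may track tasks differently.

```python
import asyncio
import uuid

# In-memory registry: task_id -> status record (assumed structure).
TASKS: dict[str, dict] = {}
# Keep references to running asyncio tasks so they are not garbage collected.
_HANDLES: dict[str, asyncio.Task] = {}


async def _run_crawl(task_id: str, urls: list[str]) -> None:
    record = TASKS[task_id]
    record["status"] = "running"
    for done, _url in enumerate(urls, start=1):
        await asyncio.sleep(0)  # placeholder for fetching and processing one page
        record["progress"] = round(100 * done / len(urls))
    record["status"] = "completed"


async def start_background_crawl(urls: list[str], output_dir: str = "results") -> str:
    """Schedule a crawl and return its task id without waiting for it."""
    task_id = uuid.uuid4().hex
    TASKS[task_id] = {"status": "pending", "progress": 0, "output_dir": output_dir}
    _HANDLES[task_id] = asyncio.create_task(_run_crawl(task_id, urls))
    return task_id


def check_task_status(task_id: str) -> dict:
    """Return a copy of one task's record, or an 'unknown' marker."""
    return dict(TASKS.get(task_id, {"status": "unknown"}))


def get_all_tasks() -> dict:
    """Return a copy of every task record, keyed by task id."""
    return {task_id: dict(record) for task_id, record in TASKS.items()}


async def demo() -> dict:
    task_id = await start_background_crawl(
        ["https://docs.python.org/3/", "https://docs.python.org/3/tutorial/"]
    )
    before = check_task_status(task_id)  # the crawl has not started running yet
    await _HANDLES[task_id]              # wait for the background crawl to finish
    after = check_task_status(task_id)
    return {"before": before, "after": after}


result = asyncio.run(demo())
print(result["before"]["status"], "->", result["after"]["status"])
# pending -> completed
```

Returning the task id immediately is what lets the caller keep working and poll for status later.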

set_dont_ask_again

Preferences

Set the 'don't ask again' flag for user preferences

Parameters: user_id (str)
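
A "don't ask again" flag backed by SQLite might look like the sketch below. The table name, column names, and the `open_prefs` and `should_prompt` helpers are hypothetical, and the extra `conn` argument is an artifact of the sketch; only the `user_id` parameter comes from the tool description above.

```python
import sqlite3


def open_prefs(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the preferences database. Schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS preferences (
               user_id        TEXT PRIMARY KEY,
               max_depth      INTEGER NOT NULL DEFAULT 5,
               dont_ask_again INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def set_dont_ask_again(conn: sqlite3.Connection, user_id: str, value: bool = True) -> None:
    """Create the user's row if needed, then store the flag."""
    conn.execute("INSERT OR IGNORE INTO preferences (user_id) VALUES (?)", (user_id,))
    conn.execute(
        "UPDATE preferences SET dont_ask_again = ? WHERE user_id = ?",
        (int(value), user_id),
    )
    conn.commit()


def should_prompt(conn: sqlite3.Connection, user_id: str) -> bool:
    """Prompt unless the user has a stored 'don't ask again' flag."""
    row = conn.execute(
        "SELECT dont_ask_again FROM preferences WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row is None or row[0] == 0


conn = open_prefs()
print(should_prompt(conn, "alice"))  # True
set_dont_ask_again(conn, "alice")
print(should_prompt(conn, "alice"))  # False
```

Passing a file path instead of `":memory:"` would make the preference persist across server restarts.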

Open Source & Community

TechCyclopedia is open source and welcomes contributions from the community. Join us in building the future of intelligent documentation crawling.


NoManNayeem / TechCyclopedia

High-Fidelity Agent Data Pipeline for Technical Documentation Crawling

Topics: python, mcp, crawling, documentation