Parse, chunk, search, and summarize documents through a single MCP interface. Built for Claude, compatible with any MCP host.
A complete document analysis pipeline exposed as MCP tools. From raw file parsing to semantic search, every capability is one tool call away.
Extract structured text from PDF, DOCX, and plaintext files while preserving metadata. Page boundaries, word counts, and authorship are extracted automatically.
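One common way to structure per-format extraction is a registry that maps file suffixes to parser functions, with results cached by path. The sketch below is a hypothetical illustration, not this server's code: `parse_file`, `register`, and `ParsedDocument` are invented names, only a plaintext parser is implemented (PDF and DOCX would need third-party libraries), and treating form feeds as page breaks is an assumption.

```python
from dataclasses import dataclass, field
from functools import lru_cache
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class ParsedDocument:
    text: str
    pages: List[str] = field(default_factory=list)
    word_count: int = 0


# Hypothetical registry: lowercase file suffix -> parser function.
PARSERS: Dict[str, Callable[[Path], ParsedDocument]] = {}


def register(*suffixes: str):
    """Decorator that registers a parser for one or more file suffixes."""
    def wrap(fn: Callable[[Path], ParsedDocument]) -> Callable[[Path], ParsedDocument]:
        for suffix in suffixes:
            PARSERS[suffix.lower()] = fn
        return fn
    return wrap


@register(".txt", ".md")
def parse_text(path: Path) -> ParsedDocument:
    text = path.read_text(encoding="utf-8")
    # Assumption: form feed characters mark page boundaries in plaintext.
    pages = text.split("\f")
    return ParsedDocument(text=text, pages=pages, word_count=len(text.split()))


@lru_cache(maxsize=128)
def parse_file(path_str: str) -> ParsedDocument:
    """Dispatch on file suffix; repeated calls for the same path hit the cache."""
    path = Path(path_str)
    parser = PARSERS.get(path.suffix.lower())
    if parser is None:
        raise ValueError(f"unsupported file type: {path.suffix!r}")
    return parser(path)
```

Supporting a new format then means registering one more function. Note that caching by path alone returns stale results if a file changes on disk; keying on modification time as well would avoid that.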
Split documents into overlapping chunks optimized for RAG retrieval. Configurable size and overlap with page-aware boundary detection.
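A minimal sketch of overlapping sliding-window chunking that records which pages each chunk spans. It assumes `chunk_size` and `overlap` are counted in words; the server may measure them in characters or tokens. It also tracks page spans rather than reproducing the server's own boundary-detection logic. `Chunk` and `chunk_pages` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    start_page: int  # 1-based page on which the chunk starts
    end_page: int    # 1-based page on which the chunk ends


def chunk_pages(pages: List[str], chunk_size: int = 200, overlap: int = 40) -> List[Chunk]:
    """Split page texts into word windows that share `overlap` words with their neighbor."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    # Flatten to (word, page_number) pairs so every chunk knows its page span.
    words = [(w, n) for n, page in enumerate(pages, start=1) for w in page.split()]
    step = chunk_size - overlap
    chunks: List[Chunk] = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(Chunk(
            text=" ".join(w for w, _ in window),
            start_page=window[0][1],
            end_page=window[-1][1],
        ))
        if start + chunk_size >= len(words):
            break  # this window already reached the last word
    return chunks
```

With `chunk_size=4` and `overlap=2`, the pages `["a b c d", "e f g"]` yield the chunks `"a b c d"`, `"c d e f"`, and `"e f g"`. The middle chunk spans pages 1 to 2, which lets a retrieval result cite both pages.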
Index chunks with Cohere Embed v3 and search by meaning, not keywords. Cosine similarity ranking with document-scoped filtering.
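Cosine similarity ranking with document-scoped filtering can be sketched in plain Python, as below. This is an illustration only: the toy two-dimensional vectors stand in for the embeddings the server obtains from Cohere Embed v3, no Cohere API call is shown, and `IndexedChunk` and `search` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class IndexedChunk:
    doc_id: str
    text: str
    embedding: List[float]  # in the real pipeline, a vector from an embedding model


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    index: Sequence[IndexedChunk],
    query_embedding: Sequence[float],
    top_k: int = 3,
    doc_id: Optional[str] = None,
) -> List[Tuple[float, IndexedChunk]]:
    """Rank chunks by similarity, optionally restricted to a single document."""
    candidates = [c for c in index if doc_id is None or c.doc_id == doc_id]
    scored = [(cosine(query_embedding, c.embedding), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Filtering before scoring means a document-scoped query never spends work on chunks from other documents. A linear scan like this suits small collections; larger indexes typically use a vector store.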
Generate brief, standard, or detailed summaries using Cohere Command R+, with an extractive fallback when the API is unavailable.
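The source does not say how the extractive fallback chooses sentences. One simple, commonly used approach is word-frequency scoring: keep the highest-scoring sentences in their original order. The sketch below assumes that approach. The mapping from summary length to sentence count and the name `extractive_summary` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical mapping from the three summary lengths to sentence counts.
SENTENCES_PER_LENGTH = {"brief": 2, "standard": 4, "detailed": 8}


def _tokens(text: str) -> list:
    return re.findall(r"[a-z']+", text.lower())


def extractive_summary(text: str, length: str = "standard") -> str:
    """Pick the highest-scoring sentences by average word frequency, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    n = SENTENCES_PER_LENGTH[length]
    if len(sentences) <= n:
        return " ".join(sentences)
    freq = Counter(_tokens(text))

    def score(sentence: str) -> float:
        tokens = _tokens(sentence)
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    # Stable sort: among tied scores, earlier sentences win.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n]))
```

For example, `extractive_summary("Cats purr. Cats and cats sleep. Dogs bark.", "brief")` keeps the two sentences about cats. "cats" is the most frequent word, so both of those sentences outscore "Dogs bark."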
Title, author, page count, word count, file type, and size. Everything you need for document management and filtering.
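The metadata fields listed above fit naturally in a small record type. The sketch below fills that record for a plaintext file only. Its heuristics are assumptions, not the server's behavior: the title is the first non-empty line, authorship is left empty, and form feeds count as page breaks. `DocumentMetadata` and `text_file_metadata` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DocumentMetadata:
    title: Optional[str]
    author: Optional[str]
    page_count: int
    word_count: int
    file_type: str
    size_bytes: int


def text_file_metadata(path: str) -> DocumentMetadata:
    """Collect metadata for a plaintext file (hypothetical helper)."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    # Plaintext has no title field: use the first non-empty line, else the file name.
    first_line = next((ln.strip() for ln in text.splitlines() if ln.strip()), None)
    return DocumentMetadata(
        title=first_line or p.stem,
        author=None,  # plaintext carries no authorship metadata
        page_count=text.count("\f") + 1,  # assumption: form feeds separate pages
        word_count=len(text.split()),
        file_type=p.suffix.lstrip(".").lower(),
        size_bytes=p.stat().st_size,
    )
```

For PDF and DOCX files, the same fields would come from the formats' own document properties rather than from these heuristics.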
Full Model Context Protocol compliance: JSON-RPC over stdio, with tool definitions, error handling, and resource exposure.
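MCP's stdio transport exchanges JSON-RPC 2.0 messages as newline-delimited JSON: one message per line on stdin and stdout. The sketch below shows only that framing and the request/notification/error dispatch, using standard JSON-RPC error codes. It omits the MCP `initialize` handshake and is neither this server's implementation nor the official MCP Python SDK. `handle_message` and `serve` are hypothetical names.

```python
import json
import sys
from typing import Any, Callable, Dict, Optional, TextIO

# Standard JSON-RPC 2.0 error codes.
PARSE_ERROR = -32700
INVALID_REQUEST = -32600
METHOD_NOT_FOUND = -32601
INTERNAL_ERROR = -32603

Handler = Callable[[Dict[str, Any]], Any]


def _error(msg_id: Any, code: int, message: str) -> Dict[str, Any]:
    return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": code, "message": message}}


def handle_message(raw: str, methods: Dict[str, Handler]) -> Optional[Dict[str, Any]]:
    """Turn one incoming line into a response dict, or None for notifications."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return _error(None, PARSE_ERROR, "Parse error")
    if not isinstance(msg, dict):
        return _error(None, INVALID_REQUEST, "Invalid request")
    if "id" not in msg:
        return None  # notifications never receive a response
    method = msg.get("method")
    handler = methods.get(method) if isinstance(method, str) else None
    if handler is None:
        return _error(msg["id"], METHOD_NOT_FOUND, "Method not found")
    try:
        result = handler(msg.get("params") or {})
    except Exception as exc:
        return _error(msg["id"], INTERNAL_ERROR, str(exc))
    return {"jsonrpc": "2.0", "id": msg["id"], "result": result}


def serve(methods: Dict[str, Handler], stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read newline-delimited JSON-RPC from stdin and write responses to stdout."""
    for line in stdin:
        if not line.strip():
            continue
        response = handle_message(line, methods)
        if response is not None:
            # json.dumps without indent emits a single line, matching the stdio framing.
            stdout.write(json.dumps(response) + "\n")
            stdout.flush()
```

Because stdout carries the protocol stream, any logging from a stdio server belongs on stderr.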
A clean three-layer architecture connecting MCP clients to document intelligence.
Claude Desktop, IDE extension, or any MCP-compatible host
Stdio-based communication with tool discovery and invocation
Routes tool calls to extract_text, chunk_document, search_chunks, summarize_document, get_metadata
PDFParser, DocxParser, and TextParser: extensible format support with caching
Cohere Embed v3 for semantic search, Cohere Command R+ for summarization
Each tool is discoverable via the MCP tools/list method and callable via tools/call.
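In MCP, a `tools/list` result carries each tool's `name`, `description`, and JSON Schema `inputSchema`. A `tools/call` request names a tool and passes `arguments`, and the result returns `content` items plus an `isError` flag. The sketch below shows that routing for `get_metadata` only; the other four tools would register the same way. The handler is a placeholder stub, and its input schema (a single `path` string) is an assumption, not this server's actual definition.

```python
import json
from typing import Any, Dict


def get_metadata_stub(args: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: a real handler would parse the file and return its metadata.
    return {"path": args["path"]}


# Hypothetical registry: tool name -> description, input schema, and handler.
TOOLS: Dict[str, Dict[str, Any]] = {
    "get_metadata": {
        "description": "Return title, author, page count, word count, file type, and size.",
        "inputSchema": {  # assumed schema: one required file path
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
        "handler": get_metadata_stub,
    },
}


def tools_list(params: Dict[str, Any]) -> Dict[str, Any]:
    """Advertise tools without exposing the internal handler objects."""
    return {"tools": [
        {"name": name, "description": spec["description"], "inputSchema": spec["inputSchema"]}
        for name, spec in TOOLS.items()
    ]}


def tools_call(params: Dict[str, Any]) -> Dict[str, Any]:
    spec = TOOLS.get(params.get("name"))
    if spec is None:
        # Unknown tools are protocol errors; the JSON-RPC layer should report this.
        raise ValueError(f"Unknown tool: {params.get('name')}")
    try:
        result = spec["handler"](params.get("arguments") or {})
    except Exception as exc:
        # Failures inside a tool are reported in-band with isError set.
        return {"content": [{"type": "text", "text": str(exc)}], "isError": True}
    return {"content": [{"type": "text", "text": json.dumps(result)}], "isError": False}
```

Keeping the schema next to the handler means `tools/list` and `tools/call` cannot drift apart. Both functions can be plugged into a JSON-RPC dispatcher under the method names `tools/list` and `tools/call`.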
# Clone and set up
git clone https://github.com/BabyChrist666/mcp-document-server.git
cd mcp-document-server
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Set your Cohere API key
export COHERE_API_KEY="your-key-here"

# Run tests
pytest tests/ -v  # 50 tests passing
Clone the repo, add your Cohere key, and connect to Claude in under a minute.