Model Context Protocol

Document intelligence for every AI agent

Parse, chunk, search, and summarize documents through a single MCP interface. Built for Claude, compatible with any MCP host.

5 MCP Tools
3 File Formats
50 Tests Passing
<1s Parse Latency

Capabilities

Everything your agent needs to understand documents

A complete document analysis pipeline exposed as MCP tools. From raw file parsing to semantic search, every capability is one tool call away.

📄

Multi-Format Parsing

Extract structured text from PDF, DOCX, and plaintext files while preserving metadata. Page boundaries, word counts, and authorship are extracted automatically.

🧩

Intelligent Chunking

Split documents into overlapping chunks optimized for RAG retrieval. Configurable size and overlap with page-aware boundary detection.
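
As a rough illustration of overlapping chunking, here is a minimal sketch. It assumes word-based sizes and uses a hypothetical `chunk_text` helper with hypothetical `chunk_size` and `overlap` parameters; the server's actual chunker and its page-aware boundary detection are not shown in this page and may work differently.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks; each chunk repeats the last
    `overlap` words of the previous one. Hypothetical helper, not the
    server's own implementation."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the document.
        if start + chunk_size >= len(words):
            break
    return chunks


chunks = chunk_text("a b c d e f g h i j", chunk_size=4, overlap=1)
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of indexing more text.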

🔍

Semantic Search

Index chunks with Cohere Embed v3 and search by meaning, not keywords. Cosine similarity ranking with document-scoped filtering.
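
The ranking step can be sketched without calling any embedding API. The example below assumes embeddings have already been computed (toy two-dimensional vectors stand in for real Cohere embeddings), and the `IndexedChunk` and `rank_chunks` names are hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexedChunk:
    """Hypothetical record for one indexed chunk."""
    document_id: str
    text: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(
    query_embedding: list[float],
    chunks: list[IndexedChunk],
    document_id: Optional[str] = None,
    top_k: int = 3,
) -> list[tuple[float, IndexedChunk]]:
    """Score chunks against the query, optionally restricted to one document."""
    candidates = [
        c for c in chunks if document_id is None or c.document_id == document_id
    ]
    scored = [(cosine_similarity(query_embedding, c.embedding), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings.
index = [
    IndexedChunk("a", "x", [1.0, 0.0]),
    IndexedChunk("a", "y", [0.0, 1.0]),
    IndexedChunk("b", "z", [1.0, 0.1]),
]
top_two = rank_chunks([1.0, 0.0], index, top_k=2)
only_doc_a = rank_chunks([1.0, 0.0], index, document_id="a")
```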

📝

Summarization

Generate brief, standard, or detailed summaries using Cohere Command R+. Falls back to extractive summarization when the Cohere API is unavailable.
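
This page does not say how the extractive fallback selects sentences. One common approach, shown here purely as an assumption, is to score each sentence by the average frequency of its words and keep the top few in their original order. The `extractive_summary` function is hypothetical.

```python
import re
from collections import Counter


def extractive_summary(text: str, max_sentences: int = 2) -> str:
    """Pick the highest-scoring sentences by average word frequency.
    Illustrative only; the server's real fallback may differ."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= max_sentences:
        return " ".join(sentences)

    frequencies = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(frequencies[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


summary = extractive_summary("Cats sleep. Cats purr and cats sleep. Dogs bark.", max_sentences=1)
```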

🏷

Metadata Extraction

Title, author, page count, word count, file type, and size. Everything you need for document management and filtering.

🔗

MCP Native

Full Model Context Protocol compliance. JSON-RPC over stdio with proper tool definitions, error handling, and resource exposure.

System Design

How it works

A clean three-layer architecture connecting MCP clients to document intelligence.

🤖

MCP Client

Claude Desktop, IDE extension, or any MCP-compatible host

JSON-RPC Transport

Stdio-based communication with tool discovery and invocation

🔧

Tool Router

Routes tool calls to extract_text, chunk_document, search_chunks, summarize_document, get_metadata
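
A router of this kind is often just a dispatch table from tool name to an async handler. The sketch below is an assumption about the general shape, not the project's code: the handler body is a placeholder, the `file_path` argument name is hypothetical, and the error dictionary is a simplified stand-in for a real MCP error result.

```python
import asyncio
from typing import Any, Awaitable, Callable

Handler = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]


async def get_metadata(arguments: dict[str, Any]) -> dict[str, Any]:
    # Placeholder handler: a real one would open and inspect the file.
    return {"file_path": arguments["file_path"], "note": "placeholder result"}


# Only one tool is wired up here; the other four would be registered the same way.
HANDLERS: dict[str, Handler] = {
    "get_metadata": get_metadata,
}


async def route_tool_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Look up the handler for `name` and await it, or report an unknown tool."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"isError": True, "message": f"Unknown tool: {name}"}
    return await handler(arguments)


result = asyncio.run(route_tool_call("get_metadata", {"file_path": "report.pdf"}))
missing = asyncio.run(route_tool_call("no_such_tool", {}))
```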

📚

Parser Registry

PDFParser, DocxParser, and TextParser: extensible format support with caching
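
To make the registry idea concrete, here is a minimal sketch that maps file extensions to parser callables and caches results by path and modification time. The `ParserRegistry` class below and its cache key are assumptions for illustration; the project's own registry interface is not documented on this page.

```python
from pathlib import Path
from typing import Callable, Union

Parser = Callable[[Path], str]


class ParserRegistry:
    """Hypothetical registry: extension -> parser, with a simple result cache."""

    def __init__(self) -> None:
        self._parsers: dict[str, Parser] = {}
        self._cache: dict[tuple[str, float], str] = {}

    def register(self, extension: str, parser: Parser) -> None:
        self._parsers[extension.lower()] = parser

    def parse(self, path: Union[str, Path]) -> str:
        path = Path(path)
        parser = self._parsers.get(path.suffix.lower())
        if parser is None:
            raise ValueError(f"No parser registered for '{path.suffix}'")
        # Key on modification time so an edited file is parsed again.
        key = (str(path.resolve()), path.stat().st_mtime)
        if key not in self._cache:
            self._cache[key] = parser(path)
        return self._cache[key]


def parse_plaintext(path: Path) -> str:
    return path.read_text(encoding="utf-8")


registry = ParserRegistry()
registry.register(".txt", parse_plaintext)
```

Supporting a new format then amounts to one more `register` call with a suitable parser function.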

🧠

Cohere AI Layer

Embed v3 for semantic search, Command R+ for summarization

API Reference

Available tools

Each tool is discoverable via the MCP tools/list method and callable via tools/call.

extract_text (Read): Extract full text content from PDF, DOCX, or TXT files
chunk_document (Transform): Split documents into overlapping chunks with optional search indexing
search_chunks (Search): Semantic search across indexed chunks using Cohere embeddings
summarize_document (Generate): Generate brief, standard, or detailed document summaries
get_metadata (Read): Extract title, author, page count, word count, and file metadata

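
For readers curious what discovery and invocation look like on the wire, the sketch below builds the two JSON-RPC 2.0 requests a client would send over stdio. The `jsonrpc_request` helper and the `file_path` argument name are assumptions; the real argument names come from each tool's input schema as returned by tools/list, and a real client also performs the MCP initialize handshake first.

```python
import json
from itertools import count
from typing import Any, Optional

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def jsonrpc_request(method: str, params: Optional[dict[str, Any]] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    message: dict[str, Any] = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
    }
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Discover the available tools.
list_request = jsonrpc_request("tools/list")

# Invoke one tool; "file_path" is a hypothetical argument name.
call_request = jsonrpc_request(
    "tools/call",
    {"name": "extract_text", "arguments": {"file_path": "report.pdf"}},
)

print(list_request)
print(call_request)
```
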
Getting Started

Up and running in 60 seconds

# Clone and set up
git clone https://github.com/BabyChrist666/mcp-document-server.git
cd mcp-document-server

python -m venv venv
source venv/bin/activate

pip install -r requirements.txt

# Set your Cohere API key
export COHERE_API_KEY="your-key-here"

# Run tests
pytest tests/ -v  # 50 tests passing
Built With

Tech stack

Python 3.10+, MCP SDK, Cohere Embed v3, Cohere Command R+, PyPDF2, python-docx, Pydantic, asyncio, pytest

Ready to give your agent document superpowers?

Clone the repo, add your Cohere key, and connect to Claude in under a minute.