MCP Document Server — AI-Powered Document Analysis

Capabilities

Everything your agent needs to understand documents

A complete document analysis pipeline exposed as MCP tools. From raw file parsing to semantic search, every capability is one tool call away.

📄

Multi-Format Parsing

Extract structured text from PDF, DOCX, and plaintext files while preserving metadata. Page boundaries, word counts, and authorship are extracted automatically.

🧩

Intelligent Chunking

Split documents into overlapping chunks optimized for retrieval-augmented generation (RAG). Chunk size and overlap are configurable, and boundary detection is page-aware.
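A minimal sketch of overlapping, page-aware chunking. It assumes that chunk size and overlap are measured in words and that a parser has already split the document into one string per page. The name `chunk_pages` and its defaults are hypothetical and are not taken from the server's chunk_document implementation.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int   # 1-based page the chunk came from
    start: int  # word offset of the chunk within that page


def chunk_pages(pages: list[str], size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split each page into windows of `size` words, overlapping by `overlap` words.

    Windows never cross a page boundary, so every chunk maps back to exactly one page.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[Chunk] = []
    for page_no, page_text in enumerate(pages, start=1):
        words = page_text.split()
        for start in range(0, len(words), step):
            chunks.append(Chunk(" ".join(words[start:start + size]), page_no, start))
            if start + size >= len(words):
                break  # this window already reaches the end of the page
    return chunks
```

With `size=4, overlap=2`, the page "a b c d e f g h i j" yields "a b c d", "c d e f", "e f g h", and "g h i j". Each window repeats the last two words of the previous one, so a sentence that straddles a window boundary still appears intact in at least one chunk when it is short enough.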

🔍

Semantic Search

Index chunks with Cohere Embed v3 and search by meaning rather than keywords. Results are ranked by cosine similarity and can be filtered to specific documents.
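The ranking step can be sketched as follows. This sketch assumes the chunk and query embeddings have already been computed (for example by Cohere Embed v3) and are held in an in-memory list. `IndexedChunk`, `cosine`, and `search` are hypothetical names, not the server's API.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexedChunk:
    doc_id: str
    text: str
    embedding: list[float]  # vector produced by an embedding model


def cosine(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|), or 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: list[IndexedChunk], query_vec: list[float], top_k: int = 5,
           doc_id: Optional[str] = None) -> list[tuple[float, IndexedChunk]]:
    """Return the top_k chunks most similar to the query, optionally limited to one document."""
    candidates = [c for c in index if doc_id is None or c.doc_id == doc_id]
    scored = [(cosine(query_vec, c.embedding), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Document filtering is applied before scoring. A scoped search therefore never spends similarity computations on chunks that would be discarded anyway.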

📝

Summarization

Generate brief, standard, or detailed summaries using Cohere Command R+. When the API is unavailable, the server falls back to extractive summarization.
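Extractive summarization selects existing sentences instead of generating new text. The sketch below uses one common approach, word-frequency scoring. It is a swapped-in illustration and not necessarily the technique the server's fallback uses. The stopword list and the function name are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list for illustration.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "for", "on", "with"}


def _words(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, max_sentences: int = 3) -> str:
    """Keep the highest-scoring sentences, returned in their original order.

    A sentence scores the average document-wide frequency of its non-stopword words.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_words(text))

    def score(sentence: str) -> float:
        tokens = _words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)
```

For example, `extractive_summary("Contracts bind parties. The weather was nice. Contracts require parties to agree.", 2)` keeps the two sentences about contracts and drops the off-topic one.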

🏷

Metadata Extraction

Title, author, page count, word count, file type, and file size: everything you need for document management and filtering.
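For plaintext files, most of these fields come directly from the file system. The sketch below assumes that the title falls back to the file name and leaves author and page count empty, because plain text carries neither. A PDF or DOCX parser would read those fields from document properties instead. `text_file_metadata` and the dictionary keys are hypothetical.

```python
from pathlib import Path


def text_file_metadata(path: str) -> dict:
    """Basic metadata for a plaintext file (hypothetical field names)."""
    p = Path(path)
    text = p.read_text(encoding="utf-8", errors="replace")
    return {
        "title": p.stem,     # assumption: file name without extension stands in for a title
        "author": None,      # plain text has no authorship field
        "page_count": None,  # plain text has no pages
        "word_count": len(text.split()),
        "file_type": p.suffix.lstrip(".").lower() or "unknown",
        "size_bytes": p.stat().st_size,
    }
```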

🔗

MCP Native

Built on the Model Context Protocol: JSON-RPC 2.0 over stdio, with tool definitions, error handling, and resource exposure.
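An MCP tool invocation is an ordinary JSON-RPC 2.0 request whose method is `tools/call` and whose params carry the tool name and its arguments. The sketch below builds such a message. The helper name is hypothetical, and the argument name `file_path` used in the usage note is an assumption, not the server's documented schema.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # Without indent, json.dumps escapes newlines inside strings, so the
    # message fits on one line, which matters for stdio framing.
    return json.dumps(message, separators=(",", ":"))
```

Usage: `make_tool_call(1, "get_metadata", {"file_path": "~/docs/contract.pdf"})` produces the line a client would write to the server's stdin.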

System Design

How it works

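A layered architecture connecting MCP clients to document intelligence: requests pass from the client through the JSON-RPC transport and tool router to the parser registry and the Cohere AI layer.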

🤖

MCP Client

Claude Desktop, an IDE extension, or any other MCP-compatible host

⚡

JSON-RPC Transport

Stdio-based communication with tool discovery and invocation
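Over stdio, each JSON-RPC message occupies one line. The sketch below shows a minimal read/dispatch/write loop. `serve` and the `handle` callback are hypothetical. Batch arrays and notifications are simplified as noted in the comments.

```python
import json
from typing import Callable, Optional, TextIO


def _error(req_id: Optional[object], code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def serve(stdin: TextIO, stdout: TextIO, handle: Callable[[dict], dict]) -> None:
    """Read one JSON-RPC message per line and write one reply per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        try:
            request = json.loads(line)
        except json.JSONDecodeError:
            reply = _error(None, -32700, "Parse error")  # JSON-RPC 2.0 parse error code
        else:
            if not isinstance(request, dict):
                # e.g. a batch array; not handled in this sketch
                reply = _error(None, -32600, "Invalid Request")
            elif "id" not in request:
                continue  # notifications get no reply
            else:
                reply = {"jsonrpc": "2.0", "id": request["id"], "result": handle(request)}
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()
```

In a real server you would call `serve(sys.stdin, sys.stdout, ...)`. Because stdout carries the protocol stream, logging belongs on stderr.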

🔧

Tool Router

Routes tool calls to extract_text, chunk_document, search_chunks, summarize_document, and get_metadata
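Routing can be as simple as a name-to-handler table. The sketch below registers handlers with a decorator and wraps their output in MCP's tool-result shape (a `content` list plus an `isError` flag). The decorator, the placeholder `get_metadata` body, and the choice to raise on unknown tools so the caller can send a JSON-RPC error are all assumptions for illustration.

```python
import json
from typing import Any, Callable

ToolHandler = Callable[..., Any]
TOOLS: dict[str, ToolHandler] = {}


def tool(name: str) -> Callable[[ToolHandler], ToolHandler]:
    """Decorator that registers a handler under its MCP tool name."""
    def register(fn: ToolHandler) -> ToolHandler:
        TOOLS[name] = fn
        return fn
    return register


@tool("get_metadata")
def get_metadata(file_path: str) -> dict:
    # Placeholder body: a real handler would call into the parser layer.
    return {"file_path": file_path}


def route(name: str, arguments: dict) -> dict:
    """Dispatch a tools/call to its handler and wrap the output as a tool result."""
    handler = TOOLS.get(name)
    if handler is None:
        # Protocol-level problem: the caller should answer with a JSON-RPC error.
        raise ValueError(f"Unknown tool: {name}")
    try:
        output = handler(**arguments)
    except Exception as exc:
        # In this sketch any handler failure, including bad arguments, is
        # returned to the model as an error result instead of crashing the server.
        return {"isError": True, "content": [{"type": "text", "text": str(exc)}]}
    return {"isError": False, "content": [{"type": "text", "text": json.dumps(output)}]}
```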

📚

Parser Registry

PDFParser, DocxParser, and TextParser: extensible format support with caching
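An extension-keyed registry with a cache is one way to get both extensibility and caching. In the sketch below, only a plaintext parser is implemented. A PDF or DOCX parser would implement the same `extensions`/`parse` interface, for example by wrapping a library such as pypdf. The class interface and the (path, modification time) cache key are assumptions, not the server's actual design.

```python
from pathlib import Path
from typing import Protocol


class Parser(Protocol):
    extensions: tuple[str, ...]

    def parse(self, path: Path) -> str: ...


class TextParser:
    extensions = (".txt",)

    def parse(self, path: Path) -> str:
        return path.read_text(encoding="utf-8", errors="replace")


class ParserRegistry:
    """Maps file extensions to parsers and caches extracted text per file version."""

    def __init__(self) -> None:
        self._parsers: dict[str, Parser] = {}
        self._cache: dict[tuple[str, float], str] = {}

    def register(self, parser: Parser) -> None:
        for ext in parser.extensions:
            self._parsers[ext.lower()] = parser

    def extract(self, path: str) -> str:
        p = Path(path).resolve()
        parser = self._parsers.get(p.suffix.lower())
        if parser is None:
            raise ValueError(f"Unsupported file type: {p.suffix or '(none)'}")
        # Keying on (path, mtime) means an edited file is parsed again
        # instead of returning stale cached text.
        key = (str(p), p.stat().st_mtime)
        if key not in self._cache:
            self._cache[key] = parser.parse(p)
        return self._cache[key]
```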

🧠

Cohere AI Layer

Embed v3 for semantic search, Command R+ for summarization

API Reference

Available tools

Each tool is discoverable through the MCP tools/list method and callable through tools/call.

Tool                 Category    Description
extract_text         Read        Extract full text content from PDF, DOCX, or TXT files
chunk_document       Transform   Split documents into overlapping chunks with optional search indexing
search_chunks        Search      Semantic search across indexed chunks using Cohere embeddings
summarize_document   Generate    Generate brief, standard, or detailed document summaries
get_metadata         Read        Extract title, author, page count, word count, and file metadata
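In a tools/list response, each tool is described by a name, a description, and a JSON Schema `inputSchema` that tells the client which arguments it accepts. The definition below is illustrative only: the argument names (`file_path`, `chunk_size`, `overlap`, `index`) are hypothetical and are not taken from the server's source.

```python
# Hypothetical schema: argument names below are illustrative, not the server's own.
CHUNK_DOCUMENT_TOOL = {
    "name": "chunk_document",
    "description": "Split documents into overlapping chunks with optional search indexing",
    "inputSchema": {
        "type": "object",
        "properties": {
            "file_path": {"type": "string", "description": "Path to a PDF, DOCX, or TXT file"},
            "chunk_size": {"type": "integer", "minimum": 1},
            "overlap": {"type": "integer", "minimum": 0},
            "index": {"type": "boolean", "description": "Also add chunks to the search index"},
        },
        "required": ["file_path"],
    },
}


def tools_list_result(tools: list[dict]) -> dict:
    """Result payload for a tools/list request."""
    return {"tools": tools}
```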

Getting Started

Up and running in 60 seconds

# Clone and set up
git clone https://github.com/BabyChrist666/mcp-document-server.git
cd mcp-document-server

python -m venv venv
source venv/bin/activate

pip install -r requirements.txt

# Set your Cohere API key
export COHERE_API_KEY="your-key-here"

# Run tests
pytest tests/ -v  # 50 tests passing

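Add to claude_desktop_config.json (replace the paths with the location of your clone; Claude Desktop does not inherit variables exported in your shell, so the API key goes in "env"):
{
  "mcpServers": {
    "document-analysis": {
      "command": "/path/to/mcp-document-server/venv/bin/python",
      "args": ["-m", "mcp_doc_server"],
      "cwd": "/path/to/mcp-document-server",
      "env": {
        "COHERE_API_KEY": "your-key-here"
      }
    }
  }
}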

# Or register it with the Claude Code CLI (run from the repo with the venv active)
claude mcp add document-analysis -- python -m mcp_doc_server

# Now Claude can analyze your documents
# Example: "Summarize the contract in ~/docs/contract.pdf"

Document intelligence for every AI agent

Ready to give your agent document superpowers?