Mulberry Documentation

Welcome to the Mulberry documentation. Mulberry is a self-hosted web crawling platform that integrates with AI agents via MCP (Model Context Protocol). This documentation will help you get started and make the most of the platform.

Core Concepts

Mulberry is built around a few key concepts:

  • Crawls - Jobs that fetch and process web pages. Each crawl targets a URL or list of URLs and extracts content in your chosen format.
  • API Keys - Authentication tokens for accessing the API. Private keys (prefixed sk_) have full access; public keys (prefixed pk_) are read-only.
  • Webhooks - HTTP callbacks that notify your systems when crawl events occur (started, completed, failed).
  • MCP Server - A Model Context Protocol server that allows AI agents to interact with Mulberry directly.
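
On the receiving side, a webhook endpoint mostly needs to decode the callback and route it by event type. The following is a minimal Python sketch, not Mulberry's client code. It assumes a hypothetical JSON payload with an "event" field holding one of the three documented event names (started, completed, failed) and a "crawl_id" field. Check the actual webhook payload format before relying on these field names.

```python
import json
from typing import Callable

# Hypothetical payload shape (an assumption, not from the Mulberry docs):
#   {"event": "completed", "crawl_id": "abc123"}
# Only the three event names below come from the documentation.


def on_started(crawl_id: str) -> str:
    return f"crawl {crawl_id} started"


def on_completed(crawl_id: str) -> str:
    return f"crawl {crawl_id} completed"


def on_failed(crawl_id: str) -> str:
    return f"crawl {crawl_id} failed"


# Map each documented event name to the function that handles it.
HANDLERS: dict[str, Callable[[str], str]] = {
    "started": on_started,
    "completed": on_completed,
    "failed": on_failed,
}


def handle_webhook(body: bytes) -> str:
    """Decode a webhook body and dispatch it to the matching handler."""
    payload = json.loads(body)
    event = payload.get("event")
    crawl_id = str(payload.get("crawl_id", "unknown"))
    handler = HANDLERS.get(event)
    if handler is None:
        # Ignore events this receiver does not recognize instead of failing.
        return f"ignored event {event!r}"
    return handler(crawl_id)
```

For example, handle_webhook(b'{"event": "completed", "crawl_id": "abc123"}') returns "crawl abc123 completed". Unrecognized events are ignored rather than raising an error, so the receiver keeps working if it ever sees an event name it was not written for.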

Output Formats

Mulberry can extract content in multiple formats:

  • HTML - Raw HTML content as crawled
  • Text - Plain text with HTML tags stripped
  • Markdown - Structured Markdown, well suited for consumption by AI models
  • JSON - Structured data with metadata
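
To make the difference between the HTML and Text formats concrete, here is a rough Python approximation of tag stripping using the standard library's html.parser module. It is an illustration only, not Mulberry's extractor; skipping the contents of script and style elements and joining text fragments with single spaces are assumptions of this sketch.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, dropping tags and the contents of script/style."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside a script or style element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when outside script/style.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Strip tags from an HTML string and return the visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

For example, html_to_text("<h1>Hello</h1><p>Mulberry <b>docs</b></p><script>x=1</script>") returns "Hello Mulberry docs": the markup and the script body are dropped, leaving only the readable text.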

Need Help?

If you run into issues or have questions, search the GitHub Issues page for an existing report, or open a new issue.