Mulberry Documentation

Welcome to Mulberry, a self-hosted web crawling platform that integrates with AI agents via MCP (Model Context Protocol). This documentation will help you get started and make the most of the platform.

Core Concepts

Mulberry is built around a few key concepts:

Core

Crawls

Jobs that fetch and process web pages. Each crawl targets a URL or list of URLs and extracts content in your chosen format.

Auth

API Keys

Authentication tokens for accessing the API. Private keys (sk_) have full access; public keys (pk_) are read-only.
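As a rough illustration of the prefix convention, the sketch below classifies a key by its prefix before it is used. The helper name `key_access_level` and the returned labels are hypothetical and not part of Mulberry; only the `sk_` and `pk_` prefixes come from this page.

```python
def key_access_level(api_key: str) -> str:
    """Classify an API key by its documented prefix.

    Hypothetical client-side helper: it inspects the prefix only and
    does not check that the key is valid on the server.
    """
    if api_key.startswith("sk_"):
        return "full"        # private key: full access
    if api_key.startswith("pk_"):
        return "read-only"   # public key: read-only access
    raise ValueError("unrecognized key prefix; expected 'sk_' or 'pk_'")
```

A check like this can help keep a private key out of client-side code, where a read-only public key is the safer choice.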

Events

Webhooks

HTTP callbacks that notify your systems when crawl events occur (started, completed, failed).
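A receiving system usually branches on the event type. The sketch below assumes a payload that has already been parsed from JSON and that carries `event` and `crawl_id` fields. Those field names and the `describe_event` helper are assumptions for illustration; only the three event names come from this page.

```python
from typing import Any, Dict

# Event names listed above; the payload field names below are assumed.
KNOWN_EVENTS = {"started", "completed", "failed"}


def describe_event(payload: Dict[str, Any]) -> str:
    """Return a short log line for a parsed webhook payload."""
    event = payload.get("event")
    crawl_id = payload.get("crawl_id", "unknown")
    if event not in KNOWN_EVENTS:
        # Ignore unrecognized events so new event types do not break the handler.
        return f"ignored unknown event {event!r}"
    if event == "failed":
        return f"crawl {crawl_id} failed; consider retrying or alerting"
    return f"crawl {crawl_id} {event}"
```

Ignoring unknown events rather than raising keeps the handler tolerant of event types added later.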

AI

MCP Server

A Model Context Protocol server that allows AI agents to interact with Mulberry directly.

Output Formats

Mulberry can extract content in multiple formats to suit your needs:

html: Raw HTML content as crawled
text: Plain text with HTML tags stripped
markdown: Structured Markdown, ideal for AI consumption (recommended)
json: Structured data with metadata
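To make the choice of format concrete, here is a minimal sketch that assembles a crawl request body from one URL or a list of URLs and checks the format against the four names above. The `build_crawl_request` helper and the `urls` and `format` field names are assumptions for illustration, not Mulberry's documented request schema.

```python
from typing import Dict, List, Union

# The four format names listed above.
OUTPUT_FORMATS = ("html", "text", "markdown", "json")


def build_crawl_request(
    urls: Union[str, List[str]],
    output_format: str = "markdown",  # the recommended format
) -> Dict[str, object]:
    """Build a request body for a crawl (field names are hypothetical)."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported format: {output_format!r}")
    # Accept a single URL or a list, and always emit a list.
    url_list = [urls] if isinstance(urls, str) else list(urls)
    if not url_list:
        raise ValueError("at least one URL is required")
    return {"urls": url_list, "format": output_format}
```

Validating the format on the client side surfaces typos before a crawl is submitted.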

Additional Resources