Free Software

Web Crawling for the AI Era

Self-hosted platform that crawls websites, extracts data in any format, and integrates directly with AI agents. Your infrastructure, your data, your control.

Built with battle-tested technologies

Elixir/Phoenix PostgreSQL Docker
mulberry-api
# Create a crawl job
curl -X POST \
-H "Authorization: Bearer sk_..." \
-d '{"url": "https://example.com", "depth": 2}' \
https://your-server/api/crawls
# Response
{
"id": "crawl_abc123",
"status": "running",
"pages_crawled": 47,
"format": "markdown"
}
MCP Ready

Everything You Need to Crawl the Web

A complete platform for web data extraction, event processing, and AI integration—all self-hosted on your infrastructure.

Web Crawling Engine

Powerful, configurable crawling with multiple modes and real-time monitoring.

  • Website traversal & URL lists
  • Configurable depth & workers
  • Regex include/exclude patterns
  • HTML, Text, Markdown, JSON output
  • Real-time progress monitoring
  • Web UI for management

MCP Server

Native AI agent integration.

  • Bearer token auth
  • crawl_list, crawl_get, crawl_create
  • Scope-based permissions

Authentication

Secure, flexible auth options.

  • Magic link (passwordless)
  • Optional password auth
  • Sudo mode for sensitive ops

API Keys

Granular access control.

  • Private (sk_) & Public (pk_) keys
  • Hashed storage, expiration
  • Last-used tracking

Webhooks & Events

Real-time notifications.

  • Lifecycle events (started, completed)
  • Wildcard patterns (crawl.*)
  • Auto-retry with backoff

Multi-Tenant

Team-ready organization.

  • Accounts & organizations
  • Role-based access (owner, admin)
  • Account-level settings

Data Retention

Configurable 1-7 day retention

Rate Limiting

100 req/60s with standard headers

CLI Tools

Command-line subscription management

Up and Running in Minutes

Deploy Mulberry on your VM and start crawling. No complex setup, no vendor dependencies.

1

Deploy to Your VM

Clone the repo and run a single Docker command. Mulberry comes with everything pre-configured—PostgreSQL, reverse proxy, and SSL.

2

Create Your Account

Sign up with magic link or password authentication. Set up your organization and invite team members with role-based access.

3

Generate API Keys

Create private keys for full access or public keys for read-only operations. Configure expiration and track usage automatically.

4

Start Crawling

Use the REST API, Web UI, or MCP tools to create crawls. Get results in HTML, Markdown, or JSON. Set up webhooks for real-time notifications.

Quick Start
# Clone and deploy
$ git clone https://github.com/mulberry/mulberry
$ cd mulberry && docker compose up -d
# Create your first crawl
$ curl -X POST \
-H "Authorization: Bearer $API_KEY" \
-d '{"url": "https://docs.example.com"}' \
https://your-mulberry-server/api/crawls
# Connect AI agent via MCP
{
"mcpServers": {
"mulberry": {
"url": "https://your-server/mcp",
"token": "sk_..."
}
}
}

Built for Real Workflows

From AI agents to data pipelines, Mulberry powers production workloads at any scale.

AI Agent Integration

Give your AI agents the ability to crawl and understand any website. Native MCP support means Claude, GPT, and other agents can request crawls and access results directly.

MCP Tools RAG Pipelines Research Agents

Data Pipelines

Build automated data collection workflows. Webhooks notify your systems when crawls complete, and the REST API integrates with any ETL tool or data platform.

Webhooks REST API JSON Export

Content Aggregation

Monitor documentation sites, news sources, or competitor pages. Regex filtering lets you extract exactly what you need, and Markdown output is perfect for content systems.

Markdown URL Filtering Scheduling

Documentation Search

Index your own docs or crawl external documentation. Perfect for building searchable knowledge bases or feeding context to AI assistants.

Full Text Site Traversal Depth Control

Website Monitoring

Track changes across websites over time. Combine with webhooks to get notified when content changes, prices update, or new pages appear.

Change Detection Alerts History

Market Research

Gather competitive intelligence and market data. URL list mode lets you crawl specific pages across multiple sites in a single job.

URL Lists Batch Jobs Multi-site

Your Data. Your Infrastructure.

In a world of SaaS sprawl and data concerns, Mulberry puts you back in control. Run it on your own servers, keep your data private, and never worry about vendor lock-in or surprise pricing.

Complete Data Privacy

Your crawled data never leaves your infrastructure. No third-party access, no data processing concerns, full GDPR compliance.

Zero Recurring Costs

Free software forever. Pay only for your server costs. No per-crawl pricing, no API call limits, no surprise bills.

Full Customization

Modify the source code, add custom features, integrate with internal systems. It's your software to extend as needed.

No Vendor Lock-in

Standard APIs, portable data formats, open protocols. Switch, fork, or modify without losing your work.

$0
License Cost
Crawls/Month
100%
Data Ownership
0
External Dependencies

Runs on any VM with Docker. Recommended: 2 vCPU, 4GB RAM.

Ready to Take Control?

Deploy Mulberry on your VM today. Free software, ready for production in minutes.

Quick Install
$ git clone https://github.com/mulberry/mulberry
$ cd mulberry && docker compose up -d