Free Software

Web Data Platform for the AI Era

Self-hosted platform that crawls websites, monitors business listings, extracts structured data, and integrates with AI agents. Your infrastructure, your data, your control.

Built with battle-tested technologies

Elixir/Phoenix, PostgreSQL, Docker
mulberry-api
# Create a crawl job
curl -X POST \
  -H "Authorization: Bearer sk_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "depth": 2}' \
  https://your-server/api/crawls
# Response
{
  "id": "crawl_abc123",
  "status": "running",
  "pages_crawled": 47,
  "format": "markdown"
}
MCP Ready

Everything You Need to Own Your Web Data

A complete platform for web data extraction, event processing, and AI integration—all self-hosted on your infrastructure.

Web Crawling Engine

Powerful, configurable crawling with multiple modes and real-time monitoring.

  • Website traversal & URL lists
  • Configurable depth & workers
  • Regex include/exclude patterns
  • HTML, Text, Markdown, JSON output
  • Real-time progress monitoring
  • Web UI for management
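
As a rough illustration of how depth limits and regex include/exclude patterns can combine, here is a minimal Python sketch. The function name `should_crawl`, its parameters, and the rule that exclude patterns win over include patterns are assumptions made for this example, not Mulberry's documented behavior.

```python
import re
from urllib.parse import urlparse

def should_crawl(url, include=(), exclude=(), depth=0, max_depth=2):
    """Hypothetical filter: depth limit first, then exclude, then include."""
    if depth > max_depth:
        return False
    if urlparse(url).scheme not in ("http", "https"):
        return False
    # Assumption: an exclude match always wins over an include match.
    if any(re.search(pattern, url) for pattern in exclude):
        return False
    # With no include patterns, anything not excluded is allowed.
    return not include or any(re.search(pattern, url) for pattern in include)

# (url, depth at which the link was discovered)
candidates = [
    ("https://example.com/docs/intro", 1),
    ("https://example.com/blog/post", 1),
    ("https://example.com/docs/old/v1", 2),
    ("https://example.com/docs/deep/page", 3),
]
selected = [
    url for url, depth in candidates
    if should_crawl(url, include=[r"/docs/"], exclude=[r"/old/"], depth=depth)
]
print(selected)  # ['https://example.com/docs/intro']
```

Only the first URL survives: the blog post misses the include pattern, the `/old/` page is excluded, and the last page exceeds the depth limit.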

MCP Server

Native AI agent integration.

  • Bearer token auth
  • crawl_list, crawl_get, crawl_create
  • Scope-based permissions
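
MCP clients invoke tools through JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for `crawl_create`. The argument names `url` and `depth` are borrowed from the REST example and are only assumed to apply to the MCP tool as well; the helper `tool_call` is hypothetical.

```python
import json
from itertools import count

_request_ids = count(1)

def tool_call(name, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request body (MCP convention)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# Assumption: crawl_create accepts the same fields as POST /api/crawls.
request = tool_call("crawl_create", {"url": "https://example.com", "depth": 2})
print(json.dumps(request, indent=2))
```

In practice an MCP client library handles this framing; the sketch only shows what travels over the wire alongside the bearer token.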

Authentication

Secure, flexible auth options.

  • Magic link (passwordless)
  • Optional password auth
  • Sudo mode for sensitive ops

API Keys

Granular access control.

  • Private (sk_) & Public (pk_) keys
  • Hashed storage, expiration
  • Last-used tracking
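
To show what prefixed keys, hashed storage, expiration, and last-used tracking can look like together, here is a minimal sketch. The record layout, the SHA-256 choice, and the function names are assumptions for illustration; the page does not describe Mulberry's actual scheme.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone

def generate_key(kind="private"):
    """Return a random key with the sk_ (private) or pk_ (public) prefix."""
    prefix = "sk_" if kind == "private" else "pk_"
    return prefix + secrets.token_urlsafe(32)

def hash_key(raw_key):
    # Only this digest is stored; the raw key is shown to the user once.
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()

def verify_key(raw_key, record, now=None):
    """Check expiry and hash, then update last-used on success."""
    now = now or datetime.now(timezone.utc)
    if record["expires_at"] is not None and now >= record["expires_at"]:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(hash_key(raw_key), record["key_hash"]):
        return False
    record["last_used_at"] = now
    return True

raw = generate_key("private")
record = {  # hypothetical stored row
    "key_hash": hash_key(raw),
    "expires_at": datetime.now(timezone.utc) + timedelta(days=30),
    "last_used_at": None,
}
print(verify_key(raw, record))         # True
print(verify_key("sk_wrong", record))  # False
```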

Webhooks & Events

Real-time notifications.

  • Lifecycle events (started, completed)
  • Wildcard patterns (crawl.*)
  • Auto-retry with backoff
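
Wildcard subscriptions and retry with backoff can be sketched in a few lines. This example uses shell-style matching from Python's `fnmatch` and a capped exponential schedule; the event name `listing.updated`, the retry count, and the delays are assumptions, not documented Mulberry defaults.

```python
from fnmatch import fnmatchcase

def matches(subscription, event_type):
    """True if an event type matches a wildcard subscription like 'crawl.*'."""
    return fnmatchcase(event_type, subscription)

def backoff_delays(attempts=5, base=1.0, factor=2.0, cap=60.0):
    """Delay in seconds before each retry: exponential growth with a cap."""
    return [min(cap, base * factor ** n) for n in range(attempts)]

print(matches("crawl.*", "crawl.completed"))  # True
print(matches("crawl.*", "listing.updated"))  # False
print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Production senders often add random jitter to each delay so that many failed deliveries do not retry at the same instant.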

Multi-Tenant

Team-ready organization.

  • Accounts & organizations
  • Role-based access (owner, admin)
  • Account-level settings

Live Databases

Monitor business listings and track changes across Google Maps in real time.

  • Google Maps business monitoring
  • Automated change detection
  • Review tracking & alerts
  • Configurable watched fields
  • Event notifications (listing.*, review.*)
  • Web UI for management
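
Change detection over configurable watched fields comes down to comparing two snapshots of a listing and reporting only the fields being watched. A minimal sketch, with a hypothetical `detect_changes` helper and made-up listing fields:

```python
def detect_changes(previous, current, watched_fields):
    """Return {field: {"old": ..., "new": ...}} for watched fields that differ."""
    changes = {}
    for field in watched_fields:
        old, new = previous.get(field), current.get(field)
        if old != new:
            changes[field] = {"old": old, "new": new}
    return changes

# Two snapshots of the same listing (illustrative data only).
before = {"name": "Joe's Coffee", "rating": 4.5, "hours": "7-5", "phone": "555-0100"}
after = {"name": "Joe's Coffee", "rating": 4.6, "hours": "7-5", "phone": "555-0199"}

changes = detect_changes(before, after, watched_fields=["rating", "hours"])
print(changes)  # {'rating': {'old': 4.5, 'new': 4.6}}
```

The phone number also changed, but it is not in the watched list, so it produces no change record and would trigger no event.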

Page Data Extraction

Structured data from any page.

  • Markdown, metadata, or both
  • Schema-based structured extraction
  • Custom LLM providers (OpenAI, Anthropic, Google)
  • REST API & MCP tool

Data Retention

Configurable 1-7 day retention

Rate Limiting

100 req/60s with standard headers
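
A limit of 100 requests per 60 seconds can be enforced in several ways. The sketch below uses a simple fixed window; the page does not say which algorithm Mulberry uses or which header names it sends, so the `X-RateLimit-*` names here are an assumption based on common convention.

```python
import time

class FixedWindowLimiter:
    """Allow up to `limit` requests per `window` seconds (fixed window)."""

    def __init__(self, limit=100, window=60, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.window_start = None
        self.count = 0

    def check(self):
        now = self.clock()
        # Start a fresh window on first use or once the old one has elapsed.
        if self.window_start is None or now - self.window_start >= self.window:
            self.window_start, self.count = now, 0
        allowed = self.count < self.limit
        if allowed:
            self.count += 1
        reset_in = self.window - (now - self.window_start)
        headers = {  # assumed header names, not confirmed by the source
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - self.count),
            "X-RateLimit-Reset": str(int(reset_in)),
        }
        return allowed, headers

fake_now = [0.0]  # injectable clock so the example runs instantly
limiter = FixedWindowLimiter(limit=100, window=60, clock=lambda: fake_now[0])
results = [limiter.check()[0] for _ in range(101)]
print(results.count(True), results[-1])  # 100 False

fake_now[0] = 60.0  # the next window begins
allowed, headers = limiter.check()
print(allowed, headers["X-RateLimit-Remaining"])  # True 99
```

A client that receives a rejected request can read the reset header and wait that many seconds before retrying.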

Web Dashboard

Manage crawls, listings & keys from the browser

Up and Running in Minutes

Deploy Mulberry on your VM and start crawling. No complex setup, no vendor dependencies.

1. Deploy to Your VM

Clone the repo and run a single Docker command. Mulberry comes with everything pre-configured—PostgreSQL, reverse proxy, and SSL.

2. Create Your Account

Sign up with magic link or password authentication. Set up your organization and invite team members with role-based access.

3. Generate API Keys

Create private keys for full access or public keys for read-only operations. Configure expiration and track usage automatically.

4. Start Working

Start crawling, monitor business listings with Live Databases, extract structured data from pages, or connect AI agents via MCP. Set up webhooks for real-time notifications.

Quick Start
# Pull and deploy
$ docker pull agoodway/mulberry_bot
$ docker run -d -p 4000:4000 agoodway/mulberry_bot
# Create your first crawl
$ curl -X POST \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://docs.example.com"}' \
  https://your-mulberry-server/api/crawls
# Connect AI agent via MCP
{
  "mcpServers": {
    "mulberry": {
      "url": "https://your-server/mcp",
      "token": "sk_..."
    }
  }
}
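
The same crawl request can be issued from a script. This sketch uses only the Python standard library and builds the request without sending it, so it runs without a server. The helper `build_crawl_request` is hypothetical, and the `Content-Type: application/json` header is an assumption about what the API expects.

```python
import json
import urllib.request

def build_crawl_request(base_url, api_key, url, depth=None):
    """Build (but do not send) a POST /api/crawls request."""
    payload = {"url": url}
    if depth is not None:
        payload["depth"] = depth
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/crawls",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed requirement
        },
        method="POST",
    )

# Placeholders from the Quick Start; substitute your own server and key.
req = build_crawl_request(
    "https://your-mulberry-server", "sk_...", "https://docs.example.com"
)
print(req.get_method(), req.full_url)

# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```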

Built for Real Workflows

From AI agents to data pipelines, Mulberry powers production workloads at any scale.

AI Agent Integration

Give your AI agents the ability to crawl and understand any website. Native MCP support means Claude, GPT, and other agents can request crawls and access results directly.

MCP Tools, RAG Pipelines, Research Agents

Data Pipelines

Build automated data collection workflows. Webhooks notify your systems when crawls complete, and the REST API integrates with any ETL tool or data platform.

Webhooks, REST API, JSON Export
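
On the receiving side, a pipeline typically parses the webhook body and starts its load step when a crawl finishes. The payload shape below (`type` and `data` fields) is a guess made for illustration; check a real delivery from your server before relying on any field name.

```python
import json

def handle_event(raw_body, on_completed):
    """Dispatch a webhook body; returns True if it was a completed crawl."""
    event = json.loads(raw_body)
    # Assumption: the event type lives in a top-level "type" field.
    if event.get("type") == "crawl.completed":
        on_completed(event.get("data", {}))
        return True
    return False

loaded = []  # stands in for an ETL load step
body = json.dumps({
    "type": "crawl.completed",
    "data": {"id": "crawl_abc123", "pages_crawled": 47},
})
handled = handle_event(body, loaded.append)
print(handled, loaded)
```

Other event types fall through and return `False`, which lets the endpoint acknowledge them without side effects.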

Content Aggregation

Monitor documentation sites, news sources, or competitor pages. Regex filtering lets you extract exactly what you need, and Markdown output is perfect for content systems.

Markdown, URL Filtering, Scheduling

Documentation Search

Index your own docs or crawl external documentation. Perfect for building searchable knowledge bases or feeding context to AI assistants.

Full Text, Site Traversal, Depth Control

Website Monitoring

Track changes across websites and business listings over time. Get notified when content changes, prices update, new reviews appear, or listing details are modified.

Change Detection, Listing Monitoring, Alerts

Reputation Monitoring

Track your business reviews and competitor listings across Google Maps. Get alerted to new reviews, rating changes, and listing modifications as they happen.

Review Tracking, Local SEO, Competitive Intel

Lead Generation

Build prospect lists from Google Maps listings and business directories. Extract contact details, hours, and ratings with Live Databases, then enrich with Page Extraction.

Live Databases, Data Enrichment, Prospect Lists

Market Research

Gather competitive intelligence and market data. URL list mode lets you crawl specific pages across multiple sites in a single job.

URL Lists, Batch Jobs, Multi-site

Your Data. Your Infrastructure.

In a world of SaaS sprawl and data concerns, Mulberry puts you back in control. Run it on your own servers, keep your data private, and never worry about vendor lock-in or surprise pricing.

Complete Data Privacy

Your crawled data never leaves your infrastructure. No third-party access and no external data processors, which simplifies GDPR compliance.

Zero Recurring Costs

Free software forever. Pay only for your server costs. No per-crawl pricing, no metered API billing, no surprise bills.

Full Customization

Modify the source code, add custom features, integrate with internal systems. It's your software to extend as needed.

No Vendor Lock-in

Standard APIs, portable data formats, open protocols. Switch, fork, or modify without losing your work.

Free
Free core forever
Crawls/Month
100% Data Ownership

Runs on any VM with Docker. Recommended: 2 vCPU, 4GB RAM.

Choose Your Plan

Start free with the full crawling platform. Add Pro features when you need business monitoring and structured extraction.

Free

Open source core platform, free forever.

  • Web crawling engine
  • Webhooks & event system
  • MCP server for AI agents
  • REST API & Web Dashboard
  • Multi-tenant & API keys
Get Started

Pro

Business monitoring, extraction & higher limits.

  • Everything in Free
  • Live Databases & listing monitoring
  • Page Data Extraction (LLM-powered)
  • Review tracking & alerts
  • Higher rate limits
Contact Us
Mulberry Dashboard

  • Active Crawls: 3
  • Listings Tracked: 127
  • Pages Crawled: 4,291

  • docs.example.com: 247 pages, Complete
  • blog.example.com: 82 pages, Running
  • Joe's Coffee (Google Maps): 3 reviews, Monitoring

Ready to Take Control?

Deploy Mulberry on your VM today. Free software, ready for production in minutes.

Quick Install
$ docker pull agoodway/mulberry_bot
$ docker run -d -p 4000:4000 agoodway/mulberry_bot