Self-hosted platform that crawls websites, extracts data as HTML, Markdown, or JSON, and integrates directly with AI agents. Your infrastructure, your data, your control.
Built with battle-tested technologies
A complete platform for web data extraction, event processing, and AI integration—all self-hosted on your infrastructure.
Powerful, configurable crawling with multiple modes and real-time monitoring.
Native AI agent integration.
Secure, flexible auth options.
Granular access control.
Real-time notifications.
Team-ready organization.
Configurable 1-7 day retention
100 requests per 60 seconds, with standard rate-limit headers
Command-line subscription management
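A client can stay under the 100-requests-per-60-seconds limit listed above by throttling itself before it sends anything. The following is a minimal sketch, assuming a sliding window; whether Mulberry counts requests in a sliding or a fixed window is not stated here, and the class and method names are illustrative rather than part of Mulberry.

```python
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter: at most `limit` requests per `window` seconds.

    Illustrative helper only; not part of Mulberry.
    """

    def __init__(self, limit: int = 100, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._sent = deque()  # timestamps of requests still inside the window

    def _prune(self, now: float) -> None:
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def allow(self, now: float) -> bool:
        """Record a request at time `now` if it fits; return whether it did."""
        self._prune(now)
        if len(self._sent) < self.limit:
            self._sent.append(now)
            return True
        return False

    def retry_after(self, now: float) -> float:
        """Seconds to wait before the next request would be allowed."""
        self._prune(now)
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])
```

In practice `now` would come from `time.monotonic()`, and a caller would sleep for `retry_after(now)` seconds whenever `allow(now)` returns `False`.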
Deploy Mulberry on your VM and start crawling. No complex setup, no vendor dependencies.
Clone the repo and run a single Docker command. Mulberry comes with everything pre-configured—PostgreSQL, reverse proxy, and SSL.
Sign up with magic link or password authentication. Set up your organization and invite team members with role-based access.
Create private keys for full access or public keys for read-only operations. Configure expiration and track usage automatically.
Use the REST API, Web UI, or MCP tools to create crawls. Get results in HTML, Markdown, or JSON. Set up webhooks for real-time notifications.
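The last step above can be sketched as a small helper that prepares a crawl request for the REST API. This is a sketch under stated assumptions, not Mulberry's documented API: the `/api/crawls` path, the Bearer authorization scheme, and the `urls`, `format`, and `webhook_url` field names are all hypothetical. Only the three output formats come from the text above.

```python
import json
import urllib.request

# Output formats named in the text above.
SUPPORTED_FORMATS = {"html", "markdown", "json"}


def build_crawl_request(base_url, api_key, urls,
                        output_format="markdown", webhook_url=None):
    """Build (but do not send) a POST request that would create a crawl.

    The endpoint path, auth scheme, and payload fields are assumptions
    made for illustration; check the real API reference before use.
    """
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {output_format}")

    payload = {"urls": list(urls), "format": output_format}
    if webhook_url is not None:
        payload["webhook_url"] = webhook_url  # hypothetical field name

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/crawls",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the request separately keeps the payload easy to inspect and test without a running server.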
From AI agents to data pipelines, Mulberry powers production workloads at any scale.
Give your AI agents the ability to crawl and understand any website. Native MCP support means Claude, GPT, and other agents can request crawls and access results directly.
Build automated data collection workflows. Webhooks notify your systems when crawls complete, and the REST API integrates with any ETL tool or data platform.
Monitor documentation sites, news sources, or competitor pages. Regex filtering lets you extract exactly what you need, and Markdown output is perfect for content systems.
Index your own docs or crawl external documentation. Perfect for building searchable knowledge bases or feeding context to AI assistants.
Track changes across websites over time. Combine with webhooks to get notified when content changes, prices update, or new pages appear.
Gather competitive intelligence and market data. URL list mode lets you crawl specific pages across multiple sites in a single job.
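Two of the ideas above, regex filtering and change tracking, can be illustrated on the consumer side of a pipeline. This is a minimal sketch, assuming crawl results have already been fetched as plain strings keyed by URL. How Mulberry applies regex filters or detects changes internally is not described here, and all function names below are hypothetical.

```python
import hashlib
import re


def filter_urls(urls, include_pattern):
    """Keep only the URLs that match the regular expression."""
    pattern = re.compile(include_pattern)
    return [u for u in urls if pattern.search(u)]


def fingerprint(content: str) -> str:
    """SHA-256 fingerprint of page content, ignoring whitespace differences."""
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare two {url: fingerprint} snapshots from successive crawls."""
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(u for u in shared if previous[u] != current[u]),
    }
```

Storing one fingerprint per URL after each crawl keeps the comparison cheap. A webhook receiver could then run `detect_changes` when a crawl completes and alert only on non-empty results.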
In a world of SaaS sprawl and data concerns, Mulberry puts you back in control. Run it on your own servers, keep your data private, and never worry about vendor lock-in or surprise pricing.
Your crawled data never leaves your infrastructure. No third-party access and no external data processing, which simplifies GDPR compliance.
Free software forever. Pay only for your server costs. No per-crawl pricing, no metered API quotas, no surprise bills.
Modify the source code, add custom features, integrate with internal systems. It's your software to extend as needed.
Standard APIs, portable data formats, open protocols. Switch, fork, or modify without losing your work.
Runs on any VM with Docker. Recommended: 2 vCPU, 4GB RAM.
Deploy Mulberry on your VM today. Free software, ready for production in minutes.