Sync Crawls

Crawl a single URL and receive immediate results without creating a background job. Perfect for quick lookups, API integrations, or when you need results right away.

Endpoint: GET /api/v1/crawls/sync
Auth Required: Bearer Token (Private Key)
Max Timeout: 120 seconds

Sync vs Async Crawls

Choose the right approach for your use case:

| Feature | Sync Crawls | Async Crawls |
|---|---|---|
| HTTP Method | GET | POST |
| URLs | Single URL only | Multiple URLs, full site crawl |
| Response | Immediate results (200 OK) | Job ID (202 Accepted), poll for results |
| Depth | Single page only | Configurable depth (default: 1) |
| Webhooks | Not supported | Supported |
| Events | Does not emit events | Emits crawl lifecycle events |
| Persistence | No database record | Creates background job in database |
| Use Case | Quick lookups, API integrations | Large crawls, batch processing |

API Endpoint

GET /api/v1/crawls/sync

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | — | The URL to crawl (must be HTTP or HTTPS) |
| result_format | string | No | html | Output format: html, text, markdown, or json |
| timeout | integer | No | 30000 | Request timeout in milliseconds (max: 120000) |
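As a client-side illustration of the rules in the table, the following Python sketch validates the three parameters and builds the query string. The function name and error messages are illustrative, not part of Mulberry's API.

```python
from urllib.parse import urlencode, urlparse

ALLOWED_FORMATS = {"html", "text", "markdown", "json"}
MAX_TIMEOUT_MS = 120_000  # server-side maximum per the table above

def sync_crawl_query(url, result_format="html", timeout=30_000):
    """Validate parameters and return the query string for /api/v1/crawls/sync."""
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("url must be HTTP or HTTPS")
    if result_format not in ALLOWED_FORMATS:
        raise ValueError("result_format must be html, text, markdown, or json")
    if not 0 < timeout <= MAX_TIMEOUT_MS:
        raise ValueError("timeout must be between 1 and 120000 milliseconds")
    return urlencode({"url": url, "result_format": result_format, "timeout": timeout})
```

Note that `urlencode` percent-encodes the target URL, which is the safe way to pass one URL inside another's query string.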

Usage Examples

Basic Request

```shell
curl -X GET "https://your-server/api/v1/crawls/sync?url=https://example.com" \
  -H "Authorization: Bearer sk_your_api_key"
```

With Markdown Format

```shell
curl -X GET "https://your-server/api/v1/crawls/sync?url=https://example.com&result_format=markdown" \
  -H "Authorization: Bearer sk_your_api_key"
```

Custom Timeout

```shell
curl -X GET "https://your-server/api/v1/crawls/sync?url=https://example.com&timeout=60000" \
  -H "Authorization: Bearer sk_your_api_key"
```

Output Formats

  • html: Raw HTML content (default). Returns the full HTML source.
  • text: Plain text with HTML tags removed. Clean, readable content.
  • markdown: Converted to Markdown format. Great for AI processing.
  • json: Structured JSON with metadata. Full parsing details.

Response Format

Success Response (200 OK)

```json
{
  "data": [
    {
      "url": "https://example.com",
      "title": "Example Domain",
      "description": "Example Description",
      "content": "<html>...</html>",
      "content_type": "html",
      "status_code": 200,
      "response_time_ms": 150,
      "content_length": 1256,
      "fetched_at": "2024-01-15T10:30:00Z"
    }
  ],
  "errors": []
}
```

Error Response

```json
{
  "data": [],
  "errors": [
    {
      "url": "https://example.com",
      "category": "fetch_failed",
      "message": "Connection refused"
    }
  ]
}
```
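Because a 200 OK can still carry per-URL crawl failures, clients should branch on the `errors` array rather than the HTTP status alone. A minimal sketch (the names here are illustrative):

```python
class CrawlError(Exception):
    """Raised when a sync-crawl response body reports per-URL errors."""
    def __init__(self, errors):
        self.errors = errors
        super().__init__("; ".join(f"{e['category']}: {e['message']}" for e in errors))

def unwrap(body):
    """Return the result list from a decoded sync-crawl response body,
    raising CrawlError if the `errors` array is non-empty."""
    if body.get("errors"):
        raise CrawlError(body["errors"])
    return body["data"]
```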

Error Handling

HTTP Status Codes

| Status | Description |
|---|---|
| 200 | Success (check errors array for crawl failures) |
| 401 | Missing or invalid API key |
| 403 | Public API key used (requires private key) |
| 408 | Request timeout |
| 422 | Invalid URL or parameters |
| 429 | Rate limit exceeded |

Error Categories

| Category | Description |
|---|---|
| fetch_failed | Network error (connection refused, DNS failure, etc.) |
| robots_blocked | URL blocked by robots.txt |
| content_too_large | Response exceeded 10MB limit |
| no_result | No response from crawler |
| internal_error | Unexpected error during crawling |

Rate Limiting
Sync crawls have a lower concurrency limit than async crawls. The response includes a Retry-After header when limits are exceeded. Use async crawls for batch operations.
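One way to honor Retry-After is to parse its delta-seconds form before sleeping and retrying. A sketch, assuming the header carries seconds (the header can also be an HTTP date, which this helper treats as the fallback):

```python
def retry_after_seconds(headers, default=1.0):
    """Read a delta-seconds Retry-After value from a header mapping.

    Falls back to `default` when the header is absent or not numeric
    (e.g. the HTTP-date form)."""
    try:
        return max(float(headers.get("Retry-After")), 0.0)
    except (TypeError, ValueError):
        return default
```

With urllib, a 429 response raises an `HTTPError`; its `headers` attribute can be passed to this helper before retrying.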

Security Features

SSRF Protection

Mulberry blocks access to internal and private addresses to prevent server-side request forgery:

  • localhost and 127.0.0.0/8
  • 10.0.0.0/8 (private network)
  • 172.16.0.0/12 (private network)
  • 192.168.0.0/16 (private network)
  • 169.254.0.0/16 (link-local)
  • ::1 (IPv6 loopback)
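For illustration, the blocked ranges above can be expressed with Python's ipaddress module. This mirrors the published list, not Mulberry's actual implementation, and a real check would also need to resolve hostnames such as localhost to an IP first.

```python
import ipaddress

# The networks from the list above.
BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12",
    "192.168.0.0/16", "169.254.0.0/16", "::1/128",
)]

def is_blocked(ip_text):
    """Return True if the IP falls inside a blocked private/loopback range."""
    ip = ipaddress.ip_address(ip_text)
    # Membership tests across IP versions simply return False, so mixing
    # IPv4 and IPv6 networks in one list is safe.
    return any(ip in net for net in BLOCKED_NETWORKS)
```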

Content Size Limits

Responses larger than 10MB are rejected with a content_too_large error. Adjust the timeout parameter as needed for slow-loading pages.

MCP Integration

The sync crawl is also available as an MCP tool for AI agents:

MCP Tool Call

```json
{
  "url": "https://example.com",
  "result_format": "markdown",
  "timeout": 30000
}
```

When to Use Sync Crawls
Use sync crawls when you need immediate results for a single page. For multi-page crawls, site traversal, or webhook notifications, use async crawls.

Next Steps