Sync Crawls
Crawl a single URL and receive immediate results without creating a background job. Ideal for quick lookups, API integrations, and any case where you need results right away.
Endpoint
GET /api/v1/crawls/sync
Auth Required
Bearer Token (Private Key)
Max Timeout
120 seconds
Sync vs Async Crawls
Choose the right approach for your use case:
| Feature | Sync Crawls | Async Crawls |
|---|---|---|
| HTTP Method | GET | POST |
| URLs | Single URL only | Multiple URLs, full site crawl |
| Response | Immediate results (200 OK) | Job ID (202 Accepted), poll for results |
| Depth | Single page only | Configurable depth (default: 1) |
| Webhooks | Not supported | Supported |
| Events | Does not emit events | Emits crawl lifecycle events |
| Persistence | No database record | Creates background job in database |
| Use Case | Quick lookups, API integrations | Large crawls, batch processing |
API Endpoint
GET /api/v1/crawls/sync
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | — | The URL to crawl (must be HTTP or HTTPS) |
| result_format | string | No | html | Output format: html, text, markdown, or json |
| timeout | integer | No | 30000 | Request timeout in milliseconds (max: 120000) |
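The parameters above can be assembled into a request with Python's standard library alone. A minimal sketch — the server URL and API key are placeholders, and the helper name is illustrative, not part of the API:

```python
import urllib.parse
import urllib.request

def build_sync_crawl_request(base_url, api_key, url,
                             result_format="html", timeout=30000):
    """Build a GET request for the sync crawl endpoint."""
    params = urllib.parse.urlencode({
        "url": url,                      # target page, percent-encoded
        "result_format": result_format,  # html | text | markdown | json
        "timeout": timeout,              # milliseconds, max 120000
    })
    full_url = f"{base_url}/api/v1/crawls/sync?{params}"
    return urllib.request.Request(
        full_url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )

req = build_sync_crawl_request(
    "https://your-server", "sk_your_api_key",
    "https://example.com", result_format="markdown",
)
print(req.full_url)
```

Sending it is then a single `urllib.request.urlopen(req)` call; the examples below show the same requests with curl.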
Usage Examples
Basic Request
curl -X GET "https://your-server/api/v1/crawls/sync?url=https://example.com" \
  -H "Authorization: Bearer sk_your_api_key"
With Markdown Format
curl -X GET "https://your-server/api/v1/crawls/sync?url=https://example.com&result_format=markdown" \
  -H "Authorization: Bearer sk_your_api_key"
Custom Timeout
curl -X GET "https://your-server/api/v1/crawls/sync?url=https://example.com&timeout=60000" \
  -H "Authorization: Bearer sk_your_api_key"
Output Formats
html: Raw HTML content (default). Returns the full HTML source.
text: Plain text with HTML tags removed. Clean, readable content.
markdown: Converted to Markdown format. Great for AI processing.
json: Structured JSON with metadata. Full parsing details.
Response Format
Success Response (200 OK)
{
"data": [
{
"url": "https://example.com",
"title": "Example Domain",
"description": "Example Description",
"content": "<html>...</html>",
"content_type": "html",
"status_code": 200,
"response_time_ms": 150,
"content_length": 1256,
"fetched_at": "2024-01-15T10:30:00Z"
}
],
"errors": []
}
Error Response
{
"data": [],
"errors": [
{
"url": "https://example.com",
"category": "fetch_failed",
"message": "Connection refused"
}
]
}
Error Handling
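Because crawl failures are reported inside a 200 response rather than via the HTTP status, clients should inspect the errors array explicitly. A minimal sketch of that check (the function name is illustrative):

```python
import json

def parse_sync_result(body: str):
    """Split a sync crawl response body into (results, errors)."""
    payload = json.loads(body)
    return payload.get("data", []), payload.get("errors", [])

body = ('{"data": [], "errors": [{"url": "https://example.com", '
        '"category": "fetch_failed", "message": "Connection refused"}]}')
results, errors = parse_sync_result(body)
if errors:
    for err in errors:
        print(f"{err['url']}: {err['category']} - {err['message']}")
else:
    print(results[0]["content"])
```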
HTTP Status Codes
| Status | Description |
|---|---|
| 200 | Success (check errors array for crawl failures) |
| 401 | Missing or invalid API key |
| 403 | Public API key used (requires private key) |
| 408 | Request timeout |
| 422 | Invalid URL or parameters |
| 429 | Rate limit exceeded |
Error Categories
| Category | Description |
|---|---|
| fetch_failed | Network error (connection refused, DNS failure, etc.) |
| robots_blocked | URL blocked by robots.txt |
| content_too_large | Response exceeded 10MB limit |
| no_result | No response from crawler |
| internal_error | Unexpected error during crawling |
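Which categories are worth retrying is a client-side policy choice, not something the API prescribes. As one illustration — the classification below is an assumption, not part of the API contract:

```python
# Transient failures that may succeed on a later attempt.
RETRYABLE = {"fetch_failed", "no_result", "internal_error"}
# Permanent failures: retrying the same URL will fail the same way.
PERMANENT = {"robots_blocked", "content_too_large"}

def should_retry(category: str) -> bool:
    """Return True if this error category is plausibly transient."""
    return category in RETRYABLE
```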
Rate Limiting
Sync crawls have a lower concurrency limit than async crawls. The response includes a
Retry-After header when limits are exceeded. Use async crawls for batch operations.
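A client can honor the Retry-After header with a simple wait before retrying. A sketch, assuming the header carries the delay-seconds form (the HTTP-date form falls back to a default):

```python
def retry_delay(headers: dict, default: float = 1.0) -> float:
    """Return how many seconds to wait before retrying a 429 response."""
    value = headers.get("Retry-After")
    try:
        return float(value)
    except (TypeError, ValueError):
        return default  # header absent, or in HTTP-date form

print(retry_delay({"Retry-After": "5"}))  # 5.0
```

After sleeping for the returned delay, reissue the same GET request; for anything beyond occasional single-URL lookups, switch to async crawls instead of retrying in a loop.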
Security Features
SSRF Protection
Mulberry blocks access to internal and private addresses to prevent server-side request forgery:
localhost and 127.0.0.0/8
10.0.0.0/8 (private network)
172.16.0.0/12 (private network)
192.168.0.0/16 (private network)
169.254.0.0/16 (link-local)
::1 (IPv6 loopback)
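The same blocklist can be reproduced client-side for pre-flight validation. A sketch using Python's ipaddress module — it checks a resolved IP address, not a hostname, so DNS resolution must happen first:

```python
import ipaddress

# Networks the server refuses to fetch, mirrored from the list above.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(n) for n in (
        "127.0.0.0/8",     # loopback
        "10.0.0.0/8",      # private
        "172.16.0.0/12",   # private
        "192.168.0.0/16",  # private
        "169.254.0.0/16",  # link-local
        "::1/128",         # IPv6 loopback
    )
]

def is_blocked(ip: str) -> bool:
    """Return True if the resolved IP falls in a blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)

print(is_blocked("192.168.1.10"))   # True
print(is_blocked("93.184.216.34"))  # False
```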
Content Size Limits
Responses larger than 10MB are rejected with a content_too_large error.
Adjust the timeout parameter as needed for slow-loading pages.
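A client fetching pages itself can enforce the same cap by reading the body incrementally and aborting once it passes 10MB. A sketch — the helper name and chunk size are arbitrary choices, not part of the API:

```python
import io

MAX_BYTES = 10 * 1024 * 1024  # mirror the server's 10MB limit

def read_capped(stream, limit=MAX_BYTES, chunk_size=64 * 1024):
    """Read a binary stream, raising once it exceeds `limit` bytes."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(buf)
        buf.extend(chunk)
        if len(buf) > limit:
            raise ValueError("content_too_large: response exceeded limit")

small = read_capped(io.BytesIO(b"hello"), limit=1024)
```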
MCP Integration
The sync crawl is also available as an MCP tool for AI agents:
MCP Tool Call
{
"url": "https://example.com",
"result_format": "markdown",
"timeout": 30000
}
When to Use Sync Crawls
Use sync crawls when you need immediate results for a single page. For multi-page crawls,
site traversal, or webhook notifications, use async crawls.