Sitemap Generation
Generate XML sitemaps from completed crawl results, following the sitemaps.org protocol. Use them for SEO analysis, site audits, and content inventory management.
REST API
`GET /api/v1/crawls/{id}/sitemap`
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Crawl job UUID (path parameter) |
| page | integer | No | Page number of a large sitemap. For crawls with more than 50,000 URLs, omitting it returns a sitemap index instead of a page. |
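If you call the endpoint from code rather than curl, here is a minimal Python sketch that uses only the standard library. `sitemap_request` is a hypothetical helper, not part of this API. The base URL and key are the same placeholders the curl examples use, and the path and `Authorization: Bearer` header are taken from those examples.

```python
import urllib.parse
import urllib.request
from typing import Optional


def sitemap_request(base_url: str, crawl_id: str, api_key: str,
                    page: Optional[int] = None) -> urllib.request.Request:
    """Build a GET request for a crawl's sitemap (hypothetical helper)."""
    # The crawl ID is a path parameter, so percent-encode it defensively.
    path = f"/api/v1/crawls/{urllib.parse.quote(crawl_id, safe='')}/sitemap"
    url = base_url.rstrip("/") + path
    if page is not None:
        # `page` is an optional query parameter for sitemaps split into pages.
        url += "?" + urllib.parse.urlencode({"page": page})
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}, method="GET"
    )


req = sitemap_request("https://your-server", "crawl_abc123", "sk_your_api_key", page=2)
print(req.full_url)
# https://your-server/api/v1/crawls/crawl_abc123/sitemap?page=2
```

To send the request, pass it to `urllib.request.urlopen`.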
Examples
```bash
# Sitemap (or sitemap index for crawls with more than 50,000 URLs)
curl -H "Authorization: Bearer sk_your_api_key" \
  https://your-server/api/v1/crawls/crawl_abc123/sitemap

# Page 2 of a large sitemap
curl -H "Authorization: Bearer sk_your_api_key" \
  "https://your-server/api/v1/crawls/crawl_abc123/sitemap?page=2"
```
Response Format
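The response has content type application/xml. The sitemaps.org protocol limits a single sitemap to 50,000 URLs, so the document returned depends on the size of the crawl:
- 50,000 URLs or fewer: a single sitemap (`<urlset>`).
- More than 50,000 URLs and no page parameter: a sitemap index (`<sitemapindex>`) that links to each page of the sitemap.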
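The same endpoint can return either document type, so a client should check the root element before using the body. Below is a minimal sketch using Python's `xml.etree.ElementTree`. `parse_sitemap_response` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# XML namespace defined by the sitemaps.org 0.9 protocol.
SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap_response(body: Union[str, bytes]) -> Tuple[str, List[str]]:
    """Return ("urlset", page URLs) or ("sitemapindex", URLs of the sitemap pages)."""
    if isinstance(body, str):
        body = body.encode("utf-8")  # parse bytes so the XML declaration is honored
    root = ET.fromstring(body)
    if root.tag == f"{SM_NS}urlset":
        kind = "urlset"
    elif root.tag == f"{SM_NS}sitemapindex":
        kind = "sitemapindex"
    else:
        raise ValueError(f"unexpected root element: {root.tag}")
    # Both document types keep their URLs in <loc> elements.
    locs = [el.text.strip() for el in root.iter(f"{SM_NS}loc") if el.text]
    return kind, locs
```

When `kind` is `"sitemapindex"`, request each returned URL with the same `Authorization` header and parse each page the same way.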
Sitemap (50,000 URLs or fewer):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page1</loc>
    <lastmod>2024-01-15T10:30:15Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://example.com/page2</loc>
    <lastmod>2024-01-15T10:30:18Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```
Sitemap index (more than 50,000 URLs, no page parameter):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://your-server/api/v1/crawls/crawl_abc123/sitemap?page=1</loc>
    <lastmod>2024-01-15T10:30:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://your-server/api/v1/crawls/crawl_abc123/sitemap?page=2</loc>
    <lastmod>2024-01-15T10:30:00Z</lastmod>
  </sitemap>
</sitemapindex>
```
MCP Tool
AI agents can generate sitemaps with the crawl_sitemap MCP tool.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Crawl job UUID |
| page | integer | No | Page number for large sitemaps (default: 1) |
Examples
Basic request:
```json
{
  "id": "crawl_abc123"
}
```
Request a specific page:
```json
{
  "id": "crawl_abc123",
  "page": 2
}
```
Response:
```json
{
  "data": {
    "sitemap": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n <url>\n <loc>https://example.com/page1</loc>\n <lastmod>2024-01-15T10:30:15Z</lastmod>\n <changefreq>weekly</changefreq>\n <priority>0.5</priority>\n </url>\n</urlset>"
  }
}
```
The tool returns the sitemap XML as an escaped string in the data.sitemap field.
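The MCP result wraps XML inside JSON, so reading it takes two decoding steps. Below is a minimal sketch that assumes the tool result is a JSON string shaped like the example response. `urls_from_tool_result` is a hypothetical helper.

```python
import json
import xml.etree.ElementTree as ET
from typing import List

SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_tool_result(result_json: str) -> List[str]:
    """Extract <loc> values from a crawl_sitemap result shaped like the example response."""
    payload = json.loads(result_json)               # step 1: decode the JSON envelope
    xml_text = payload["data"]["sitemap"]           # the sitemap is an XML string in data.sitemap
    root = ET.fromstring(xml_text.encode("utf-8"))  # step 2: parse the XML itself
    return [el.text.strip() for el in root.iter(f"{SM_NS}loc") if el.text]
```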
For large sitemaps (>50K URLs), consider using the REST API directly and streaming the response to a file.
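Streaming avoids holding a large response in memory. Below is a minimal sketch using `urllib.request` and `shutil.copyfileobj`. `save_stream` and `download_sitemap` are hypothetical helpers, and the URL layout follows the REST examples.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import BinaryIO, Optional


def save_stream(source: BinaryIO, dest: Path, chunk_size: int = 64 * 1024) -> None:
    """Copy a file-like body to disk in fixed-size chunks instead of reading it whole."""
    with open(dest, "wb") as out:
        shutil.copyfileobj(source, out, chunk_size)


def download_sitemap(base_url: str, crawl_id: str, api_key: str,
                     dest: Path, page: Optional[int] = None) -> None:
    """Stream one sitemap (or one page of a large sitemap) from the REST API to dest."""
    url = f"{base_url.rstrip('/')}/api/v1/crawls/{crawl_id}/sitemap"
    if page is not None:
        url += f"?page={page}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    # urlopen returns a file-like response, so save_stream reads it chunk by chunk.
    with urllib.request.urlopen(req) as resp:
        save_stream(resp, dest)
```

For example, `download_sitemap("https://your-server", "crawl_abc123", "sk_your_api_key", Path("sitemap-page2.xml"), page=2)` writes page 2 to a local file.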
Sitemap Format
Protocol Compliance
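Generated sitemaps follow the sitemaps.org v0.9 specification. Each `<url>` entry contains `<loc>`, `<lastmod>`, `<changefreq>`, and `<priority>` elements, as in the response example above.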
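To spot-check a downloaded file against the protocol's published limits, here is a rough sketch. It checks for 50,000 URLs and 50 MB uncompressed per file, `<loc>` under 2,048 characters, the seven allowed `changefreq` values, and `priority` between 0.0 and 1.0. `check_urlset` is a hypothetical helper, not part of this API, and it does not cover every rule in the specification.

```python
import xml.etree.ElementTree as ET
from typing import List

SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000              # URLs per sitemap file
MAX_BYTES = 50 * 1024 * 1024   # 50 MB (52,428,800 bytes) uncompressed
MAX_LOC_CHARS = 2048           # <loc> must be shorter than this
CHANGEFREQS = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def check_urlset(xml_bytes: bytes) -> List[str]:
    """Return protocol problems found in a <urlset> document; an empty list means none."""
    problems: List[str] = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file is larger than 50 MB uncompressed")
    root = ET.fromstring(xml_bytes)
    if root.tag != f"{SM_NS}urlset":
        return problems + [f"root element is {root.tag!r}, expected urlset"]
    urls = root.findall(f"{SM_NS}url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the {MAX_URLS} limit")
    for i, url in enumerate(urls, start=1):
        loc = (url.findtext(f"{SM_NS}loc") or "").strip()
        if not loc:
            problems.append(f"url {i}: missing <loc>")
        elif len(loc) >= MAX_LOC_CHARS:
            problems.append(f"url {i}: <loc> has {len(loc)} characters")
        freq = url.findtext(f"{SM_NS}changefreq")
        if freq is not None and freq.strip() not in CHANGEFREQS:
            problems.append(f"url {i}: invalid changefreq {freq!r}")
        prio = url.findtext(f"{SM_NS}priority")
        if prio is not None:
            try:
                in_range = 0.0 <= float(prio) <= 1.0
            except ValueError:
                in_range = False
            if not in_range:
                problems.append(f"url {i}: priority {prio!r} is not between 0.0 and 1.0")
    return problems
```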
URL Filtering Rules
| Status Code Range | Included? | Reason |
|---|---|---|
| 200-299 | Yes | Successful requests |
| 300-399 | Yes | Successful redirects |
| 400-499 | No | Client errors |
| 500-599 | No | Server errors |
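The filtering rule comes down to a range check on each URL's status code. The sketch below runs over hypothetical `(url, status)` pairs, because this page does not document how the server stores crawl results internally. The table does not mention 1xx codes, and this sketch excludes them.

```python
from typing import Iterable, List, Tuple


def included_in_sitemap(status: int) -> bool:
    """Mirror the table above: 2xx and 3xx are included, 4xx and 5xx are not."""
    return 200 <= status <= 399


def sitemap_urls(results: Iterable[Tuple[str, int]]) -> List[str]:
    """Keep only URLs whose status code passes the filter, preserving crawl order."""
    return [url for url, status in results if included_in_sitemap(status)]


crawl_results = [
    ("https://example.com/", 200),
    ("https://example.com/old-page", 301),
    ("https://example.com/missing", 404),
    ("https://example.com/broken", 500),
]
print(sitemap_urls(crawl_results))
# ['https://example.com/', 'https://example.com/old-page']
```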
Use Cases
SEO Audit Workflow
Crawl a competitor's site and generate a sitemap to analyze their page structure:
```bash
# 1. Create the crawl
curl -X POST \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://competitor.com", "depth": 3}' \
  https://api.example.com/api/crawls

# 2. When the crawl completes, generate its sitemap
#    (replace crawl_xyz with the ID of the crawl created in step 1)
curl -H "Authorization: Bearer sk_your_api_key" \
  https://api.example.com/api/v1/crawls/crawl_xyz/sitemap > competitor-sitemap.xml

# 3. Compare with the site's published sitemap to find orphan pages
```
Site Migration Verification
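The orphan-page check in the audit above and a before/after migration check both come down to comparing the `<loc>` values of two sitemaps. Below is a minimal sketch, assuming both sitemaps are single `<urlset>` documents already saved locally. `sitemap_paths` and `compare_sitemaps` are hypothetical helpers. The sketch compares paths rather than full URLs so that sitemaps from different hosts (old-site.com and new-site.com) line up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set
from urllib.parse import urlsplit

SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_paths(xml_bytes: bytes) -> Set[str]:
    """Return the path (plus query, if any) of every <loc> in a <urlset>."""
    root = ET.fromstring(xml_bytes)
    paths = set()
    for el in root.iter(f"{SM_NS}loc"):
        if not el.text:
            continue
        parts = urlsplit(el.text.strip())
        # Keying on the path ignores the host, so different domains can be compared.
        paths.add((parts.path or "/") + (f"?{parts.query}" if parts.query else ""))
    return paths


def compare_sitemaps(reference: bytes, candidate: bytes) -> Dict[str, Set[str]]:
    """Split paths into those only in `reference`, only in `candidate`, and in both."""
    ref, cand = sitemap_paths(reference), sitemap_paths(candidate)
    return {"missing": ref - cand, "extra": cand - ref, "common": ref & cand}
```

How to read the result in each case:

- **SEO audit:** pass the site's published sitemap as `reference` and `competitor-sitemap.xml` as `candidate`. `missing` then lists pages the crawl never reached, which are orphan-page candidates.
- **Migration check:** pass the old site's sitemap as `reference`. `missing` lists pages that did not survive the migration.

Read files with `Path(...).read_bytes()`. Two caveats apply:

- Published sitemaps are often sitemap indexes, so expand them into their pages first.
- Trailing-slash differences show up as mismatches unless you normalize them.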
Generate sitemaps before and after migration to ensure all pages were migrated:
```bash
# Crawl the old site
curl -X POST \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://old-site.com", "depth": 5}' \
  https://api.example.com/api/crawls

# After the migration, crawl the new site
curl -X POST \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://new-site.com", "depth": 5}' \
  https://api.example.com/api/crawls

# Generate a sitemap from each crawl and compare them
```
Content Inventory for CMS Migration
Crawl the entire site and export its sitemap for import into the new CMS:
```bash
# Crawl the entire site
curl -X POST \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://legacy-cms.com", "depth": 10}' \
  https://api.example.com/api/crawls

# Export the sitemap for import into the new CMS
curl -H "Authorization: Bearer sk_your_api_key" \
  https://api.example.com/api/v1/crawls/crawl_abc/sitemap > content-inventory.xml
```
AI-Powered SEO Optimization
Use AI agents to analyze sitemaps and provide SEO recommendations:
"Let me analyze the site structure. I'll crawl the site first, then generate a sitemap to identify which pages are being indexed and recommend improvements."
The agent first calls crawl_create:
```json
{
  "url": "https://example.com",
  "depth": 3,
  "format": "markdown",
  "wait": true
}
```
Then it calls crawl_sitemap:
```json
{
  "id": "crawl_xyz"
}
```
Finally, the agent analyzes the sitemap to find:
- Missing canonical tags
- Pages with low priority
- Orphan pages
- Deep URL structure issues

Error Handling
REST API Errors
| Status | Description |
|---|---|
| 401 | Invalid or missing API key |
| 403 | Key lacks crawls:read permission |
| 404 | Crawl ID doesn't exist or crawl not completed |
| 404 | Page number out of range |
| 500 | Server-side error |
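A 404 can mean an unknown crawl, a crawl that has not finished, or an out-of-range page. A client therefore cannot tell from the status alone whether waiting will help. Below is a minimal sketch using `urllib.request` that returns `None` on 404 and leaves the retry decision to the caller. `explain_status` and `fetch_sitemap` are hypothetical helpers, and the request is assumed to carry the `Authorization` header shown in the REST examples.

```python
import urllib.error
import urllib.request
from typing import Optional

# Status meanings from the REST error table above.
SITEMAP_ERRORS = {
    401: "invalid or missing API key",
    403: "API key lacks the crawls:read permission",
    404: "unknown crawl ID, crawl not yet completed, or page out of range",
    500: "server-side error",
}


def explain_status(code: int) -> str:
    """Map an HTTP status from the sitemap endpoint to a readable message."""
    return SITEMAP_ERRORS.get(code, f"unexpected HTTP status {code}")


def fetch_sitemap(req: urllib.request.Request) -> Optional[bytes]:
    """Return the sitemap body, or None on 404; raise RuntimeError on other errors."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Ambiguous: the caller decides whether to wait for the crawl and retry.
            return None
        raise RuntimeError(f"sitemap request failed: {explain_status(err.code)}") from err
```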
MCP Tool Errors
| Error | Description |
|---|---|
| unauthorized | Invalid or missing API key |
| forbidden | Key lacks required permissions |
| not_found | Crawl ID doesn't exist |
| page_not_found | Page number out of range |
| no_results | Crawl has no successful URLs |
Rate Limiting
- Sitemap requests count toward standard API rate limits
- Recommended: Cache sitemap responses locally for completed crawls
- Use webhooks to generate the sitemap when a crawl completes, instead of polling for completion
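A local cache keyed by crawl ID and page means each completed crawl costs at most one request per page against the rate limit. Below is a minimal sketch, assuming a completed crawl's sitemap never changes and crawl IDs are safe to use as file names. `cached_sitemap` is a hypothetical helper, and `fetch` is any function you supply that performs the actual API call.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Optional


def cached_sitemap(cache_dir: Path, crawl_id: str, page: Optional[int],
                   fetch: Callable[[str, Optional[int]], bytes]) -> bytes:
    """Return a cached sitemap for a completed crawl, calling `fetch` only on a cache miss."""
    name = f"{crawl_id}.xml" if page is None else f"{crawl_id}.page{page}.xml"
    path = cache_dir / name
    if path.exists():
        return path.read_bytes()  # no API request, so no rate-limit cost
    body = fetch(crawl_id, page)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted write never leaves a truncated entry.
    fd, tmp = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as out:
        out.write(body)
    os.replace(tmp, path)
    return body
```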