Sitemap Generation

Generate XML sitemaps from completed crawl results, following the sitemaps.org protocol. Useful for SEO work, site audits, and content inventory management.

REST Endpoint

GET /api/v1/crawls/:id/sitemap

MCP Tool

crawl_sitemap

Permission

crawls:read

REST API

GET /api/v1/crawls/:id/sitemap

Parameters

Parameter	Type	Required	Description
`id`	string	Yes	Crawl job UUID (path parameter)
`page`	integer	No	Page number for large sitemaps (default: 1 for crawls of up to 50,000 URLs; larger crawls return a sitemap index when `page` is omitted)

Note

Sitemaps are only generated for completed crawls. The crawl must have a status of "completed" before you can request a sitemap.

Examples

Basic Usage

curl -H "Authorization: Bearer sk_your_api_key" \
  https://your-server/api/v1/crawls/crawl_abc123/sitemap
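
The same request can be built from Python with the standard library. This is a minimal sketch, not an official client: `BASE_URL`, `build_sitemap_request`, and `fetch_sitemap` are hypothetical names, and the optional `page` argument maps to the `page` query parameter from the table above.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://your-server"  # placeholder host, as in the curl examples


def build_sitemap_request(crawl_id: str, api_key: str, page: Optional[int] = None) -> Request:
    """Build (but do not send) the GET request for a crawl's sitemap."""
    url = f"{BASE_URL}/api/v1/crawls/{crawl_id}/sitemap"
    if page is not None:
        url += "?" + urlencode({"page": page})
    return Request(url, headers={"Authorization": f"Bearer {api_key}"})


def fetch_sitemap(crawl_id: str, api_key: str, page: Optional[int] = None) -> bytes:
    """Send the request and return the raw XML body."""
    with urlopen(build_sitemap_request(crawl_id, api_key, page)) as response:
        return response.read()
```

Keeping request construction separate from sending makes the URL and header logic easy to check without network access.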

With Pagination

curl -H "Authorization: Bearer sk_your_api_key" \
  "https://your-server/api/v1/crawls/crawl_abc123/sitemap?page=2"

Response Format

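The response has the content type application/xml. For crawls with 50,000 URLs or fewer, the body is a sitemap (a urlset document). For crawls with more than 50,000 URLs, a request without the page parameter returns a sitemap index (a sitemapindex document) whose entries point to the individual pages.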
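
A client usually needs to tell the two response shapes apart before processing them. The following sketch inspects the root element with the standard library XML parser; `read_sitemap_response` is a hypothetical helper, and it assumes the response uses the sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def read_sitemap_response(xml_text: str) -> Tuple[str, List[str]]:
    """Return ("sitemap", page URLs) or ("index", child sitemap URLs)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag == f"{{{SITEMAP_NS}}}urlset":
        kind, item = "sitemap", "url"
    elif root.tag == f"{{{SITEMAP_NS}}}sitemapindex":
        kind, item = "index", "sitemap"
    else:
        raise ValueError(f"unexpected root element: {root.tag}")
    # Both document types wrap each entry's address in a <loc> element.
    locs = [
        (entry.findtext(f"{{{SITEMAP_NS}}}loc") or "").strip()
        for entry in root.findall(f"{{{SITEMAP_NS}}}{item}")
    ]
    return kind, locs
```

When the result is an index, each returned location can be fetched in turn to collect the full URL list.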

Sitemap XML Example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page1</loc>
    <lastmod>2024-01-15T10:30:15Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://example.com/page2</loc>
    <lastmod>2024-01-15T10:30:18Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>

Sitemap Index XML Example (>50K URLs)

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://your-server/api/v1/crawls/crawl_abc123/sitemap?page=1</loc>
    <lastmod>2024-01-15T10:30:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://your-server/api/v1/crawls/crawl_abc123/sitemap?page=2</loc>
    <lastmod>2024-01-15T10:30:00Z</lastmod>
  </sitemap>
</sitemapindex>

MCP Tool

AI agents can generate sitemaps with the crawl_sitemap MCP tool.

Parameters

Parameter	Type	Required	Description
`id`	string	Yes	Crawl job UUID
`page`	integer	No	Page number for large sitemaps (default: 1)

Examples

Basic Call

{
  "id": "crawl_abc123"
}

With Pagination

{
  "id": "crawl_abc123",
  "page": 2
}

Response Format

{
  "data": {
    "sitemap": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n  <url>\n    <loc>https://example.com/page1</loc>\n    <lastmod>2024-01-15T10:30:15Z</lastmod>\n    <changefreq>weekly</changefreq>\n    <priority>0.5</priority>\n  </url>\n</urlset>"
  }
}

Important

The crawl_sitemap tool returns the sitemap as a string in the data.sitemap field. For large sitemaps (>50K URLs), consider using the REST API directly and streaming the response to a file.
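
Because the tool result embeds the XML as a JSON string, a consumer has to decode the JSON first and then parse the XML. A minimal sketch, assuming the result has exactly the shape shown in the response format above; `sitemap_urls_from_tool_result` is a hypothetical helper name.

```python
import json
import xml.etree.ElementTree as ET
from typing import List

LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def sitemap_urls_from_tool_result(result_json: str) -> List[str]:
    """Extract every <loc> value from the XML held in data.sitemap."""
    payload = json.loads(result_json)
    xml_text = payload["data"]["sitemap"]
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [loc.text.strip() for loc in root.iter(LOC_TAG) if loc.text]
```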

Sitemap Format

Protocol Compliance

Generated sitemaps follow the sitemaps.org v0.9 specification with these features:

URL Limit

Maximum 50,000 URLs per sitemap

Pagination

Automatic for crawls exceeding 50K URLs

URL Filtering

Only includes successful pages (HTTP 200-399)

Sorting

URLs sorted alphabetically

URL Filtering Rules

Status Code Range	Included?	Reason
200-299	Yes	Successful requests
300-399	Yes	Successful redirects
400-499	No	Client errors
500-599	No	Server errors
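
The filtering, sorting, and pagination rules above can be illustrated with a short sketch. It models the documented behavior only and is not the server's implementation; the function names are hypothetical, and "alphabetical" is assumed here to mean Python's default string ordering.

```python
from typing import Iterable, List, Tuple

MAX_URLS_PER_SITEMAP = 50_000  # limit stated in the protocol compliance list


def is_included(status: int) -> bool:
    """A page is kept when its HTTP status is in the 200-399 range."""
    return 200 <= status <= 399


def build_pages(
    results: Iterable[Tuple[str, int]], page_size: int = MAX_URLS_PER_SITEMAP
) -> List[List[str]]:
    """Filter (url, status) pairs, sort the URLs, and split them into pages."""
    urls = sorted(url for url, status in results if is_included(status))
    return [urls[i : i + page_size] for i in range(0, len(urls), page_size)]
```

Under this model, a crawl with 120,000 included URLs would produce three pages: two full pages and one with 20,000 URLs.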

Use Cases

SEO Audit Workflow

Crawl a competitor's site and generate a sitemap to analyze their page structure:

# 1. Create crawl
curl -X POST \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://competitor.com", "depth": 3}' \
  https://api.example.com/api/v1/crawls

# 2. Generate sitemap when complete
curl -H "Authorization: Bearer sk_your_api_key" \
  https://api.example.com/api/v1/crawls/crawl_xyz/sitemap > competitor-sitemap.xml

# 3. Compare with the site's published sitemap to find orphan pages
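
The comparison in step 3 amounts to a set difference between two lists of URLs. The sketch below assumes both inputs are sitemap XML documents and treats a URL that the site lists but the crawler never reached as an orphan candidate; `sitemap_locs` and `compare_sitemaps` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set

LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def sitemap_locs(xml_text: str) -> Set[str]:
    """Collect every <loc> value in a sitemap document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {loc.text.strip() for loc in root.iter(LOC_TAG) if loc.text}


def compare_sitemaps(crawled_xml: str, published_xml: str) -> Dict[str, Set[str]]:
    crawled, published = sitemap_locs(crawled_xml), sitemap_locs(published_xml)
    return {
        # Listed by the site but never reached by following links.
        "orphan_candidates": published - crawled,
        # Reached by the crawler but absent from the published sitemap.
        "unlisted": crawled - published,
    }
```

A URL may also be missing from the crawl because of the depth limit, so orphan candidates are worth verifying before acting on them.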

Site Migration Verification

Generate sitemaps before and after migration to ensure all pages were migrated:

# Crawl old site
curl -X POST \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://old-site.com", "depth": 5}' \
  https://api.example.com/api/v1/crawls

# After migration, crawl new site
curl -X POST \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://new-site.com", "depth": 5}' \
  https://api.example.com/api/v1/crawls

# Generate and compare sitemaps
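
Since the old and new sites live on different hosts, a direct URL comparison would report every page as missing. One way to handle this, sketched below, is to compare paths only. The normalization (dropping the host and any trailing slash) is an assumption that may not fit every migration, and both function names are hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def path_key(url: str) -> str:
    """Reduce a URL to its path (plus query string), ignoring scheme and host."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return f"{path}?{parts.query}" if parts.query else path


def missing_after_migration(old_urls: Iterable[str], new_urls: Iterable[str]) -> List[str]:
    """Return paths that exist on the old site but not on the new one."""
    new_paths = {path_key(url) for url in new_urls}
    return sorted({path_key(url) for url in old_urls} - new_paths)
```

If the migration also renames paths, a redirect map would need to be applied to the old paths before comparing.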

Content Inventory for CMS Migration

Crawl the entire site and export the sitemap for import into the new CMS:

# Crawl entire site
curl -X POST \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://legacy-cms.com", "depth": 10}' \
  https://api.example.com/api/v1/crawls

# Export sitemap for import into new CMS
curl -H "Authorization: Bearer sk_your_api_key" \
  https://api.example.com/api/v1/crawls/crawl_abc/sitemap > content-inventory.xml
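
Many CMS import tools and spreadsheets work more easily with CSV than with XML. The following sketch converts an exported sitemap into a CSV inventory using the four fields shown in the sitemap example; `sitemap_to_csv` is a hypothetical helper, and the column layout is an assumption rather than a format any particular CMS requires.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
FIELDS = ["loc", "lastmod", "changefreq", "priority"]


def sitemap_to_csv(xml_text: str) -> str:
    """Return CSV text with one row per <url> entry in the sitemap."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(FIELDS)
    for url in root.findall("sm:url", NS):
        # Missing child elements become empty cells.
        writer.writerow(
            [(url.findtext(f"sm:{field}", default="", namespaces=NS) or "").strip() for field in FIELDS]
        )
    return buffer.getvalue()
```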

AI-Powered SEO Optimization

Use AI agents to analyze sitemaps and provide SEO recommendations:

"Let me analyze the site structure. I'll crawl the site first, then generate a sitemap to identify which pages are being indexed and recommend improvements."

// Agent uses crawl_create
{
  "url": "https://example.com",
  "depth": 3,
  "format": "markdown",
  "wait": true
}

// Then uses crawl_sitemap
{
  "id": "crawl_xyz"
}

// Agent analyzes sitemap to find:
// - Missing canonical tags
// - Pages with low priority
// - Orphan pages
// - Deep URL structure issues
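
Of the checks listed above, deep URL structure is the easiest to derive from the sitemap alone, because it depends only on each URL's path. A minimal sketch follows; the helper names and the depth threshold of 3 are illustrative assumptions, not recommendations from this API.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def path_depth(url: str) -> int:
    """Count the non-empty path segments in a URL."""
    return len([segment for segment in urlsplit(url).path.split("/") if segment])


def deep_urls(urls: Iterable[str], max_depth: int = 3) -> List[str]:
    """Return the URLs nested deeper than max_depth, sorted."""
    return sorted(url for url in urls if path_depth(url) > max_depth)
```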

Pro tip

Combine sitemap generation with the pages endpoint to get both structure and content. Use the sitemap to identify pages of interest, then fetch their content via the pages API.

Error Handling

REST API Errors

Status	Description
`401`	Invalid or missing API key
`403`	Key lacks `crawls:read` permission
`404`	Crawl ID doesn't exist or crawl not completed
`404`	Page number out of range
`500`	Server-side error
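
A client can translate these status codes into a single exception type so that callers handle failures in one place. This is a sketch under the assumption that only the statuses in the table need special messages; `SitemapRequestError` and `check_status` are hypothetical names.

```python
REST_ERRORS = {
    401: "Invalid or missing API key",
    403: "Key lacks crawls:read permission",
    404: "Crawl not found, crawl not completed, or page number out of range",
    500: "Server-side error",
}


class SitemapRequestError(Exception):
    """Raised when the sitemap endpoint responds with an error status."""

    def __init__(self, status: int) -> None:
        self.status = status
        super().__init__(f"HTTP {status}: {REST_ERRORS.get(status, 'Unexpected status')}")


def check_status(status: int) -> None:
    """Raise SitemapRequestError for any 4xx or 5xx status; otherwise do nothing."""
    if status >= 400:
        raise SitemapRequestError(status)
```

Because a 404 covers several causes, checking the crawl's status before requesting the sitemap helps distinguish an unfinished crawl from a wrong ID.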

MCP Tool Errors

Error	Description
`unauthorized`	Invalid or missing API key
`forbidden`	Key lacks required permissions
`not_found`	Crawl ID doesn't exist
`page_not_found`	Page number out of range
`no_results`	Crawl has no successful URLs

Rate Limiting

Sitemap requests count toward the standard API rate limits.
Recommended: cache sitemap responses locally for completed crawls.
Use webhooks to regenerate sitemaps when a crawl completes.
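
The caching recommendation can be sketched as a small file-backed cache keyed by crawl ID and page number. The `fetch` argument is any callable that performs the actual API request; the function name and the cache file naming are hypothetical, and the sketch assumes crawl IDs are safe to use in file names.

```python
from pathlib import Path
from typing import Callable


def cached_sitemap(
    crawl_id: str, page: int, cache_dir: Path, fetch: Callable[[str, int], bytes]
) -> bytes:
    """Return the sitemap from disk if present; otherwise fetch and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{crawl_id}-page{page}.xml"
    if path.exists():
        return path.read_bytes()
    body = fetch(crawl_id, page)
    path.write_bytes(body)
    return body
```

With this in place, repeated requests for the same crawl and page reach the API only once.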

Next Steps

Creating Crawl Jobs

Learn how to create crawl jobs for sitemap generation

MCP Integration

Use sitemaps with AI agents via MCP

API Key Permissions

Manage API key permissions for sitemap access