Last Updated: March 2026

What Is Crawl Budget?

Crawl budget is the practical limit on how many URLs a search engine crawler will be able to fetch (crawl capacity limit) and want to fetch (crawl demand). Google defines it per hostname and it becomes a constraint primarily for large sites (1M+ pages) or sites with 10K+ pages that change daily. The highest-leverage optimization strategy is reducing crawl waste — eliminating low-value URL variants, fixing redirect chains, and improving server response times.

1. The Two Components of Crawl Budget

Google's crawl budget consists of two independently computed components that intersect to determine actual crawl behavior:

| Component | Question It Answers | Influenced By |
| --- | --- | --- |
| Crawl Capacity Limit | "How hard can Google crawl without breaking the site?" | Server response time, error rates, sustained responsiveness |
| Crawl Demand | "How much does Google want to crawl?" | Perceived inventory, popularity, staleness, site events |

Critical distinction: Crawling is retrieval; indexing is evaluation. Being crawled does not guarantee being indexed. Statuses like "Crawled – currently not indexed" reflect quality assessment or canonicalization — not crawl budget limits.

Google defines crawl budget at the hostname level: www.example.com and code.example.com have separate budgets. Architecture choices (subdomains vs subfolders) can change how crawl resources are partitioned.

2. When Crawl Budget Actually Matters

Google provides explicit thresholds for when crawl budget optimization becomes relevant:

  • 1M+ unique pages with content that changes moderately often (weekly)
  • 10K+ unique pages with rapidly changing content (daily)
  • Large % of URLs classified as "Discovered – currently not indexed"

Sites with fewer than ~1,000 pages generally don't need crawl budget optimization. The common "10K+ pages" heuristic is conditionally correct — but the condition (rapid change and/or indexing backlog) is crucial.

Signals That Crawl Budget Is Constraining You

  • Priority URLs stuck in "Discovered – currently not indexed"
  • High crawl volume on low-value templates (filters, sort permutations) while important pages have low crawl frequency
  • Excessive redirect chains inflating crawl requests (each hop counts separately)
  • Many soft-404s and thin/duplicate URLs being re-crawled

3. Crawl Budget Optimization Techniques

| Technique | Impact | Effort |
| --- | --- | --- |
| Remove thin/duplicate pages | High — reduces perceived inventory | Med |
| Fix redirect chains (A→C not A→B→C) | High — each hop counted | Med |
| Server performance (TTFB, errors) | High — directly increases capacity | High |
| Internal linking (<a href>) | Med-High — faster discovery | Med |
| XML sitemap optimization | Med — improved discovery/refresh | Low |
| robots.txt blocking | High — frees crawl capacity | Low |
| Canonical tags | Med-High — reduces duplicates | Med |
| Pagination best practices | Med — prevents crawl traps | Med |
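The robots.txt technique above can be made concrete. A minimal sketch, assuming faceted navigation is driven by query parameters named `sort`, `filter`, and `sessionid` (placeholders; substitute whatever parameters your own templates generate):

```
# Block crawl-wasting URL variants entirely (parameter names are illustrative).
User-agent: *
Disallow: /*?*sort=
Disallow: /*?*filter=
Disallow: /*?*sessionid=

# Keep the sitemap discoverable for the URLs you do want crawled.
Sitemap: https://www.example.com/sitemap.xml
```

Note that robots.txt is a crawl directive, not an indexing directive: a blocked URL can still appear in search (without content) if it is linked from elsewhere.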

Prioritized Workflow

High Impact, First Sprint

  • 1. Build URL inventory segmented by template & value — identify largest crawl sinks
  • 2. Fix systemic redirect chains and internal links pointing to redirected URLs
  • 3. Resolve soft-404 patterns (return real 404/410s or add proper content)
  • 4. Implement faceted navigation controls (robots disallow for non-index targets)
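The redirect-chain fix in step 2 can be sketched as a small script. Given a mapping of known redirects (assumed to be exported from your CMS or server config; the URLs below are hypothetical), it resolves every source URL to its final destination so internal links and redirect rules can point there directly:

```python
def resolve_final_destination(redirects, url, max_hops=10):
    """Follow a redirect map until a URL that redirects nowhere is reached.

    redirects: dict mapping source URL -> immediate target URL.
    Returns (final_url, hops). Raises ValueError on a loop or excessive chain.
    """
    seen = set()
    hops = 0
    while url in redirects:
        if url in seen or hops >= max_hops:
            raise ValueError(f"Redirect loop or excessive chain at {url}")
        seen.add(url)
        url = redirects[url]
        hops += 1
    return url, hops

# Hypothetical redirect map: /a -> /b -> /c is a chain Googlebot pays for twice.
redirects = {"/a": "/b", "/b": "/c", "/old-page": "/new-page"}

# Collapse every chain so each source points straight at its final target.
flattened = {src: resolve_final_destination(redirects, dst)[0]
             for src, dst in redirects.items()}
print(flattened)  # {'/a': '/c', '/b': '/c', '/old-page': '/new-page'}
```

The same pass doubles as an audit: any source whose hop count is greater than zero after flattening is a chain worth fixing at the server level.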

Parallel Engineering Track

  • 5. Improve server stability and response times — reduce DNS/network errors and 5xx
  • 6. Ensure internal linking uses crawlable <a href> patterns

Quick Wins

  • 7. Rebuild sitemaps — canonical, indexable URLs only; accurate <lastmod>; split by template
  • 8. Audit pagination — unique URLs, self-canonical per page, no fragments
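The sitemap rebuild in step 7 reduces to entries like the following (URL and date are illustrative). Keep <lastmod> honest: Google has said it may ignore the field site-wide if it is consistently inaccurate.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Canonical, indexable URL only; lastmod reflects a real content change -->
  <url>
    <loc>https://www.example.com/guides/crawl-budget</loc>
    <lastmod>2026-03-01</lastmod>
  </url>
</urlset>
```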

4. Noindex vs Robots.txt vs 404: Decision Guide

| Method | Use When | Crawl Budget Impact |
| --- | --- | --- |
| robots.txt | "Don't crawl at all" — URLs never needed for search | Saves crawl requests |
| noindex | Page must exist for users but shouldn't appear in search | No savings — Google must crawl to see noindex |
| 404/410 | Content truly removed — want URL to stop being crawled | Strong signal not to re-crawl |
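Each method lives in a different layer, which is easy to mix up. A sketch of where each one is expressed (paths and markup are illustrative): robots.txt is a crawl-time directive, noindex lives in the page itself (or an X-Robots-Tag response header), and 404/410 is an HTTP status returned by the server.

```text
# robots.txt: "never crawl this"
Disallow: /internal-search/

<!-- HTML head: "crawl me, but don't index me" -->
<meta name="robots" content="noindex">

# HTTP response: "this content is permanently gone"
HTTP/1.1 410 Gone
```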

5. Measurement: KPIs and Data Sources

Use a "before vs after" baseline (minimum 2–4 weeks each side) to measure the impact of crawl budget changes:

  • Crawl requests/day — overall and by template (from Search Console Crawl Stats)
  • Average response time + host status — verify server improvements reflect in crawling
  • Discovery vs Refresh mix — spikes in discovery indicate improved discoverability
  • Crawl waste share — redirects, 4xx, soft-404 as % of total crawl
  • "Discovered – not indexed" trend — reductions signal improved capacity
  • Time-to-first-crawl — track with URL Inspection for new/updated pages

Data Sources

  • Search Console Crawl Stats: Macro trends, bots, response types, response-time trends
  • Server logs: Ground truth — which URLs are actually fetched and how often
  • Page Indexing report: Ties crawlability to indexing outcomes
  • URL Inspection: Per-URL verification (rendered HTML, indexing status, canonical)
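Since server logs are the ground truth, the "crawl waste share" KPI above can be derived directly from them. A minimal sketch, assuming Apache combined-format log lines (the regex is a simplification; adapt the parsing to your own log format, and note that production analysis should also verify Googlebot by IP, not just user agent):

```python
import re
from collections import Counter

# Matches the HTTP status code in a combined-format access log line.
# Simplified: assumes no escaped quotes inside the request field.
LINE_RE = re.compile(r'"\S+ \S+ \S+" (\d{3}) ')

def crawl_waste_share(log_lines):
    """Return (waste_fraction, counts) for Googlebot requests.

    "Waste" here counts redirects (3xx) and client errors (4xx):
    requests that consumed crawl budget without delivering content.
    """
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        m = LINE_RE.search(line)
        if not m:
            continue
        counts[m.group(1)[0] + "xx"] += 1
    total = sum(counts.values())
    waste = counts["3xx"] + counts["4xx"]
    return (waste / total if total else 0.0), counts

# Hypothetical log sample: one redirect, one good fetch, one 404, one non-bot hit.
logs = [
    '1.2.3.4 - - [01/Mar/2026:00:00:01 +0000] "GET /a HTTP/1.1" 301 0 "-" "Googlebot/2.1"',
    '1.2.3.4 - - [01/Mar/2026:00:00:02 +0000] "GET /c HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '9.9.9.9 - - [01/Mar/2026:00:00:03 +0000] "GET /c HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [01/Mar/2026:00:00:04 +0000] "GET /gone HTTP/1.1" 404 0 "-" "Googlebot/2.1"',
]
share, counts = crawl_waste_share(logs)
print(f"crawl waste share: {share:.0%}")  # 2 of 3 Googlebot hits are redirects/4xx
```

Segmenting the same tally by URL template (product pages vs filter permutations, say) shows where the wasted requests concentrate.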

What This Means for You

If your site has thousands of pages, start with Clickcentric's technical SEO checklist to identify crawl waste. Our schema markup and internal linking features help ensure every important page is discoverable without wasting crawl budget on low-value URLs. Start free.

Frequently Asked Questions

What is crawl budget?
Crawl budget is the practical limit on how many URLs a search engine crawler will (a) be able to fetch without harming your servers (crawl capacity limit) and (b) want to fetch because the content appears valuable and fresh (crawl demand). It's defined per hostname.

When does crawl budget matter?
Google says crawl budget matters for: (1) very large sites with 1M+ unique pages that change moderately often, (2) sites with 10K+ pages that change daily, or (3) sites with many URLs stuck in 'Discovered – currently not indexed.' For sites under 1,000 pages, it's generally not a concern.

Does being crawled guarantee being indexed?
No. Google explicitly separates crawling (retrieval) from indexing (evaluation and consolidation). Being crawled means Google fetched your page, but it may decide not to index it based on quality, duplication, or canonicalization signals.

Does blocking URLs in robots.txt free up crawl budget for other pages?
Robots.txt blocking prevents crawling of specified URLs, which reduces crawl requests. However, Google notes this doesn't automatically 'shift' the freed capacity to other pages unless Google is already hitting your site's serving limit.

Does noindex save crawl budget?
No — noindex doesn't save crawl budget because Google must crawl the page to see the noindex directive. Use robots.txt to prevent crawling entirely, or use 404/410 for truly removed content. Reserve noindex for pages that must be accessible to users but shouldn't appear in search.

Does Googlebot support crawl-delay in robots.txt?
No. Googlebot does not process the non-standard crawl-delay directive in robots.txt. To manage crawl rate, improve server response times; for an urgent, temporary reduction, serve 503 or 429 responses.

Ready to Scale Your SEO?

Generate optimized content and publish to WordPress in minutes. 3-day free trial — no credit card required.

Start 3-Day Free Trial