# Robots.txt Configuration Template with Auditite
Configure robots.txt correctly with this template. Covers directives for major crawlers, common CMS patterns, and testing procedures.
## Overview
The robots.txt file controls which parts of your site search engine crawlers can access. A misconfigured robots.txt can block important content from being indexed or waste crawl budget on low-value pages. This template provides standard configurations for common site types.
## Robots.txt Syntax Reference
| Directive | Purpose | Example |
|---|---|---|
| User-agent | Specifies which crawler the rules apply to | User-agent: Googlebot |
| Disallow | Blocks a path from crawling | Disallow: /admin/ |
| Allow | Explicitly allows crawling (overrides Disallow) | Allow: /admin/public/ |
| Sitemap | Points to your XML sitemap | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Requests delay between crawls (not honored by Google) | Crawl-delay: 10 |
**Key Rules:**

- Robots.txt must live at the root: `https://example.com/robots.txt`
- Paths in directives are case-sensitive
- More specific paths take precedence over less specific ones
- An empty `Disallow:` means "allow everything"
- A missing robots.txt means "allow everything"
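The precedence rule above can be sketched in Python. This is a hedged illustration of the documented longest-match behavior (with `*` wildcards and the `$` end anchor), not any crawler's actual implementation; `rule_matches` and `is_allowed` are hypothetical helper names.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Translate a robots.txt pattern (with * and $) into a regex and test it."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"  # '$' anchors the pattern at the end of the path
    return re.match(regex, path) is not None

def is_allowed(rules, path: str) -> bool:
    """Longest-match rule: the most specific matching pattern wins;
    on a tie, Allow wins. No matching rule means the path is allowed."""
    best_len, best_allow = -1, True
    for directive, pattern in rules:
        if pattern and rule_matches(pattern, path):
            if len(pattern) > best_len or (len(pattern) == best_len and directive == "allow"):
                best_len, best_allow = len(pattern), directive == "allow"
    return best_allow

# The Allow/Disallow pair from the syntax table above
rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
```

Here `is_allowed(rules, "/admin/public/page")` is `True` because the longer `Allow` pattern beats the shorter `Disallow`, while `/admin/secret` stays blocked.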
## Standard Template

```
# Robots.txt for [your site]
# Updated: [date]

User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /search?
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /thank-you/
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?page=
Allow: /

Sitemap: https://example.com/sitemap.xml
```
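A quick sanity check is possible with Python's standard-library `urllib.robotparser`. Note the stdlib parser predates wildcard patterns (it treats `*` in paths literally), so this sketch trims the template down to plain prefix rules; wildcard lines are better verified in Search Console. The `ROBOTS` string is a reduced sample, not the full template.

```python
from urllib import robotparser

# Prefix-only subset of the template: the stdlib parser does not
# understand '*' wildcards in paths, so those lines are omitted here.
ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

print(rp.can_fetch("*", "https://example.com/blog/post"))    # allowed
print(rp.can_fetch("*", "https://example.com/admin/users"))  # blocked
```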
## CMS-Specific Templates

### WordPress

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /wp-json/
Disallow: /*?s=
Disallow: /*?replytocom=
Disallow: /tag/*/page/
Disallow: /author/

Sitemap: https://example.com/sitemap_index.xml
```
### Shopify

```
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkouts/
Disallow: /checkout
Disallow: /account
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
Disallow: /*?variant=
Disallow: /*?sort_by=
Disallow: /*?filter=

Sitemap: https://example.com/sitemap.xml
```
### E-commerce (General)

```
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /wishlist/
Disallow: /compare/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?color=
Disallow: /*?size=
Disallow: /*?price=
Disallow: /*?page=
Allow: /

Sitemap: https://example.com/sitemap.xml
```
## What to Block vs What to Allow
| Content Type | Block? | Reason |
|---|---|---|
| Admin/login pages | Yes | No SEO value, security risk |
| Search results pages | Yes | Thin/duplicate content |
| Cart/checkout | Yes | No SEO value, personal data |
| User account pages | Yes | Personal data, no SEO value |
| Faceted navigation URLs | Yes | Duplicate content, crawl waste |
| Sort/filter parameters | Yes | Duplicate content |
| API endpoints | Yes | Not for public consumption |
| CSS/JS files | No | Google needs these for rendering |
| Images | No | Needed for image search traffic |
| PDF/documents | No | Often valuable content |
| Blog content | No | Core SEO content |
| Product pages | No | Core SEO content |
## AI Crawler Configuration
| Crawler | User-agent | Purpose | Recommendation |
|---|---|---|---|
| GPTBot | GPTBot | OpenAI training data | Block if desired |
| Google-Extended | Google-Extended | Gemini training data | Block if desired |
| CCBot | CCBot | Common Crawl | Block if desired |
| Anthropic | anthropic-ai | Claude training data | Block if desired |
| Bytespider | Bytespider | TikTok/ByteDance | Block if desired |
```
# Optional: Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```
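To confirm which AI crawlers a robots.txt actually blocks, a per-agent check with the stdlib parser works. This is a sketch: `blocks_agent` is a hypothetical helper, the `ROBOTS` string is a sample, and a real audit should fetch the live file.

```python
from urllib import robotparser

# Sample file blocking two AI crawlers; CCBot falls through to the
# wildcard group and remains allowed, to show per-agent matching.
ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

def blocks_agent(robots_txt: str, agent: str) -> bool:
    """Return True when the given user-agent cannot fetch the site root."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return not rp.can_fetch(agent, "https://example.com/")

for bot in ("GPTBot", "Google-Extended", "CCBot"):
    print(bot, "blocked:", blocks_agent(ROBOTS, bot))
```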
## Testing Checklist

- File is accessible at `/robots.txt` (returns 200 status)
- Syntax is valid (no typos in directives)
- Important content is NOT blocked
- Low-value pages ARE blocked
- Sitemap URL is included and valid
- Tested with the robots.txt report in Google Search Console
- No accidental `Disallow: /` blocking the entire site
- CSS and JS files are not blocked (needed for rendering)
- Images directory is not blocked
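Parts of this checklist can be automated. The sketch below (a hypothetical `audit_robots` helper) scans a robots.txt body for a blanket `Disallow: /` under `User-agent: *` and for a missing `Sitemap` directive; fetching the file and confirming the 200 status is left to the caller.

```python
def audit_robots(robots_txt: str) -> list:
    """Check a robots.txt body against two checklist items and
    return a list of human-readable problems (empty means clean)."""
    problems = []
    agent, blanket, has_sitemap = None, False, False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            agent = value
        elif key == "sitemap":
            has_sitemap = True
        elif key == "disallow" and value == "/" and agent == "*":
            blanket = True
    if blanket:
        problems.append("blanket 'Disallow: /' blocks the entire site")
    if not has_sitemap:
        problems.append("no Sitemap directive")
    return problems
```

For example, `audit_robots("User-agent: *\nDisallow: /")` reports both problems, while a file with specific paths and a sitemap line comes back clean.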
## Audit Worksheet

| URL Pattern | Currently Blocked? | Should Be Blocked? | Action Needed |
|---|---|---|---|
| / (entire site) | | No | |
| /admin/ | | Yes | |
| /blog/ | | No | |
| /search? | | Yes | |
## Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| Blocking entire site (`Disallow: /`) | Nothing gets indexed | Remove or specify paths |
| Blocking CSS/JS | Rendering issues, poor evaluation | Allow CSS/JS directories |
| Blocking images | No image search traffic | Allow image directories |
| Not including sitemap | Slower discovery of new pages | Add Sitemap directive |
| Using robots.txt for noindex | Does not remove from index | Use meta noindex instead |
| Space before colon | Directive may not be parsed | Remove spaces |
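The syntax-level mistakes in this table can be caught with a small linter. A minimal sketch under stated assumptions (the `lint_syntax` helper is hypothetical, and real validators such as Search Console's robots.txt report are far more thorough):

```python
KNOWN = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def lint_syntax(robots_txt: str) -> list:
    """Flag missing ':' separators, spaces before the colon,
    and misspelled/unknown directives, line by line."""
    issues = []
    for n, raw in enumerate(robots_txt.splitlines(), 1):
        line = raw.split("#", 1)[0]  # strip trailing comments
        if not line.strip():
            continue
        key, sep, _ = line.partition(":")
        if not sep:
            issues.append("line %d: missing ':'" % n)
        elif key != key.rstrip():
            issues.append("line %d: space before ':'" % n)
        elif key.strip().lower() not in KNOWN:
            issues.append("line %d: unknown directive '%s'" % (n, key.strip()))
    return issues
```

So `lint_syntax("Disallow : /admin/")` flags the space before the colon, and a typo such as `Disalow:` is reported as an unknown directive.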
Auditite validates your robots.txt configuration during every audit, flagging blocked important content, missing sitemaps, and syntax errors.
## Related templates

- **Crawl Budget Calculator: Automated SEO Workflow.** Calculate and optimize your site's crawl budget allocation. Includes formulas for crawl rate, waste identification, and priority page coverage.
- **Redirect Mapping Template with Auditite.** Plan and track URL redirects during site migrations or restructures. Includes redirect type selection, chain detection, and validation worksheets.
- **Schema Markup Implementation Checklist for SEO.** Implement structured data correctly with this schema markup checklist. Covers all major schema types, validation steps, and common implementation errors.

Want the how-to behind this template? Check out our playbooks for step-by-step audit process guides.