# Robots.txt Configuration Template with Auditite
Configure robots.txt correctly with this template. Covers directives for major crawlers, common CMS patterns, and testing procedures.
## Overview
The robots.txt file controls which parts of your site search engine crawlers can access. A misconfigured robots.txt can block important content from being indexed or waste crawl budget on low-value pages. This template provides standard configurations for common site types.
## Robots.txt Syntax Reference
| Directive | Purpose | Example |
|---|---|---|
| User-agent | Specifies which crawler the rules apply to | User-agent: Googlebot |
| Disallow | Blocks a path from crawling | Disallow: /admin/ |
| Allow | Explicitly allows crawling (overrides Disallow) | Allow: /admin/public/ |
| Sitemap | Points to your XML sitemap | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Requests delay between crawls (not honored by Google) | Crawl-delay: 10 |
**Key Rules:**

- Robots.txt must live at the root: `https://example.com/robots.txt`
- Paths in directives are case-sensitive
- More specific paths take precedence over less specific ones
- An empty `Disallow:` means "allow everything"
- A missing robots.txt means "allow everything"
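The precedence rule above can be sketched in Python. This is a hedged illustration of the documented longest-match behavior (with `*` wildcards and the `$` end anchor), not any crawler's actual implementation; `rule_matches` and `is_allowed` are hypothetical helper names.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Translate a robots.txt pattern (with * and $) into a regex and test it."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"  # '$' anchors the pattern at the end of the path
    return re.match(regex, path) is not None

def is_allowed(rules, path: str) -> bool:
    """Longest-match rule: the most specific matching pattern wins;
    on a tie, Allow wins. No matching rule means the path is allowed."""
    best_len, best_allow = -1, True
    for directive, pattern in rules:
        if pattern and rule_matches(pattern, path):
            if len(pattern) > best_len or (len(pattern) == best_len and directive == "allow"):
                best_len, best_allow = len(pattern), directive == "allow"
    return best_allow

# The Allow/Disallow pair from the syntax table above
rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
```

Here `is_allowed(rules, "/admin/public/page")` is `True` because the longer `Allow` pattern beats the shorter `Disallow`, while `/admin/secret` stays blocked.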
## Standard Template

```
# Robots.txt for [your site]
# Updated: [date]

User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /search?
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /thank-you/
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?page=
Allow: /

Sitemap: https://example.com/sitemap.xml
```
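A quick sanity check is possible with Python's standard-library `urllib.robotparser`. Note the stdlib parser predates wildcard patterns (it treats `*` in paths literally), so this sketch trims the template down to plain prefix rules; wildcard lines are better verified in Search Console. The `ROBOTS` string is a reduced sample, not the full template.

```python
from urllib import robotparser

# Prefix-only subset of the template: the stdlib parser does not
# understand '*' wildcards in paths, so those lines are omitted here.
ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

print(rp.can_fetch("*", "https://example.com/blog/post"))    # allowed
print(rp.can_fetch("*", "https://example.com/admin/users"))  # blocked
```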
## CMS-Specific Templates

### WordPress

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /wp-json/
Disallow: /*?s=
Disallow: /*?replytocom=
Disallow: /tag/*/page/
Disallow: /author/

Sitemap: https://example.com/sitemap_index.xml
```
### Shopify

```
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkouts/
Disallow: /checkout
Disallow: /account
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
Disallow: /*?variant=
Disallow: /*?sort_by=
Disallow: /*?filter=

Sitemap: https://example.com/sitemap.xml
```
### E-commerce (General)

```
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /wishlist/
Disallow: /compare/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?color=
Disallow: /*?size=
Disallow: /*?price=
Disallow: /*?page=
Allow: /

Sitemap: https://example.com/sitemap.xml
```
## What to Block vs What to Allow
| Content Type | Block? | Reason |
|---|---|---|
| Admin/login pages | Yes | No SEO value, security risk |
| Search results pages | Yes | Thin/duplicate content |
| Cart/checkout | Yes | No SEO value, personal data |
| User account pages | Yes | Personal data, no SEO value |
| Faceted navigation URLs | Yes | Duplicate content, crawl waste |
| Sort/filter parameters | Yes | Duplicate content |
| API endpoints | Yes | Not for public consumption |
| CSS/JS files | No | Google needs these for rendering |
| Images | No | Needed for image search traffic |
| PDF/documents | No | Often valuable content |
| Blog content | No | Core SEO content |
| Product pages | No | Core SEO content |
## AI Crawler Configuration
| Crawler | User-agent | Purpose | Recommendation |
|---|---|---|---|
| GPTBot | GPTBot | OpenAI training data | Block if desired |
| Google-Extended | Google-Extended | Gemini training data | Block if desired |
| CCBot | CCBot | Common Crawl | Block if desired |
| Anthropic | anthropic-ai | Claude training data | Block if desired |
| Bytespider | Bytespider | TikTok/ByteDance | Block if desired |
```
# Optional: Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```
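To confirm which AI crawlers a robots.txt actually blocks, a per-agent check with the stdlib parser works. This is a sketch: `blocks_agent` is a hypothetical helper, the `ROBOTS` string is a sample, and a real audit should fetch the live file.

```python
from urllib import robotparser

# Sample file blocking two AI crawlers; CCBot falls through to the
# wildcard group and remains allowed, to show per-agent matching.
ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

def blocks_agent(robots_txt: str, agent: str) -> bool:
    """Return True when the given user-agent cannot fetch the site root."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return not rp.can_fetch(agent, "https://example.com/")

for bot in ("GPTBot", "Google-Extended", "CCBot"):
    print(bot, "blocked:", blocks_agent(ROBOTS, bot))
```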
## Testing Checklist

- File is accessible at `/robots.txt` (returns 200 status)
- Syntax is valid (no typos in directives)
- Important content is NOT blocked
- Low-value pages ARE blocked
- Sitemap URL is included and valid
- Tested with the robots.txt report in Google Search Console
- No accidental `Disallow: /` blocking the entire site
- CSS and JS files are not blocked (needed for rendering)
- Images directory is not blocked
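Parts of this checklist can be automated. The sketch below (a hypothetical `audit_robots` helper) scans a robots.txt body for a blanket `Disallow: /` under `User-agent: *` and for a missing `Sitemap` directive; fetching the file and confirming the 200 status is left to the caller.

```python
def audit_robots(robots_txt: str) -> list:
    """Check a robots.txt body against two checklist items and
    return a list of human-readable problems (empty means clean)."""
    problems = []
    agent, blanket, has_sitemap = None, False, False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            agent = value
        elif key == "sitemap":
            has_sitemap = True
        elif key == "disallow" and value == "/" and agent == "*":
            blanket = True
    if blanket:
        problems.append("blanket 'Disallow: /' blocks the entire site")
    if not has_sitemap:
        problems.append("no Sitemap directive")
    return problems
```

For example, `audit_robots("User-agent: *\nDisallow: /")` reports both problems, while a file with specific paths and a sitemap line comes back clean.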
## Audit Worksheet

| URL Pattern | Currently Blocked? | Should Be Blocked? | Action Needed |
|---|---|---|---|
| / (entire site) | | No | |
| /admin/ | | Yes | |
| /blog/ | | No | |
| /search? | | Yes | |
## Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| Blocking entire site (`Disallow: /`) | Nothing gets indexed | Remove or specify paths |
| Blocking CSS/JS | Rendering issues, poor evaluation | Allow CSS/JS directories |
| Blocking images | No image search traffic | Allow image directories |
| Not including sitemap | Slower discovery of new pages | Add Sitemap directive |
| Using robots.txt for noindex | Does not remove from index | Use meta noindex instead |
| Space before colon | Directive may not be parsed | Remove spaces |
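The syntax-level mistakes in this table can be caught with a small linter. A minimal sketch under stated assumptions (the `lint_syntax` helper is hypothetical, and real validators such as Search Console's robots.txt report are far more thorough):

```python
KNOWN = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def lint_syntax(robots_txt: str) -> list:
    """Flag missing ':' separators, spaces before the colon,
    and misspelled/unknown directives, line by line."""
    issues = []
    for n, raw in enumerate(robots_txt.splitlines(), 1):
        line = raw.split("#", 1)[0]  # strip trailing comments
        if not line.strip():
            continue
        key, sep, _ = line.partition(":")
        if not sep:
            issues.append("line %d: missing ':'" % n)
        elif key != key.rstrip():
            issues.append("line %d: space before ':'" % n)
        elif key.strip().lower() not in KNOWN:
            issues.append("line %d: unknown directive '%s'" % (n, key.strip()))
    return issues
```

So `lint_syntax("Disallow : /admin/")` flags the space before the colon, and a typo such as `Disalow:` is reported as an unknown directive.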
Auditite validates your robots.txt configuration during every audit, flagging blocked important content, missing sitemaps, and syntax errors.
## Related templates

- **Crawl Budget Calculator: Automated SEO Workflow.** Calculate and optimize your site's crawl budget allocation. Includes formulas for crawl rate, waste identification, and priority page coverage.
- **Redirect Mapping Template with Auditite.** Plan and track URL redirects during site migrations or restructures. Includes redirect type selection, chain detection, and validation worksheets.
- **Schema Markup Implementation Checklist for SEO.** Implement structured data correctly with this schema markup checklist. Covers all major schema types, validation steps, and common implementation errors.

Want the how-to behind this template? Check out our playbooks for step-by-step audit process guides.