
Robots.txt Configuration Template with Auditite

Configure robots.txt correctly with this template. Covers directives for major crawlers, common CMS patterns, and testing procedures.

Overview

The robots.txt file controls which parts of your site search engine crawlers can access. A misconfigured robots.txt can block important content from being indexed or waste crawl budget on low-value pages. This template provides standard configurations for common site types.

Robots.txt Syntax Reference

| Directive | Purpose | Example |
| --- | --- | --- |
| User-agent | Specifies which crawler the rules apply to | User-agent: Googlebot |
| Disallow | Blocks a path from crawling | Disallow: /admin/ |
| Allow | Explicitly allows crawling (overrides Disallow) | Allow: /admin/public/ |
| Sitemap | Points to your XML sitemap | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Requests a delay between crawls (not honored by Google) | Crawl-delay: 10 |

Key Rules:

  • Robots.txt must be at the root: https://example.com/robots.txt
  • Paths in directives are case-sensitive (/Admin/ and /admin/ are different paths)
  • More specific (longer) paths take precedence over less specific ones
  • An empty Disallow: means “allow everything”
  • A missing robots.txt means “allow everything”
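These precedence rules can be sanity-checked locally with Python's standard-library parser. This is a sketch with two caveats: urllib.robotparser applies rules in file order (first match wins, unlike Google's longest-match rule), and it does not understand Google-style * wildcards, so place Allow lines before the broader Disallow they override when testing.

```python
# Sanity-check robots.txt rules with Python's stdlib parser (sketch).
# Caveats: rules match in file order (first match wins) and '*' wildcards
# in paths are not supported.
from urllib import robotparser

ROBOTS = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

print(rp.can_fetch("*", "https://example.com/blog/post"))         # True: no matching rule
print(rp.can_fetch("*", "https://example.com/admin/users"))       # False: blocked
print(rp.can_fetch("*", "https://example.com/admin/public/faq"))  # True: Allow overrides
```

For production-grade checks against Google's actual matching semantics, use Search Console rather than this parser.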

Standard Template

# Robots.txt for [your site]
# Updated: [date]

User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /search?
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /thank-you/
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?page=
Allow: /

Sitemap: https://example.com/sitemap.xml

CMS-Specific Templates

WordPress

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /wp-json/
Disallow: /*?s=
Disallow: /*?replytocom=
Disallow: /tag/*/page/
Disallow: /author/

Sitemap: https://example.com/sitemap_index.xml

Shopify

User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkouts/
Disallow: /checkout
Disallow: /account
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
Disallow: /*?variant=
Disallow: /*?sort_by=
Disallow: /*?filter=

Sitemap: https://example.com/sitemap.xml

E-commerce (General)

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /wishlist/
Disallow: /compare/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?color=
Disallow: /*?size=
Disallow: /*?price=
Disallow: /*?page=
Allow: /

Sitemap: https://example.com/sitemap.xml

What to Block vs What to Allow

| Content Type | Block? | Reason |
| --- | --- | --- |
| Admin/login pages | Yes | No SEO value, security risk |
| Search results pages | Yes | Thin/duplicate content |
| Cart/checkout | Yes | No SEO value, personal data |
| User account pages | Yes | Personal data, no SEO value |
| Faceted navigation URLs | Yes | Duplicate content, crawl waste |
| Sort/filter parameters | Yes | Duplicate content |
| API endpoints | Yes | Not for public consumption |
| CSS/JS files | No | Google needs these for rendering |
| Images | No | Images need indexing |
| PDF/documents | No | Often valuable content |
| Blog content | No | Core SEO content |
| Product pages | No | Core SEO content |

AI Crawler Configuration

| Crawler | User-agent | Purpose | Recommendation |
| --- | --- | --- | --- |
| GPTBot | GPTBot | OpenAI training data | Block if desired |
| Google-Extended | Google-Extended | Gemini training data | Block if desired |
| CCBot | CCBot | Common Crawl | Block if desired |
| Anthropic | anthropic-ai | Claude training data | Block if desired |
| Bytespider | Bytespider | TikTok/ByteDance | Block if desired |

# Optional: Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

Testing Checklist

  • File is accessible at /robots.txt (returns 200 status)
  • Syntax is valid (no typos in directives)
  • Important content is NOT blocked
  • Low-value pages ARE blocked
  • Sitemap URL is included and valid
  • Tested with Google’s robots.txt report in Search Console
  • No accidental Disallow: / blocking the entire site
  • CSS and JS files are not blocked (needed for rendering)
  • Images directory is not blocked

Audit Worksheet

| URL Pattern | Currently Blocked? | Should Be Blocked? | Action Needed |
| --- | --- | --- | --- |
| / (entire site) | | No | |
| /admin/ | | Yes | |
| /blog/ | | No | |
| /search? | | Yes | |

Common Mistakes

| Mistake | Impact | Fix |
| --- | --- | --- |
| Blocking entire site (Disallow: /) | Nothing gets indexed | Remove or specify paths |
| Blocking CSS/JS | Rendering issues, poor evaluation | Allow CSS/JS directories |
| Blocking images | No image search traffic | Allow image directories |
| Not including sitemap | Slower discovery of new pages | Add Sitemap directive |
| Using robots.txt for noindex | Does not remove from index | Use meta noindex instead |
| Space before colon | Directive may not be parsed | Remove spaces |
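Several of these mistakes are mechanical enough to lint for. The validate() function below is a hypothetical sketch that flags three of them in a robots.txt body; note that Disallow: / is legitimate inside a deliberate AI-crawler group, so treat that flag as a prompt for review rather than a hard error.

```python
import re

def validate(body: str) -> list[str]:
    """Flag common robots.txt mistakes (hypothetical helper, sketch only)."""
    issues = []
    # Strip comments so only directive text is inspected.
    lines = [ln.split("#", 1)[0].strip() for ln in body.splitlines()]
    if "Disallow: /" in lines:
        issues.append("blanket 'Disallow: /' found; verify it is intentional")
    if not any(ln.lower().startswith("sitemap:") for ln in lines):
        issues.append("no Sitemap directive")
    # A directive name followed by whitespace before the colon may be ignored.
    if any(re.match(r"^[A-Za-z][A-Za-z-]*\s+:", ln) for ln in lines):
        issues.append("space before colon; directive may be ignored")
    return issues

good = "User-agent: *\nDisallow: /admin/\nSitemap: https://example.com/sitemap.xml"
bad = "User-agent: *\nDisallow: /\nDisallow : /old/"
print(validate(good))       # []
print(len(validate(bad)))   # 3
```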

Auditite validates your robots.txt configuration during every audit, flagging blocked important content, missing sitemaps, and syntax errors.

Want the how-to behind this template?

Check out our playbooks for step-by-step audit process guides.
