Robots.txt Best Practices Guide with Auditite
Master robots.txt configuration to control crawl access, protect sensitive pages, and optimize crawl budget allocation.
Overview
The robots.txt file controls which parts of your site search engine crawlers can access. A misconfigured robots.txt can block critical pages from indexing or waste crawl budget on low-value URLs. This guide covers best practices for every common scenario.
Step 1: Audit Your Current Robots.txt
- Access your robots.txt at yourdomain.com/robots.txt.
- Test it using Google Search Console's robots.txt report (the standalone robots.txt Tester has been retired).
- Verify no critical pages or resources are accidentally blocked.
- Check that CSS and JavaScript files needed for rendering are not blocked.
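The checks above can be scripted with Python's standard-library robots.txt parser. A minimal sketch, using an inline example file so it runs offline (in practice you would point the parser at your live file with `set_url` and `read`); note that `urllib.robotparser` implements the basic exclusion rules, not Google's wildcard extensions, so it is best suited to simple prefix rules:

```python
from urllib.robotparser import RobotFileParser

# Inline example; for a live audit use:
#   rp = RobotFileParser("https://yourdomain.com/robots.txt"); rp.read()
robots_txt = """\
User-agent: *
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Critical pages and render-critical resources that must stay crawlable
for url in ("https://yourdomain.com/", "https://yourdomain.com/assets/app.css"):
    print(url, rp.can_fetch("Googlebot", url))
```

Running this against your real file for every page template catches accidental blocks before Google does.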
Step 2: Understand the Syntax
Basic Directives
| Directive | Purpose |
|---|---|
| User-agent | Specifies which crawler the rules apply to |
| Disallow | Blocks the specified path from crawling |
| Allow | Overrides a Disallow for a more specific path |
| Sitemap | Points to your XML sitemap location |
| Crawl-delay | Requests a delay between requests (not honored by Google) |
Rules of Precedence
- More specific paths override less specific paths.
- Allow takes precedence over Disallow when path lengths are equal.
- Rules are case-sensitive for paths.
- Wildcards (*) match any sequence of characters.
- The $ character indicates the end of a URL.
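The precedence rules above can be sketched as a small matcher. This is an illustrative model of Google's documented longest-match semantics (with Allow winning ties), not a full robots.txt parser:

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt path pattern, translating * and $ into a regex."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"  # $ anchors the pattern to the end of the URL
    return re.match(regex, path) is not None

def is_allowed(rules, path: str) -> bool:
    """Longest matching pattern wins; Allow beats Disallow on a tie."""
    best_directive, best_pattern = "allow", ""  # no matching rule means allowed
    for directive, pattern in rules:
        if rule_matches(pattern, path):
            longer = len(pattern) > len(best_pattern)
            tie = len(pattern) == len(best_pattern) and directive == "allow"
            if longer or tie:
                best_directive, best_pattern = directive, pattern
    return best_directive == "allow"

rules = [("disallow", "/private/"), ("allow", "/private/public-page/")]
print(is_allowed(rules, "/private/secret"))        # False: only Disallow matches
print(is_allowed(rules, "/private/public-page/"))  # True: the longer Allow wins
```

Working through a few of your own rules this way is a quick sanity check before relying on wildcard patterns in production.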
Step 3: Common Configuration Patterns
Block Admin and Internal Pages
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /internal/
Block Search Results and Filters
Disallow: /search
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*&page=
Allow Specific Resources Within Blocked Directories
Disallow: /private/
Allow: /private/public-page/
Block Specific File Types
Disallow: /*.pdf$
Disallow: /*.doc$
Step 4: Crawl Budget Optimization
- Block URL parameters that create duplicate content (sorting, session IDs, tracking parameters).
- Block internal search result pages — these rarely provide SEO value and can generate infinite URLs.
- Block paginated filter combinations that you handle with canonical tags.
- Do not block pages you want to noindex — use meta robots instead. Blocking with robots.txt prevents Google from seeing the noindex directive.
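One way to decide which parameters to block first is to tally them from a crawl export or server logs. A minimal sketch; the URL list and parameter names here are illustrative placeholders:

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Illustrative sample; in practice, load URLs from your crawl export or logs
urls = [
    "https://yourdomain.com/products",
    "https://yourdomain.com/products?sort=price",
    "https://yourdomain.com/products?sort=price&page=2",
    "https://yourdomain.com/search?q=shoes",
]

param_counts = Counter()
for url in urls:
    for param in parse_qs(urlparse(url).query):
        param_counts[param] += 1

# The most frequent parameters are usually the biggest crawl-budget drains
print(param_counts.most_common())
```

Parameters that dominate the tally and only produce duplicates (sorting, tracking) are the strongest candidates for a Disallow pattern.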
Step 5: Common Mistakes to Avoid
Blocking CSS and JavaScript
Google needs to render your pages to evaluate them. Blocking CSS or JS files in robots.txt prevents rendering and can hurt rankings.
Using Robots.txt for Deindexing
Robots.txt prevents crawling, not indexing. If a page has external links pointing to it, Google may still index it without crawling it. Use noindex meta tags for deindexing.
Overly Broad Disallow Rules
A Disallow: / blocks your entire site. A Disallow: /blog blocks both /blog/ and /blogging-tips/. Always include trailing slashes for directories.
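The prefix behavior is easy to reproduce: a robots.txt path matches any URL path that begins with it. A quick illustration:

```python
# Robots.txt paths are prefixes: "/blog" matches more than you may intend
rule = "/blog"
paths = ["/blog/", "/blog/post-1", "/blogging-tips/", "/about/"]
blocked = [p for p in paths if p.startswith(rule)]
print(blocked)  # the directory rule "/blog/" would have spared "/blogging-tips/"
```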
Forgetting the Sitemap Directive
Always include your sitemap location at the bottom of robots.txt:
Sitemap: https://yourdomain.com/sitemap.xml
Step 6: Testing and Monitoring
- After any robots.txt change, review the updated file in Google Search Console's robots.txt report.
- Verify your most important pages are not blocked using the URL Inspection tool.
- Monitor Google Search Console’s crawl stats for changes in crawl rate after robots.txt updates.
- Use Auditite to audit your robots.txt configuration alongside your full site crawl.
- Keep a version history of your robots.txt changes so you can roll back if issues arise.
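These checks can be automated so they run on every deploy. A minimal sketch using Python's stdlib parser; the robots.txt content, domain, and path list are placeholders, and in CI you would fetch the staged file instead:

```python
from urllib.robotparser import RobotFileParser

# Placeholder file; in CI, fetch the robots.txt about to be deployed
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
"""

# Pages that must never be blocked (hypothetical list for your site)
MUST_BE_CRAWLABLE = ["/", "/blog/", "/products/"]

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

failures = [p for p in MUST_BE_CRAWLABLE
            if not rp.can_fetch("Googlebot", f"https://yourdomain.com{p}")]
print("blocked critical pages:", failures)  # an empty list means the deploy is safe
```

Failing the build when this list is non-empty turns the manual URL Inspection step into a regression test.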
Robots.txt for Multiple Environments
Ensure your staging and development environments block all crawlers:
User-agent: *
Disallow: /
Remove this block when promoting an environment to production. Launching with a blanket Disallow: / still in place is one of the most common SEO deployment mistakes.
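One way to avoid shipping the staging block by accident is to generate robots.txt from the environment rather than maintaining two files. A hypothetical sketch (the environment variable name and production rules are assumptions for illustration):

```python
import os

def robots_txt(env: str) -> str:
    """Serve a blanket Disallow everywhere except production."""
    if env != "production":
        return "User-agent: *\nDisallow: /\n"
    # Real rules only ever go live in production
    return (
        "User-agent: *\n"
        "Disallow: /admin/\n"
        "Sitemap: https://yourdomain.com/sitemap.xml\n"
    )

# Hypothetical APP_ENV variable set by your deployment pipeline
print(robots_txt(os.environ.get("APP_ENV", "development")))
```

With this approach there is no file to remember to remove; the environment itself decides which rules are served.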
Related playbooks
Canonical URL Guide: Automated SEO Workflow
Master canonical tags to prevent duplicate content issues, consolidate link equity, and control which URLs appear in search.
Crawl Budget Optimization Playbook with Auditite
Maximize search engine crawl efficiency by directing crawl budget to your most valuable pages and reducing waste.
HTTPS Migration Checklist with Auditite
Complete checklist for migrating from HTTP to HTTPS without losing search rankings, traffic, or link equity. Step-by-step guidance.