
Robots.txt

Learn what a robots.txt file is, how it controls search engine crawling behavior, and how to configure it properly for better SEO results.

Robots.txt is a plain text file placed at the root of your website that provides instructions to search engine crawlers about which parts of your site they should or should not access. It follows the Robots Exclusion Protocol and is the first file bots check before crawling your site. Compliance is voluntary, however: well-behaved crawlers respect its directives, but robots.txt does not guarantee that blocked pages will be excluded from search results.

Why It Matters for SEO

Robots.txt plays a critical role in managing your crawl budget. By directing bots away from low-value areas such as admin panels, staging environments, duplicate content, or faceted navigation pages, you ensure crawlers spend their limited time on pages that actually matter for your rankings. Misconfigured robots.txt files are one of the most common causes of indexing problems, capable of accidentally blocking entire sections of a site from search engines.
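As an illustration (all paths here are hypothetical), a robots.txt that steers crawlers away from low-value areas might look like:

```txt
# Hypothetical example: keep crawlers focused on high-value pages
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /staging/

# Block faceted-navigation URLs (major crawlers support the * wildcard)
Disallow: /*?sort=
Disallow: /*?filter=
```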

How to Configure Robots.txt

Place the file at the root of your domain so it is accessible at /robots.txt (for example, https://www.example.com/robots.txt). Use User-agent directives to target specific bots and Disallow rules to block paths. The Allow directive can override a broader Disallow rule for specific sub-paths.
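A minimal sketch of these directives, with illustrative paths:

```txt
# All bots: block /private/, but allow one public sub-path inside it
User-agent: *
Disallow: /private/
Allow: /private/press-kit/

# A more specific group for one crawler; Googlebot follows this
# group instead of the * group above
User-agent: Googlebot
Disallow: /experiments/
```

Note that a crawler obeys only the most specific User-agent group that matches it, so rules are not merged across groups.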

Reference your XML sitemap at the bottom of the file using the Sitemap directive. This helps search engines discover your sitemap even if you never submit it through a tool like Google Search Console.
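For example, using the reserved example.com domain (the Sitemap directive takes a full absolute URL):

```txt
Sitemap: https://www.example.com/sitemap.xml
```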

For pages you want excluded from search results entirely, use a noindex meta tag instead of robots.txt. Blocking via robots.txt prevents crawlers from seeing the noindex directive, which can result in the page still appearing in search results.
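The noindex directive is a standard meta tag placed in the page's head:

```html
<!-- In the <head> of the page you want removed from search results -->
<meta name="robots" content="noindex">
```

For this to work, the page must remain crawlable: if robots.txt blocks it, crawlers never fetch the page and never see the tag.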

Common Mistakes

  • Blocking CSS and JavaScript files: Modern search engines render pages to understand them. Blocking render-critical resources prevents proper rendering and can harm your rankings.
  • Using robots.txt to hide pages from search results: Blocked pages can still be indexed if other sites link to them. Use noindex for deindexing.
  • Leaving a disallow-all directive from staging: After launching a site, verify the robots.txt does not still contain a blanket disallow from the development phase.
  • Not testing changes: Use the robots.txt tester in Google Search Console before deploying updates to production.
  • Forgetting trailing slashes: Disallow: /admin blocks every path beginning with that string, including /admin, /admin/, /admin/page, and even /administrator, while Disallow: /admin/ only blocks paths under the directory.
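The trailing-slash difference in the last point can be sketched as:

```txt
Disallow: /admin     # matches /admin, /admin/, /admin/page, and /administrator
Disallow: /admin/    # matches only paths under the /admin/ directory
```

Because rules match by prefix, prefer the trailing-slash form unless you genuinely want to block every path that begins with that string.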

A properly configured robots.txt file is foundational to technical SEO and sets the stage for efficient crawling and indexing.
