Technical SEO

robots.txt Checker

A robots.txt file is a simple text file that tells search engines which parts of your website they can and cannot crawl. Use this tool to check whether you are blocking important pages from being crawled by search engines. Wouldn't it be a shame to have the best content, but block search engines from even scanning your pages?

Playbook: Want the strategy behind this tool? See "How to audit your robots.txt for SEO".

What is a robots.txt file, and why is it important?

A robots.txt file is a plain text file that sits in the root folder of your website.

It is one of the simplest yet most talked-about files on your website.

If your website is www.yourwebsite.com, the robots.txt file can be found by going to www.yourwebsite.com/robots.txt.

Contents of a typical robots.txt file.

Its main purpose is to instruct Googlebot and other search engine bots on which pages and folders they are allowed to crawl. If you wish to keep certain parts of your website off-limits to search engine crawlers, this is one of the first steps you would take. You can think of the robots.txt file as a fence around your website - though keep in mind it is a set of instructions that well-behaved crawlers follow, not an enforced barrier.
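You can check the effect of a rule set programmatically. Here is a minimal sketch using Python's standard `urllib.robotparser` module; the rules and URLs are hypothetical examples, not taken from a real site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)

# A path under /private/ is blocked for all crawlers
print(rp.can_fetch("*", "https://yourwebsite.com/private/report.html"))  # False
# Everything else remains crawlable
print(rp.can_fetch("*", "https://yourwebsite.com/blog/post.html"))       # True
```

In practice you would point the parser at your live file with `rp.set_url("https://yourwebsite.com/robots.txt")` followed by `rp.read()`.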

So do I need a robots.txt file for my website?

If you have a smaller website with fewer than 50-100 pages, not having a robots.txt file will not hugely impact your site. You may not need one if:

  • Your site is small and has a simple structure.
  • You want all of your content to be crawled and indexed by search engines.
  • You have nothing you want blocked from search engine crawlers.

It's worth pointing out that you will not get higher SEO rankings simply because you have a robots.txt file. But if you have sections you don't want crawled, create one - it only takes a few minutes. robots.txt is one of the first files search engine crawlers request, so having one is recommended.

How to create a robots.txt file

Most CMSs will automatically create a robots.txt file for you. For instance, if you are using WordPress, you can easily create a robots.txt file using any SEO plugin such as Yoast or RankMath.

WordPress SEO plugins automatically create robots.txt files for you.

If you want to create this file manually, simply create a text file named robots.txt and save it within the root folder of your website.
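If you prefer to script this step, the sketch below writes a minimal, allow-everything robots.txt to the current directory; you would then upload the file to your site's root folder and adjust the rules to your needs:

```python
# Minimal allow-everything robots.txt; edit the rules before deploying
content = "User-agent: *\nDisallow:\n"

with open("robots.txt", "w") as f:
    f.write(content)
```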

You can also create a robots.txt file manually.

Correct syntax of a robots.txt file

User-agent: [user-agent name]
Disallow: [URLs not to be crawled]

For example, if you have PDFs in /PDF-docs that shouldn't be crawled:

User-agent: *
Disallow: /PDF-docs/

If you want only Googlebot to skip a folder:

User-agent: Googlebot
Disallow: /PDF-docs/

To block only the PDF files inside a folder (note that the * wildcard is supported by major crawlers such as Googlebot and Bingbot, but it is not part of the original robots.txt standard):

User-agent: Googlebot
Disallow: /PDF-docs/*.pdf

The following instructs all search engine bots not to crawl any of the website's files or folders (by disallowing the root /):

User-agent: *
Disallow: /

Allow everything to be crawled by all search engines:

User-agent: *
Disallow:

As you can see, a single syntax mistake can block your entire website from every search engine crawler. So it makes sense to pay extra attention to the syntax of your robots.txt file.
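You can see the difference between these two rule sets for yourself. This is a quick sketch with Python's standard `urllib.robotparser`; the URL is a placeholder:

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /" shuts out every compliant crawler
rp_blocked = RobotFileParser()
rp_blocked.parse(["User-agent: *", "Disallow: /"])

# An empty "Disallow:" allows everything
rp_open = RobotFileParser()
rp_open.parse(["User-agent: *", "Disallow:"])

url = "https://yourwebsite.com/any-page"
print(rp_blocked.can_fetch("Googlebot", url))  # False
print(rp_open.can_fetch("Googlebot", url))     # True
```

One stray `/` is the entire difference between "crawl everything" and "crawl nothing".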

Allow:

If you have sub-directories or files within a blocked directory you want crawled, the Allow directive is handy:

User-agent: Googlebot
Disallow: /PDF-docs/
Allow: /PDF-docs/important/
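A quick local check of this pattern is possible with `urllib.robotparser`, with one caveat: Google resolves Allow/Disallow conflicts by the most specific (longest) matching rule, so line order does not matter to Googlebot, while Python's stdlib parser applies the first matching rule. Listing the Allow line first makes this sketch behave the way Google would here; paths and URLs are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# urllib.robotparser applies the FIRST matching rule, unlike Google,
# which picks the most specific match - so Allow goes first in this sketch
rp = RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Allow: /PDF-docs/important/",
    "Disallow: /PDF-docs/",
])

print(rp.can_fetch("Googlebot", "https://yourwebsite.com/PDF-docs/important/file.pdf"))  # True
print(rp.can_fetch("Googlebot", "https://yourwebsite.com/PDF-docs/other.pdf"))           # False
```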

Crawl-delay:

You can suggest how frequently search engines should crawl pages using crawl-delay: 20 - this asks crawlers to wait 20 seconds between page requests. Note that Googlebot ignores the Crawl-delay directive, though crawlers such as Bingbot and YandexBot respect it.
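Python's `urllib.robotparser` can read this value back via `crawl_delay()` (available since Python 3.6); the rules below are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 20",
])

# Seconds a polite crawler should wait between requests
print(rp.crawl_delay("*"))  # 20
```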

Sitemap:

Always indicate the location of your XML sitemap file in your robots.txt:

Sitemap: https://yourwebsite.com/sitemap_index.xml
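The sitemap location can likewise be read back programmatically with `urllib.robotparser`'s `site_maps()` method (added in Python 3.8); the file contents here are a hypothetical example:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow:",
    "Sitemap: https://yourwebsite.com/sitemap_index.xml",
])

print(rp.site_maps())  # ['https://yourwebsite.com/sitemap_index.xml']
```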

robots.txt Best Practices

Pages you block with robots.txt may still get indexed by Google - especially if other pages link to them. To keep a page out of the index, use the noindex meta tag instead, and leave the page crawlable so Google can actually see that tag.

Always save your robots.txt file as robots.txt (all lower case) - never as Robots.txt or ROBOTS.txt.

The robots.txt file name matters.

List of common user agents

  • Googlebot
  • Bingbot
  • msnbot
  • Slurp (Yahoo)
  • Baiduspider
  • DuckDuckBot
  • YandexBot
  • Facebot
  • Twitterbot
  • LinkedInBot
  • Applebot

External resources

https://www.robotstxt.org/

https://developers.google.com/search/docs/advanced/robots/intro

Want automated blog posts too?

SEOGraphy writes SEO-optimised articles, generates images, and publishes to WordPress - automatically.

Start Free - No Credit Card