Robots.txt: Understanding Commands With Examples

July 1, 2018 by Gabriel Nwatarali


Last Updated on June 12, 2019


You may have heard of robots.txt, or maybe not. Either way, this guide will show you how to use the robots exclusion standard, also known as the robots exclusion protocol, on your website.

So, what is it?

It’s a file that webmasters use to communicate with web crawlers and bots (typically search engine robots). The robots exclusion standard specifies the rules, or language, used to regulate how bots access, crawl, and index information from a website.

Using this file, you can limit access to the information on your website, such as by preventing the crawling of specific directories, subdirectories, and web pages. You can even prevent certain user-agents (another term for bot software) from crawling your whole site. Neat, right?
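
For instance, if you wanted to keep one particular bot away from your entire site, a file as small as this would do it (BadBot is just a placeholder name here, and a forward slash on its own means “everything”; the directives themselves are explained below):

User-agent: BadBot
Disallow: /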

How to Use Robots.txt With Examples

Imagine that you’ve got a massive website with millions of web pages and plenty of regular visitors. Your content changes frequently and thousands of new pages are generated by your users daily. Feeling happy? Good, you should.

The only problem here is that when web servers handle too many requests at the same time, they can become overwhelmed and temporarily take your website offline. Not good!

Luckily, you can reduce the risk of overwhelming your server by using the robots.txt file. Create a new plain text file and name it robots.txt (the name is case sensitive, so keep it lowercase). Then enter the following.

User-agent: Bingbot
Crawl-delay: 60

The user-agent directive is where you specify the bot’s name, which you can find in any published list of known crawlers. Crawl-delay tells the bot to wait a minimum amount of time between crawl requests. Essentially, your website is saying something like ‘hey robot, please wait 60 seconds before crawling another web page’.

Using an asterisk as the user agent means that your commands should apply to every robot.

User-agent: *

But what if we also wanted to prevent access to specific web pages or directories?

In this case, we would use the Disallow and Allow commands. The former refuses access while the latter allows it. So if we wanted to deny access to a particular web page, we would add the following to our robots.txt file.

Disallow: /directory
Disallow: /dir/web-page.html

To allow crawling:

Allow: /directory
Allow: /dir/web-page.html
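
In practice, Allow is most useful for carving out an exception inside a blocked directory. Here’s a sketch (the paths are made up for illustration): crawlers that support Allow, such as Google, generally let the more specific rule win, so the page below stays crawlable even though the rest of the directory is blocked.

User-agent: *
Disallow: /directory/
Allow: /directory/public-page.html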

Additionally, you can list your sitemaps in the robots.txt file. A sitemap contains a list of pages on a given site. It can also pass extra details about those pages to bots, such as a page’s priority (importance), last modified date, and change frequency.

To specify one or more in your robots.txt file, use the Sitemap command like so:

Sitemap: https://domain.com/sitemap.xml
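
The Sitemap directive takes a full (absolute) URL and can appear more than once, so you can point bots at several sitemaps, or a sitemap index, from the same file. The file names below are only examples:

Sitemap: https://domain.com/sitemap-pages.xml
Sitemap: https://domain.com/sitemap-posts.xml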

We’ll dive deeper into sitemaps in another article in our on-page series, but in general, sitemaps are no longer as important for SEO, provided that you are properly interlinking your web pages.

So our complete robots.txt file would look something like this:

#Specific to Bingbot (use a # to write comments in your file).

User-agent: Bingbot
Crawl-delay: 60
Disallow: /dir/web-page.html
Allow: /directory

#Any bot that will read and respect the directives within this file.

User-agent: *
Crawl-delay: 120
Disallow: /directory
Allow: /dir/web-page.html
Sitemap: https://domain.com/sitemap.xml
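
If you’d like to sanity-check rules like these before relying on them, one option is Python’s built-in urllib.robotparser module, which reads a robots.txt file and answers whether a given user-agent may fetch a given URL. A small sketch, reusing the example domain and paths from the file above:

from urllib import robotparser

# Point the parser at the live robots.txt file and download it.
parser = robotparser.RobotFileParser()
parser.set_url("https://domain.com/robots.txt")
parser.read()

# Ask whether specific user-agents may fetch specific URLs.
print(parser.can_fetch("Bingbot", "https://domain.com/dir/web-page.html"))  # False: disallowed for Bingbot
print(parser.can_fetch("Bingbot", "https://domain.com/directory"))          # True: allowed for Bingbot
print(parser.can_fetch("SomeOtherBot", "https://domain.com/directory"))     # False: falls under the * group

# The parser also exposes any Crawl-delay declared for a user-agent.
print(parser.crawl_delay("Bingbot"))  # 60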

For a live example, please see Twitter’s robots.txt file. And if you have questions, drop them in the comments.

Important Things to Know About Robots.txt

As with many things in technology, there are a few things you should keep in mind when using robots.txt.

  1. For your file to be found, it needs to live in your site’s root directory (e.g. domain.com/robots.txt).
  2. Each domain and subdomain uses its own separate robots.txt file.
  3. Don’t get creative with the file name because it’s case sensitive; keep it all lowercase.
  4. Malicious bot software or some user agents may ignore your robots.txt.
  5. The file is publicly accessible, meaning that anyone can view it.
  6. Some search engines use multiple user-agents. So make sure you’re targeting all of them.
  7. Always triple check to ensure that you’re not blocking any content that you want to be crawled.
  8. Links from pages that you’ve blocked via robots.txt will not be followed and no link equity (or authority) will be passed.
  9. Don’t use robots.txt to prevent access to sensitive data, because all a malicious bot has to do is ignore it. It’s not a security measure; use something like password protection or authentication instead.
  10. The Allow directive is a nonstandard extension, so not every crawler supports it (Google does).
  11. Every directive should be written on a separate line.

Do I Need a Robots.txt File?

If you’re thinking that it’s easier to just let the bots crawl everything, then yes, you’re right. But sometimes it’s not that simple.
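
In fact, if you’re happy for every well-behaved bot to crawl everything, the conventional minimal robots.txt is just a wildcard user-agent with an empty Disallow value (an empty value means nothing is blocked):

User-agent: *
Disallow:

Still, there are several reasons why you may want to go further with your robots.txt file.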

  • You can prevent duplicate content from appearing on the SERPs (search engine results pages).
  • It can be a quick way of submitting your sitemaps or ensuring that search engine bots always read them.
  • For the most part, it can keep certain sections of your website out of search engines’ view (though remember, it’s not a security measure).
  • It can prevent certain documents from showing up on a public SERP.
  • The crawl-delay command is useful for reducing the likelihood of server overload.

A robots.txt file is still an important tool for SEO.


>> RELATED: Duplicate Content And How It Affects SEO



About Gabriel Nwatarali

I'm the founder of Tech Help Canada and a digital marketing specialist. I help people succeed with valuable insights and marketing services. When I'm not working, my favourite thing is to enjoy the outdoors or spend time with family. If you liked this post, please consider sharing it.
