The Importance of a Robots TXT File for your Website 


Increasing traffic to your website and staying on top of SEO is important for your business, but we know just how much of a rigorous process it can be. It takes a lot of analysis and patience, and between all the Google tools and intricate coding, the technicality of it all can get confusing and even overwhelming sometimes. To stay on top, the job is never done!

But what if we told you there is something that could make it a little bit easier? It’s called a robots.txt file, and in a nutshell, it’s a useful tool that gives you control over how search engines navigate, or “crawl,” your site, and helps them do so more efficiently.

It might sound intimidating, but it’s actually not as complicated as it seems, and you don’t have to be super tech-savvy to take advantage of it. Here we will break it all down for you, and explain exactly what a robots.txt file is and how you can use it. Get ready—it’s a game-changer!

 
What is a robots.txt file?

The robots.txt file is a text file that tells web robots, usually search engine robots like Googlebot, which pages on a website to crawl and which ones not to crawl. Also referred to as the robots exclusion protocol, it is part of a larger group of web standards that controls how robots crawl the web, access and interpret content, and display that content to users. In the case of the robots.txt file, this is done by “disallowing” or “allowing” certain robots, or all of them, identified by their user agents (the names of their web crawling software).

This is the basic body of the file:

User-agent: *
Disallow: /wp-admin/


The asterisk after “User-agent” means that the rules apply to all web robots that visit the site, and the path after “Disallow” tells those robots which pages they aren’t allowed to crawl—which, in this case, means only the admin area of this WordPress site (we know it’s a WordPress site because of /wp-admin/).

This is a complete robots.txt file on its own, but you can get more specific and add more lines of directives if needed. We’ll touch more on this later. 

Consider the robots.txt file to be a tool that is meant to prevent your site from being overloaded with crawler requests. It is not meant to stop your website from showing up on Google. There are other ways to keep a page out of Google, such as adding a “noindex” tag to the page itself.

Why Is the Robots.txt File Important?

Along with keeping your website running smoothly, there are many other benefits to having control over site crawlers. Some of these could include:

  • Keeping parts of your website private (for example, an internal or admin login page, or a duplicate printer-friendly version of a page).
  • Restricting a page on your site that you don’t want users to access unless they take a specific action.
  • Linking to your XML sitemap(s).
  • Preventing certain files such as images or PDFs from getting indexed and showing up on search engines.
  • Implementing a crawl delay to prevent your servers from being overloaded when multiple pieces of content are being loaded at once.


And that’s just to name a few! It is also worth considering that if you don’t give directions with a robots.txt file, search engines will spend extra time unnecessarily crawling through every single page on your site, which slows them down and means your most important pages can take longer to be crawled and indexed. All of this could put you lower in the search results. So why waste time? If you make it easier for the search bots, you’ll be rewarded.
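
To make a couple of those ideas concrete, here is a minimal sketch of what the relevant lines might look like. The /print/ and /pdfs/ folders are hypothetical placeholders; swap in whatever paths your own site actually uses:

User-agent: *
Disallow: /wp-admin/
Disallow: /print/   # a hypothetical folder of duplicate, printer-friendly pages
Disallow: /pdfs/    # a hypothetical folder of PDF downloads you don't want crawled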
 

How to Find and Create a Robots.txt File

Now that we’ve piqued your interest, it’s time to get your robots.txt file started.

The first thing is to know how to access it. All that you have to do is enter the URL of your website into the address bar and add “/robots.txt” to the end—like this: “ideazone.ca/robots.txt.” Head to that page, and you’ll either see nothing (meaning you don’t have a file yet) or the contents of the robots.txt file you already have.

If you don’t have one, not to worry; it’s easy to create one from scratch. You can use any plain text editor, such as Notepad, TextEdit, or Editpad.org, and enter the lines of code there. 


PRO TIP: Note that using a program like Microsoft Word is a big NO, as it can add hidden formatting to the text that could break the file.


If you do have a robots.txt file already, you can find it in your site’s root directory by going to your hosting provider’s website, logging in, and heading to the file management section of your account.

Now, it’s time to begin with the basic format that we mentioned above. But you’re going to want to be more specific and only disallow certain pages or folders. To do that, you simply use the part of the page’s URL that comes after your domain name, like this:

User-agent: *
Disallow: /wp-admin/

 

Notice that each command/directive is on a separate line.
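
You can also write separate groups of rules for different crawlers; each group starts with its own “User-agent” line. As a rough sketch (the crawler name below is made up purely for illustration), that might look like this:

User-agent: ExampleBot   # a hypothetical crawler you want to block from the whole site
Disallow: /

User-agent: *            # every other crawler only loses access to the admin area
Disallow: /wp-admin/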

Crawlers like Googlebot also understand an “Allow” directive, which lets you open up a specific file inside a folder that is otherwise disallowed:


User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-example.php


Many robots.txt files look just like this—short and simple. But once you get familiar with the syntax and language, you can get more advanced and include many more directives if you wish. Adding your sitemap and a crawl delay could be a good idea. To learn more about the technical syntax, Google’s own documentation is always a great resource.
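
As a hedged sketch of those two additions (the sitemap URL below is just a placeholder, and note that Googlebot ignores the crawl-delay directive, although other crawlers such as Bingbot have historically respected it), your file might end up looking like this:

User-agent: *
Disallow: /wp-admin/
Crawl-delay: 10   # asks supporting crawlers to wait 10 seconds between requests
Sitemap: https://www.example.com/sitemap.xml   # replace with your site's actual sitemap URL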

 

We have just scratched the surface here, but hopefully this has given you a thorough introduction and encouraged you to set up a robots.txt file for your website. Whether you need help with a robots.txt file, website design, or an effective digital marketing strategy, our team of experts at IdeaZone can help you build a high-ranking website that delivers results. Call or email us today to learn how we can give your business a boost.

Do I Really Need to Upgrade my Website from HTTP to HTTPS?

The short answer is a resounding YES!

HTTPS stands for HyperText Transfer Protocol Secure, and its purpose is to secure communication on the internet. By securing communication, we’re referring to protecting the privacy and integrity of data that is exchanged while users are visiting a website. If the data is not secured, hackers are able to intercept data that is being transmitted through these insecure (i.e. HTTP, not HTTPS) websites.

When HTTPS first came onto the scene, it was mostly ecommerce websites that needed to worry about securing their websites. This was because these websites were processing credit cards and collecting other important personal information about their visitors. It made sense to encrypt personal data over ecommerce sites but, as the internet progressed, it made more and more sense to make sure that every website was secure.  And that’s where Google stepped in.

Google has long recommended that webmasters make the switch to HTTPS. They even went so far as to make it a ranking factor back in 2014, giving HTTPS-enabled websites precedence in the search results. Then, in July of 2018, Google made HTTPS all but mandatory for every website. And by mandatory, they meant that they were going to start calling out and downgrading non-HTTPS sites.

Now, Google doesn’t own the internet, so it’s really up to the website owner whether they want to change over to HTTPS or not. What Google does own, however, is the largest search engine in the world (well, two of the largest search engines if you include YouTube) as well as one of the most popular web browsers, Google Chrome. That means that if you want to get traffic to your site, you had better listen to what Google is telling you to do.

So how does HTTP/HTTPS affect search engine results?

We’ve already stated above that Google has said it will favour HTTPS sites over HTTP sites in the search results. This translates into fewer and fewer HTTP sites appearing on the top search result pages. Google’s ultimate goal is to ensure that the websites it shows in the search results are secure (and relevant). The only reason you’ll still see an insecure site for a given search query is that, even though the site is insecure, Google still thinks it will match the searcher’s query better than the secure sites further down its index.

How does a non-HTTPS enabled site affect the user?

Most (if not all) web browsers these days have some type of warning that is meant to inform the user that the site they are visiting is insecure. Different browsers show these warnings in different ways. Apple’s Safari shows the least in the way of warnings for insecure sites, Mozilla Firefox is somewhere in the middle, and Google’s Chrome browser does its best to put the warning front and center.

In the Firefox browser, the warning is not very prominent: you can only see it as a small shield icon in the address bar, which, if you didn’t know what it was, you would probably just ignore. Clicking on that shield, however, gives the user a little more information about the insecure site they’re visiting. In this instance, the connection is not secure (not HTTPS enabled) and the browser is blocking some content on the site. The blocked content is usually things like Google Analytics, as well as scripts from social media platforms that use some type of tracking.

Firefox shows a more pronounced error when a site has HTTPS enabled but is still not fully secure because some of its content (oftentimes images) is being served from insecure sources. In that case, a lock icon with an exclamation mark warns users that the site they are visiting has insecure content.

Some web browsers – such as Google’s Chrome browser – warn the user that they are about to visit an insecure site. This can come by way of a large, full-page interstitial warning that appears before the site even loads.

If a user comes across that warning, they are not likely going to click the Advanced button and then hit the option to proceed to your website. Would you? Even if the full-page warning doesn’t show up, the user will still see an exclamation mark beside “Not Secure” to the left of the URL (i.e. domain name) in the browser bar. Chrome shows a handful of similar warnings whenever a user visits an insecure site, and every one of them is another sign telling the user not to proceed.

 

So What Does All This HTTP(B)S Mean to Business Owners?

In a nutshell, it means that your business is losing customers and your brand is losing trust. From a user’s perspective, if they can’t trust your website, why would they take the leap and trust your business?

Is Your Website Insecure?

If you currently own a business that has a website, visit your site to see what it looks like in different browsers. Look at your website through the eyes of one of your customers. In browsers where you don’t automatically see the HTTP/HTTPS at the start of the website URL, double-click on the URL (as if you’re going to copy and paste the link) and it should show up at that point.

Upgrading and securing your website is not as hard as you might think (especially when it’s what you do for a living, like we do). Once we know what your site is built with (i.e. custom coded, WordPress, Wix, etc.), we can give you a pretty good idea of how best to make the change.

Get in touch with us today to find out how we can help secure your website.