You have probably come across the term robots.txt while researching Search Engine Optimization (SEO). Do not ignore it, because it can play a significant role in your campaign.
What is robots.txt?
A robots.txt file is created by website owners like you when you do not want search robots to visit your site and index your content, or when you need to block them from a particular web page or set of pages. Do not make the mistake of thinking that it is some sort of HTML code. Robots.txt is a plain text file. It is not a firewall or a password protector either. Anyone can still get into your site. The file simply tells the good guys (the search robots, of course) not to crawl your site or a specific web page.
When to use it?
You may want to use a robots.txt file in several situations. For one, you may be keeping two versions of a web page: one for anyone to see and the other for your eyes or for office use only. If you do not add a robots.txt rule for the second one, search engines may treat the two as duplicate content, which can hurt your ranking. Worse, the private version can be crawled and show up in search results, where anyone can find it. Another case in point is when you want to save bandwidth. You can keep robots away from your images, style sheets and other files.
Where does the robots.txt file belong? It belongs in the root (primary) directory of your site and not in any subdirectory. This is very important: robots look for the file only there, so if it sits anywhere else they will not find it and will go ahead and visit and index your site.
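To make the location rule concrete, here is a minimal Python sketch of how the robots.txt address can be worked out from any page on a site. The helper name robots_url and the sample page address are made up for illustration; only urljoin comes from Python's standard library.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that hosts page_url."""
    # The leading "/" makes urljoin drop the page's own path and
    # resolve from the root of the site, which is where robots look.
    return urljoin(page_url, "/robots.txt")


# Hypothetical page deep inside the site; the result still points at the root.
print(robots_url("http://www.yoursitename.com/blog/2020/post.html"))
```

However deep the page sits, the file is always looked up at the top level of the site.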
What the Search Robots See
When robots crawl the web on behalf of people searching online, they may stumble upon your site. But instead of entering right away, they will look for this first: http://www.yoursitename.com/robots.txt (with no trailing slash). They will dutifully leave your site if the file contains a “User-agent” line set to “*” followed by a “Disallow” line set to “/”.
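A minimal sketch of that block-everything file, and of how a well-behaved robot reads it, is given here in Python. The file text is held in BLOCK_ALL. The helper is_allowed and the crawler name ExampleBot are made up for illustration. Python's standard urllib.robotparser stands in for a real search robot, which may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Contents of robots.txt: every robot ("*") is barred from the whole site ("/").
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a hypothetical crawler name, not a real search robot.
print(is_allowed(BLOCK_ALL, "ExampleBot", "http://www.yoursitename.com/index.html"))
```

The check comes back False for any page and any robot name, because the “*” group applies to everyone.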
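The “*” refers to all search robots. The “/” refers to the root of your site, and therefore everything under it. Hence, this means you are preventing all crawlers from accessing the whole content of your website. But if you are blocking only a specific robot from particular content, name that robot on the file's “User-agent” line and put the path to that content on its “Disallow” line; for example, name Google's image robot, Googlebot-Image, and disallow your images folder.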
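As a sketch of a single-robot rule, the IMAGES_ONLY text in this Python example blocks only Googlebot-Image. The folder name /images/ is an assumption, so substitute wherever your pictures actually live. The helper is_allowed is made up for illustration and relies on Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Contents of robots.txt: only Google's image robot is named, and only
# the images folder (assumed here to be /images/) is off limits to it.
IMAGES_ONLY = """\
User-agent: Googlebot-Image
Disallow: /images/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


logo = "http://www.yoursitename.com/images/logo.png"
print(is_allowed(IMAGES_ONLY, "Googlebot-Image", logo))  # the named robot
print(is_allowed(IMAGES_ONLY, "Googlebot", logo))        # a robot the file does not name
```

Robots that the file does not name are unaffected, and even the named robot may still crawl pages outside the listed folder.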
The above structure will stop Google's image robot from crawling your site's images; hence, it cannot make them searchable over the Internet. And if you do not want any robot to index your private directory, set the “User-agent” line to “*” and the “Disallow” line to the path of that directory.
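A sketch of the private-directory rule follows, again in Python. The directory name /private/, the helper is_allowed and the crawler name ExampleBot are assumptions made for illustration; urllib.robotparser is part of Python's standard library.

```python
from urllib.robotparser import RobotFileParser

# Contents of robots.txt: all robots are kept out of one directory
# (assumed here to be /private/); the rest of the site stays open.
PRIVATE_FOR_ALL = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a hypothetical crawler name.
print(is_allowed(PRIVATE_FOR_ALL, "ExampleBot", "http://www.yoursitename.com/private/report.html"))
print(is_allowed(PRIVATE_FOR_ALL, "ExampleBot", "http://www.yoursitename.com/index.html"))
```

Only addresses that begin with the listed path are refused; everything else on the site remains crawlable.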
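What if you want to keep every search engine out of your private directory except Google? One arrangement is to give Google's robot, Googlebot, its own group with an empty “Disallow” line, which blocks nothing, and to keep a “*” group that disallows the directory for everyone else.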
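One possible arrangement for the everyone-except-Google case is sketched in Python here. It is an illustration and not necessarily the exact listing the author had in mind. The directory name /private/, the helper is_allowed and the crawler name ExampleBot are assumptions. An empty Disallow line means that nothing is blocked for that robot.

```python
from urllib.robotparser import RobotFileParser

# Contents of robots.txt: Googlebot gets its own group with an empty
# Disallow (nothing blocked); every other robot falls under "*" and is
# kept out of the directory (assumed here to be /private/).
ALL_BUT_GOOGLE = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "http://www.yoursitename.com/private/notes.html"
print(is_allowed(ALL_BUT_GOOGLE, "Googlebot", page))   # Google's robot
print(is_allowed(ALL_BUT_GOOGLE, "ExampleBot", page))  # hypothetical other robot
```

A robot uses the group that names it and ignores the “*” group, which is why Googlebot gets through while the others are turned away.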
Remember, we are keeping out the SEARCH robots, not other robots such as malware and email harvesters. Anyone may still visit your site. You are only posting a “No Entry” sign, so the “bad guys” will still come in if they want to. In other words, do not rely on robots.txt if your main goal is to hide information. Instead, configure your server to authenticate visitors and to enforce proper authorization.