
How To: Create a Good Robots.txt for Google

Although the notion of robots.txt is very familiar to professional search engine optimisation consultants, it might not be so clear to those who have just begun their journey into the world of search engine optimisation. Basically, a robots.txt file tells search engine spiders coming to crawl your website which directories and files they may crawl and which they should stay out of. A wrongly used robots.txt can therefore have a huge impact on the evolution of your website: a single mistake in the file could make it difficult or even impossible for search engine robots to crawl your site.

Why use robots.txt?

Here are the reasons why you should use robots.txt:
– to signal search engine spiders not to crawl or index certain sections or pages of your site.
– to prevent indexing totally, or to exclude certain areas of your site from being indexed.
– to issue individual indexing instructions to specific search engines.

Creating your robots.txt

Robots.txt is a simple text file that you can edit in Notepad and then save in the root directory of your site, the same place where your home page lives.

Each entry in your robots.txt has to contain two lines:

User-Agent: [Spider/Bot name]
Disallow: [Directory/File name]
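The two-line entry format can also be produced by a small script, which may help if you maintain several entries. The following is only a minimal sketch: the helper names `build_entry` and `build_robots_txt` are hypothetical, and the spider name and paths in the usage lines are example values.

```python
def build_entry(user_agent, disallow_paths):
    """Return one robots.txt entry: a User-agent line plus its Disallow lines."""
    lines = [f"User-agent: {user_agent}"]
    # An empty Disallow value means nothing is blocked for this spider.
    for path in disallow_paths or [""]:
        lines.append(f"Disallow: {path}".rstrip())
    return "\n".join(lines)


def build_robots_txt(entries):
    """Join entries with a blank line between them and end with a newline."""
    return "\n\n".join(build_entry(agent, paths) for agent, paths in entries) + "\n"


if __name__ == "__main__":
    # Example values only: one entry for a single spider, one for all spiders.
    print(build_robots_txt([
        ("Googlebot", ["/private/privatefile.htm"]),
        ("*", ["/newsection/"]),
    ]))
```

The resulting text would then be saved as robots.txt in the root directory of the site.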

According to your reason for using robots.txt, here is how you can put it together:

1. If you want to exclude a file from an individual Search Engine:

User-Agent: Googlebot
Disallow: /private/privatefile.htm

2. If you want to exclude only a section of your site from spiders:

You no longer need to name each robot that you wish to exclude, because you can simply use the wildcard character, ‘*’, which applies the rule to every spider.

User-Agent: *
Disallow: /newsection/

3. If you want to allow search engine spiders to index everything:

Using the same wildcard, ‘*’, you signal that all spiders are welcome to crawl your entire site. Remember to leave the value of the second line, Disallow, empty, so that nothing is disallowed.

User-agent: *
Disallow:

Once your robots.txt is in place, you need to make sure it has been written correctly. The tool to use for this is Google Webmaster Tools. As we discussed in a previous blog post (Create your Search Engine Marketing Strategies by Using Google Webmaster Tool), one of the features it offers allows you to check your robots.txt. Basically, Google automatically retrieves the robots.txt from your website in real time and adds the main URL to the list of URLs that should be checked.
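The three example files above can also be checked locally before uploading them. The sketch below uses Python's standard urllib.robotparser module and is not a replacement for Google's own checker. The helper name `can_crawl`, the constant names and the www.example.com URLs are placeholders chosen for illustration, and Python's parser may not match every detail of how a particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# 1. Exclude one file from one search engine.
EXCLUDE_FILE_FOR_GOOGLE = "User-agent: Googlebot\nDisallow: /private/privatefile.htm\n"
# 2. Exclude a section of the site from all spiders.
EXCLUDE_SECTION = "User-agent: *\nDisallow: /newsection/\n"
# 3. Allow all spiders to crawl everything (empty Disallow value).
ALLOW_ALL = "User-agent: *\nDisallow:\n"

if __name__ == "__main__":
    site = "https://www.example.com"  # placeholder domain
    print(can_crawl(EXCLUDE_FILE_FOR_GOOGLE, "Googlebot", site + "/private/privatefile.htm"))
    print(can_crawl(EXCLUDE_FILE_FOR_GOOGLE, "Bingbot", site + "/private/privatefile.htm"))
    print(can_crawl(EXCLUDE_SECTION, "Googlebot", site + "/newsection/page.htm"))
    print(can_crawl(ALLOW_ALL, "Googlebot", site + "/newsection/page.htm"))
```

A result of False means the named spider is asked to stay away from that URL, and True means the file does not block it.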

As search engine optimisation consultants, we recommend that you pay close attention to the way you use robots.txt, because of its great impact on your search engine marketing strategy. It can help your site gain higher visibility with search engine spiders, but it can also be quite destructive if done wrongly.

If you would like to share your experience in using robots.txt or would like to add to our list other tips that might help in the creation of robots.txt, we would love to read your comments!

SysComm International is a social media marketing, online reputation management and search engine marketing company.
