Robots.txt file

I have to say, I often think about the sorry state of so-called ‘wannabe’ Internet marketing professionals out there.

A lot of my freelance SEO / SEM work comes from business owners who had hired someone, either full time or as a contractor, who promised them ‘first page ranking’ on major search engines like Google or Yahoo. After about six months with no such results, they let this person go and give up, feeling scammed and unwilling to trust other, genuine Internet marketing professionals. This makes my job much harder.

I recently took on a client for some SEO work here in Louisville, Ky., who was in this same position. It always amazes me how little the previous person did (or did not do) when I come in to clean up the mess. This client’s web site had no title optimization, no keywords on any of the pages, no optimized links, and no robots.txt file on the server.

A lot was missing from this site that could help it get the results the owner wants. Most people may not know the advanced techniques, but if you are going to do Internet marketing, you should know the most basic things, like putting a robots.txt file in the root folder of your site. If nothing else, it helps out your server admin: search engine robots request robots.txt when they visit your site, and when the file is missing, each of those requests adds a 404 ‘not found’ entry to your Apache or IIS logs.
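
To see how often this happens on your own server, here is a minimal Python sketch that counts requests for robots.txt that were answered with a 404. It assumes an Apache access log in the Common or Combined Log Format; the function name count_missing_robots and the sample log lines (which use documentation IP addresses) are hypothetical, and IIS logs use a different layout that would need its own pattern.

```python
import re

# Matches the request path and status code in an Apache Common/Combined Log Format line,
# e.g.  ... "GET /robots.txt HTTP/1.1" 404 209
LOG_PATTERN = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_missing_robots(log_lines):
    """Count requests for /robots.txt that the server answered with 404 Not Found."""
    count = 0
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match and match.group("path") == "/robots.txt" and match.group("status") == "404":
            count += 1
    return count


# Hypothetical sample lines; in practice, read the lines of your own access log file.
sample_log = [
    '192.0.2.10 - - [10/Oct/2008:13:55:36 -0500] "GET /robots.txt HTTP/1.1" 404 209',
    '192.0.2.10 - - [10/Oct/2008:13:55:37 -0500] "GET /index.html HTTP/1.1" 200 5120',
    '198.51.100.7 - - [10/Oct/2008:14:02:11 -0500] "GET /robots.txt HTTP/1.1" 404 209',
]

print(count_missing_robots(sample_log))  # 2
```

Once a robots.txt file is uploaded, new requests for it should show a 200 status instead and stop adding to the count.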

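There are many reasons to include a robots.txt file, which tells search engine robots which files they may crawl and which ones to leave alone. You may not want search engines indexing admin pages, password-protected pages, or other data you do not want freely available over the Internet. Keep in mind, though, that robots.txt is only a request that well-behaved robots honor, and anyone can read the file, so it is no substitute for real password protection.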

The most basic robots.txt file is just a plain text file uploaded to the root folder of your site, which is normally the same folder as ‘www’ or ‘http_docs’, so that it can be reached at yourdomain.com/robots.txt. The file simply lists the instructions you want to give the search engine robots.
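
If you would rather script this step, for example as part of a site deployment, here is a minimal Python sketch that writes the file into a document root. The function name write_robots_txt is hypothetical, and a temporary folder stands in for your real ‘www’ folder; on a live server you would point it at your actual document root instead.

```python
import os
import tempfile


def write_robots_txt(doc_root, content):
    """Write robots.txt into the site's root folder and return the file's path."""
    path = os.path.join(doc_root, "robots.txt")
    # Plain text file; newline="\n" writes Unix-style line endings.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(content)
    return path


# Hypothetical stand-in for the document root ('www' or similar on a real server).
with tempfile.TemporaryDirectory() as doc_root:
    path = write_robots_txt(doc_root, "User-agent: *\nDisallow:\n")
    with open(path, encoding="utf-8") as f:
        print(f.read())
```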

As an example, my robots.txt file contains this entry:
User-agent: *
Disallow:

The above entry tells all search engine robots that they may freely crawl and index every page of my site; an empty Disallow line blocks nothing.
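
You can check what a set of rules allows with Python’s standard-library urllib.robotparser module, which implements the basic robots exclusion rules. This is a minimal sketch: the example.com URLs are placeholders, and individual search engines may treat unusual cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The "index everything" robots.txt shown above.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# An empty Disallow value blocks nothing, so every path is allowed.
for url in ["https://www.example.com/", "https://www.example.com/about.html"]:
    print(url, parser.can_fetch("Googlebot", url))  # True for both
```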

There is also a meta tag for robots, but this will be discussed later.

If you want none of your site indexed by search engines, put this in your robots.txt file:
User-agent: *
Disallow: /
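
The same standard-library parser shows why a single slash shuts out the whole site. Again a sketch with placeholder example.com URLs:

```python
from urllib.robotparser import RobotFileParser

# Block the entire site for every robot.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

parser = RobotFileParser()
parser.parse(BLOCK_ALL.splitlines())

# Every URL path begins with "/", so this one rule matches every page.
for url in ["https://www.example.com/", "https://www.example.com/products/index.html"]:
    print(url, parser.can_fetch("Googlebot", url))  # False for both
```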

Say you want to allow indexing on your site, but not in the cgi-bin or images folders:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
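
Disallow values work as path prefixes: everything whose path starts with /images/ is blocked, and the rest of the site stays open. A quick check with Python’s urllib.robotparser (the page names and example.com domain are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Allow everything except the cgi-bin and images folders.
BLOCK_FOLDERS = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(BLOCK_FOLDERS.splitlines())

# /index.html is allowed; the other two paths fall under a disallowed folder.
for path in ["/index.html", "/cgi-bin/contact.pl", "/images/logo.gif"]:
    url = "https://www.example.com" + path
    print(path, parser.can_fetch("Googlebot", url))
```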

You can even tell only Google’s crawler (Googlebot) to stay out of certain folders:
User-agent: googlebot
Disallow: /cgi-bin/
Disallow: /images/
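
With a group that names only googlebot, other robots are not restricted at all. This sketch uses Python’s urllib.robotparser, which matches user-agent names case-insensitively; Slurp (Yahoo’s crawler) stands in for "any other robot", and the URL is a placeholder. Real search engines apply their own matching rules, so treat this as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Rules that apply only to Google's crawler.
GOOGLE_ONLY = """\
User-agent: googlebot
Disallow: /cgi-bin/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(GOOGLE_ONLY.splitlines())

url = "https://www.example.com/images/logo.gif"
# "Googlebot" matches the googlebot group regardless of case.
print("Googlebot:", parser.can_fetch("Googlebot", url))  # False
# No group names Slurp and there is no "*" group, so it may fetch everything.
print("Slurp:", parser.can_fetch("Slurp", url))  # True
```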

Do yourself a favor: take the two minutes to create your robots.txt file, upload it, and forget about it.

