Andrew Gatenby

10 Things you can do with Robots.txt for SEO

Robots.txt is a very useful and underutilised tool that allows web designers to control how search engine spiders treat their site. Most search engine spiders read robots.txt and parse its instructions before indexing a website.

Using robots.txt you can tell spiders to index only certain parts of your website, and you can further specify which spiders are allowed to do so and which pages they may index.
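To give a feel for how per-spider rules work, here is a minimal sketch that feeds a made-up robots.txt to Python's standard `urllib.robotparser` module and asks what each spider may fetch. The "BadBot" name and the `/admin/` and `/tmp/` paths are assumptions for illustration only, not rules from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "BadBot" and the paths are invented for this example.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /tmp/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot falls under the "*" group: everything except /admin/ and /tmp/.
print(parser.can_fetch("Googlebot", "/blog/post.html"))
print(parser.can_fetch("Googlebot", "/admin/login"))

# BadBot has its own group that disallows the whole site.
print(parser.can_fetch("BadBot", "/blog/post.html"))
```

Keep in mind this only shows how a well-behaved spider would read the rules; nothing forces a spider to obey them.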


Automatically Bounce non-www Domain to www. Domain (Canonical Domain Names)

You may or may not have noticed, but when you go to www.andrewgatenby.com your browser is automatically bounced to andrewgatenby.com, with no “www.” in the domain name. I’ve done this deliberately by setting up what is called a canonical domain redirect. Basically, if you have a www.domainname.com then you can probably also visit your site by just going to domainname.com, and it will still show the same content, right?

Well, to a search engine this appears as two different domain names with exactly the same content, which could get you marked down in rankings. Using the modern-day witchcraft that is .htaccess we can solve this problem and automatically bounce domainname.com to www.domainname.com by issuing an HTTP 301 (Moved Permanently) header redirect.
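The logic behind such a redirect is simple enough to sketch in a few lines of Python. This is not the .htaccess rule itself, just an illustration of the decision it makes; the function name `canonical_redirect` and the `http://` scheme are my own assumptions, and www.domainname.com is the placeholder domain from the text.

```python
from typing import Optional, Tuple

# Assumption: the www. form is the canonical host, as in the example above.
CANONICAL_HOST = "www.domainname.com"


def canonical_redirect(host: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (301, location) when the bare domain was requested, else None."""
    # Host headers are case-insensitive and may carry a port; normalise both.
    bare_host = host.lower().split(":")[0]
    if bare_host == CANONICAL_HOST[len("www."):]:
        # Keep the requested path so deep links survive the redirect.
        return 301, f"http://{CANONICAL_HOST}{path}"
    # Already canonical (or an unrelated host): no redirect needed.
    return None


print(canonical_redirect("domainname.com", "/about"))
print(canonical_redirect("www.domainname.com", "/about"))
```

Because the status is 301 rather than 302, search engines are told the move is permanent and should treat the www. address as the one true home of the content.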


Tell Crawlers Your Sitemap Location via robots.txt

Google and Yahoo! recently announced that their crawlers now support an additional directive that can be included in the robots.txt file. By using the “Sitemap:” directive, you can tell crawlers the absolute URL of the XML sitemap for your website:

Sitemap: http://www.yourdomain.com/sitemap.xml

Of course, you can still tell Yahoo! and Google about your sitemap directly, and get some useful information on your website, by using Google Webmaster Tools and Yahoo! Site Explorer.
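If you want to check that the “Sitemap:” line in your robots.txt can actually be picked up, here is a small sketch using Python's standard `urllib.robotparser` (the `site_maps()` method needs Python 3.8 or later). The robots.txt content and the www.example.com domain are placeholders I have made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; www.example.com is a placeholder domain.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of the sitemap URLs found, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```

Note that the directive sits outside any User-agent group, so it applies to every crawler that understands it.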