How To Use Sitemap & Robots.txt To Give Google a Complete Picture of Your Blog

So you’ve started a new blog. That’s great. How are you going to tell Google about it? Well, if you’re like most bloggers, you wait for Google to find you on its own. However, there’s a way for you to let Google know that you’re alive.

The Google Sitemap

A sitemap is just that – a map of all the pages on your blog. A Google sitemap is a file that informs Google and other search engines about the URLs on your blog that are available for crawling.

A sitemap is an XML file that lists the URLs for your blog. It allows bloggers to include additional information about each URL like when it was last updated, how often it changes, and how important it is in relation to other URLs in the site.
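To make that concrete, here’s a minimal sketch of what a sitemap XML file looks like. The URLs and dates are made-up examples, not from any real blog:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/my-first-post/</loc>
    <lastmod>2010-10-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Each `<url>` entry is one page; `<lastmod>`, `<changefreq>`, and `<priority>` are the optional extras mentioned above.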

Basically, you generate a sitemap and then let Google know about it. If you’re running WordPress, this is very easy to do. But the first step is to get a Google Webmaster Tools account.

Webmaster Tools is where you’ll be communicating with Google. From there, you’ll be able to add your blog and tell Google where your sitemap is located. Just by adding your blog to Webmaster Tools, you’ve told Google that you have a blog and that their bot should go pay it a visit. The sitemap is kinda like a guide for the bot, as it tells the bot which pages are available for indexing.

Google Sitemaps Generator for WordPress

The nice thing about running WordPress on your blog is you never have to worry about creating an XML sitemap because there’s a plugin that will do it all for you.

Google Sitemap Generator for WordPress

The Google Sitemaps Generator for WordPress generates an XML sitemap of your WordPress blog that complies with the Sitemaps protocol. This format is supported by Ask.com, Google, Yahoo! and MSN Search.

Installation is a little more involved than the average WordPress plugin, but once installed, it pretty much runs itself. Whenever you add a new blog post, the sitemap will automatically update with the new page, so the next time the Google bot comes by, it will know about it.
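If you’re not on WordPress, the same idea is easy to script yourself. Here’s a rough sketch in Python of what a generator like the plugin does under the hood: take your posts’ URLs and last-modified dates and emit a sitemap file. The URLs here are hypothetical examples:

```python
from datetime import date
from xml.sax.saxutils import escape

def build_sitemap(entries):
    """Build a minimal XML sitemap from (url, lastmod_date) pairs."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url, lastmod in entries:
        lines.append("  <url>")
        lines.append("    <loc>%s</loc>" % escape(url))        # page address
        lines.append("    <lastmod>%s</lastmod>" % lastmod.isoformat())
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)

# Example posts (made up); write the result to sitemap.xml at your blog root.
posts = [("http://www.example.com/", date(2010, 10, 1)),
         ("http://www.example.com/hello-world/", date(2010, 10, 2))]
print(build_sitemap(posts))
```

Regenerate and re-upload the file whenever you publish, which is exactly the step the plugin automates for you.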

While the sitemap will allow Google and other search engines to crawl your blog more intelligently, it is only a URL inclusion protocol. In other words, it only tells the bots which URLs to include. To give the bots a complete picture, you need to complement the sitemap with a robots.txt file.

Using The Robots.txt File

A sitemap is a URL inclusion protocol. Robots.txt is a URL exclusion protocol. Together, they give Google a complete picture of your blog and how it should be indexed.

Now you might ask why you would want to exclude some URLs from Google. Wouldn’t you get more traffic with as many pages in the index as possible? The answer is no. There are some pages you’ll want excluded from the index.

The robots.txt file tells the Google bot what it can and cannot index. The most common use for a robots.txt file is to prevent the indexing of duplicate content or a members-only area.

Sample Robots.txt File

sitemap: http://www.johnchow.com/sitemap.xml

User-agent: *
Disallow: /cgi-bin/
Disallow: /go/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /author/
Disallow: /page/
Disallow: /category/
Disallow: /wp-images/
Disallow: /images/
Disallow: /backup/
Disallow: /banners/
Disallow: /archives/
Disallow: /trackback/
Disallow: /feed/

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Mediapartners-Google
Allow: /

User-agent: duggmirror
Disallow: /

The above is the robots.txt file that powers my blog. The first line tells the bot the location of my sitemap. This is followed by a list of folders that I don’t want the bot to index.

Your blog is a huge generator of duplicated content, so you’ll want to use the robots.txt file to block it out. I only want the bot to index the actual blog posts. However, each post is generally repeated in the category, archives, trackback and feed pages. Other areas I don’t want the bot going into include my WordPress admin folder and the folder where I keep my redirects.
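You can sanity-check a robots.txt file before relying on it. Python’s standard-library `urllib.robotparser` applies the same exclusion rules a well-behaved bot would; here it checks a few hypothetical URLs against a cut-down version of the rules above:

```python
from urllib.robotparser import RobotFileParser

# A trimmed-down version of the sample robots.txt from the post.
rules = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /category/
Disallow: /feed/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Actual blog posts stay crawlable; duplicate/admin areas do not.
print(rp.can_fetch("*", "http://www.example.com/my-blog-post/"))   # True
print(rp.can_fetch("*", "http://www.example.com/category/tech/"))  # False
print(rp.can_fetch("*", "http://www.example.com/wp-admin/"))       # False
```

If a URL you expected Google to index comes back `False`, your Disallow rules are too broad, which is exactly the kind of mistake a couple of commenters below ran into with `/page/` and `/feed/`.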

By combining a sitemap with a good robots.txt file, you give Google a complete picture of your blog, and that’ll get you ranked faster.


53 thoughts on “How To Use Sitemap & Robots.txt To Give Google a Complete Picture of Your Blog”

  1. Harshad says:

    Not long back one could not even think of having a website without having some HTML skills. Now anyone can have a good website without even looking at the source codes. WordPress and its plugins have changed things by far.

    1. PPC Ian says:

      I agree with you Harshad. It’s really amazing how much things have changed. Even though you don’t have to look at the source code with tools like WordPress, I do think it involves some serious skill. I know I’ve learned a lot in building out my blog (and have developed some great skills that aren’t all that easy to obtain).

      1. Alquma says:

        It’s very important to make a good content site and make the sitemap and the robots.txt files.

        And get useful backlinks or inlinks that point to the blog that it was listed or indexed.

1. Yes, this comes under the basic requirements of any website or blog.

      2. I also agree, and I have to admit, this is quite a relief. I’ve also learned a lot as I’ve built a blog but those tools make it far less intimidating!

    2. d3so says:

I’ve had to edit the source code to get some needed features on my blog. Having no experience, it takes me hours to figure things out.
      I guess I’m picky when it comes to design.

      1. Hey John,

This is very useful stuff. I think that a lot more people use a sitemap than the robots.txt. I have to implement the second one myself.

        1. This one is really awesome post and I think this should go to resource category.

          Useful one.

  2. PPC Ian says:

    Great post! Everyone, trust me, this is really important stuff. Make sure to invest a little more time to give Google (and other search engines) the most accurate picture of your blog (and get rewarded with more traffic)!

    1. Alex Dumitru says:

SEO is very important and I’ve always been saying it 🙂

    2. d3so says:

      I use a sitemap but never thought of using a robots.txt
      Now, I’m going to copy John’s 😉

      1. You are in the majority I think. It’s good that you already use site map. I think it is the more obvious and perhaps more important of the two but it depends on individual blogs and their content and structure.

        There are some other ways to eliminate at least some duplicate content using the WordPress settings.

      2. Lakhyajyoti says:

I am also going to copy John. Thanks for your great tips.

    3. Abhik says:

      100% agree..
A robots file lets you choose which pages to push for indexing. It eliminates the chance of submitting duplicate content.

1. Yes … but normal bloggers do not know about the depth of this kind of technology.

        They simply add and publish.

  3. lifestyle123 says:

Very good! But I’d like to know how to generate a list of pages on my website. My simple website currently isn’t using anything like WordPress.

    1. Lakhyajyoti says:

You can search Google to get your answer.

    2. Abhik says:

      http://www.xml-sitemaps.com/
Works well for normal HTML sites.

      1. What about CSS and PHP related ?

4. Thanks for the post John. You helped a lot of people with this post, including me. Now I can make my blog more visible to Google and get more traffic. Thanks again 😉

    1. d3so says:

      Hey Bradley, you should post more on your other blog so you can drive more traffic to it.

    2. Lakhyajyoti says:

Bradley, you have a nice blog.

  5. Alex Dumitru says:

    A great post for beginners, though you didn’t say anything new 🙂

    1. Abhik says:

      Not only beginners..
I have seen a lot of ‘experienced’ bloggers who do not use a robots file.

      1. Yes quite agree with you Abhik.

        There are many people who do not even know that such kind of file exist.

6. Well, WordPress has changed the game for everybody. A person need not be a programmer to create a beautiful site, and he need not pay lots of money to SEO experts now. The plugins of WordPress are really amazing. We can achieve virtually anything with these plugins.

    Nice post John.

    1. You always need SEO … whatever plugin or automatic software you can use.

      Like your video on your blog. Nice to see that you have started using it at your early stage.

7. Useful post John. It helps new bloggers who want more visitors to their sites.

    1. More visitors ????

      This will help to give instruction to Google.

  8. Benjamin says:

    We have been using that plugin for a long time

  9. Benjamin says:

    Your robots.txt seems to block your feed which is not good because it blocks bots that specifically harvest your feed and syndicate it such as the Google Blog Search Bot.

  10. aatif says:

    i love ur blog . and do what u will do in ur blog . so will try to update my robot.txt

    1. Abhik says:

      Get some originality.. 🙂

  11. Samuel says:

Awesome post John! I will check out that plugin. Am only using the sitemap plugin. Thanks for sharing, have fun in Germany!!!!!!!!!!!!!!! lol

  12. Essays says:

Forming a text file is easy compared to a sitemap. But a sitemap is important if you are using Google Webmaster Tools or Google Analytics. For that, I used a sitemap builder.

  13. Be careful with that robots.txt file, you’re disallowing /page, which will prevent Google Adsense from crawling inner pages of your site. You’re also disallowing /feed, which prevents Google Blog Search from including your posts. (I learned this the hard way)

  14. Abhik says:

I have been using the XML Sitemap generator since it was developed.. Also, using a robots file strikingly similar to yours.

  15. zakton says:

    Google Sitemap … something that I heard of from Ken Evoy’s Sitesell, but never knew how to do it myself. Thanks John for letting us know how to do it.

    1. That is the benefit of visiting John Blog.

16. I started a new blog last month, on 15 Sept, and am using the ‘Google XML Sitemaps’ plugin, which is really working very well.

  17. Steve Moyer says:

    That’s some pretty good advice … and seems kind of technical from someone whom I heard has “the technical skills of a gnat” (of course, Shoe’s spelling skills are pretty bad too).

  18. DA says:

I myself use a Google XML sitemap with all my blogs for the same reason. This makes it easier when the bots drop by to index, for sure.

  19. kishore says:

I can only see the appended text in Google Webmaster Tools crawl errors, and I have checked the XML sitemap but it is OK. What to do? How to secure my wp blog from this worm?

  20. update files whenever a new web section is added

  21. fas says:

Where do I upload the robots.txt file, and where do I find it?

22. Google sitemap generator is a real quick way to get your blog indexed, but blogwalking really gives you good traffic and backlinks and gets your blog well known around the block!!

  23. Tom Weyers says:

    Really cool post John,

    I have created quite a few WordPress sites over the past couple of years, and have a certain “routine” I follow when I create a new wordpress site.

    After installing WordPress, I always upload the basic plugins, these are:

    Google Sitemap Generator
    Dagon design Sitemap
    Privacy policy
    SI contact form

    After uploading and activating the plugins, I now have a site with the basic pages (Home, About, Sitemap, Privacy policy and Contact), I edit a few things in the WordPress settings, like changing permalinks to /%postname%/, editing the tagline, default category and comment settings.

    I then install Google analytics and also submit my site to Google webmaster tools. Once that’s done, I work on content.

    After a couple of days I pay for social bookmarking service to get some link juice to my site.

I then consider the “set up” process complete and periodically work on content and backlinks.

The one thing I forget is the robots.txt file, and I must admit that none of the sites I have built so far have ever seen a robots.txt file.

    I also never edit the htaccess file, although it’s considered best practice to do so.

This post just reminded me that I forgot one fundamental part of the process, so thank you for the reminder.

    Tom Weyers

  24. Johnny says:

    Hi John,

    This simple, yet brilliant information should help my blog move forward–quickly.

    I’ve only had my site for a month now. I’ll ask my tech guy if this is installed.

    Thanks for the info.
    Johnny
    P.S. What does blog walking mean?

  25. Alex Campain says:

    Hi John,

This should help my blog move forward. I really appreciate all the great information and strategies you give. Thanks a bunch!

Great info.
John

  26. Carl says:

Interesting article. I am totally in agreement about using Google as a strategic tool. I recently started blogging in earnest. It is tough to blog with a full-time day job – I am learning now, but need more time.
I look at my blog as a potential business, not just a casual hobby, so I definitely share your idea of a blog as a business model. Thanks for the inspiration!

27. The sitemap is super important for getting Google to index your pages properly. I’d be completely lost without it. Arne’s XML Sitemap Plugin makes this whole process very simple for me to keep my business up and running and know which pages are indexed and which ones haven’t been found. I’d really recommend it to ANYONE with a WordPress blog.

28. Interesting. Bloggers I know only use a sitemap and no URL exclusion tool such as robots.txt. I’d better recommend this to them.

Comments are closed.