Getting Out Of The Google Supplemental Index

written by John Chow on May 24th, 2007

[Screenshot: SEO for Firefox]

Nathan Metzger of Not So Boring Life sent me an email saying I have over 1,700 pages in Google’s Supplemental Index and I really should try to get out of it.

The Google Supplemental Index is where the unworthy pages end up. According to the SEO for Firefox plugin shown above, I have 1,790 pages in the supplemental index. This doesn’t mean I have 1,790 unworthy pages. According to Nathan’s post on the issue, it means the blog has a lot of duplicate content.

Lots of SEO masters believe that content that isn’t worthy ends up in the supplemental index. While this is certainly true, if you’re running a WordPress blog it is more likely that you’re simply dealing with duplicate content issues. If you make a post today on a default WordPress setup, there are about five different URLs you could type in that would give you the exact same content. You can generally get to the same content via the Category, Calendar, Author, Monthly, and Page archives. Unless you know exactly what you’re doing, your site is probably heavily cached in the supplemental index.
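
To make the duplicate-content point concrete, here is a minimal Python sketch that groups URLs serving the same content by hashing each page body. The URLs and markup are hypothetical, and the sketch assumes the post is the only one in each archive; real archive pages rarely match byte for byte, so an actual check would need to fetch the pages and strip sidebars and other template chrome before comparing.

```python
import hashlib
from collections import defaultdict


def find_duplicate_urls(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose whitespace-normalized bodies are identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        # Collapse whitespace so trivial formatting differences don't hide duplicates.
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only groups with more than one URL are duplicate-content candidates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical URLs on a default install, assuming this is the only post in each archive.
post = "<h2>Getting Out Of The Google Supplemental Index</h2><p>...</p>"
pages = {
    "/2007/05/24/getting-out-of-the-google-supplemental-index/": post,
    "/category/seo/": post,
    "/author/john/": post,
    "/2007/05/": post,
    "/2007/05/24/": "  " + post + "\n",
    "/about/": "<p>About this blog</p>",
}

for group in find_duplicate_urls(pages):
    print(len(group), "URLs share one body:", group)
```

Running it reports a single group of five URLs for one post, which is the kind of duplication Nathan describes.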

Reducing the number of pages in the supplemental index has a positive effect on overall Google traffic. The fewer supplemental pages you have, the more traffic Google sends you. Nathan has seen a 20% increase in his search engine traffic since reducing his supplemental pages from 170 to 38.

The way to reduce the number of supplemental pages is by telling Google what it can and cannot index. You do this with your robots.txt file. I used Nathan’s file as a starting point and edited it to suit my blog. You can read how everything is done over at Not So Boring Life.
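
As a minimal sketch of what "telling Google what it can and cannot index" looks like from the crawler’s side, the Python snippet below feeds a few Disallow lines (a hypothetical file, not John’s or Nathan’s actual one) to the standard library’s `urllib.robotparser` and asks whether Googlebot may fetch some sample paths. Python’s parser implements only plain prefix matching; it does not implement Googlebot-style `*` wildcards inside paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that uses only plain path prefixes (no wildcards).
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
Disallow: /trackback/
Disallow: /wp-
Disallow: /tag/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com stands in for your own domain; the paths are hypothetical.
post = "/2007/05/24/getting-out-of-the-google-supplemental-index/"
for path in [post, "/feed/", "/wp-admin/", "/tag/seo/", post + "feed/"]:
    allowed = parser.can_fetch("Googlebot", "http://example.com" + path)
    print(f"{path}: {'crawlable' if allowed else 'blocked'}")
```

The last path shows the limit of prefix matching: a `/feed/` rule does not reach per-post feeds, which is why several of the robots.txt snippets quoted in the comments use Googlebot’s `*` wildcard instead.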

Google already sends me over 2,000 visitors each day. I can’t wait to see the numbers after the supplemental pages have been cleared. Look for an update soon.

Nathan said on May 24th, 2007 at 6:21 am

Thanks for the mention, John. I have a feeling you’re going to get some major Google love when those pages come out of the index.

Ali said on May 24th, 2007 at 7:55 am

One quick piece of advice I can give you, and everybody, is: DO NOT POST IN MULTIPLE CATEGORIES, like John did with this post. Big no-no!

This was a tip from Graywolf that I’m repeating, not my own.

Kiltak said on May 24th, 2007 at 8:00 am

Or exclude your categories in your robots.txt file :)

Nathan said on May 24th, 2007 at 8:32 am

I don’t think you want to exclude your categories in robots.txt. If you do, you should have another avenue for GoogleBot to index your content (archive, calendar, etc).

Kiltak said on May 24th, 2007 at 10:50 am

Well, categories will duplicate posts from your archive, unless you only post an excerpt of your posts in them…

Hmmm, that’s what I’ll do :)

Walter Vos said on May 24th, 2007 at 1:43 pm

Why not exclude those, Nathan? I exclude them because people entering my blog from Google on a category page never stayed very long. The content they’re looking for often isn’t at the top of the page. I’d much rather have them go directly to the post that has their keywords in it, instead of landing on a category page that contains only part of what they’re looking for. What’s your philosophy on this?

Marc said on May 25th, 2007 at 2:02 am

You need to have some sort of link structure leading back to those older posts. Categories make sense since they’ll reinforce the context of the links.

Nathan said on May 24th, 2007 at 8:01 am

Yup. You’re exactly right.

There is a link to Graywolf’s post about this very subject in my post on the supplemental index. Hit the link above ;)

Law of Attraction said on May 25th, 2007 at 1:58 am

Multiple categories… hmm, good, but with tags we are bound to post in multiple categories.

GiorgosK said on June 3rd, 2007 at 4:45 am

I don’t see the reason for not posting in multiple categories; after all, tags are made for easy multi-category posting…
What’s the story?

Nomar said on May 30th, 2007 at 2:39 am

The blog will probably explode.

Matthew said on May 24th, 2007 at 6:25 am

I have been working on this earlier today. I have the SEO tool installed but I never noticed that part of it ;) I did manage to clear some a while back with robots.txt but still need to do some more work.

Law of Attraction said on May 25th, 2007 at 2:18 am

I can’t use robots.txt since I’m hosted on Blogspot... Surprisingly, my labels are indexed better in Google than the articles.

Nathan said on May 24th, 2007 at 6:30 am

I think my server may have a meltdown. This article is about to hit the Digg front page with just a few more diggs (http://www.notsoboringlife.com/ramblings/top-10-causes-of-accidental-death/), plus all the traffic from John’s post. Going to be an interesting day ;)

Matthew said on May 24th, 2007 at 6:41 am

Is that a hint for us to go digg it? ;) Go on then, you twisted my arm.

Nathan said on May 24th, 2007 at 6:47 am

LoL. I would never :twisted:

MasterSparky said on May 24th, 2007 at 10:02 am

You guys are too kind ;)

I think I made a mistake by not making the url above a LINK :D

Law of Attraction said on May 25th, 2007 at 2:51 am

getting a 404 error on your link

Kiltak said on May 24th, 2007 at 8:02 am
Ali said on May 24th, 2007 at 9:13 am

Dugg! My server died when I got dugg to death. It’s the best feeling ever! :mrgreen:

Aaron Cook said on May 24th, 2007 at 9:17 pm

And here’s to many more days like that for you Nathan. Well done. :smile:

Shine on,
Aaron

Law of Attraction said on May 25th, 2007 at 1:59 am

congrats man !!

Amanda said on May 24th, 2007 at 6:34 am

I think that’s a really good idea… I might attempt to add the text to mine *puts it on a very long to-do list*

Marc said on May 25th, 2007 at 2:04 am

I wouldn’t wait on this one, it has the potential to positively impact all of your later changes in a big way.

Daniel said on May 24th, 2007 at 6:34 am

2,000 visitors from Google? Wow, that’s something I’m now going to aspire to :) me like

Kumiko said on May 24th, 2007 at 7:03 am

Why aspire for a mere 2000? I won’t rest until everybody who uses Google visits my site!

Matthew said on May 24th, 2007 at 7:21 am

World domination :twisted:

SK said on May 24th, 2007 at 7:34 am

That’s the spirit.

Law of Attraction said on May 25th, 2007 at 2:00 am

Way to go, Kumiko… that’s the spirit!!

Marc said on May 25th, 2007 at 2:05 am

Only one of us can win Kumiko… I’ll see you at the finish line :P

Dave Starr --- ROI Guy said on May 24th, 2007 at 7:21 am

Great article, John (and Nathan). I’ve read about this perhaps dozens of times before, including a discourse from Matt Cutts, and no one ever explained what the supplemental index really meant or why I should care… it just seemed like funny indentation to me. Got work to do, but thanks for pointing it out in a way I could understand.

website copywriter said on May 24th, 2007 at 7:23 am

Pretty sure your numbers are going to go even higher, just like Nathan’s did when he reduced his supp pages. Do keep us updated; we thrive on your success, John LOL. :twisted:

Slaptijack said on May 24th, 2007 at 7:29 am

Thanks for sending me to Nathan’s site, John. Looks like I’ve got my work cut out for me this morning.

Donncha O Caoimh said on May 24th, 2007 at 7:48 am

Unfortunately, you can’t use the wildcard “*” in Disallow: lines of the robots.txt file. It can be used for user agents, but the following won’t work:
Disallow: /2006/0*

Nathan said on May 24th, 2007 at 7:51 am

Ah! Thanks for that tip. Another tweak to my robots.txt file ;)

Kiltak said on May 24th, 2007 at 8:05 am

Then why is everybody doing this:

Disallow: */feed/
Disallow: */trackback/

Donncha O Caoimh said on May 24th, 2007 at 7:51 am

Oh, never mind the previous comment: http://www.seobook.com/archives/001329.shtml shows it does work, at least for Googlebot and perhaps for Yahoo Slurp too! :)
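
Donncha’s correction lines up with how Google documents Googlebot’s robots.txt handling: `*` in a path matches any sequence of characters, a trailing `$` marks the end of the URL, and otherwise a rule matches as a prefix (so the trailing `*` in `/2006/0*` is redundant). The Python sketch below approximates those rules with a regular expression to show how patterns from this thread behave. It is an illustration under those assumed semantics, not Google’s actual matcher: it ignores Allow lines, longest-match precedence, and percent-encoding, and other crawlers may not honor wildcards at all.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Googlebot-style Disallow path into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the path, and otherwise the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' where each '*' was.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    # The leading '^' plus match() anchor at the start, giving prefix semantics.
    return robots_pattern_to_regex(pattern).match(path) is not None


# Patterns quoted in this thread, checked against hypothetical permalink paths.
examples = [
    ("/2006/0*", "/2006/05/some-post/"),
    ("/2006/0*", "/2006/11/another-post/"),
    ("*/feed/", "/2007/05/24/some-post/feed/"),
    ("/*/trackback/", "/2007/05/24/some-post/trackback/"),
    ("/*/trackback/", "/trackback/"),
]
for pattern, path in examples:
    verdict = "blocked" if is_blocked(pattern, path) else "allowed"
    print(f"{pattern!r} vs {path!r}: {verdict}")
```

The last case shows that `/*/trackback/` needs at least one path segment before `trackback`, so on its own it does not block a top-level `/trackback/`.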

InvestorBlogger said on May 24th, 2007 at 8:01 am

I’m going to try it out, as nearly 40% of my pages are now in the supplementals. That is truly mental!

Wow! Thanks, John, for the tip! I’d heard of the supps but didn’t know I was affected!

Kenneth

InvestorBlogger said on May 24th, 2007 at 8:02 am

BTW, why all the red on your pages today? I’m seeing red really!

Kenneth

John Chow said on May 24th, 2007 at 8:44 am

Turn off SEO for Firefox. The red means those links are nofollow.

justinf said on May 24th, 2007 at 8:09 am

Looks like I’ll be busy for the next few months. One of my day-job websites has over 200,000 pages in the supplemental index.

I was wondering why our traffic has been going down, despite us adding new content nearly every day.

Thanks for the tip, Mr. Chow.

John Hood said on May 24th, 2007 at 8:35 am

An eye opening article. Kudos. Time to be evil! :evil:

Joshua Watson said on May 24th, 2007 at 9:32 am

Supplemental results and the index they are in have a bad rap; however, a webmaster should not concern him/herself with trying to get their pages out of the index. It’s just like PR… everyone thinks it’s a big deal when in fact it is not. Web pages that are in the supplemental results can and will still rank for your targeted keywords. In fact, there are cases where a supplemental result will outrank a web page in the main index.

This was talked about a lot at the Search Engine Strategies conference in New York last month, and you can find more information over at SEOmoz (see the SEOmoz Whiteboard).

In my opinion, people should worry less about supp results and more about quality *unique* content. Just my 2 cents. :)

blogdinero said on May 24th, 2007 at 10:49 am

Even with the confusion, it’s a great tool.

Johnny said on May 24th, 2007 at 11:04 am

So if I have WordPress in its own directory,

/wordpress

would it look like this then…

Disallow: */wordpress/feed/
Disallow: */wordpress/trackback/

Even if I set it up so the WordPress index.php is in my main directory?

I have like 78,000 pages in the supplemental index. Woohoo. :(

Simon Gould said on May 24th, 2007 at 11:52 am

Charity begins at home when it comes to beer John :grin:

Digitalcameratips said on May 24th, 2007 at 12:33 pm

Again, a post that shows how information-rich John’s blog is. Thanks for the link to Not So Boring Life, where I got the full story, and thanks also to the other posters here who debated the issue more critically. Yes, you are right, Joshua Watson, we should concentrate more on original quality content, but sometimes it is faster/easier to use free resources like articles, reviews, etc.

Martin said on May 24th, 2007 at 1:22 pm

If you don’t have SEO for Firefox, you can find out which articles are in the supplemental index by typing this into Google:

site:www.yourdomain.com -inurl:www

Nice post as usual, John. Cheers

Chicago 2016 said on May 24th, 2007 at 8:17 pm

I found that by creating a robots.txt file with the following information and placing it in my root directory, I was able to get rid of those pesky supplementals.

Copy and paste the following:

User-agent: *
Disallow: /*/feed/
Disallow: /*/feed/rss/
Disallow: /*/trackback/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /tag/

This should do the trick for you, as it’s worked pretty well for me so far. (Thanks to Tim Spangler for the hint.)
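
To check what a file like this actually blocks, here is a Python sketch that reads the Disallow lines from the comment above and reports the first rule that matches each sample path. It assumes Googlebot’s documented wildcard semantics (`*` matches any characters; otherwise rules match as prefixes), ignores User-agent grouping, Allow lines, and `$`, and the sample paths are hypothetical date-based WordPress permalinks.

```python
import re
from typing import List, Optional

# Chicago 2016's robots.txt from the comment above, copied verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*/feed/
Disallow: /*/feed/rss/
Disallow: /*/trackback/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /tag/
"""


def disallow_rules(text: str) -> List[str]:
    """Collect Disallow paths; this sketch ignores User-agent groups and Allow lines."""
    rules = []
    for line in text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            rules.append(value.strip())
    return rules


def matches(rule: str, path: str) -> bool:
    # Assumed Googlebot semantics: '*' matches any characters; otherwise a prefix match.
    regex = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(regex, path) is not None


def first_blocking_rule(path: str, rules: List[str]) -> Optional[str]:
    """Return the first Disallow rule that matches path, or None if crawlable."""
    return next((rule for rule in rules if matches(rule, path)), None)


rules = disallow_rules(ROBOTS_TXT)
# Hypothetical paths for a WordPress blog using date-based permalinks.
for path in [
    "/2007/05/24/some-post/",
    "/2007/05/24/some-post/feed/",
    "/feed/",
    "/tag/seo/",
    "/category/seo/",
    "/wp-content/uploads/2007/05/seo-firefox.png",
]:
    rule = first_blocking_rule(path, rules)
    print(f"{path}: {'blocked by ' + rule if rule else 'crawlable'}")
```

Under these assumptions, `/wp-` also catches `/wp-content/`, where WordPress stores uploaded images by default, and `/*/feed/rss/` is already covered by `/*/feed/`; whether either matters depends on what you want indexed.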

Chicago 2016 said on May 24th, 2007 at 8:17 pm

Well, to be honest, it should do the trick. I’m just waiting for Google to update.

Wallet Rehab - Ways to save money said on May 25th, 2007 at 12:37 am

Yeah, I need to get my site out of supplemental heck. Thanks for the reminder.

Marc said on May 25th, 2007 at 2:12 am

Impressive. Big thanks to Nathan for sorting out the details and thanks to John for making me aware ;)

This is significant and I now have some work to do on 2/3 sites.

Chicago 2016 said on May 25th, 2007 at 5:25 am

Heck > hell.

Dave said on May 25th, 2007 at 7:22 am

I found this plugin for WordPress that gets rid of duplicated content:

http://www.davidpitlyuk.com/2007/05/25/how-to-get-rid-of-duplicated-content-in-wordpress-automatically/

I believe it just blocks the category and archive pages, so you may still want to do some work on the robots.txt file.

Chicago 2016 said on May 25th, 2007 at 7:41 pm

Translation, Sheawey?

Chicago 2016 said on May 25th, 2007 at 7:41 pm

Oops, you’re a pingback!

Mark John said on May 26th, 2007 at 3:40 am

Great post John. I’ve already tweaked the robots.txt file and uploaded it onto my blog’s server and now eagerly await the results ;)

Make Money Blogging said on May 29th, 2007 at 1:36 am

This is a very interesting post. I found that I have 175 supplemental pages and 198 cached pages. I also found that my RSS feed was being returned by Google for various searches, which is obviously not preferred.

I made similar changes to my robots.txt as a result of this post, which I hope will lower the supplemental number significantly. I’m especially interested in seeing how that affects my website reporting.

Aaron Cook said on May 29th, 2007 at 9:49 pm

Here’s a recent post by Matt Cutts of Google regarding the supplemental index…

Google Hell?
http://www.mattcutts.com/blog/google-hell/

Zath said on June 9th, 2007 at 10:26 pm

I had a very basic robots.txt file set up and thought I was doing OK, but I had nearly 2,000 supplemental links when I checked. Hopefully the changes I’ve made will bring a reduction and, who knows, an increase in traffic.

I found that a lot of mine were from the Forum Archives, which have been active on my site for years now. Quite scary to think what damage this may have been doing to my site in the past!

Matt - Domain Feed said on June 12th, 2007 at 12:27 pm

Thanks for this information John. I haven’t had this problem yet but this could come in handy later.
