Google Clears Up Duplicate Content

The official Google Webmaster Blog has published a post clarifying Google's position on duplicate content. Anyone who runs a website will want to read this official word.

What is duplicate content?
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Most of the time when we see this, it’s unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and — worse yet — linked) via multiple distinct URLs, and so on. In some cases, content is duplicated across domains in an attempt to manipulate search engine rankings or garner more traffic via popular or long-tail queries.

What isn’t duplicate content?
Though we do offer a handy translation utility, our algorithms won’t view the same article written in English and Spanish as duplicate content. Similarly, you shouldn’t worry about occasional snippets (quotes and otherwise) being flagged as duplicate content.

Why does Google care about duplicate content?
Our users typically want to see a diverse cross-section of unique content when they do searches. In contrast, they’re understandably annoyed when they see substantially the same content within a set of search results. Also, webmasters become sad when we show a complex URL (example.com/contentredir?value=shorty-george&lang=en) instead of the pretty URL they prefer (example.com/en/shorty-george.htm).
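
One way webmasters deal with this, assuming they have server access, is a permanent (301) redirect from the parameterized URL to the pretty one. Here's a minimal sketch for Apache's mod_rewrite, using the hypothetical example.com URLs from the quote above (the post itself doesn't prescribe this technique):

    # .htaccess sketch: permanently redirect the parameterized URL
    #   /contentredir?value=shorty-george&lang=en
    # to its clean equivalent
    #   /en/shorty-george.htm
    RewriteEngine On
    RewriteCond %{QUERY_STRING} ^value=([^&]+)&lang=([a-z]+)$
    RewriteRule ^contentredir$ /%2/%1.htm? [R=301,L]

The trailing ? in the substitution drops the original query string, so the redirect target is the clean URL alone.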

What does Google do about it?
During our crawling and when serving search results, we try hard to index and show pages with distinct information. This filtering means, for instance, that if your site has articles in “regular” and “printer” versions and neither set is blocked in robots.txt or via a noindex meta tag, we’ll choose one version to list. In the rare cases in which we perceive that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. However, we prefer to focus on filtering rather than ranking adjustments … so in the vast majority of cases, the worst thing that’ll befall webmasters is to see the “less desired” version of a page shown in our index.
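
If you'd rather pick the surviving version yourself than leave the choice to Google's filtering, either mechanism the post mentions will do it. A sketch, under the assumption that the printer-friendly copies live under a /print/ path (the post doesn't prescribe any particular layout):

    # robots.txt: keep crawlers out of the printer-friendly copies
    User-agent: *
    Disallow: /print/

    <!-- or, in the <head> of each printer-friendly page:
         crawlable, but excluded from the index -->
    <meta name="robots" content="noindex, follow">

Note the difference: robots.txt stops the pages from being crawled at all, while the noindex meta tag lets them be crawled but keeps them out of the search results.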

Google goes on to list several steps webmasters can take to proactively address duplicate content. Check out the full post here.


6 thoughts on “Google Clears Up Duplicate Content”

  1. Shane says:

    I’ve noticed some discussion about only showing excerpts of posts on page 2, 3, etc of WordPress blogs, so as not to duplicate the content of the single post page.

    I thought this might be a good idea, but I figured Google was smart enough to figure it out.

    I’d like to hear your thoughts on this, John … is it more a matter of increasing pageviews?

  2. slasher says:

    So, John – what does that mean for huge media companies like CNET, CMP, PC World, PC Mag etc. who cross publish their articles on a variety of properties?

  3. Zen Bliss says:

    On dynamically generated pages that contain post content, it’s common practice to use the rel=”bookmark” attribute on permalinks to inform various agents of the permanent locations of your posts.

    The rel attribute has other uses as well, such as rel=”nofollow” for fighting comment spam. Technorati also recognizes rel=”tag”.
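
    For example, in markup (the URLs here are just illustrative):

        <!-- rel="bookmark": this href is the post's permanent location -->
        <a rel="bookmark" href="http://example.com/archives/duplicate-content/">Permalink</a>

        <!-- rel="nofollow": tells engines not to credit the link (fights comment spam) -->
        <a rel="nofollow" href="http://spam.example.net/">commenter's link</a>

        <!-- rel="tag": a Technorati-style tag link -->
        <a rel="tag" href="http://example.com/tag/seo/">seo</a>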

  4. girlrobot says:

    I work as a Google quality rater, and that’s one of the categories we have to rate … whether two pages are duplicates of each other. So a lot of the filtering out is done by humans!

  5. David Mackey says:

    Glad Google is working on this. I wonder what effect this will have on article directories, which offer articles for webmasters to place on their sites to increase traffic. Will they get ignored and become useless?

  6. Andy says:

    I had to make a lot of changes because of this.
    That sucks.
    Google should have been clear from the start.
