There has been a lot of discussion and debate around duplicate content. When I helped put together the Tips from the T-List website, for example, the whole idea behind using the WordPress platform was to consolidate all the content into one place and then redirect readers back out to the original posts and websites. The idea was that by collaborating and consolidating the content, the Tips from the T-List site would in essence act as a portal that would generate more traffic than any single blog could by itself. Although this seemed like a good idea, there was a lot of concern around the duplication of content and whether such an approach would help or hinder the rankings of individual blog sites. Some bloggers refused to add their RSS feed to the site for fear that their site rankings would be negatively affected.
Creating a site that re-uses content from many sources seems to me to be quite commonplace. News aggregator sites, for example, pull in RSS feeds from other news sites, re-use content from press releases, and run stories across multiple sites. In the travel business, there are hundreds of sites that have licensed Lonely Planet, Rough Guide, or Columbus Guide content for their own websites. In these cases, the companies are paying a license fee to the original publishers to place duplicate content on their sites.
I wanted to find out what the implications are for duplicate content and how Google and other search engines actually treat duplicate content. This is what I have discovered:
1. Only use content that is Creative Commons, CopyLeft, open source, or otherwise not copyrighted. If you are going to use copyrighted material, be sure to follow the terms of the copyright exactly. Although this seems obvious, it is important to remember that by default, all written content is copyrighted unless otherwise stated, NOT the other way around.
2. When using duplicate content, try to rewrite or paraphrase it so that it is not exactly the same as the original. You will probably still need to credit the original author for the original work.
3. When aggregating RSS feeds, it is always best to notify the original author to let them know what you are doing. This will avoid any uncomfortable questions about inappropriate usage later. Be sure to link to the original post along with the article.
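To make point 3 concrete, here is a minimal sketch of what an aggregator might do: parse a feed and turn each item into an excerpt that carries a credit line and a link back to the original post. The feed content, blog name, and URLs below are all hypothetical examples, and a real aggregator would fetch the XML from the blog's feed URL rather than use an inline string.

```python
import xml.etree.ElementTree as ET

# A minimal RSS 2.0 feed; in practice this would be fetched
# from the original blog's feed URL. All values are made up.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Travel Blog</title>
    <item>
      <title>Packing Light for Long Trips</title>
      <link>http://example.com/packing-light</link>
      <description>A few tips on packing light for extended travel.</description>
    </item>
  </channel>
</rss>"""

def aggregate(feed_xml):
    """Turn each feed item into an excerpt that credits and
    links back to the original article."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title")
    entries = []
    for item in channel.iter("item"):
        title = item.findtext("title")
        link = item.findtext("link")
        excerpt = item.findtext("description")
        # Always include the link back to the original post,
        # as suggested in point 3 above.
        entries.append(
            f'<p>{excerpt}<br/>'
            f'From <em>{source}</em>: <a href="{link}">{title}</a></p>'
        )
    return "\n".join(entries)

print(aggregate(SAMPLE_FEED))
```

The important part is not the parsing but the habit: every re-published excerpt carries an attribution line pointing readers (and search engines) at the original article.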
Google clearly identifies what it considers to be duplicate content. These guidelines can be found on Google's Help site. According to Google, content is considered duplicate when:
content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic.
So my question is… if the author of the content has agreed to allow another site to post it, and the poster has added a link back to the original article, does this count as duplicate content? If the intention is benign rather than malicious, can Google tell the difference?
In the case of consolidating or aggregating blog posts into a single site (like the Tips from the T-List), here is what Google recommends:
Syndicate carefully: If you syndicate your content on other sites, Google will always show the version we think is most appropriate for users in each given search, which may or may not be the version you’d prefer. However, it is helpful to ensure that each site on which your content is syndicated includes a link back to your original article. You can also ask those who use your syndicated material to block the version on their sites with robots.txt.
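Google's last suggestion, blocking the syndicated copy with robots.txt, might look something like this on the aggregating site. This is just a sketch: the /syndicated/ path is a hypothetical location for the re-published posts, and the actual path would depend on how the aggregating site organizes its content.

```
# robots.txt on the aggregating site
# (assumes re-published posts live under a hypothetical /syndicated/ path)
User-agent: *
Disallow: /syndicated/
```

With this in place, search engines that honor robots.txt would skip the syndicated copies, leaving the original articles as the only indexed versions.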
I know that Darren Cronian has had experiences in the past with splog sites re-using his content, but does anyone have any real evidence that duplicate content issues have had a negative impact on their blog readership, traffic, reputation, or rankings?