Saturday, December 30, 2006
The Great Deal Of Duplicate Content
Someone is stealing your content

Over time we have heard a lot about content. Content is important and plays a key role in optimizing our blogs and websites. The Web has even evolved around the concept of user-generated content, and more formats of content have emerged on the same medium, the Internet, than ever before.

In such a vast pool of Internet content, it is even more important for our own content to be unique and fresh. Much of the content on the Internet is not original; it is either duplicate or near-duplicate content.

What is Duplicate Content?

We all know how viruses propagate across the Web. Content propagates across the Web in the same way, and many copies of it are made along the way. Content that is republished on the net without adding any information or value to the original is known as Duplicate Content. Sometimes the original website is credited, and sometimes the content is simply ripped off.

What are the different types of Duplicate Content?

1. Content that is syndicated and re-branded for different users and markets, such as Private Label Articles.
2. Websites and blogs built solely by aggregating content from other websites around the web.
3. Plagiarism: copying from freely available websites such as Wikipedia and Project Gutenberg.
4. Web press releases, which are very often duplicated across many blogs and other web media sites.
5. Businesses that, to build their brand image and protect their trademark ownership, register every domain related to their main domain and build near-duplicate or similar websites with no new content.
6. Content auto-generated by various kinds of content-extraction software.
7. Multiple registered domains, either containing keywords or optimized for different keywords, that all redirect to the same website. This type of optimization is dangerous because the content of the main website may be treated as duplicate content.

Setting aside some business necessities, most duplicate pages are created to build content-rich websites that generate passive income through contextual ads, to manipulate search engines into granting high rankings, and to boost the owner's main website, taking unauthorized credit for someone else's work.

Most of the time this is done without providing any valuable, unique content for the visitor, without crediting the real owner of the content, and without adding any useful input.

Duplicate content can benefit many big firms, because their name stays attached to the content. Even though the content is duplicated, the real owner or source is not harmed, since the copies provide inbound links to the owner's main site. But this is not always the case.

Why should we be concerned?

Often, even when we write about the latest news or information, chances are it has already been discussed on many other websites. Search engines try to ignore, and sometimes even de-rank, sites that have very little original content.

Using 301 redirects is also an important way of telling search engines that a page has moved, instead of creating a similar page. But pointing too many redirects at one particular page may cause search engines to treat your website's content as duplicate.
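As a rough illustration of what a 301 redirect looks like on the server side, here is a minimal sketch using Python's standard `http.server` module. The `MOVED_PAGES` table, its paths, and the `run` helper are hypothetical; the point is only that a moved page answers with status 301 and a `Location` header naming its new address, rather than serving a second copy of the same content.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical table of moved pages: old path -> new path on the same site.
MOVED_PAGES = {
    "/old-post.html": "/2006/12/duplicate-content.html",
}


def redirect_target(path):
    """Return the new location for a moved page, or None if it has not moved."""
    return MOVED_PAGES.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path)
        if target is not None:
            # 301 Moved Permanently: the Location header names the page's new home.
            self.send_response(301)
            self.send_header("Location", target)
        else:
            # This sketch serves nothing else, so unknown paths get 404.
            self.send_response(404)
        self.end_headers()


def run(port=8000):
    """Serve redirects on localhost until interrupted (hypothetical helper)."""
    HTTPServer(("localhost", port), RedirectHandler).serve_forever()
```

Calling `run()` and requesting http://localhost:8000/old-post.html should return a 301 pointing at the new path. On a real site this is usually configured in the web server itself rather than written by hand.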

The latest trend, page jacking, is becoming a serious problem for new and small websites. Sometimes owners of authority websites with good PageRank take content from smaller sites and blogs and use it on their own websites. Because these sites are trusted (whitelisted) and already carry plenty of relevant content, the ripped content looks natural there. Search engines may then conclude that the original website is the one with duplicate content, and it may suffer the consequences.

You no longer have to worry about URL variations such as http://domain.com and http://www.domain.com.

How do search engines look for Duplicate Content?

Google and AltaVista (now owned by Yahoo!) hold many patented technologies for finding duplicate content, and these methods look for similarities rather than matching content word by word.


1. Google looks for content that is a subset of other content sources. It checks how recently the information was published and also considers the prior authority of the blog or website.
2. Yahoo compares the outbound links in the content.
3. Freely available reference sites and other high-authority sites are crawled very thoroughly by these search engines, so content-optimized websites whose text is similar to, or simply copied from, these sites are easy prey.
4. Similar title tags and meta descriptions combined with similar content are noticeable traces that automated systems identify easily.
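The patented methods themselves are not described here, but a common, openly published way to measure similarity instead of word-by-word matching is w-shingling: split each document into overlapping runs of k words and compare the resulting sets with Jaccard similarity (size of the intersection divided by size of the union). The sketch below assumes k = 4 and a simple lowercase word tokenizer, both arbitrary choices, and reuses the same Jaccard measure to compare sets of outbound links. It illustrates the general technique only; it is not how Google or Yahoo actually implement their systems.

```python
import re


def shingles(text, k=4):
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def content_similarity(doc1, doc2, k=4):
    """Near-duplicate score in [0, 1] based on shared k-word shingles."""
    return jaccard(shingles(doc1, k), shingles(doc2, k))


def link_overlap(links1, links2):
    """Share of outbound links two pages have in common, in [0, 1]."""
    return jaccard(set(links1), set(links2))


original = ("Content is important and owns a key role in the optimization "
            "of our blogs and websites.")
copy = ("Content is important and owns a key role in the optimization "
        "of our websites.")
print(content_similarity(original, copy))
```

A lightly edited copy still shares most of its shingles with the original and scores well above an unrelated text, which is why small rewording does not hide duplication from this kind of check. The threshold at which a page would count as a near-duplicate is not given in the post and would be a tuning decision.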

How can you find out if your content is duplicated?


1. Take any 3-4 lines at random, or important phrases from the article, and search for them on Google with the text enclosed in quotation marks.
2. Use a plagiarism-checking service such as Copyscape, listed at CreativeCommons.org and DisclosurePolicy.org.
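The manual check in step 1 can be partly scripted. This sketch assumes sentences end in ".", "!" or "?", picks a few of them at random, and builds a Google search URL for each with the phrase wrapped in double quotes so the search looks for that exact phrase. The function names are hypothetical; the generated URLs are meant to be opened in a browser, not fetched automatically.

```python
import random
import re
from urllib.parse import quote_plus


def pick_phrases(article, count=3, seed=None):
    """Pick up to `count` random sentences from the article to search for."""
    # Assumption: a sentence ends with ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]
    rng = random.Random(seed)
    return rng.sample(sentences, min(count, len(sentences)))


def exact_phrase_query_url(phrase):
    """Build a Google search URL for the phrase in double quotes (exact match)."""
    return "https://www.google.com/search?q=" + quote_plus('"' + phrase + '"')


article = ("Content is important. Much of the content on the Internet is not original. "
           "Search engines try to ignore sites with little original content.")
for phrase in pick_phrases(article, count=2, seed=1):
    print(exact_phrase_query_url(phrase))
```

`quote_plus` percent-encodes the quotation marks and turns spaces into `+`, so the phrase survives intact as a single query parameter.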

It's really important for us to make information easily accessible while also adding valuable content to the web. Sooner or later, search engines, with their steadily improving technology, will counter most attempts at stealing content.
Posted by vignesh at 4:29 AM