The causes and potential negative effects of duplicate content

You know duplicate content can have a negative effect on web site
rankings. But how do you examine whether a particular web site exhibits
this problem, and how do you mitigate or avoid it?

To begin, you can divide duplicate content into two main categories:

Duplicate Content as a Result of Site Architecture
Some examples of site architecture itself leading to duplicate content are as follows:
- Print-friendly pages
- Pages with substantially similar content that can be accessed via different URLs
- Pages with items that are extremely similar, such as a series of differently colored shirts in an e-commerce catalog having similar descriptions
- Pages that are part of an improperly configured affiliate program tracking application
- Pages with duplicate title or meta tag values
- Using URL-based session IDs
- Canonicalization problems

All of these scenarios are discussed at length in this chapter.
To look for duplicate content as a result of site architecture, you can use a “site:example.com” query to examine the URLs of a web site that a search engine has indexed. All major search engines (Google, Yahoo!, and Bing) support this feature. Usually this will quickly reveal whether, for example, “print-friendly” pages are being indexed.
Google frequently places content it perceives as duplicate content in the “supplemental index.” This is noted at the bottom of a search engine result with the phrase “supplemental result.” If your web site has many pages in the supplemental index, it may mean that those pages are considered duplicate content, at least by Google. Investigate several pages of URLs if possible, and look for the aforementioned cases. Look especially at the later pages of results. It is extremely easy to create duplicate content problems without realizing it, so viewing your site from the vantage point of a search engine may be useful.
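
Once you have collected a list of indexed URLs, grouping them by a normalized form can make architectural duplicates easier to spot. The following is a minimal sketch, not a definitive tool: the parameter names in IGNORED_PARAMS (session IDs, a print flag, an affiliate ID) and the function names are assumptions chosen for illustration, and a real site would need its own list.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl, urlencode

# Hypothetical query parameters that change the URL but not the content.
IGNORED_PARAMS = {"sessionid", "phpsessid", "print", "aff"}

def normalize(url):
    """Reduce a URL to a comparison key: host without "www.", path without
    a trailing slash, and sorted query parameters minus the ignored ones."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in IGNORED_PARAMS
    )
    key = host + path
    if query:
        key += "?" + urlencode(query)
    return key

def group_duplicates(urls):
    """Return only the normalized keys that more than one URL maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}

if __name__ == "__main__":
    indexed = [
        "http://www.example.com/shirts/?PHPSESSID=abc",
        "http://example.com/shirts",
        "http://example.com/shirts?print=1",
        "http://example.com/pants",
    ]
    for key, found in group_duplicates(indexed).items():
        print(key, found)
```

Any key that maps to more than one URL is a candidate for closer inspection; the grouping only flags URLs that look equivalent and does not compare page content.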

Duplicate Content as a Result of Content Theft
Content theft creates an entirely different problem. Just as thieves can steal tangible goods, they can also steal content, hence the name. It creates a similar problem for search engines, because they strive to filter duplicate content from search results, across different web sites as well, and will sometimes make the wrong assumption as to which instance of the content is the original, authoritative one. This is an insidious problem in some cases, and can have a disastrous effect on rankings. CopyScape (copyscape.com) is a service that helps you find content thieves by scanning other pages for content similar to that of a given page. Sitemaps can also help by getting new content indexed more quickly, thereby removing the ambiguity as to who is the original author.
Unfortunately, fighting content theft is ridiculously time-consuming and expensive, especially if lawyers get involved. Doing so for all instances is probably unrealistic, and search engines generally do accurately assess who is the original author and display that one preferentially. In Google, the illicit duplicates are typically relegated to the supplemental index. However, it may be necessary to take action in the unlikely case that the URLs with the stolen content actually rank better than yours.
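
To get a rough sense of how "similar content" can be measured, here is a minimal sketch that compares two texts using word shingles and Jaccard similarity. This is a common textbook technique and is an assumption for illustration only; it is not a description of how CopyScape or any search engine actually works. The function names and the shingle size of 3 are likewise arbitrary choices.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Texts shorter than one shingle are treated as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b, size=3):
    """Share of shingles common to both texts, from 0.0 to 1.0."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

if __name__ == "__main__":
    original = "Duplicate content can have a negative effect on rankings."
    suspect = "Duplicate content can have a negative effect on your rankings!"
    print(round(jaccard(original, suspect), 2))
```

A score near 1.0 suggests one text is largely a copy of the other; where to set the threshold is a judgment call that depends on the kind of content being compared.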

Excluding Duplicate Content
When you have duplicate content on your site, you can remove it entirely by altering the architecture of the web site. But sometimes a web site has to contain duplicate content. The most typical scenario is when the business rules that drive the web site require it. To address this, you can simply exclude it from the view of a search engine. Here are the two ways of excluding pages:

Using the Robots Meta Tag
This is addressed first, not because it’s universally the optimal way to exclude content, but rather because it has virtually no limitations as to its application. Using the robots meta tag you can exclude any HTML-based content from a web site on a page-by-page basis, and it is frequently an easier method to use when eliminating duplicate content from a preexisting site for which the source code is available, or when a site contains many complex dynamic URLs.
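
As an illustration of the page-by-page approach, the sketch below builds a robots meta tag and decides whether a given URL should carry the "noindex" directive. It rests on assumptions that are not from the text above: the helper names are hypothetical, and the rule that a "print=1" query parameter marks a print-friendly copy is invented for the example. Keeping "follow" so that links on the excluded page are still crawled is also a design choice, not a requirement.

```python
from urllib.parse import urlsplit, parse_qs

def robots_meta(index=True, follow=True):
    """Build a robots meta tag from the two standard directive pairs."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    return '<meta name="robots" content="{}">'.format(", ".join(directives))

def head_tag_for(url):
    """Pick the robots meta tag to emit in the <head> of the page at url."""
    # Hypothetical rule: URLs carrying ?print=1 are print-friendly copies.
    query = parse_qs(urlsplit(url).query)
    is_print_copy = query.get("print") == ["1"]
    return robots_meta(index=not is_print_copy, follow=True)

if __name__ == "__main__":
    print(head_tag_for("http://example.com/article?print=1"))
    print(head_tag_for("http://example.com/article"))
```

The returned tag would be placed in the head section of the page when it is rendered, so the duplicate copy stays available to visitors while search engines are asked not to index it.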

My name is Daksh and I help online businesses improve their link popularity, especially through social bookmarking, article submission, and directory submission services.

A Threat to Your WordPress Blog: Duplicate Content

Blogging is extremely popular these days, and the most popular stand-alone blog engine is WordPress. It is flexible, has many useful features, and there are many eye-catching templates for it. But those who have a WordPress blog must be aware of a serious problem that can cause a blog to be removed from Google’s search results. The problem is duplicate content.

The WordPress content management system, when used with the default configuration, is not duplicate content proof. In fact, this CMS is capable of rendering almost 100% of your content duplicate. As usual, the fault of the system has its roots in its advantages. WordPress has many features that facilitate blogging and linking, such as RSS feeds for posts and comments, trackback URLs, monthly archives, and so on. At the same time, this variety of URLs returning similar or identical pages represents a clear case of duplicate content.

WordPress And Duplicate Content

The first evidence of duplicate content produced by WordPress can be found in your sidebar: category pages and monthly/daily archives. Category pages collect the articles you posted under the same topic (a category). Such pages have no unique content; they are just a collection of your previous posts. Monthly and daily archives likewise simply group your previous articles by posting date. When you have only one post on a given day, the archive page for that date and the post itself can be identical.

The next case of duplicate content is even more prominent. It can be your home page itself. If it contains not excerpts but the full text of your posts, then it duplicates your post pages. This also applies to the “next/previous entries” pages—those accessible via /page/2, /3, /4 etc.

Feeds. Search engine spiders crawl all the content they can reach and of course this includes RSS feeds too. The additional problem with them is that Google may choose to display your RSS URL in the search results over the link to the original post. In this case the user who clicks this result will see an XML formatted page which is not “human-friendly”.

Trackback URLs. Many WordPress templates add trackback links after posts. These links enable authors to track who links to their posts. Usually, if your post URL looks like “www.yoursite.com/2006-11-30/yourpost/”, its trackback URL will be “www.yoursite.com/2006-11-30/yourpost/trackback/”.

Identical meta-description. By default WordPress doesn’t provide a tool to add unique meta description tags to your posts, and they either have none or share a single site-wide description. Having no meta description at all is a disadvantage, as a properly written one can make your snippet stand out in a SERP. Having an identical description for all your pages is a threat, as Google might get them filtered out as too similar.

Because of duplicate content, a Google search can return less desirable URLs (such as feeds or archives instead of the original posts), and your pages can be dropped from the index or placed in the supplemental results, which are rarely displayed to users.
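
The URL variants described in this section (category and date archives, paged listings, feeds, and trackbacks) can be recognized mechanically, which helps when auditing which URLs of a blog are likely duplicates. The sketch below is only an approximation under stated assumptions: the patterns assume common WordPress-style paths such as /category/..., /page/2/ and /2006/11/, the labels and the classify function are hypothetical, and a blog with different permalink settings would need different patterns.

```python
import re

# Assumed WordPress-style path patterns; order matters, since a feed or
# trackback suffix should win over the archive it is attached to.
PATTERNS = [
    ("trackback", re.compile(r"/trackback/?$")),
    ("feed", re.compile(r"/feed(/[a-z0-9]+)?/?$")),
    ("paged listing", re.compile(r"/page/\d+/?$")),
    ("category archive", re.compile(r"^/category/")),
    ("date archive", re.compile(r"^/\d{4}(/\d{2}){0,2}/?$")),
]

def classify(path):
    """Label a URL path as a likely duplicate view, or as "content"."""
    for label, pattern in PATTERNS:
        if pattern.search(path):
            return label
    return "content"

if __name__ == "__main__":
    for path in [
        "/2006-11-30/yourpost/",
        "/2006-11-30/yourpost/trackback/",
        "/2006-11-30/yourpost/feed/",
        "/page/2/",
        "/category/seo/",
        "/2006/11/",
    ]:
        print(path, "->", classify(path))
```

Paths labeled anything other than "content" are the ones worth checking first in a “site:” query, since they tend to repeat text that already lives on a post page.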

For tips on how to get rid of duplicate content in WordPress, please refer to my tutorial: WordPress vs. Duplicate Content