iSnare.com - Free Content Articles Directory
 

How And Where Search Engines See Duplicate Content

 
Danny Wirken

Introduction

Search engines have become the gateway to information on the Internet. They are so important that websites need to rank well in search engine results pages (SERPs) in order to get noticed. With numerous websites vying for a coveted position among the top 30 results listed in SERPs, more and more website owners are using search engine optimization (SEO) techniques to improve their rankings. People who use SEO know that certain factors can affect a site's ranking positively and, of course, negatively. Of the negative factors, one of the most well known is duplicate content.

Search engines are biased against duplicate content. As a matter of fact, some sites do not get listed in SERPs because of this factor. This happens when crawlers do not index sites that they have previously determined to be duplicates of another site. The crawlers skip the duplicate site to be more efficient and save time. Crawlers also do this for another reason: to avoid listing duplicate pages in SERPs and thus pointing users to different sites that contain the same information. Search engines do not want that to happen because it would irritate users, who expect to see different sites behind the different links they click. For similar sites, search engines also usually list just one of the sites and relegate the others under a link that says "See related pages". For those that do manage to be listed in the SERPs, the page rank is still usually affected, and this in turn affects the site's standing.

Where Search Engines See Duplicate Content

So where do crawlers see this duplicate content? And what kinds of content would they interpret as duplicate? According to an article by William Slawski, "Duplicate Content Issues and Search Engines", search engines see duplicate content in the following kinds of web pages:

1. Product descriptions from manufacturers, publishers, and producers reproduced by a number of different distributors in large ecommerce sites.

2. Alternative print pages – This happens when user-friendly website owners offer copies of the same document in different formats to provide varied printing options. Although helpful to users, these copies might actually be indexed by crawlers as duplicate pages.

3. Pages that reproduce syndicated RSS feeds through a server side script.

4. Canonicalization issues, where a search engine may see the same page as different pages with different URLs.

5. Pages that serve session IDs to search engines, so that they try to crawl and index the same page under different URLs.

6. Pages that serve multiple data variables through URLs, so that they crawl and index the same page under different URLs.

7. Pages that share too many common elements, or where those are very similar from one page to another, including title, meta descriptions, headings, navigation, and text that is shared globally – This is common for company websites that insist on putting their logo, description, etc. on every page of their website.

8. Copyright infringement – Plagiarism is of course a good reason for not being indexed. The problem is that crawlers cannot distinguish the original from the duplicate and might mistakenly filter out the original instead.

9. Use of the same or very similar pages on different subdomains or different country top level domains (TLDs).

10. Article syndication – Some writers allow their articles to be published on other websites as long as they are given credit for their work. The problem arises when the crawler sees the original article as the duplicate and opts to index the duplicate page instead, or at least gives it a higher rating.

11. Mirrored sites – Mirrored sites are used to handle the traffic of a very popular site. Mirror sites have a good chance of being ignored by web crawlers and so won’t be indexed.
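
Items 4 to 6 in the list above describe the same page being reachable under several URLs. A minimal sketch of how such URLs might be reduced to one canonical form is shown here. It is an illustration only, not the method of any particular search engine; the `SESSION_PARAMS` names, the choice to drop a leading `www.`, and the treatment of `index.html` are all assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that carry session state rather than content.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def canonicalize(url):
    """Return one normalized form for URLs that likely point to the same page."""
    parts = urlsplit(url)

    # Host names are case-insensitive; dropping "www." is an assumption.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]

    # Treat an empty path and a default index page as the directory root.
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]

    # Remove session IDs and sort the remaining variables so that
    # parameter order does not produce a "different" URL.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    )

    # Fragments are never sent to the server, so they are dropped.
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))
```

With this sketch, `http://WWW.Example.com/index.html?b=2&a=1&sid=abc` and `http://example.com/?a=1&b=2&PHPSESSID=xyz` both reduce to the same string, so a crawler comparing canonical forms would treat them as one page.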

How Search Engines See Duplicate Content

Different search engines employ many methods to determine which pages have duplicate content. The methods differ in many ways, from the concept, to the algorithms, and of course their effectiveness. Search engines are, however, all finding new ways to improve their methods for detecting duplicate content, as seen in the patents filed by search engine companies such as AltaVista, Microsoft Corporation, and Google, and by other bodies such as Digital Equipment Corporation and even the Regents of the University of California.

The different patents include methods for detecting query-specific duplicate documents, detecting duplicate and near-duplicate files, clustering closely resembling data objects, identifying near-duplicate pages in a hyperlinked database, indexing duplicate database records using a full-record fingerprint, indexing duplicate records of information of a database, utilizing information redundancy to improve text searches, detecting and summarizing document similarity within large document sets, and finding mirrored hosts by analyzing URLs.

Each method is unique and interesting in its approach. The methods vary greatly, from generating fingerprints for records to using query-relevant information to limit the portion of the documents to be compared. Discussing each method would be interesting and would shed light on how different search engines approach the problem. The new methods are all innovative, and if some of them were used in concert with each other, it would surely improve a search engine's ability to detect duplicate documents. However, since the patent holders are competing companies, it is unlikely that there will be collaboration between them.
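
To give a rough feel for what fingerprinting and near-duplicate detection can look like, here is a small sketch. It does not reproduce any of the patented methods mentioned above. It swaps in two widely known textbook techniques: a hash of the normalized text for exact duplicates, and word shingles compared with Jaccard similarity for near duplicates. The function names and the shingle size `k=3` are assumptions chosen for the example.

```python
import hashlib
import re


def tokens(text):
    """Lowercase the text and keep only runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fingerprint(text):
    """Hash of the normalized text; equal fingerprints mean equal word sequences."""
    normalized = " ".join(tokens(text))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shingles(text, k=3):
    """Set of overlapping k-word sequences (shingles) found in the text."""
    words = tokens(text)
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

Two pages that differ only in capitalization or punctuation get the same `fingerprint`, while a page with a few words changed gets a different fingerprint but still a `similarity` well above that of an unrelated page. Where to set the cut-off between "near duplicate" and "different" is a tuning decision this sketch leaves open.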

Conclusion

As search engines further refine their methods for detecting duplicate content, it will become harder for plagiarists to get away with what they do. However, web pages containing duplicate content for a good reason could suffer as well. Furthermore, since none of the published patents tackles the issue of differentiating original content from duplicates, refinement of the search engines' methods might mean further trouble for the owners of original content. Because of this, search engines ought to invent new methods for distinguishing original content from duplicates, as well as for recognizing valid duplicate content.

Article published on September 10, 2006 at Isnare.com
 

© 2004-2009. Isnare Free Articles - An Isnare Online Technologies Free Articles Project. All Rights Reserved.   Privacy Policy