 

The Problem Of Demoting Spam On The Internet: Yahoo!’s TrustRank Approach

 
Danny Wirken

TrustRank is an attempt to counter the web spamming activities that threaten to deceive search engines’ ranking algorithms. It propagates trust among web pages in the same manner that PageRank propagates authority. However, tests indicate that combining trust and distrust values demotes spam sites more effectively than using trust values alone.

The Assumption

A link between two pages carries an implied conveyance of trust from the source page to the target page. Linking to a page is a vote of confidence from the source that the target will provide content of value to the user. The approach rests on the ideal that good sites point only to similarly good sites and will not knowingly refer people to spam sites. These good sites hold people’s trust, which is then propagated through the link structure of the web.

TrustRank uses a set of highly trusted seed sites to help demote web spam. The approach assigns a non-zero initial trust score to these seed sites and an initial score of zero to all other sites. A biased PageRank algorithm then propagates the initial trust scores along outgoing links, so that after convergence good sites are expected to hold a decent trust score while spam sites are likely to hold lower ones.

The possibility of a page pointing to a spam page increases as its number of links increases. It has therefore been proposed that the trust score of a parent page be split equally among its child pages. This raises the question of how to score a child page whose multiple parents each pass it a different share of trust. TrustRank resolves this by simple summation, which has not been very effective in curtailing spam sites’ efforts to raise their rankings.
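The propagation described above can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: the function name `trust_rank`, the damping value of 0.85, the fixed iteration count, and the toy link graph are all assumptions made for the example. Each page splits its trust equally among the pages it links to, and a page with several parents simply sums what it receives.

```python
def trust_rank(links, seeds, damping=0.85, iterations=50):
    """Biased PageRank sketch: trust flows from seed pages along outgoing links.

    links: dict mapping a page to the list of pages it links to.
    seeds: set of pages judged trustworthy in advance.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Seeds share the initial trust equally; every other page starts at zero.
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)

    for _ in range(iterations):
        # The "bias": the teleport mass returns only to the seed pages.
        updated = {p: (1.0 - damping) * seed_share[p] for p in pages}
        for source, targets in links.items():
            if targets:
                # A parent splits its trust equally among its children...
                share = damping * trust[source] / len(targets)
                for target in targets:
                    # ...and a child sums the shares from all of its parents.
                    updated[target] += share
        trust = updated
    return trust


# Hypothetical toy graph: no trusted page links into the spam pair.
links = {
    "directory": ["news", "shop"],
    "news": ["directory"],
    "shop": ["news"],
    "spam_a": ["spam_b", "shop"],
    "spam_b": ["spam_a"],
}
scores = trust_rank(links, seeds={"directory"})
```

In this toy graph the two spam pages end with a trust score of zero because no trusted page links to them, while a spam page that manages to attract even one link from a good page would receive a share of that page's trust. Pages without outgoing links simply lose their trust mass in this sketch.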

The conveyance of distrust emerged as a natural extension of the conveyance of trust between links. Distrust may indicate a lack of confidence in a source page because it links to an untrustworthy page. Thus, when a page links to a known spam page, the trust judgment of that source page cannot be considered valid.
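One way to picture distrust propagation is to run the same kind of biased propagation backwards, from known spam pages to the pages that link to them. The sketch below is only an assumption about how this could be done: the article does not specify the splitting rule, the damping value, or how trust and distrust are combined, so `distrust_rank`, `combined_score`, the subtraction with a `weight`, and the sample trust values are all hypothetical.

```python
def distrust_rank(links, spam_seeds, damping=0.85, iterations=50):
    """Propagate distrust against link direction, from spam pages to their linkers."""
    pages = set(links)
    parents = {}  # page -> list of pages that link to it
    for source, targets in links.items():
        pages.update(targets)
        for target in targets:
            parents.setdefault(target, []).append(source)

    seed_share = {p: (1.0 / len(spam_seeds) if p in spam_seeds else 0.0)
                  for p in pages}
    distrust = dict(seed_share)

    for _ in range(iterations):
        updated = {p: (1.0 - damping) * seed_share[p] for p in pages}
        for page, sources in parents.items():
            # Assumption: a page's distrust is split equally among its linkers.
            share = damping * distrust[page] / len(sources)
            for source in sources:
                updated[source] += share
        distrust = updated
    return distrust


def combined_score(trust, distrust, weight=1.0):
    """Assumed combination rule: trust minus weighted distrust."""
    pages = set(trust) | set(distrust)
    return {p: trust.get(p, 0.0) - weight * distrust.get(p, 0.0) for p in pages}


# Hypothetical graph: "blog" links to a known spam page.
links = {
    "hub": ["news"],
    "news": ["hub"],
    "blog": ["news", "spam"],
    "spam": [],
}
distrust = distrust_rank(links, spam_seeds={"spam"})
trust = {"hub": 0.4, "news": 0.4, "blog": 0.2, "spam": 0.0}  # made-up values
final = combined_score(trust, distrust)
```

Here only `blog` picks up distrust, because it is the only page that links to the spam page, so its combined score drops while `hub` and `news` keep their trust unchanged.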

TrustRank, as originally conceived, proposed that trust should be reduced as we move further away from the seed set of trusted pages. However, the limited number of seed pages makes it impossible for propagation to reach the whole web. A well-performing algorithm is needed to produce trust judgments for at least a larger fraction of web pages.
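The idea of trust fading with distance from the seed set, and the coverage gap it leaves, can be illustrated with a breadth-first walk. This is a simplified sketch under stated assumptions: the name `dampened_trust`, the decay factor of 0.5, and the toy graph are invented for the example, and each page takes its score only from its shortest path to a seed.

```python
from collections import deque


def dampened_trust(links, seeds, decay=0.5):
    """Assign each reachable page a trust of decay ** (link distance from a seed)."""
    trust = {seed: 1.0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in trust:  # first visit is the shortest distance
                trust[target] = trust[page] * decay
                queue.append(target)
    return trust


# Hypothetical graph: "island" links to the seed but nothing links to "island".
links = {"seed": ["a"], "a": ["b"], "island": ["seed"]}
result = dampened_trust(links, seeds={"seed"})
```

Pages that cannot be reached from any seed, such as `island` here, receive no trust judgment at all, which is the coverage problem the paragraph above describes.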

The seed sets used may not sufficiently represent the different topics of the web. TrustRank tends to show a bias towards larger communities, which can be remedied by using topical information to divide the seed set and calculating trust scores separately for each topic. Using the pages listed in well-maintained topic directories can help resolve the coverage issue. Seed filtering may be done to remove low-quality pages, or even spam pages, that may inadvertently have been included in the pool of seed pages.
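A small sketch of dividing and filtering the seed set follows. It covers only the preparation step: each topic gets its own initial trust vector with the same total mass, so a topic with few seeds is not drowned out by a larger one. The function name `topical_seed_vectors`, the `rejected` parameter, and the sample topics and page names are assumptions for illustration.

```python
def topical_seed_vectors(seeds_by_topic, rejected=frozenset()):
    """Build one initial trust vector per topic, dropping filtered-out seeds.

    seeds_by_topic: dict mapping a topic name to a list of seed pages,
                    for example taken from a topic directory.
    rejected: pages removed by seed filtering (low quality or spam).
    """
    vectors = {}
    for topic, seeds in seeds_by_topic.items():
        kept = [s for s in seeds if s not in rejected]
        if kept:
            # Every topic distributes the same total trust of 1.0.
            vectors[topic] = {s: 1.0 / len(kept) for s in kept}
    return vectors


# Hypothetical seed lists; "bad" stands for a page caught by seed filtering.
seeds_by_topic = {
    "sports": ["s1", "s2", "s3", "s4"],
    "botany": ["b1", "bad"],
}
vectors = topical_seed_vectors(seeds_by_topic, rejected={"bad"})
```

Each vector would then drive a separate propagation run, with the per-topic scores combined afterwards. How they are combined is not stated in the article, so that step is left out here.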

Much work is being done to come up with methods that don’t rely heavily on human judgment to identify spam-free pages. As it is, searchers are hard pressed to locate pages that serve their needs rather than pages built only to rank highly in search engines. Sites that provide no value to users are too numerous to be ignored.

Semantic Cloaking on the Web

Semantics is the study of meaning in language: it takes words, compares them with other words or symbols, and determines the relevance and relationship between them. Semantic cloaking is the practice of supplying different versions of a web page to search engines and to browsers. The purpose of the content provider is to hide the real content of the page from the view of search engines. The difference in meaning between the pages is intended to deceive search engines’ ranking algorithms. Cloaking is one type of search engine spamming technique that makes it possible for non-relevant pages to occupy top rankings in searches.

Search engines are used by people when they need to find the most relevant responses to their search. It is typical for users to view just one page of results, so sites compete fiercely for the top rankings, particularly for popular queries. For a commercial website, increased traffic translates into more profit.

Reputable content providers work hard to come up with high quality web pages to get their desired high ranking. Unfortunately, not all content providers hold the same view. Some try to reach high rankings by manipulating the web page features that search engines use as the basis for their ranking algorithms.

Ranking algorithms assume that page content is real, meaning that the content seen by search engines is identical to that seen by actual users with browsers. With cloaking, different versions are successfully supplied to each, causing a great deal of confusion and disappointment for users.

Cloaking falls under the page-hiding spam category of search engine spamming techniques. Some cloaking behavior is considered acceptable. Cloaking is of two types – syntactic and semantic. Syntactic cloaking includes all situations in which different content is sent to a crawler and to a real user. Semantic cloaking is an offshoot of syntactic cloaking which employs differences in meaning between pages to deceive the ranking algorithms of search engines.

Syntactic cloaking may be acceptable in cases such as web servers that include session identifiers in URLs for copies sent to browsers but omit them for copies sent to crawlers. Web servers use these identifiers to tell their users apart. Search engines, however, may interpret the identifiers as a change in the page. The cloaking behavior that needs to be penalized is semantic cloaking.
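To show why session identifiers are a harmless difference, the sketch below strips them from URLs before the links in a browser copy and a crawler copy are compared. The parameter names in `SESSION_PARAMS` are common conventions chosen as assumptions, not a list taken from the article, and the function names and URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of query parameters that carry session identifiers.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def strip_session_ids(url):
    """Return the URL with any session-identifier query parameters removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def same_links(browser_links, crawler_links):
    """Compare two sets of links while ignoring session identifiers."""
    normalize = lambda urls: {strip_session_ids(u) for u in urls}
    return normalize(browser_links) == normalize(crawler_links)


browser_links = ["http://example.com/page?id=7&PHPSESSID=abc123"]
crawler_links = ["http://example.com/page?id=7"]
match = same_links(browser_links, crawler_links)
```

After normalization the two copies point to the same pages, so this difference alone would not be treated as deceptive.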

There are various proposals on ways to counter the problem. One proposal suggests comparing copies of a page from both the browser’s perspective and the crawler’s perspective. It may be necessary to get two or more copies from each side to be able to detect cloaking. Another suggests a two-step process that would require fewer resources. The first step uses heuristics as a filter to eliminate web pages that show no sign of cloaking. All the pages that have not been eliminated go through the second step for inspection. Features are extracted from about four copies and a classifier is used to determine whether semantic cloaking is being done or not. However, the reality remains that no ideal solution has been arrived at to effectively curb semantic cloaking. This is a technique that should not be practiced by anyone who wants to maintain good business ethics. The practice continues to undermine search engines’ attempts to provide users with the actual information they need.
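The two-step process can be sketched as a cheap filter followed by a closer inspection of several copies. Everything specific in this example is an assumption: the word-set comparison used as the heuristic, the two features, and the fixed threshold that stands in for a trained classifier. The sample page texts are invented. Both copy lists must be non-empty.

```python
import re


def terms(text):
    """Lowercased word set of a page copy (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def needs_inspection(browser_copy, crawler_copy):
    """Step 1 (filter): identical vocabularies cannot hide a change of meaning."""
    return terms(browser_copy) != terms(crawler_copy)


def extract_features(browser_copies, crawler_copies):
    """Step 2 (features): words tied to one audience across repeated fetches."""
    browser_sets = [terms(copy) for copy in browser_copies]
    crawler_sets = [terms(copy) for copy in crawler_copies]
    return {
        # Words in every crawler copy that no browser copy ever showed.
        "crawler_only_terms": len(set.intersection(*crawler_sets)
                                  - set.union(*browser_sets)),
        # Words in every browser copy that no crawler copy ever showed.
        "browser_only_terms": len(set.intersection(*browser_sets)
                                  - set.union(*crawler_sets)),
    }


def looks_cloaked(features, threshold=3):
    """Stand-in for a classifier: a simple threshold on one feature."""
    return features["crawler_only_terms"] >= threshold


# Two fetches as a browser and two as a crawler (four copies in total).
browser = ["Welcome to our shop. Offer of the day: shoes",
           "Welcome to our shop. Offer of the day: hats"]
crawler = ["cheap pills casino loans welcome",
           "cheap pills casino loans shop"]

flagged = (needs_inspection(browser[0], crawler[0])
           and looks_cloaked(extract_features(browser, crawler)))
```

Fetching more than one copy from each side helps separate ordinary page changes, such as a rotating offer, from words that are consistently shown only to the crawler.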


Article published on August 02, 2006 at Isnare.com
 
