Monday, April 11, 2011

Pagejacking - identifying and dealing with pagejackers

What is pagejacking?

In essence, pagejacking is the copying of a page by unauthorized parties in order to siphon off traffic to another site. The copying isn't limited to the wording - it's the whole box and dice. Traffic to the illegitimate page is then usually redirected to a competing or, at times, totally unrelated offer.

Why do people pagejack?

When you have the good fortune of having a page that ranks highly in the SERPs (Search Engine Results Pages), it brings you both good and bad attention. Some unscrupulous individuals may take copies of your pages in an attempt to get equally high, or higher, rankings and thereby capture some of the traffic that really should have gone to your site.

Where the pagejacker is also well versed in search engine optimization, the *majority* of the search engine traffic that usually arrives on your site can end up redirected to the pagejacker. As you can imagine, this can be very costly to your online business.

How is pagejacking executed?

The "newbie" pagejacker simply copies your page in its entirety and pastes it into another page on their own site. They may add some of their own offers to the page and adjust the links in your content to point to other pages on their site. Only the most stupid of pagejackers use this process.

The more advanced pagejacking strategy is quite clever. First, a copy of your page is taken. A page is then created on the pagejacker's site that is basically a carbon copy of your content - including meta tags. The pagejacker then adds extra scripting so that only search engine robots can read the content of the page. A 302 redirect set up in .htaccess, or a meta-refresh, is then used to automatically redirect human viewers to a totally different page - they never see your content.
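The meta-refresh variant can sometimes be spotted in the raw HTML of a suspect page. The following is a minimal Python sketch, not a complete detector: it assumes you already have the page source as a string (how you obtain it is up to you), and the function name `find_meta_refresh` is made up for this illustration. It does not detect a server-side 302 redirect, which never appears in the HTML.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects (delay, target URL) pairs from <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute typically looks like "0; url=http://..."
        delay, _, rest = attr.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            rest = rest[4:].strip().strip("'\"")
        self.targets.append((delay.strip(), rest))


def find_meta_refresh(html):
    """Hypothetical helper: return every meta-refresh found in the HTML."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.targets
```

A page that carries your text but immediately refreshes to an unrelated URL is a strong hint that the copy exists only for the robots.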

How do I detect pagejacking?

You can detect pagejacking quite easily as most pagejackers will only bother with pages that have decent search engine rankings. Use the following process:

Identify a couple of phrases that are rather uncommon in a popular page on your site.

Run these phrases through a query on the most popular search engines such as Google, Yahoo and MSN. When querying the engines, ensure you enclose your query in quotes, e.g. "the flomble is pink with black stripes".

In the results that come back, as long as the phrase you have used is uncommon, you'll probably only see your page and instances of pagejacking. Even if you're not able to use an uncommon phrase as the basis of your search criteria, or you allow the reproduction of some of your content on other sites and you wind up with 100 results, go through all the results pages anyway. Yahoo, Google and MSN always show extended snippets from the page which will make it easier to identify a site that is using pagejacked content.
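If you check several phrases regularly, building the quoted queries by hand gets tedious. Here is a small Python sketch that turns a phrase into exact-match search URLs. The URL patterns in `SEARCH_URLS` are assumptions about each engine's query format and may change, so verify them before relying on this.

```python
from urllib.parse import quote_plus

# Assumed query URL patterns - not taken from the article; check each engine.
SEARCH_URLS = {
    "google": "https://www.google.com/search?q={query}",
    "yahoo": "https://search.yahoo.com/search?p={query}",
}


def exact_phrase_urls(phrase):
    """Wrap the phrase in double quotes and build one search URL per engine."""
    quoted = '"' + phrase.strip().strip('"') + '"'
    encoded = quote_plus(quoted)
    return {
        engine: pattern.format(query=encoded)
        for engine, pattern in SEARCH_URLS.items()
    }


if __name__ == "__main__":
    for engine, url in exact_phrase_urls("the flomble is pink with black stripes").items():
        print(engine, url)
```

Open the printed URLs in a browser and review the results by eye, as described above.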

To confirm that the suspect listing is in fact pagejacked content, instead of clicking on the link to the page in the search engine results, click on the "cached" option. It will display the page as it appeared to the search engine robot the last time it was crawled. High ranking pages are usually crawled quite regularly, so the cached copy should be reasonably fresh.
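Once you have the source of the cached copy, a rough similarity score against your own page can back up what your eyes tell you. The sketch below is one possible approach using only the Python standard library: it strips the markup, then compares the two pages word by word. The helper names and the idea of a numeric score are illustrative assumptions; no particular threshold is implied by the process above.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gathers visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def visible_text(html):
    """Return lowercased page text with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip().lower()


def similarity(original_html, suspect_html):
    """Word-level similarity between two pages, from 0.0 to 1.0."""
    a = visible_text(original_html).split()
    b = visible_text(suspect_html).split()
    return SequenceMatcher(None, a, b, autojunk=False).ratio()
```

A score close to 1.0 means the cached copy is nearly word-for-word your page; a low score suggests the match on your search phrase was a coincidence or a short quotation.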

How do I deal with pagejacking?

Pagejackers by nature are a snivelling, cowardly breed and easy to deal with if you go about it in the right way.

If you have identified pagejacked content, the first thing you need to do is to save the cached copy of the page - this is very important as it is solid evidence.

One of the great features of Google is that when it displays cached copies of pages, it adds a box to the top of it with identifying information, including the URL and the date the cached copy was taken.

If you are using Internet Explorer, you can save a copy of the cached page by going to "File", selecting "Save as" and, in the "Save as type" dropdown, choosing "Web archive, single file (*.mht)".

This option will download everything, including images and the Google info box into a single file.

Having a single file makes it easier to transmit to other parties during the follow up process.

Once you have the archive file safely stored on your own computer, it's time to swing into action.

The first thing you should do is to contact the owner of the site. There is no need to be overly polite in the notification, but also do not be abusive.

Bear in mind that in some cases, the pagejacker may *not* be the actual site owner. The owner of the site may have employed an unethical optimization company who used the pagejacking technique. Regardless, it is the site owner's responsibility to deal with the situation.

I recommend writing a brief note along these lines:

Subject = "Copyright infringement - (Domain Name)"

Body =

"It has come to my attention that you have made an unauthorized use of my copyrighted work, located at (copyrighted work URL), by reproducing it on your site at (their URL with infringing copy). At no time have I given permission for you to reproduce my original content in such a way.

A cached copy from Google of the illegally copied content on your site is attached, along with details as to its location on your site and the date it was gathered. It appears that my content is being used on your site as part of a pagejacking strategy and is visible only to search engines.

As the legal owner of this copyrighted content, I demand that you remove my property from your site immediately.

You have 72 hours to remove this content. If the content is not removed within this time frame, I will find it necessary to take further action, including contacting Google and your hosting service and pursuing any other legal avenues I have at my disposal.

Sincerely,
Your name
Your contact details"

Ensure you flag the email as urgent and select the read receipt option in your email software. If the content has not been removed after 72 hours, you should first contact the company hosting the site.
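If you would rather script the notice than type it each time, the following Python sketch drafts it with the standard `email` library. It is an illustration under several assumptions: the function name, parameters and the shortened body text are invented here, the attachment file name is a placeholder, and the `Importance` and `Disposition-Notification-To` headers are only requests that the recipient's mail software is free to ignore. Sending the message is left out.

```python
from datetime import datetime, timedelta
from email.message import EmailMessage


def build_notice(domain, original_url, infringing_url, sender_name,
                 sender_email, sent_at, archive_bytes=None):
    """Hypothetical helper: draft the notice, return (message, deadline)."""
    deadline = sent_at + timedelta(hours=72)  # the 72-hour window in the note

    msg = EmailMessage()
    msg["From"] = sender_email
    msg["Subject"] = f"Copyright infringement - {domain}"
    # Request urgent handling and a read receipt; clients may ignore both.
    msg["Importance"] = "high"
    msg["Disposition-Notification-To"] = sender_email
    msg.set_content(
        "It has come to my attention that you have made an unauthorized use "
        f"of my copyrighted work located at {original_url} by reproducing it "
        f"on your site at {infringing_url}.\n\n"
        "I demand that you remove my property from your site within 72 hours "
        f"(by {deadline:%Y-%m-%d %H:%M}).\n\n"
        f"Sincerely,\n{sender_name}\n{sender_email}\n"
    )
    if archive_bytes is not None:
        # Attach the saved cached copy as evidence (placeholder file name).
        msg.add_attachment(archive_bytes, maintype="application",
                           subtype="octet-stream", filename="cached-copy.mht")
    return msg, deadline
```

Keeping the returned deadline alongside the sent message makes it easy to know exactly when the 72 hours are up before you escalate to the hosting company.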

The hosting company, as well as the domain name registrant, can usually be identified from the WHOIS record for the domain name by looking at the nameserver information, or by running a trace on the domain name.
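WHOIS output is plain text, so the nameserver lines are easy to pull out once you have it. The Python sketch below assumes you have already pasted or captured the raw WHOIS text, and that nameservers appear on lines labelled "Name Server:" or "nserver:". Registries format their records differently, so treat the pattern as an assumption to adjust rather than a rule.

```python
import re

# Assumed label formats; WHOIS records vary between registries.
NAMESERVER_LINE = re.compile(r"\s*(?:name ?server|nserver)\s*:\s*(\S+)", re.IGNORECASE)


def nameservers_from_whois(whois_text):
    """Return the unique nameserver hostnames found in raw WHOIS text."""
    found = []
    for line in whois_text.splitlines():
        match = NAMESERVER_LINE.match(line)
        if match:
            host = match.group(1).rstrip(".").lower()
            if host not in found:
                found.append(host)
    return found
```

The domain that the nameservers belong to often points to the hosting company, though not always, since some sites use a separate DNS provider.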

If you do find it necessary to contact the hosting service, check the host's site first for its copyright complaint guidelines. Each company's complaints process may differ slightly, and it's important that you follow their submission guidelines carefully - usually a US company will direct you to follow the process laid out in the DMCA (Digital Millennium Copyright Act).

If the infringement has caused you a major loss in profit, then it is advisable that you contact your lawyer before taking any sort of action if it is within your means to do so.

How do I prevent pagejacking?

In short - it gets to a point where you can spend so much time trying to protect your online business from parasites and copycats that you may as well not bother with having a site at all. Monitoring is the key in relation to pagejacking.

Other possible negative effects of pagejacking

I've read a number of reports on the subject of pagejacking that appear to indicate that some search engines will favor the pagejacked page over the original one to the point that the original page will be dropped from the SERPs altogether. The reason for this is that most search engines employ duplicate content filters - and the way some work is that the higher ranking page is usually the one that is kept.

One very important negative effect of pagejacking is damage to your brand. For instance, a pagejacker may copy a page that contains multiple instances of your business or product name. If the pagejacker is successful in achieving consistently higher rankings than your own content, unsuspecting surfers may begin to associate the brand with misleading content and steer clear of it altogether.

Protecting your site from online parasites is an ongoing battle; I hope this article has assisted you in dealing with one aspect of this multi-faceted war.

Michael Bloch
Taming the Beast
http://www.tamingthebeast.net/

Tutorials, web content, tools and software.
Web Marketing, Internet Development & Ecommerce Resources

Copyright information....

This article is free for reproduction but must be reproduced in its entirety, including live links & this copyright statement must be included. Visit http://www.tamingthebeast.net/ for free Internet marketing and web development articles, tutorials and tools!