Archive for August, 2007

An interesting ‘bug’ in Google has been brought to light, one that could cost you a lot of money if you have a business that relies on natural Google search traffic.

Let me start with a bit of history…

To many in the SEO world, the myweddingfavors.com site is the poster child for success through mixing SEO with sales. The key-phrase “wedding favors” gets over 108,000 searches per month according to current Overture data, and myweddingfavors.com has been sitting at the top of the Google results for years now, making hundreds of thousands per month in revenue. That is, until around 6 months ago, when I first noticed it was totally gone (at least the home page).

Owner and renowned SEO Brad Fallon seemed a little tight-lipped about it. Perhaps his fame had put his techniques under the microscope? Perhaps Google had discovered some black-hat techniques and banned the site? Nobody was really sure. All we heard through the grapevine was that the problem was being worked on. Finally, about a month ago, I noticed that the home page was back on top (though toolbar PageRank still shows PR0 today). It seems they had fixed whatever the problem was.

The glitch is finally exposed
It was only a few days ago that the explanation was given by Dan Thies.  It appears that the ‘de-indexing’ of their home page was a new kind of malicious attack that takes advantage of Google’s duplicate content filter by way of proxy servers.

Here is how it works. There are thousands of web sites around the world that act as ‘proxies’: they dynamically read other sites’ content and serve it up through a link that is usually search-engine friendly. For example, a URL such as www.myownproxy.com/content/http/msn.com could be the URL for a page that reads MSN’s home page (though it isn’t really).
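
To make the URL scheme concrete, here is a minimal sketch of how a path-style proxy link could map back to the page it mirrors. The `/content/<scheme>/<host>` layout is simply the made-up example above, and the function name is hypothetical; real proxies use all sorts of layouts.

```python
def target_from_proxy_url(proxy_url, prefix="/content/"):
    """Recover the mirrored URL from a proxy link (assumed, hypothetical layout)."""
    # Drop any leading scheme, then separate the proxy host from the path.
    without_scheme = proxy_url.split("://", 1)[-1]
    _host, _, path = without_scheme.partition("/")
    path = "/" + path
    if not path.startswith(prefix):
        raise ValueError("link does not follow the assumed proxy layout")
    # What remains is "<scheme>/<target host and path>".
    scheme, _, rest = path[len(prefix):].partition("/")
    if scheme not in ("http", "https") or not rest:
        raise ValueError("link does not follow the assumed proxy layout")
    return f"{scheme}://{rest}"


print(target_from_proxy_url("www.myownproxy.com/content/http/msn.com"))
```

The point is only that the proxy page has its own crawlable URL while its content is fetched from somewhere else.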

Why are there proxy servers? Usually they exist to get around firewalls or content filtering by big brother (whether ‘big brother’ is the company you work for, your parents’ net-nanny type filter, your internet provider, or your oppressive country). Another big reason that some use a proxy is privacy, as it can hide your actual IP address from the site you are visiting.

The malicious attack
So if you take dozens of these proxy URLs that read from a competitor’s site, and then promote the links so they get crawled by the big G, then at some point Google might see the competitor’s page as duplicate content and deindex it in favor of the proxy page.

Their defense
The strategy to prevent the proxy attack involves a black-hat technique, used for good in this case. Cloaking in SEO refers to serving up different content to the search engine spiders than to a regular visitor. Usually it is used to feed a bunch of ugly keywords to the spider while serving the pretty page to the real visitors. In this case, Dan (the consultant hired to defend the myweddingfavors.com site) serves up the following in the head section if a regular visitor (including proxies) looks at the page:
<meta name="ROBOTS" content="NOODP,NOINDEX" />

This tag will tell the robots not to index the page. So the proxy’s copy will not be indexed, because the proxy’s request for the page doesn’t come from one of the known search engine robot addresses and therefore gets the NOINDEX version. But when a search engine spider comes to the page, the following is served instead:
<meta name="ROBOTS" content="NOODP,INDEX" />

Indeed, a simple and elegant solution that appears to be working for them now. Of course, malicious proxy server owners could easily modify their site to filter out this meta-tag, but at least it fixes the problem with the unknowing and innocent proxy sites.
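
For anyone who wants to try something similar, here is a rough sketch of the idea in Python. It is not Dan’s actual implementation, which hasn’t been published in detail. It assumes you decide who is a “known search engine robot” with a reverse DNS lookup followed by a confirming forward lookup, and the hostname suffixes and function names are my own assumptions for illustration.

```python
import socket

# Assumed allow-list of crawler hostname suffixes; extend for other engines.
CRAWLER_HOST_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip,
                        reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex):
    """Return True if the IP resolves to a crawler hostname and back again."""
    try:
        hostname = reverse_lookup(ip)[0]
        if not hostname.endswith(CRAWLER_HOST_SUFFIXES):
            return False
        # The forward lookup must list the same IP, or the reverse record is spoofed.
        return ip in forward_lookup(hostname)[2]
    except OSError:
        # Failed lookups are treated as "not a crawler".
        return False


def robots_meta_tag(ip, verifier=is_verified_crawler):
    """Build the ROBOTS tag: INDEX for verified crawlers, NOINDEX for everyone else."""
    directive = "INDEX" if verifier(ip) else "NOINDEX"
    return f'<meta name="ROBOTS" content="NOODP,{directive}" />'
```

You would call `robots_meta_tag()` with the visitor’s IP address when rendering the head section. Since a proxy fetches the page from its own address, it falls through to the NOINDEX version. The sketch handles IPv4 only, and you would want to cache the DNS results rather than look them up on every request.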

Can Google fix it?
It would seem that a PR6+ home page should never be dropped in favor of a new proxy link. This seems like a no-brainer to fix, but I admit it may be harder than it appears. Google was contacted about this last year, and hasn’t yet done anything about it. After waiting around for a fix while more and more sites are being harmed, Dan Thies made the decision to release this to the world in hopes that it would spur some action. 

Hopefully Google will now make a fast fix before more top sites are harmed by this.  Thanks to those that made this information available so we’ll know what to look for if it happens to our clients.