{"id":239,"date":"2011-12-19T21:26:20","date_gmt":"2011-12-20T03:26:20","guid":{"rendered":"http:\/\/www.irishwonder.com\/blog\/?p=239"},"modified":"2011-12-19T21:36:39","modified_gmt":"2011-12-20T03:36:39","slug":"twitter-big-sites-fail-big","status":"publish","type":"post","link":"http:\/\/www.irishwonder.com\/blog\/2011\/12\/19\/twitter-big-sites-fail-big\/","title":{"rendered":"Twitter: Big Sites Fail Big"},"content":{"rendered":"<p>No, I am not talking about the infamous Fail Whale. And moreover, the big news is that it looks like the biggest fail of the last 5 years has just been fixed &#8211; but not due to Twitter&#8217;s efforts.<\/p>\n<p>Twitter is huge, this is not news to anyone. A site:twitter.com search in Google returns 1,750,000,000 results. Yup, that&#8217;s close to 2 BILLION. Yet, most of those results are actually non-existent pages.<\/p>\n<p>Yes, you heard me right. Google keeps in its index close to 2 billion non-existent pages from one domain alone. How come? Let&#8217;s look at the typical URL structure of a Twitter user profile:<\/p>\n<blockquote><p>http:\/\/twitter.com\/#!\/username<\/p><\/blockquote>\n<p>Now, what kind of URLs do we see in the aforementioned SERPs for the site:twitter.com query in Google? Something like:<\/p>\n<blockquote><p>http:\/\/twitter.com\/username<\/p><\/blockquote>\n<p>Notice the difference? The &#8220;\/#!&#8221; part is missing. In fact, it is not even possible to tell whether Google has even a single URL with the &#8220;\/#!&#8221; bit indexed: Google ignores these symbols, so searching for site:twitter.com inurl:\/#!\/ returns the same results as site:twitter.com.<\/p>\n<p>Where did the whole issue come from? 
Some of you may remember that the new URL structure for user profiles was introduced <a href=\"http:\/\/blog.twitter.com\/2010\/10\/100.html\">over a year ago<\/a> &#8211; for some time afterwards, it was still possible to switch back to the old (less-Ajaxy) interface preserved under the old URL. Then, the old interface was killed and all old URLs were redirected to the new ones.<\/p>\n<p>Only, Twitter has never got the redirects right. They use a 302 (temporary) redirect instead of a 301 (permanent) one! Here is a <a href=\"http:\/\/johnmu.com\/twitter-indexing-peculiarities\/\">2007 blog post by Google&#8217;s John Mueller<\/a> detailing what Twitter&#8217;s got wrong and how it should be fixed. Did they ever fix their redirects? &#8211; No! Do they think they are too good for SEO? Heck, even <a href=\"http:\/\/www.topherkohan.com\/topher-kohan-bio-and-headshot\/\">CNN has an SEO<\/a>, and did you ever think CNN should care much about search engines?<\/p>\n<p>Until recently, this profile URL redirect issue caused serious trouble with the cached versions of the corresponding pages in Google &#8211; all of them appeared as &#8220;that page does not exist&#8221;:<img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" title=\"Twitter 302 Redirects Cached\" src=\"http:\/\/www.irishwonder.com\/twitter-302-cache.png\" alt=\"Twitter 302 Redirects Cache Screenshot\" width=\"554\" height=\"141\" \/><\/p>\n<p>Lately, however, Google got better at indexing and caching their 302 redirects, so the cache screenshots look better. But that was due to Google&#8217;s own action only, not Twitter&#8217;s. Are the folks at Twitter THAT blind and deaf?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>No, I am not talking about the infamous Fail Whale. And moreover, the big news is that it looks like the biggest fail of the last 5 years has just been fixed &#8211; but not due to Twitter&#8217;s efforts. Twitter is huge, this is not news to anyone. 
A site:twitter.com search in Google returns 1,750,000,000 [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-239","post","type-post","status-publish","format-standard","hentry"],"_links":{"self":[{"href":"http:\/\/www.irishwonder.com\/blog\/wp-json\/wp\/v2\/posts\/239","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.irishwonder.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.irishwonder.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.irishwonder.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.irishwonder.com\/blog\/wp-json\/wp\/v2\/comments?post=239"}],"version-history":[{"count":4,"href":"http:\/\/www.irishwonder.com\/blog\/wp-json\/wp\/v2\/posts\/239\/revisions"}],"predecessor-version":[{"id":242,"href":"http:\/\/www.irishwonder.com\/blog\/wp-json\/wp\/v2\/posts\/239\/revisions\/242"}],"wp:attachment":[{"href":"http:\/\/www.irishwonder.com\/blog\/wp-json\/wp\/v2\/media?parent=239"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.irishwonder.com\/blog\/wp-json\/wp\/v2\/categories?post=239"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.irishwonder.com\/blog\/wp-json\/wp\/v2\/tags?post=239"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}