Working on a cloud

This blog is now coming to you from a cloud. A Rackspace cloud server, that is. Two of them, in fact: the front-end server running the CMS, and the back-end MySQL server.

The concept of cloud computing really isn’t all that new, but if you’re all at sea when it comes to clouds you might want to toodle over to Wikipedia and read about it there.

The service I’m using is probably better described as cloud provisioning, in that I’ve got two virtual servers living somewhere in the bowels of the Rackspace data centre. I don’t have to care about memory sizing, disk space, network infrastructure, or anything else for that matter; I’m just renting some resources out of the cloud.

I picked how much memory and disk space I wanted in a few clicks, and before the kettle had time to boil the server was online and ready for configuration. If this service had been available back when I was running a hosting business I’d probably still be running a hosting business, although I’d also be stark raving bonkers.
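
If you’d rather script it than click, the same sort of provisioning can be driven from code. The sketch below uses Apache Libcloud’s Rackspace driver; the credentials and server name are placeholders, and the exact driver options vary between Libcloud versions, so treat it as an outline rather than a recipe:

    # Rough sketch of scripted provisioning via Apache Libcloud's Rackspace
    # driver. Credentials, the server name and the size/image choice are
    # placeholders -- exact options depend on your Libcloud version.
    from libcloud.compute.types import Provider
    from libcloud.compute.providers import get_driver

    Driver = get_driver(Provider.RACKSPACE)
    conn = Driver('my-username', 'my-api-key')  # placeholder credentials

    sizes = conn.list_sizes()    # available RAM/disk combinations
    images = conn.list_images()  # available OS images

    # Pick the smallest size and the first image, then boot the node.
    size = min(sizes, key=lambda s: s.ram)
    image = images[0]
    node = conn.create_node(name='blog-web01', image=image, size=size)
    print(node.name, node.state)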

At this point I should say that I’m talking about virtual Linux servers here, not cloud hosting or full-service shared hosting. This is the pointy end of the geek scale, where crontabs are complex and the preferred editors have two-letter names.

I’ve moved the blog onto the fluffy stuff to get a feel for the service before I shift my work-in-progress link shrinker into the cloud as well. What I want to achieve with lngz.org is simply not possible on a shared platform, as I want to build a tiered application which can scale quickly.

The traditional way of achieving this goal would be to slap your gold card down on the counter of a hosting company and then proceed to the bank to arrange a second mortgage on your house. Virtualised ‘cloud’ server services such as Rackspace Cloud, Amazon EC2 or GoGrid let you do the same things for a fraction of the cost and with amazing flexibility.

Note: I’m not affiliated with Rackspace, I just think they provide a nifty service. 🙂

Bing checks in after 13 days. Dave Collins’ Blog

Finally, some action from Bing in my search engine race, just after I said I’d given up. Thirteen days is not a startling performance by any measure, and only the home page appears to be in the index so far, but it’s at least a good start.

Searching for phrases on the home page works, so it’s fully indexed, and the content it has indexed appears to be from yesterday. What’s more exciting is that the old, invalid URLs have now vanished from the site:trash.co.nz search, although the one disallowed in robots.txt is still there.
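
For what it’s worth, checking which crawlers a URL is blocked for is easy to do locally with Python’s standard-library robots.txt parser. A minimal sketch, where the page URL and the crawler names are just examples:

    # Quick local check of what robots.txt allows. The page URL below is a
    # made-up example, and the user-agent strings are only illustrative.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url('http://trash.co.nz/robots.txt')
    rp.read()

    for agent in ('msnbot', 'Slurp', 'Googlebot'):
        allowed = rp.can_fetch(agent, 'http://trash.co.nz/some-old-page.html')
        print(agent, 'allowed' if allowed else 'disallowed')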

While we’re on the subject of Bing… I read an interesting titbit on Dave Collins’ blog about Bing providing a Twitter search facility. It seems I might have spoken too soon when I poked fun at Bing for not being real time. A quick play with the beta Bing/Twitter search engine shows that Bing is at worst two minutes behind Twitter.

Dave’s blog can be found at http://blog.sharewarepromotions.com and has some good general ’net marketing information. You can follow him on Twitter at http://twitter.com/TheDaveCollins.

There’s a bit of information on the official Bing blog about their partnership with Twitter [here] if you’re into a longer read.

Bing and Yahoo slow off the mark

Well, in my humble opinion it’s been a very poor showing from Bing and Yahoo in my search engine race so far. Google has now spidered and indexed pretty much the whole site, but Bing and Yahoo have failed to fully index even the home page despite visiting the site a couple of times.

Yahoo is coming in as runner-up, as it has made a start on the process, with its site explorer showing the new <title> tag from the site. That’s a clear step up from Bing, which still shows URLs that have not functioned on the site for a number of years.

Searching for site:trash.co.nz on Google shows me 35 listings, which include some of the old, obscure stuff that has been given a new burst of life thanks to inbound links and the effect of having the 404 page respond with valid HTML, as I described in my previous post.

Bing gives seven results: one which is disallowed in robots.txt, the old home page entry, and five links which were removed from the site in 2004 when I sold my hosting business, although I believe there may have been valid pages on those URLs up until 2007, so we’ll give it the benefit of the doubt on that. Bing gives the same results for www.trash.co.nz and trash.co.nz, as does Google.

Yahoo takes you to the site explorer page when you search on site:trash.co.nz, and the results speak for themselves: three URLs, all of them with old content. If you specify www.trash.co.nz as the URL, though, it does show the new <title>, so I think Yahoo is going to come in second place, leaving Bing out in the search engine cold.

I was surprised that Yahoo hasn’t figured out that www.trash.co.nz and trash.co.nz are the same thing, mind you, although that may well come with time as its databases update.

Almost six full days after submitting the sitemap to the big three, it’s pretty apparent that Google’s spidering and indexing process is far more effective than that of either of its competitors.

Takeaways for today:

  • Submit CNAMEs for your sites separately to the Yahoo spider; it treats them as separate sites, or at least it does while a site is only partially indexed.
  • Don’t expect to see action in under a week from Bing or Yahoo when introducing a new site to the web. (Once it’s indexed that may be different, as it should monitor the sitemap, but we’ll see!)

Houston, we have a winner in the search engine race

The race is in its final stretch now, with Google coming in as the winner sometime overnight, NZ time. The new content of a few of the pages is up there and searchable.

ref: The search engine race, Content vs Presentation

[Screenshot: Google wins the indexing race]

Not only that, but if I cherry-pick some phrases from my blog posting from last night I’m hits number one and two, which reinforces some of the basic precepts of search engine optimisation. What’s also interesting is that the content snippet that Google presents under the title is different for a given page depending on what you searched for.

Hmmm, SEO theory #321 out the door. The meta description is not always used by Google to present your results.

See the screenshots below.

[Screenshot: SERP 1 – Google]

[Screenshot: SERP 2 – Google]

The first screenshot shows the search results for ‘google lips tightly sealed non-disclosure’. The top hit is trash.co.nz/blog.html, with an extract from yesterday’s blog posting that contained that text. The second hit is a shortened version of the link I posted to Twitter, going to the actual blog posting.

That second hit is a direct link to the blog posting via my link shrinker. This hit shows the description meta tag verbatim, as common wisdom would suggest. The link was posted to Twitter about 10 minutes after I posted that blog entry last night, so it was spidered, indexed and searchable in under 12 hours, which tells us that Google definitely plays favourites.

So, come on down, screenshot number two. Searching for ‘trash.co.nz blog’ gives me the same two top hits, but this time it has shown the meta description tags for both hits, even though the first one is the same page as the first in screenshot one. Hit number three is my twittered link again. Nice.

The other interesting thing about this is the dates that appear at the left of the descriptions.  They are not in the meta tags, but boy-o-boy do they improve the effectiveness of the results presentation in Google.
Note that the date shown for http://trash.co.nz/blog.html is different for the two result sets.

I’m picking they came verbatim from the XML sitemap in the case of screenshot number two, and that in the case of screenshot number one Google has done something clever and used the change date for the target of the link in the content.
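
If that’s right, it’s the <lastmod> element in the sitemap doing the heavy lifting. For anyone without a sitemap yet (see the takeaways below), a minimal generator is only a few lines; the URLs and dates here are placeholders, not my real sitemap:

    # Minimal XML sitemap generator -- a sketch only; the URLs and
    # lastmod dates below are placeholders rather than my real sitemap.
    pages = [
        ('http://trash.co.nz/', '2009-10-13'),
        ('http://trash.co.nz/blog.html', '2009-10-14'),
    ]

    entries = ''.join(
        '  <url>\n'
        '    <loc>{}</loc>\n'
        '    <lastmod>{}</lastmod>\n'
        '  </url>\n'.format(loc, lastmod)
        for loc, lastmod in pages
    )

    sitemap = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + entries +
        '</urlset>\n'
    )

    with open('sitemap.xml', 'w') as f:
        f.write(sitemap)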

Takeaways for today:

  • If you don’t already have a valid XML sitemap, what on earth are you doing reading this?  Get to it!
  • Meta description tags are all very well, but if your content is tag soup you may still get crap results presentation.  Valid, clean HTML gave me two sets of clean results.
  • Googlebot plays favourites with twittered links, one would assume due to link popularity rules/formulae that are secret squirrel stuff at Google HQ.
  • It takes about 84 hours for Google to spider, index and summarise new content.

In the next couple of days it’s going to be interesting to see how long the old home-page text persists in Google’s cache, and what results we get from Bing and Yahoo as they bring up the rear.

On top of that, it’s going to be interesting to see what the refresh period for changes to the site will be now that Google is using the new sitemap. Let the SEO games begin!

Content presentation vs link ranking in Google results

As part of the search engine race to index the new blog site you’re reading now, I’ve noticed some interesting behaviour from Google. (The race)

It’s pretty well known that Google calculates a page rank for every page it indexes, and that the page rank is a complex beast created from an aggregate score of a whole bunch of things: link popularity, keywords, content, meta tags, the phase of the moon.

Millions of words have been written about the mystery box that is Google page rank, and all I can say definitively is that it exists, and SEO ‘experts’ can only guess at exactly how it works because the people at Google in the know are keeping their lips tightly sealed under non-disclosure contracts and the pain that only corporate lawyers can inflict.
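
What is public is the original PageRank formula from Brin and Page’s paper, even if the production version has long since been buried under extra signals. Purely as an illustration, here’s a toy power-iteration version over a made-up three-page link graph; it bears no resemblance to what Google actually runs:

    # Toy power-iteration PageRank over a tiny, made-up link graph.
    # Illustrates the published formula only -- nothing like Google's
    # production ranking, which blends many other signals.
    links = {          # page -> pages it links to (hypothetical graph)
        'a': ['b', 'c'],
        'b': ['c'],
        'c': ['a'],
    }
    damping = 0.85
    n = len(links)
    rank = {page: 1.0 / n for page in links}

    for _ in range(50):
        new_rank = {}
        for page in links:
            # Sum the share of rank flowing in from every page linking here.
            incoming = sum(rank[src] / len(outs)
                           for src, outs in links.items() if page in outs)
            new_rank[page] = (1 - damping) / n + damping * incoming
        rank = new_rank

    print(rank)  # the three scores should sum to roughly 1.0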

My observation is about the presentation of content and the calculation of page rank. They appear to be two separate processes, which leads me to assume that the Google monster keeps its data in at least two separate databases: one for the page rank, link popularity and URL information, and a second for the content, titles and summaries.

For that matter there is probably a third, which holds the all-important keyword index with its magical mix of synonym and phonetic matching that makes Google far more useful than its competitors, or at least in my humble opinion.

I’m making this all up on the basis of the changing search results for ‘trash.co.nz’ after I put this site online and submitted the new XML sitemap to Google. See the before-and-after screenshots below.

[Screenshot: Results before October 10th]

[Screenshot: Results after October 10th]

The ‘after’ is around 60 hours after the ‘before’. So what made the extra results appear for ‘trash.co.nz’ when they did not appear two and a half days ago?

The two extra pages are linked from the forums site www.cnczone.com and get 4-5 hits a day from there. Before I put the blog online they had some holding pages saying that I’d moved the content to another one of my sites, www.ohmark.co.nz. The HTML was poorly formed, there was only one link on the page, no meta description tags, and little content of any sort.

Skip forward to now. Those links land on the new CMS and get redirected to the 404 page. The new page has properly structured HTML, a meta description tag and multiple links. So, by my reasoning, the page rank of those pages increased, and the links became relevant enough to show in the results for trash.co.nz. Up until I changed the content that was not the case, as the quality of the content on the old pages was quite low.
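
A quick way to see roughly what a spider sees on one of those old URLs is to fetch it and look at the status code, title and meta description. This is only a sketch using the Python standard library; the URL is a made-up example and a real crawler obviously does far more:

    # Fetch a page the way a very naive spider might, and report the HTTP
    # status, <title> and meta description. A sketch only -- the URL is a
    # made-up example and the error handling is minimal.
    import re
    import urllib.request
    import urllib.error

    url = 'http://trash.co.nz/some-old-page.html'  # hypothetical old URL
    try:
        with urllib.request.urlopen(url) as resp:
            status = resp.status
            html = resp.read().decode('utf-8', errors='replace')
    except urllib.error.HTTPError as e:
        status = e.code
        html = e.read().decode('utf-8', errors='replace')

    title = re.search(r'<title>(.*?)</title>', html, re.I | re.S)
    desc = re.search(
        r'<meta[^>]+name=["\']description["\'][^>]+content=["\'](.*?)["\']',
        html, re.I | re.S)

    print('status:', status)
    print('title:', title.group(1).strip() if title else None)
    print('description:', desc.group(1).strip() if desc else None)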

So, why is the new content not showing in the results? I’ve got a valid, unique title, and an equally valid, if slightly silly, meta description tag.

That’s where database number two comes in. The quality and ranking of my newly improved pages was stored by the spider on its first visit, while following the link from www.cnczone.com. That went into database number one, which we’ll call the page rank database for want of a better term.

At some stage in the next day or so I imagine that database number two will be populated by another visit from Googlebot, where it will scrape the title tag and description and update the results presentation. This step will then probably populate database number three with the keywords, which will then recursively affect the page rank via link relevancy and the phase of the moon.

Also note that there is no ‘Cached’ link under the pages. I’m assuming that the second pass of Googlebot will enable this, and that even though it had the description and title for a link, the quality of the page was not high enough in the past to warrant caching a copy.

The takeaways from this are:

  • HTML quality does matter. If you’re involved in SEO work and didn’t know that, you’ve probably chosen the wrong career.
  • HTML quality affects Google’s cache. It doesn’t cache junk pages.
  • Googlebot makes multiple passes to create an update to a page. In this case it got the page rank / link quality rank up first, and has not got the content yet.

Now, let’s see who wins the race to index the site fully.