Authorship, Small words and little tags that do good

How’s that for a confused, or at least confusing article title?

I posted a blog article last week about some DIY stuff which wasn’t particularly noteworthy. Truth be known, I just wanted to post something to see if I could test a fix for the authorship tags on the site.

Back when authorship was just a toddler in the Google suite of obscure and not so obscure tags I went with some advice from somewhere to put a link with ‘rel=author’ on every blog post page to my profile page and slap a link on the profile page to my Google+ profile and I’d be done.

That worked for about, well. I’m not entirely sure it did. For exact match entire passages and phrases from my posts I’d sometimes seen my face staring back at me from the search results, but mostly nothing changed.

At work however we have a blog contributor who is consistently showing up as his miniature self smiling beside search results for his posts even though none of the requisite link tags are in place.

We have no links to his Google+ profile anywhere on the site and the only part of the authorship puzzle that’s been met is the contributor entry on his Google plus page.

I’m not going to go into any detail about how to make authorship work, there are a lot of good articles around the web on how that can be done and Google’s own help pages are as good as any now that it’s well established.

After the page was indexed fully I ran a range of different test searches which told me that authorship was working along with confirming a bunch of other odds and sods that should be common knowledge if you’re in the online marketing game.

What I found interesting though is how subtle search phrase changes changed when authorship shows up in the results or when it doesn’t. Equally I discovered some small words that made differences as well when I normally wouldn’t expect it.

So, without further delay, a pile of search results screenshots with comments for each…

130825-01

First up we have a mixed up phrase from the blog post, and I’m top result. That’s mission 1 achieved, the page is indexed and we can move onto testing some other ideas out.

As a group of keywords ‘portable risks side note’ is not that stunning but you can see immediately how less than ethical SEO companies might convince a customer that a set of keywords are critical and get a rank for that combo under the guise of long-tail search. Followed quickly by the bill and a rapid exit to the nearest hills.

Long story which I can’t really post about, but I recently helped a friend with exactly that problem who’d paid handsomely for an SEO consultant to get their pages to rank well for a totally useless set of keywords.

This stuff is not rocket science but if you want to be top hit for ‘used car’ that is a whole other can of worms and requires a lot more effort as the content I’m using for these test searches is not really what happens in the real world.

An interesting thing to note about this search result is that the snippet of text is not the meta description for the page.

SEO tidbit #1 from this blog post: No matter how much time you spend crafting the description tag, it may not show up in the SERPs these days if the search terms don’t match the description.

Oh, and the authorship worked. Who’s that attractive looking chap beside the search result?

130825-02

I did a bit of messing about with combinations of keywords and found that this one still gave second place result but dropped my authorship. Again the search phrase itself is pretty meaningless but it highlights something about Authorship.

If Google doesn’t think who wrote the article is that important to the search results, you won’t get the extra credibility in the search results page. That means if you’re struggling with testing the markup, pay a bit more attention to what you see in Google’s structured data testing tool and what your content is about, rather than just trying to get your photo up on what you think the page should rank for.

Note that the snippet is different again. Still nothing from the description tag. Instead this time we have a mash-up from two paragraphs highlighting where the algorithm says the keywords were found within the body of the content.

130825-03

A simple change here. Removed ‘on’ and there are 70,000 or so more results found in the index, but it doesn’t change the top few results. The fact is that small words sometimes don’t matter, despite how much your English teacher might have insisted otherwise.

Clearly if you were prepared to click a few more pages into the results you’d see a difference though, so let’s try something different.

130825-04

Same words with the ‘on’ back in the mix with a different order and we’ve dropped a couple of hundred thousand potential results even though the top three results have not changed.

So, the order of small words does matter. It would seem that the combinations of ‘on side’, ‘on note’ and ‘note on side’ are probably more common in content than ‘on portable’.

I’m obviously mincing my words, almost literally, to make a point here.

When in the English language you write, order important it is. Unless you’re Yoda that is.

Google have long said that well crafted content is important, and phrasing that is common to your target audience is going to rank better than the best writer’s missive or the random words on a page that used to be common in the AltaVista days.

As a total aside, if you’re interested in SEO and don’t know what I mean by AltaVista days, you missed out on a golden age for SEO consultants that allowed people to do all sorts of things that would get them kicked from the index of even the slackest engine now. Ahhhh, those were the days.

130825-05

Another shuffle of keywords and the third result has vanished down to about position six although cbsnews and I are still batting pretty well for some obscure text.

‘Notes on’ in this case is what starts the page title tag and the first H1 on the page for the result that’s popped up to number three on the hit list.

That right there is old-school SEO advice. Have relevant title tags and heading structures with text people will search for. If your page is about tomatoes having the page title ‘Shoe leather replacements for tomatoes’ and the first H1 tag the same will probably get you more search traffic for shoe leather than it will tomatoes.
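If you want to check a page against that advice, something like the following rough Python sketch would do it. It only uses the standard library, the sample page is made up from the tomato example above, and the function name `headline_mentions` is my own invention rather than anything from a real SEO tool.

```python
from html.parser import HTMLParser


class TitleAndH1Parser(HTMLParser):
    """Collect the text of the <title> tag and the first <h1> on a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.first_h1 = ""
        self._current = None      # tag we are currently collecting text for
        self._h1_done = False     # only the first <h1> is of interest

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._current = "title"
        elif tag == "h1" and not self._h1_done:
            self._current = "h1"

    def handle_endtag(self, tag):
        if tag == self._current:
            if tag == "h1":
                self._h1_done = True
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.first_h1 += data


def headline_mentions(html, keyword):
    """Return (in_title, in_h1) for a case-insensitive keyword check."""
    parser = TitleAndH1Parser()
    parser.feed(html)
    key = keyword.lower()
    return key in parser.title.lower(), key in parser.first_h1.lower()


# Hypothetical page, using the tomato example from the text.
page = """<html><head><title>Shoe leather replacements for tomatoes</title></head>
<body><h1>Shoe leather replacements for tomatoes</h1><p>Growing tips.</p></body></html>"""

print(headline_mentions(page, "tomatoes"))
print(headline_mentions(page, "growing"))
```

A keyword that only shows up in the body text comes back as missing from both the title and the H1, which is the cue to rethink the heading.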

130825-06

One more shuffle of keywords and this time a more correctly constructed phrase from an English point of view and it’s got four of the five words in the same order as my post so the dashing fella on the left of the search makes a sudden re-appearance.

So even though this is not an exact match to the text, the algorithm calculates that the order makes better sense and, being more likely to be well structured content, the page deserves that little bit of extra attention the authorship gives.

cbsnews.com is still there but let’s face it… If my site had as much link juice as a major news site I’d have Google AdSense on here and be counting my sports cars parked in the garage of my French Riviera holiday home, not writing this for entertainment.

The osha.gov site appearing there is interesting, but again .gov sites have credibility oozing from their TLD so nothing surprises me when I see them showing up in search results.

130825-08

Now for a little image searching. ‘testing FT-857’ seems like a pretty good image search term if you’re into amateur radio and want to find out about the FT-857.

The image is result four which is a good slot and your SEO handbook will tell you the image names are all important for such things and the alt tags. Don’t forget the alt tags.

In this case the alt tag is indeed ‘Testing on the FT-857’ and searching for exactly that will bring the image up to the top hit, not the lowly number four slot.

What about that image name? It’s actually ‘130818-171341-0001.jpg’.

Correct and contextual naming of images is a good idea but don’t forget the auxiliary tags around images. The only place FT-857 appeared before this post on my entire website is in the alt and title tags for that image.

130825-09

Better than that, this search gets me top hit for a combination of keywords from the page and FT-857, which only appears in the alt tag for the image and the title tag for the link to the popup copy of the image.

If I’d bothered to name the image in a useful fashion I could probably rank for some useful phrases as well as that one. This is basic stuff but day in day out I see SEO advice about all sorts of other things. Getting the basics right on this is going to get me traffic for people testing FT-857 Radios with power pole connectors.

130825-10

One last screenshot to round out the observations for the evening. An image search for ‘gel FT-857’ showing a top hit for my photo. The word ‘gel’ is not in the alt tag for the image, but it is in the title attribute for the link to the popup.

If you hang plain English title attributes on links to images and content you can improve their positioning for key words and phrases in the linked content or, as in this case, pick up a ranking for a term that does not exist anywhere in the content apart from the tag.
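To see which words a page is only carrying in its image and link attributes, a small Python sketch along these lines could help. The alt text and image file name are the ones mentioned above; the link target and the wording of the title attribute are made up for illustration, as are the class and function names.

```python
from html.parser import HTMLParser


class ImageTextParser(HTMLParser):
    """Collect alt and title attribute text from <a> and <img> tags."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, attribute, value) tuples

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "img"):
            return
        for name, value in attrs:
            if name in ("alt", "title") and value:
                self.found.append((tag, name, value))


def attribute_text_mentions(html, term):
    """Return the (tag, attribute, value) entries whose text contains the term."""
    parser = ImageTextParser()
    parser.feed(html)
    return [entry for entry in parser.found if term.lower() in entry[2].lower()]


# Hypothetical markup: the alt text and file name come from the post,
# the link target and the title wording are placeholders.
snippet = """
<a href="/popup/130818-171341-0001.html" title="Dielectric gel on the FT-857 power lead">
  <img src="/images/130818-171341-0001.jpg" alt="Testing on the FT-857">
</a>
"""

for tag, attr, value in attribute_text_mentions(snippet, "FT-857"):
    print(tag, attr, value)
```

Searching the same snippet for ‘gel’ only matches the title attribute on the link, which mirrors what the last screenshot showed.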

By way of a disclaimer and for the sake of completeness: I did these searches from a New Zealand IP on www.google.co.nz, using Google Chrome in incognito mode to avoid search history slanting the results. Your results may vary if you’re in a different country or have substantial search history for similar terms or sites. Some of them were on my Ubuntu Desktop and the balance on a Windows 7 laptop, because I happen to be sitting in front of the telly pretending to watch something, so the fonts look slightly different in some of the screenshots.

(I did do a bit of testing from a US IP using google.com in incognito mode and got very similar results. Although the SERPs were slightly different, the observations would be the same. If you’re reading this more than a week after I wrote it the search results will probably have changed; the web is a dynamic place.)

Website Indexation on Google Part one

The web site at work has many issues, and one of the slightly vexing ones was that a site: search on Google only showed 540 odd of the 1100 pages in our site map. Google Webmaster Tools was showing 770 pages indexed, but that still left 330 or so pages missing in action.

I’m a realist and understand that Google will never index everything you offer up, but we also have the paid version of Google Site Search and it can’t find those pages either, which is a little more annoying as that means that visitors who are already on our site might not be able to find something.

The real problem with partial indexation is where to start. What is it that Google hasn’t indexed exactly? How do you get the all seeing Google to tell you which of the 1100 pages are included, or not, in organic search results?

I spent a few meaningless hours on the Google webmaster forums plus a few more even less meaningful hours scraping through various blog posts and SEO sites which led me to the conclusion that either I was searching for the wrong thing, or there was no good answer.

At the tail end of the process I posted a question on the Facebook page for the SEO101 podcast over at webmasterradio.fm, which incidentally I recommend as a great source of general SEO/SEM information.

After a bit of a delay for the US Labor Day holiday the podcast was out, and I listened with great interest in the car on the way to work. Lots of good suggestions on why a page might not be indexed, but no obvious gem to answer my original question. That being how to tell what is and what isn’t being indexed.

Luckily for my sanity Vanessa Fox came to the rescue in a back issue of ‘Office Hours’, another show on webmasterradio.fm. Not a direct solution to the problem, but an elegant way to narrow things down, by segmenting the sitemap.

One Site, many sitemaps

In a nutshell: chopping the site map up into a number of bits allows you to see where in the site you might have issues. With only 1100 pages I could probably have manually done a site: search for each URL in a shorter time than I wasted looking for a solution, but then I’d not have learnt anything along the way, would I?
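For anyone wanting to try the same trick, here is a minimal Python sketch of the chopping-up step. It assumes the site is organised into top level folders (news, popup and so on), which may not suit every site, and the URLs, file names and function names are all placeholders of mine.

```python
from collections import defaultdict
from urllib.parse import urlparse
from xml.sax.saxutils import escape


def segment_urls(urls):
    """Group URLs by their first path segment, e.g. /news/x.html -> 'news'."""
    groups = defaultdict(list)
    for url in urls:
        parts = [p for p in urlparse(url).path.split("/") if p]
        # Pages sitting directly under the root go into a 'root' bucket.
        section = parts[0] if len(parts) > 1 else "root"
        groups[section].append(url)
    return dict(groups)


def sitemap_xml(urls):
    """Build a minimal sitemap document for one group of URLs."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}</urlset>\n"
    )


# Hypothetical URLs standing in for the real site.
urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/news/2010-01.html",
    "http://www.example.com/news/2010-02.html",
    "http://www.example.com/popup/widget.html",
]

for section, group in sorted(segment_urls(urls).items()):
    filename = f"sitemap-{section}.xml"
    print(filename, len(group))
    # sitemap_xml(group) is what would be written to that file.
```

Each file then gets submitted on its own, and comparing the submitted and indexed counts per sitemap should point at the section of the site that is being skipped.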

So leading on from that, I thought I’d post this here on my site with one or two relevant keywords so that anyone else with the same question stands a chance of getting to the same point a little more quickly than I did!

As for the pages that were not indexed? A chunk of our news pages, which may be due to JavaScript based pagination of the archives, and a fair chunk of the popup pages which I’ve yet to fully investigate.

Onwards and upwards.

Google Location, the best of results, the worst of results

Google announced on their official blog a couple of days ago that location was the new black. Enhancing search results by allowing the surfer to rank results ‘nearby’, or pick another location by name.

This is just a continuation of the direction on-line technologies have been moving with social media leading the charge. Services like foursquare giving people their constant location fix. Twitter has even gone local allowing you to share your location in 140 character chunks.

Up until now the only real down side of this location hungry trend has been the exact same thing touted as the benefit of telling the world where you are. Namely that the world knows where you are. Privacy concerns are rife as the mobile social media crowd go about their daily lives in a virtual fish bowl.

pleaserobme.com highlights this by aggregating public location information from various social networks and figuring out if your house is empty. How long before insurance companies wise up and use Social media as a reason for not paying out on your house insurance? “But Mr Jones, you told the entire world you were away from your house, you encouraged the burglar.”

The last thing on earth I would want to do is share my location real time with the world, but I was keen to experience the Google location search to see how it actually affects search results.

The impact of location based search is going to be far more noticeable in the real world than the failed insurance claims of some iPod users.

The Google blog entry says that this is available to English google.com users, but we don’t have it here in New Zealand yet. We might have been first to see the new millennium, but not so much with Google changes.

To get my Google location fix I used a secure proxy based in the US and took in the view of the world from Colorado. Pretending to be within the 48 States is handy for all sorts of things.

I did some searches from a clean browser install on a fresh virtual machine, so that personal search preferences or history would not taint the results. I then set about testing some long-tail search phrases that give top 5 results consistently for our website at work.

No surprise that I got essentially the same results as I do here in New Zealand, but with more ads due to targeted adwords detecting that I was in the US of A. What was disturbing was that selecting ‘nearby’ knocked our search result down past the tenth page of Google.

We sell products to the whole world, and do not have a geographical target so the location search will clearly have an impact on our organic results as it rolls out. A business which is targeting a local area such as a coffee shop or Restaurant might well benefit from the location search, assuming that Google knows where your website is.

But there’s the rub. How did Google decide our website was not near Colorado? Our webserver lives in Dallas TX, our offices are in New Zealand and Thailand, and we regularly sell products to over thirty countries.

Which leads to the impact of location for web developers and the SEO community. How do you tell Google what your ‘Local’ is? I messed about with location names, and putting in ‘Christchurch’ where our business is based got our long tail hit back up to the front page, but only a fraction of our business comes from Christchurch, despite it being where our head office is.

I suppose anti-globalisation campaigners in their hemp shirts and sandals will be rejoicing at this news but I’m not so sure I’m going to be celebrating this development with the same enthusiasm.

A quick search for meta-tags or other methods of identifying your geographical target came up dry, and even if there was one we can only gently suggest to Google that it index and present things the way we as web site owners want.

When the dust has settled and the ‘Nearby’ link is clicked Google are the only ones who know what the best results are. It just might be that their best just became your worst if your business has a broad geographical target and weak organic placement.

So much for a Search Engine race

I’ve just finished watching a re-run of the BBC’s Top Gear. Richard Hammond took on an RAF Eurofighter in a Bugatti Veyron in one of their classically contrived races.

The Eurofighter came in first, but the Veyron wasn’t too far behind. It was a race of sorts, give or take. I only wish I could say the same for my attempt at search engine spider racing.

Google came in first by a country mile, with a complete indexing done in about 84 hours. We’re 10 days, a full 240 hours, into the race now and Yahoo has managed to get a grand sum total of one page indexed.

As for Bing. Well.

Bing is hanging out down at the start line with its eye candy interface, clinging onto some pages that have not existed on this domain for at least two years.

While it is possible that Microsoft have developed a time machine, I think it’s more likely that msnbot doesn’t know an http 404 response from a mouse pad. Combine that with an inability to honour robots.txt and I’m not sure the folks up in Seattle know for sure if they’re running a search engine or a cake stall.
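For the record, the two behaviours I’m grumbling about are not hard to get right. The Python sketch below shows roughly what a polite crawler is expected to do, using the standard library’s robots.txt parser. The robots.txt content and URLs are invented for the example, and treating 404 and 410 as ‘drop it’ signals is my simplification of how an index might behave, not a description of any real engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real file for the site is not shown here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that the given robots.txt permits this agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def should_drop_from_index(status_code):
    """Treat 404 (Not Found) and 410 (Gone) as a signal to remove a page."""
    return status_code in (404, 410)


candidates = [
    "http://www.example.com/blog.html",
    "http://www.example.com/private/notes.html",
]
print(allowed_urls(ROBOTS_TXT, "msnbot", candidates))
print(should_drop_from_index(404), should_drop_from_index(200))
```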

There has been a buzz in the blogosphere about real time search for a while, with Twitter leading the charge in delivering on the dream. Twitter of course has the advantage that all the content it needs is provided on its doorstep by hordes of twittering users.

Back in the world of conventional search engines the battle to gather content is fought by the spiders. Clever robots sneaking around the web on the constant lookout for new or changed stuff. Indexing, ranking, summarising. The unsung heroes in our digital world even.

No prizes for guessing how poor the real-time search ability of Bing is going to be if it takes longer than 10 days to index data that was handed to it on a platter, and 2 years to remove content that has been returning a 404 for that long.

My website is an internet backwater, I’m quite realistic about that little detail, but if Google pays attention to me, I’ll focus my SEO attempts on Google and ignore the other bit part players for the time being.

Bing and Yahoo slow off the mark

Well, in my humble opinion it’s been a very poor showing from Bing and Yahoo in my search engine race so far.  Google has now spidered, and indexed pretty much the whole site, but Bing and Yahoo have failed to fully index even the home page despite visiting the site a couple of times.

Yahoo is coming in runner-up as it has made a start on the process, with their site explorer showing the new <title> tag from the site. That’s a clear step up from Bing, which still shows URLs which have not functioned on the site for a number of years.

Searching for site:trash.co.nz on Google shows me 35 listings, which includes some of the old obscure stuff which has been given a new burst of life due to inbound links and the effect of having the 404 page responding with valid HTML as I described in my previous post.

Bing gives 7 results: one which is disallowed in robots.txt, the old home page entry, and five links which were removed from the site in 2004 when I sold my hosting business, although I believe there may have been valid pages on those URLs up until 2007, so we’ll give it the benefit of the doubt on that. Bing gives the same results for www.trash.co.nz and trash.co.nz, as does Google.

Yahoo takes you to the site explorer page when you search on site:trash.co.nz and the results speak for themselves. 3 URLs, all of them with old content, but if you specify www.trash.co.nz as the URL it does show the new <title>, so I think it’s going to come in second place, leaving Bing out in the search engine cold.

I was surprised that Yahoo hasn’t figured out that www.trash.co.nz and trash.co.nz are the same thing mind you, although that may well come with time as its databases update.

Almost 6 full days after submitting the sitemap to the big three, and it’s pretty apparent that Google’s spider and indexing process is far more effective than either of its cohorts.

Takeaways for today:

  • Submit cnames for your sites separately to the Yahoo spider, it treats them separately, or at least when partially indexed it does.
  • Don’t expect to see action in under a week from Bing or Yahoo when introducing a new site to the web. (Once it’s indexed that may be different, as it should monitor the sitemap, we’ll see!)

Houston, we have a winner in the search engine race

The race is in its final stretch now, with Google coming in the winner sometime over night, NZ time.  The new content of a few of the pages is up there, and searchable.

ref: The search engine race, Content vs Presentation

Google wins the indexing race

Not only that, but if I cherry pick some phrases from my blog posting from last night I’m hit number one and two, which reinforces some of the basic precepts of search engine optimisation. What’s also interesting is that the content snippet that Google presents under the title is different for a given page depending on what you searched for.

Hmmm, SEO theory #321 out the door.  The meta description is not always used by google to present your results.

See the screenshots below.

SERP 1 – Google

SERP 2 – Google

The screenshot on the left shows the search results for ‘google lips tightly sealed non-disclosure’. Top hit is trash.co.nz/blog.html with an extract from the blog posting from yesterday that had that text in it.  The second hit is a shortened version of the link I posted to Twitter, going to the actual blog posting.

The second hit is a direct one to the blog posting via my link-shrinker.  This hit shows the description meta-tag verbatim as common wisdom would suggest.  The link was posted to Twitter about 10 minutes after I posted that blog entry last night, so it got spidered, indexed and searchable in under 12 hours which tells us that Google definitely plays favourites.

So, come on down screenshot number two. Searching for ‘trash.co.nz blog’ gives me the two top hits again, but this time it’s given the meta description tags for both hits, even though the first one is the same as the first in screenshot one. Hit number three is my twittered link again. Nice.

The other interesting thing about this is the dates that appear at the left of the descriptions.  They are not in the meta tags, but boy-o-boy do they improve the effectiveness of the results presentation in Google.
Note that the date shown for http://trash.co.nz/blog.html is different for the two result sets.

I’m picking they came verbatim from the xml sitemap in the case of screenshot number 2, and in the case of screenshot number 1 google has done something clever and used the change date for the target of the link in the content.

Takeaways for today:

  • If you don’t already have a valid xml sitemap, what on earth are you doing reading this?  Get to it!
  • Meta description tags are all very well, but if your content is tag-soup you may still get crap results presentation.  Valid, clean HTML gave me two sets of clean results.
  • Googlebot plays favourites with twittered links, one would assume due to link popularity rules/formula that are secret squirrel stuff at Google HQ.
  • It takes about 84 hours for Google to spider, index and summarise new content.
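On the first takeaway, a sitemap doesn’t need to be fancy. The Python sketch below builds a minimal one with a last modified date per page, which is the field I’m guessing Google borrowed for the dates in screenshot 2. The URL is the blog page from this post, the date is a made-up placeholder, and `build_sitemap` is just a name I picked.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap string from (url, last_modified_date) pairs.

    Dates are expected as 'YYYY-MM-DD' strings, one of the W3C date
    formats the sitemap protocol accepts for <lastmod>.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# The URL is from the post; the date is a made-up placeholder.
pages = [("http://trash.co.nz/blog.html", "2009-10-12")]
print(build_sitemap(pages))
```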

In the next couple of days it’s going to be interesting to see how long the old home-page text persists in Google’s cache, and what we get as results from Bing and Yahoo as they bring up the rear.

Adding onto that, it’s going to be interesting to see what the refresh period for changes to the site is going to be now that the new sitemap is being used by Google. Let the SEO games begin!

Content presentation vs link ranking in google results

As part of the search engine race to index my new blog site that you’re reading now I’ve noticed some interesting behaviour from Google. (The race)

It’s pretty well known that Google calculates a page rank for every website it indexes, and that the page rank is a complex beast created from an aggregate score of a whole bunch of things. Link popularity, keywords, content, meta tags, the phase of the moon.

Millions of words have been written about the mystery box that is Google page rank, and all I can say definitively is that it exists, and SEO ‘experts’ can only guess at exactly how it works because the people at Google in the know are keeping their lips tightly sealed under non-disclosure contracts and the pain that only corporate lawyers can inflict.

My observation is about the presentation of content, and the calculation of page rank. It appears to be two separate processes, which leads me to assume that the Google monster keeps its data in at least two separate databases: one for the page rank, link popularity and URL information, and a second for the content, titles, and summaries information.

For that matter there is probably a third, which has the all important keywords index information with its magical mix of synonym and phonetic matching that makes Google far more useful than its competitors, at least in my humble opinion.

I’m making this all up on the basis of the changing search results for ‘trash.co.nz’ after I put this site online and submitted the new xml sitemap to google. See the before and after screen shots below.

Old results

Results before October 10th.

New results

Results after the 10th

The ‘after’ is around 60 hours after the before. So what made the extra results appear for ‘trash.co.nz’ when they did not appear two and a half days ago?

The two extra pages are linked from the forums site www.cnczone.com and get 4-5 hits a day from there. Before I put the blog online they had some holding pages saying that I’d moved the content to another one of my sites, www.ohmark.co.nz. The HTML was poorly formed, there was only one link on the page, no meta description tags, and little content of any sort.

Skip forward to now. Those links land on the new CMS and get redirected to the 404 page. The new page has properly structured HTML, meta description tag and multiple links. So, by my reasoning the page rank of the page increased, so the links became relevant enough to show in the results for trash.co.nz. Up until I changed the content that was not the case, as the quality of the content on the old pages was quite low.

So, why is the new content not showing in the results? I’ve got a valid, unique title, and an equally valid, if slightly silly, meta tag description.

That’s where database number two comes in. The quality and ranking of my newly improved pages was stored by the spider on its first visit while following the link from www.cnczone.com. That went into database number 1, the page rank database we’ll call it for want of a better term.

At some stage in the next day or so I imagine that database number two will be populated by another visit from Googlebot, where it will scrape the title tag and description and update the results in the page. This step will then probably populate database number three with the keywords, which will then recursively affect the page rank via link relevancy and the phase of the moon.

Also note that there is no ‘Cached’ link under the pages, I’m assuming that the second pass of google bot will enable this, and even though it had the description and title for a link the quality of the page was not high enough in the past to warrant caching a copy.

The takeaways from this are:

  • HTML quality does matter. If you’re involved in SEO work and didn’t know that you’ve probably chosen the wrong career.
  • HTML quality affects Google’s cache. It doesn’t cache junk pages.
  • Googlebot makes multiple passes to create an update to a page. In this case it got the pagerank / link quality rank up first, and has not got the content yet.

Now, let’s see who wins the race to index the site fully.