The web site at work has many issues, and one of the slightly vexing ones was that a site: search on Google only showed 540-odd of the 1100 pages in our sitemap. Google Webmaster Tools was showing 770 pages indexed, but that still left over 300 pages missing in action.
I’m a realist and understand that Google will never index everything you offer up, but we also have the paid version of Google Site Search and it can’t find those pages either. That is a little more annoying, as it means that visitors who are already on our site might not be able to find something.
The real problem with partial indexation is where to start. What is it that Google hasn’t indexed exactly? How do you get the all-seeing Google to tell you which of the 1100 pages are included, or not, in organic search results?
I spent a few meaningless hours on the Google Webmaster forums, plus a few more even less meaningful hours trawling through various blog posts and SEO sites, which led me to the conclusion that either I was searching for the wrong thing, or there was no good answer.
At the tail end of the process I posted a question on the Facebook page for the SEO101 podcast over at webmasterradio.fm, which incidentally I recommend as a great source of general SEO/SEM information.
After a bit of a delay for the US Labor Day holiday the podcast was out, and I listened with great interest in the car on the way to work. There were lots of good suggestions on why a page might not be indexed, but no obvious gem to answer my original question: how to tell what is and what isn’t being indexed.
Luckily for my sanity, Vanessa Fox came to the rescue in a back issue of ‘office hours’, another show on webmasterradio.fm. It was not a direct solution to the problem, but an elegant way to narrow things down: segmenting the sitemap.
In a nutshell: chopping the sitemap up into a number of smaller pieces allows you to see where in the site you might have issues. With only 1100 pages I could probably have manually done a site: search for each URL in a shorter time than I wasted looking for a solution, but then I’d not have learnt anything along the way, would I?
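For anyone who wants to try the segmenting idea, here is a rough sketch of how the chopping might be done in Python. It is my own illustration rather than anything from the show: the function names `segment_urls` and `build_sitemap` are made up, the example.com URLs are placeholders, and grouping by the first folder in the path is just one assumed way to slice a site. Each resulting file would then be submitted as its own sitemap so that the indexed count can be compared section by section.

```python
from collections import defaultdict
from urllib.parse import urlparse
from xml.sax.saxutils import escape


def segment_urls(urls):
    """Group URLs by their first folder (hypothetical helper).

    Pages that sit directly under the domain, such as /about.html,
    are collected in a group called "root".
    """
    groups = defaultdict(list)
    for url in urls:
        parts = urlparse(url).path.lstrip("/").split("/")
        # More than one part means the URL lives inside a folder.
        key = parts[0] if len(parts) > 1 else "root"
        groups[key].append(url)
    return dict(groups)


def build_sitemap(urls):
    """Return a minimal sitemap XML document for the given URLs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    # escape() turns characters such as & into XML-safe entities.
    lines += [f"  <url><loc>{escape(u)}</loc></url>" for u in urls]
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder URLs; in practice these would come from the full sitemap.
    example_urls = [
        "https://example.com/",
        "https://example.com/about.html",
        "https://example.com/products/widget.html",
        "https://example.com/news/2010/launch.html",
    ]
    for name, group in segment_urls(example_urls).items():
        print(f"sitemap-{name}.xml would contain {len(group)} URLs")
```

A segment whose indexed count falls well short of its submitted count is the place to start digging; if that segment is still large, it can be split again in the same way.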
So leading on from that, I thought I’d post this here on my site with one or two relevant keywords so that anyone else with the same question stands a chance of getting to the same point a little more quickly than I did!
Onwards and upwards.