It’s no secret that it is getting a bit more difficult to get webpages indexed by search engines, primarily Google. The internet continues to grow at an exponential rate, and Google themselves have admitted it is impossible to index it all, even if they wanted to.
I’ve been at this for over 15 years, and until the past 12-18 months I rarely saw anyone complain about indexing issues. Now it seems to be something a lot of site owners and SEOs are struggling with.
I have 3 things that I always keep an eye on that will help ensure your important pages are being indexed.
Before I dive into those, I want to get a few things out of the way that I think are fairly common sense when it comes to indexing. The thing about common sense, though, is that it is not all that common.
Avoid duplicate content – Although Google indexes duplicate content all the time, original content has a greater chance of being indexed. Google has limited resources. Are they going to keep indexing the umpteenth version of the same piece of content forever?
Check your robots.txt file – Make sure you are not unintentionally blocking bots. Be specific. Remember that a file like the one below matches more than you might expect.
User-agent: *
Disallow: /category
In addition to blocking your /category/ directory, it would also block URLs like /category-archive/ or /category-sale.html, along with anything else whose path simply starts with /category.
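If you want to sanity-check a rule before deploying it, Python’s standard-library robots.txt parser is a quick way to see what a directive actually blocks. This is just a rough sketch, and the URLs below are placeholders for illustration:

from urllib import robotparser

# The broad rule from above: no trailing slash.
broad = robotparser.RobotFileParser()
broad.parse([
    "User-agent: *",
    "Disallow: /category",
])

# A more specific rule: the trailing slash limits it to the directory.
specific = robotparser.RobotFileParser()
specific.parse([
    "User-agent: *",
    "Disallow: /category/",
])

# Placeholder URLs to show the difference.
urls = [
    "https://example.com/category/shoes/",
    "https://example.com/category-archive/",
    "https://example.com/category-sale.html",
]

for url in urls:
    print(url)
    print("  broad rule allows crawl?   ", broad.can_fetch("*", url))
    print("  specific rule allows crawl?", specific.can_fetch("*", url))

The broad rule blocks all three URLs; the specific rule only blocks the one that actually lives inside /category/.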
Check for noindex tags on pages – I cannot count how many times I have seen this mistake. The most common cases I see are in WordPress, where someone accidentally checks an option to noindex a page in whatever SEO plugin they are using, or during a website redesign, where the developer had noindex tags on the staging site and forgot to remove them when the site was pushed live.
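A quick way to catch this is to fetch a few key URLs and look for a noindex directive, either in a robots meta tag or in an X-Robots-Tag response header. Here is a rough, stdlib-only sketch; the URL is a placeholder and a proper HTML parser would be more reliable than this simple check:

import re
import urllib.request

def has_noindex(url):
    # Fetch the page and inspect both the response headers and the HTML.
    with urllib.request.urlopen(url) as resp:
        headers = resp.headers
        html = resp.read().decode("utf-8", errors="replace")

    # The X-Robots-Tag header can also carry a noindex directive.
    header_noindex = "noindex" in headers.get("X-Robots-Tag", "").lower()

    # Rough check: any meta tag that mentions both "robots" and "noindex".
    meta_tags = re.findall(r"<meta[^>]*>", html, re.IGNORECASE)
    meta_noindex = any(
        "robots" in tag.lower() and "noindex" in tag.lower()
        for tag in meta_tags
    )
    return header_noindex or meta_noindex

# Placeholder URL for illustration.
print(has_noindex("https://example.com/"))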
Okay, now let’s dive into my 3 tips.
1) Do not noindex category pages that are a part of your site structure.
This gets discussed a LOT in SEO forums: should you or should you not noindex category pages? I did a note last year on that question, and there is also a video version of it.
The TL;DR version is there is no good reason to add a noindex tag to them, and if they are part of your site structure, you can actually do some damage to your site by adding a noindex tag. By being a part of your site structure, I mean that there are links on your site that point to the category pages. A visitor, or search engine spider, could reach them by clicking a link from somewhere on your site.
Google will crawl these pages less often, if at all. As you publish new content in those categories, those posts will be less likely to be reached by search spiders.
2) Internal Links
It is important to have a well-constructed internal link structure. Not only does it help with rankings, but it also makes your site easier for search spiders to consume. Google determines the importance of any page on the internet based on links pointing to that page.
If there are only 1 or 2 internal links pointing to a page, what do you think that tells Google? If your own site doesn’t think a page is important, why should Google think it is?
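If you want a rough picture of this on your own site, a small crawl that counts internal links pointing at each page is enough to spot under-linked or orphaned pages. A minimal stdlib-only sketch, assuming a small site and using example.com as a placeholder start URL (be polite with crawl rate on anything bigger):

from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urldefrag
from urllib.request import urlopen

START = "https://example.com/"  # placeholder: your home page
HOST = urlparse(START).netloc

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def get_internal_links(url):
    # Fetch a page and return the internal URLs it links to.
    try:
        with urlopen(url) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except Exception:
        return []
    parser = LinkExtractor()
    parser.feed(html)
    internal = []
    for href in parser.links:
        absolute, _ = urldefrag(urljoin(url, href))
        if urlparse(absolute).netloc == HOST:
            internal.append(absolute)
    return internal

inlinks = Counter()  # page -> rough count of internal links pointing to it
seen, queue = {START}, deque([START])
while queue and len(seen) < 500:  # cap the crawl for safety
    page = queue.popleft()
    for link in get_internal_links(page):
        inlinks[link] += 1
        if link not in seen:
            seen.add(link)
            queue.append(link)

# Pages with the fewest internal links pointing at them.
for url, count in sorted(inlinks.items(), key=lambda kv: kv[1])[:20]:
    print(count, url)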
3) Crawl Depth
Lastly is crawl depth: the number of links a visitor (or crawler) would have to click from the home page to reach a page. A good rule of thumb is to make sure all pages are reachable within 3 clicks. For really large sites that is nearly impossible, but it is still a good thing to strive for.
If a page takes 9 clicks to reach from your home page, what signal is that sending to search engines about its importance?
If you have ever used a tool like JetOctopus or ContentKing to analyze your server logs, you can see a clear correlation between crawl depth, crawl rate, and indexation. Pages further from the home page are crawled less frequently and indexed less often.
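Crawl depth is also easy to compute yourself if you already have a map of which pages link to which, for example from the crawl sketch above or from a crawler export. Here is a minimal breadth-first sketch, with a small hypothetical link graph standing in for real data:

from collections import deque

# Hypothetical link graph: page -> pages it links to.
# In practice you would build this from a crawl or a crawler export.
links = {
    "/": ["/blog/", "/about/"],
    "/blog/": ["/blog/post-1/", "/blog/post-2/"],
    "/blog/post-1/": ["/blog/post-2/"],
    "/blog/post-2/": ["/deep-page/"],
    "/about/": [],
    "/deep-page/": [],
}

def crawl_depths(graph, home="/"):
    # Breadth-first search from the home page: depth = clicks from home.
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

for page, depth in sorted(crawl_depths(links).items(), key=lambda kv: kv[1]):
    print(depth, page)

Anything that comes back with a depth well past 3 is a good candidate for more internal links closer to the home page.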
Make your content easily accessible.