When search engines begin to analyze a website, the first thing they look at is its URL structure. If what they see is confusing, they may end up indexing only a fraction of the site's pages, causing some of its content to go undiscovered in search.
How can one prevent this from happening? Basically, you'd need to:
- Let search engines know which of your site's pages should be crawled first;
- If some of your pages can be accessed via multiple URLs, create a map of your site's structure to tell search engines which pages correspond to which URLs.
Additional challenges arise when the SEO specialist is called in too late during a site's development process. In my experience, SEOs and web developers have differing perspectives on what an SEO-friendly URL is.
For example, from the web developer's point of view, the URL http://www.example.com/forum/viewtopic/t-121638.html is completely normal, while from an SEO's perspective, a version such as http://www.forum.example.com/section/topic-name.html would be much more meaningful.
This is just a small example, but there are many more SEO nuances web developers tend to overlook. Hence, it's better if the SEO gets on board early in the project.
Common URL structure problems
There are a number of typical URL-related issues found on many websites.
1. Gibberish URLs
By default, some content management systems produce URLs like this:
Such URLs do not tell much about the page, look unsightly in search results and are hard to recall from memory. A much more user- and search-engine-friendly URL would be:
Looking at it, one can tell it leads to a page about a Casio pocket calculator. If a human can work this out, search engines can, too.
So, if your CMS produces meaningless default URLs, rewrite them in such a way that they make sense to users and search engines. If you don't know how to do it, here is a great URL rewriting guide for beginners.
Also consider using punctuation, such as hyphens, to separate words in your URLs. According to Google, www.example.com/green-dress.html is more useful than www.example.com/greendress.html.
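To illustrate the idea of readable, hyphen-separated URLs, here is a minimal Python sketch that turns a page title into a URL slug. The function name `slugify` and the sample titles are assumptions made for illustration; a real CMS would normally have its own rewriting mechanism.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a human-readable title into a lowercase, hyphen-separated slug."""
    # Fold accented characters to their closest ASCII equivalents.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    # Drop hyphens left over at either end.
    return slug.strip("-")


print(slugify("Casio Pocket Calculator"))  # casio-pocket-calculator
print(slugify("Green Dress"))              # green-dress
```

The slug would then be used as the last segment of the page URL, for example www.example.com/green-dress.html.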
2. Multiple URLs pointing to homepage
Quite often, a website would have several URLs leading to its homepage:
Although, in most cases, search engines can tell these are variants of your homepage, it's best to consolidate the SEO value of these pages anyway, because some people could link to the WWW version, while some might link to the non-WWW one.
1. You can redirect the WWW version to the non-WWW one, or vice versa, with a 301 redirect.
2. Alternatively, you can pick one version and set it as the canonical (preferred) URL for your homepage. I’ll talk about what canonical URLs are, and how to specify them, later.
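As a rough sketch of the first option, the Python function below works out which single URL a homepage variant should be 301-redirected to. The function name, the `use_www` switch and the list of index file names are assumptions for illustration; the redirect itself would still be configured on your web server.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default-document names; adjust to what your server actually serves.
INDEX_FILES = {"/index.html", "/index.htm", "/index.php"}


def preferred_homepage_url(url: str, use_www: bool = True) -> str:
    """Map a WWW/non-WWW or index-file variant to the one preferred URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Strip an existing "www." so both variants start from the same bare host.
    bare_host = host[4:] if host.startswith("www.") else host
    netloc = ("www." + bare_host) if use_www else bare_host
    if parts.port:
        netloc += f":{parts.port}"
    # Treat an empty path or a known index file as the site root.
    path = parts.path
    if path == "" or path.lower() in INDEX_FILES:
        path = "/"
    return urlunsplit((parts.scheme, netloc, path, parts.query, ""))


for variant in (
    "http://example.com",
    "http://www.example.com/index.html",
    "http://WWW.Example.com/",
):
    print(variant, "->", preferred_homepage_url(variant))
```

With the default settings, all three variants map to http://www.example.com/, which is the URL you would use as the redirect target or as the canonical URL.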
3. Duplicate URLs caused by sorting options
Most online stores let users slice and dice information in different ways. For example, one can search for merchandise by type, by brand, by price, etc. What this creates is large volumes of duplicate URLs, many of which point to one and the same page.
For instance, if you go to the JCPenney website and choose Women -> Shop Brands -> Levi's -> Levi's Field Jacket, the path you take will be recorded in the URL:
However, this URL points to essentially the same content as its "cleaner" version:
So, how do you avoid getting search engines confused, and let them know these are the URLs for the same webpage?
1. You can set a canonical (preferred) URL for each group of duplicate URLs that appear because the visitor's navigation path gets recorded in the URL.
2. Another option that’s more time-consuming, but saves search engines resources, is to close off unwanted URLs with the robots.txt file.
3. At the same time, blocking certain URLs completely may lead to also sealing off the link juice that could be flowing through them. So, another alternative is to create "Noindex" Robots meta tags (but leave them Dofollow) for the pages you don’t want to be indexed.
Please, find instructions on how to create Robots meta tags here.
4. Duplicate URLs caused by tracking parameters
If your site uses tracking parameters such as session IDs or UTM parameters, the number of duplicate URLs can quickly get out of hand.
1. You can tell search engines to simply ignore certain parameters in Webmaster tools.
In Google Webmaster Tools, go to Crawl -> URL Parameters -> Add Parameter
In Bing Webmaster Tools, go to Configure My Site -> Ignore URL Parameters
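If you want to see how many of your URLs differ only by tracking parameters (for example, when auditing a crawl report), a small script can strip them before comparing. This is only a sketch: the parameter names in `TRACKING_PARAMS` and the `utm_` prefix are assumptions, so list the ones your own site really uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameter names; replace with the ones used on your site.
TRACKING_PARAMS = {"sessionid", "sid", "phpsessid"}
TRACKING_PREFIXES = ("utm_",)


def strip_tracking_params(url: str) -> str:
    """Return the URL with tracking parameters removed from its query string."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in TRACKING_PARAMS
        and not name.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_tracking_params(
    "http://www.example.com/dress.html?color=green&utm_source=news&sessionid=abc123"
))
# http://www.example.com/dress.html?color=green
```

URLs that become identical after stripping are the duplicates you would want search engines to ignore.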
5. Problems caused by use of relative URLs
What are relative URLs? They are URLs bound to the context of the page they’re located on.
For example, if I wanted to link from webmeup.com to our pricing plans page, I could use either webmeup.com/plans.html (an absolute URL) or /plans.html (a relative URL) for this purpose.
The problem with relative URLs is that, when specified incorrectly, they can create infinite loops that would trap search engines in a sort of hamster wheel. Eventually, the bot would give up, but this is not in your best interests.
Webmasters are advised to use absolute URLs over relative ones these days. This may take a bit more of your web developer's time, but will let you avoid the problems relative URLs often create.
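The sketch below shows, with Python's standard `urljoin`, how a crawler resolves relative links and how a link that lacks its leading slash can keep producing new, ever-deeper URLs. The http:// scheme, the /blog/ path and the helper names are assumptions for illustration, and the loop only occurs if the server answers every nested path with the same page.

```python
from typing import List
from urllib.parse import urljoin


def to_absolute(page_url: str, href: str) -> str:
    """Resolve a link found on page_url the way a crawler would."""
    return urljoin(page_url, href)


def simulate_loop(start_url: str, href: str, hops: int) -> List[str]:
    """Follow the same relative href repeatedly from each resulting page."""
    urls = [start_url]
    for _ in range(hops):
        urls.append(urljoin(urls[-1], href))
    return urls


# A root-relative link always resolves to the same place.
print(to_absolute("http://webmeup.com/blog/post.html", "/plans.html"))
# http://webmeup.com/plans.html

# A link missing its leading slash is resolved against the current folder,
# so every hop adds another "blog/" segment.
for url in simulate_loop("http://webmeup.com/blog/post.html", "blog/post.html", 2):
    print(url)
# http://webmeup.com/blog/post.html
# http://webmeup.com/blog/blog/post.html
# http://webmeup.com/blog/blog/blog/post.html
```

Writing the link as an absolute URL removes the dependence on the page it appears on.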
SEO-friendly URL structure best practices
Even if you think you don't have any of the common URL structure problems listed above, here are some site structure best practices webmasters should follow to help search engines better understand their sites.
Sign up for Webmaster Tools
For example, if you see duplicate meta descriptions in Google Webmaster Tools, this likely means there are webpages on your site that can be accessed through multiple URLs.
To see if that's the case, go to Search Appearance -> HTML Improvements.
In Bing Webmaster Tools, these may appear under SEO Reports.
Create a robots.txt file
When a search bot comes to your site, it first checks whether you have left a robots.txt file for it at example.com/robots.txt.
It's a text file that lists which pages or sections of your site the crawler should NOT visit. Robots.txt is often used to block pages with sensitive information or duplicate pages that can be reached via multiple URLs.
Instructions in this file usually look like this:
The first 2 lines mean that all robots must not access the /search/ section of the site, and the next 2 lines mean that no crawler may access the /wp-admin/ segment of the site.
As you can see, you can block not only individual pages, but also entire sections of your site (or even the whole site).
One can create a robots.txt file by hand (see instructions) or with the help of a robots.txt generator.
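Before uploading a robots.txt file, you can check what it actually blocks with Python's built-in `urllib.robotparser`. The sketch below assumes rules equivalent to the ones described above (all robots, /search/ and /wp-admin/ blocked), written as a single group; the exact file layout, the bot name `ExampleBot` and the test URLs are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content matching the rules described in the text.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /wp-admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in (
    "http://example.com/search/results.html",
    "http://example.com/wp-admin/",
    "http://example.com/about.html",
):
    print(url, "->", is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

The first two URLs should come back as blocked and the last one as allowed, which is a quick way to confirm you haven't sealed off pages you want indexed.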
Submit an XML Sitemap
An XML Sitemap is a list of the pages of your site that should be crawled and indexed by the search engines. An XML sitemap is different from the sitemap one creates for human visitors. Google recommends creating separate sitemaps for machines and people.
An XML sitemap serves 2 main purposes:
- It tells search engines which pages of your site are most important
- It helps search engines sort out URL duplicates that may exist on the site
Google has published XML Sitemap guidelines one should follow when preparing their sitemap for search engines. When it's ready, you should upload the sitemap to your site, and either link to it from your robots.txt or submit it via Webmaster Tools, or both.
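For a small site, an XML sitemap can be generated with a few lines of Python. This is a minimal sketch that follows the sitemaps.org protocol (`urlset`, `url`, `loc` and the optional `priority` element); the function name, the page list and the priority values are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages) -> str:
    """Build sitemap XML from an iterable of (url, priority) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # Priority is a hint between 0.0 and 1.0, relative to your other pages.
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages: list only the preferred URL of each page.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", 1.0),
    ("http://www.example.com/green-dress.html", 0.8),
])
print(sitemap_xml)
```

Listing only the preferred version of each URL is what helps search engines sort out duplicates. The output would be saved as a file such as sitemap.xml (the file name is an assumption) and referenced from robots.txt or submitted via Webmaster Tools.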
Make use of canonical tags
Remember we spoke about the situation when there are many URLs pointing to one piece of content? That’s when canonical tags come in handy: they signify that a group of URLs can all be attributed to one standardized version of a URL, aka the "canonical" one.
How do you specify that? You need to add a <link rel="canonical"> element to each duplicate page in the group. Here is the actual piece of code (just replace the marked URL with that of your canonical page).
<link rel="canonical" href="http://www.example.com/product.php?item=green-balloon"/>
This element goes into the <head> section of the page.
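To verify that your duplicate pages really carry the canonical element, you could run a quick check like the Python sketch below, which uses the standard `html.parser` module to pull the canonical URL out of a page's HTML. The class and function names are assumptions for illustration, and the sample HTML is a stripped-down page, not a real one.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # rel may hold several space-separated values, so split before checking.
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = (
    '<html><head><link rel="canonical" '
    'href="http://www.example.com/product.php?item=green-balloon"/>'
    "</head><body></body></html>"
)
print(find_canonical(page))
# http://www.example.com/product.php?item=green-balloon
```

Every URL in a duplicate group should report the same canonical URL; a page that returns None is missing the element.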
Making a site's URL structure SEO-friendly isn't hard. One only needs to follow SEO best practices and join site development (or re-design) early on. Plus, do make sure there are no URL issues reported in Webmaster Tools and then you may rest assured that your website is indexed in full.