In this blog post, I cover SEO best practices for multilingual and multi-country Sitecore solutions. My colleague at Verndale, Kevin Schofield, contributed much of the knowledge here, identifying the steps to follow for the best SEO experience when you are executing a multilingual, multi-country strategy.
I touched upon this briefly in my previous post, but you should seriously consider a country-code top-level domain (ccTLD) for each country, if you either already own those domains or they are available.
A ccTLD provides a strong signal to both users and search engines that your site and its content are intended for and targeted to a certain country (and not exclusively targeting a specific language).
A ccTLD strategy is ideal if your plans call for a specific presence and messaging within a region or country. Some countries restrict their ccTLDs to organizations that have a physical presence there and that go through an application process. In many cases this can be expensive, as you must apply for, purchase, and maintain each ccTLD as if it were a different website. Additionally, ccTLDs have no effect on the domain authority of the primary website (i.e. yoursite.com) because search engines see them as unique websites. In essence, a ccTLD strategy necessitates developing content and optimization practices for each country version of the website, and it could require different infrastructure (if servers or data need to reside in the country itself).
If you select the ccTLD strategy, remember that EACH domain will need its own domain-specific search resources, such as XML sitemaps and a robots.txt file. Google will not be able to tell that these sites are managed centrally, so it will consider them standalone domains and treat them as such.
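To make the "one robots.txt per domain" point concrete, here is a minimal Python sketch that renders separate robots.txt content for each ccTLD, each pointing at that domain's own sitemap. The function name `build_robots_txt`, the `/sitemap.xml` location, and the `/sitecore/` disallow rule are assumptions for illustration, not Sitecore features or Verndale's implementation.

```python
from typing import Iterable, Optional


def build_robots_txt(domain: str, disallow: Optional[Iterable[str]] = None) -> str:
    """Build robots.txt content for one domain, referencing that domain's own sitemap."""
    paths = list(disallow or [])
    lines = ["User-agent: *"]
    if paths:
        lines.extend(f"Disallow: {p}" for p in paths)
    else:
        lines.append("Disallow:")  # empty value: nothing is blocked
    lines.append("")
    # Assumed sitemap location; adjust to wherever each site really serves it.
    lines.append(f"Sitemap: http://{domain}/sitemap.xml")
    return "\n".join(lines) + "\n"


# Hypothetical ccTLD domains taken from this post's examples.
for domain in ["www.yoursite.de", "www.yoursite.mx"]:
    print(f"--- {domain}/robots.txt ---")
    print(build_robots_txt(domain, disallow=["/sitecore/"]))
```

The key detail is that the `Sitemap:` line always names the same host that serves the robots.txt file, so each domain stands on its own from a crawler's point of view.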
The good news is that each domain can be set up as its own configuration entry in the Site Definition config file, and if you leverage the sitemap XML manager from Sitecore, it will automatically generate each country-specific XML file with no extra development effort. At Verndale, we also manage the robots.txt file virtually and maintain its content for each language separately, so we can adhere to these standards with no extra effort.
Instead of using ccTLDs, you could use the same domain with the language code embedded directly after the domain, for example yoursite.com/en/page for English and yoursite.com/es/page for Spanish.
If the multilingual content is NOT targeted at specific countries (e.g. Spanish content is for anyone who speaks Spanish, not tailored to a specific Spanish-speaking country), the content SHOULD be kept on a single, non-ccTLD domain, with the language content divided up using the language code embedded after the domain.
Note: Sitecore makes it easy to do this with the Link Manager attribute 'languageEmbedding'. Unfortunately, Sitecore treats this as a global setting, and a single instance may host many sites, some with ccTLDs and some without. Verndale has customized our Link Manager settings to use a custom Link Provider that looks for this attribute at the site-specific config level, so you can set it differently per site.
Language is a highly relevant and important signal to search engines about how and where to rank content. The site should use the proper tags (rel="alternate" with an hreflang attribute) to serve up the appropriate language or translated content based on the location and language selected by the user. Search engines often have difficulty differentiating similar languages (US English vs UK English), which can result in duplicate content problems. To prevent this, make it simple for search engines and users to determine which group of language speakers each page targets. In the same vein, ensure that the same or similar content served on different URLs in the same language uses proper canonical tagging to show search engines which content is preferred.
Canonicals are NOT used to identify language variants of pages, since they serve different audiences (the only pseudo-exception here is the language directory example above where /page and /en/page would be canonicalized because they serve the same audience). Instead, hreflang alternate tags are used to identify all the variants of each page on a site. These tags are placed in an array in the <head> section and list out each variant of a given page including the page itself. An example looks like this:
<link rel="alternate" hreflang="de-DE" href="http://www.yoursite.de/" />
<link rel="alternate" hreflang="es-AR" href="http://www.yoursite.com.ar/" />
<link rel="alternate" hreflang="es-CO" href="http://www.yoursite.com.co/" />
<link rel="alternate" hreflang="es-MX" href="http://www.yoursite.mx/" />
<link rel="alternate" hreflang="zh-CN" href="http://www.yoursite.cn/" />
<link rel="alternate" hreflang="en-CA" href="http://www.yoursite.ca/" />
<link rel="alternate" hreflang="en" href="http://www.yoursite.com/" />
Note: hreflang tags can be used across domains, and can identify BOTH country-specific and language-only variants (in the example above, the English variant is for English speakers anywhere, while there are three Spanish variants directed at three specific countries). These alternates can also be defined in the XML sitemap if that is easier than putting them in the <head> section. Keep in mind, though, that they would need to be reflected in the sitemaps for ALL domains, so this is likely only a good option for directory-based language variants on a single domain.
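For the sitemap route, the alternates are expressed as `xhtml:link` elements inside each `<url>` entry. The following Python sketch builds such a sitemap for a single domain with directory-based language variants. The `/es/` URL and the `build_sitemap` helper are hypothetical names chosen for illustration, and the sketch assumes every variant lists all variants, itself included.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_sitemap(variants: dict) -> str:
    """Return sitemap XML where every URL lists all language variants, itself included.

    `variants` maps an hreflang code to the URL of that variant of one page.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in variants.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # Each <url> repeats the full set of alternates.
        for code, href in variants.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": code, "href": href},
            )
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical directory-based variants of the home page on one domain.
print(build_sitemap({
    "en": "http://www.yoursite.com/",
    "es": "http://www.yoursite.com/es/",
}))
```

With ccTLDs, the same set of alternates would have to be repeated in every domain's sitemap, which is why the in-page tags are usually the simpler choice there.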
We have automated the output of these hreflang tags on Verndale-built sites by specifying which languages and countries map to which site/domain configurations and their target domains.
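As a rough illustration of that kind of mapping (not Verndale's actual implementation, which lives in Sitecore configuration), the Python sketch below generates the `<link>` tags for a page from a dictionary of hreflang codes to site roots. The `SITE_ROOTS` mapping and the `hreflang_tags` function are hypothetical, and the sketch assumes each page lives at the same path on every site.

```python
from html import escape

# Hypothetical mapping of hreflang code -> site root, mirroring the example above.
SITE_ROOTS = {
    "de-DE": "http://www.yoursite.de",
    "es-AR": "http://www.yoursite.com.ar",
    "es-CO": "http://www.yoursite.com.co",
    "es-MX": "http://www.yoursite.mx",
    "zh-CN": "http://www.yoursite.cn",
    "en-CA": "http://www.yoursite.ca",
    "en": "http://www.yoursite.com",
}


def hreflang_tags(path: str, site_roots: dict) -> list:
    """Return one <link rel="alternate"> tag per variant, including the current page."""
    tags = []
    for code, root in site_roots.items():
        # Assumes the page lives at the same path on every site.
        href = root.rstrip("/") + "/" + path.lstrip("/")
        tags.append(
            '<link rel="alternate" hreflang="{}" href="{}" />'.format(
                escape(code, quote=True), escape(href, quote=True)
            )
        )
    return tags


# Every variant of the home page emits the same full set of tags in its <head>.
for tag in hreflang_tags("/", SITE_ROOTS):
    print(tag)
```

Because the full list is emitted identically on every variant, each page references all the others and itself, which is what the hreflang example above shows.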
Finally, there are two other places in the markup that language should be identified in a multilingual environment:
By following the advice here, you will keep your multilingual, multi-country Sitecore instance in line with current SEO best practices!