Localization Strategies to Build Your Sites

Your audience has diverse needs and preferences. Its members may share a common language or country, or they may span many countries and languages. Arc XP is here to support your efforts to provide an authentic experience for your audience to engage with your brand and content. Localization is a critical method for communicating with them. To Arc XP, localization means adapting content, products, offers, and services to meet the linguistic, cultural, and functional expectations of a target audience in a specific region or market. It is more than translation: it also covers how your experience presents URLs, date formats, currency symbols, images, colors, and symbols, as well as the tone and style you use when speaking to your audience. This guide helps you prepare your site to be offered as either a multilingual or multi-regional experience.

Important terms used in this document:

Multilingual: any website that offers content in more than one language. For example, a Canadian business with English and French versions of its site. For SEO, Google tries to find pages that match the language of the searcher.

Multi-regional: any website that explicitly targets users in different countries. For example, a publisher that offers different content to Canada and the United States. For SEO, Google tries to find the right country or locale page for the searcher.

Locale: from our friends at Localizely,

“…a specific combination of language, region, script, and other cultural elements that together define the preferences and conventions for a particular user group or geographical area. It is an essential concept in software development, as it helps developers create applications that can be tailored to the specific needs of users from different cultural backgrounds, ensuring a more inclusive and user-friendly experience.

The components of a locale are:

  • Language: The primary language spoken by the user group or in the geographical area, represented by a two-letter code according to the ISO 639-1 standard (for example, "en" for English or "fr" for French).

  • Region (optional): The country or region where the user group is located, represented by a two-letter code according to the ISO 3166-1 standard (for example, "US" for the United States or "FR" for France).

  • Script (optional): The writing system used for the language, which is especially important for languages that use multiple scripts (for example, "Latn" for Latin script or "Cyrl" for Cyrillic script).

Locale codes are typically formed by combining the ISO 639-1 language code and the ISO 3166-1 region code, separated by an underscore or a hyphen. For example, "en_US" or "en-US" represents English as spoken in the United States, while "fr_CA" or "fr-CA" represents French as spoken in Canada.”
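To make the structure of a locale code concrete, here is a minimal TypeScript sketch that splits a code into language, optional script, and optional region, accepting either separator. The names parseLocaleCode, formatLocaleCode, and LocaleParts are hypothetical and are not part of Arc XP. The sketch only checks the shape of the code, not whether each part is a registered ISO value.

```typescript
// Hypothetical helper, not part of Arc XP: splits a locale code into parts.
interface LocaleParts {
  language: string; // ISO 639-1, normalized to lowercase ("fr")
  script?: string;  // ISO 15924, normalized to title case ("Hans")
  region?: string;  // ISO 3166-1 alpha-2, normalized to uppercase ("CA")
}

function parseLocaleCode(code: string): LocaleParts | null {
  // Accept "_" or "-" as the separator; script and region are both optional.
  const pattern = /^([a-z]{2})(?:[-_]([a-z]{4}))?(?:[-_]([a-z]{2}))?$/i;
  const match = pattern.exec(code.trim());
  if (match === null) return null;

  const parts: LocaleParts = { language: (match[1] ?? "").toLowerCase() };
  const script = match[2];
  const region = match[3];
  if (script) {
    parts.script = script.charAt(0).toUpperCase() + script.slice(1).toLowerCase();
  }
  if (region) {
    parts.region = region.toUpperCase();
  }
  return parts;
}

// Joins the parts back together with hyphens, for example "fr-CA".
function formatLocaleCode(parts: LocaleParts): string {
  return [parts.language, parts.script, parts.region]
    .filter((part): part is string => part !== undefined)
    .join("-");
}
```

Normalizing to one separator and one casing early makes it easier to compare codes that arrive from different systems.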

Option 1 - Fast Launch, Out-of-the-Box Experiences

The supported approach for offering out-of-the-box (OOTB) experiences with multiple languages or countries using Arc XP Themes is to create one website for each language. Themes has OOTB support to set a language on a per-site basis. When writing an article, the author must select the appropriate website for circulation.

Separate websites can use the same branding, styles, and pages. From the user’s point of view, you can offer a consistent experience for your brand; the content would just be in a different language, with localized formatting applied to important fields.

At the CDN level, you can set up subdomains for each website. For example, you could use English as the default site, like at https://mainsite.com, and the Spanish site could be located at https://es.mainsite.com. Using subdomains is a budget-friendly approach for offering localized sites. Identity and authentication services can share cookies across those sites, ensuring a seamless journey for registration, subscriptions, and other logged-in experiences.

Best Practices

Domain Names

When registering a domain name, keep it in English, without special characters or diacritics. While there is growing technical support for non-Latin characters, such domain names still don't work in many places, such as email services, notification platforms, and legacy systems. They also require business users to work with Punycode, which causes headaches for your team and your audience. The cost of operating and scaling such a domain tends to far outweigh the benefits a localized domain name could provide.

For additional reading, see these articles:

Domains with different accents

Diacritics in URLs

Slugs

When manually creating a slug or Section ID, ensure that it uses Latin characters without any special characters or diacritics. Many Arc XP tools automatically generate slugs and apply formatting rules to ensure consistency. Learn more from the articles below:

Configuring URL Formats

Arc XP Tags API

Canonical URL Not Generated for Non English Sites
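If you create slugs by hand or in your own tooling, a small normalizer can enforce the Latin-only rule. The sketch below is an illustration only and does not reproduce the formatting rules that Arc XP tools apply. The function name toLatinSlug is hypothetical.

```typescript
// Hypothetical helper: reduces a headline to lowercase Latin letters,
// digits, and hyphens.
function toLatinSlug(input: string): string {
  return input
    .normalize("NFD")                // split letters from their combining marks
    .replace(/[\u0300-\u036f]/g, "") // drop the combining diacritical marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")     // collapse every other run into one hyphen
    .replace(/^-+|-+$/g, "");        // trim leading and trailing hyphens
}
```

For example, toLatinSlug("Opinión & Análisis") returns "opinion-analisis". Characters that do not decompose into a Latin base letter, including text in non-Latin scripts, are simply dropped by this approach, so those headlines need a manually written or transliterated slug.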

Option 2 - Building Custom Experiences

Arc XP’s development platform enables your team to build unique and custom experiences. When leveraging our framework, it is possible to offer localized sites using subdirectories, instead of subdomains. This enables you to create one site that is segmented by region and/or language. This approach supports improved search engine ranking by consolidating all traffic into the same domain.

Important Considerations

Managing articles and circulation

Arc XP Composer supports different languages and allows your team to select one language per article. The list can be configured to only allow languages your site offers. Operationally, your team can organize and search articles by language. 

The downside of organizing exclusively by language is that dynamic WebSked Collections that use a section query cannot filter articles by language. To work around this, create a top-level section for each language, with subsections beneath it. Keep in mind that search within Composer works best with one language per site.

gen_sites_diagram.png

For editorial convenience and better workflow with multiple sections, we recommend creating Composer templates and persistent Quick Circulation Groups in circulation, like in the following image. To set up Quick Circulation Groups, contact Arc XP Customer Support. You can find links to Arc XP Learning Center documentation for Composer templates in the Appendix of this document.

gen_circulate_sites_sections.png

Composer Story Templates let you create new stories with pre-established metadata selections (such as circulations). Quick Circulation Groups let you pre-establish combinations of sections and subsections to include in your story with a single click, instead of adding each section individually. To configure the quick groups, contact Arc XP Customer Support.

Themes blocks must be forked

Arc XP Themes does not support multiple languages on a single site. When creating custom blocks, instead of setting a language per site in the blocks.json, you can use the requestURI prop from the context to decide what language to use based on the path. For example, /en/opinions would use the English translations. You must eject and update all blocks that contain translated phrases to use the requestURI instead of the siteProperties.locale.
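As a rough illustration of the path-based approach, the following TypeScript sketch picks a language from the first path segment of a request URI and falls back to a default. Everything here is an assumption for illustration: the supported language list, the default, the phrase table, and the function name languageFromRequestUri are hypothetical, and the function takes the URI as a plain string rather than reading it from the context.

```typescript
// Assumed configuration for this sketch; replace with your site's languages.
const SUPPORTED_LANGUAGES = ["en", "es"] as const;
type SupportedLanguage = (typeof SUPPORTED_LANGUAGES)[number];
const DEFAULT_LANGUAGE: SupportedLanguage = "en";

// Hypothetical phrase table standing in for a block's translated strings.
const PHRASES: Record<SupportedLanguage, { readMore: string }> = {
  en: { readMore: "Read more" },
  es: { readMore: "Leer más" },
};

function languageFromRequestUri(requestUri: string): SupportedLanguage {
  // Ignore any query string or fragment, then take the first path segment.
  const path = requestUri.split(/[?#]/)[0] ?? "";
  const firstSegment = (path.split("/").filter((s) => s.length > 0)[0] ?? "").toLowerCase();
  const found = SUPPORTED_LANGUAGES.find((lang) => lang === firstSegment);
  return found ?? DEFAULT_LANGUAGE;
}

// Looks up a phrase for whichever language the path selects.
function readMoreLabel(requestUri: string): string {
  return PHRASES[languageFromRequestUri(requestUri)].readMore;
}
```

With this sketch, /en/opinions resolves to English, /es/opinion resolves to Spanish, and a path without a language prefix falls back to the default. Inside an ejected block, the input would be the request URI value from the context, used in place of the site-level locale.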

Duplicate Pages

Using subdirectories will require you to duplicate pages. The list blocks can use either a story feed query or a section query. In either case, you must update all blocks on the page to use the appropriate language query or language section. 

For example, let's say you have a block that uses the story-feeds-sections to get articles from /sports. In a multi-site / subdomain strategy, all that is required is to update the includeSections field to /sports. Both sites can use the same page, and the articles shown are automatically filtered based on the site you are viewing. For subdirectories, however, there isn’t a shared /sports section; instead, there is an /en/sports and a /es/sports section.

English versions:

gen_display_content_info_02.png
gen_display_content_info.png

Spanish versions: 

gen_includeSections.png
gen_configure_content_query.png

You must update all blocks on the page in a similar fashion.
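Because every duplicated page needs the same change, it can help to derive the language-specific section from one shared value instead of typing it by hand. The sketch below is a hypothetical helper, not an Arc XP API. It only builds the string you would enter in the includeSections field.

```typescript
// Hypothetical helper: prefixes a shared section path with a language code.
function localizedSection(language: string, section: string): string {
  const lang = language.replace(/^\/+|\/+$/g, ""); // strip stray slashes
  const tail = section.replace(/^\/+|\/+$/g, "");
  return tail.length > 0 ? `/${lang}/${tail}` : `/${lang}`;
}

// Builds the includeSections value for each language version of a page.
function includeSectionsByLanguage(languages: string[], section: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const language of languages) {
    result[language] = localizedSection(language, section);
  }
  return result;
}
```

Calling includeSectionsByLanguage(["en", "es"], "/sports") yields /en/sports for the English page and /es/sports for the Spanish page.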

Permissions and access control

Controlling access and permissions to content is enforced per site. When organizing content by subdirectory, all users will have access to all content. Other restrictions, like defining who can publish content, are also enforced per site.

Additional Best Practices

Regardless of how your site is organized to support localization, there are a few more important best practices to share.

Supported language and region codes

The hreflang attribute's value comprises one, or optionally two, values, separated by a dash. For example, en-US. The first code of the hreflang attribute is the language code (in ISO 639-1 format), followed by an optional second code representing the region code (in ISO 3166-1 Alpha 2 format) of an alternate URL.

To target different language speakers in Belgium, you might use the following language and region codes:

  • Good (German for users in Belgium): de-be

  • Good (Dutch for users in Belgium): nl-be

  • Good (French for users in Belgium): fr-be

  • Bad because the first code is for language (be is the Belarusian language code): be

To simplify your labeling, you can specify a language code by itself. For example:

  • de: German language content, independent of region

  • en-GB: English language content, for users in Great Britain

  • de-ES: German language content, for users in Spain

For language script variations, the proper script is derived from the country. For example, when using zh-TW for users in Taiwan, the language script is automatically derived (in this example: Chinese-Traditional). You can also specify the script itself explicitly using ISO 15924, like this:

  • zh-Hant: Chinese (Traditional)

  • zh-Hans: Chinese (Simplified)

Like with other language codes, you can also specify an optional region. For example, use zh-Hans-US to specify Chinese (Simplified) for users in the United States.

Indicating language and region

The International Organization for Standardization (ISO) publishes standards that are used across many industries, including the codes used on the web to identify languages and countries. ISO 639 is a standard for organizing and classifying languages, and ISO 3166 is a standard for organizing and classifying region/country codes. When architecting a localized website, it is important to use the proper ISO codes in your subdomains or subdirectories for both SEO benefit and interoperability with other systems.

Your business will define how to best express language and region for your site.

  • If you offer your audience the same content, translated into different languages, then you’ll define your site by language

  • If you offer your audience the same content in one language but differentiated by regions, then you’ll define your site by region

  • If you offer your audience targeted content by region, and available in one or more languages for each region, then you’ll define your site using a combination of both language and region

Examples:

An experience that serves the same set of content across a variety of languages:

mysite.com can be the default language for your primary audience, such as English.

es.mysite.com can be the Spanish language version of your site.

ko.mysite.com can be the Korean language version of your site.

Note

In a subdirectory approach, the URLs would reflect mysite.com/es and mysite.com/ko.

An experience that serves targeted content by region, available in different languages:

mysite.com can be the default location for your primary audience, such as the US, available in assumed US English.

us.mysite.com/es can be the US site, available in Spanish.

es.mysite.com can be the site for your audience in Spain, available in Spanish.

es.mysite.com/en can be the Spain site, available in assumed British English.

Note

You will need to indicate language using the appropriate en-GB hreflang attributes.

kr.mysite.com can be the site for your audience in Korea, available in Korean.

Note

In a subdirectory approach, the URLs would reflect mysite.com/us/es , mysite.com/es, mysite.com/es/en, and mysite.com/kr.
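The two URL schemes in the examples above differ only in where the first code goes: into the hostname for subdomains, or into the path for subdirectories. The following TypeScript sketch shows that mapping. The function name localizedUrl and its parameters are hypothetical, and the codes are passed in the same order they appear in the examples.

```typescript
type UrlStrategy = "subdomain" | "subdirectory";

// Hypothetical helper: builds a URL from an ordered list of locale codes,
// for example ["es"] for Spanish or ["us", "es"] for the US site in Spanish.
function localizedUrl(
  strategy: UrlStrategy,
  domain: string,
  codes: string[],
  path: string = "/",
): string {
  const [first, ...rest] = codes;
  // Subdomain strategy: the first code becomes the subdomain.
  const useSubdomain = strategy === "subdomain" && first !== undefined;
  const host = useSubdomain ? `${first}.${domain}` : domain;
  // Whatever is not in the hostname becomes leading path segments.
  const prefix = useSubdomain ? rest : codes;
  const cleanPath = path.replace(/^\/+/, "");
  const fullPath = [...prefix, cleanPath].filter((s) => s.length > 0).join("/");
  return `https://${host}/${fullPath}`;
}
```

For instance, localizedUrl("subdomain", "mysite.com", ["us", "es"]) gives https://us.mysite.com/es, while the same codes with "subdirectory" give https://mysite.com/us/es.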

Regional Infrastructure

Sites hosted by Arc XP can be based in a single region, or you can use multiple regions for multiple sites. You will need a separate Arc XP Organization for each region when working across different regions. A single site is unique to a single region in a single organization. Content can be shared across multiple regions, but PageBuilder pages must be manually imported and exported.

Learn more:

International SEO Guide

Managing Multi-Regional Sites

Working with Chinese, and the Chinese Macrolanguage

Managing Article Translations

When translating articles in Composer, it’s important to link all versions of the article together so that search engines do not penalize your site for duplicate content. You can use links in the head of the document to establish this relationship so that it sends the right signals to Google and other search engines.

For example, when an editor writes a story in English and then clones that story and translates it to Spanish, the following links show up in the head tag of both the parent and the child:

<link rel="alternate" href="<url-of-the-english-story>" hreflang="en" />
<link rel="alternate" href="<url-of-the-spanish-story>" hreflang="es" />

To manage this relationship, we recommend you create the original article in Composer, clone it, and then translate the cloned article. By doing so, stories maintain a reference to each other in related content.

// Parent Story (original)
{
    "_id": "GNKEB236TVEA3HIAWD73RZ4D4I",
    "type": "story",
    "version": "0.10.1",
    "related_content": {
        "basic": [],
        "redirect": [],
        "clonedFromParent": [],
        "clonedChildren": [
            {
                "type": "reference",
                "referent": {
                    "id": "USVCUO63ZVHEVEXLZOG4I5OQ2I"
                }
            }
        ]
    }
}
// Child Story (clone)
{
    "_id": "USVCUO63ZVHEVEXLZOG4I5OQ2I",
    "type": "story",
    "version": "0.10.1",
    "related_content": {
        "basic": [],
        "redirect": [],
        "clonedFromParent": [
            {
                "type": "reference",
                "referent": {
                    "id": "GNKEB236TVEA3HIAWD73RZ4D4I"
                }
            }
        ],
        "clonedChildren": []
    }
}

Note

When creating custom blocks for PageBuilder, be sure to add <link rel="alternate" tags to those pages, as well. This will ensure that pages are properly indexed and rewarded for great content.

Handling the Clone Relationships in the Feature Pack

The Content API has an endpoint that fetches and inflates all stories in related_content, GET /related-content. For each story, check related_content.clonedFromParent and related_content.clonedChildren. If either contains entries, fetch the related content for that story. If a related story has a different language, add the appropriate link tags to the page.
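The check described above can be sketched in TypeScript as two small pure functions: one that collects the IDs referenced by the clone fields, and one that keeps only the related stories written in another language. The related_content shape follows the sample stories shown earlier. The language field on the fetched stories and all function and type names are assumptions for illustration, and the actual fetch is left out.

```typescript
interface CloneReference {
  type: string;
  referent: { id: string };
}

interface StoryWithClones {
  _id: string;
  language?: string; // assumed field holding the story's language code
  related_content?: {
    clonedFromParent?: CloneReference[];
    clonedChildren?: CloneReference[];
  };
}

// Collects the unique story IDs linked through either clone field.
function relatedCloneIds(story: StoryWithClones): string[] {
  const related = story.related_content;
  const references = [
    ...(related?.clonedFromParent ?? []),
    ...(related?.clonedChildren ?? []),
  ];
  const ids = references
    .map((reference) => reference.referent.id)
    .filter((id) => id !== story._id);
  return Array.from(new Set(ids));
}

// Keeps only fetched stories whose language differs from the current story.
function translationsOf(current: StoryWithClones, fetched: StoryWithClones[]): StoryWithClones[] {
  return fetched.filter(
    (story) => story.language !== undefined && story.language !== current.language,
  );
}
```

If relatedCloneIds returns an empty list, the story has no clone relationships and the related-content request can be skipped.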

Expressing the right information

Google recommends using different URLs for each language version of a page rather than using cookies or browser settings to adjust the content language on the page.

If you use different URLs for different languages, use hreflang annotations to help Google search results link to the correct language version of a page.

If you prefer to dynamically change content or reroute the user based on language settings, be aware that Google might not find and crawl all your variations. This is because the Googlebot crawler usually originates from the USA. In addition, the crawler sends HTTP requests without setting Accept-Language in the request header.

How to implement hreflang for sites with multiple languages

If you have multiple versions of a website in different languages, the hreflang link attribute signals to Google what language the page is in and helps to serve the most relevant content to that user.

For instance, if you have a website with a news story in English, but you also copy that article to a Spanish and Chinese version, the hreflang link attribute signals to Google that there are alternate versions of that page in other languages. This helps Google and other search engines determine the best version to serve to the user based on their search settings.

The following code snippet is an example of how the hreflang and canonical look on the English home page:

<link rel="canonical" href="https://www.mainsite.com/" />
<link rel="alternate" hreflang="en" href="https://www.mainsite.com/" />
<link rel="alternate" hreflang="es" href="https://es.mainsite.com/" />

And here is how it would look on the Spanish version:

<link rel="canonical" href="https://es.mainsite.com/" />
<link rel="alternate" hreflang="es" href="https://es.mainsite.com/" />
<link rel="alternate" hreflang="en" href="https://www.mainsite.com/" />

Here's an example for an article page on the default English version:

<link rel="canonical" href="https://www.mainsite.com/YYYY/MM/DD/headline/" />
<link rel="alternate" hreflang="en" href="https://www.mainsite.com/YYYY/MM/DD/headline/" />
<link rel="alternate" hreflang="es" href="https://es.mainsite.com/YYYY/MM/DD/headline/" />

And this is how it should look for an article page on the Spanish version:

<link rel="canonical" href="https://es.mainsite.com/YYYY/MM/DD/headline/" />
<link rel="alternate" hreflang="es" href="https://es.mainsite.com/YYYY/MM/DD/headline/" />
<link rel="alternate" hreflang="en" href="https://www.mainsite.com/YYYY/MM/DD/headline/" />

Items to keep in mind:

  • hreflang and canonicals are only a signal to Google. Google ultimately decides on which version to show. However, it's best practice to implement hreflang and canonicals correctly.

  • Each version of the page should have a self-referencing canonical link element.

  • hreflang and canonicals should contain absolute URLs.

  • Each page should contain an hreflang attribute for each alternate version and itself.
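The rules in this list can be applied mechanically. Here is a minimal TypeScript sketch that produces the head tags for one page version: a self-referencing canonical, an hreflang entry for the page itself, and one for each alternate, rejecting URLs that are not absolute. The names headLinkTags and PageVersion are hypothetical, and the sketch returns strings only to keep the example self-contained. A real block would render elements instead.

```typescript
interface PageVersion {
  hreflang: string; // for example "en" or "es"
  url: string;      // absolute URL of this version of the page
}

// Escapes the two characters that could break a double-quoted attribute.
function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

function headLinkTags(current: PageVersion, alternates: PageVersion[]): string[] {
  // The current page comes first so it always gets a self-referencing entry.
  const versions = [
    current,
    ...alternates.filter((version) => version.hreflang !== current.hreflang),
  ];
  for (const version of versions) {
    if (!/^https?:\/\//i.test(version.url)) {
      throw new Error(`hreflang and canonical URLs must be absolute: ${version.url}`);
    }
  }
  return [
    `<link rel="canonical" href="${escapeAttribute(current.url)}" />`,
    ...versions.map(
      (version) =>
        `<link rel="alternate" hreflang="${escapeAttribute(version.hreflang)}" href="${escapeAttribute(version.url)}" />`,
    ),
  ];
}
```

Calling the function once per language version, each time with that version as current, keeps the annotations reciprocal across all versions of the page.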

Designing Locale Site Selectors

When offering different languages and countries to your audience, it is important to follow a few specific design guidelines.

  • Ensure the values for selection are displayed in the appropriate language for the visitor to understand.

  • Avoid using flags to indicate countries or locales when possible. Flags can expose your brand to unwanted harassment in regions with political uncertainty. Use a neutral icon instead, such as a pushpin or globe.

  • Do not automatically redirect visitors to a locale based on their IP address or device fingerprint. Instead, let them land on the page they requested and ask whether they would like to view the locale that may suit them better. Always give your audience a choice rather than assuming their intent.
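One way to offer a choice without redirecting is to compute a suggestion and show it as a dismissible prompt. The TypeScript sketch below assumes the browser's Accept-Language header as the signal, which is an assumption of this example rather than a recommendation from this guide. The function name suggestLanguage is hypothetical. It returns a language to suggest, or null when the visitor is already on a suitable version.

```typescript
// Hypothetical helper: returns a language worth suggesting, never a redirect.
function suggestLanguage(
  acceptLanguage: string,
  supported: string[],
  current: string,
): string | null {
  const preferred = acceptLanguage
    .split(",")
    .map((entry) => {
      const [tag = "", ...params] = entry.trim().split(";");
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      const q = qParam === undefined ? 1 : Number.parseFloat(qParam.slice(2));
      return {
        // Compare on the primary language subtag only ("es-ES" becomes "es").
        language: (tag.trim().split("-")[0] ?? "").toLowerCase(),
        q: Number.isNaN(q) ? 0 : q,
      };
    })
    .filter((entry) => entry.language.length > 0 && entry.q > 0)
    .sort((a, b) => b.q - a.q);

  const match = preferred.find((entry) => supported.indexOf(entry.language) !== -1);
  if (match === undefined || match.language === current) return null;
  return match.language;
}
```

A page would use the result only to decide whether to display a prompt such as "View this site in Spanish?", leaving the visitor on the URL they requested until they choose otherwise.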

For more best practices on designing site selectors, see this article:

Designing a Perfect Language Selector UX

Bringing it all Together

Two main options exist for managing content in different languages or regions: OOTB experiences through Themes using subdomains and multiple sites, or custom-built experiences using a single site with subdirectories for each language. We recommend the multi-site solution because it requires less customization and works directly with Themes.

You should manage translated articles by creating an original article, cloning the article, and then translating the clone. You should update the language meta field for both articles, and then publish the articles to the matching language website or section, depending on the chosen solution. Linking the articles by cloning gives you a straightforward way to fetch the translated versions of an article, which you can use both for the link HTML tags that support SEO and for a custom language-switching button.

FAQ

Would author pages be expected to show all versions of an article or only articles in the selected language?

The story-feed-author content source that is included with Themes returns all articles for an author by site. That means for the multi-site approach, the author page would show only the articles in the language that matches the currently selected site. 

Appendix