Content Migration Overview for Developers
A guide for back-end developers on migrating legacy content. This guide will walk you through prerequisites to migration, how to map your source content to ANS, and how to start migrating content.
Assumptions
This guide is for developers who have experience using a RESTful API, or for those with technical experience in mapping and migrating data.
Getting Started
Some things to think about as you plan your migration:
How much content will you be migrating?
How do you treat and store different kinds of content? Looking at sample data for a random story, image, video, gallery, etc. is helpful.
We recommend selecting a large sample (100 or more) of your stories to use for iteratively testing your content migration. Some clients have gathered this sample from their top pages in Google Analytics. The sample should include all elements you plan on carrying over from your legacy system(s), because you will reuse it to test each piece of your migration mappings. Things to include: All possible article types such as sponsored content, election coverage, special editorial features, etc.; Articles with and without referent data such as authors, images, tags; Articles with any other referent data such as related articles; Samples from all source systems.
What content will you not map and migrate? This will likely include any custom-built pages such as your homepage or special feature pages.
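The sampling advice above can be sketched in Python. This is a minimal sketch, not an Arc XP tool: the function name build_test_sample and the legacy fields "id" and "article_type" are hypothetical, and your source system will have its own way of identifying article types.

```python
import random
from collections import defaultdict


def build_test_sample(stories, per_type=5, total=100, seed=0):
    """Pick a sample that covers every article type, then top up at random.

    `stories` is a list of dicts with hypothetical "id" and "article_type" keys.
    If there are many article types, the result may exceed `total`.
    """
    rng = random.Random(seed)  # fixed seed so the same sample is reused between runs

    # Group stories by article type so that rare types are not missed.
    by_type = defaultdict(list)
    for story in stories:
        by_type[story["article_type"]].append(story)

    sample = []
    for article_type in sorted(by_type):
        group = by_type[article_type]
        sample.extend(rng.sample(group, min(per_type, len(group))))

    # Fill the rest of the sample with randomly chosen stories not picked yet.
    chosen_ids = {story["id"] for story in sample}
    remaining = [story for story in stories if story["id"] not in chosen_ids]
    top_up = max(0, min(total - len(sample), len(remaining)))
    sample.extend(rng.sample(remaining, top_up))
    return sample
```

Fixing the seed keeps the sample stable, which matters because the same stories are meant to be re-tested after every change to your mappings.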
Basic steps
Map existing data to Arc XP ANS
Build an adapter - a tool to transform data from your old CMS into Arc XP ANS and move the ANS into Arc XP
Migrate all existing content to Arc XP
Go live on Arc XP - migrated content is now served/rendered from Arc XP
Editorial Cutover - Editorial team will create all content within Arc XP
Old CMS is deprecated
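The adapter step can be pictured as a small transform-and-send loop. The following is a sketch under stated assumptions: the legacy fields "title" and "paragraphs" are hypothetical, the ANS field names reflect a reading of the public ans-schema repository and should be checked against the version you target, and the send callable stands in for whatever mechanism you use to move ANS into Arc XP.

```python
from typing import Callable, Iterable

ANS_VERSION = "0.10.8"  # assumption: replace with the most recent ANS version available


def to_ans_story(legacy: dict) -> dict:
    """Transform one hypothetical legacy record into a minimal ANS-style story."""
    return {
        "type": "story",
        "version": ANS_VERSION,
        "headlines": {"basic": legacy["title"]},
        "content_elements": [
            {"type": "text", "content": paragraph}
            for paragraph in legacy.get("paragraphs", [])
        ],
    }


def run_adapter(legacy_records: Iterable[dict], send: Callable[[dict], None]) -> int:
    """Transform each record and hand it to `send`; return the number processed."""
    count = 0
    for record in legacy_records:
        send(to_ans_story(record))
        count += 1
    return count


# Usage: collect the output in a list instead of sending it anywhere.
sent = []
processed = run_adapter([{"title": "Hello", "paragraphs": ["One", "Two"]}], sent.append)
```

Passing send in as a parameter keeps the transform testable on its own, before any Arc XP endpoint is involved.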
ANS
ANS (“Arc Native Specification”) is the collection of schema documents that comprise the Washington Post’s definition of “content”, insofar as content is passed back and forth between systems in the Arc XP ecosystem of applications. ANS describes the content’s data, such as the headline and subheadline.
More about ANS can be found here: https://github.com/washingtonpost/ans-schema#readme, and most of the high-level elements are found in this directory https://github.com/washingtonpost/ans-schema/tree/master/src/main/resources/schema/ans/. ANS versions vary over time. As a rule, develop on the most recent version of ANS that is available.
It is important to become familiar with ANS so that you can correctly map your legacy data to Arc XP. Think of ANS as the pieces and parts of a story or a video (headline, images, body text, embeds, metadata, tags, authors, etc.). These pieces and parts exist independently of the story, and one or more stories can use these objects by reference.
For example, an image inside of a story is a reference to an image object in Photo Center. To map an image to Photo Center, you would use the image schema: https://github.com/washingtonpost/ans-schema/blob/master/src/main/resources/schema/ans/0.10.8/image.json. As another example, story tags (or topics) would be references to Tag objects in Tag Center. The schema for this is here. Tag Service is not required: stories can have reference-less tags (tags that exist only on that story), and tag pages can still be built using Content API. However, managing all of your tags in one place is better long-term, because it removes the clutter of duplicate or extraneous tags such as “car” and “cars”.
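To make the difference between a referenced object and a reference-less one concrete, here is a minimal sketch. The shape of the reference ("type": "reference" with a "referent" holding "type" and "id") and the "taxonomy.tags" location follow a reading of the public ans-schema repository; treat them as assumptions to verify against your ANS version. The image ID and helper names are hypothetical.

```python
def image_reference(image_id: str) -> dict:
    """Build a reference to an image object that lives outside the story."""
    return {
        "type": "reference",
        "referent": {"type": "image", "id": image_id},
    }


def inline_tag(text: str) -> dict:
    """Build a reference-less tag that exists only on this story."""
    return {"text": text}


story = {
    "type": "story",
    "version": "0.10.8",  # assumption: use the most recent ANS version available
    "headlines": {"basic": "City council approves budget"},
    "content_elements": [
        {"type": "text", "content": "The council voted on Tuesday."},
        image_reference("HYPOTHETICAL_IMAGE_ID"),  # points at an image stored elsewhere
    ],
    "taxonomy": {"tags": [inline_tag("budget")]},
}
```

The story carries only a pointer to the image, so several stories can share the same image object, while the inline tag is owned by this one story.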
Legacy Data Prep
Before you begin migrating, it is a good idea to clean up legacy data as much as possible. For example, do an audit of your taxonomy. Do you have older, superfluous tags that can be deleted? Do you have redundant tags that can be merged together such as “robot” and “robots”? Are there old pages/stories or unused images that no longer get traffic and can be excluded or deleted? Are all of your authors still necessary, or can some be deleted? This will help your legacy data come into Arc XP a bit more organized.
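As one way to start a taxonomy audit, the sketch below groups tags that differ only by case, surrounding whitespace, or a trailing "s". The plural rule is a deliberately naive heuristic chosen for illustration, not a real stemmer, so the output should be treated as a list of candidates for human review rather than an automatic merge. The function names are hypothetical.

```python
from collections import defaultdict


def normalize_tag(tag: str) -> str:
    """Lowercase, trim, and strip a trailing 's' (naive plural heuristic)."""
    key = tag.strip().lower()
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def find_merge_candidates(tags):
    """Return {normalized_key: [variants]} for keys with more than one variant."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalize_tag(tag)].add(tag.strip())
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


candidates = find_merge_candidates(["robot", "robots", "car", "cars", "glass"])
# candidates == {"robot": ["robot", "robots"], "car": ["car", "cars"]}
```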
Content Mapping
Mapping your content will be the largest lift in your migration. You will need to test and retest your mappings as they come into ANS to ensure all scenarios have been accounted for.
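A lightweight check that you can run over your sample after every mapping change might look like the sketch below. The list of required keys is an assumption chosen for illustration, not the authoritative ANS requirement; validating against the published JSON schema for your ANS version is the more thorough option.

```python
# Assumption: keys this illustration treats as mandatory for a story.
REQUIRED_KEYS = ("type", "version", "headlines", "content_elements")


def find_mapping_problems(ans_story: dict) -> list:
    """Return a list of human-readable problems found in one mapped story."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in ans_story]
    if not ans_story.get("headlines", {}).get("basic"):
        problems.append("empty headline")
    if not ans_story.get("content_elements"):
        problems.append("no content elements")
    return problems


good = {
    "type": "story",
    "version": "0.10.8",
    "headlines": {"basic": "A headline"},
    "content_elements": [{"type": "text", "content": "Body text."}],
}
bad = {"type": "story", "version": "0.10.8", "headlines": {"basic": ""}, "content_elements": []}

good_problems = find_mapping_problems(good)  # []
bad_problems = find_mapping_problems(bad)    # ["empty headline", "no content elements"]
```

Returning a list of problems, rather than raising on the first one, lets you tally failures across the whole sample and see which scenarios your mappings still miss.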
If you find that some of your data does not map exactly to ANS, but it is information you would like to keep, you can store it in the Additional Properties (additional_properties) field. This field can be structured any way you like, but be aware that its contents are not currently searchable in the Content API.
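One way to preserve unmapped legacy fields is sketched below. It assumes the ANS field is named additional_properties and accepts an arbitrary object; the set of mapped fields and the legacy field names are hypothetical.

```python
# Hypothetical legacy fields that already have a home in your ANS mapping.
MAPPED_FIELDS = {"title", "paragraphs"}


def with_additional_properties(ans_story: dict, legacy: dict) -> dict:
    """Return a copy of the story with unmapped legacy fields preserved."""
    extras = {key: value for key, value in legacy.items() if key not in MAPPED_FIELDS}
    result = dict(ans_story)  # shallow copy so the input story is not modified
    if extras:
        merged = dict(ans_story.get("additional_properties", {}))
        merged.update(extras)
        result["additional_properties"] = merged
    return result


legacy = {"title": "T", "paragraphs": [], "legacy_id": 42, "print_slug": "A1"}
story = with_additional_properties({"type": "story"}, legacy)
# story["additional_properties"] == {"legacy_id": 42, "print_slug": "A1"}
```

Because these fields are not searchable in the Content API, anything you need to query on should be mapped to a regular ANS field instead.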
Planning Your Migration
If you have any questions about rate limits for migration, contact Arc XP Customer Support. If your site(s) will be using Arc XP Themes, see Migrating Data for Themes as you are planning your mappings.
You will likely build a set of classes to process each type of content (images, tags, authors within an article). We recommend starting with basic data fields, then moving on to individual story elements, iterating through all of the elements in your legacy data. Start small, and then build up your handler library.
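The idea of a handler library that grows one element at a time could be sketched as a registry keyed by legacy element kind. This sketch uses plain functions instead of classes to stay short; the legacy "kind" values, field names, and handler names are all hypothetical, and the ANS output shapes are assumptions to verify against the schema.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[dict], dict]
HANDLERS: Dict[str, Handler] = {}


def handles(kind: str):
    """Decorator that registers a handler for one legacy element kind."""
    def register(func: Handler) -> Handler:
        HANDLERS[kind] = func
        return func
    return register


@handles("paragraph")
def paragraph_handler(element: dict) -> dict:
    return {"type": "text", "content": element["text"]}


@handles("image")
def image_handler(element: dict) -> dict:
    return {"type": "reference", "referent": {"type": "image", "id": element["image_id"]}}


def convert_elements(elements: List[dict]) -> Tuple[List[dict], List[str]]:
    """Convert known elements and report the kinds that have no handler yet."""
    converted, unhandled = [], []
    for element in elements:
        handler = HANDLERS.get(element["kind"])
        if handler is None:
            unhandled.append(element["kind"])
        else:
            converted.append(handler(element))
    return converted, unhandled


converted, unhandled = convert_elements([
    {"kind": "paragraph", "text": "Hi"},
    {"kind": "pullquote", "text": "Not handled yet"},
])
```

Reporting unhandled kinds instead of failing makes it easy to run the sample set, see which element types remain, and add handlers for them in the next iteration.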
Arc XP recommends that you convert your source data to ANS and send it in JSON format. Clients onboarding onto Arc XP have the option of using Migration Center, an Arc XP API that handles the work of deploying each ANS document to the correct Arc XP product for its ANS content type.