How to Ingest Wire Content Using an Inbound Wires Adapter
Arc XP supports feeding content into Composer, Photo Center, and Video Center from external sources. Such a source is called a “wire”. One well-recognized external provider of news content is The Associated Press. Each wire provider has unique requirements, credentials, and mappings. Arc XP clients need established contracts with the agencies providing the wire content, along with the credentials necessary to access it. Once these contracts are in place and the credentials have been provided, a client can begin to import the wire content into Arc XP.
Inbound Feeds Proof Of Concept Github Repository
How Wires Work
Wires are brought into Arc XP through an adapter, very much like when clients migrate content from an old CMS. The steps involved are:
Identify the wire source, credentials to access and method of content delivery.
Obtain samples of the wire source content in its delivered form: JSON, XML, story sources, image sources, etc.
Map the wire source content data to ANS fields.
Build an adapter that connects to the wire service, collects and transforms the content into ANS, and delivers transformed ANS to Arc XP through the Arc XP APIs.
Set up a way for the adapter to run on a schedule and regularly receive or poll for wire content.
Identify the Wire Source
The questions you’ll want to answer about the wire are:
What is the name of the agency?
What content is your organization entitled to access? How do you periodically query the agency's APIs for the latest subscribed content?
What types of content will be ingested (images, text, videos)?
How does the wire send content? An RSS feed?
What are the access credentials? These would include the wire’s feed urls, API keys, or other login credentials.
Does this wire require making requests to multiple different URLs and possibly using multiple different logins? For example, one URL may be used to fetch the wire story, but a separate URL may be used to fetch the wire images.
Content Mapping Wire Source to ANS
Note
If you use IFX or ever plan to use it, your published ANS must contain the following values for the story:update event to trigger your integration: published = true and additional_properties.is_published = true
You will require sample content of the wire’s source material for mapping. For each product type that returns a unique source schema, save a sample response into a file for use in development and testing. Ideally, gather as many unique samples as possible. For example, if stories sometimes have images and sometimes do not, collect at least one of each. Things to consider when planning your mapping code:
Bring in wire stories unpublished by default. Typically wire stories are brought in unpublished and a story is cloned to a new copy if it is chosen for use. The copy is then published.
Bring in wire images unpublished by default. Photo Center can delete unused photos once a particular date has passed, but this will only work if the image has not been published (see Wire Photos And Expiration).
Should the wire content expire at some point?
If you have multiple websites, to which sites should each wire go? All of them? Only a select few?
Would you like to circulate the wire into its own section within the website hierarchy? Many clients put them into /wires/agencyname
What WebSked status should these stories have? Wire content will be excluded from WebSked if imported with the recommended Draft API request header.
What distributor should stories from this wire use? You will need to set up this distributor in your Arc Global Settings if it does not already exist.
How often should the wire be polled? You will need to decide how often the wire content needs to be retrieved (every 30 minutes, 5 minutes, 1 hour, etc).
How are you going to keep track of what content has already been sent to Arc?
You will need to map fields from the wire to ANS, but only those fields that are needed to create a valid ANS document. You may also find it useful to have some examples of ANS content, so you can see what the target looks like. To get samples of ANS content, create some stories in Composer that look like what you expect to see out of an ingested wire. Then use the APIs to query for those examples.
Create a table where you list out each of the wire’s fields and to which ANS fields the data within them should be saved.
Look at all your source examples of the same product type to be sure that the values you are mapping against always exist and are in the same format in every response.
Determine if a value needs some massaging to display correctly in Arc XP. Note the high-level transformation logic needed in your mapping code.
Be aware there are some fields in ANS that are inserted automatically by Content API, so not every ANS field that will be in the Composer samples needs to be in the ANS you post. Focus on mapping source values and things that are unique to the organization’s publishing requirements or publishing workflow, and ANS fields that have been identified as specifically used by the front end implementation of the website.
Below is a representative, though incomplete, example of what the mapping table might look like were you planning to ingest content from the Associated Press.
Mapping the AP Story Data
| Source field | Sample Value | Destination (ANS) field | Conversion Instructions |
|---|---|---|---|
| | 1e0c675d2e5dfe50665d + {arc_org_id} | | Use this value + Arc organization id to hash + Base64 encode and turn into a standard Arc ID |
| | 1e0c675d2e5dfe50665d | | - |
| | BC-EU--Russia-Drug Shortages | | - |
| | Drug shortages persist in Russia after start of Ukraine war | | - |
| | [{"by":"By The Associated Press"}] | | Remove “By” at the start of the string |
| | 2022-04-04T02:51:56Z | | Convert to date-time. ANS uses UTC timestamps: 2019-07-30T23:46:27Z |
| - | wires | | This is a static value, hard coded |
| | - | | story text |
| | - | | Use the first picture in the associations array as the promo item |
| - | 1 | | WebSked status. Static value; determine the appropriate value |
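Two of the conversions listed in the table above, removing the leading “By” from a byline and normalizing a timestamp to UTC, could be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function names `clean_byline` and `to_utc_timestamp` are hypothetical, and the sketch assumes the wire delivers ISO 8601 timestamps like the sample value shown.

```python
import re
from datetime import datetime, timezone


def clean_byline(raw: str) -> str:
    """Remove a leading "By" (any letter case) from a wire byline string."""
    return re.sub(r"^\s*by\s+", "", raw, flags=re.IGNORECASE).strip()


def to_utc_timestamp(raw: str) -> str:
    """Parse an ISO 8601 timestamp and format it in UTC, e.g. 2019-07-30T23:46:27Z."""
    # Older Python versions of fromisoformat() reject a trailing "Z",
    # so it is replaced with an explicit UTC offset first.
    parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Keeping each conversion in its own small function makes it straightforward to test against every saved sample response.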
Mapping the AP Story Circulation Data
One complete circulation is needed for each website. See How to Bulk Circulate Documents in Draft API for more information.
Source field | Sample Value | Destination (ANS) field | Conversion Instructions |
---|---|---|---|
item.altids.itemid | 1e0c675d2e5dfe50665d + {arc_org_id} | document_id | same value as the story _id |
- | <Site Service site id> | website_id | Arc site service website id |
- | - | website_url | URL, if not allowing URL service to create the URL using its URL format rules |
- | - | website_primary_section | dictionary, an arc reference to the section where this story will be circulated as its primary location for this website |
- | - | website_sections | array of dictionaries, containing references to all the sections where this story will be circulated, including the primary section |
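As a rough sketch of the circulation mapping above, the following builds one circulation document per website. The function names are hypothetical, and the shape of the section reference (`type`, `referent`) is an assumption that should be checked against a circulation sample pulled from your own organization. `website_url` is left out here on the assumption that the URL service will generate it.

```python
def section_reference(section_id: str, website_id: str) -> dict:
    """Build a reference to a section (assumed reference shape, verify against real ANS)."""
    return {
        "type": "reference",
        "referent": {"type": "section", "id": section_id, "website": website_id},
    }


def build_circulation(document_id, website_id, primary_section, other_sections=()):
    """Return one circulation document for a single website."""
    # website_sections must include the primary section, listed once.
    section_ids = [primary_section]
    section_ids += [s for s in other_sections if s != primary_section]
    return {
        "document_id": document_id,  # same value as the story _id
        "website_id": website_id,
        "website_primary_section": section_reference(primary_section, website_id),
        "website_sections": [section_reference(s, website_id) for s in section_ids],
    }


def build_circulations(document_id, sections_by_website):
    """Build one complete circulation per website from a {website_id: section_id} map."""
    return [
        build_circulation(document_id, website_id, section_id)
        for website_id, section_id in sections_by_website.items()
    ]
```

For example, `build_circulations(story_id, {"site-a": "/wires/ap", "site-b": "/wires/ap"})` would return two documents, one for each website.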
Mapping the AP Image Data
Source field | Sample Value | Destination (ANS) field | Conversion Instructions |
---|---|---|---|
item.altids.itemid | 3cf60b0ea0a348e | _id, source.source_id | Use this value + Arc organization id + Arc content type to hash and turn into a standard Arc ID |
- | Associated Press | distributor.name | This is a static value, hard coded. Displays as a filter item in the Photo Center UI. |
- | wires | distributor.category | This is a static value, hard coded |
- | custom | distributor.mode | This is a static value, hard coded |
item.renditions.main.href | “https://api.ap.org/media/v/content/3cf60b0ea0a348e093b38faf630a4273.1/download?role=main&qt=xRauQ-hCPF” | additional_properties.originalUrl | Necessary for Photo Center to download and import the image |
item.originalfilename | Maverick_Red_Carpet_69618.jpg | additional_properties.originalName | Necessary if the URL used to download the image does not end in a normal extension like .jpg. If this value is not set, the imported image in Arc could end in .aspx or .1 or some non-standard characters, exactly as they appear at the end of the download URL. |
- | - | additional_properties.expiration_date | Date the photo should expire. ANS uses UTC timestamps: 2019-07-30T23:46:27Z |
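The image mapping above could be turned into a small converter along these lines. This is a sketch only: `build_photo_ans` is a hypothetical name, the Arc ID is assumed to have been generated elsewhere, and the result contains just the fields from the table. Any other fields that ANS requires for a valid photo document are not shown.

```python
def build_photo_ans(arc_id, wire_item_id, download_url, original_name,
                    expiration_date=None):
    """Map the wire image fields from the table into a partial photo ANS dict."""
    additional_properties = {
        # Photo Center uses this URL to download and import the image.
        "originalUrl": download_url,
        # Gives the imported file a sensible name when the URL has no normal extension.
        "originalName": original_name,
    }
    if expiration_date is not None:
        # UTC timestamp, for example 2019-07-30T23:46:27Z
        additional_properties["expiration_date"] = expiration_date

    return {
        "_id": arc_id,
        "source": {"source_id": wire_item_id},
        # Static, hard-coded values taken from the mapping table.
        "distributor": {
            "name": "Associated Press",
            "category": "wires",
            "mode": "custom",
        },
        "additional_properties": additional_properties,
    }
```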
Build the Adapter
The form of the adapter is difficult to dictate. How complex a form it takes depends on the client's priorities of scale, cost, and ease of maintenance. Clients could write a Python application (or use the server-side language of their choice) and run it on a physical server. Alternatively, they could use AWS services and engineer a combination of Serverless Framework, Lambda, DynamoDB, Parameter Store, API Gateway, and Step Functions to break the application into microservices and deploy them to operate in the cloud.
This process of ingesting wire content into Arc XP is composed of the same steps as when writing an adapter to migrate content from a historical CMS into Arc XP. The shape the code takes will be largely determined by the methods and costs with which the development team is comfortable. What follows is a high-level treatment of the processes that an adapter for importing wire content either will need or may find useful to include.
Retrieve Wire Content
Clients will need to store and retrieve sensitive values used to access the provider's API. It is not advisable to hard code these values in the adapter code. Instead, the credentials could be kept in an environment file stored locally on the server where the adapter runs, or in a parameter store service that is accessible through an HTTP request.
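One way to keep credentials out of the adapter code is to read them from environment variables at startup, as in the sketch below. The variable names `WIRE_FEED_URL` and `WIRE_API_KEY` and the function name are hypothetical. A parameter store lookup could replace the environment read without changing the callers.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is not configured."""


def load_wire_credentials(env=None):
    """Read wire credentials from the environment instead of the source code."""
    env = os.environ if env is None else env
    required = ("WIRE_FEED_URL", "WIRE_API_KEY")  # hypothetical variable names
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early and name the variable, but never log the secret values.
        raise MissingCredentialError("missing settings: " + ", ".join(missing))
    return {"feed_url": env["WIRE_FEED_URL"], "api_key": env["WIRE_API_KEY"]}
```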
Clients will need to access the wire content. They may do this by making HTTP requests to the wire endpoint. Perhaps they will know the id of a single wire story and fetch just that piece of content from an endpoint. Or, perhaps they will access a large feed of wire stories, and will need to first retrieve the group of story ids and then loop through to process every story in the group into Arc. If the wire content has links to additional content, the adapter may need to make additional requests to fetch the content of those references for processing.
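The feed-then-loop pattern described above might look like the following sketch. Everything about the wire's response shape is an assumption made for illustration: a feed with an `items` list, each entry carrying a `url`, each story optionally listing `references` as URLs, and an API key sent in an `x-api-key` header. A real provider's API will differ.

```python
import json
from urllib.request import Request, urlopen


def http_get_json(url, api_key):
    """Fetch a URL and decode the JSON body (the header name is an assumption)."""
    request = Request(url, headers={"x-api-key": api_key, "Accept": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def collect_wire_items(feed_url, api_key, get_json=http_get_json):
    """Fetch the feed, then each story, then any content the story links to."""
    feed = get_json(feed_url, api_key)
    stories = []
    for entry in feed.get("items", []):  # hypothetical feed shape
        story = get_json(entry["url"], api_key)
        # Follow links to additional content, such as images, for later processing.
        story["fetched_references"] = [
            get_json(reference_url, api_key)
            for reference_url in story.get("references", [])
        ]
        stories.append(story)
    return stories
```

Passing the fetch function in as `get_json` lets the loop be tested against saved sample responses without calling the wire service.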
Track Wires Already Processed
The adapter should process a wire story once. If the wire has been verifiably updated since it was previously fetched, then it can be updated. Devising ways to keep track of new content versus updates versus repeated content that can be ignored is tricky. But this should be worked out, or the consequences include:
Duplicate content in Arc XP, costing clients increased billing over what they are expecting
Waste of Arc XP rate limits, which could spill over into a decreased experience for the editorial users
Errors caused by exhausted Arc XP rate limits
You may want to keep an inventory database of the wire content that you have already processed. During the retrieval process you would check this inventory first to decide if further work is even necessary. If you want to avoid unnecessary processing but still allow for processing updates to existing wire content, you may want to store in the inventory database a SHA-1 hash of the wire content that has already been processed. If a SHA-1 hash is available, you can query the Arc XP version of the wire content, process it into a SHA-1 hash and compare it to your inventory’s stored copy. If the SHA-1 hashes are different, then the adapter could proceed to the next step in the process.
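A minimal sketch of such an inventory, using SQLite and SHA-1 from the Python standard library, is shown below. It takes one variation of the approach above: it hashes the freshly fetched wire payload and compares that with the stored hash, rather than hashing the copy held in Arc XP. The class and function names are hypothetical, and a production adapter might use a hosted database instead.

```python
import hashlib
import json
import sqlite3


def content_hash(wire_item: dict) -> str:
    """SHA-1 of a canonical JSON form, so key order does not change the hash."""
    canonical = json.dumps(wire_item, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


class WireInventory:
    """Tracks which wire items were processed and the hash of what was sent."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS inventory "
            "(wire_id TEXT PRIMARY KEY, sha1 TEXT NOT NULL)"
        )

    def classify(self, wire_id, wire_item):
        """Return 'new', 'updated', or 'unchanged' for a fetched wire item."""
        row = self.db.execute(
            "SELECT sha1 FROM inventory WHERE wire_id = ?", (wire_id,)
        ).fetchone()
        if row is None:
            return "new"
        return "unchanged" if row[0] == content_hash(wire_item) else "updated"

    def record(self, wire_id, wire_item):
        """Store the hash after the item was successfully sent to Arc XP."""
        self.db.execute(
            "INSERT OR REPLACE INTO inventory (wire_id, sha1) VALUES (?, ?)",
            (wire_id, content_hash(wire_item)),
        )
        self.db.commit()
```

Items classified as `unchanged` can be skipped before any Arc XP API call is made, which protects the rate limits discussed in this section.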
Transform Wire Content
The adapter will need a process that transforms the wire content from its source format into ANS. Clients will take the mapping document they created and turn it into a method, class, lambda, or other unit of work that takes the wire content source as input and returns a JSON document that is valid ANS, or possibly a Migration Center object (which also contains ANS).
The wire will provide story content and so you will need a converter that creates valid story ANS. If the wire provides photo content as well, then clients will need a converter that creates valid photo ANS, and if the wire provides video content then clients will need a converter that creates valid video ANS.
Clients will also need to prepare a separate circulation document for the story. They will need to have decided where the wire content will live in their website hierarchy. If the client organization has multiple websites, they will need to take into account in which websites and sections the wire content will live.
Resources
SHA-1 Data Verification: using cryptographic hashes to compare changes between two sets of data
Send ANS to Arc XP
Clients will have one or more units of ANS to send to the appropriate Arc application. They will create multiple HTTP requests to send the story and associated references to Draft API, Photo Center API, and Video Center API, or possibly a single HTTP request to Migration Center API instead, should they decide to pay the extra costs associated with that option.
Content Expiration
In order to set the story content to expire at a future time, clients need to access the Content Operations API.
In order to set wires photo content to expire at a future time, clients need to set specific values in the Photo ANS during the transformation process. No other special endpoints or APIs are needed, other than the same Photo Center API call that sends the Photo ANS in to Arc XP.
Video content cannot be set to expire at a future time.
Content Publishing
Unlike stories that are migrated from a historical CMS, wire content should come into Arc XP unpublished; this is strongly recommended. If you wish wire content to come into Arc XP published, the adapter will need to make an HTTP request to the Draft API endpoint that publishes a document. This will add to the use of your Draft API rate limit, reducing the number of stories you can bring in to Arc XP without experiencing an HTTP 429 over-the-rate-limit error. If your adapter is publishing wire content, you will also need to ensure that the same content item is not published over and over again. Not only would this waste your Draft API rate limit, but it could also delay the publishing operations generated from the Arc XP editorial applications used by the newsroom.
In order to have wires photo content come into Arc XP published, the adapter will need to set specific values in the Photo ANS during the transformation process. No other special endpoints or APIs are needed, other than the same Photo Center API call that sends the Photo ANS in to Arc XP.
Rate Limiting
To ensure stable performance, the adapter must control the rate at which it sends content to Arc XP APIs:
Draft API: the default API rate limit is three requests per second. This limit can be increased if there's documented throughput to justify the change. For adjustments, contact Arc XP Customer Support.
Photo Center API: While there's no enforced rate limit, Photo Center extracts metadata from each uploaded image directly in the API, rather than using a queue-based service. To avoid performance issues, we recommend limiting requests to one to two per second when importing images.
Video Center API: Though no official rate limit exists, Video Center relies on a time-sensitive transcoding service. Sending requests faster than the transcoder can process may cause a backlog, potentially delaying processing by days. To prevent this, we recommend limiting requests to two to three per second during video imports.
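A simple way for an adapter to stay under these limits is to space its requests evenly, as in this sketch. The `Throttle` class is hypothetical, and the clock and sleep functions are injectable only so the behavior can be tested without waiting. It assumes a single-threaded adapter; concurrent workers would need a shared limiter.

```python
import time


class Throttle:
    """Spaces calls so that no more than max_per_second are started per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self):
        """Block until the next request is allowed, then reserve the next slot."""
        now = self.clock()
        delay = self.next_allowed - now
        if delay > 0:
            self.sleep(delay)
        # The slot just used is either "now" or the reserved time, whichever is later.
        self.next_allowed = max(now, self.next_allowed) + self.min_interval
```

An adapter could create one throttle per API, for example `Throttle(3)` for Draft API and `Throttle(1)` for Photo Center API, and call `wait()` immediately before each request.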
Ingestion Headers
When making API calls to create Draft API stories, set the ingestion header Arc-Priority: ingestion.
curl --request POST \
  --url https://api.sandbox.arctesting1.arcpublishing.com/draft/v1/story/ \
  --header 'Arc-Priority: ingestion' \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{ans}'
There are two lanes of API traffic that come into Arc XP: a standard lane and an ingestion lane. Standard traffic receives priority and it is the lane that content created with the editorial tools uses. Ingestion traffic is everything else created by the Draft API. Traffic created by ingestion adapters should not come through on the standard lane unless there is a valid use case for this. If you believe you have a use case, contact Arc XP Customer Support. Arc XP reserves the right to disable an integration abusing the standard priority lane.
Content in the ingestion lane is not sent to WebSked and is therefore not available to be added to Collections.
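For an adapter written in Python, the same request as the curl example could be prepared with the standard library, as sketched here. The function name `build_draft_story_request` is hypothetical. The sketch only builds the request; sending it with `urllib.request.urlopen` and handling errors such as HTTP 429 are left to the adapter.

```python
import json
from urllib.request import Request


def build_draft_story_request(base_url, token, story_ans):
    """Prepare a Draft API story POST that travels in the ingestion lane."""
    return Request(
        url=base_url.rstrip("/") + "/draft/v1/story/",
        data=json.dumps(story_ans).encode("utf-8"),
        method="POST",
        headers={
            # Marks the traffic as ingestion rather than standard priority.
            "Arc-Priority": "ingestion",
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )
```

Building every story request through one function makes it harder for a code path to send adapter traffic on the standard lane by accident.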
Resources
Draft API Getting Started (for information about Arc-Priority header)
Run Adapter on a Schedule
Unless the intent is to request content from the wire source manually, clients will need a way to kick off the adapter process on a schedule. If clients are hosting the adapter on an internal server, they might use a cron job. AWS also has the ability to trigger events on automated schedules.
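Where neither cron nor a cloud scheduler is available, a long-running process can poll on a fixed interval. The sketch below is one hedged way to do that; `run_on_interval` is a hypothetical helper, and `job` stands for whatever function runs one full pass of the adapter.

```python
import time


def run_on_interval(job, interval_seconds, max_runs=None, sleep=time.sleep):
    """Run job repeatedly, pausing interval_seconds between runs.

    max_runs=None loops forever; a number is useful for tests and one-off runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as error:
            # One failed poll should not stop future polls.
            print(f"adapter run failed: {error}")
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

For example, `run_on_interval(poll_wire, 300)` would run a hypothetical `poll_wire` function every five minutes until the process is stopped.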
More Information
What values from the wire story should go into the creation of the Arc ID?
You should create a Base64-encoded hash of, at the very least, the wire’s unique id value and your Arc XP organization id. You may also decide to add the content type to the hash, to make the resulting value more distinctive.
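The sketch below shows one way to derive a stable ID from those values. Note the substitutions: it uses a 16-byte BLAKE2b hash and Base32 encoding rather than Base64, on the assumption that a 26-character ID made only of uppercase letters and digits is wanted. Both the function name and that format assumption are illustrative, so confirm the ID format your organization expects before relying on it.

```python
import base64
import hashlib


def generate_arc_id(wire_id: str, org_id: str, content_type: str = "") -> str:
    """Derive a deterministic 26-character ID from the wire id and organization id."""
    digest = hashlib.blake2b(digest_size=16)
    for part in (wire_id, org_id, content_type):
        digest.update(part.encode("utf-8"))
        # A separator keeps ("ab", "c") from hashing the same as ("a", "bc").
        digest.update(b"\x00")
    # 16 bytes encode to 26 Base32 characters once the "=" padding is removed.
    return base64.b32encode(digest.digest()).decode("ascii").rstrip("=")
```

Because the ID is derived only from its inputs, running the adapter twice on the same wire item produces the same ID, so a repeat run updates the existing document instead of creating a duplicate.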
The agency requires a static IP to fetch content. Does Arc XP support this?
No, Arc XP cannot provide static IPs.
What happens on expiry?
You can unpublish or delete the content.
What if the contract from the wire company doesn’t allow us to bring in images with stories?
Program the transformation logic to skip importing the images when checking for referenced content.
What happens once my wire is complete and my contract with the agency changes?
You will need to update the code in your environment to support those changes.