How to Use the Kinesis Mirror and Draft API to Auto-Tag a Document

In this guide, we’ll go over how to auto-tag stories that your organization produces using a combination of the Kinesis Mirror and Draft API.

Our code example and a more detailed technical explanation can be found in our Auto-Tagging Code Sample On GitHub.

Overview

After someone in your organization writes an article, you may want to apply tags to the story automatically. These tags could categorize articles by keyword, support SEO, or feed other parts of your Arc XP workflows.

In all use cases, the overall flow is the same: you need a way to listen for new stories, modify them by adding a tag, and then update (and possibly re-publish) the stories within Arc XP. To do this, we’ll combine the Kinesis Mirror and Draft API.

Kinesis Mirror

The Kinesis Mirror gives Arc XP clients a way to listen for events as they happen within Arc XP. It is an event data stream that lets you consume, in real time, the messages generated when Arc XP content is created, published, unpublished, or deleted.

In this example, we use the Kinesis Mirror to listen to all publish events for the organization, with the intention of tagging those stories as they are published. For more detailed information on the Kinesis Mirror and consuming events, see our Content Event Stream Documentation.
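As a rough illustration of picking publish events out of the stream, the sketch below filters a batch of record payloads. It assumes each payload has already been decoded to JSON bytes, and the field names `type`, `operation`, and `id`, along with the value `"publish"`, are placeholders rather than the documented message format; check the Content Event Stream Documentation for the real shape.

```python
import json
from typing import Iterable, Iterator, List


def parse_records(raw_records: Iterable[bytes]) -> Iterator[dict]:
    """Decode each payload as a JSON object, skipping anything malformed."""
    for raw in raw_records:
        try:
            event = json.loads(raw)
        except ValueError:  # covers invalid JSON and invalid UTF-8
            continue
        if isinstance(event, dict):
            yield event


def is_publish_event(event: dict) -> bool:
    # ASSUMPTION: hypothetical event shape, for example
    # {"type": "story", "operation": "publish", "id": "..."}.
    return event.get("type") == "story" and event.get("operation") == "publish"


def publish_events(raw_records: Iterable[bytes]) -> List[dict]:
    """Return only the events that look like story publishes."""
    return [e for e in parse_records(raw_records) if is_publish_event(e)]
```

Keeping the matching rule in one small function makes it easy to swap in different criteria later, such as reacting to newly created documents instead.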

Tagging through Draft API

Once you’ve found an event to react to, you can use Draft API to modify your document. The example linked here uses Draft API to get the latest revision of the document, add a tag, and then update the draft revision of the document. You can also use Draft API to re-publish the document, add more circulations, or make any other updates to it. You can learn all about Draft API in our Getting Started Guide.
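The "add a tag" step can be sketched as a pure function over the document content, independent of how the document is fetched or saved. This sketch assumes tags live in a `taxonomy.tags` list of objects with `text` and `slug` fields; the function name `add_tag` is ours, and where the document content sits inside a Draft API response is not covered here.

```python
import copy


def add_tag(document: dict, text: str, slug: str) -> dict:
    """Return a copy of the document with the tag appended to taxonomy.tags.

    If a tag with the same slug is already present, the copy is unchanged,
    so reprocessing the same event does not create duplicate tags.
    """
    updated = copy.deepcopy(document)  # leave the caller's document untouched
    # ASSUMPTION: tags are stored as taxonomy.tags = [{"text": ..., "slug": ...}]
    tags = updated.setdefault("taxonomy", {}).setdefault("tags", [])
    if not any(tag.get("slug") == slug for tag in tags):
        tags.append({"text": text, "slug": slug})
    return updated
```

Making the function idempotent matters because updating a document may itself produce new events, and the consumer could see the same story more than once.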

Workflow

Generally, auto-tagging in Arc XP should follow this workflow:

1. Set up a Kinesis Consumer that listens to events.
2. Configure the Consumer to find events that match certain criteria, like documents that were just published, created, etc.
3. Upon finding a matching event, use Draft API to retrieve the entire document.
4. Determine the correct tags for your document. This could be calling out to a third-party service, generating something on the fly, etc.
5. Use Draft API to update the Document with your tags.
6. (optional) Use Draft API to re-publish the Document.
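The workflow above can be sketched as a single loop with each step passed in as a function. This is only an outline under stated assumptions: every parameter name here (`matches`, `fetch_document`, `choose_tags`, `save_draft`, `publish`) is hypothetical, the Draft API calls themselves are left to the functions you supply, and events are assumed to carry the document ID under an `id` key.

```python
from typing import Callable, Iterable, List, Optional


def auto_tag(
    events: Iterable[dict],
    matches: Callable[[dict], bool],
    fetch_document: Callable[[str], dict],
    choose_tags: Callable[[dict], List[str]],
    save_draft: Callable[[str, dict, List[str]], None],
    publish: Optional[Callable[[str], None]] = None,
) -> List[str]:
    """Run the workflow for each event and return the IDs that were tagged."""
    tagged_ids = []
    for event in events:
        if not matches(event):               # step 2: keep matching events only
            continue
        doc_id = event["id"]                 # ASSUMPTION: ID is under "id"
        document = fetch_document(doc_id)    # step 3: retrieve the document
        tags = choose_tags(document)         # step 4: decide on tags
        if not tags:
            continue                         # nothing to apply, skip the update
        save_draft(doc_id, document, tags)   # step 5: update the draft
        if publish is not None:
            publish(doc_id)                  # step 6: optional re-publish
        tagged_ids.append(doc_id)
    return tagged_ids
```

Passing the steps in as functions keeps the loop testable without network access and lets you replace the tagging logic without touching the consumer.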

Code Sample

To show this in action, we’ve put together a code sample that walks through the basic workflow outlined above.

Considerations

In your auto-tagging system, there are some things to consider that this code sample does not address.

1. Publishing

Currently, the code sample only creates a new draft revision of your document. If you want your tag to apply to the published document, you’ll need to add an extra Draft API call to publish the document.

2. Concurrent Updates

Draft API and Composer do not share a locking mechanism. Because of this, changes made to a document through Draft API can cause subsequent saves in Composer to fail for the user. At this time, there is no workaround for this potential conflict.

3. Asynchronous Updates

The code sample currently adds the same tag to every story it encounters, which is a very quick tagging process. For this reason, the code does the tagging synchronously. This means that during the time it takes to tag a document, the consumer is not listening for new matching events. If your tagging function is long-running or involves a third-party system, you may wish to run the tagging asynchronously, for example by handing matching events to a background worker so the consumer can keep reading the stream.
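One way to decouple tagging from event consumption is a queue drained by a background thread. This is a minimal sketch using only the Python standard library, not the approach taken in the code sample; `start_tagging_worker` and `tag_document` are hypothetical names, and `tag_document` stands in for whatever slow tagging call you make.

```python
import queue
import threading
from typing import Callable


def start_tagging_worker(
    tag_document: Callable[[str], None],
    work_queue: "queue.Queue",
) -> threading.Thread:
    """Start a daemon thread that tags document IDs taken from work_queue.

    Putting None on the queue tells the worker to stop.
    """
    def run() -> None:
        while True:
            doc_id = work_queue.get()
            try:
                if doc_id is None:       # sentinel: shut the worker down
                    return
                tag_document(doc_id)     # potentially slow call
            except Exception as exc:     # one failure should not stop the worker
                print(f"tagging failed for {doc_id}: {exc}")
            finally:
                work_queue.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

The consumer then only calls `work_queue.put(doc_id)` for each matching event and returns to the stream immediately. A single worker processes documents in order; more workers increase throughput but make the concurrent update concern more likely.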

4. Tags & Composer

For a tag to appear to an editorial user in Composer, it must be created through the Tags API before being applied to a story. Tags that are added directly to a story without being created through Tags API will not show up in the Composer drop-downs. Depending on your use case, you may wish to add a call to Tags API for each unique tag you are trying to create.
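To avoid calling Tags API repeatedly for the same tag, you could track which tags you have already created. The sketch below assumes a caller-supplied `create_tag` function that wraps the Tags API request; that function, `ensure_tags_exist`, and the use of slugs as the unique key are all illustrative choices, not part of the code sample.

```python
from typing import Callable, Iterable, List, Set


def ensure_tags_exist(
    slugs: Iterable[str],
    known_slugs: Set[str],
    create_tag: Callable[[str], None],
) -> List[str]:
    """Create each tag not yet in known_slugs and return the slugs created.

    known_slugs is updated in place so later calls skip tags already handled.
    """
    created = []
    for slug in slugs:
        if slug in known_slugs:
            continue
        create_tag(slug)          # ASSUMPTION: wraps your Tags API call
        known_slugs.add(slug)     # record only after the call succeeds
        created.append(slug)
    return created
```

Calling this before applying tags to a story keeps the number of Tags API requests proportional to the number of unique tags rather than the number of stories.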

5. Hosting

Finally, the code sample is currently built to run locally as an example. As you build out your use case, you will also want to consider how and where to run your consumer so that it is listening for new events at all times.
