
Bring Your Own Analyzer (BYOA)

Bring Your Own Analyzer (BYOA) offers a solution for organizations generating content in languages not currently supported by editorial search in Arc XP. With BYOA, you can:

  • Create a custom language analyzer tailored to your specific needs for one or multiple sites in your Arc XP environment.

  • Easily test and update your custom language analyzer to ensure it's always optimized for delivering accurate results across your sites.

Note

While BYOA offers flexibility, it requires more time and expertise than using a supported language-based analyzer in Arc XP. For the full list of supported languages, see Languages supported in Arc XP.

Navigating language analyzers can be complex. To simplify, here's a breakdown of options available from Arc XP:

Option             | Configuration requirements | Arc XP support
Default Analyzer   | No configuration required  | Automatically set up by Arc XP as the default option.
Supported Analyzer | No configuration required  | Set up by Arc XP at your request.
BYOA               | Configuration required     | You configure your own analyzer, and then Arc XP sets up the analyzer at your request.
Custom code        | N/A                        | Arc XP does not provide support for this.

Supported analyzer

Arc XP supports language analyzers, also known as standard plugins in the OpenSearch community, which typically include:

  • Language analyzers provided by OpenSearch

  • Core analysis plugins specified here

BYOA

Arc XP's support for BYOA encompasses text analyzers, also known as open-source plugins in the OpenSearch community, as well as custom analyzers that you or third parties develop.

Open-source plugins

Open-source plugins are distributed by third parties, often through GitHub repositories, rather than by OpenSearch itself. Although the Elasticsearch open-source community contributes to many of these plugins, Arc XP requires a comprehensive audit by our security team before accepting any open-source software.

As an example, we offer a popular open-source analyzer plugin specifically designed for Vietnamese language analysis. Known as the Vietnamese Analysis Plugin for OpenSearch, this plugin has been reviewed and approved by Arc XP's security team. Therefore, it is fully supported as an option for Bring Your Own Analyzer (BYOA) within our platform.

Custom analyzers

Custom analyzers involve selecting and combining different analyzer components, granting you greater control over the process. You can create a custom analyzer using the appropriate combination of:

  • zero or more character filters

  • a tokenizer

  • zero or more token filters

Here's an example of a custom analyzer created using an N-Gram tokenizer.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "tri-gram": {
          "type": "custom",
          "tokenizer": "tri-gram"
        }
      },
      "tokenizer": {
        "tri-gram": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": ["letter"]
        }
      }
    }
  }
}
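To get a feel for what the tri-gram configuration above produces, here is a minimal Python sketch that approximates the tokenizer's behavior. It is an illustration only, not OpenSearch code: the function name ngram_tokens is hypothetical, and the sketch assumes that "letter" maps to Python's str.isalpha(). Text is split on any non-letter character, and every run of letters is broken into overlapping three-character tokens.

```python
from typing import List


def ngram_tokens(text: str, min_gram: int = 3, max_gram: int = 3) -> List[str]:
    """Approximate an ngram tokenizer restricted to letters (token_chars: ["letter"])."""
    tokens: List[str] = []
    run = ""
    # A trailing space acts as a sentinel so the last run of letters is flushed.
    for ch in text + " ":
        if ch.isalpha():
            run += ch
            continue
        # A non-letter ends the current run; emit every n-gram inside it.
        for start in range(len(run)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(run):
                    tokens.append(run[start:start + size])
        run = ""
    return tokens


print(ngram_tokens("Arc XP search"))
# prints ['Arc', 'sea', 'ear', 'arc', 'rch']
```

Note that "XP" yields no tokens because it is shorter than min_gram, and that case is preserved because this configuration has no lowercase token filter.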

For a detailed guide on creating custom analyzers by combining text analysis components, see Create a custom analyzer from Elasticsearch. When handing over the custom configuration to Arc XP, ensure you save it to a JSON file, just like in the example.
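Before handing over a configuration, it can help to check that it is valid JSON and that every analyzer points at a tokenizer that actually exists. The sketch below shows one way to do that with the tri-gram example. The helper names, the file name byoa-analyzer.json, and the short list of built-in tokenizers are all assumptions made for illustration; Arc XP does not prescribe them.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Sequence

# The tri-gram configuration from the example above.
TRI_GRAM_CONFIG: Dict[str, Any] = {
    "settings": {
        "analysis": {
            "analyzer": {"tri-gram": {"type": "custom", "tokenizer": "tri-gram"}},
            "tokenizer": {
                "tri-gram": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 3,
                    "token_chars": ["letter"],
                }
            },
        }
    }
}


def undefined_tokenizers(
    config: Dict[str, Any],
    builtin: Sequence[str] = ("standard", "whitespace", "keyword"),  # partial list, assumption
) -> List[str]:
    """List analyzers whose tokenizer is neither defined in the config nor a known built-in."""
    analysis = config.get("settings", {}).get("analysis", {})
    defined = set(analysis.get("tokenizer", {})) | set(builtin)
    missing = []
    for name, analyzer in analysis.get("analyzer", {}).items():
        tokenizer = analyzer.get("tokenizer")
        if tokenizer not in defined:
            missing.append(f"{name} -> {tokenizer}")
    return missing


def save_config(config: Dict[str, Any], path: Path) -> None:
    """Write the configuration as indented JSON."""
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")


if __name__ == "__main__":
    problems = undefined_tokenizers(TRI_GRAM_CONFIG)
    if problems:
        raise SystemExit(f"Unresolved tokenizers: {problems}")
    save_config(TRI_GRAM_CONFIG, Path("byoa-analyzer.json"))  # hypothetical file name
```

A check like this only catches structural mistakes; it does not replace testing the analyzer against real content.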

Here are a few more examples of BYOA:

Custom code

Arc XP does not support any code that you or third parties develop for text analysis that extends beyond the capabilities of supported analyzers or standard or open-source plugins. For instance, creating a Python script for text analysis falls under this category of unsupported custom code.

However, it's important to understand that configuring standard and open-source plugins by adjusting settings like custom stop words, tokenizers, and filters is considered safe and is supported by Arc XP. These configurations are not categorized as custom code. 
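As an illustration of configuration that stays on the supported side of that line, the sketch below builds an analyzer from built-in OpenSearch components only: the html_strip character filter, the standard tokenizer, the lowercase token filter, and a stop token filter with a custom word list. The names site_stop and news_text and the stop words themselves are placeholders chosen for this example.

```python
import json

# Hypothetical names ("site_stop", "news_text") and placeholder stop words.
CUSTOM_STOP_CONFIG = {
    "settings": {
        "analysis": {
            "filter": {
                # A stop filter configured with a custom word list.
                "site_stop": {"type": "stop", "stopwords": ["breaking", "update", "live"]}
            },
            "analyzer": {
                "news_text": {
                    "type": "custom",
                    "char_filter": ["html_strip"],  # zero or more character filters
                    "tokenizer": "standard",        # exactly one tokenizer
                    # Lowercase first so the stop list matches regardless of case.
                    "filter": ["lowercase", "site_stop"],
                }
            },
        }
    }
}

print(json.dumps(CUSTOM_STOP_CONFIG, indent=2))
```

Everything here is expressed as settings, with no code running inside the search cluster, which is what distinguishes it from unsupported custom code.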

Testing tool for BYOA

At Arc XP, we understand that creating custom analyzers requires effort. That's why we provide a tool to build and test language configurations using OpenSearch. The Content Search Validator is an OpenSearch Docker container with an Express.js app. It lets you load analyzers and data, and then search that data. For more information on how to use this tool, see the Developer Documentation and the demo video.
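If you have direct access to an OpenSearch node, you can also inspect an analyzer's output with the standard OpenSearch _analyze API. The sketch below is a rough illustration and makes several assumptions: the node address http://localhost:9200, the index name byoa-test, and the helper names are all hypothetical, and the index must already have been created with your analyzer settings. The Content Search Validator's own ports and endpoints are described in the Developer Documentation and may differ.

```python
import json
import urllib.request
from typing import List


def build_analyze_request(
    base_url: str, index: str, analyzer: str, text: str
) -> urllib.request.Request:
    """Build a POST request for the OpenSearch _analyze API on a given index."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(request: urllib.request.Request) -> List[str]:
    """Send the request and return only the token strings from the response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return [item["token"] for item in payload["tokens"]]


if __name__ == "__main__":
    # Assumed local node and index; adjust both to match your environment.
    req = build_analyze_request("http://localhost:9200", "byoa-test", "tri-gram", "Arc XP search")
    print(analyze(req))
```

Comparing the returned tokens with what you expect for a handful of representative headlines is a quick way to spot a misconfigured tokenizer or filter before requesting setup.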