How to fail gracefully with Arc XP

Encountering errors is a normal part of working with software. What matters is how you handle them. Arc XP includes safeguards that minimize user impact, provided you follow best practices. This guide outlines how to manage errors gracefully to maintain stability and performance in Arc XP environments.

Prerequisites

Before proceeding, ensure you have a strong understanding of the following Arc XP concepts:

Workflow overview

To fail gracefully in Arc XP, you must understand:

  • How caching behaves, including TTLs and stale content serving

  • How content sources should handle errors

  • How render errors can affect behavior and cache strategies

Caching

In Arc XP, some errors are treated as successful responses for caching purposes. The CDN caches these expected errors, which affects both rendered pages and client-side content source calls. For more detail, see Web Delivery Error Handling.

For a deep dive into how caching works, see Caching at Arc XP.

Why accurate error codes matter

Errors must match the actual condition. The wrong error code can harm SEO and degrade system behavior.

  • Do: If a section URL doesn't exist in Site Service, throw a 404 in the global content source. This allows proper caching and informs search engines that the content is intentionally not found.

  • Don't: Throw a 500 for a missing section. This suggests a server issue, triggers shorter cache TTLs, and can result in fluctuating responses. Over time, repeated 500s damage SEO and user trust.
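The "Do" case above could look roughly like the following TypeScript sketch. It assumes the engine reads the HTTP status from a `statusCode` property on the thrown error. That property name, the `httpError` helper, the `resolveSection` function, and the hard-coded section list are illustrative stand-ins, not Arc XP APIs; see Error Handling in Content Sources for the exact contract.

```typescript
// Assumption: the engine inspects a `statusCode` property on thrown errors.
interface StatusError extends Error {
  statusCode: number;
}

// Hypothetical helper that attaches an HTTP status to a plain Error.
function httpError(statusCode: number, message: string): StatusError {
  const err = new Error(message) as StatusError;
  err.statusCode = statusCode;
  return err;
}

// Stand-in for a Site Service lookup; a real content source would call the API.
const knownSections: string[] = ["/news", "/sports"];

function resolveSection(uri: string): { id: string } {
  if (knownSections.indexOf(uri) === -1) {
    // The section is intentionally absent, so report 404 rather than 500.
    throw httpError(404, `Section not found: ${uri}`);
  }
  return { id: uri };
}
```

With this shape, a request for an unknown section surfaces as a 404 instead of an unhandled exception that would be reported as a 500.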

Server-side: Serve stale on content source

As explained in Content Caching In PageBuilder Engine (Server-Side), if a content source returns an error, but a previously cached successful response exists, the engine returns that stale response for up to 72 hours.

  • Some errors, like 404, are treated as successful and override stale content.

  • Avoid catching all errors and defaulting to 404. This eliminates the opportunity to serve stale content meaningfully.

Serve-stale is helpful when upstream APIs are temporarily unavailable. For example, if Arc XP APIs return 429 rate-limit errors, the engine can serve stale content seamlessly during short outages.
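As a rough illustration of why a catch-all 404 is harmful, the sketch below keeps the upstream status intact so that only genuinely missing content is reported as 404. The `UpstreamResponse` shape, the `loadStory` function, and the `statusCode` property are assumptions made for this example, and the upstream call is synchronous only to keep the sketch short.

```typescript
// Assumption: the engine inspects a `statusCode` property on thrown errors.
interface StatusError extends Error {
  statusCode: number;
}

function httpError(statusCode: number, message: string): StatusError {
  const err = new Error(message) as StatusError;
  err.statusCode = statusCode;
  return err;
}

// Hypothetical shape of an upstream API response.
interface UpstreamResponse {
  status: number;
  body: string;
}

function loadStory(fetchUpstream: () => UpstreamResponse): string {
  const res = fetchUpstream();
  if (res.status === 200) {
    return res.body;
  }
  if (res.status === 404) {
    // Genuinely missing content: 404 is the accurate answer.
    throw httpError(404, "Story not found");
  }
  // 429, 5xx, and similar failures are likely transient. Preserving the
  // status (instead of collapsing to 404) leaves room for serve-stale.
  throw httpError(res.status, `Upstream request failed with ${res.status}`);
}
```

The design choice is simply to avoid a blanket `catch` that rewrites every failure as "not found"; a rate-limited request stays a 429.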

Often, users don't notice these temporary issues unless content appears outdated for too long or fails to update altogether.

Client-side: Serve stale on CDN

CDNs cache both the HTML from the server-side render and client-side content source requests. For more information, see How Does Rendering Work in PageBuilder Engine.

When PageBuilder Engine triggers serve-stale, the CDN remains unaware until its own cache expires. If a new request then fails, the CDN extends the stale content's life, but only if a previously cached copy exists.

Limitations:

  • If content has never been requested (for example, an obscure article), there may be no cached version to serve.

  • Serve-stale from the CDN applies only to client-side fetches and to rendered HTML when a render error occurs.

In cases of render failures, particularly in outputType rendering, a 500 is returned, and the CDN serves stale content for up to 72 hours.
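The behavior described in this section can be approximated with a toy model. This is not the CDN's implementation: the two-minute TTL is made up, the 72-hour window is measured from when the copy was stored (an assumption), and the special handling of cached 404s is left out.

```typescript
interface CacheEntry {
  body: string;
  storedAt: number; // milliseconds
}

interface OriginResponse {
  status: number;
  body: string;
}

type Origin = () => OriginResponse;

const TTL_MS = 2 * 60 * 1000; // illustrative freshness window, not a real setting
const STALE_WINDOW_MS = 72 * 60 * 60 * 1000; // 72-hour serve-stale window

function respond(
  cache: { entry?: CacheEntry },
  origin: Origin,
  now: number
): { status: number; body: string; stale: boolean } {
  const entry = cache.entry;
  // Fresh hit: answer from cache without contacting the origin.
  if (entry && now - entry.storedAt < TTL_MS) {
    return { status: 200, body: entry.body, stale: false };
  }
  const res = origin();
  if (res.status === 200) {
    cache.entry = { body: res.body, storedAt: now };
    return { status: 200, body: res.body, stale: false };
  }
  // Origin failed: fall back to the old copy if one exists and is recent enough.
  if (entry && now - entry.storedAt < STALE_WINDOW_MS) {
    return { status: 200, body: entry.body, stale: true };
  }
  // Nothing cached, or the stale window has passed: the error reaches the user.
  return { status: res.status, body: res.body, stale: false };
}
```

The last branch corresponds to both limitations listed above: content that was never requested has no entry to fall back on, and an entry older than the window is no longer served.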

Error logs include the exact path and feature ID where the render failed.

For example, if a bad deployment breaks all article renders, serve-stale ensures the site still delivers cached articles. Without it, articles would fail progressively as caches expire. This gives you a 72-hour window to revert the deployment and investigate, minimizing disruption.

Monitor the PageBuilder Performance Dashboard or Engine log forwarding after any deployment. Serve-stale masks immediate breakage, so logs become your first line of defense.

In content sources

Effective content source management is key to resilient systems. This section addresses caching, error handling, and failure mitigation strategies.

Partial caching

Partial Caching with PageBuilder Engine allows different content segments to have distinct TTLs and serve-stale policies. This minimizes disruption if one part fails, while others continue to serve fresh content.

Use partial caching when:

  • Some content is static or rarely changes

  • External APIs are unreliable

  • System resilience is a priority
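To make the idea concrete, here is a conceptual model of segments with independent TTLs and fallbacks. It does not use the PageBuilder Engine partial caching API; the `Segment` type, the `readSegment` function, and the TTL values are hypothetical and only illustrate why one failing segment need not affect the others.

```typescript
interface Segment<T> {
  ttlMs: number; // how long a loaded value counts as fresh
  load: () => T; // may throw when the upstream is unavailable
  cached?: { value: T; storedAt: number };
}

function readSegment<T>(segment: Segment<T>, now: number): T {
  const cached = segment.cached;
  // Still fresh: skip the upstream call entirely.
  if (cached && now - cached.storedAt < segment.ttlMs) {
    return cached.value;
  }
  try {
    const value = segment.load();
    segment.cached = { value, storedAt: now };
    return value;
  } catch (err) {
    // This segment failed: reuse its last good value if there is one.
    if (cached) {
      return cached.value;
    }
    throw err;
  }
}

// Rarely changing data gets a long TTL; volatile data gets a short one.
const navigation: Segment<string> = {
  ttlMs: 60 * 60 * 1000,
  load: () => "navigation",
};

let headlinesAvailable = true;
const headlines: Segment<string> = {
  ttlMs: 60 * 1000,
  load: () => {
    if (!headlinesAvailable) {
      throw new Error("headlines upstream unavailable");
    }
    return "headlines";
  },
};
```

If the headlines upstream goes down after a successful load, `readSegment(headlines, now)` keeps returning the last good value while `readSegment(navigation, now)` is unaffected.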

Content Source API management

When modifying content source names or parameters:

  • Follow Content Source API best practices

  • Update resolvers and associated configurations

  • Validate caching behavior to prevent breaking changes

Error handling

Poor error handling breaks the serve-stale mechanism, misleads caching logic, and harms SEO and data consistency.

Ensure that errors are classified and thrown correctly. Avoid catch-all 500 responses unless truly indicative of a server-side issue. See Error Handling in Content Sources.

In global content sources

Global content sources require advanced error management and a deep understanding of rendering contexts.

Rendering contexts

Isomorphic versus Server-Side versus Client-Side (SPA) Rendering explains the trade-offs of each approach:

  • Isomorphic - best performance and SEO

  • Server-side only - simpler, but slower user interaction

  • Client-side (SPA) - more dynamic, but limited SEO

Choose rendering strategies based on performance and SEO goals.

Custom error pages

Use custom error pages to replace default error messages with branded, user-friendly versions. This improves UX during unexpected outages or errors.

In components

Understanding how components behave during render, and what they log, is essential for troubleshooting and maintaining delivery integrity.

Rendering process

The How Rendering Works in PageBuilder Engine guide outlines how requests flow from the CDN to origin and back. This provides insight into where failures may occur and how fallback strategies like serve-stale activate.

Log analysis

How To Read Engine Logs explains how to monitor live systems, diagnose render errors, and use tools like Dozzle for local development.

Limitations

Despite robust failover mechanisms, certain limitations remain.

  • Caching limitations: Errors like 404 can be cached as success but may lead to long-lived stale content if overused. Over-caching can mask underlying issues.

  • Serve-stale content: Stale content may become outdated if errors persist beyond the 72-hour window. Users may not see updates even if systems recover.

  • Developer responsibility: Systems rely on developers to throw appropriate error codes, manage caching configurations carefully, and test changes in both Sandbox and Production environments.

  • Environment differences: Local and Production environments behave differently. Log tools, caching behavior, and error recovery vary. Always validate changes across environments.