
Easier, More Flexible, and Lower-Cost Deployment Strategies with Feature Toggles

It's crucial to have a range of deployment strategies so that new versions of your software are delivered to users efficiently and reliably. After reviewing other articles, we've summarized the common strategies below. (If you're unfamiliar with deployment strategies, see Baeldung's deployment strategies or Plutora's deployment strategies explained in depth for a comprehensive explanation.)

Recreate deployment is the simplest, but it can cause service downtime and exposes potential bugs to all users at once. Other strategies, such as blue/green, rolling, A/B testing, shadow, and canary, ensure zero downtime. Some of them use more resources (such as memory and CPU) to run both versions of the application simultaneously, which provides more confidence in a release and makes it easier to roll back to the old version if necessary.

However, we shouldn't treat hardware resources as if they're free or unlimited, especially during this challenging time for the software industry. As Pete Hodgson suggests in his article (Feature Toggles), we can use a feature toggle system to implement strategies such as canary or A/B testing while saving resources. This also removes a task that is difficult for developers unfamiliar with DevOps or SRE: setting up the continuous delivery tool or network components (like the load balancer) for the strategy. The only tasks left are configuring toggles and writing some code (a simple if/else or switch).

In this article, we'll cover:

- The necessary features for a toggle system to perform these tasks
- How to use a toggle system to implement different deployment strategies (blue/green, A/B testing, canary, shadow, and others)
- How to minimize the effort required to maintain toggles

Here's the open-feature-openflagr-example GitHub repository related to this article. Feel free to visit and leave any comments.

Requirements for a feature toggle system

When considering the use of a toggle system over complex release strategies, it's crucial to explore the various open-source and enterprise-level toggle systems available online. These include Unleash, Flagsmith, Flagr, LaunchDarkly, and others. You should choose a toggle system that meets the following minimum requirements:

- High RPS handling with dynamic evaluation: The toggle system should efficiently handle a high number of requests per second (RPS) when toggle states are evaluated through its API. This is important because evaluating the toggle state (on/off) should have minimal impact on core business performance.
- Dynamic configuration and persistence: The toggle system should allow settings to be adjusted dynamically, through either a user interface or an API. It should also persist these configuration changes across a server shutdown, so that behavior stays consistent across restarts.
- Feature-rich toggle evaluation API: The toggle evaluation API should provide the following features:
  - Targeting key support: The system should be able to distribute toggle results based on an identifier in the request. For example, it could use a hash algorithm to ensure that the same ID always receives the same result.
  - Evaluation context support: The system should be able to set constraints that decide the result. For example, if the region in the request payload is Asia, the toggle is on; if it's Europe, the toggle is off.

These are the bare minimum requirements for replacing deployment strategies by integrating our application with a toggle system. With them in place, the traffic configuration work moves into our regular development work on the codebase and can be reviewed together with the feature's pull request.
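
To make the targeting-key requirement concrete, here is a minimal sketch of how an identifier could be hashed into a stable bucket. It is an illustration under assumptions, not how Flagr or any other toggle system is implemented: the class TargetingKeySketch, its methods, the flag name new-checkout, and the choice of CRC32 are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper for illustration; not part of Flagr or any SDK. */
final class TargetingKeySketch {

    /** Maps a targeting key to a stable bucket in [0, 100). */
    static int bucketOf(String flagKey, String targetingKey) {
        CRC32 crc = new CRC32();
        // Mixing in the flag key keeps a user from landing in the same bucket for every flag.
        crc.update((flagKey + ":" + targetingKey).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    /** The toggle is on for the first rolloutPercent buckets and off for the rest. */
    static boolean isOn(String flagKey, String targetingKey, int rolloutPercent) {
        return bucketOf(flagKey, targetingKey) < rolloutPercent;
    }

    public static void main(String[] args) {
        // The same ID gets the same answer on every call.
        boolean first = isOn("new-checkout", "user-42", 25);
        boolean second = isOn("new-checkout", "user-42", 25);
        System.out.println(first == second); // true
    }
}
```

Because the bucket depends only on the flag key and the identifier, no per-user state has to be stored to keep the result consistent between requests.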

Deployment strategies with toggle

In this section, we'll demonstrate how to configure the toggle (using Flagr as an example) and show what the code might look like. I'll use basic if/else or switch statements for the demo, but in a real project this could be implemented more elegantly, for example with a strategy pattern. We'll start with a simple on/off toggle to perform a blue/green or shadow deployment. Then, we'll apply percentage-based rollouts on the toggle to achieve a canary release. Finally, we'll add constraints that evaluate the context (fields in the request payload) to implement A/B testing.

Here's a shared code snippet for the following demos:

public static String v1Feature() {
    return BLUE + "o" + RESET;
}

public static String v2Feature() {
    return GREEN + "x" + RESET;
}
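
The strategy-pattern alternative mentioned earlier could look roughly like the sketch below, where a lookup table replaces the if/else or switch. The class VariantDispatch and its run method are hypothetical, and the two helpers are stand-ins for the shared methods without the color codes.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch: dispatch by variant name instead of if/else or switch. */
final class VariantDispatch {

    // Stand-ins for the shared v1Feature()/v2Feature() helpers, without color codes.
    static String v1Feature() { return "o"; }
    static String v2Feature() { return "x"; }

    // One entry per variant; adding a variant means adding an entry, not a branch.
    private static final Map<String, Supplier<String>> HANDLERS = Map.of(
            "v1", VariantDispatch::v1Feature,
            "v2", VariantDispatch::v2Feature);

    /** Runs the handler for the given variant; unknown variants fall back to v1. */
    static String run(String variant) {
        return HANDLERS.getOrDefault(variant, VariantDispatch::v1Feature).get();
    }

    public static void main(String[] args) {
        System.out.println(run("v1"));      // o
        System.out.println(run("v2"));      // x
        System.out.println(run("unknown")); // o
    }
}
```

Falling back to v1 for an unknown variant is a design choice for this sketch: a typo in the toggle configuration then degrades to the old behavior instead of failing the request.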

Blue/green deployment (on/off)

The toggle configuration is quite straightforward for the blue/green and shadow scenarios: a single on/off toggle is enough.

Below is an example of what the code might look like for a blue/green deployment:

public static String v1Feature() {
    return BLUE + "o" + RESET;
}

public static String v2Feature() {
    return GREEN + "x" + RESET;
}
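
The listing above shows only the two shared helper methods. The toggle check that chooses between them could be sketched as follows. This is a minimal, self-contained stand-in rather than the demo's actual code: the class BlueGreenSketch and its handle method are hypothetical, the helpers drop the color codes, and a BooleanSupplier takes the place of a call such as client.getBooleanValue(FLAG_KEY, false, ctx).

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a blue/green switch driven by an on/off toggle. */
final class BlueGreenSketch {

    // Stand-ins for the shared helpers, without color codes.
    static String v1Feature() { return "o"; }
    static String v2Feature() { return "x"; }

    /** Serves v2 when the toggle is on and v1 otherwise. */
    static String handle(BooleanSupplier toggle) {
        return toggle.getAsBoolean() ? v2Feature() : v1Feature();
    }

    public static void main(String[] args) {
        boolean[] state = {false};           // the toggle starts off
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            if (i == 5) {
                state[0] = true;             // the toggle is turned on mid-run
            }
            out.append(handle(() -> state[0]));
        }
        System.out.println(out);             // oooooxxxxx
    }
}
```

The toggle is read on every request, so the switch takes effect without a redeploy, and turning it off again is the rollback.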

I initially set the toggle off and then turned it on while the loop was running. The app switches smoothly from the v1 feature to the v2 feature, just as we expected.

By doing this, we can save a significant amount of hardware resources since we don't need two distinct environments (blue and green) to run the different versions of the apps.

Shadow release (on/off)

In this example, we can use the same flag configuration as the blue/green deployment, but we set the toggle on initially. The following is an example of the code for a shadow deployment:

...
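// "off" is the default used when the flag cannot be evaluated
String version = client.getStringValue(FLAG_KEY, "off", ctx);

// v1 always serves the response
String message = v1Feature();
v1++;
if (version.equalsIgnoreCase("on")) {
    // v2 runs on a separate thread; its output is collected but never returned to the caller
    Thread newThread = new Thread(() -> {
        atomicString.accumulateAndGet(v2Feature(), String::concat);
        v2.getAndIncrement();
    });
    newThread.start();
}
System.out.print(message);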

Initially, both the v1 and v2 features are called. Suppose we find something wrong with the v2 feature and turn off the toggle during the iteration; from then on, v2 is no longer called.

Using a toggle system to perform a shadow release is a highly flexible and efficient method. It just requires adding a bit more complexity to the code and making a small effort to handle asynchronous operations.
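
As one way to handle the asynchronous part, the sketch below hands the shadow call to a thread pool instead of starting a new thread per request, and keeps shadow failures away from the caller. It is a swapped-in variation on the snippet above, not the demo's code: the class ShadowSketch, its methods, the pool size of 2, and the 5-second wait are all assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: v1 answers every request, v2 runs on the side. */
final class ShadowSketch {
    static final AtomicInteger shadowCalls = new AtomicInteger();
    static final AtomicInteger shadowFailures = new AtomicInteger();

    // Stand-ins for the shared helpers, without color codes.
    static String v1Feature() { return "o"; }
    static String v2Feature() { return "x"; }

    /** Always returns the v1 result; schedules v2 only when the toggle is on. */
    static String handle(boolean shadowOn, ExecutorService pool) {
        if (shadowOn) {
            pool.execute(() -> {
                try {
                    v2Feature();                      // result is discarded on purpose
                    shadowCalls.incrementAndGet();
                } catch (RuntimeException e) {
                    shadowFailures.incrementAndGet(); // never reaches the caller
                }
            });
        }
        return v1Feature();
    }

    /** Serves n requests with the given toggle state, then waits for the shadow work. */
    static String run(int n, boolean shadowOn) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < n; i++) {
            out.append(handle(shadowOn, pool));
        }
        pool.shutdown();                              // queued shadow tasks still run
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(5, true));             // ooooo
        System.out.println(shadowCalls.get());        // 5
    }
}
```

A bounded pool caps how much extra load the shadow path can add, which matters when the goal is to save resources in the first place.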

Canary release (percentage-based rollouts)

Now, let's introduce the distribution feature into the toggle's configuration for a canary release.

Here's an example of what the code might look like for a canary release:

...
UUID userId = UUID.randomUUID();
MutableContext ctx = new MutableContext(userId.toString());

String version = client.getStringValue(FLAG_KEY, "v1", ctx);

String message = "";
switch (version) {
    case "v1" -> {
        message = v1Feature();
        v1++;
    }
    case "v2" -> {
        message = v2Feature();
        v2++;
    }
}
System.out.print(message);
...

Given that the distribution is set to 3:1 (v1 = 75%, v2 = 25%) and that we provide a different targetKey for every request, the result is very close to the configured distribution.

But what if we used the same targetKey value, "tester", for every request?

The result stays the same for every request, since the same targetKey is always hashed to the same result (in this example, v2).

So, using a toggle system for a canary release is quite easy and straightforward. We can change the percentage at any time, as soon as we believe the new feature is stable enough to move to the next level.
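
The behavior described above can be reproduced without a toggle server. The following sketch is only an illustration of a percentage-based rollout, not Flagr's algorithm: the class CanarySketch, its methods, the user-N keys, and the use of SHA-256 are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a percentage-based rollout keyed on a targeting key. */
final class CanarySketch {

    /** Hashes a targeting key into a bucket in [0, 100). */
    static int bucket(String targetingKey) {
        try {
            byte[] h = MessageDigest.getInstance("SHA-256")
                    .digest(targetingKey.getBytes(StandardCharsets.UTF_8));
            // Use the first four bytes of the digest as an int.
            int v = ((h[0] & 0xFF) << 24) | ((h[1] & 0xFF) << 16)
                    | ((h[2] & 0xFF) << 8) | (h[3] & 0xFF);
            return Math.floorMod(v, 100);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** v2 owns buckets [0, v2Percent); v1 owns the rest. */
    static String variant(String targetingKey, int v2Percent) {
        return bucket(targetingKey) < v2Percent ? "v2" : "v1";
    }

    /** Counts how many of the keys user-0 .. user-(n-1) receive v2. */
    static int countV2(int n, int v2Percent) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (variant("user-" + i, v2Percent).equals("v2")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int total = 10_000;
        // Different keys: the share of v2 should land near the configured 25%.
        System.out.printf("v2 share at 25%%: %.1f%%%n", 100.0 * countV2(total, 25) / total);
        // Same key: the answer never changes between calls.
        System.out.println("tester -> " + variant("tester", 25));
    }
}
```

Because v2 always owns the lowest buckets in this sketch, raising the percentage only moves users from v1 to v2 and never back, so a user who has already seen the new feature keeps seeing it as the rollout widens.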

A/B testing (constraints on context)

Finally, let's implement A/B testing. We can add the final piece of a toggle system, constraints on the context, as shown below.

Here's an example of what the code might look like for A/B testing:

...
UUID userId = UUID.randomUUID();
MutableContext ctx = new MutableContext(userId.toString());
ctx.add("region", region);

String version = client.getStringValue(FLAG_KEY, "v1", ctx);

String message = "";
switch (version) {
    case "v1" -> {
        message = v1Feature();
        v1++;
    }
    case "v2" -> {
        message = v2Feature();
        v2++;
    }
}
System.out.print(message);
...

The constraints are as follows: all users from Asia use the v1 feature, users from Europe use the v2 feature, and users from other regions are split fifty-fifty between the two. As we can see in the report, the distribution matches our expectations.

Since we can adjust the constraints dynamically, it becomes extremely flexible and easy to control feature experiments, pilot features in a production environment, and so on.
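
The evaluation order behind this setup (constraints first, distribution as the fallback) can be sketched as below. This is an illustration of the rule ordering only, not Flagr's evaluation logic: the class AbTestSketch and its evaluate method are hypothetical, and the bucket parameter stands for a value in [0, 100) derived from the targeting key.

```java
import java.util.Map;

/** Hypothetical sketch: region constraints win, everything else is split 50/50. */
final class AbTestSketch {

    /**
     * @param context request fields, for example {"region": "Asia"}
     * @param bucket  a value in [0, 100) derived from the targeting key
     */
    static String evaluate(Map<String, String> context, int bucket) {
        String region = context.getOrDefault("region", "");
        if (region.equalsIgnoreCase("asia")) {
            return "v1";                      // constraint 1: Asia always gets v1
        }
        if (region.equalsIgnoreCase("europe")) {
            return "v2";                      // constraint 2: Europe always gets v2
        }
        return bucket < 50 ? "v1" : "v2";     // fallback: fifty-fifty by bucket
    }

    public static void main(String[] args) {
        System.out.println(evaluate(Map.of("region", "Asia"), 80));   // v1
        System.out.println(evaluate(Map.of("region", "Europe"), 10)); // v2
        System.out.println(evaluate(Map.of("region", "Africa"), 10)); // v1
        System.out.println(evaluate(Map.of("region", "Africa"), 80)); // v2
    }
}
```

In a real toggle system these rules live in the toggle configuration rather than in application code, which is what allows them to change without a redeploy.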

Minimizing toggle maintenance effort

As the development cycle progresses, toggle-related code snippets will spread all over the codebase, or even worse, across multiple repositories. The code then looks messy, and developers can easily get lost between toggle logic and core business logic. Furthermore, we might find that the chosen toggle system doesn't meet our expectations or raises security concerns, so that we need to switch to a different toggle system.

To address these complexities, it's crucial to introduce an additional layer of abstraction to the toggle logic to help the app perform toggle evaluation elegantly. Hence, the OpenFeature specification was created.

We won't cover too much about OpenFeature here, but here are the basic key concepts that you should know:

Implementing the client

Develop an exampleClient (where "example" stands for the toggle system we choose, for example, flagrClient), or use the SDK provided by the toggle system, as the API client that sends requests to the toggle system.

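import feign.Headers;
import feign.RequestLine;

// Declarative HTTP client (Feign annotations) for the toggle system's evaluation endpoint
public interface OpenFlagrClient {

    String BASE_PATH = "/api/v1/";

    @RequestLine("POST " + BASE_PATH + "evaluation")
    @Headers("Content-Type: application/json")
    V1EvaluationResponse evaluate(V1EvaluationRequest request);

}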

Develop an exampleFeatureProvider. This should cover the common (or at least the more reasonable) use cases for real-time toggle evaluation.

public class OpenFlagrProvider implements FeatureProvider {
// ...
  @Override
  public ProviderEvaluation<Boolean> getBooleanEvaluation(String key,
           Boolean defaultValue, EvaluationContext ctx) {

      V1EvaluationRequest request = buildRequest(key, ctx);

      V1EvaluationResponse response = flagrClient.evaluate(request);
      // A missing variant key is treated as an empty variant
      String answerVariant = response.variantKey() == null
              ? ""
              : response.variantKey().toLowerCase();
      // The toggle is on when the returned variant is one of the configured "on" keys
      boolean isOn = defaultOnToggleKeys.contains(answerVariant);

      return ProviderEvaluation.<Boolean>builder()
              .value(isOn)
              .variant(response.variantKey())
              .build();
  }

  @Override
  public ProviderEvaluation<String> getStringEvaluation(String key,
           String defaultValue, EvaluationContext ctx) {
      V1EvaluationRequest request = buildRequest(key, ctx);
      V1EvaluationResponse response = flagrClient.evaluate(request);
      // The variant key itself is the string value; a missing key becomes ""
      String answerVariant = response.variantKey() == null
              ? ""
              : response.variantKey();

      return ProviderEvaluation.<String>builder()
              .value(answerVariant)
              .build();
  }
// ... there are a lot of other methods

}

Configuring the client and OpenFeature

Next, register the exampleFeatureProvider with the OpenFeatureAPI instance, which is designed to support multiple FeatureProviders (each can be set or retrieved by name). Since I'm working with Spring Boot, I've built a class to hold the OpenFeatureAPI instance.

public class FeatureToggleApiProvider implements InitializingBean {
    @Autowired
    FlagrClient flagrClient;

    OpenFeatureAPI api = OpenFeatureAPI.getInstance();

    @Override
    public void afterPropertiesSet() throws Exception {
        OpenFlagrProvider openFlagrProvider = new OpenFlagrProvider(flagrClient);
        api.setProvider(openFlagrProvider);
    }

    public Client getFlagrApiClient() {
        return api.getClient();
    }

}

Make use of OpenFeature

Finally, other modules can use this OpenFlagrProvider to perform toggle evaluation by obtaining a Client (this interface is implemented not by the exampleClient but by OpenFeatureClient, which uses the exampleFeatureProvider we registered):

Client client = featureToggleApiProvider.getFlagrApiClient();

String version = client.getStringValue(FLAG_KEY, "v1", ctx);
// or
boolean toggleOn = client.getBooleanValue(FLAG_KEY, false, ctx);

What are the benefits?

This is a basic introduction to integrating a toggle system using the OpenFeature specification (for more details and the complete code, please check my GitHub repo). The toggle logic is extracted into a separate abstraction layer, allowing the main application to stay focused on core business logic and deployment strategies. Even if we need to change the toggle system one day, the application code won't need any changes; we only need to develop a new exampleClient and exampleFeatureProvider (or perhaps nothing at all if one already exists; check out the OpenFeature Ecosystem).
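
The benefit of the extra layer can be shown in miniature. The sketch below is a simplified analogue of the provider idea, not the OpenFeature SDK: the ToggleProvider interface, the Toggles facade, the render method, and the flag name demo-flag are all hypothetical, and the two providers are plain lambdas standing in for two different toggle systems.

```java
import java.util.Map;

/** Hypothetical, simplified analogue of a provider-based toggle abstraction. */
final class ProviderSwapSketch {

    /** Stand-in for a provider: one method instead of a full SDK interface. */
    interface ToggleProvider {
        String stringValue(String flagKey, String defaultValue, Map<String, String> ctx);
    }

    /** Facade the application depends on; it never names a toggle vendor. */
    static final class Toggles {
        private volatile ToggleProvider provider;

        Toggles(ToggleProvider provider) {
            this.provider = provider;
        }

        void setProvider(ToggleProvider provider) {
            this.provider = provider;
        }

        String stringValue(String flagKey, String defaultValue, Map<String, String> ctx) {
            try {
                return provider.stringValue(flagKey, defaultValue, ctx);
            } catch (RuntimeException e) {
                return defaultValue; // a failing toggle backend should not break the feature
            }
        }
    }

    /** Application code: identical no matter which provider is plugged in. */
    static String render(Toggles toggles) {
        String version = toggles.stringValue("demo-flag", "v1", Map.of());
        return version.equals("v2") ? "x" : "o";
    }

    public static void main(String[] args) {
        Toggles toggles = new Toggles((key, def, ctx) -> "v2"); // first toggle system
        System.out.println(render(toggles));                    // x
        toggles.setProvider((key, def, ctx) -> "v1");           // switch toggle systems
        System.out.println(render(toggles));                    // o
    }
}
```

Only the line that registers the provider changes when the toggle system is replaced; render and every other caller stay as they are.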

Summary

In this article, we've covered three points to know when we want to perform deployment strategies in a more flexible, easier, and more cost-effective way with feature toggles. First, our toggle system should support dynamic configuration with persistence, provide highly efficient dynamic evaluation, and support a targeting key and constraints on the request context (payload) during evaluation. Then, we showed the toggle configuration and code snippets for different kinds of deployment strategies. Finally, we introduced the OpenFeature abstraction layer to keep the codebase clean and make it more maintainable and flexible.

