The Elements of CI/CD

In this email we will explore one of my favourite techniques to super boost your continuous deployment: feature flags! 🚀

Feature flags can be used for releasing features gradually, implementing kill switches for parts of your system during major events, beta testing and allowlisting, and even A/B testing. It's a simple yet very powerful technique that every engineer and every team should be using.

I also would like to hear your opinion on the following questions, so please reply to this email with your suggestions:

  1. Would you like me to send emails like these more often than once a month? If yes, how often?
  2. Do you prefer long content emails like the past two I sent, or smaller ones and potentially more often?

The content below is just a small part of the dedicated section in the course, but it gives you a taste of what things are covered. 😉

Feature Flags

You should adopt and use feature flags as much as possible!

I am confident that you will learn a lot from this course, but if you take away only one thing, let it be this: you should use feature flags.

Feature flags allow you to deploy changes to your software rapidly, and continuously, as many times a day as you want, without sacrificing the reliability of your product.

When you use feature flags, deploying changes is only half the story, since no customer will be exposed to those changes by default. The second half of the story is that you selectively enable features for a subset of your customers. This sounds simple, but it really is a superpower! 💪🏼

Imagine that your product is an Android mobile application where users can search for whiskies, and ultimately buy bottles directly from the app. You are developing two new features that each need a few weeks of development and iteration.

Two different features, developed in parallel by different (or same) engineers.

In some teams, you would work on these features on separate git branches, and only merge into the production branch (the one that gets deployed) when the feature is ready to go live. This means that you will have two separate branches diverging more and more over time, and you also don't know if something else breaks until you merge back to the production branch.

Too late.

With feature flags, you can continuously merge your code changes to the production branch, and more importantly deploy them to production, while at the same time ensuring that nobody will get exposed to them before they are ready.

We do this by wrapping these in-progress/unfinished features with special conditions that check if the feature is enabled, and if yes, for which users, and only proceed if the current user is allowlisted for the feature.

In our whisky application above, let's assume that feature A will not be enabled at all, and feature B will only be enabled for customers in England since the payment providers we implemented so far and are ready to be tested are for English customers.

Your codebase would call the following Java snippet when displaying the checkout screen:

import java.util.ArrayList;
import java.util.List;

record User(String id, String country) {}

List<PaymentProvider> getPaymentProviders(
  User user,
  FeatureFlags ff) {
  List<PaymentProvider> providers = new ArrayList<>();
  // ... some code that adds payment providers already supported

  // Only offer the new provider to users the flag is enabled for.
  if (ff.isEnabled(FeatureName.PayProviderXYZEnabled, user)) {
    providers.add(new PaymentProviderXYZ());
  }

  return providers;
}

The above snippet ensures that PaymentProviderXYZ is included in the checkout only when the feature flag FeatureName.PayProviderXYZEnabled is enabled for the user being handled.

A partial implementation of the FeatureFlags class for our whisky application could look like the snippet below:

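import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableSet;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Targeting rules for a single feature.
record FeatureFlag(
  Set<String> countries,
  Set<String> userIds,
  boolean enabledForAll,
  boolean disabledForAll) {}

class FeatureFlags {
  enum FeatureName {
    PayProviderXYZEnabled,
    WhiskyRecommendationsEnabled
  }

  // Hardcoded rules: the payment provider flag is on for England only,
  // the recommendations flag is off for everyone.
  private final Map<FeatureName, FeatureFlag> rules = ImmutableMap.of(
    FeatureName.PayProviderXYZEnabled, new FeatureFlag(
      ImmutableSet.of("England"),
      Collections.emptySet(),
      false,
      false
    ),
    FeatureName.WhiskyRecommendationsEnabled, new FeatureFlag(
      Collections.emptySet(),
      Collections.emptySet(),
      false,
      false
    )
  );

  boolean isEnabled(FeatureName featureName, User user) {
    FeatureFlag ff = this.rules.get(featureName);
    // Flags without rules and globally disabled flags are always off.
    if (ff == null || ff.disabledForAll()) {
      return false;
    }
    // Otherwise the flag is on if it is on for everyone,
    // or the user matches by country or by ID.
    return ff.enabledForAll() ||
           ff.countries().contains(user.country()) ||
           ff.userIds().contains(user.id());
  }
}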

In the above implementation of FeatureFlags our features can be selectively enabled for users based on their country, and their ID.

I have seen teams and companies implementing feature flags in vastly different ways. There are implementations as simple as the above, and there are implementations that have complex rules as we will explore in following sections.

You can see that in a few lines we have a functioning feature flags system that allows you to selectively execute code based on the user, or any other condition you need. This means that you can safely merge your code changes even if the features you work on are unfinished, or even incorrect, as long as you wrap their entry point execution call with a feature flag condition check.

Feature flags are a superpower, and by using them you:

The rest of the chapter will focus on popular use-cases for using feature flags. Going through these use-cases will hopefully make the benefits and power of feature flags clear.

Dynamic configuration

Feature Flags are one use-case of dynamic configuration in our software applications. We are going to explore more examples of dynamic configuration in the course.

In the code snippets implementing the FeatureFlags class for our imaginary Android application, we used hardcoded rules for the features we wanted to conditionally enable.

This means that in order to update these rules, e.g. adding new user IDs to existing flags or adding new flags, we would need to go through the application CI/CD pipeline, and deploy the application itself to use the updated rules. Depending on the nature of our product this might not allow us to iterate on the feature flags themselves easily, and we wouldn't be able to roll out many changes per minute/hour.

To solve this issue, we need to move the definition of rules outside our main applications, into their own artefacts, with their own CI/CD pipeline that can be executed independently of the applications' pipelines.

A usual approach I have seen in practice is to put the feature rules in text files (e.g. JSON, YAML, TOML) and have a CI/CD pipeline deploy them to S3 buckets in different regions. Our applications then have to be adapted to periodically (e.g. every minute) fetch these configuration files, and recreate the rules inside the FeatureFlags class based on the latest configuration files. This exact flow was how one of our AWS Console feature flag systems worked a few years ago.
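To make the periodic refresh a bit more concrete, here is a minimal sketch of how it could look. It is only an illustration under my own assumptions: the RefreshingRules class, the simple "flag=country1,country2" line format (standing in for JSON/YAML), and the Supplier that stands in for "download the file from S3" are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper that keeps an in-memory copy of the flag rules fresh.
class RefreshingRules {
  // Stand-in for "download the latest rules file" (e.g. from an S3 bucket).
  private final Supplier<String> fetchConfig;
  // Readers always see a complete rule set, either the old one or the new one.
  private final AtomicReference<Map<String, Set<String>>> rules =
      new AtomicReference<>(Map.of());

  RefreshingRules(Supplier<String> fetchConfig) {
    this.fetchConfig = fetchConfig;
  }

  // Parses lines such as "PayProviderXYZEnabled=England,Scotland" (assumed format).
  static Map<String, Set<String>> parse(String text) {
    Map<String, Set<String>> parsed = new HashMap<>();
    for (String line : text.split("\n")) {
      String[] parts = line.trim().split("=", 2);
      if (parts.length != 2 || parts[0].isEmpty()) {
        continue; // skip blank or malformed lines
      }
      Set<String> countries = parts[1].isBlank()
          ? Set.of()
          : Set.copyOf(Arrays.asList(parts[1].split(",")));
      parsed.put(parts[0], countries);
    }
    return Map.copyOf(parsed);
  }

  // Fetches the latest file and swaps the rules in one atomic step.
  void refresh() {
    try {
      rules.set(parse(fetchConfig.get()));
    } catch (RuntimeException e) {
      // Keep serving the last good rules if the fetch or the parsing fails.
    }
  }

  // Refreshes immediately, then again every periodSeconds.
  // The caller is responsible for shutting the scheduler down.
  ScheduledExecutorService start(long periodSeconds) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(this::refresh, 0, periodSeconds, TimeUnit.SECONDS);
    return scheduler;
  }

  boolean isEnabled(String flag, String country) {
    if (flag == null || country == null) {
      return false;
    }
    return rules.get().getOrDefault(flag, Set.of()).contains(country);
  }
}
```

The important design choice is the atomic swap: a request never observes half-updated rules, and a failed fetch leaves the previous rules in place instead of disabling everything.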

An alternative would be to use a SaaS product like LaunchDarkly to configure your feature rules, and then in your application code you would call their APIs to get a decision.

There are myriad ways you can make your feature flags dynamically configured, but in all cases you want to have:

Not just for the server

Feature flag systems are not only for servers and backends. 🙌🏻

Even though the actual feature rules will indeed have to be served by some server API, at some point, they can be used for multiple application types.

You can use them on websites. When the static assets are served, you can inject the feature rules inside the HTML document. Or, you can provide a dedicated API that the website will call once loaded to fetch all enabled features for the user session, and update periodically. Or, you can retrieve the feature rules every time the user logs in or refreshes their session token.

You can use them on mobile applications. Hardcoded feature rules in the application itself can be used, but they can only be updated from inside the application (e.g. the user enabling an experimental feature), or when the application itself is upgraded. Usually, you provide a dedicated API that the application will call once loaded to fetch all enabled features for the user, and update periodically.

You can use them in offline applications. In cases where there is no network connectivity or calling remote APIs is not possible, we can still use feature flags in the hardcoded fashion we explored previously. The user of the application will need to take specific actions to enable or disable the features. For example, several CLI applications require specific options to enable experimental features (e.g. Node.js). On Android, you can enable the advanced Developer Mode by tapping a specific menu item a number of times.
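As a small illustration of the offline case, the sketch below lets the user opt in to experimental features through command-line arguments. The LocalFlags class and the "--enable-experimental=" option name are made up for this example and do not mirror any particular tool.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical offline flags: features are enabled only by explicit user action.
class LocalFlags {
  // Assumed option name, e.g. "--enable-experimental=recommendations".
  private static final String PREFIX = "--enable-experimental=";
  private final Set<String> enabled = new HashSet<>();

  // Collects every feature the user opted in to on the command line.
  static LocalFlags fromArgs(String[] args) {
    LocalFlags flags = new LocalFlags();
    for (String arg : args) {
      if (arg.startsWith(PREFIX) && arg.length() > PREFIX.length()) {
        flags.enabled.add(arg.substring(PREFIX.length()));
      }
    }
    return flags;
  }

  // Everything not explicitly enabled stays off.
  boolean isEnabled(String feature) {
    return enabled.contains(feature);
  }
}
```

The application would build LocalFlags once at startup and wrap experimental code paths with isEnabled, exactly like the server-side checks earlier.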

In general, with a bit of imagination you can use feature flags in any kind of software application we build. And you should.

Beta testing and allowlisting

One important aspect of the CI/CD flywheel is to get feedback from customers as early as possible, and iterate over the application by fixing issues and implementing new features.

Feature flags allow us to expose incomplete and in-progress features to a subset of our customers.

This makes it easy to iterate on our applications without worrying that we are going to negatively impact the rest of our customers. Not only that, but we can do it straight in our production environments, using real data, real dependencies, and real customers. The feedback and confidence we can get by using production directly is great.

Feature flags are a great way to implement early access programs for your products, and even paid tester programs where you allow organisations and individuals to test your product, and give you feedback, before releasing it to the public.

Feature rollout

Probably the most popular use-case of feature flags is to gradually, and safely, roll out a new feature. 🤞🏻

While the feature is being implemented, it is enabled only for certain internal users (or nobody), and only in our staging environments. Once it's ready for beta testing we enable it for a few customers, and for specific production environments. Once it's ready for full release, we enable it for each of our production environments, ideally not all-at-once to avoid breaking all customers in case a bug slipped through.
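One common way to avoid the all-at-once release, which the hardcoded rules earlier do not cover, is a percentage rollout. The sketch below is my own illustration under that assumption: the PercentageRollout class is hypothetical, and it uses a CRC32 checksum simply as a stable way to map each user to a bucket between 0 and 99.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical percentage-based rollout with stable per-user bucketing.
class PercentageRollout {
  // Maps a user to a bucket in [0, 100). The same feature and user always
  // land in the same bucket, so a user does not flip between on and off.
  static int bucket(String featureName, String userId) {
    CRC32 crc = new CRC32();
    crc.update((featureName + ":" + userId).getBytes(StandardCharsets.UTF_8));
    return (int) (crc.getValue() % 100);
  }

  // The feature is on for users whose bucket is below the rollout percentage.
  static boolean isEnabled(String featureName, String userId, int percentage) {
    return bucket(featureName, userId) < percentage;
  }
}
```

Because the bucket is stable, raising the percentage from 10 to 50 only adds users; everyone who already had the feature keeps it. Including the feature name in the hashed value means different features get different subsets of users.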

If the rollout completed successfully, we remove the code that does the condition check and always use the newly launched feature. If the rollout of the feature caused some regression, we can update the rules to disable the feature, and deploy that in order to quickly disable the feature and revert to the working version of the application.

This is the most basic, and arguably the most important use of feature flags.

Deliver value to your customers, safely, reliably, continuously!

A/B testing and experiments

A more complex use-case for feature flags is A/B testing (experiments).

A/B testing is when we want to experiment with different variations of the same feature, for example choosing the color of the checkout button between yellow, green, and blue. A/B testing systems usually provide extra features on top of what we have explored so far, but the underlying technology is often the same.

For example, at Amazon, for the retail website we had our own internal service for feature flags and experiments called Weblab [1][2].

In Weblab you could create a new feature, or experiment, where you didn't just specify the rules that enabled a feature. You could specify multiple treatments of the feature, and the rules per treatment.

For example, for the checkout button color example, you would have the Control (C) treatment, which is the default/existing case when the feature is disabled. Treatment T1 was the first option of the feature, e.g. yellow button. Treatment T2 was the second option of the feature, e.g. green button. Treatment T3 was the third option of the feature, e.g. blue button.

If you just wanted the feature flag functionality you were done, and in the code you would have something like below:

String treatment = flags.getTreatment(FeatureName.ButtonColor, user);
// Compare strings with equals(); == would compare object references in Java.
if ("C".equals(treatment)) {
  this.buttonColor = this.colorDefault;
} else if ("T1".equals(treatment)) {
  this.buttonColor = this.colorYellow;
} else if ("T2".equals(treatment)) {
  this.buttonColor = this.colorGreen;
} else if ("T3".equals(treatment)) {
  this.buttonColor = this.colorBlue;
}

If you wanted the A/B testing (experimentation) functionality, Weblab would also track key metrics that you specified for each of the treatments.

In the example above, we could track the number of button clicks for each treatment, and therefore we would be able to get concrete data on which button color performed better.
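To give a feel for how treatments and metrics fit together, here is a minimal sketch. It is not how Weblab works or what its API looks like; the Experiment class, the hash-based assignment, and the click counters are all assumptions I made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical experiment: assigns treatments and counts clicks per treatment.
class Experiment {
  private final String name;
  private final List<String> treatments; // e.g. C, T1, T2, T3
  private final Map<String, LongAdder> clicks = new ConcurrentHashMap<>();

  Experiment(String name, List<String> treatments) {
    if (treatments.isEmpty()) {
      throw new IllegalArgumentException("at least one treatment is required");
    }
    this.name = name;
    this.treatments = List.copyOf(treatments);
  }

  // The same user always gets the same treatment for this experiment.
  String getTreatment(String userId) {
    int index = Math.floorMod((name + ":" + userId).hashCode(), treatments.size());
    return treatments.get(index);
  }

  // Records the metric (a button click) against the user's treatment.
  void recordClick(String userId) {
    clicks.computeIfAbsent(getTreatment(userId), t -> new LongAdder()).increment();
  }

  long clicksFor(String treatment) {
    LongAdder counter = clicks.get(treatment);
    return counter == null ? 0 : counter.sum();
  }
}
```

Comparing clicksFor("T1"), clicksFor("T2"), and clicksFor("T3") against clicksFor("C") is the raw data behind "which button color performed better". A real system would also track how many users saw each treatment and check statistical significance.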

The Weblab implementation showcases that when there is a robust feature flag implemented, there is a lot of interesting functionality that is now made possible.

Kill-switch

Another common use of feature flags is the kill-switch. 🛑

In many important feature rollouts, and big events (e.g. Black Friday, Super Bowl, Christmas), you might want to have an easy way to enable/disable specific functionality in your application quickly. In the kill-switch use-case, we want to disable a feature, or immediately revert to a different implementation of a feature.

A kill-switch is very similar to disabling a feature whose rollout went wrong until it is fixed. But, instead of the feature flag being temporary, it's permanent.

One real-world example of a kill-switch feature flag I encountered was in Amazon Video. When we released super popular shows (e.g. The Grand Tour) the demand was very high. We had several kill-switches all over the codebase in order to quickly disable certain functionality that would allow the services to scale better if we had unexpectedly high traffic.

For example, one of those kill-switches would switch from using the normal server-rendering flow for the TV Show details page on the Amazon Video website, to using a more static view that would only show static information and allow you to stream the show.

Even though some features wouldn't be provided (e.g. customer reviews), this emergency measure would allow customers to watch the show, the main purpose of the page, without hammering the downstream services of the normal execution flow.
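A kill-switch like this can be sketched in a few lines. The code below is a hypothetical illustration, not the Amazon Video implementation: KillSwitch, ShowDetailsPage, and the rendered strings are names and values I invented for the example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical permanent flag that can be flipped at runtime.
class KillSwitch {
  private final AtomicBoolean engaged = new AtomicBoolean(false);

  void engage() { engaged.set(true); }
  void release() { engaged.set(false); }
  boolean isEngaged() { return engaged.get(); }

  // Runs the normal path, unless the switch is engaged,
  // in which case only the cheaper fallback runs.
  <T> T choose(Supplier<T> normal, Supplier<T> fallback) {
    return engaged.get() ? fallback.get() : normal.get();
  }
}

// Hypothetical page that degrades to a static view under heavy load.
class ShowDetailsPage {
  private final KillSwitch staticViewSwitch;

  ShowDetailsPage(KillSwitch staticViewSwitch) {
    this.staticViewSwitch = staticViewSwitch;
  }

  String render(String showTitle) {
    return staticViewSwitch.choose(
        () -> "full page for " + showTitle + " (with customer reviews)",
        () -> "static page for " + showTitle + " (stream only)");
  }
}
```

Because the suppliers are lazy, engaging the switch means the expensive normal path is never invoked, which is what takes the load off the downstream services. In practice the switch state would come from the dynamic configuration described earlier rather than from a local method call.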

In general, most feature flags are meant to be temporary, to control the rollout of new features. There are certain cases where we want the ability to dynamically change our application's behavior, and that's where permanent feature flags come into play, as kill-switches.


If you like this content, please share the course website www.elementsofcicd.com with people you know that would be interested in similar content, and prompt them to subscribe to the waitlist 🙏🏼

Be good, be healthy, and I am looking forward to sending you the next email in a few weeks. Also, don't forget to reply with your answers to the questions at the top 🙃

Thanks!

For the most up-to-date information on the course, visit ElementsOfCICD.com.


Original version of this email: https://ckarchive.com/b/68ueh8hkpzow8