Going from a single monolithic application to a set of small, independent microservices has clear benefits. Microservices enable reusability and make it easier to change and scale apps on demand. At the same time, they introduce new challenges. There is no longer a single monolith with all the business logic neatly contained and services communicating through simple local method calls. In the microservices world, communication has to go over the wire, with REST, gRPC, or some kind of eventing mechanism. You also need to find a way to get independent microservices to cooperate toward a common goal.
Choreography vs Orchestration
Should there be a central orchestrator controlling all interactions between services or should each service work independently and interact through events? This is the central question in the choreography vs orchestration debate.
In choreography, each service emits and receives events as it needs. There’s usually a central event broker to pass messages around, but it does not define or direct the flow of communication. This allows truly independent services, at the expense of a traffic flow that is harder to trace and manage, and error and retry policies that are harder to apply consistently.
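To make the idea concrete, here is a minimal Python sketch of choreography. It assumes a toy in-memory broker in place of a real one such as Pub/Sub; the `EventBroker` class and the two picture services are hypothetical, not Pic-a-Daily code. The broker only routes events by topic, and the order of work emerges from who subscribes to what.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBroker:
    """Toy in-memory broker: routes events by topic, but never decides
    which service runs next. That is left to the subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver synchronously to every subscriber of this topic.
        for handler in self._subscribers[topic]:
            handler(event)


broker = EventBroker()
log: List[str] = []


def analyze_picture(event: dict) -> None:
    # Hypothetical service: reacts to an upload, then emits its own event.
    log.append(f"analyzed {event['name']}")
    broker.publish("picture.analyzed", event)


def make_thumbnail(event: dict) -> None:
    # Hypothetical service: reacts to the analysis event.
    log.append(f"thumbnail for {event['name']}")


broker.subscribe("picture.uploaded", analyze_picture)
broker.subscribe("picture.analyzed", make_thumbnail)

broker.publish("picture.uploaded", {"name": "cat.jpg"})
print(log)  # ['analyzed cat.jpg', 'thumbnail for cat.jpg']
```

No single place spells out "analyze, then make a thumbnail": that sequence exists only implicitly in the subscriptions. Adding another reaction is one more `subscribe` call, which makes this style easy to extend but harder to trace.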
In orchestration, a central service defines and controls the flow of traffic between services. With centralization, it becomes easier to change and monitor the flow and apply consistent policies.
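For contrast, here is a minimal orchestration sketch in the same toy style. A hypothetical `orchestrate` function owns the whole flow, declared as an ordered list of steps, and passes each step's result to the next. The service functions are again hypothetical stand-ins for remote calls.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[dict], dict]]


def analyze_picture(data: dict) -> dict:
    # Hypothetical service: returns labels for the picture.
    return {**data, "labels": ["cat"]}


def make_thumbnail(data: dict) -> dict:
    # Hypothetical service: returns a thumbnail reference.
    return {**data, "thumbnail": f"thumb-{data['name']}"}


# The whole business flow is declared in one place, in order.
FLOW: List[Step] = [
    ("analyze", analyze_picture),
    ("thumbnail", make_thumbnail),
]


def orchestrate(flow: List[Step], data: dict) -> Tuple[dict, List[str]]:
    """Call each step in the declared order, passing the result along,
    and keep a trace of the steps that ran."""
    trace: List[str] = []
    for name, service in flow:
        data = service(data)
        trace.append(name)
    return data, trace


result, trace = orchestrate(FLOW, {"name": "cat.jpg"})
print(trace)                # ['analyze', 'thumbnail']
print(result["thumbnail"])  # thumb-cat.jpg
```

Here the sequence is explicit in `FLOW`, so changing the order or inspecting `trace` happens in one place.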
There are a number of cloud services and open source tools that can help with both choreography and orchestration approaches. In Google Cloud, Pub/Sub and Eventarc can be used for choreography of event-driven services, whereas Workflows is for centrally orchestrated services.
Orchestration with Workflows
Workflows is Google Cloud’s fully managed orchestration service. It orchestrates not only Google Cloud services, such as Cloud Functions and Cloud Run, but also any publicly available HTTP-based API, within Google Cloud and beyond.
Beyond orchestration of services, it has these built-in features:
- Flexible retry and error handling between steps for reliable execution.
- JSON parsing and variable passing between steps to avoid glue code.
- Expressions and conditions for deciding which steps to execute.
- Sub-workflows for modular and reusable workflows.
- Declarative authentication support for Google Cloud and external services.
- Connectors to Google Cloud services such as Pub/Sub, Firestore, Cloud Tasks, Secret Manager (and many more) for easier integration.
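Retries are one of the policies an orchestrator can apply uniformly across steps. The sketch below shows the general technique, retry with exponential backoff, in plain Python. It is not the Workflows API: `call_with_retry`, `TransientError`, and the parameter names are our own. Only errors marked as transient are retried; anything else fails fast.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type standing in for, e.g., an HTTP 503."""


def call_with_retry(
    call: Callable[[], T],
    max_retries: int = 3,
    initial_delay: float = 0.01,
    multiplier: float = 2.0,
    max_delay: float = 1.0,
) -> T:
    """Retry `call` on TransientError with exponential backoff.

    Other exceptions are not retried and propagate immediately.
    """
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            time.sleep(delay)
            delay = min(delay * multiplier, max_delay)
    raise AssertionError("unreachable")


# Usage: a flaky service that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_service() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("service unavailable")
    return "ok"


print(call_with_retry(flaky_service))  # ok
print(attempts["count"])               # 3
```

In Workflows itself, this kind of policy is declared on a step in the YAML definition rather than written as code in each service.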
Next, let’s take a look at a case study!
Case study: Pic-a-Daily
Over the past year, we developed a picture sharing application, named Pic-a-Daily, to showcase Google Cloud serverless technologies such as Cloud Functions, App Engine, and Cloud Run. It also comes with a hands-on workshop that shows how to build the application in a series of labs, whose source code is available as open source.
The Pic-a-Daily application evolved progressively. As new services were added over time, a loosely-coupled, event-driven architecture naturally emerged, as shown in this architecture diagram:
Services were loosely coupled and could be deployed and scaled independently, with no single point of failure. However, as we kept adding more services, we started losing sight of the underlying business flow and processes. It became harder to isolate and debug problems when something failed in the system.
When Workflows became generally available at the beginning of the year, it offered us an opportunity to re-architect our application and use an orchestration approach. In orchestration, instead of each service responding to events, Workflows calls services in a predetermined order.
After some restructuring, the following architecture emerged with Workflows:
In the orchestrated approach, Workflows takes care of executing the business flow in a series of steps. Some of the steps are executed by Workflows directly and others are delegated to Cloud Run and Cloud Functions services. A central workflows.yaml file captures the business flow and can be versioned and kept under source control. In the Cloud Console, you can see a visualization of the flow and which executions failed at which step, without having to dig through the logs of each service:
Workflows also ensures that each service call completes properly, and it can apply global error and retry policies.
Working with Workflows was refreshing in a number of ways and taught us some lessons that are worth sharing.
A central workflow definition with clear steps and executions of those steps allowed us to have much better visibility into service invocations.
In the original event-driven architecture, we had to deal with three types of events. In the orchestrated version, each service only had to handle a simple REST call with an HTTP POST body to parse, which resulted in simpler code. We also got rid of entire services and simply had Workflows make the API calls for us, resulting in less code overall.
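The difference in parsing effort can be sketched as follows. The three event shapes are hypothetical and deliberately simplified (they are not the real Cloud Storage, Pub/Sub, or audit log formats); the point is that an event-driven service may need one branch per event type, while an orchestrated service receives the same JSON body every time.

```python
import json


def handle_event(event: dict) -> str:
    """Event-driven style: branch on the (hypothetical) event type."""
    kind = event.get("type")
    if kind == "upload":
        return event["object"]["name"]
    if kind == "message":
        # Payload arrives as a JSON string nested inside the event.
        return json.loads(event["payload"])["name"]
    if kind == "audit":
        # Name must be extracted from a resource path.
        return event["resource"].rsplit("/", 1)[-1]
    raise ValueError(f"unsupported event type: {kind!r}")


def handle_post(body: str) -> str:
    """Orchestrated style: one JSON body, always the same shape."""
    return json.loads(body)["name"]


print(handle_event({"type": "audit", "resource": "buckets/pics/objects/cat.jpg"}))  # cat.jpg
print(handle_post('{"name": "cat.jpg"}'))  # cat.jpg
```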
In the original event-driven architecture, we had to create Pub/Sub topics and set up Cloud Scheduler and Eventarc to wire up services. All of this setup was replaced with a single workflows.yaml file in Workflows, resulting in much less setup.
Error handling was simplified. The flow of steps stops when an error occurs, so we were no longer in the dark about exactly which services succeeded or failed. We also now have the option of applying global error and retry policies at each step.
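Stopping at the first failure and recording the failing step is what gives this kind of visibility. Here is a minimal sketch, assuming steps are plain Python functions; `run_steps` and `StepFailed` are hypothetical names, not part of Workflows.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[dict], dict]]


class StepFailed(Exception):
    """Records which step failed and which steps completed before it."""

    def __init__(self, step: str, completed: List[str], cause: Exception) -> None:
        super().__init__(f"step {step!r} failed after {completed}: {cause}")
        self.step = step
        self.completed = completed


def run_steps(steps: List[Step], data: dict) -> dict:
    """Run steps in order; stop at the first error and name the failing step."""
    completed: List[str] = []
    for name, fn in steps:
        try:
            data = fn(data)
        except Exception as exc:
            # Later steps are skipped; the exception says where the flow stopped.
            raise StepFailed(name, list(completed), exc) from exc
        completed.append(name)
    return data


def resize(data: dict) -> dict:
    return {**data, "resized": True}


def label(data: dict) -> dict:
    raise RuntimeError("service unavailable")  # simulated failure


def store(data: dict) -> dict:
    return {**data, "stored": True}


failed: Optional[StepFailed] = None
try:
    run_steps([("resize", resize), ("label", label), ("store", store)], {"name": "cat.jpg"})
except StepFailed as err:
    failed = err

assert failed is not None
print(failed.step, failed.completed)  # label ['resize']
```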
As we were redesigning the architecture, an interesting code vs. YAML question came up over and over: “Should we run this functionality with code in a service or should we let Workflows make this call from the YAML definition?” In Workflows, more of the logic lands in the workflow definition file in YAML, rather than code in a service. Code is usually easier to write, test, and debug than YAML, but it also requires more setup and maintenance than a step definition in Workflows.
The last aspect to mention is the loss of flexibility inherent in orchestration. An event-driven architecture is fairly extensible, since a new service can subscribe to existing events without changes to the others, whereas an orchestrated solution mandates a strict series of steps.
At this point, you might be wondering: when should you choose choreography over orchestration? Both approaches are valid and have pros and cons. As a general rule, choreography is usually a better fit if services are not closely related or if they can exist in different bounded contexts. Orchestration is usually better if your services tend to run together in a certain order and you can describe the logic of your application as a flow chart that can be converted into a workflow definition.
We invite you to have a closer look at Workflows. If you want to study Pic-a-Daily, check out the Serverless Workshop and its open source code on GitHub. It offers codelabs spanning Cloud Functions, Cloud Run, App Engine, Eventarc, and Workflows. In particular, lab 6 is where we converted the event-based model into an orchestration with Workflows.
We look forward to hearing about your workflow experiments and questions. Feel free to reach out to us on Twitter at @glaforge and @meteatamel. And if you get a chance, attend our session about Choreography vs Orchestration in serverless microservices at the upcoming JAX London conference (4–7 October 2021)!