Disclaimer: This content is AI-generated and may contain inaccuracies. Please verify with the original source.
The podcast features a conversation with Jeu George (JG), co-founder and CEO of Orkes, discussing microservices orchestration.
Guest Introduction and Background
Jeu George transitioned from mechanical engineering to software during his master’s, sparked by an internship at India’s space organization. He later worked at Microsoft and then Netflix, where the foundation for Orkes and its product, Conductor, was laid.
What is Microservices Orchestration?
JG explains that microservices orchestration evolved from the need to manage the increasing complexity of microservices, a trend pioneered by Netflix as an early cloud adopter. As companies like Netflix rapidly developed numerous small, decoupled services, a new problem emerged: coordinating these services, especially when one service needed to call multiple others in a sequence, and handling potential failures in these chains.
Microservices orchestration, exemplified by tools like Conductor, operates at the application layer. It’s distinct from:
- Container orchestration (which is a lower-level infrastructure concern like spinning up containers).
- Service meshes (which JG treats as a separate concern, though he does not elaborate on the difference).
The orchestration layer glues different services together (e.g., service A calls B, then C), ensuring the overall workflow completes reliably.
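As a rough illustration of that "A calls B, then C" glue, here is a minimal sketch in plain Python. It is not Conductor's API: the `service_a`/`service_b`/`service_c` functions and the `run_workflow` helper are hypothetical stand-ins, and a real orchestrator would persist state and call remote services rather than local functions.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Tuple[str, Callable[[Context], Context]]

# Hypothetical stand-ins for three microservices (real ones would be remote calls).
def service_a(ctx: Context) -> Context:
    return {**ctx, "order_id": 42}

def service_b(ctx: Context) -> Context:
    return {**ctx, "reserved": True}

def service_c(ctx: Context) -> Context:
    return {**ctx, "notified": True}

def run_workflow(steps: List[Step], ctx: Context) -> Dict[str, Any]:
    """Run steps in order, passing each step's output to the next.

    Records which steps completed so a failure can be reported precisely.
    """
    completed: List[str] = []
    for name, step in steps:
        try:
            ctx = step(ctx)
        except Exception as exc:
            return {"status": "FAILED", "failed_step": name,
                    "completed": completed, "error": str(exc), "output": ctx}
        completed.append(name)
    return {"status": "COMPLETED", "completed": completed, "output": ctx}

result = run_workflow(
    [("A", service_a), ("B", service_b), ("C", service_c)], {}
)
print(result["status"], result["completed"])
```

The point of the sketch is that the sequencing and failure bookkeeping live in one place instead of being spread across services that exist only to call other services.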
Evolution and Need for Orchestration
Netflix encountered the challenges of managing numerous microservices early on due to its pioneering cloud adoption. This led to the creation of Conductor to manage the logic and flow between services. JG notes that many services were built just to call other services, creating complex dependencies and challenges around failure management and ensuring completion of tasks.
The need for such orchestration isn’t limited to large enterprises. Just as companies now opt for cloud providers instead of building data centers, or use managed databases instead of building their own, they are increasingly realizing the complexity and unreliability of building homegrown orchestration solutions. Moving from these homegrown systems to dedicated platforms like Conductor is a major trend.
Open Source and Conductor’s Journey
Netflix Conductor began as an open-source project. JG, one of Conductor’s first users at Netflix, clarifies that Netflix didn’t deprecate it. Instead, Orkes, co-founded by the creators of Conductor, collaborated with Netflix to move Conductor to the Conductor OSS Foundation. This allowed for broader community involvement and growth. Netflix itself has significantly increased its internal usage of Conductor.
The business model for open source has evolved:
- Initially, it was often about providing support (like Red Hat for Linux).
- With the cloud, it shifted to offering hosted, managed versions of open-source software (e.g., AWS hosting MySQL).
Orkes provides enterprise features, security, governance, and compliance on top of the open-source Conductor while maintaining backward compatibility with it. Customers choose this for reliability and to avoid the operational burden and opportunity cost of building and maintaining these features themselves.
Reliability and Failure Management
A primary driver for adopting orchestration platforms is reliability. The platform needs to ensure that once a workflow starts, it’s taken to completion. For critical operations like payments that must happen “once and only once,” the platform supports patterns like the Saga pattern to manage distributed transactions.
If parts of a distributed transaction fail:
- The system allows for retries with configurable logic (frequency, backoff).
- If retries fail or are not desired, users can define custom logic to “unwind” the transaction.
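The retry-then-unwind behavior described above can be sketched as follows. This is an illustrative toy in plain Python, not how Conductor implements it: `call_with_retries`, `run_saga`, the step names, and the default delays are all assumptions made for the example.

```python
import time
from typing import Callable, List, Tuple

def call_with_retries(action, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call action, retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted; the caller decides what happens next
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, then 1s, then 2s, ...

# Each saga step is (name, action, compensation).
SagaStep = Tuple[str, Callable[[], None], Callable[[], None]]

def run_saga(steps: List[SagaStep], sleep=time.sleep) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    log: List[str] = []
    completed = []
    for name, action, compensate in steps:
        try:
            call_with_retries(action, sleep=sleep)
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()  # custom "unwind" logic for an already-completed step
                log.append(f"undone:{done_name}")
            return log
        completed.append((name, compensate))
        log.append(f"done:{name}")
    return log

def ship_order():
    # Hypothetical step that always fails, to trigger the unwind path.
    raise RuntimeError("shipping service unavailable")

demo_steps = [
    ("reserve_inventory", lambda: None, lambda: None),
    ("charge_card", lambda: None, lambda: None),
    ("ship_order", ship_order, lambda: None),
]
# The no-op sleep skips real waiting in this demo.
demo_log = run_saga(demo_steps, sleep=lambda seconds: None)
print(demo_log)
```

Compensations run in reverse order so that the most recent side effect (the charge) is undone before the earlier one (the reservation).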
The platform provides retry mechanisms, concurrency controls, and rate limiters, and it can pass signals such as idempotency keys to the services it calls. Making each service idempotent (so that an operation has the same effect whether it runs once or several times) remains the developer’s responsibility. The platform also supports multi-AZ and multi-region deployments for high availability.
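To make that division of responsibility concrete, here is a small sketch of what a service might do with an idempotency key it receives. The `PaymentService` class and its in-memory store are hypothetical; a production service would typically keep the key-to-result mapping in durable storage.

```python
from typing import Dict

class PaymentService:
    """Toy service that deduplicates charges by idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}  # key -> receipt already issued
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A retried request carries the same key, so return the stored
        # receipt instead of charging a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        receipt = {"charge_no": self.charges_made, "amount_cents": amount_cents}
        self._results[idempotency_key] = receipt
        return receipt

payments = PaymentService()
first = payments.charge("order-42-payment", 1999)
retry = payments.charge("order-42-payment", 1999)  # e.g. an orchestrator retry
print(payments.charges_made, first == retry)
```

The orchestrator only has to resend the same key on a retry; the service itself enforces the "once and only once" effect.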
Scaling Microservices
The orchestration platform itself (like Orkes/Conductor) is built for high scale and reliability, drawing from Netflix’s demanding environment. For the services being orchestrated, the platform provides mechanisms like:
- Circuit breakers (to stop bombarding a failing service).
- Rate limiters.
- Exponential backoff for retries.
It can also provide auto-scaling signals to the orchestrated services via APIs, allowing them to adjust their capacity based on incoming workload (e.g., during traffic bursts like Prime Day).
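Two of the protective mechanisms listed above, a circuit breaker and a rate limiter, can be sketched in a few lines. This is a simplified illustration and not the platform's implementation: the class names, thresholds, and the token-bucket choice for rate limiting are assumptions, and the clock is injectable only to keep the example easy to test.

```python
import time

class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: allow one trial call; one failure re-opens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # stop calling the failing service
            raise
        self.failures = 0
        return result

class TokenBucket:
    """Rate limiter: allows bursts up to capacity, refilled at rate_per_sec."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would wrap each outbound request as `breaker.call(lambda: client.get(...))` (with `client` being whatever HTTP client is in use) and check `bucket.allow()` before sending.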
AI Agents and Future Evolution
The platform has evolved from a workflow engine for asynchronous backend tasks to an application platform supporting real-time API orchestration and business process orchestration. Most recently, it has incorporated capabilities for building and managing AI agents and agentic workflows.
Key challenges in enterprise AI adoption that the platform addresses include:
- Model Control & Data Leakage: A “prompt engineering studio” helps test prompts and understand data exposure.
- Accessibility: Enabling engineers with no AI expertise to call AI capabilities the same way they would call any other API.
- Private Cloud & Internal Data: Allowing agents to run in the customer’s cloud and connect to internal APIs and knowledge bases securely.
- Agent Reliability: Combining deterministic tasks (e.g., sending an email via an existing API) with LLM capabilities. The platform allows for guardrails, human-in-the-loop verification, and the ability to pick and choose when to use AI versus traditional APIs within an agent’s workflow.
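The agent-reliability idea (an LLM step, a guardrail, optional human approval, then a deterministic API call) might look roughly like this. Everything here is hypothetical: `fake_llm_draft` stands in for a real model call, the banned-term guardrail is deliberately simplistic, and `send_email` just appends to a list instead of calling a mail API.

```python
from typing import Callable, Dict, List

def fake_llm_draft(request: str) -> str:
    # Stand-in for a model call; a real agent would invoke an LLM here.
    return f"Hello, regarding your request: {request}"

def passes_guardrail(text: str, banned=("password", "ssn")) -> bool:
    """Very simple guardrail: reject drafts mentioning sensitive terms."""
    lowered = text.lower()
    return not any(term in lowered for term in banned)

def send_email(to: str, body: str, outbox: List[Dict[str, str]]) -> None:
    # Deterministic task: in practice this would call an existing email API.
    outbox.append({"to": to, "body": body})

def run_agent(request: str, to: str, outbox: List[Dict[str, str]],
              approve: Callable[[str], bool],
              llm: Callable[[str], str] = fake_llm_draft) -> str:
    draft = llm(request)                 # non-deterministic LLM step
    if not passes_guardrail(draft):
        if not approve(draft):           # human-in-the-loop verification
            return "REJECTED"
    send_email(to, draft, outbox)        # deterministic step
    return "SENT"

outbox: List[Dict[str, str]] = []
status_ok = run_agent("order status", "a@example.com", outbox,
                      approve=lambda draft: False)
status_flagged = run_agent("reset my password", "a@example.com", outbox,
                           approve=lambda draft: False)
print(status_ok, status_flagged, len(outbox))
```

Only the drafting step depends on the model; the guardrail, the approval gate, and the email send are ordinary code whose behavior is predictable.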
Orkes’ Differentiators
Orkes aims to replace homegrown solutions by offering superior reliability. Key differentiators include:
- Developer Choice: Supporting workflow creation via code, configuration, or UI.
- Language Agnostic: Supporting services written in various languages (Java, Go, C++, etc.).
- Deployment Flexibility: Offering Orkes cloud, bring-your-own-cloud, or on-premise deployments.
The core value proposition remains reliability.