Conversation

@acroca
Member

@acroca acroca commented Oct 22, 2025

This proposal adds a way to manage versioning within a workflow code where workflow users would be able to branch their code to introduce changes that would lead to non-determinism.

@acroca acroca force-pushed the workflow-branch-versions branch 2 times, most recently from 39ce4ad to f727df4 on October 22, 2025 08:51
@acroca acroca force-pushed the workflow-branch-versions branch from f727df4 to 1ba2afe on October 22, 2025 09:08
Signed-off-by: Albert Callarisa <[email protected]>
@WhitWaldo
Contributor

WhitWaldo commented Oct 22, 2025

Thank you for taking the time to write this counterproposal for workflow versioning!

I certainly agree that minimizing code duplication is a worthwhile goal, but I have several concerns about the approach outlined - particularly in terms of runtime safety, developer experience, and alignment with Dapr's deterministic workflow model.

Runtime Fragility Due to Version Ranges

Your proposal introduces a brittle contract between the workflow history and the executing code via GetBranchVersion("branch_name", min, max). While one might hope that workflows run regularly and eventually synchronize to newer versions, I don't think we can risk making that assumption in practice. We should instead assume that developers have dormant, long-running workflow instances that haven't migrated. If the history contains a version outside of the provided range when they are invoked, the workflow will fail - introducing a tight coupling between deployment correctness and runtime stability that assumes strong consistency where none exists.

You imply that older versions can be removed over time as workflows naturally converge towards newer branches. However, this again relies on an assumption of eventual consistency that simply isn't guaranteed in Dapr's workflow model. Without a mechanism to ensure that all in-flight workflows have completed or migrated, removing older versions risks breaking workflows that still depend on them. In practice, this means that developers must retain all historical branches indefinitely, which undermines your proposed maintainability benefits.
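For concreteness, here is a minimal sketch of the pattern being critiqued, assuming the GetBranchVersion(name, min, max) shape from the proposal; the context interface, workflow, and activity names below are hypothetical placeholders, not an agreed API:

```csharp
using System.Threading.Tasks;

// Illustrative sketch only. The context surface approximates the
// GetBranchVersion(name, min, max) shape described in the proposal;
// the workflow, activity names, and Order type are hypothetical.
public interface IVersionedWorkflowContext
{
    // During replay, returns the version recorded in history; for new
    // instances, returns the highest supported version (here, 3).
    int GetBranchVersion(string branchName, int min, int max);
    Task CallActivityAsync(string activityName, object input);
}

public record Order(string Id);

public class OrderWorkflow
{
    public async Task RunAsync(IVersionedWorkflowContext context, Order order)
    {
        // If history holds a version outside [1, 3], this call fails the replay,
        // which is the coupling described above.
        int version = context.GetBranchVersion("payment-step", min: 1, max: 3);

        switch (version)
        {
            case 1:
                await context.CallActivityAsync("ChargeCardV1", order);
                break;
            case 2:
                await context.CallActivityAsync("ChargeCardV2", order);
                break;
            default:
                await context.CallActivityAsync("ChargeViaProvider", order);
                break;
        }
    }
}
```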

Implicit Versioning Logic Hidden in Switch Statements

The reliance on switch statements to manage branching logic obscures the versioning model and makes it harder to reason about workflow behavior. It increases the risk of missing cases or introducing non-deterministic behavior, especially if inputs or outputs change subtly between versions. While your proposal rightly emphasizes guarding activity and child workflow calls, it overlooks the importance of the inputs to those calls. If those inputs change outside of a guarded switch, replay will fail due to mismatched persisted state — violating Dapr’s deterministic guarantees.

Even if developers are diligent about wrapping all changes in switch statements, the result is a tangled mess of branching logic that becomes increasingly unwieldy over time. And because of the eventual consistency issue mentioned above, you can’t ever remove old branches. This leads to bloated, fragile code that’s difficult to maintain — especially across teams working against the same file.

Lack of Type Safety and Explicitness

My proposal uses distinct types for each workflow version (e.g., MyWorkflow, MyWorkflow2, and so on; see the sketch after this list), which:

  • Makes versioning explicit and traceable
  • Leverages the compiler to catch errors at build time
  • Encourages clean separation of logic per version
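
A minimal sketch of what the distinct-type approach looks like (the context surface, workflow, and activity names are hypothetical):

```csharp
using System.Threading.Tasks;

// Illustrative sketch only: each version is its own registered type. In-flight
// instances replay against the type they started with; new instances target
// the newest type. All names and the context surface are hypothetical.
public interface IWorkflowContext
{
    Task CallActivityAsync(string activityName, object input);
}

public record Order(string Id);

public class OrderWorkflow
{
    public Task RunAsync(IWorkflowContext context, Order order) =>
        context.CallActivityAsync("ChargeCard", order);
}

public class OrderWorkflow2
{
    // The newer version reworks the payment path without touching OrderWorkflow,
    // so the compiler checks each version independently.
    public async Task RunAsync(IWorkflowContext context, Order order)
    {
        await context.CallActivityAsync("TokenizeCard", order);
        await context.CallActivityAsync("ChargeViaProvider", order);
    }
}
```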

Your approach keeps all the logic in one file and relies on runtime branching, which:

  • Reduces clarity
  • Makes testing and validation harder
  • Encourages mixing concerns and increases the risk of subtle bugs (like those cited in the last section).

Observability Challenges

Embedding multiple versions within a single workflow complicates observability. Tools like dapr workflow history will only show invoked activities and child workflows — not which version path was taken. This requires developers to have intimate knowledge of the code to interpret logs. My proposal, by using distinct types, naturally segments workflow histories by version, making debugging and monitoring far more straightforward.

Developer Experience and Maintainability

My proposal promotes a mindset of "version the whole workflow," which aligns with Dapr's deterministic guarantees. Your proposal encourages partial in-workflow versioning, which may seem convenient in simple examples but will likely lead to:

  • Fragmented logic
  • Increased cognitive load
  • Higher likelihood of introducing bugs during replay

While your counterproposal offers flexibility, I believe it introduces too much complexity and risk for a system that mandates determinism and reliability. Full-type versioning:

  • Preserves deterministic integrity
  • Makes versioning explicit and traceable
  • Simplifies observability and debugging

I recommend we continue exploring full-type versioning as outlined in my proposal as the foundation, potentially laying more granular strategies atop it once the core model is proven stable.

EDIT: I've augmented my original proposal to add a section directly addressing my perceived shortcomings about this proposal and why I believe mine remains the better approach. I certainly look forward to additional discussion about the idea.

@olitomlinson

Thank you for taking the time to make a proposal, it's very well written! 👏

I agree with @WhitWaldo's primary point here: Workflows will become spaghetti with switch statements over time, given their long-running nature. It may feel elegant and simple in the early iterations of a Workflow, but it will likely become unwieldy.

In this proposal, duplicating a workflow is cited as a negative. Respectfully, however, I believe this to be a benefit. Having a static version of the code that represents the workflow as it was authored is, IMO, better. It's not an apples-to-apples comparison, but think about how SQL migrations typically exist as discrete snapshots/files. Now imagine a SQL migration script that was frequently modified in-situ over several years using a similar branching/switch approach; it would be complete spaghetti, right?

@yaron2
Member

yaron2 commented Oct 24, 2025

First, I believe we should not be ignoring prior work of others, and instead build on the years of research and work done by other comparable solutions in the code-first workflow space, pretty much like what we did with the Durable Task design from the early days. This very problem has been debated before and the solution proposed here is essentially the recommended way to version workflows by both Temporal and Azure Durable Functions (which carries a significantly larger developer base, if not the biggest in the code-first workflow market today).

While I personally don't like if/else/switch statements in code to reason about which execution path to take, it is preferable IMO to strictly name-based conventions. That being said, there's nothing stopping us from documenting both approaches and letting users decide (albeit with a recommendation and pros/cons for each path).

More to the point:

Your proposal introduces a brittle contract between the workflow history and the executing code via GetBranchVersion("branch_name", min, max)

Having the ability to do version lookup that is explicit and runtime-visible is a plus IMO. Developers see the branch version being used, rather than hidden behind types. That visibility improves auditability and supports operational measures (alerts, dashboards) rather than relying on implicit assumptions.

In addition, a proper retirement strategy is required for any type of versioning strategy. Recent tooling additions to give workflows better visibility (CLI additions to list/view workflows) help here.

The reliance on switch statements to manage branching logic obscures the versioning model and makes it harder to reason about workflow behavior. ... Even if developers are diligent about wrapping all changes in switch statements, the result is a tangled mess of branching logic that becomes increasingly unwieldy over time

This is the essence of the divergence between the two proposals, and here it is clear that similar tools have come to recommend the if/else/switch statement approach, certainly not because of its elegance (I dislike it personally and I'm sure others do as well) but because the alternative (creating an entirely new workflow type per version) leads to code duplication, an increased number of types to maintain, and potential drift. From a maintenance standpoint, branching within a single workflow type reduces duplication: you only write the delta logic per branch rather than recreating the workflow skeleton each time. This then bleeds into unit tests, which also grow larger as the number of types increases with each new version.

Your approach keeps all the logic in one file and relies on runtime branching, which:

Reduces clarity
Makes testing and validation harder
Encourages mixing concerns and increases the risk of subtle bugs (like those cited in the last section).

Yes, a clear separation of logic is important — but that doesn’t necessarily require a new type for each version. You can still structure each branch version as its own class/module inside the same workflow namespace (e.g., BranchV1Handler, BranchV2Handler) and have the main workflow type simply delegate based on version. The runtime branching remains, but code remains modular and logically separated. I'd argue that the explicitness of versioning is preserved: the branching mechanism forces you to state “branch version X uses handler Y” and you record the version value early, so it's visible in code. With multiple types, you also need a mechanism to route incoming workflow instance IDs to the correct type, which itself is a runtime plumbing challenge. The branching model brings that routing into the workflow code itself, making it explicit rather than implicit.
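
To illustrate, a rough sketch of that delegation - the handler names are the ones above, and everything else (the context surface, workflow, and activity names) is hypothetical:

```csharp
using System.Threading.Tasks;

// Illustrative sketch only: one workflow type, but each branch version delegates
// to its own handler class so the per-version logic stays modular and testable.
public interface IVersionedWorkflowContext
{
    int GetBranchVersion(string branchName, int min, int max);
    Task CallActivityAsync(string activityName, object input);
}

public record Order(string Id);

public interface IPaymentHandler
{
    Task HandleAsync(IVersionedWorkflowContext context, Order order);
}

public sealed class BranchV1Handler : IPaymentHandler
{
    public Task HandleAsync(IVersionedWorkflowContext context, Order order) =>
        context.CallActivityAsync("ChargeCardV1", order);
}

public sealed class BranchV2Handler : IPaymentHandler
{
    public Task HandleAsync(IVersionedWorkflowContext context, Order order) =>
        context.CallActivityAsync("ChargeViaProvider", order);
}

public class OrderWorkflow
{
    public Task RunAsync(IVersionedWorkflowContext context, Order order)
    {
        // The branching stays in one obvious place: "branch version X uses handler Y".
        int version = context.GetBranchVersion("payment-step", min: 1, max: 2);

        IPaymentHandler handler = version switch
        {
            1 => (IPaymentHandler)new BranchV1Handler(),
            _ => new BranchV2Handler(),
        };
        return handler.HandleAsync(context, order);
    }
}
```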

Embedding multiple versions within a single workflow complicates observability. Tools like dapr workflow history will only show invoked activities and child workflows — not which version path was taken. Distinct types naturally segment workflow histories by version.

When the workflow begins we can log or annotate the branch version in the workflow’s custom status or metadata (for example, workflow.SetCustomStatus("branchVersion="+version)). That way tooling can pick up which version path was taken. The proposal can include this as a required instrumentation step, ideally implicit in the runtime. I'll defer to @acroca here.

Your proposal encourages partial in-workflow versioning … will lead to fragmented logic, increased cognitive load, higher likelihood of bugs during replay.

My take on this is different: the branching model is actually intended to reduce fragmentation by centralizing the workflow definition, avoiding proliferation of types and duplication of common logic. Devs will have fewer workflow types to understand and maintain.

I'd argue that workflows tend to evolve incrementally (small changes, bugfixes, enhancements) rather than wholesale new versions each time. The branching model aligns with that, enabling incremental changes without a full rewrite. The distinct-type alternative may force an unnecessary major version bump for each minor change, increasing burden.

@jjcollinge

jjcollinge commented Oct 24, 2025

I'd also point out that developing Workflows at most companies is not done by a single individual developer and typically requires incremental changes made concurrently by different people. You want those changes to conflict within the same workflow definition so you can resolve them with clarity - if everyone is copying code and updating distinct version types, you introduce a high possibility that you miss changes made by other developers.

@WhitWaldo
Contributor

WhitWaldo commented Oct 25, 2025

@yaron2 I certainly appreciate the thoughtful response!

While I personally don't like if/else/switch statements in code to reason about which execution path to take, it is preferable IMO to strictly name-based conventions. That being said, there's nothing stopping us from documenting both approaches and letting users decide (albeit with a recommendation and pros/cons for each path).

I've tried to be consistent that I don't think we need to have strictly name-based conventions for versioning. As I said at the top of this section in my proposal:

First, we introduce and document a new convention where workflow types are versioned by appending a numerical value to
the end of their names. Higher numerical values represent later versions (e.g. ExampleWorkflow100 represents a later
version of the ExampleWorkflow10 type). Other strategies might be accommodated by the SDKs themselves, but this is the
approach I'll use throughout the rest of this document.

That certainly doesn't limit my implementation strictly to that one approach - rather it opens the door for different SDKs to support whatever makes sense in their ecosystems. But it's also why I'm strongly advocating that the runtime be minimally involved in the process - different versioning schemes/styles will be better accommodated in different languages. The runtime here should maintain the state of which version was used by the app during which workflow executions (for replay purposes), but should otherwise not be involved in the scheme, leaving the various SDKs free to accommodate this as they locally see fit. In other words, I think it significantly limits the approaches that can be taken by the SDKs here for the runtime to be so involved in versioning, and it doesn't make sense here - versioning is strictly a "types-at-the-application-level" concept in Dapr (unlike other workflow frameworks) and as such should be managed by the SDK built atop the app - not by a centralized orchestrator dealing with the interplay between different applications.

I believe we should not be ignoring prior work of others, and instead build on the years of research and work done by other comparable solutions in the code-first workflow space, pretty much like what we did with the Durable Task design from the early days. This very problem has been debated before and the solution proposed here is essentially the recommended way to version workflows by both Temporal and Azure Durable Functions (which carries a significantly larger developer base, if not the biggest in the code-first workflow market today).

I certainly agree that it's worth learning how other frameworks address similar requirements, but I disagree that we should strive for implementation homogeneity, because the problems they're solving aren't strictly our problems. Here, I think we have unique opportunities for developing a cleaner solution strictly because of our sidecar/SDK approach.

I would posit that the if/else/switch approach looks great on paper (clear, concise divergences) and in toy examples and that it introduces wildly overcomplicated code in a real-world setting where you're not making single-line tweaks, but I have no hands-on experience with either of those frameworks myself and have only browsed their documentation.

When the workflow begins we can log or annotate the branch version in the workflow’s custom status or metadata

Certainly, and that's a route available to users regardless of which approach we take. But I wrote my proposal to make this a primarily SDK-based change with minimal work needed on the runtime because it's just not necessary - this proposal increasingly adds to that runtime work and resource load with no obvious value gain. Why does the runtime need to know what versions are available in an app? How would it (or even the SDK) possibly know the versions based on the information available when the app starts up? That's an app and workflow implementation detail and again, adds a brittle contract (both at runtime and development) without obvious value (except for its obvious necessity in building out the branches).

In addition, a proper retirement strategy is required for any type of versioning strategy. Recent tooling additions to give workflows better visibility (CLI additions to list/view workflows) help here.

I think it's an admirable goal, and not one that I considered in scope for my proposal. I'd argue this proposal assumes that developers already have that visibility and can meaningfully drop old versions without much pomp and circumstance, and that's frankly not true today. Nor do the tooling improvements capture or reflect custom status changes (not a trivial change, I might add), which would be necessary to surface version data in this proposal.

My take on this is different, the branching model is actually intended to reduce fragmentation by centralizing the workflow definition, avoiding proliferation of types and duplication of common logic. Devs will have fewer workflow types to understand and maintain.

That's part of my point - just as developers do not "maintain" SQL migration types, neither do they maintain workflows (in my proposal). They produce them and if they need to make changes, they introduce new versions (and by design, explicitly never go back to make changes once they're published). I'd personally advise that older workflow types be kept in a separate solution folder so they're out of the way (just as we do for migrations) so the only types the developer sees in the solution tree are those they're actively using - and when they open any of them, they have a clean, easy-to-understand workflow. They don't need to know what happened in previous versions - they only need to see what is happening in their current deployments. Again, I think we can certainly introduce CLI tooling not unlike EF Core Migrations that can simplify adding new versions to improve my experience - I don't see how similar tooling could be introduced to help here as it's all strictly runtime-based.

@jjcollinge I don't know how that experience differs from Oli's introduction of SQL migrations. Those have been around far longer than Temporal or Durable Task Workflows and teams of developers have readily embraced the concept just fine because the tooling hides all the migration complexities. With this proposal, there's no curtain to hide the complexities behind - it's all right there managed by all the cooks hanging out in the same kitchen and waiting for someone to screw it up in a non-obvious way. My proposal makes this strictly a Dapr-maintainer problem - all the nuts and bolts of how it works are handled behind the scenes.

I think your point raises a great opportunity to help teams here. My approach is focused on .NET, so other approaches may be more appropriate for other languages, but I don't see why a CLI tool creating a new version couldn't potentially mark older workflow versions as read-only and apply an analyzer that validates the same. Then it's as easy as the team agreeing to work on a new version, someone running the CLI tool, and everyone working on the same file as before - but without any of the stickiness of managing their own in-line branches.


Finally, I spent the afternoon putting together a good-faith implementation of an evolving workflow just to experience the difference in developing against the approach taken in each of the two proposals, and after four iterations, I stopped because the time it took to build each version using this approach was just growing exponentially and I encountered some implementation-detail questions not answered in the proposal. While I started by adding rich logging in each workflow, I ultimately abandoned updates there too because this approach required that I maintain far too many of them and introduce duplicates to accommodate the different signatures.

I encountered some lingering questions about the approach while modeling this out:

  • Where does a version come from? I assume that when the logic is deployed to production, that becomes the "latest" version and the workflow increments whatever version it was tracking to accommodate it, but how does the developer know at build time which versions have been registered in the runtime so they can statically map out their switch statements? Just assume a minimum version of 1, as I think removing older versions is a complex topic and out of scope for either of our proposals.
  • Assuming I'm right and given that we can only validate the values provided by Dapr and applied to the branches at runtime, we can only know at runtime if it'll fail or not. If I accidentally introduce a bug into production that breaks my workflow and it's registered as the latest version, how do I add another version that is guaranteed to always supersede the buggy version and do I have to persist the bug in older version branches?
  • I read the proposal to suggest that different teams are setting up different branches throughout the workflow so they can independently version their chunks of code. Given that, what happens when a version change needs to be applied across multiple branched sections bearing different names?

Circling back to my example then, you've never seen my code before because I just wrote it. That's perfect - it's like we can pretend you're a team member with no familiarity with the workflow code itself who's been tasked with tweaking something about it. Seriously - versioning philosophy aside, and strictly focusing on the developer experience in that moment, which is easier to grasp and requires less documentation to describe?

-- My Approach --

  1. Duplicate the latest version of the workflow
  2. Change the type name to accommodate whatever the configured versioning convention is
  3. Apply the changes you want to the workflow knowing that it's type-safe - any mismatched types across workflow versions will be caught at compile time
  4. Copy all your existing unit tests into another class to match your new version.
  5. Remove the tests that are no longer relevant to your change(s) and add new ones where appropriate

Note that steps 1, 2 and 4 could easily be automated via CLI tooling.

-- This Approach --

  1. Map out all logical paths in the inline file to figure out which one(s) might need an update to reflect what's changing
  2. Change the context-retrieved version numbers to reflect that you're adding a new version to at least one named version branch
  3. Add another case to the switch statement with your updated code
  4. You may need to refactor that section of the workflow and carefully rebuild the previous versions (not necessarily knowing anything about them) if yours is not a trivial one-line change
  5. Add more unit tests to validate any refactoring changes made and to reflect the new updates

I don't immediately see any opportunity for tooling to assist here.

Here, thinking about how to logically handle versioning becomes one of the most important and increasingly challenging parts of writing and maintaining a workflow and again, per my proposal, it needn't be. I still maintain my approach will introduce fewer inadvertent bugs, be easy to supplement with tooling, minimize the cognitive load on developers to understand and apply the feature and make for a tremendously better developer experience.

Signed-off-by: Albert Callarisa <[email protected]>
@acroca
Member Author

acroca commented Oct 27, 2025

The proposal can include this as a required instrumentation step, ideally implicit in the runtime. I'll defer to @acroca here.

I added a new section for observability and monitoring. In particular, the new metric suggested in that section would be a good way to know if a specific branch version is safe to delete: once the gauge gets to 0, that specific branch version will not be used again.

I think this should make users more comfortable deleting old versions.

@yaron2
Member

yaron2 commented Oct 27, 2025

@WhitWaldo It seems to me like we are focusing on different things when attempting to explain the rationale for both proposals. I feel as if you are mostly coming from a place that focuses more on what the runtime should do vs what the SDK should do, while I'm trying to come at it from a single user experience question: should Dapr Workflows allow users to version their workflows using patching? This current proposal allows users to patch existing workflows incrementally and address bugs instead of waiting and orchestrating complete cut-overs. That is the main reason why I think we should go down this route, and also why the other workflow orchestrators mentioned previously went this route. A user could always, if they wanted to, simply let the old workflows die and then create new types with new names.

Discussions about whether to duplicate code and types are valid, but in the end, it is more of a subjective user experience question. By adopting strictly this proposal, we lock users out of the ability to do that.

@WhitWaldo
Contributor

WhitWaldo commented Oct 27, 2025

@yaron2
I'm approaching this strictly from a place of simplifying the developer experience to add a broadly customizable feature of significant value without introducing a world of potential "foot guns".

Take the Temporal documentation you referenced: it spends considerable effort explaining patching and its caveats. In contrast, my approach aligns more closely with Temporal's Workflow Versioning concept, which is intuitive, aligns naturally with existing workflows, and takes only a single paragraph to explain in their docs. Dapr doesn't handle deployments, so excluding that aspect, the principle still applies.

Here's a brief rewrite of their paragraph that describes my concept:

Dapr's Workflow Versioning lets you define a versioning convention and write workflows as usual. Existing workflows replay old code paths; new ones follow updated paths - no patching required.

You mentioned cut-overs being complex, but in my weekend experiment, they were far simpler than patching. As long as workflow inputs/outputs remain consistent, you can rewrite as much or as little as needed. CLI tooling can smooth over some of the rougher steps ("create file, rename, move old file somewhere to archive"), and it's a trivial lift that provides a lightweight, repeatable solution.

Patching looks like it works fine for minor tweaks, but it breaks down with larger changes. My approach scales effortlessly - write workflows normally, opt into versioning whenever you want, get seamless support without needing complex interplay with the runtime.

@yaron2
Member

yaron2 commented Oct 27, 2025

without introducing a world of potential "foot guns"

I'd argue every approach, including both proposals, is "foot gun enabled".

it spends considerable effort explaining patching and its caveats

Actually, it's the other way around. Patching is introduced after explaining the caveats of name based convention versioning.

In contrast, my approach aligns more closely with Temporal's Workflow Versioning concept

It's not Workflow Versioning that you referenced, it's Worker Versioning, which leads to the next point: Dapr does not have the same concept of a worker as Temporal does and does not share many of its concerns regarding workflows and the underlying deployment hosts, so this section of their docs is not really apples to apples with what is being discussed here. Furthermore, from what you referenced, their explicit recommendation is to use a feature called AutoUpgrade, which they explain needs to go in parity with... workflow patching.

Patching looks like it works fine for minor tweaks, but it breaks down with larger changes

I disagree. When introducing a change to a workflow, large or small, the change to the business logic will be the same whether you use patching or a name-based convention. With a name-based convention, you would actually need more code since you're duplicating the workflow definition, introducing a new type, and more, which results in more code to maintain as the changeset is bigger. With the patching approach, the only thing that grows besides the actual business logic sections is the if/else/switch statement, which makes it very clear to a developer which version of the workflow triggers what logic.

@WhitWaldo
Contributor

@yaron2

I was trying to be concise without writing another 1000 word essay - yes, I get the difference between Workers in Temporal versus our workflows, but again, disregard the deployment aspect of their workers for my purposes as they don't apply to Dapr.

They have pinned vs auto-upgrade versioning. Their auto-upgrade approach automatically moves to the latest version when available, as does mine, while retaining replay/re-run support in a "pinned" version. Pretty much run down the list of bullet points here and there are readily apparent parallels to my Dapr-specific workflow versioning approach.

They raise four key points about why workflow cutovers are a poor versioning approach in Temporal:

  1. One must register the new workflow types
  2. One must update all callers of the workflow to update the type
  3. This will lead to duplicate code
  4. It doesn't provide actual versioning support

Numbers 1, 2 and 4 are accommodated by my proposal and I've talked about #3 in-depth already (trivialized by CLI tooling as EF Core Migrations do today for SQL). In short, because of the unique advantages and architecture of Dapr over how Temporal works, we're able to offer a version of workflow cutovers that's trivially approachable in a way that Temporal simply cannot support because of their differing architecture.

I think our biggest difference on the more/less code front is that yes, patching might require fewer lines of code, but especially as you get beyond trivial changes, it gets vastly more elaborate and complex to approach and reason about. When patching, you have to consider how to migrate code from one version to the next. When cutting over, there's no need for any such consideration - write from a blank slate or copy/paste what you previously had, your pick - but so long as the workflow's input/output types are consistent, migration is seamless, transparent and trivial. It's even clearer "which version of the workflow triggers which logic" because, outside of a replay scenario, it's always precisely the latest version you're looking at - no branching necessary because it always converges to the most recent version.

@mikeee
Member

mikeee commented Oct 28, 2025

Echoing some of the above sentiments - I don't think patching/branching is maintainable in the long term. If code sprawl/code spaghetti is a concern, patching actually introduces more logic into a workflow that should be immutable, and arguably makes the codebase much harder to maintain due to its lack of separation of concerns and its hidden complexities.

Distinctly registered workflows offer a clear separation of concerns and are thus infinitely more maintainable beyond the short term - the only real issue being whether there are legacy long-running workflows in progress. With appropriate migrations and tooling this appears to be much more manageable.

@acroca
Member Author

acroca commented Oct 28, 2025

I think there’s value in supporting both whole-workflow versioning and branch-based versioning. The two models can coexist, each fits different scenarios.

Today, users can already create a new workflow to replace an old one, effectively treating it as a whole new workflow. It’s workable but far from convenient: there’s no way to migrate long-running workflows, and ContinueAsNew loops can’t carry over. So it only helps in limited cases.

What’s missing right now is any way to safely introduce changes inside an existing workflow without breaking determinism; that's why I think this proposal is valuable and should be considered.

@yaron2
Member

yaron2 commented Oct 28, 2025

I think there’s value in supporting both whole-workflow versioning and branch-based versioning. The two models can coexist, each fits different scenarios.

Today, users can already create a new workflow to replace an old one, effectively treating it as a whole new workflow. It’s workable but far from convenient: there’s no way to migrate long-running workflows, and ContinueAsNew loops can’t carry over. So it only helps in limited cases.

What’s missing right now is any way to safely introduce changes inside an existing workflow without breaking determinism; that's why I think this proposal is valuable and should be considered.

Indeed, that's the main point. Without patching, developers who introduce bugs into workflows face potentially catastrophic implications because they are unable to introduce a fix. With strictly name typed conventions, developers are required to sit and wait until the workflow instances have failed.

@yaron2
Member

yaron2 commented Oct 28, 2025

@dapr/maintainers-dapr @dapr/maintainers-java-sdk @dapr/maintainers-js-sdk @dapr/maintainers-python-sdk @dapr/maintainers-go-sdk @dapr/maintainers-rust-sdk @dapr/maintainers-dotnet-sdk @dapr/maintainers-php-sdk

Voting is now open for this proposal and #82. If you're a maintainer in any of the groups above, please comment with +1 binding in the proposal you support.

@yaron2 yaron2 changed the title from "Added proposal for workflow branch versions" to "[Vote Open] Added proposal for workflow branch versions" on Oct 28, 2025
@yaron2
Member

yaron2 commented Oct 28, 2025

+1 binding

@WhitWaldo
Contributor

WhitWaldo commented Oct 28, 2025

I'll toss this additional thread into the mix for consideration - someone drafted a comparison between each of the different workflow versioning strategies on Temporal's community board. My approach to workflow versioning is more consistent with Worker Versioning and Workflow Name-based Versioning, as I described in this comment.

I think the advantages and disadvantages, acknowledging the differences between Dapr and Temporal are accurate:

Worker Versioning/Name-Based Advantages

  • Changes are, by default, isolated from each other in a way that makes mistakes unlikely
  • Flexible - you can handle both compatible and incompatible changes
  • Conceptually simple

Worker Versioning/Name-Based Disadvantages

  • Operational burden to manage shift from older to newer versions (though I think this is trivially addressable with CLI tooling)
  • Code duplication

Patch/GetVersion API Disadvantages

  • Conceptually complex
  • Cognitive burden of needing to understand how both the "old" and "new" code paths work
  • If used indefinitely on the same workflow definition, can lead to a mess of branching

All the advantages cited for Patch/GetVersion in this thread are applicable to both our proposals. As they don't distinguish between our two ideas, I don't list them here.

Now, they certainly call out the feasibility of using patching with worker versioning to make changes and I don't want to rule that out, but I would strongly urge reconsidering this patching proposal and rethinking it as a complementary feature to my approach rather than the full replacement it is today.

@WhitWaldo
Contributor

WhitWaldo commented Oct 28, 2025

@yaron2

Summarizing our conversation on the release call today, I think there's ample room for both workflow versioning proposals to exist side-by-side, but I would advocate for this one to be modified towards simplifying the patching concept. I think the introduction of versioned branches for patches adds significant complexity, as illustrated by the half hour we spent on the call just discerning where the versions themselves come from.

For reasons of analogy, I view my proposal as the "hard fork" from one version to the other. By creating the new workflow, you can fundamentally change the whole thing or make minute changes here or there, but Dapr only moves forward between versions and never backwards, and state itself is not migrated, so it's just a routing problem concerning whether a replay is being run or not.

In the documentation, I'd like to be able to clearly delineate where one uses one model or the other as they both clearly have advantages in different situations, so I'd like to propose rethinking this proposal to simplify the patch operation. For those situations where the "hard fork" isn't a good fit (e.g. bug in current production code) and as a last-resort you need to apply a minimalistic patch to your workflow, I would like to see this introduced more as a "here's a patch - here's what used to run in the old code and here's what should run in the new code". But it's a bandage that stops the bleeding until users can migrate to the new hard fork.

Having all the complexities around "versioning" the branches adds unnecessary complexity to what can be a useful stop-gap fix for larger ongoing problems. With the simplification, the decision tree becomes trivial:

  • If this is a logical bug that can lead to the workflow failing, patch it (if patch exists with "{name}", run new code, else run old code).
  • Otherwise, fork it and make whatever changes you'd like irrespective of what you did in previous workflow versions.

This would eliminate the need for the runtime to be aware of the dictionary of possible branch versions, eliminate the opportunity for switch statements (as everything becomes a simple "if/then" patch), and make it trivial to communicate to the developers who need such capabilities when to use one or the other and how they interoperate with one another.
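
To make the simplified patch concrete, here's a rough sketch; the IsPatched shape and all names are hypothetical placeholders, not a proposed API surface:

```csharp
using System.Threading.Tasks;

// Illustrative sketch only: a binary "if patched, run new code, else run old code"
// check with no version numbers and no switch statements. New instances record the
// patch and take the new path; existing histories replay the old path.
public interface IPatchableWorkflowContext
{
    bool IsPatched(string patchName);
    Task CallActivityAsync(string activityName, object input);
}

public record Order(string Id);

public class OrderWorkflow
{
    public Task RunAsync(IPatchableWorkflowContext context, Order order)
    {
        if (context.IsPatched("fix-duplicate-charge"))
        {
            // New code: the bug fix, taken by instances that recorded the patch.
            return context.CallActivityAsync("ChargeCardOnce", order);
        }

        // Old code: kept only so in-flight instances replay deterministically,
        // and removed once they drain to the next "hard fork" version.
        return context.CallActivityAsync("ChargeCard", order);
    }
}
```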

With my proposed simplification changes to this proposal, it would get my +1 binding vote in addition to my proposal for the hard fork.

@WhitWaldo
Contributor

@yaron2 Following up with a few more thoughts:

Without patching, developers who introduce bugs into workflows face potentially catastrophic implications because they are unable to introduce a fix.

Completely agree that a type swap doesn't help here. But neither does a patch, right? The patching path must also be deterministic, meaning the versions taken are persisted during replay, so logic cannot change between failed runs.

With the sole exception (pardon the pun) of an exception throwing on a workflow such that the history hasn't yet been saved on the event source log beyond that point (meaning you're not violating deterministic guarantees), and you're fixing the workflow so it can replay farther into its execution, why do you need patching in this case? This would be the singular scenario in which a developer could safely deploy an in-place update to the type with the fix, right?

Does that limit the patching proposal just to smaller changes for which one doesn't want to deploy a whole new type?

@yaron2
Member

yaron2 commented Oct 28, 2025

With the type/name-based convention method you can't update the code for workflow definitions and keep in-flight ones running. It's also all or nothing, which is extremely limiting, especially for the safe, deterministic evolution of workflows over time. The branching/patching approach lets you mark specific decision points in the workflow definition so you can upgrade one part without touching others. This IMO is critical for very long-running processes (data pipelines, AI agents, etc.).

WhitWaldo added a commit to WhitWaldo/dapr-proposals that referenced this pull request Oct 29, 2025
…r#92 and how they're uniquely addressed in this approach

Signed-off-by: Whit Waldo <[email protected]>
@WhitWaldo
Contributor

@yaron2 (and others)

I respectfully submit #94 as a superseding proposal to both this and #82

@salaboy

salaboy commented Oct 29, 2025

+1 binding

@salaboy

salaboy commented Oct 29, 2025

From my point of view, both branching and the whole-workflow-definition versioning approach make sense and we should support both. To justify my vote, I can say that branching makes sense if you look at it as dealing with feature flags. It can be argued that feature flags make the code more complex, but they offer a tool for teams to keep improving their code bases while providing control over how you move from one version to the next.

@WhitWaldo
Contributor

WhitWaldo commented Oct 29, 2025

@salaboy I'd certainly appreciate it if you have thoughts on #94 where I have attempted to combine what I think are the best parts of #82 and #92. There, I approach using patches far more like binary feature flags than here.

@cgillum

cgillum commented Oct 30, 2025

Adding my $0.02 to this conversation.

I think there’s value in supporting both whole-workflow versioning and branch-based versioning. The two models can coexist, each fits different scenarios.

Agreed with this. I haven't looked at #94 yet, but supporting both is what I would recommend.

What’s missing right now is any way to safely introduce changes inside an existing workflow without breaking determinism, that's why I think this proposal is valuable and should be considered.

Agreed, and I think this is the number one reason to consider supporting a "patch" approach.

Completely agree that a type swap doesn't help here. But neither does a patch, right? The patching path must also be deterministic meaning that the versions taken are persisted during replay meaning that logic cannot change between failed runs.

In order for a "patch" strategy to work, you must have version metadata persisted in each workflow instance. I assume this is the case (I haven't read through the whole proposal), but correct me if I'm wrong. Assuming each workflow has immutable version metadata that can be queried during execution (e.g. context.Version), you can use this to effectively "upgrade" old workflows into the new logic. This is super important for bug fixes, etc. where you can't afford data loss.
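
For example, roughly - assuming a context.Version-style property as described above; the other names here are hypothetical:

```csharp
using System.Threading.Tasks;

// Illustrative sketch only: per-instance version metadata, queryable during
// execution, decides which path an instance runs so replay stays deterministic
// while fixes can be applied going forward.
public interface IVersionAwareWorkflowContext
{
    // Immutable version recorded for the workflow instance.
    int Version { get; }
    Task CallActivityAsync(string activityName, object input);
}

public record Order(string Id);

public class PaymentWorkflow
{
    public Task RunAsync(IVersionAwareWorkflowContext context, Order order)
    {
        // Instances recorded at version 1 keep replaying the original path;
        // instances at version 2 or later take the fixed path.
        if (context.Version < 2)
        {
            return context.CallActivityAsync("ChargeCard", order);
        }
        return context.CallActivityAsync("ChargeCardIdempotent", order);
    }
}
```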

EDIT: I read a bit more into this proposal and it seems I misunderstood it. I assumed each workflow would have just one version, but the actual design proposes branch versions, which seems a bit more complex. But I think the main problem to solve is ensuring that existing instances can have bug fixes applied, which I assume this proposal enables.

@WhitWaldo
Contributor

@cgillum Welcome!

Yes, #94 is my proposal to blend the two concepts together into a more cohesive concept than the two standalone ideas (in #82 and here in #92).

The versioning metadata for all three of these proposals is stored in the orchestrator request and response messages and merely parroted back by the runtime based on whether it's performing a rerun (to ensure a consistent path is taken) or not (to take the most up-to-date path available).

Yes, #92 proposes versioning the patches themselves. In #94, I remove this complexity in favor of an approach where type versioning is generally promoted as the "go-to" recommended strategy. Where that's too weighty for a bug fix, or a typo correction or what-have-you, patches would be promoted as the "advanced" use case. We'd have to heavily document what edge cases we're anticipating, how to conceptualize how it works and why there's no magic exception we're providing to the deterministic guarantees of the workflow. It'd be positioned as not necessarily being for everyone, but certainly a viable and reasonable tool if you're willing to learn more about how and why it works.

Dapr's advantage then would lie in providing an easy off-ramp out of the advanced use-case and back to a standard typed versioning, then rinse and repeat as necessary.

yaron2 pushed a commit that referenced this pull request Nov 24, 2025
* Added workflow versioning proposal combining named workflows with patching

Signed-off-by: Whit Waldo <[email protected]>

* Added a small FAQ to the end to address some of the concerns from #92 and how they're uniquely addressed in this approach

Signed-off-by: Whit Waldo <[email protected]>

* Added suggested protos changes

Signed-off-by: Whit Waldo <[email protected]>

* Documented suggested .NET SDK implementation

Signed-off-by: Whit Waldo <[email protected]>

* Split out some of the C#-specific implementation details from the high-level presentation and moved into the C# section towards the bottom.

Signed-off-by: Whit Waldo <[email protected]>

* Added a note about why I'm including in-depth SDK implementation details where I wouldn't normally include them in dapr/proposals

Signed-off-by: Whit Waldo <[email protected]>

* Modified to replace the version string concept with a new prototype, updated other prototypes accordingly and cleaned up how it's described in the proposal. Replaces references to "version string" accordingly and removed patch name constraints.

Signed-off-by: Whit Waldo <[email protected]>

* Minor formatting changes

Signed-off-by: Whit Waldo <[email protected]>

* Added section documenting how this would be implemented in the JavaScript SDK

Signed-off-by: Whit Waldo <[email protected]>

* Language tweak to accommodate review concern

Signed-off-by: Whit Waldo <[email protected]>

* Updated protos to reflect putting the versioning state in the workflow events instead of making it part of the request/response to the runtime.

Signed-off-by: Whit Waldo <[email protected]>

* Introduced several of the changes discussed in the GitHub discussion

Signed-off-by: Whit Waldo <[email protected]>

* Update 20251028-BRS-workflow-versioning.md

Co-authored-by: Albert Callarisa <[email protected]>
Signed-off-by: Whit Waldo <[email protected]>

* Update 20251028-BRS-workflow-versioning.md

Co-authored-by: Cassie Coyle <[email protected]>
Signed-off-by: Whit Waldo <[email protected]>

* Update 20251028-BRS-workflow-versioning.md

Co-authored-by: Cassie Coyle <[email protected]>
Signed-off-by: Whit Waldo <[email protected]>

* Update 20251028-BRS-workflow-versioning.md

Co-authored-by: Cassie Coyle <[email protected]>
Signed-off-by: Whit Waldo <[email protected]>

* Update 20251028-BRS-workflow-versioning.md

Co-authored-by: Albert Callarisa <[email protected]>
Signed-off-by: Whit Waldo <[email protected]>

* Updated optional patch information type in protos

Signed-off-by: Whit Waldo <[email protected]>

* Modified to use `OrchestratorStartedEvent` instead of `OrchestratorCompletedEvent`

Signed-off-by: Whit Waldo <[email protected]>

* Added messaging to clarify how to evaluate `IsPatched` and to handle mismatching values across `OrchestratorStartedEvent` messages

Signed-off-by: Whit Waldo <[email protected]>

* Update 20251028-BRS-workflow-versioning.md

Co-authored-by: Josh van Leeuwen <[email protected]>
Signed-off-by: Whit Waldo <[email protected]>

---------

Signed-off-by: Whit Waldo <[email protected]>
Co-authored-by: Albert Callarisa <[email protected]>
Co-authored-by: Cassie Coyle <[email protected]>
Co-authored-by: Josh van Leeuwen <[email protected]>
@yaron2
Member

yaron2 commented Nov 24, 2025

Closing as #94 is merged

@yaron2 yaron2 closed this Nov 24, 2025