@Bueller87 Bueller87 commented Nov 13, 2025

Summary

Adds support for deploying Cadence with PostgreSQL + Elasticsearch + Kafka for advanced visibility. This configuration enables powerful workflow search and filtering capabilities through Elasticsearch while using PostgreSQL as the primary persistence layer.

New Example Configuration: values.postgres-es7.yaml

This branch introduces a complete, end-to-end example (see Caveats before using it in production) for deploying Cadence with:

  • PostgreSQL - Primary persistence for workflow state and history
  • Elasticsearch 7 - Advanced visibility with rich search capabilities
  • Kafka - Asynchronous visibility event processing for better performance
  • Kafka topic auto-provisioning - Eliminates manual setup and race conditions
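
For orientation, the override file has roughly this shape (an illustrative sketch; only keys discussed in this PR are shown, and the exact nesting in values.postgres-es7.yaml may differ):

```yaml
# Sketch of values.postgres-es7.yaml (key nesting assumed, not verbatim)
config:
  persistence:
    elasticsearch:
      enabled: true
      visibilityIndex: cadence-visibility-v1  # matches the cadence-visibility-* template
kafka:
  enabled: true  # async visibility: history -> Kafka -> worker -> Elasticsearch
```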

Testing

  • Clean deployment with all pods healthy (no scheduling failures)
  • Kafka topics auto-provisioned with correct replication factors
  • Workflows visible in Cadence Web UI immediately after execution
  • No visibility-related errors in service logs
  • Repeatable deployments (tested multiple tear-down/redeploy cycles)

Caveats

  1. Single-broker Kafka: Configured for development/testing with replicationFactor: 1. For production, increase Kafka broker count and replication factors to 3+.
  2. Resource requirements: Deploys 9+ pods (Cadence services + PostgreSQL + Elasticsearch + Kafka). Ensure your cluster has adequate resources.
  3. Fresh deployments recommended: If upgrading from a previous configuration, a clean redeploy is recommended to ensure Kafka topics are properly provisioned.
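
For caveat 1, a production-leaning override could look like the following (a hypothetical sketch; the key names follow the Bitnami Kafka chart convention and may not match the Kafka subchart actually used here):

```yaml
# Hypothetical production hardening (verify key names against the actual Kafka subchart)
kafka:
  replicaCount: 3          # three brokers instead of one
  provisioning:
    topics:
      - name: cadence-visibility
        replicationFactor: 3
      - name: cadence-visibility-dlq
        replicationFactor: 3
```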

Checklist

  • [x] Chart.yaml: Chart version bumped
  • [x] values.yaml: global.image.tag (should match Chart.yaml appVersion)
  • [x] Dependencies in Chart.yaml (check for updates with helm dependency update; aim for latest major-1 stable)
  • [x] DCO signed
  • [x] Test deployment locally
  • [x] Run helm-docs to update documentation

shijiesheng and others added 11 commits November 14, 2025 10:52

Description of this PR

The schema upgrade job should fail open on database creation so it can handle "already exists" errors.
Adds an example of Cadence with Postgres.

Local Test

The namespace and release name are still fixed, unfortunately.

cd charts/cadence
helm install cadence-release . -n cadence-postgres --values examples/values.postgres.yaml

Signed-off-by: “Kevin” <[email protected]>
…dence-workflow#57)

* docs: Convert slack links to CNCF, add link to contributing guide

Signed-off-by: [email protected] <[email protected]>
Signed-off-by: “Kevin” <[email protected]>
Signed-off-by: CosminL-DEV <[email protected]>
Signed-off-by: “Kevin” <[email protected]>
Signed-off-by: “Kevin” <[email protected]>
When using Elasticsearch for visibility, the PostgreSQL visibility
database is not needed. Removed visibilityDbname and initdb script
that were creating an unused database.

Also updated dynamic config to use es instead of es-visibility
to match Cadence's internal visibility store naming convention.

Signed-off-by: “Kevin” <[email protected]>
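
In Cadence's file-based dynamic config format, the entry described above looks roughly like this (a sketch; cluster/domain constraints omitted):

```yaml
# Dynamic config sketch: 'es' is the special advanced-visibility keyword,
# not the datastore name 'es-visibility'
system.writeVisibilityStoreName:
  - value: "es"
```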
The schema job should only create the SQL/Cassandra visibility
database when Elasticsearch is disabled. When using Elasticsearch
for advanced visibility, the SQL visibility database is not needed.

Changed condition from checking if visibilityDbname is set to
checking if ES_ENABLED is false, allowing ES-only deployments
to skip unnecessary database creation.

Signed-off-by: “Kevin” <[email protected]>
Changed visibilityIndex from 'cadence-visibility' to 'cadence-visibility-v1'
to match the Elasticsearch template pattern 'cadence-visibility-*'. This
ensures the template with proper keyword field mappings is applied to the
index, fixing visibility queries that require exact term matching on fields
like DomainID.

Signed-off-by: “Kevin” <[email protected]>
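
For context, the index template pattern that cadence-visibility-v1 now matches looks roughly like this (shown as a YAML sketch; the chart's actual template defines many more fields):

```yaml
# Elasticsearch index template sketch (abridged; field list assumed)
index_patterns:
  - "cadence-visibility-*"   # bare 'cadence-visibility' does not match this pattern
mappings:
  properties:
    DomainID:
      type: keyword          # keyword mapping enables exact term matching
```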
Set visibilityStore to empty string to prevent dual-write mode when using
Elasticsearch-only advanced visibility. This fixes 'Operation is not supported'
errors when Cadence tries to write to a non-existent SQL visibility database.

Signed-off-by: “Kevin” <[email protected]>
The visibility store name in dynamic config should be 'es' (a special keyword
for advanced visibility), not the datastore name 'es-visibility'. This matches
the Cadence upstream configuration pattern where advancedVisibilityStore is
'es-visibility' but system.writeVisibilityStoreName is 'es'.

Signed-off-by: “Kevin” <[email protected]>
Removed advancedVisibilityStore from persistence config to enable Kafka-based
async visibility mode. When advancedVisibilityStore is set, Cadence tries to
use dual-write mode (SQL + ES) which fails without a SQL visibility database.
With Kafka mode, the history service writes to Kafka, and the worker service
reads from Kafka and writes to ES.

Signed-off-by: “Kevin” <[email protected]>
Must explicitly override the default value of 'es-visibility' with an empty
string to disable direct ES writes and enable Kafka-based async visibility.

Signed-off-by: “Kevin” <[email protected]>
Cadence requires at least one of visibilityStore or advancedVisibilityStore to
be set. We keep advancedVisibilityStore='es-visibility' to define the datastore,
and use dynamic config system.writeVisibilityStoreName='es' to enable Kafka-based
async visibility instead of direct ES writes.

Signed-off-by: “Kevin” <[email protected]>
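
Put together, the persistence side of this arrangement looks roughly like the following (a sketch; key nesting assumed):

```yaml
# Sketch: ES datastore defined, no SQL visibility store
config:
  persistence:
    visibilityStore: ""                     # explicit empty override
    advancedVisibilityStore: es-visibility  # defines the ES datastore
# paired with dynamic config system.writeVisibilityStoreName = "es"
# so writes go through Kafka rather than directly to ES
```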
Added Kafka provisioning configuration to pre-create required topics:
- __consumer_offsets: Internal Kafka topic for consumer offset tracking
- cadence-visibility: Main visibility events topic
- cadence-visibility-dlq: Dead letter queue for failed visibility events

This prevents race conditions where topics get auto-created with incorrect
replication factors before the broker config is fully loaded. All topics are
created with replication-factor: 1 to match our single-broker setup.

Signed-off-by: “Kevin” <[email protected]>
Signed-off-by: “Kevin” <[email protected]>
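
The provisioning block described above looks roughly like this (a sketch; Bitnami-Kafka-style key names assumed):

```yaml
# Kafka topic pre-provisioning sketch (key names assumed)
kafka:
  provisioning:
    enabled: true
    topics:
      - name: __consumer_offsets       # internal consumer-offset topic
        replicationFactor: 1
      - name: cadence-visibility       # main visibility events topic
        replicationFactor: 1
      - name: cadence-visibility-dlq   # dead letter queue
        replicationFactor: 1
```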
Bueller87 and others added 4 commits November 20, 2025 13:16
Reverts the visibility store conditional logic to include the elasticsearch
enabled check. This prevents creating an unnecessary SQL visibility datastore
when users enable Elasticsearch without explicitly disabling visibilityStore.

Without this check, the default visibilityStore value from values.yaml would
be used even when Elasticsearch is enabled, creating both SQL and ES visibility
stores unintentionally.

Signed-off-by: “Kevin” <[email protected]>
Added kafka.enabled: false to values.postgres.yaml to prevent unnecessary
Kafka deployment when using SQL-only visibility. Kafka is only needed for
async visibility processing with Elasticsearch.

Signed-off-by: “Kevin” <[email protected]>
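
The values.postgres.yaml addition amounts to:

```yaml
# SQL-only visibility: no Kafka needed
kafka:
  enabled: false
```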
{{- if and .Values.config.persistence.visibilityStore (not .Values.config.persistence.elasticsearch.enabled) }}
# Visibility datastore
{{ if .Values.config.persistence.visibilityStore -}}
{{ .Values.config.persistence.visibilityStore | default "visibility" }}:
Member commented:

we need this default visibility

Author replied:

When using Elasticsearch with Kafka, visibility events flow through Kafka to ES. The SQL visibility datastore is not used. (current config)

In fact, I found that when BOTH visibilityStore and advancedVisibilityStore are configured, Cadence enters dual-write mode and tries to write visibility events to BOTH datastores.

I encountered multiple errors when I did this, e.g.
{"level":"warn","msg":"requested visibility mode is not available, falling back to db"}
{"level":"error","msg":"Error writing to visibility: Operation is not supported"}

Through debugging I saw this happening:
  1. Cadence tried to write to BOTH SQL and ES.
  2. The SQL cadence_visibility database didn't exist (the schema job skipped it because ES was enabled).
  3. Writes to SQL visibility failed with "Operation is not supported", and visibility was broken.

Member replied:

pretty sure there is no change needed here. I've reverted your change and the plan looks exactly the same. Please revert it.
