
@Allda
Collaborator

Allda commented Oct 17, 2025

A new backup controller orchestrates the backup process for workspace PVCs. A new configuration option is added to DevWorkspaceOperatorConfig that enables a regular cron job responsible for the backup mechanism. The job executes the following steps:

  • Find workspaces
  • Determine whether a workspace has been recently stopped
  • Detect the workspace PVC
  • Execute a job in the same namespace that performs the backup

The last step is not yet fully implemented, as it requires running Buildah inside the container; it will be delivered as a separate feature.

Issue: eclipse-che/che#23570

What does this PR do?

What issues does this PR fix or reference?

Is it tested? How?

The feature has been tested locally and with integration tests. Add the following configuration to the DevWorkspaceOperatorConfig to enable this feature:

config:
  workspace:
    backupCronJob:
      enable: true
      registry: kind-registry:5000/backup
      schedule: '* * * * *'

After the config is added, stop any workspace and wait until a backup job is created.

$ kubectl get jobs
devworkspace-backup-2l679   Running    0/1           138m       138m
devworkspace-backup-2xvgl   Running    0/1           139m       139m
devworkspace-backup-45vxb   Running    0/1           145m       145m

The job creates a backup and pushes the image to the registry:

+ set -e
+ exec /workspace-recovery.sh --backup
+ set -e
+ for i in "$@"
+ case $i in
+ backup
+ BACKUP_IMAGE=kind-registry:5000/backup/backup-default-common-pvc-test:latest
++ buildah from scratch
+ NEW_IMAGE=working-container
+ buildah copy working-container /workspace/workspacedfd9f53065ea452c//projects /
f099c09f924cf051a01d78cd34ca87a4c161d7c217df5ac627e90e66926fbe9f
+ buildah config --label DEVWORKSPACE=common-pvc-test working-container
+ buildah config --label NAMESPACE=default working-container
+ buildah commit working-container kind-registry:5000/backup/backup-default-common-pvc-test:latest
Getting image source signatures
Copying blob sha256:137b2a0909654325b7eff0a9dfe623e5abdc685c4d6ad8e4c8d163e0984cf805
Copying config sha256:86693ca728855121a4dce059d91c6c9a196b4611fea4cb17d7b38015310cf193
Writing manifest to image destination
86693ca728855121a4dce059d91c6c9a196b4611fea4cb17d7b38015310cf193
+ buildah umount working-container
+ buildah push --tls-verify=false kind-registry:5000/backup/backup-default-common-pvc-test:latest
Getting image source signatures
Copying blob sha256:137b2a0909654325b7eff0a9dfe623e5abdc685c4d6ad8e4c8d163e0984cf805
Copying config sha256:86693ca728855121a4dce059d91c6c9a196b4611fea4cb17d7b38015310cf193
Writing manifest to image destination
stream closed: EOF for default/devworkspace-backup-zjzk5-82psq (backup-workspace)
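The image reference in the log follows the pattern documented on the `Registry` field in the API, `{registry}/backup-${DEVWORKSPACE_NAMESPACE}-${DEVWORKSPACE_NAME}`. A small sketch of that naming rule; `backupImageRef` is a hypothetical helper, and the fixed `:latest` tag is taken from the log rather than from the controller code.

```go
package main

import (
	"fmt"
	"strings"
)

// backupImageRef builds an image reference of the form seen in the log:
// {registry}/backup-{namespace}-{workspace}:latest.
// Hypothetical helper; the ":latest" tag is copied from the log.
func backupImageRef(registry, namespace, workspace string) string {
	registry = strings.TrimSuffix(registry, "/") // tolerate a trailing slash in the config
	return fmt.Sprintf("%s/backup-%s-%s:latest", registry, namespace, workspace)
}

func main() {
	// prints kind-registry:5000/backup/backup-default-common-pvc-test:latest
	fmt.Println(backupImageRef("kind-registry:5000/backup", "default", "common-pvc-test"))
}
```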

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

@openshift-ci

openshift-ci bot commented Oct 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Allda
Once this PR has been reviewed and has the lgtm label, please assign dkwon17 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Allda force-pushed the 23570 branch 2 times, most recently from 42dd45c to dffd7e6 on October 17, 2025 11:06
@rohanKanojia
Member

@Allda : Really appreciate you taking the time to contribute this in such a short time. 🎉

Could you please also fill out the “Is it tested? How?” section in the PR template? It’ll help reviewers and future contributors verify the change more easily.

Thanks again for your effort! 🙌

@rohanKanojia
Member

I tested this PR and it seems to work.

  1. Created a DevWorkspaceOperatorConfig with this BackupCronJobConfig (backup every 3 minutes):
config:
  workspace:
    backupCronJob:
      enable: true
      schedule: "*/3 * * * *"
  2. Created a DevWorkspace and waited for it to start running
  3. Stopped the workspace
  4. The controller detected the stopped workspace and started creating backup jobs:
NAME               STATUS    COMPLETIONS   DURATION   AGE
backup-job-8tnsp   Running   0/1                      0s
backup-job-8tnsp   Running   0/1           0s         0s
backup-job-8tnsp   Running   0/1           16s        16s
backup-job-8tnsp   Running   0/1           17s        17s
backup-job-8tnsp   Running   0/1           18s        18s
backup-job-8tnsp   Complete   1/1           18s        18s
backup-job-kc8rm   Running    0/1                      0s
backup-job-kc8rm   Running    0/1           0s         0s
backup-job-kc8rm   Running    0/1           6s         6s
backup-job-kc8rm   Running    0/1           7s         7s
backup-job-kc8rm   Running    0/1           8s         8s
backup-job-kc8rm   Complete   1/1           8s         8s

Allda force-pushed the 23570 branch 3 times, most recently from 0bc74b1 to 8427ba5 on October 29, 2025 10:24
@Allda
Collaborator Author

Allda commented Oct 29, 2025

/retest

// A registry where backup images are stored. Images are stored
// in {registry}/backup-${DEVWORKSPACE_NAMESPACE}-${DEVWORKSPACE_NAME}
// +kubebuilder:validation:Optional
Registry string `json:"registry,omitempty"`
Contributor

What if the registry is not public and requires authentication and/or a certificate to access it?

Collaborator Author

I am currently working on the second phase of this feature, where I cover all the use cases, including authentication. I wanted to submit a PR as early as possible to get initial feedback. The auth part should be ready soon.

Collaborator Author

The registry authorization was added to the controller. You can check the latest commits.

Contributor

Thank you. I think it makes sense to move Registry and RegistryAuthSecret into a dedicated structure.

return err
}

job := &batchv1.Job{
Contributor

I believe we need a dedicated service account (SA) for this job, with only the required permissions delegated to it.
@dkwon17?

Collaborator

Yes, that makes sense to me. Could you please take a look, @Allda?

Collaborator Author

@dkwon17 Do you want to create a brand new SA for each namespace or for each job? Or is there any existing SA that I should use here? Also, what permissions should I delegate to it?

Contributor

Also, what permissions should I delegate to it?

Maybe we don't need any.

Collaborator Author

In the latest commit, I added a separate SA for the workspace namespace and used it in the Job definition.

backUpConfig := dwOperatorConfig.Config.Workspace.BackupCronJob

// Find a PVC with the name "claim-devworkspace" or based on the name from the operator config
pvcName := "claim-devworkspace"
Collaborator

The PVC name will not always be claim-devworkspace.

There are two main storage strategies for DevWorkspaces: common (or per-user) and per-workspace.

Here are some more details about the storage strategies: https://eclipse.dev/che/docs/stable/administration-guide/configuring-the-storage-strategy/

For common, the default PVC name is claim-devworkspace; for per-workspace, the PVC name is storage-<devworkspaceid>.

I suggest using, for example, GetProvisioner to help determine the storage strategy.

To determine the PVC name, the existing code currently does the following (common storage first, then per-workspace):

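// common (per-user) storage: an alternate PVC, if found, takes precedence over the configured name
usingAlternatePVC, pvcName, err := checkForAlternatePVC(workspace.Namespace, clusterAPI)
if err != nil {
	return err
}
if pvcName == "" {
	pvcName = workspace.Config.Workspace.PVCName
}

// per-workspace storage: the PVC belongs to a single DevWorkspace
perWorkspacePVC, err := syncPerWorkspacePVC(workspace, clusterAPI)
if err != nil {
	return err
}
pvcName := perWorkspacePVC.Name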
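The selection rule described in this comment can be sketched as a single function. This is a simplification under stated assumptions: `backupPVCName` is hypothetical, `configuredPVCName` stands in for the result of the alternate-PVC/config lookup shown above, and storage classes without a PVC (such as ephemeral) are reported as an error.

```go
package main

import "fmt"

// Storage class names as described above; "per-user" is treated as an alias of "common".
const (
	storageCommon       = "common"
	storagePerUser      = "per-user"
	storagePerWorkspace = "per-workspace"
)

// defaultCommonPVCName is the default shared PVC for the common strategy.
const defaultCommonPVCName = "claim-devworkspace"

// backupPVCName returns the PVC the backup job should mount (hypothetical helper).
// configuredPVCName is the name resolved from an alternate PVC or the operator
// config and may be empty.
func backupPVCName(storageClass, workspaceID, configuredPVCName string) (string, error) {
	switch storageClass {
	case storageCommon, storagePerUser:
		if configuredPVCName != "" {
			return configuredPVCName, nil
		}
		return defaultCommonPVCName, nil
	case storagePerWorkspace:
		return "storage-" + workspaceID, nil
	default:
		// e.g. ephemeral storage has no PVC to back up
		return "", fmt.Errorf("no backup PVC for storage class %q", storageClass)
	}
}

func main() {
	for _, sc := range []string{storageCommon, storagePerWorkspace, "ephemeral"} {
		name, err := backupPVCName(sc, "workspacedfd9f53065ea452c", "")
		fmt.Println(sc, name, err)
	}
}
```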

Collaborator Author

I added dynamic PVC logic that selects the right PVC name based on the storage type in use. Please check again.

@codecov

codecov bot commented Nov 3, 2025

Codecov Report

❌ Patch coverage is 64.13043% with 165 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.30%. Comparing base (d92e750) to head (2679783).
⚠️ Report is 16 commits behind head on main.

Files with missing lines                               Patch %   Lines
...trollers/backupcronjob/backupcronjob_controller.go  71.95%    87 Missing and 19 partials ⚠️
apis/controller/v1alpha1/zz_generated.deepcopy.go      0.00%     43 Missing ⚠️
main.go                                                0.00%     9 Missing ⚠️
internal/images/image.go                               0.00%     7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1530      +/-   ##
==========================================
+ Coverage   34.09%   35.30%   +1.21%     
==========================================
  Files         160      161       +1     
  Lines       13348    13802     +454     
==========================================
+ Hits         4551     4873     +322     
- Misses       8487     8599     +112     
- Partials      310      330      +20     


Contributor

ibuziuk left a comment

@Allda great job!
I discussed the overall PR with @dkwon17, and I believe we should target it for merging in the DWO 0.39.0 release.

Schedule string `json:"schedule,omitempty"`
}

type BackupCronJobConfig struct {
Contributor

@dkwon17
Does it make sense to create a completely new API for backup instead of using DevWorkspaceOperatorConfig?

}

func (r *BackupCronJobReconciler) copySecret(workspace *dw.DevWorkspace, ctx context.Context, sourceSecret *corev1.Secret, logger logr.Logger) (namespaceSecret *corev1.Secret, err error) {
log := logger.WithName("copySecret")
Contributor

I think it makes sense to use a common logger name for the backup component, so that its messages can be easily identified in the DevWorkspaceController logs.

tolusha and others added 14 commits November 5, 2025 12:51
A new backup controller orchestrates the backup process for workspace PVCs.
A new configuration option is added to DevWorkspaceOperatorConfig that
enables a regular cron job responsible for the backup mechanism. The job
executes the following steps:
- Find workspaces
- Determine whether a workspace has been recently stopped
- Detect the workspace PVC
- Execute a job in the same namespace that performs the backup

The last step is not yet fully implemented, as it requires running
Buildah inside the container; it will be delivered as a separate
feature.

Issue: eclipse-che/che#23570

Signed-off-by: Ales Raszka <[email protected]>
A backup of the workspace is done using Buildah by storing the content of
the workspace PVC in a container image. The image is then stored in a
registry and can be used to recover data.

A prototype script was updated, stored under the project-backup
directory, and is built alongside the controller.

The backup job calls the script and executes the following steps:
- mount a volume with the workspace data
- build a container image using Buildah
- push the image to the registry configured by the operator admin

Signed-off-by: Ales Raszka <[email protected]>
A new sub-object was added to the operator config that reflects the current
status of the backup controller and stores the last time the backup was
executed. This value is used to determine whether a backup of the
workspace is needed or has already been executed.

Signed-off-by: Ales Raszka <[email protected]>
The backup job uses the default PVC name, or the name from the config if
the user configured a custom one.

Signed-off-by: Ales Raszka <[email protected]>
The backup job can now push to registries that require an auth token. The
token is provided as a Secret in the operator namespace and added to the
operator config.

Signed-off-by: Ales Raszka <[email protected]>
The backup job now determines the PVC name based on the storage type in use.
It distinguishes between storage types (common and per-workspace) and
mounts the volume dynamically.

Signed-off-by: Ales Raszka <[email protected]>
It turns out the capabilities from the prototype are not needed.

Signed-off-by: Ales Raszka <[email protected]>
A new SA is created for the backup jobs to limit their permissions to only
what is necessary.

Signed-off-by: Ales Raszka <[email protected]>
@Allda
Collaborator Author

Allda commented Nov 5, 2025

/retest


Labels

None yet

Projects

None yet

5 participants