dvc stage: params section with variable #10528

@ermolaev94

Bug Report

Description

I have a DVC pipeline with the following stage:

stages:
  process:
    foreach: ${datasets}
    do:
      cmd: >-
        python ${GEN_SCRIPTS_ROOT}/process_ds.py
        --ds-root ${DS_ROOT}/${item}/h5-corrected/
        --out ${PPL_PTH}/processed/${item}
        --config ${PPL_PTH}/config.yaml
        --num-workers 4
        --buffer-size 4
        --force
      deps:
        - ${GEN_SCRIPTS_ROOT}/process_ds.py
        - ${DS_ROOT}/${item}/h5-corrected/train/
        - ${DS_ROOT}/${item}/h5-corrected/val/
        - ${DS_ROOT}/${item}/h5-corrected/test/
      params:
        - ${PPL_PTH}/config.yaml:
            - processing
      outs:
        - ${PPL_PTH}/processed/${item}/train/
        - ${PPL_PTH}/processed/${item}/val/
        - ${PPL_PTH}/processed/${item}/test/
        - ${PPL_PTH}/processed/${item}/log.txt
      wdir: ${WDIR}

Its compiled version (the dvc.lock entry) for one of the datasets is:

schema: '2.0'
stages:
  process@fractures_0124_seg:
    cmd: python ds_gen//process_ds.py --ds-root data/full_datasets//fractures_0124_seg/h5-corrected/
      --out pipelines/02_seg//processed/fractures_0124_seg --config pipelines/02_seg//config.yaml
      --num-workers 4 --buffer-size 4 --force
    deps:
    - path: data/full_datasets//fractures_0124_seg/h5-corrected/test/
      hash: md5
      md5: 7ceeec622eff202ebfd336857c49f6c8.dir
      size: 1032293048
      nfiles: 4
    - path: data/full_datasets//fractures_0124_seg/h5-corrected/train/
      hash: md5
      md5: 8d889e2240ac8522df681d291d4fe9b1.dir
      size: 9504569880
      nfiles: 4
    - path: data/full_datasets//fractures_0124_seg/h5-corrected/val/
      hash: md5
      md5: f2dddba6856f9f9b4f6ac07b3c4c3052.dir
      size: 925129464
      nfiles: 4
    - path: ds_gen//process_ds.py
      hash: md5
      md5: 243575ee6a8718300cb33c54b7f8ddff
      size: 1967
    params:
      pipelines/02_seg/config.yaml:
        processing:
          Resize:
            voxel_size:
              k: 2
          SpatialResize:
            shape:
            - 160
            - 160
            - -1
    outs:
    - path: pipelines/02_seg//processed/fractures_0124_seg/log.txt
      hash: md5
      md5: 3ecea9ba483e94e36cd8ac96b5d6ae89
      size: 16182
    - path: pipelines/02_seg//processed/fractures_0124_seg/test/
      hash: md5
      md5: 2ec2b5f1e794abe96e6f2c49f0dc3785.dir
      size: 126610024
      nfiles: 4
    - path: pipelines/02_seg//processed/fractures_0124_seg/train/
      hash: md5
      md5: ca53028b1b457add2ba51edd9ad4174e.dir
      size: 873577784
      nfiles: 4
    - path: pipelines/02_seg//processed/fractures_0124_seg/val/
      hash: md5
      md5: 721589477a653f1803a83010f379dd90.dir
      size: 104532792
      nfiles: 4
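Note that the deps, outs and cmd in the lock keep the "//" produced by interpolation (e.g. ds_gen//process_ds.py), while the params key is recorded as pipelines/02_seg/config.yaml with a single slash. The following Python sketch reproduces the substitution with plain string replacement. The variable values are inferred from the lock (they evidently end with "/"), and interpolate is a hypothetical stand-in, not DVC's actual templating code.

```python
import re

# Values inferred from the compiled dvc.lock above (assumption: the "//"
# in the lock comes from variables that already end with a slash).
VARS = {
    "GEN_SCRIPTS_ROOT": "ds_gen/",
    "DS_ROOT": "data/full_datasets/",
    "PPL_PTH": "pipelines/02_seg/",
    "item": "fractures_0124_seg",
}

_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def interpolate(template: str, variables: dict) -> str:
    """Naive ${name} substitution; a hypothetical stand-in for DVC templating."""
    return _PLACEHOLDER.sub(lambda m: variables[m.group(1)], template)


deps_path = interpolate("${GEN_SCRIPTS_ROOT}/process_ds.py", VARS)
params_path = interpolate("${PPL_PTH}/config.yaml", VARS)

print(deps_path)    # ds_gen//process_ds.py  (matches the deps path in the lock)
print(params_path)  # pipelines/02_seg//config.yaml  (the lock key has a single "/")
```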

As you can see, the variables were correctly replaced with their actual values. However, there is a problem:

$ dvc status dvc.yaml:process
process@fractures_0124_seg:
        changed deps:
                new:                config.yaml

Running dvc commit --force doesn't help either:

$ dvc commit dvc.yaml:process --force
(venv) ermolaev@df783b0a927d:~/projects/radml/cvl-cvisionrad-ml/ribs/pipelines/02_seg$ dvc status dvc.yaml:process
process@fractures_0124_seg:
        changed deps:
                new:                config.yaml

But if I replace

      params:
        - ${PPL_PTH}/config.yaml:
            - processing

with

      params:
        - pipelines/02_seg/config.yaml:
            - processing

everything works. Note that variables in the deps section cause no such problem.

Reproduce

Create a synthetic pipeline that uses a template variable in the path to a params file.
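A minimal reproduction might be generated with the Python helper below, which writes a dvc.yaml and a params file into a directory. The layout (the conf/ directory, the echo command, the k: 2 value) is a hypothetical stand-in; PPL_PTH deliberately ends with "/" to mirror the "//" visible in the lock above. This exact layout has not been verified to trigger the problem.

```python
from pathlib import Path

# Hypothetical minimal pipeline: PPL_PTH ends with "/" like the (inferred)
# variable in the original report, so "${PPL_PTH}/config.yaml" expands to
# "conf//config.yaml".
DVC_YAML = """\
vars:
  - PPL_PTH: conf/
stages:
  process:
    cmd: echo processing
    params:
      - ${PPL_PTH}/config.yaml:
          - processing
"""

CONFIG_YAML = """\
processing:
  k: 2
"""


def write_repro(root: Path) -> None:
    """Write dvc.yaml and conf/config.yaml under root."""
    (root / "conf").mkdir(parents=True, exist_ok=True)
    (root / "dvc.yaml").write_text(DVC_YAML)
    (root / "conf" / "config.yaml").write_text(CONFIG_YAML)
```

In a fresh repository (git init, then dvc init), call write_repro(Path(".")), run dvc repro, and then dvc status. If the behaviour matches this report, dvc status would list config.yaml under changed deps.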

Expected

I think DVC should build and compare paths with the same logic for the deps and params sections. It looks like DVC does not recognize that the interpolated params path in dvc.yaml refers to the same file that is recorded in dvc.lock.
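One plausible explanation, not confirmed against DVC's source: the params entry is looked up by its raw interpolated path, which still contains "//", while dvc.lock stores the normalized key, so the lookup misses and the file is reported as new. The workaround above (a literal single-slash path) is consistent with this. The sketch below shows a raw string comparison failing where a normalized comparison succeeds; same_path is a hypothetical helper, not a DVC function.

```python
import posixpath


def same_path(a: str, b: str) -> bool:
    """Compare two repo-relative POSIX paths after collapsing '//' and '.'."""
    return posixpath.normpath(a) == posixpath.normpath(b)


declared = "pipelines/02_seg//config.yaml"  # from ${PPL_PTH}/config.yaml
locked = "pipelines/02_seg/config.yaml"     # params key recorded in dvc.lock

print(declared == locked)           # False: the raw strings differ
print(same_path(declared, locked))  # True: both normalize to the same path
```

Applying the same normalization to deps and params paths before comparing them is one way to get the symmetric behaviour described above.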

Environment information

Ubuntu

Output of dvc doctor:

$ dvc doctor
DVC version: 3.53.1 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-6.8.0-35-generic-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 3.15.1
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.6
Supports:
        gdrive (pydrive2 = 1.19.0),
        http (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.6.1, boto3 = 1.34.131)
Config:
        Global: /home/ermolaev/.config/dvc
        System: /etc/xdg/dvc
Cache types: symlink
Cache directory: ext4 on /dev/sdc1
Caches: local
Remotes: gdrive, gdrive, gdrive, s3
Workspace directory: ext4 on /dev/sdb1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/7205a6ce3131e59a2db7211a94dd5faa

Additional Information (if any):

Labels: A: pipelines (related to the pipelines feature), question