Skip to content

misc fixes#280

Merged
akoumpa merged 5 commits intomainfrom
akoumparouli/misc_fixes
Jul 2, 2025
Merged

misc fixes#280
akoumpa merged 5 commits intomainfrom
akoumparouli/misc_fixes

Conversation

@akoumpa
Copy link
Copy Markdown
Contributor

@akoumpa akoumpa commented Jul 1, 2025

Changes:

The main issue I faced was not getting the sbatch script written, i used something like the following code

    executor = run.SlurmExecutor(**slurm_config, tunnel=run.LocalTunnel(job_dir=job_dir))
    run_name = 'exp_name'
    with run.Experiment('exp', goodbye_message=False) as exp:
        exp.add(
            run.Script(
                path=script_path,
                args=[
                    "--config",
                    config_file,
                ],
                env=container_env,
                entrypoint="python",
            ),
            executor=executor,
            name=run_name[:37],  # DGX-C run name length limit
            tail_logs=False,
        )
        exp.run(sequential=True, detach=True, tail_logs=False)

akoumpa added 3 commits July 1, 2025 01:00
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Syntax(
self.GOODBYE_MESSAGE_BASH.format(
exp_id=self._id,
tasks=list(map(lambda job: job.id, self.jobs)),

Check warning

Code scanning / CodeQL

'break' or 'return' statement in finally Warning

'return' in a finally block will swallow any exceptions raised.

Copilot Autofix

AI 9 months ago

To fix the issue, the return statement should be moved outside the finally block. This ensures that exceptions raised during the execution of the try block or the finally block are not suppressed. The cleanup operations in the finally block will still execute, but the exception will propagate correctly.

The best approach is to refactor the __exit__ method to store the return value in a variable and return it after the finally block. This preserves the intended functionality while ensuring proper exception handling.


Suggested changeset 1
nemo_run/run/experiment.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo_run/run/experiment.py b/nemo_run/run/experiment.py
--- a/nemo_run/run/experiment.py
+++ b/nemo_run/run/experiment.py
@@ -1136,2 +1136,3 @@
     def __exit__(self, exc_type, exc_value, tb):
+        return_value = None
         try:
@@ -1146,5 +1147,5 @@
                     self.status()
-                return
+                return_value = None
 
-            if self._launched:
+            elif self._launched:
                 if hasattr(self, "_direct") and self._direct:
@@ -1154,5 +1155,5 @@
                     self.status()
-                    return
+                    return_value = None
 
-                if hasattr(self, "_waited") and self._waited:
+                elif hasattr(self, "_waited") and self._waited:
                     self.console.rule(
@@ -1161,10 +1162,10 @@
                     self.status()
-                    return
-
-                self.console.rule(
-                    f"[bold magenta]Waiting for Experiment {self._id} to finish",
-                )
-                self.status()
+                    return_value = None
 
-                self._wait_for_jobs(jobs=self.jobs)
+                else:
+                    self.console.rule(
+                        f"[bold magenta]Waiting for Experiment {self._id} to finish",
+                    )
+                    self.status()
+                    self._wait_for_jobs(jobs=self.jobs)
         finally:
@@ -1197,2 +1198,3 @@
                     )
+        return return_value
 
EOF
@@ -1136,2 +1136,3 @@
def __exit__(self, exc_type, exc_value, tb):
return_value = None
try:
@@ -1146,5 +1147,5 @@
self.status()
return
return_value = None

if self._launched:
elif self._launched:
if hasattr(self, "_direct") and self._direct:
@@ -1154,5 +1155,5 @@
self.status()
return
return_value = None

if hasattr(self, "_waited") and self._waited:
elif hasattr(self, "_waited") and self._waited:
self.console.rule(
@@ -1161,10 +1162,10 @@
self.status()
return

self.console.rule(
f"[bold magenta]Waiting for Experiment {self._id} to finish",
)
self.status()
return_value = None

self._wait_for_jobs(jobs=self.jobs)
else:
self.console.rule(
f"[bold magenta]Waiting for Experiment {self._id} to finish",
)
self.status()
self._wait_for_jobs(jobs=self.jobs)
finally:
@@ -1197,2 +1198,3 @@
)
return return_value

Copilot is powered by AI and may make mistakes. Always verify output.
@akoumpa akoumpa changed the title Akoumparouli/misc fixes misc fixes Jul 1, 2025
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
@akoumpa akoumpa merged commit 16895e0 into main Jul 2, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants