Add user defined YAML tools #19434

mvdbeek · 2025-01-21T15:00:36Z

This work enhances the existing YAML tool format to bring it close to feature parity with XML tools, and strips inherently unsafe elements, which should eventually allow a subset of trusted users to bring their own tools.

I'll follow up with a more extensive description, but here's a screenshot of the embedded tool editor.

How to test the changes?

(Select all options that apply)

I've included appropriate automated tests.
This is a refactoring of components with existing test coverage.
Instructions for manual testing are as follows:
1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

jmchilton · 2025-01-23T15:12:26Z

lib/galaxy/tool_util/models.py

+    container: str
+
+
+class AdminToolSource(ToolSourceBase):


Totally fine just dropping the Admin version here if it is easier. The point was to establish some abstractions that could be used by the feature you're implementing and CWL - you've gone a lot farther with the user tools so I'm happy to drop any old admin-only, insecure functionality if it would help. Either way though is fine.

jmchilton · 2025-01-23T15:18:16Z

lib/galaxy/webapps/galaxy/api/dynamic_tools.py

+    # and is scoped to to individual user and never adds to global toolbox
+    dynamic_tools_manager: DynamicToolManager = depends(DynamicToolManager)
+
+    @router.get("/api/unprivileged_tools")


I feel like I would prefer a name like 'user_tools' but if it is a big change - feel free to ignore. Also sorry I'm reviewing by commit so if any of these changes are undone in future commits feel free to ignore.

Also like the last comment - feel free to just dump the dynamic tools API endpoint and replace it with this. I think I like that name better also.

Absolutely open to changing how we name things, we did already have 3 or 4 different versions when we hacked on this in Berlin. I'll keep that for a last pass when it's nearing completion though.

jmchilton · 2025-01-23T15:54:29Z

This is amazing. I don't hate any of it. The CWL fields I think we would struggle to fill out with our runtime is the only part that caused significant stress but I didn't see an attempt to fill those out. I would have semantic questions like is name going to be the name or collection identifier, etc.. but I don't think those are details you've tackled yet unless I missed the commit.

I would have started with locking tools down at the XML layer and have dozens of test cases around making sure tool action expressions cannot be evaluated, etc.. for unprivileged tools but I understand that part is pretty unsexy and I think there is some chance that having a fully defined model means those things might be completely unreachable and so that might have been unnecessary work. I think we need to at least audit all the features before the final merge.

I created a list of things I'd like to see to broken out into smaller PRs to clean up the core as I was reviewing the commits. None of this is essential - if it works, it works - but any of that extra effort would be appreciated and would ease follow up reviews I think and help isolate potential problems.

Other John will want the migration pulled into its own commit, I’d like it if you used abstractions from migration util instead of alembic directly.
Fix YAML and tool output default handling in f7c0014 (Rebased with d1e7bb8)
Mapping type fixes in model/init.py in c2b72e0
Component refactor in 9f8ed6d
Typing for trans in _workflow_to_dict_run in b3a98d7
get to get_one in tools/init in b3a98d7
Fix 1d0977c I think?
782caae (container parsing from dicts)
de27de2 (collection type in parameter models)
f756bcb (better discriminators in parameter models)
95b2eb4 (fix conditionals in YAML tools)
ec05106 (Fix job parameter summary for inputs without label)
Modeling existing requirements in parameters/interface in 6a98fe8 (would be hard to unwind with new requirement though)

nsoranzo

Suggestions from my current work on the cwl-1.0 branch.

nsoranzo · 2025-01-23T15:51:30Z

lib/galaxy/managers/tools.py

 )

 from galaxy import (
    exceptions,
    model,
 )
 from galaxy.exceptions import DuplicatedIdentifierException
-from galaxy.model import DynamicTool
+from galaxy.managers.context import ProvidesUserContext


Suggested change

from galaxy.managers.context import ProvidesUserContext

lib/galaxy/managers/tools.py

lib/galaxy/webapps/galaxy/api/dynamic_tools.py

lib/galaxy/workflow/modules.py

nsoranzo · 2025-01-23T15:52:57Z

lib/galaxy_test/api/test_tools.py

@@ -1381,6 +1400,62 @@ def test_dynamic_tool_no_id(self):
        output_content = self.dataset_populator.get_history_dataset_content(history_id)
        assert output_content == "Hello World 2\n"

+    # This works except I don't want to add it to the schema right now,
+    # since I think the shell_command is what we'lll go with (at least initially)


Suggested change

# since I think the shell_command is what we'lll go with (at least initially)

# since I think the shell_command is what we'll go with (at least initially)

jmchilton · 2025-01-23T16:13:12Z

I think we need to at least audit all the features before the final merge.

Maybe the way to address this is worrying about documentation. A tool translation guide maybe where each feature from XML is listed (under contents in https://docs.galaxyproject.org/en/master/dev/schema.html) and how to port it to YAML and if there are any security considerations. I did a lot of work in syncing XSD and YAML model docs in #18787 - I think we will want something like that for the broader tools right? We will need to keep model docs and XSD docs synchronized but also have separate customizations for each. It is kind of a hard problem but worth thinking about and maybe capturing security concerns at this point.

mvdbeek · 2025-01-23T16:34:01Z

I would have semantic questions like is name going to be the name or collection identifier, etc.. but I don't think those are details you've tackled yet unless I missed the commit.

We already had to do that for the conditional step work, the implementation is in galaxy.workflows.modules.to_cwl. The keys for the inputs object is the input name, the inner elements are the element identifiers (since there's no name anyway for nested collections). For leaf elements (i.e class: File), basename is defined as value.dataset.created_from_basename or element_identifier or value.name ... which is debatable.

I think we need to at least audit all the features before the final merge.

Definitely, I didn't want to go down that route before getting a bit of feedback that we're moving in the right direction, and convincing myself that authoring yaml tools with the schema and shell_command autocompletion is a nice experience. Yes, the input model helps in avoiding some things, but right now you still have cheetah available e.g in the label section of the outputs for instance, that probably needs to go as well.

I agree that there is a lot of work left, and some work that can be pulled out separately, I'll do that as I keep making progress here.

how to port it to YAML and if there are any security considerations

Since we parse both XML and YAML into ToolSource, and we can build the representation out of the ToolSource there is also an opportunity to convert existing XML tools (minus the cheeath, configfiles etc things). That's a nice way of exploring how it'll feel to build complex tools in the YAML variant.

think we will want something like that for the broader tools right? We will need to keep model docs and XSD docs synchronized

Indeed, that's probably the next thing to do to make the json schema really helpful. I have for instance sprinkled a few annotation into the model, and I just copied the relevant part from the XSD. We should keep that in sync somehow.

nsoranzo · 2025-01-24T13:53:06Z

lib/galaxy/model/__init__.py

+    hidden: Mapped[Optional[bool]] = mapped_column(default=False)
+    active: Mapped[Optional[bool]] = mapped_column(default=True)


These 2 columns are also present in "DynamicTool", is the duplication because the plan is to make DynamicTools sharable by associating multiple user ids to the same tool id?

The primary benefit is that the command section does not have access the app or any dangling reference to the database connection or any other secrets. There are two flavors here, one uses base_command and arguments, and allows building up an (escaped) argv list, the other is a shortcut for writing shell scripts and feels maybe a bit more like writing a very simple cheetah section. base_command: ```yml name: base_command tool class: GalaxyTool version: 1.0.0 base_command: cat arguments: - $(inputs.input.path) - '>' - output.fastq inputs: - type: data name: input outputs: output: type: data from_work_dir: output.fastq name: output ``` shell_command style: ```yml name: shell_command tool class: GalaxyTool version: 1.0.0 shell_command: cat '$(inputs.input.path)' > output.fastq inputs: - type: data name: input outputs: output: type: data from_work_dir: output.fastq name: output ```

…ific component

Will probably need this later for efficiency and ignoring `$()` outside of shell_command.

Simply re-use models by index and set values. It's currently a high-water mark situation, and there will be a warning once 200 models (i.e. 200 embedded fragments) are created, but that seems pretty unlikely.

lib/galaxy/managers/tools.py

Co-authored-by: Nicola Soranzo <[email protected]>

github-actions bot added area/testing area/API area/jobs area/workflows area/database Galaxy's database or data access layer area/tool-framework area/tool-dependencies area/testing/api area/client labels Jan 21, 2025

jmchilton reviewed Jan 23, 2025

View reviewed changes

nsoranzo reviewed Jan 23, 2025

View reviewed changes

nsoranzo reviewed Jan 24, 2025

View reviewed changes

mvdbeek added 14 commits January 31, 2025 18:03

Create UserToolSource pydantic model

b075616

Minor fix to restore admin-style dynamic tools

702fc94

Add user_tool role types

293e329

Add unprivileged_tools endpoint and models

aa5da13

Add separate unprivileged tools endpoint

1bf1821

Test tool run

fde110b

Bypass global toolbox for unprivileged tools

7de50eb

Refactor InvocationScrollList to generic ScrollList + Invocation spec…

3c7dd0d

…ific component

Return unprivileged tool api return types

9a8ae3c

Enable access to dynamic tools via toolbox.get_tool

d5f1348

YamlToolSource parser fixes

75598a0

Into add unprivileged tools endpoints

30434ef

Into test tool run

70c6947

mvdbeek added 27 commits January 31, 2025 18:11

Rebuild client schema

a0bf4f6

Install schema-to-ts from my package

5c52eb0

Correctly dispose of resource when unmounting editor

ab1afad

Add Syntax highlighting for JS embedded in $()

ca2fa01

Move monarch language config to separate file

04f9d8c

Rename expressionLib code extraction function

8149428

Add function that extracts embedded JS fragments

71e226c

Skip ahead to relevant key

d48b98c

Will probably need this later for efficiency and ignoring `$()` outside of shell_command.

Refactor fragment model creation

591316d

Implement provideHover for embedded js fragments

16f73ad

Drop now unused translateYamlPositionToEmbedded

88a07e0

Implement diagnostics for js fragments

e87bc3d

Drop log

3094c6a

Consume @eos token

0f2cd6b

Include $() in bracket definition

47e2135

Tweak editor performance by not adding es5 type defs on each change

d2583a3

Fix dispose call

04a8b1b

Fix model management

78cbe26

Simply re-use models by index and set values. It's currently a high-water mark situation, and there will be a warning once 200 models (i.e. 200 embedded fragments) are created, but that seems pretty unlikely.

log

ac60e5a

Add an explicit end sequence for embedded fragments

1af6321

Pop out of jsEmbedded on end of line

08af479

Document limitations of syntax highlighting on embedded fragments

9c8b8fe

Enable (Javascript) requirements on frontend/backend

cf34595

Rebuild schemas

39285c2

Fix token colorization for shell_command / expression_lib

d0ca903

Separate tool name and description vertically

7d53fba

Add a few annotations to test out schema info rendering

a0c1e15

mvdbeek force-pushed the user_defined_tools branch from ec05106 to a0c1e15 Compare January 31, 2025 17:48

nsoranzo reviewed Jan 31, 2025

View reviewed changes

lib/galaxy/managers/tools.py Outdated Show resolved Hide resolved

fix rebase

2004ab7

Co-authored-by: Nicola Soranzo <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add user defined YAML tools #19434

Add user defined YAML tools #19434

mvdbeek commented Jan 21, 2025

jmchilton Jan 23, 2025

jmchilton Jan 23, 2025

mvdbeek Jan 23, 2025

jmchilton commented Jan 23, 2025

nsoranzo left a comment

nsoranzo Jan 23, 2025

nsoranzo Jan 23, 2025

jmchilton commented Jan 23, 2025

mvdbeek commented Jan 23, 2025

nsoranzo Jan 24, 2025

	# since I think the shell_command is what we'lll go with (at least initially)
	# since I think the shell_command is what we'll go with (at least initially)

		hidden: Mapped[Optional[bool]] = mapped_column(default=False)
		active: Mapped[Optional[bool]] = mapped_column(default=True)

Add user defined YAML tools #19434

Are you sure you want to change the base?

Add user defined YAML tools #19434

Conversation

mvdbeek commented Jan 21, 2025

How to test the changes?

License

jmchilton Jan 23, 2025

Choose a reason for hiding this comment

jmchilton Jan 23, 2025

Choose a reason for hiding this comment

mvdbeek Jan 23, 2025

Choose a reason for hiding this comment

jmchilton commented Jan 23, 2025

nsoranzo left a comment

Choose a reason for hiding this comment

nsoranzo Jan 23, 2025

Choose a reason for hiding this comment

nsoranzo Jan 23, 2025

Choose a reason for hiding this comment

jmchilton commented Jan 23, 2025

mvdbeek commented Jan 23, 2025

nsoranzo Jan 24, 2025

Choose a reason for hiding this comment