Skip to content

feat(skills): support loading skills from GitHub/Git URLs#2091

Open
dgallitelli wants to merge 8 commits intostrands-agents:mainfrom
dgallitelli:feat/skills-github-url-loading
Open

feat(skills): support loading skills from GitHub/Git URLs#2091
dgallitelli wants to merge 8 commits intostrands-agents:mainfrom
dgallitelli:feat/skills-github-url-loading

Conversation

@dgallitelli
Copy link
Copy Markdown
Contributor

@dgallitelli dgallitelli commented Apr 8, 2026

Summary

Closes #2090

  • Add support for remote Git repository URLs as skill sources in AgentSkills and Skill.from_url()
  • Support https://, git@, and ssh:// URLs with optional @ref suffix for branch/tag pinning (e.g., https://github.com/org/skill@v1.0.0)
  • Repositories are shallow-cloned (--depth 1) and cached locally at ~/.cache/strands/skills/ (configurable via cache_dir parameter)

Usage

from strands import Agent, AgentSkills

# Load skill directly from a GitHub repo
plugin = AgentSkills(skills=["https://github.com/dgallitelli/aws-data-agent-skill"])

# With version pinning
plugin = AgentSkills(skills=["https://github.com/org/skill-repo@v1.0.0"])

# Mixed with local skills
plugin = AgentSkills(skills=[
    "https://github.com/org/skill-repo",
    "./local-skills/my-skill",
])

# Or use the Skill classmethod directly
from strands import Skill
skills = Skill.from_url("https://github.com/org/skill-repo@main")

Changes

File Change
_url_loader.py (new) URL parsing, git clone with shallow depth, hash-based caching
skill.py Added Skill.from_url() classmethod
agent_skills.py Updated _resolve_skills() to detect URL strings; added cache_dir parameter
test_url_loader.py (new) 22 tests for URL detection, parsing, cache keys, and clone behavior
test_skill.py 6 tests for Skill.from_url()
test_agent_skills.py 4 tests for URL resolution in AgentSkills

Design decisions

  • Git clone via subprocess: No new dependencies — uses git which is universally available. Handles auth naturally (SSH keys, credential helpers, tokens).
  • Shallow clone: --depth 1 minimizes bandwidth and disk for skill repos.
  • Hash-based caching: sha256(url + ref)[:16] as cache directory name. Repeated loads are instant.
  • Graceful degradation: Failed URL clones log a warning and skip (consistent with how nonexistent filesystem paths are handled).
  • No new dependencies: Only stdlib modules (subprocess, hashlib, shutil).

Test plan

  • All 171 skill tests pass (32 new tests added)
  • Ruff lint and format pass
  • Pre-existing mypy error is unchanged (unrelated @hook decorator typing)
  • Manual test with a real public skill repo (e.g., aws-data-agent-skill)
  • Manual test with a nested real public skill repo (e.g., obra/superpowers/brainstorming)

Davide Gallitelli and others added 2 commits April 9, 2026 00:06
Add support for remote Git repository URLs as skill sources in
AgentSkills, enabling teams to share and reference skills directly
from GitHub without manual cloning.

Closes strands-agents#2090

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse GitHub web URLs like /tree/<ref>/path to extract the clone URL,
branch, and subdirectory path. This enables loading skills from
subdirectories within mono-repos.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dgallitelli dgallitelli force-pushed the feat/skills-github-url-loading branch from 01a4fc3 to 83966f6 Compare April 9, 2026 00:06
@github-actions github-actions bot added size/l and removed size/l labels Apr 9, 2026
@mkmeral
Copy link
Copy Markdown
Contributor

mkmeral commented Apr 9, 2026

/strands review

@mkmeral mkmeral self-assigned this Apr 9, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 91.66667% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/strands/vended_plugins/skills/agent_skills.py 75.00% 1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!


logger = logging.getLogger(__name__)

_DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: _DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills" is evaluated at module import time. In containerized environments (Lambda, Docker) where HOME may not be set, Path.home() raises RuntimeError.

Suggestion: Evaluate this lazily, e.g., inside clone_skill_repo when cache_dir is None. You could use a sentinel or simply inline Path.home() / ".cache" / "strands" / "skills" inside the function:

cache_dir = cache_dir or Path.home() / ".cache" / "strands" / "skills"

This also respects XDG_CACHE_HOME if you want to be a good citizen on Linux:

import os
cache_dir = cache_dir or Path(os.environ.get("XDG_CACHE_HOME", Path.home() / ".cache")) / "strands" / "skills"

_DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills"

# Patterns that indicate a string is a URL rather than a local path
_URL_PREFIXES = ("https://", "http://", "git@", "ssh://")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: http:// is included in URL prefixes, allowing skill repos to be cloned over plaintext HTTP. This exposes users to man-in-the-middle attacks where a malicious actor could inject arbitrary code into the cloned skill.

Suggestion: Remove "http://" from _URL_PREFIXES, or at minimum log a warning when an http:// URL is detected in clone_skill_repo. Git itself warns about this, but the SDK should guide users toward secure defaults (tenet: "the obvious path is the happy path").

key = cache_key(url, ref)
target = cache_dir / key

if not target.exists():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Race condition — if two threads/processes call clone_skill_repo concurrently with the same URL, both pass the not target.exists() check and attempt to clone into the same directory. One will fail or produce a corrupt state.

Suggestion: Use an atomic pattern: clone to a temporary directory first, then os.rename() (atomic on the same filesystem) to the target path. If the target already exists by the time rename happens, just delete the temp clone. Something like:

import tempfile

with tempfile.TemporaryDirectory(dir=cache_dir) as tmp:
    tmp_target = Path(tmp) / "repo"
    # ... clone into tmp_target ...
    try:
        tmp_target.rename(target)
    except OSError:
        # Another process won the race — that's fine, use their clone
        pass

return hashlib.sha256(key_input.encode()).hexdigest()[:16]


def clone_skill_repo(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: There is no mechanism to refresh a cached clone. When no ref is specified, the default branch is cloned. If the remote repo is updated, the user gets stale content indefinitely with no obvious way to refresh (other than manually deleting ~/.cache/strands/skills/).

Suggestion: Consider adding a force_refresh: bool = False parameter (or documenting the cache invalidation story clearly). At minimum, document in the docstring that users should delete the cache directory or pin a ref for reproducibility. This is especially important since the cache key is a hash — users can't easily find and delete the right directory.

List of Skill instances loaded from the repository.

Raises:
RuntimeError: If the repository cannot be cloned or ``git`` is not
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: The Raises section documents RuntimeError but the method also raises ValueError (via is_url check on line 377). Additionally, calling Skill.from_url() with a non-URL raises ValueError, but using the same string through AgentSkills(skills=["./path"]) treats it as a local path. This inconsistency could confuse users.

Suggestion: Add ValueError to the Raises section in the docstring.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Issue: This PR introduces new public API surface (Skill.from_url() classmethod, cache_dir parameter on AgentSkills.__init__). Per the API Bar Raising guidelines, this should have the needs-api-review label since it adds a new public classmethod that customers will use directly.

Key questions an API reviewer should evaluate:

  1. Should from_url return list[Skill] (inconsistent with from_file which returns a single Skill)? The list return type is due to the possibility of multi-skill repos, but this creates an asymmetric API. Consider whether a separate from_url_directory would be cleaner, or if from_url should raise when multiple skills are found unless the user explicitly opts in.
  2. Is cache_dir the right parameter name and placement? It's only relevant when URL sources are used but appears as a top-level AgentSkills.__init__ parameter alongside unrelated parameters like state_key and max_resource_files.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Issue: The _url_loader module is not mentioned in the directory structure in AGENTS.md. Per the AGENTS.md instructions: "After making changes that affect the directory structure (adding new directories, moving files, or adding significant new files), you MUST update this directory structure section."

Suggestion: Add _url_loader.py under the vended_plugins/skills/ section in the AGENTS.md directory tree.

from ._url_loader import clone_skill_repo, is_url, parse_url_ref

if not is_url(url):
raise ValueError(f"url=<{url}> | not a valid remote URL")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: The from_url ValueError uses an f-string in the exception message: f"url=<{url}> | not a valid remote URL". This is consistent with the structured logging style used elsewhere, but the AGENTS.md specifically says "Don't use f-strings in logging calls." Since this is an exception message (not a logging call), this is acceptable — but the logging calls throughout the module correctly use %s format, which is good.

However, the RuntimeError raised in clone_skill_repo at line 189 also uses an f-string: f"url=<{url}>, ref=<{ref}> | failed to clone...". This is fine since it's an exception message, not a log statement.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Review Summary

Assessment: Request Changes

This is a well-structured feature that fills a real user need. The design decisions (subprocess git, shallow clone, hash-based caching) are sound and well-documented in the PR description. The test coverage is thorough with 32 new tests.

Review Categories
  • Security: http:// URL support exposes users to MITM risks; Path.home() at import time can fail in containers. The subprocess usage is properly handled with shell=False.
  • Reliability: Race condition in cache directory creation when concurrent processes clone the same URL. No cache invalidation mechanism for unpinned refs means stale content with no obvious path to refresh.
  • API Design: This PR adds public API surface (Skill.from_url(), cache_dir on AgentSkills) and should have the needs-api-review label. The from_url returning list[Skill] is asymmetric with from_file returning Skill — worth discussing with an API reviewer.
  • Documentation: No documentation PR linked — this adds new public API that users need to know about. AGENTS.md directory tree and set_available_skills docstring need updating.

The core implementation is solid — addressing the security and reliability items above would make this ready to merge.

- Security: remove http:// support (MITM risk), only allow https/ssh/git@
- Reliability: atomic clone via tempdir + rename to prevent race conditions
- Reliability: evaluate cache dir lazily (not at import time) for containers
- Usability: respect XDG_CACHE_HOME for cache directory
- Usability: add force_refresh parameter to re-clone stale unpinned refs
- Docs: add ValueError to Skill.from_url() Raises section
- Docs: add _url_loader.py to AGENTS.md directory tree
- Tests: add coverage for XDG_CACHE_HOME, force_refresh, race condition,
  http:// rejection (184 tests total)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot removed the size/l label Apr 10, 2026
- Fix mypy error: annotate response.read().decode() return type
- Add defensive https:// check in fetch_skill_content()
- Fix stale "Git URL" docstring in _resolve_skills()
- Fix stale "git clone, caching" description in AGENTS.md
- Add note that non-GitHub URLs must point directly to SKILL.md
- Add tests for unrecognized GitHub URL fallthrough and http:// rejection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
return source.startswith("https://")


def resolve_to_raw_url(url: str) -> str:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm don't think we should be doing the raw url translation. I'd say this is an application concern, that they should pass correct URLs to the Skill.from_url

This PR should be targeted at loading skills from URL, should not be opinionated on what that url is. At the end of the day, it's up to the devs to pass in the correct url

Skill.from_url should essentially be: get url as input, make GET request to get content from the url, pass it down to Skill.from_content. We should not have any complex resolution

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — removed all the GitHub URL resolution logic in the latest push. Skill.from_url() is now just: validate https://, GET the URL, pass to from_content(). ~65 lines total in _url_loader.py, no opinions on URL format.

dgallitelli pushed a commit to dgallitelli/docs that referenced this pull request Apr 13, 2026
Document Skill.from_url() and HTTPS URL support in AgentSkills:
- New "Loading skills from URLs" section with examples
- Add from_url() to programmatic creation examples
- Update configuration table to mention URL sources
- Note about resource directory limitations for URL-loaded skills

Companion to strands-agents/sdk-python#2091

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dgallitelli
Copy link
Copy Markdown
Contributor Author

Documentation PR: strands-agents/docs#756

Per review comment, Skill.from_url() should not be opinionated about
URL format. Remove all resolve_to_raw_url logic — the method now simply
GETs the provided URL and passes content to from_content(). Users are
responsible for providing a direct link to raw SKILL.md content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Returns:
Dict mapping skill names to Skill instances.
"""
from ._url_loader import is_url
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we explicitly need url loader import in method? if not, can we move to top?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the latest push — removed _url_loader.py entirely, inlined the fetch logic into Skill.from_url(), and moved all imports to the top of the module.

ValueError: If ``url`` is not an ``https://`` URL.
RuntimeError: If the SKILL.md content cannot be fetched.
"""
from ._url_loader import fetch_skill_content, is_url
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not have imports in methods, unless necessary. let's move this to the top

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the latest push — removed _url_loader.py entirely, inlined the fetch logic into Skill.from_url(), and moved all imports to the top of the module.

return source.startswith("https://")


def fetch_skill_content(url: str) -> str:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need another file? can we just move this under Skill.from_url (or like as helper methods there)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the latest push — removed _url_loader.py entirely, inlined the fetch logic into Skill.from_url(), and moved all imports to the top of the module.

Per maintainer feedback:
- Remove _url_loader.py — inline fetch logic directly into Skill.from_url()
- Move imports to top of module (not inside methods)
- Use source.startswith("https://") directly in _resolve_skills()
- Remove AGENTS.md entry for deleted file

The entire URL loading feature is now ~15 lines in skill.py with no
separate module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix stale "auto-resolved to raw content" in AgentSkills docstring
- Add test for non-SKILL.md content (HTML page → ValueError)
- Add from_url() example to Skill class docstring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Review Summary

Assessment: Comment

The implementation is now clean and minimal — Skill.from_url() is ~15 lines of straightforward urllib fetch → from_content, with no external dependencies, no caching, no GitHub-specific resolution. All prior review feedback has been addressed except one minor docstring gap.

Remaining Items
  • Docstring: set_available_skills still doesn't mention URL support (carried over from prior round)
  • Test coverage: Codecov flags the duplicate-URL-skill warning branch as uncovered — a small test would close the gap

The doc PR (strands-agents/docs#756) is linked. Nice work iterating on the design to reach this simple, focused solution. 👍

try:
skill = Skill.from_url(source, strict=self._strict)
if skill.name in resolved:
logger.warning("name=<%s> | duplicate skill name, overwriting previous skill", skill.name)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Codecov flags the duplicate-skill-name warning on line 308 (if skill.name in resolved) as uncovered. There's no test case where two URL sources resolve to a skill with the same name.

Suggestion: Add a test in TestResolveUrlSkills:

def test_resolve_duplicate_url_skills_warns(self, caplog):
    """Test that duplicate skill names from URLs log a warning."""
    import logging
    from unittest.mock import patch

    with (
        patch(
            f"{self._SKILL_MODULE}.urllib.request.urlopen",
            return_value=self._mock_urlopen(self._SAMPLE_CONTENT),
        ),
        caplog.at_level(logging.WARNING),
    ):
        plugin = AgentSkills(skills=[
            "https://example.com/a/SKILL.md",
            "https://example.com/b/SKILL.md",
        ])

    assert len(plugin.get_available_skills()) == 1
    assert "duplicate skill name" in caplog.text

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AgentSkills: support loading skills from GitHub URLs

2 participants