@dsubak dsubak commented Aug 12, 2025

What are the relevant tickets?

https://github.com/mitodl/hq/issues/8028

Description (What does it do?)

This is a proof of concept for a Jupyter extension that uses gh-scoped-creds to automatically push the running notebook's .ipynb file, all files under the root datasets directory, and associated dependencies to a specified repo.

Screenshots (if appropriate):

Demo video:

Screen.Recording.2025-08-11.at.4.45.28.PM.mov

How can this be tested?

Prerequisites for setup:

  • JupyterHub running locally
  • A GitHub application configured as detailed in the gh-scoped-creds README
  • A test repo to persist data to. I would strongly suggest this be empty as the extension will attempt to create requirements.txt, runtime.txt and the notebook file in the root of the repo.
  • You need to specify the following overrides.json in your application's settings directory:
{
  "jupyter2repo:plugin": {
    "GH_APP_ID": "YOUR_ID_FROM_SECRETS",
    "GH_APP_URL": "https://your.app/url"
  }
}

The settings directory can be found by running jupyter lab path.
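
As an illustration of how the two overrides.json values might be checked before the extension uses them, here is a minimal sketch. The helper `readAppConfig` and the `AppConfig` type are hypothetical (they are not part of this PR), and the sketch assumes the plugin settings arrive as a plain object keyed by `GH_APP_ID` and `GH_APP_URL`, as in the JSON above.

```typescript
// Hypothetical helper, not part of the PR: checks the two values that
// overrides.json is expected to provide before they are used anywhere.
interface AppConfig {
  appId: string;
  appUrl: string;
}

function readAppConfig(settings: Record<string, unknown>): AppConfig {
  const appId = settings['GH_APP_ID'];
  const appUrl = settings['GH_APP_URL'];
  if (typeof appId !== 'string' || appId.trim() === '') {
    throw new Error('GH_APP_ID is missing; set it in overrides.json');
  }
  if (typeof appUrl !== 'string' || appUrl.trim() === '') {
    throw new Error('GH_APP_URL is missing; set it in overrides.json');
  }
  // Assumption: the app URL is an https URL with no whitespace.
  if (!/^https:\/\/\S+$/.test(appUrl)) {
    throw new Error('GH_APP_URL must be an https URL');
  }
  return { appId, appUrl };
}

// Example with the placeholder values from the overrides.json above.
const exampleConfig = readAppConfig({
  GH_APP_ID: 'YOUR_ID_FROM_SECRETS',
  GH_APP_URL: 'https://your.app/url'
});
```

Failing early with a clear message is a design choice here: a missing override would otherwise surface later as a confusing authentication error.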

E2E test

  • Install the extension in JupyterHub. In my case, this was a matter of running pip install -e ol-notebook-extensions/jupyter2repo, but your mileage will vary depending on how you're running JupyterHub.
  • Start JupyterHub
  • Navigate to your workspace and select a notebook.
  • Select Activate Command Palette from the View menu and select Save Notebook to Github. Follow the instructions on screen.

Additional Context

This is intended as a proof of concept. It is sufficient to validate the approach from a technical perspective, but below is a non-exhaustive list of changes we can consider to improve the user experience or shore up the functionality:

  • We can have a form take all three inputs at once. As we've settled on a per-course repo layout within the MITx org, we should be able to take the app-specific values as config values at install time. We should only need one bit of user supplied info.
  • We can remove some inputs, if we settle on a specific setup for target repos (i.e. if we require all target repos to be within the mitx org, we would only need the target repo). See above.
  • There's no input validation at the moment, so it's easy to break by entering incorrect or malicious data. This is not safe to run in production at the moment.
  • Originally this made no provision for storing data files that may be important to the notebook (discussion at https://github.com/mitodl/hq/issues/8028#issuecomment-3184844533). It now persists any files placed under a datasets directory; this will be the convention going forward.
  • Due to where it stores the files, this is currently only capable of storing one notebook per repo. We could change the conventions for where we persist the files to organize multiple notebooks and kernels. We've decided upon a repo per-course, so we should be able to persist multiple notebooks and a single kernel per repo. See discussion at https://github.com/mitodl/hq/issues/8028#issuecomment-3180597032
  • It also assumes that the target repo is already created before this is run. We likely want a template repo or other simple way to stand up repos with the desired structure.
  • Right now, this uses an older version of TypeScript and has skipLibCheck specified. This is due to some recent updates that caused dependency installation problems in the extension template repos.

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @dsubak, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a proof-of-concept JupyterLab extension designed to automate the process of pushing a running Jupyter notebook, along with its Python dependencies (requirements.txt) and runtime environment (runtime.txt), to a specified Git repository. The extension leverages gh-scoped-credentials for GitHub authentication and provides a user-friendly interface through the JupyterLab command palette. It's currently a technical validation and not production-ready, with known limitations regarding data storage, multi-notebook support per repository, and input validation.

Highlights

  • JupyterLab Extension Development: Introduces a new jupyter2repo JupyterLab extension.
  • Automated Git Push: Enables automatic pushing of the active Jupyter notebook (.ipynb), generated requirements.txt, and runtime.txt to a designated Git repository.
  • GitHub Authentication Integration: Utilizes gh-scoped-credentials to handle GitHub application authentication, prompting the user for client ID and app URL.
  • Interactive User Workflow: Provides a "Save Notebook to Github" command in the JupyterLab palette, guiding the user through input dialogs for repository details and confirming permissions.
  • Real-time Feedback: Incorporates a "Log Panel" to display execution output and messages from the kernel during the save and push process.
  • Project Structure and Build Setup: Establishes a comprehensive project structure including pyproject.toml, package.json, README.md, and RELEASE.md for Python and Node.js packaging and release management.

@dsubak dsubak marked this pull request as draft August 12, 2025 13:52

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a proof-of-concept JupyterLab extension to push notebooks to a git repository. The implementation has several critical security vulnerabilities related to command and code injection due to unsanitized user input being used to construct shell commands and Python code. I've provided suggestions to mitigate these risks. Additionally, there are some bugs in logging, inconsistencies in hardcoded values, and incomplete package metadata. My review focuses on improving security and correctness.

Comment on lines 120 to 121
const auth_command = `import gh_scoped_creds
gh_scoped_creds.main(['--client-id','${ghClientID}', '--github-app-url', '${ghAppUrl}'])`;

critical

The ghClientID and ghAppUrl variables, which come from user input, are directly embedded into a Python code string. This creates a serious code injection vulnerability. A malicious user could provide input that breaks out of the string literal and executes arbitrary Python code in the kernel. To fix this, you should ensure the values are properly serialized as Python string literals. Using JSON.stringify() is a good way to achieve this, as it will produce valid Python string literals.

Suggested change
- const auth_command = `import gh_scoped_creds
- gh_scoped_creds.main(['--client-id','${ghClientID}', '--github-app-url', '${ghAppUrl}'])`;
+ const auth_command = `import gh_scoped_creds
+ gh_scoped_creds.main(['--client-id', ${JSON.stringify(ghClientID)}, '--github-app-url', ${JSON.stringify(ghAppUrl)}])`;
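
To make the difference between the two versions above concrete, the following sketch builds the generated Python source both ways for a hostile value. The function names and the sample input are made up for illustration, and only the client ID argument is shown. It relies on JSON string syntax being accepted as a Python string literal for typical inputs, which is the assumption behind the suggestion.

```typescript
// Naive interpolation: the value is pasted between single quotes as-is.
function naiveAuthCommand(clientId: string): string {
  return `gh_scoped_creds.main(['--client-id','${clientId}'])`;
}

// Serialized interpolation: JSON.stringify emits a double-quoted literal
// with embedded double quotes, backslashes, and control characters escaped.
function safeAuthCommand(clientId: string): string {
  return `gh_scoped_creds.main(['--client-id', ${JSON.stringify(clientId)}])`;
}

// Hypothetical hostile input that closes the string and appends a call.
const hostile = "x']); __import__('os').system('id'); print(['";

// The naive form now contains a second, attacker-chosen statement.
const naive = naiveAuthCommand(hostile);

// The serialized form keeps the whole value inside one string literal.
const safe = safeAuthCommand(hostile);
```

Serializing the value only prevents it from escaping the string literal. It does not check that the value is a plausible client ID, so validation is still worth doing separately.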

Comment on lines +10 to +13
"homepage": "",
"bugs": {
"url": "/issues"
},

medium

The homepage and bugs.url fields are either empty or contain a relative URL. For the package to be easily discoverable and for users to find help, these should be populated with the full, absolute URLs to the project's homepage and issue tracker. Based on the PR description, I've inferred the repository.

    "homepage": "https://github.com/mitodl/hq",
    "bugs": {
        "url": "https://github.com/mitodl/hq/issues"
    },

Comment on lines +27 to +30
"repository": {
"type": "git",
"url": ".git"
},

medium

The repository.url is set to .git, which is not a valid repository URL for package managers. This should be the full HTTPS or Git URL to the repository.

    "repository": {
        "type": "git",
        "url": "https://github.com/mitodl/hq.git"
    }

@blarghmatey blarghmatey requested a review from Copilot August 12, 2025 19:31

This comment was marked as outdated.

dsubak added 4 commits August 13, 2025 13:05
…rl. Those values now must be set via overrides.json, which should be provisioned at install time. This should make it safer to use these as inputs for shell commands

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a proof-of-concept JupyterLab extension to save a notebook and its dependencies to a GitHub repository. The implementation is well-structured for a PoC, with clear comments outlining prerequisites and limitations. The core logic in index.ts effectively uses kernel execution to perform git operations and provides user feedback through a dedicated log panel.

My review focuses on improving robustness, maintainability, and user experience. I've identified a few areas for improvement:

  • The package.json should be updated with complete repository information.
  • The log panel creation could be made idempotent to prevent UI clutter.
  • There are opportunities to make the git operations more robust, such as using more descriptive commit messages, not hardcoding the branch name, and adding safeguards for file system operations.
  • I've also included suggestions to follow through on the refactoring opportunities you've marked with TODOs.

Overall, this is a great start for the extension. The changes I've suggested should help mature the code from a PoC to a more production-ready state.

return;
}

//TODO - pull this log panel init into it's own function

high

As you've noted with the TODO, pulling the log panel initialization into its own function is a good idea. When doing so, consider making the log panel creation idempotent. Currently, a new 'Log Panel' is created every time the command is executed, which can clutter the UI with multiple panels. A better approach would be to create the panel once and reuse it, or bring it into focus on subsequent executions. You could achieve this by storing the widget instance in a variable outside the execute function and checking if it exists and is not disposed before creating a new one.
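
The create-once-and-reuse idea described above can be sketched independently of JupyterLab. In the sketch below, `PanelLike` is a minimal stand-in for a widget that exposes `isDisposed` and `dispose()`, and `reusable` and `FakePanel` are hypothetical names. In the real extension the factory would build the actual log panel and add it to the shell.

```typescript
// Minimal stand-in for a widget; only the members used here.
interface PanelLike {
  readonly isDisposed: boolean;
  dispose(): void;
}

// Returns a getter that creates the panel on first use, and again only
// after the previous instance has been disposed (e.g. closed by the user).
function reusable<T extends PanelLike>(create: () => T): () => T {
  let current: T | null = null;
  return () => {
    if (current === null || current.isDisposed) {
      current = create();
    }
    return current;
  };
}

// Fake panel used to exercise the helper outside JupyterLab.
class FakePanel implements PanelLike {
  isDisposed = false;
  dispose(): void {
    this.isDisposed = true;
  }
}

let created = 0;
const getLogPanel = reusable(() => {
  created += 1;
  return new FakePanel();
});
```

Calling `getLogPanel()` from the command's execute function would then return the same panel on repeated runs, and build a fresh one only after the old panel has been disposed.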

"jupyterlab",
"jupyterlab-extension"
],
"homepage": "",

medium

The homepage field is empty. Additionally, bugs.url (line 12) and repository.url (line 30) have placeholder values. For better package discoverability and maintainability, these should be updated to point to the actual project repository URL.

title: 'Provide repo to push requirements to'
});
const ghTargetRepo = ghTargetRepoResponse.value;
// TODO: Pull all these validations into a function

medium

As you've noted with the TODO, pulling these validations into a separate function would improve the readability and maintainability of the execute method. This function could take ghClientID, ghAppUrl, and ghTargetRepo as arguments and return a boolean indicating if they are all valid, while also logging appropriate error messages.
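
A validation function of the shape described above might look like the following sketch. The regular expressions are assumptions chosen for illustration (for example, that the target repo is given as an https GitHub URL); the PR's actual rules may differ. `SaveInputs` and `validateInputs` are hypothetical names.

```typescript
interface SaveInputs {
  ghClientID: string;
  ghAppUrl: string;
  ghTargetRepo: string;
}

// Assumed patterns, for illustration only; the PR's real rules may differ.
const CLIENT_ID_RE = /^[A-Za-z0-9._-]+$/;
const APP_URL_RE = /^https:\/\/[A-Za-z0-9._~:\/-]+$/;
const REPO_URL_RE = /^https:\/\/github\.com\/[A-Za-z0-9_.-]+\/[A-Za-z0-9_.-]+$/;

// Returns true only if every input matches its pattern; logs one
// message per failing input so the user sees all problems at once.
function validateInputs(inputs: SaveInputs, log: (msg: string) => void): boolean {
  const checks: Array<[string, boolean]> = [
    ['GitHub client ID', CLIENT_ID_RE.test(inputs.ghClientID)],
    ['GitHub app URL', APP_URL_RE.test(inputs.ghAppUrl)],
    ['target repo', REPO_URL_RE.test(inputs.ghTargetRepo)]
  ];
  let ok = true;
  checks.forEach(([label, passed]) => {
    if (!passed) {
      log(`Invalid ${label}`);
      ok = false;
    }
  });
  return ok;
}
```

Using allow-list patterns (only characters known to be safe) rather than trying to block specific dangerous characters keeps the check simple to reason about.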

dsubak and others added 2 commits August 13, 2025 16:19
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@dsubak dsubak requested a review from Copilot August 13, 2025 20:24

Copilot AI left a comment

Pull Request Overview

This PR implements a proof-of-concept Jupyter extension that enables automatic pushing of running notebooks and their dependencies to GitHub repositories. The extension integrates with GitHub Apps using gh-scoped-credentials for authentication and provides a command palette entry for users to save their work to a specified GitHub repository.

Key changes:

  • Complete JupyterLab extension implementation with TypeScript frontend and Python backend
  • GitHub integration using scoped credentials for secure authentication
  • Automatic packaging of notebooks, dependencies (requirements.txt), and dataset files

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/index.ts | Core extension logic implementing GitHub authentication, file collection, and repository push functionality |
| tsconfig.json | TypeScript configuration with skipLibCheck enabled for dependency compatibility |
| package.json | Project dependencies and build configuration for the JupyterLab extension |
| pyproject.toml | Python packaging configuration with build system and metadata |
| schema/plugin.json | Settings schema for GitHub App configuration values |


// TODO: We should skip this if we're already credentialed because it's definitely the most annoying part of the process
// Could do this either by cloning the repo and attempting a push --dry-run or by snagging the access token from the temporary file or from the git config
// Need to play around more with this
const auth_command = `!gh-scoped-creds --client-id='${ghClientID}' --github-app-url='${ghAppUrl}'`;

Copilot AI Aug 13, 2025

User input is directly interpolated into shell commands without proper escaping or validation. This creates a command injection vulnerability if the settings contain malicious values. Consider using proper parameter escaping or a safer execution method.

Suggested change
- const auth_command = `!gh-scoped-creds --client-id='${ghClientID}' --github-app-url='${ghAppUrl}'`;
+ const auth_command = `
+ import subprocess
+ subprocess.run([
+     "gh-scoped-creds",
+     "--client-id", ${JSON.stringify(ghClientID)},
+     "--github-app-url", ${JSON.stringify(ghAppUrl)}
+ ], check=True)
+ `.trim();

git push origin main
echo 'Cleaning up checkout'
cd ..
[ -n "${targetDirectory}" ] && rm -rf "${targetDirectory}"

Copilot AI Aug 13, 2025

The git clone command uses user input directly without proper validation or escaping. Even though there's a regex validation, this could still be vulnerable to command injection if the validation is bypassed. Consider using a safer method to execute git commands with validated parameters.

Suggested change
- [ -n "${targetDirectory}" ] && rm -rf "${targetDirectory}"
+ // Escape user-controlled variables for shell safety
+ function shellEscape(arg: string): string {
+   // Simple shell escaping: wrap in single quotes and escape existing single quotes
+   return `'${arg.replace(/'/g, `'\\''`)}'`;
+ }
+ const safeGhTargetRepo = shellEscape(ghTargetRepo);
+ const safeNotebookFilename = shellEscape(notebookFilename);
+ const safeTargetDirectory = shellEscape(targetDirectory);
+ const gitCommands = `
+ %%bash
+ git clone ${safeGhTargetRepo} ${safeTargetDirectory}
+ cp ${safeNotebookFilename} ${safeTargetDirectory}/
+ cp -R datasets ${safeTargetDirectory}
+ cd ${safeTargetDirectory}
+ pip freeze > requirements.txt
+ echo $(python -c "import sys; print(f'python-{sys.version_info.major}.{sys.version_info.minor}')") > runtime.txt
+ git add requirements.txt
+ git add runtime.txt
+ git add $(basename ${safeNotebookFilename})
+ git add datasets
+ echo 'Pushing to github'
+ git commit -m 'Update from notebook ${notebookFilename}'
+ git push origin main
+ echo 'Cleaning up checkout'
+ cd ..
+ [ -n ${safeTargetDirectory} ] && rm -rf ${safeTargetDirectory}
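
One detail in the suggestion above is that the commit message still interpolates notebookFilename without quoting. The sketch below shows the single-quote escaping technique in isolation and applies it to the whole commit message. `commitCommand` is a hypothetical helper, and the sketch assumes the command runs in a POSIX-style shell such as the bash cell used above.

```typescript
// Wrap a value in single quotes for a POSIX shell. An embedded single
// quote is written as '\'' : close the quote, emit an escaped quote,
// then reopen the quote.
function shellEscape(arg: string): string {
  return `'${arg.replace(/'/g, `'\\''`)}'`;
}

// Hypothetical helper: quotes the whole commit message as one argument,
// so a filename containing quotes cannot end the -m argument early.
function commitCommand(notebookFilename: string): string {
  return `git commit -m ${shellEscape(`Update from notebook ${notebookFilename}`)}`;
}

// Example values, including a filename that contains a single quote.
const plainCommit = commitCommand('analysis.ipynb');
const quotedCommit = commitCommand("bob's notebook.ipynb");
```

Escaping is a second line of defense here. Passing arguments as a list to a process API, as in the subprocess suggestion earlier in this review, avoids shell parsing entirely where that is practical.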

"strict": true,
"strictNullChecks": true,
"target": "ES2018",
"skipLibCheck": true

Copilot AI Aug 13, 2025

[nitpick] The skipLibCheck option is enabled, which disables type checking for declaration files. While this may resolve immediate build issues, it can hide type errors in dependencies and reduce type safety. Consider addressing the underlying TypeScript compatibility issues instead of skipping library checks.

Suggested change
- "skipLibCheck": true
+ "target": "ES2018"

"type": ["string"],
"title": "GitHub App URL",
"readOnly": true,
"default": "test2"

Copilot AI Aug 13, 2025

The default values for GitHub App credentials are set to placeholder strings ("test", "test2"). These should be empty strings or not have defaults at all to avoid accidentally using test credentials in production environments.

Suggested change
- "default": "test2"
+ "readOnly": true
+ },
+ "GH_APP_URL": {
+ "type": ["string"],
+ "title": "GitHub App URL",
+ "readOnly": true
