Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Add CI workflow as Github Action w/ security linting/scanning #45

Open
weklund opened this issue Feb 3, 2025 · 11 comments

Comments

@weklund
Copy link

weklund commented Feb 3, 2025

template pulled from https://github.com/Voyz/ibeam/blob/master/.github/ISSUE_TEMPLATE/enhancement.md

Describe the feature

Enable flake8 (for linting) and bandit (for security checks) in GitHub Actions to improve code quality and security as a CI (continuous integration) step. Initially, the workflow will be informative and non-blocking. Once we gain trust in the signal, we can enforce these checks to block PRs until issues are resolved.

Expected interaction

  • When a PR is created or updated, the CI workflow will trigger and run linting with Flake8 and security scanning with bandit
  • Initially, test results will be informative only and not block merges.
  • As confidence in the workflow/scanning grows, we can toggle it to block PRs at some point in time.
  • In the future, we can expand CI to include testing and test coverage as mentioned here OAuth support #34 (comment)

Example: .github/workflows/ci.yml:

# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: ibind CI workflow

on:
  push:
    branches: [ "master" ]
  pull_request:
    branches: [ "master" ]

permissions:
  # required for all workflows
  security-events: write
  # only required for workflows in private repositories
  actions: read
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    #----------------------------------------------
    #       check-out repo and set-up python
    #----------------------------------------------
    - uses: actions/checkout@v4
    - name: Set up Python 3.12
      uses: actions/setup-python@v3
      with:
        python-version: "3.12"      # <---- unknown if we need to change or add multiple versions/runs

    #----------------------------------------------
    #       Load cached venv if cache exists
    #----------------------------------------------
    - name: Load cached venv
      id: cached-poetry-dependencies
      uses: actions/cache@v4
      with:
          path: .venv
          key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}

    - name: Install dependencies
      if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
      run: |
        pip install -r requirements.txt

    - name: Lint with flake8
      run: |
        flake8 . --exclude ./.venv/lib --max-complexity=10
        continue-on-error: true
    #----------------------------------------------
    #         Run security scans (TODO: Get Github advanced security)
    #----------------------------------------------
    - name: Perform Bandit Analysis
      run: |
        bandit -r . -ll
      continue-on-error: true

Possible implications

  • This feature introduces automated checks that could surface existing security issues and linting errors
  • We can iterate on this through multiple PRs as we learn more.

Additional context

There is currently no CONTRIBUTING.md file in the repository to provide guidance on contributing. I'm open to feedback on how best to help out :)

@weklund weklund changed the title Add CI workflow as Github Action [Enhancement] Add CI workflow as Github Action Feb 3, 2025
@weklund weklund changed the title [Enhancement] Add CI workflow as Github Action [Enhancement] Add CI workflow as Github Action w/ security linting/scanning Feb 5, 2025
@weklund
Copy link
Author

weklund commented Feb 5, 2025

As of 2025/02/05, I have made changes to above feature request.

I have some detailed questions to make sure this is implemented correctly:

  • Do we have a defined list of python versions that ibind officially supports?
  • Is it expected that we just make virtual environments? Need to know if the workflow can just pip install or if we want to use a tool like poetry
  • For Flake8 config, can we start with dlint and secure coding standard ?
    - Linting rules can quickly turn into a religious discussion that the ibind community can have later 😁

@Voyz
Copy link
Owner

Voyz commented Feb 6, 2025

Do we have a defined list of python versions that ibind officially supports?

No, not currently. This would be a good task to carry out.

Is it expected that we just make virtual environments? Need to know if the workflow can just pip install or if we want to use a tool like poetry

I personally only use pip install and have been happily coding for years without poetry, while I did see my colleagues suffer from having to use it in the past. That being said I have nothing against a poetry file living alongside if it would make some devs' lives easier, but I wouldn't encourage introducing it as the main venv setup without having a proper discussion about pros and cons.

For Flake8 config, can we start with dlint and secure coding standard ?

I defer to your advice here - can you speak of what other options could be worth considering and if this list is a good starting point?

Linting rules can quickly turn into a religious discussion that the ibind community can have later 😁

Haha, cool, let's do it quickly before anyone notices! 🤣

I would only say I'd prefer we don't do black. Pep8 ok? Some other suggestions? Frankly, I've been using the default Pycharm/Intellij IDEA formatter for years, which I think is pep8, and never had to look for alternatives.

@weklund
Copy link
Author

weklund commented Feb 6, 2025

No, not currently. This would be a good task to carry out.

Sweet! Given >=3.8 is listed in pyproject.toml I could run a matrix with 3.8, 3.9, 3.10, 3.11, 3.12, and 3.13 and see what we think of results in PR review.

I personally only use pip install and have been happily coding for years without poetry, while I did see my colleagues suffer from having to use it in the past. That being said I have nothing against a poetry file living alongside if it would make some devs' lives easier, but I wouldn't encourage introducing it as the main venv setup without having a proper discussion about pros and cons.

Yep completely understand the preference for pip and the concerns about introducing Poetry, especially if it might add complexity/breaking behavior for some developers. I agree that we shouldn't disrupt the current workflow.

That said, I’d like to propose a "crawl, walk, run" approach to introducing Poetry in a way that doesn’t break anything for current developers but allows us to evaluate its benefits in a controlled manner.

Here’s my thinking:

  1. Crawl: Add Poetry to CI (No Developer Impact)
    Enable the CI workflow to install Poetry and use it to install dependencies (poetry install).
    Keep the existing requirements.txt so we aren't breaking anything.

  2. Walk: Optional Poetry Support for Developers
    Document how developers can optionally use Poetry for local development if they prefer it.
    This gives developers the flexibility to try Poetry without forcing it on everyone.
    Developers who want to try Poetry can benefit can evaluate its features (e.g., dependency resolution, virtualenv management etc).

  3. Run: Full Discussion and Decision
    After gathering feedback from the CI workflow and optional local usage, we can have a full discussion about whether to adopt Poetry as the local dependency management tool.

With this approach, we won't introduce any disruption to the current dev flow, we'll be able to provide flexibility to folks to want to try it out, then when we want to have the full discussion we can base it on our actual experiences trying it out.

Happy to take any feedback or thoughts here. :)

I defer to your advice here - can you speak of what other options could be worth considering and if this list is a good starting point?

My thoughts here are to have a starting point to introduce ibind to these types of checks and we evaluate the findings they come up with. Just start with something, especially since we don't plan on these blocking PRs. There's probably a very in depth discussion with a pros/cons matrix for the multiple scanning tools available. I'm thinking the open source versions of tools like SonarCube or Snyk. I'm sure I'm missing a few.

I picked dlint and secure coding standard to start as they are plugins to Flake8, which makes setup trivial.

Haha, cool, let's do it quickly before anyone notices! 🤣

I would only say I'd prefer we don't do black. Pep8 ok? Some other suggestions? Frankly, I've been using the default Pycharm/Intellij IDEA formatter for years, which I think is pep8, and never had to look for alternatives.

Just to clarify, I’m suggesting we use flake8 with security-focused plugins like dlint and flake8-secure-coding-standard for linting, not formatting. These tools help catch insecure coding practices (e.g., unsafe use of eval() or pickle) without affecting how the code is formatted. Your formatter shouldn't be affected :)

@salsasepp
Copy link

Sweet! Given >=3.8 is listed in pyproject.toml I could run a matrix with 3.8, 3.9, 3.10, 3.11, 3.12, and 3.13 and see what we think of results in PR review.

Note that 3.8 is already EOL, 3.9 will reach EOL this year. Why support those? Going for 3.10+ would allow for gentle code modernization (e.g. X | Y union types, typing.Self and others).

I personally only use pip install and have been happily coding for years without poetry

I'm a happy poetry user and have used ibind[oauth] with poetry since I discovered this project a few weeks ago. Thanks for not breaking it! :-) Here's my current pyproject.toml line for your convenience:
ibind = {version = "0.1.10-rc17", extras = ["oauth"], allow-prereleases = true}

@Voyz
Copy link
Owner

Voyz commented Feb 7, 2025

Thanks for sharing your thoughts guys 👍

Regarding Python version - I don't have a strong opinion on this. If anyone can think of a good reason to support <3.10 then let's talk about it. Otherwise, I suggest we follow @salsasepp's advice and do 3.10+.

That said, I’d like to propose a "crawl, walk, run" approach to introducing Poetry in a way that doesn’t break anything for current developers

Thanks for describing these steps, if we decide to start introducing it they would be very useful. I admire your attention to introducing changes in a non-disruptive way.

Before we do though, could we first have a brief discussion on why would we even try going this direction? As a non-user I don't understand the benefits of introducing Poetry. If you could list some existing issues in areas of IBind development that could be improved by introducing Poetry I think I'd get a better understanding behind it. Thanks for bearing with me on this 🙏

My thoughts here are to have a starting point to introduce ibind to these types of checks and we evaluate the findings they come up with.

Superb, thanks for expanding on these. Let's follow your advice and introduce the two you've listed.

Just to clarify, I’m suggesting we use flake8 with security-focused plugins like dlint and flake8-secure-coding-standard for linting, not formatting.

Right! Sorry, linting and formatting go hand in hand in my head hence I assumed we'd do dlint and flake8-secure-coding-standard alongside the formatter. Your comment about it quickly turning into religious discussion made me think of formatting exactly, I didn't think linting is also that susceptible to subjectivity!

In either case, would it be worth doing CI formatting too? I'm indifferent, wondering what your opinion on this would be.

@weklund
Copy link
Author

weklund commented Feb 8, 2025

I'm up for >=3.10 :)

I imagine we would want to specify in our next release that we're bumping the python version in pyproject.toml, in case it is disruptive for production workloads on 3.8 or 3.9.

Before we do though, could we first have a brief discussion on why would we even try going this direction? As a non-user I don't understand the benefits of introducing Poetry. If you could list some existing issues in areas of IBind development that could be improved by introducing Poetry I think I'd get a better understanding behind it. Thanks for bearing with me on this 🙏

No problem at all, I wouldn't want us to add something just because it's flashy, it should solve a clear problem :)

So the 2 biggest things I like about poetry are
a/ hands free virtual environments poetry shell
b/ reproducible builds (lock file)

a/
I actually don't write a lot of python in my work job. Most of it is Java, Typescript, and a bit of Rust. I never have to think about where my project's deps are being installed. I know that it's localized to the project scope. Having to remember about Python's virtual environments and making sure I'm not doing a pip install that's global is hard.

When I am writing Python, I love the fact that just typing poetry shell I don't have think about anything because it handles venv for me.

My dev flow when cloning a python repo at work is git clone, poetry install, poetry shell.

b/
I think this matters the most when you have multiple folks contributing to code, but also when you have CI running unit tests, I can't tell you how many times I've had unit tests fail in CI that worked on my local machine because the build was just slightly off on a single dependency. Put another way, the "works on my machine" issue with troubleshooting never happens with lock files, along with it's complex dependency resolution built-in.

Now another argument to be made could be to make the maintainer's life (yours) easier. Now I won't strongman this argument since I haven't done it with a python package myself, but allegedly poetry makes package versioning and publishing to pypi much easier. Not sure what you're current process is for releases so I'll just denote some poetry features:

poetry version patch|minor|major|prepatch|preminor|premajor|prerelease seems an awesome ways to tick versions so you're not manually doing it. This could also allow in the future to have completely automated releases based on some criteria. (Obviously not in scope for this work, but opens up these possibilities for the future)

poetry publish --build|--dry-run Could even be part of a 'release' workflow in the future so it's automated along with the version ticking.

Happy to hear your thoughts :)

In either case, would it be worth doing CI formatting too? I'm indifferent, wondering what your opinion on this would be.

I'm up for CI'ing all the things, was just introducing the ideas slowly haha

@Voyz
Copy link
Owner

Voyz commented Feb 9, 2025

@weklund many thanks for the detailed writeup on Poetry 🙌 Really appreciate you pointing out these different areas. Here are my thoughts on what you brought up:

In general, I have two observations:

  1. Benefits of Poetry would shine even more on a more complex project - IBind has only two dependencies, four if you install with OAuth, and all are relatively common and easy to install. Should the number was larger or the dependencies be more cumbersome to install, Poetry could possibly solve real pains here. That, or I'm not aware of issues with installing the 4 deps currently present.
  2. The choice seems to have a lot subjective perspective, rather than objective benefits. It feels like being accustomed to doing things in one way, or the other.

Point A

I never have to think about where my project's deps are being installed. I know that it's localized to the project scope.

I understand your thinking here, having to worry about one thing less can be great.

To clarify what you said: 'it's localized to the project scope' - is it though? My understanding is that Poetry installs to some global pypoetry\Cache folder. Correct this if I'm wrong. If that indeed is the case, then this sounds like a personal preference: installing to pypoetry\Cache instead of ~\.virtualenvs.

In Poetry I understand that you just run something like poetry install and don't care which venv it got installed to. Without it you first run ~\.virtualenvs\my_project\Scripts\activate, then pip install. Is the pain here having to type out the activate line or remembering the path?

Having to remember about Python's virtual environments and making sure I'm not doing a pip install that's global is hard.

Could you expand on what makes it hard to you?

Having 15 years of developing with Python I'm gonna need to take a couple steps back here to understand how it may be troublesome to newer developers - appreciate your patience here 🙏 I've personally never found this hard; it's a relatively simple system that you don't need to think much about which boils down to one short sentence:

Unless you're in an active venv shell, all installs are global.

If I want a local install, I need a venv. Yes, it is one extra thing to keep in mind, and things will go not as expected if you forget it, but I wouldn't call it 'hard'. To me it's really simple to keep this one sentence in mind, so I'm curious what could be the pain point from your perspective here.

My dev flow when cloning a python repo at work is git clone, poetry install, poetry shell.

Once again, it's one step extra without Poetry, is that the pain point?

git clone, python -m venv ~\.virtualenvs\my_project, ~\.virtualenvs\my_project\Scripts\activate, pip install -r requirements.txt

And once again, I've done this so many times I'm struggling to see the issue with it. It's remembering one bunch of commands or the other. I can recognise that poetry shell looks nicer than python -m venv ~\.virtualenvs\my_project, ~\.virtualenvs\my_project\Scripts\activate - points for clarity and simplicity. Not sure if that's enough of an argument for the transition, but it certainly makes it look nicer.

...

Btw. doesn't your IDE handle the venv automatically for you? It does for me - I specify the venv when I set up the project and everything from then on happens in that venv for that project - which is possibly why I don't mind doing the activate thing once in a blue moon when I do things outside of the IDE. Possibly my view on this would be different if I really had to do this for every single project over and over again.

Point B

Put another way, the "works on my machine" issue with troubleshooting never happens with lock files, along with it's complex dependency resolution built-in.

Fair enough! If that really does happen here then that would be enough to convince me. Has it happened to you when trying to install deps in IBind? Or when building CI?

I would be cautious of saying 'never happens' as I have the opposite argument here that comes from the exact same place - I've seen Poetry failing to install something in CI or other type of virtual machine; removing it and switching to pip resolved the problem. These are anecdotical and case specific, so not much to add to the argument; let's focus on present situation - can we see any current build issue on your machine or CI that Poetry would solve?

And just to clarify, pip freeze allows you to create a non-Poetry version of the lock file that can also be shared.

allegedly poetry makes package versioning and publishing to pypi much easier.
Not sure what you're current process is for releases

Again, possibly on a more complex project it could. Currently the steps are really easy:

  1. Modify the version in pyproject.toml
  2. Run
    python -m build
    twine upload dist\ibind-[VERSION].tar.g
    

I've considered putting a shell script to do this on several projects, but to be fair - I could never be bothered to; it's too easy as it is.

I agree that if Poetry would turn this into poetry version and poetry publish it would simplify this a bit, but by no means this is making the process much easier. It's never a problem to begin with.


I did some reading on what the internet says on the subject - it seems to be aligned with what you describe. I think main argument brought up is the dependency solving and avoiding the issues when building. There doesn't seem to be a clear winner and the choice does seem somewhat subjective after all; although for larger complex projects, or ones with a large amount of developers, Poetry seems to be recommended.
https://osayiakoko.hashnode.dev/a-comprehensive-guide-to-python-poetry-for-beginners
https://news.ycombinator.com/item?id=38844798
https://community.sap.com/t5/application-development-blog-posts/why-you-should-use-poetry-instead-of-pip-or-conda-for-python-projects/ba-p/13545646
https://nanthony007.medium.com/stop-using-pip-use-poetry-instead-db7164f4fc72
https://www.reddit.com/r/Python/comments/uxcsa1/poetry_over_pip/
https://medium.com/@utkarshshukla.author/managing-python-dependencies-the-battle-between-pip-and-poetry-d12e58c9c168
https://www.exxactcorp.com/blog/Deep-Learning/managing-python-dependencies-with-poetry-vs-conda-pip

(Whatever the outcome, I appreciate you making me take this time to learn more about Poetry!)


In conclusion, it seems a lot like a matter of remembering to do things one way or the other, rather than bringing in game-changing simplification or solutions.

If the main argument is that without Poetry one needs to learn and remember the ~\.virtualenvs\... paths, then while I agree it is extra work I'd compare that with having to learn Poetry's syntax, commands, configuration, etc. It's work in either case. A non-Poetry user would need to learn a few commands to start using it; a Poetry user would need to learn a few commands to start not using it. I'm not fussed about having to be the one learning, but if the only benefit is just to switch one syntax to another and simplify a couple of commands, then it doesn't feel like enough of an argument as I think each extra dependency - and Poetry is one too - should be justified. If there's more to it than just learning a few different commands, please elaborate.

If the main argument is the ease of build across platforms or developer machines, then I'd want to hear if you're running into any problems currently. If so, let's consider it, otherwise I suggest we consider introducing Poetry the moment we observe these happening, but not preemptively.

@Voyz Voyz mentioned this issue Feb 11, 2025
@weklund
Copy link
Author

weklund commented Feb 12, 2025

No problem! I love going deep on these topics. Improves our understanding and helps us "truth seeking" or making the right decision for our needs. 😁

but if the only benefit is just to switch one syntax to another and simplify a couple of commands, then it doesn't feel like enough of an argument as I think each extra dependency - and Poetry is one too - should be justified.

+1000 let's not switch just for some commands to remember :)

I believe there's value past just some commands.

Benefits of Poetry would shine even more on a more complex project - IBind has only two dependencies, four if you install with OAuth

Well theoretically we're about to add several for linting, secure code lint plugin, bandit, etc etc. I believe in the pip world you would have a requirements.txt and a requirements-dev.txt ? I'm not 100% sure but we can easily accomplish this with poetry.

To clarify what you said: 'it's localized to the project scope' - is it though? My understanding is that Poetry installs to some global pypoetry\Cache folder.

Ah whoops sorry I always switch the config to project directory in my .zshrc

poetry config settings.virtualenvs.in-project true

Once again, it's one step extra without Poetry, is that the pain point?

Yep it's more of a papercut :P

Btw. doesn't your IDE handle the venv automatically for you?

Neovim doesn't haha, but I am tinkering with Cursor. I believe it's a fork of VS Code.

Fair enough! If that really does happen here then that would be enough to convince me. Has it happened to you when trying to install deps in IBind? Or when building CI?

In fairness this hasn't happened at all with ibind, and it does seem less likely to happen given the low amount of deps needed.

You’re right that pip freeze can create a lock file of sorts, but Poetry’s poetry.lock goes a step further by ensuring consistent dependency resolution across environments. This is great if we want to run a test suite against different python versions, and different OS architectures.

re: architectures, I've personally spent days debugging an issue where I was building a AWS lambda function in Python, and the bundled deps were using a Darwin OS rust binary that fails to import the Cryptography package when loading the lambda function during the lambda runtime on a linux OS. Happy to deep dive if needed, but this could theoretically happen with the oauth implementation depending on user's runtime.

I know we're using pycryptodome not cryptography, but theoretically there should be some underlying binary to secure the compute for signature generation. Catching an obscure issue like this is better in CI than from a user.

Sorry I don't have anything tangible with my current use of ibind, so the thoughts are "what if" scenarios.

Thanks again for the thoughtful discussion—I really appreciate your openness to exploring this! Let me know what you think. :)

@weklund
Copy link
Author

weklund commented Feb 12, 2025

Regarding flake8, I've actually been doing a deep dive into python linters, and asking my team about a few python open source repos we manage. Seems ruff actually covers the same secure coding linting rules, while being much more performant.

ruff can fulfill our linting and formatting needs, so it's one tool with 2 capabilities. Again I think just to keep the scope of PR small, just adding secure coding linting scope, so we can follow up with a formatting discussion. (I did already check, and it seems ruff pycodestyle will cover pep8 formatting rules 😀 )

@Voyz
Copy link
Owner

Voyz commented Feb 12, 2025

Well theoretically we're about to add several for linting, secure code lint plugin, bandit, etc etc. I believe in the pip world you would have a requirements.txt and a requirements-dev.txt ?

Good point. Yes, you have these two.

Yep it's more of a papercut :P
Neovim doesn't haha, but I am tinkering with Cursor. I believe it's a fork of VS Code.

Haha all good, it's helpful to understand.

Thanks for expanding on the IDE too. So, how would you install a new package in your current setup? In CLI?

In fairness this hasn't happened at all with ibind, and it does seem less likely to happen given the low amount of deps needed.

Understood 👍

You’re right that pip freeze can create a lock file of sorts, but Poetry’s poetry.lock goes a step further by ensuring consistent dependency resolution across environments. This is great if we want to run a test suite against different python versions, and different OS architectures.

You're right that poetry.lock ensures consistent dependency resolution, but we can achieve the same result without Poetry by using pip-compile. For testing across different Python versions and OS architectures, the tool managing dependencies isn’t the key factor; what matters is ensuring compatibility through proper dependency locking and CI setup, which can be done in various ways. If Poetry simplifies this in some way, can you elaborate?

Cryptography

Oh absolutely, I've been having issues with this package too. But it wasn't related to Python, more to missing some c++ compilers, or something along these lines. Does Poetry handle non-package dependency installation, like compilers, too?

In fact, I've removed cryptography package from the OAuth PR exactly for that reason, it was proposed there originally.

I know we're using pycryptodome not cryptography, but theoretically there should be some underlying binary to secure the compute for signature generation. Catching an obscure issue like this is better in CI than from a user.

It's superb you're thinking about this already. We'll still be able to do CI without Poetry though, catching obscure issues like this, right?

Sorry I don't have anything tangible with my current use of ibind, so the thoughts are "what if" scenarios.

Fair enough. And I think it's a valid argument to think about things in advance, but here I'd say if there's nothing to fix with Poetry at the moment then let's focus on other things and revisit this topic should problems that Poetry could address would appear.

Just to clarify - I'm not hating Poetry, I really don't have a strong opinion here, and would gladly give it a shot if good arguments appear for it. I'm just being cautious in introducing it on a 'well everyone is using it!' basis, as I'm reading around that it is a bit of a learning curve and it brings in issues of its own, while as far as I know we have no issues with the current setup.

Thanks again for the thoughtful discussion—I really appreciate your openness to exploring this! Let me know what you think. :)

Likewise! 🙌 Thanks for your patience as well as researching and presenting various arguments for it. I'm enjoying this conversation and the learning alongside it a lot!

@Voyz
Copy link
Owner

Voyz commented Feb 12, 2025

Regarding flake8, I've actually been doing a deep dive into python linters, and asking my team about a few python open source repos we manage. Seems ruff actually covers the same secure coding linting rules, while being much more performant.

Amazing! 👏👏👏 Green light to go ahead with it, and superb job managing to reduce two tools into one. Great job! 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants