Skip to content

Developer Tips for Python code in HATCHet

Vineet Bansal edited this page Jun 16, 2022 · 9 revisions

HATCHet has evolved over time, and includes certain patterns that can aid with addition of new features (and refactoring/cleanup of existing HATCHet code). A non-exhaustive list of these recommendations would include:

  • Always try to add URLs/"magic constants" and any other tweakable knobs in hatchet.ini, in an existing or new section. If a key is added in a section [mysection] as mykey = 42, it is then available to the entire HATCHet codebase as follows:
  from hatchet import config
  print(config.mysection.mykey + 1)  # prints 43

Note that values are automatically cast as int, float or str, as appropriate. A missing value is interpreted as None. Looking for a missing key will raise an Exception. Further, at runtime, this configuration can be overridden (by the developer or the end-user), by setting HATCHET_MYSECTION_MYKEY environment variable.

  • Do you wish to check for certain conditions and raise Exceptions if they are not met? Use the ensure function in hatchet.utils.Supporting if possible.

  • Want to run an external command, or a series of commands, optionally saving stdout/stderr, and check if they ran correctly? See the hatchet.utils.Supporting.run function to see if it will work for you.

  • Trying to write your own multiprocessing code? Struggling with capturing exceptions thrown in the parallel code? This is often done in the HATCHet codebase, so we've recently introduced a submodule that may help. The essential idea is this:

  from hatchet.utils.multiprocessing import Worker

  class MyWorker(Worker):
    def __init__(self, x):
      """
      Save any attributes you wish to maintain internal state,but don't ever modify them in work()!
      """
      self.x = x

    def work(self, *args):
      # Do any work you wish to do, consulting class attributes if you wish
      #   return a result
      return self.x + args[0]
      

  worker = MyWorker(x=42)
  work = [1, 2, 3, 5]  # arguments specifying the work that needs to be done
  # run up to 8 parallel instances of worker; this call waits for all workers to finish.
  results = worker.run(work=work, n_instances=8)
  # Results will conform to the same ordering as the input arguments
  #   regardless of how long each worker took to complete the job
  assert results == [43, 44, 45, 47]

  # Any exception thrown in any worker is propagated to the worker.run() call

  • Do you see that you're writing (or have detected in HATCHet code) the same logic more than once? Consider introducing a new module or function in hatchet.utils and re-using it. Use sensible default arguments to lessen code noise!

  • No commented-out code please! Commented-out code tends to rot much faster since it goes untouched during testing, results in unused variables and imports, and it's almost always unclear why it even exists. If you wish to point something out in the code or maintain a TODO for your future self, please use Github Issues instead. One easy way to do this is to navigate to the source on Github, click on the line number -> Reference in new issue..

Clone this wiki locally