From 0434399a7bd830bc18fbdf5b53f8338b2f848c6e Mon Sep 17 00:00:00 2001 From: Raphael Gaschignard Date: Sat, 26 Oct 2024 21:33:55 +1000 Subject: [PATCH 1/4] Add Unasyncify Codegen PEP (pre-draft) --- draft/unasyncify-codegen.rst | 201 +++++++++++++++++++++++++++++++++++ 1 file changed, 201 insertions(+) create mode 100644 draft/unasyncify-codegen.rst diff --git a/draft/unasyncify-codegen.rst b/draft/unasyncify-codegen.rst new file mode 100644 index 00000000..e803e406 --- /dev/null +++ b/draft/unasyncify-codegen.rst @@ -0,0 +1,201 @@ +====================== +DEP XXXX: Unasyncify Codegen +====================== + +:DEP: XXXX +:Author: Raphael Gaschignard +:Implementation Team: TODO (Raphael Gaschignard + others?) +:Shepherd: TODO +:Status: Draft +:Type: Feature +:Created: 2024-10-26 +:Last-Modified: 2014-10-26 + +.. contents:: Table of Contents + :depth: 3 + :local: + +Abstract +======== + +This DEP proposes to add code generation tooling (Unasyncify codegen) to Django as a way to implement DEP 0009 without duplicating code. + +Unasyncify codegen takes an async function and transforms it into a synchronous variant. This involves "eraising" ``await`` s, making sure to call sync versions of async functions found in the function body, and other syntactic-level transformations as needed. + +The code generation lets us have one canonical implmentation (the async one) for APIs using this code generation, while still being able to easily add code that is only used in the async or sync variants. + +The codegen would be run by Django developers and checked into the codebase. This would allow for complete visibility into what the tool does, and ensure that users of Django are not presented with obfuscated code when looking at internal calls. + +Motivation +========== + +In order to implement DEP 0009, we want to implement asynchronous APIs for various synchronous APIs. Generally speaking, the color problem means that implementing async APIs and sync APIs involve writing extremely similar-looking code twice. + +Code duplication is error-prone, because one has to be very confident that there is no implementation drift between each variant. Having a single canonical version would be helpful! + +But if the sync version of ORM APIs are the canonical ones, we cannot take full advantage of what ``async`` offers. Inversely, if the async version of ORM APIs were considered canonical, then sync ORM APIs would likely have to pay a serious performance cost (through something like ``async_to_sync``). + +Psycopg is another project facing a similar difficulty for implementing both synchronous and asynchronous APIs without having either side pay a penalty for the other existing. +`Their strategy `_ was to treat the asynchronous version as canonical, and to generate the code for the synchronous version from the async source. +There is no runtime cost for the sync code relative to the async code, but there is only one implementation that needs to be maintained. + +Progress on DEP 0009 has meant that async variants of APIs have a certain naming convention. This naming convention, combined with the straightforward nature of async-to-sync generation, makes Django's code base amenable to a purely syntactic transformation. + +Specification +============= +A new module is added, ``django.utils.codegen``. Inside it are several decorators: + +* ``generate_unasynced(*, async_unsafe=False)``, a decorator which marks functions to be transformed +* ``from_codegen``, a decorator that marks functions that were generated through unasync codegen. + +A new script is added, ``scripts/run_codegen.sh``, that scans Django's source tree for functions that need unasync codegen applied. + +A function annotated with ``@generate_unasynced()`` is marked to have the unasync codegen applied to it. + +* The annotated function must be an ``async def``'d function +* Its name must follow Django's "async variant" naming convention, either beginning with ``a`` or, in the case of internal functions, ``_a`` + +Unasync codegen will add a new function to the same block, above the existing function implementation. + + +The generated function has the following changes: + +* This function is named as the "sync variant", removing the ``a`` + ``aconnect`` becomes ``connect`` + ``_ainternal_method`` becomes ``_internal_method`` +* The ``@generate_unasynced`` decorator is removed +* A ``@from_codegen`` decorator is added. This both indicates to the reader of code that this is from code generation, and lets code generation be idempotent +* The generated function is synchronous (``def foo``) +* Certain names are replaced in the body, whether they are used as identifiers or attributes. + Currently the mapping is as follows: + + * ``aconnection`` -> ``connection`` + (This allows for simple remapping of ``self.aconnection`` to ``self.connection``) + * ``ASYNC_TRUTH_MARKER`` -> ``False`` + (``ASYNC_TRUTH_MARKER is True``, but we can use this marker to have code know if it is in the async or sync variant of a function) + * Other names might be added based on needs we discover working on this. +* ``if ASYNC_TRUTH_MARKER`` blocks are flattened in the sync variant + Concreteley, this means that the following:: + + if ASYNC_TRUTH_MARKER: + do_thing_a() + else: + do_thing_b() + + Gets flattened to just:: + + do_thing_b() + + +* ``await``'ed expressions are replaced with non-await versions: + + * Within an ``await`` 'ed expression, function calls are examined to see if their name starts with `a`. If so, we replace this with function calls without the `a` + + Concretely, this means that ``await objects.aget(foo=bar)`` will get transformed to ``objects.get(foo=bar)``. + This transformation only happens inside of expressions within an ``await``, so something like ``my_dict.add(foo=bar)`` *will not* be transformed. + + This transformation also only looks at function calls, and not attributes. This is why we also have a separate transformation to handle rewriting ``aconnection`` to ``connection``. + + This does mean that ``await objects.aget(foo=obj.afunc())`` would get transformed to ``objects.get(foo=obj.func())`` (note the change from ``afunc`` to ``func``). + One can avoid this by extracting the call:: + + result = obj.afunc() + await objects.get(foo=result) + + Or one can also use something like ``getattr``:: + + await objects.get(foo=getattr(obj, 'afunc')()) + + This sort of workaround is sufficient to avoid having to have any more complicated "opt out of function renaming" issues. See the Rationale section for a note on this function renaming choice. + +* ``async for`` loops are replaced with ``for`` loops +* ``async with`` blocks are replaced with ``with`` blocks +* If, inside the ``generate_unasynced`` decorator, we have specified ``async_unsafe=True``, then the generated function will have ``@async_unsafe`` applied to it as well. + + +What follows is a concrete example of what the transformation generates. + +Given the following:: + + @generate_unasynced(async_unsafe=True) + async def aconnect(self): + """Connect to the database. Assume that the connection is closed.""" + # Check for invalid configurations. + self._pre_connect() + if ASYNC_TRUTH_MARKER: + # Establish the connection + conn_params = self.get_connection_params(for_async=True) + else: + # Establish the connection + conn_params = self.get_connection_params() + self.aconnection = await self.aget_new_connection(conn_params) + await self.aset_autocommit(self.settings_dict["AUTOCOMMIT"]) + await self.ainit_connection_state() + connection_created.send(sender=self.__class__, connection=self) + + self.run_on_commit = [] + +The following is added *above the ``aconnect``* definition:: + + @from_codegen + @async_unsafe + def connect(self): + """Connect to the database. Assume that the connection is closed.""" + # Check for invalid configurations. + self._pre_connect() + # Establish the connection + conn_params = self.get_connection_params() + self.connection = self.get_new_connection(conn_params) + self.set_autocommit(self.settings_dict["AUTOCOMMIT"]) + self.init_connection_state() + connection_created.send(sender=self.__class__, connection=self) + + self.run_on_commit = [] + + +By running the ``scripts/run_codegen.sh`` script, Django's source tree is scanned for functions with the ``generate_unasync`` decorator, and will rewrite files with that decorator applied according to the above rules. + +Developer Flow Changes +====================== + +With this change, async functions annotated with ``@generate_unasyncify`` will be considered the "canonical" versions, wheras generally (at the time of this writing) the synchronous version has been the canonical version. + +Because of this, developers will need to make sure they make changes to the asynchronous versions of functions annotated with ``@generate_unasyncify``, and not make manual changes to functions with ``@from_codegen`` applied. + +An added step in CI will make sure that unasyncify codegen is applied. This also will help capture whether manual changes to the synchronous versions are unintentionally committed. + +Developers working on annotated code will need to run ``scripts/run_codegen.sh`` and commit changes from this codegen. This has the added benefit of reviewing the result of the codegen, and supervising that the transformation matches what we want. + + + +Rationale +========= + +Factoring out everything but the code flow in a way to minimize code duplication, while doable in a case-by-case way, ultimately means that code would need to be concerned more with async/sync compatibility than with readability as a whole. Factoring out small fragments of code for the sake of async/sync compatibility will make it harder to spot other issues. And even beyond that, the simple act of trying to keep function signatures in sync could lead to issues. + +Run-time trickery to try and have a single implementation for both variants bring up the performance question. Load-time transformation of a single implementation would be costly. + +Code generation is, fundamentally, legible. Though developers aren't directly writing the generated functions, the results will show up in code review, will be diffed against existing implementations, and won't be obfuscated when looked at by users of Django. + +For the specific choice of function call renaming by looking at the name: Using ``a``-prefixed names as a proxy for "async variants of sync APIs" works unreasonably well based on Django's code base. It prevents having to generate a whole list of functions and lets the code transformation remain purely syntactic. + +But importantly, this transformation (that happens *only in ``await`` expressions* and *only on names that are being called*) is legible. One can see the transformation happen (because it is checked in), and if someone identifies this issue, they can apply a workaround. Unlike any runtime routing, issues downstream of this rewrite will be visible immediately. + +This codegen is aimed at supporting Django's efforts at maintaining these APIs, so we can rely on Django's specific naming conventions. It is not aimed at supporting other project's efforts at maintaining async and sync variants. + +Backwards Compatibility +======================= + +Because the annotation and transformations associated to it are opt-in, there are no backwards compatibility concerns. Discussion of handling backwards compatibility related to implementing DEP 0009 are out of scope, in the author's opinion, though very important. + +Reference Implementation +======================== + +`This pull request `_ includes an implementaiton of code generation to move from having sync and async implementations of functions handling database cursors, to a single async implementation (with the sync implementation being derived through code generation). + +This code generation uses `libCST `_, which allows for code transformations that in particular preserve comments and whitespace layouts. +This implementation was done in a couple of hours, almost entirely thanks to the existence of ``libCST``. The simplicity of the implementation should be an indicator of the feasibility. + + +.. rubric:: Footnotes +.. [#color-problem] shortly: I can call sync functions from async functions but not async functions from sync ones. Idea originating from `This blog post `_. From f27f97e5016913cc53e6c61af8171f893359fd11 Mon Sep 17 00:00:00 2001 From: Raphael Gaschignard Date: Sat, 26 Oct 2024 22:33:41 +1000 Subject: [PATCH 2/4] ASYNC_TRUTH_MARKER details --- draft/unasyncify-codegen.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/draft/unasyncify-codegen.rst b/draft/unasyncify-codegen.rst index e803e406..3c50ab45 100644 --- a/draft/unasyncify-codegen.rst +++ b/draft/unasyncify-codegen.rst @@ -48,6 +48,12 @@ A new module is added, ``django.utils.codegen``. Inside it are several decorator * ``generate_unasynced(*, async_unsafe=False)``, a decorator which marks functions to be transformed * ``from_codegen``, a decorator that marks functions that were generated through unasync codegen. +This module also includes the following: + +* ``ASYNC_TRUTH_MARKER`` + This is ``True``. But during codegen this gets replaced with ``False``. This allows us to easily mix async and sync "branches" within the same implementation for the times we need it. + This is a bit similar to ``typing.TYPE_CHECKING``. + A new script is added, ``scripts/run_codegen.sh``, that scans Django's source tree for functions that need unasync codegen applied. A function annotated with ``@generate_unasynced()`` is marked to have the unasync codegen applied to it. From 27f498d26c3fcb341d1490cdb39c13bd47472d4d Mon Sep 17 00:00:00 2001 From: Raphael Gaschignard Date: Tue, 29 Oct 2024 16:09:57 +1000 Subject: [PATCH 3/4] Add notes on testing and libCST --- draft/unasyncify-codegen.rst | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/draft/unasyncify-codegen.rst b/draft/unasyncify-codegen.rst index 3c50ab45..21b89562 100644 --- a/draft/unasyncify-codegen.rst +++ b/draft/unasyncify-codegen.rst @@ -172,7 +172,13 @@ An added step in CI will make sure that unasyncify codegen is applied. This also Developers working on annotated code will need to run ``scripts/run_codegen.sh`` and commit changes from this codegen. This has the added benefit of reviewing the result of the codegen, and supervising that the transformation matches what we want. +Transformation Of Test Code +=========================== +TODO: write out a spec for generating test code in a similar way. The main change required is to the function renaming +technique, but otherwise the same principles apply to test code generation as to the rest. + +The core point here being that you could write an async variant of a test and then generate the sync variant, Rationale ========= @@ -202,6 +208,16 @@ Reference Implementation This code generation uses `libCST `_, which allows for code transformations that in particular preserve comments and whitespace layouts. This implementation was done in a couple of hours, almost entirely thanks to the existence of ``libCST``. The simplicity of the implementation should be an indicator of the feasibility. +Usage of libCST +=============== + +Usage of ``ast`` directly does not preserve comments (although it does preserve docstrings). ``libCST`` does this, as well as offering "management command"-like functionality to syntax transformations, sidestepping design questions involved in how an AST transformation tool should be designed. + +While one could get away with not having line comments inside of functions who are targets for transformations, simply passing in an entire file to ``ast`` for transformation would remove comments from the rest of the file as well! There would need to be some targetted processing of functions to be able to apply only the changes we are interested in. + +If preserving comments within functions that are targetted for code generation is a goal, then this strategybecomes unfeasible without ``libCST``. + +if ``libCST`` is used, it would only be a requirement for people looking to make changes to annotated functions, in the same way that `gettext` is required only for people looking to make changes to text for translation. .. rubric:: Footnotes .. [#color-problem] shortly: I can call sync functions from async functions but not async functions from sync ones. Idea originating from `This blog post `_. From faa09b9ddd909b5ac6aa59d8a6183d9fce0fbdd3 Mon Sep 17 00:00:00 2001 From: Raphael Gaschignard Date: Tue, 29 Oct 2024 16:31:54 +1000 Subject: [PATCH 4/4] Develop out the testing point --- draft/unasyncify-codegen.rst | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/draft/unasyncify-codegen.rst b/draft/unasyncify-codegen.rst index 21b89562..7bf01630 100644 --- a/draft/unasyncify-codegen.rst +++ b/draft/unasyncify-codegen.rst @@ -4,8 +4,8 @@ DEP XXXX: Unasyncify Codegen :DEP: XXXX :Author: Raphael Gaschignard -:Implementation Team: TODO (Raphael Gaschignard + others?) -:Shepherd: TODO +:Implementation Team: Raphael Gaschignard +:Shepherd: TBD :Status: Draft :Type: Feature :Created: 2024-10-26 @@ -46,6 +46,7 @@ Specification A new module is added, ``django.utils.codegen``. Inside it are several decorators: * ``generate_unasynced(*, async_unsafe=False)``, a decorator which marks functions to be transformed +* ``generate_unasynced_test()``, a decorator which marks test functions to be transformed * ``from_codegen``, a decorator that marks functions that were generated through unasync codegen. This module also includes the following: @@ -175,10 +176,20 @@ Developers working on annotated code will need to run ``scripts/run_codegen.sh`` Transformation Of Test Code =========================== -TODO: write out a spec for generating test code in a similar way. The main change required is to the function renaming -technique, but otherwise the same principles apply to test code generation as to the rest. +Tests can be annotated with ``generate_unasynced_test``. The only difference between this and ``@generate_unasynced`` is the naming conventions. + +A test function uses the async variant name as the "canonical" name, and adds the ``_sync`` suffix to the sync variant. So ``test_property`` will be transformed to ``test_property_sync``. + +An example:: + + @from_codegen + async def test_thing_sync(self): + self.assertEqual(Model.objects.count(), 0) + + @generate_unasynced_test() + async def test_thing(self): + self.assertEqual(await Model.objects.acount(), 0) -The core point here being that you could write an async variant of a test and then generate the sync variant, Rationale ========= @@ -205,6 +216,8 @@ Reference Implementation `This pull request `_ includes an implementaiton of code generation to move from having sync and async implementations of functions handling database cursors, to a single async implementation (with the sync implementation being derived through code generation). +It does not include the test function generation. + This code generation uses `libCST `_, which allows for code transformations that in particular preserve comments and whitespace layouts. This implementation was done in a couple of hours, almost entirely thanks to the existence of ``libCST``. The simplicity of the implementation should be an indicator of the feasibility.