Allow stubgen recursively from CMake #463

laggykiller · 2024-03-10T05:15:55Z

No description provided.

laggykiller · 2024-03-10T05:20:04Z

As always, pypy3.9 decides to randomly fail during tests ¯\_(ツ)_/¯

It looks like pypy has a newer release (v7.3.15). Perhaps updating it in ci.yml would solve this problem?

wjakob · 2024-03-10T09:41:48Z

> It looks like pypy has a newer release (v7.3.15). Perhaps updating it in ci.yml would solve this problem?

Good idea, I will try 🤞

Regarding the PR: RECURSIVE does not necessarily require OUTPUT_DIR to be specified. In that case, stubgen will place the generated stub directly into the module directory, which I daresay is usually what you want. Perhaps you could mention this in the documentation RST file?

laggykiller · 2024-03-10T09:48:45Z

But what if the user specifies both --output-dir and --output-file? It looks like stubgen prefers --output-file over --output-dir. Should I also add a check to stubgen in this PR so that it throws an error if both are specified?

wjakob · 2024-03-10T10:01:37Z

That combination doesn't make sense, and neither does an output file combined with recursive mode. (Indeed, stubgen already has some internal checks for this.) I was just referring to a part of the text in the docs you wrote that seemed to suggest that RECURSIVE requires OUTPUT_DIR.

laggykiller · 2024-03-10T10:55:32Z

Please check!

wjakob · 2024-03-11T08:43:36Z

I'm still pondering the best API to expose stub generation in Python.

For example, I think that there is a way of getting RECURSIVE to work even when not using INSTALL_TIME. However, CMake would need to be told which output files are generated as part of the traversal. So in that case, there would be two main ways of building stubs:

nanobind_add_stub(
    ...
    MODULE my_ext
    OUTPUT_FILE my_ext/__init__.pyi
)

and

nanobind_add_stub(
    ...
    MODULE my_ext
    RECURSIVE
    OUTPUT_FILES my_ext/__init__.pyi my_ext/submodule/__init__.pyi
)

If INSTALL_TIME is specified, then OUTPUT_FILE or OUTPUT_FILES is optional.

Once things are tracked at that level of granularity, there is nothing against allowing more modules as input even in non-recursive mode, e.g.:

nanobind_add_stub(
    ...
    MODULES my_ext_1 my_ext_2
    OUTPUT_FILES my_ext_1/__init__.pyi my_ext_2/__init__.pyi
)

laggykiller · 2024-03-11T09:19:13Z

You have mentioned that...

> Some of my own nanobind-based projects are designed to be importable/usable from the build directory without requiring an extra install step

Do you have any examples of those projects?

I am not sure how that kind of project works, but maybe it would help to call a Python script from nanobind-config.cmake that recurses through the module and finds out which stub files should be generated?

laggykiller · 2024-03-12T05:21:37Z

In my opinion,

nanobind_add_stub(
   ...
   MODULE my_ext
   RECURSIVE
   OUTPUT_FILES my_ext/__init__.pyi my_ext/submodule/__init__.pyi
)

is less clean than

nanobind_add_stub(
   ...
   MODULES my_ext my_ext.submodule
)

If we cannot figure out which output files need to be generated for non-INSTALL_TIME, I think it is better not to implement RECURSIVE for that mode. The user would still need to specify things one by one, so we would just be shifting the headache from MODULE to OUTPUT_FILES, arguably with more keystrokes. Instead, we could require the user to specify each MODULE one by one and derive OUTPUT_FILES automatically from the modules supplied by the user. This should be easy with something like:

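set(OUTPUT_FILES "")
foreach(M IN LISTS ARG_MODULES)
   # Map a dotted module name (my_ext.sub) to a package path (my_ext/sub)
   string(REPLACE "." "/" M_PATH "${M}")
   # Assumes every module is a package with its own __init__.pyi
   list(APPEND OUTPUT_FILES "${M_PATH}/__init__.pyi")
endforeach()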

wjakob · 2024-03-12T10:00:52Z

There is a problem I see with the loop you posted:

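set(OUTPUT_FILES "")
foreach(M IN LISTS ARG_MODULES)
   # Map a dotted module name (my_ext.sub) to a package path (my_ext/sub)
   string(REPLACE "." "/" M_PATH "${M}")
   # Assumes every module is a package with its own __init__.pyi
   list(APPEND OUTPUT_FILES "${M_PATH}/__init__.pyi")
endforeach()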

In a submodule my_ext.sub, how will you know whether to generate my_ext/sub/__init__.pyi or my_ext/sub.pyi? The difference is important especially in larger binding projects containing a mixture of Python and C++ code.
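The package-versus-module distinction can be illustrated with a minimal Python sketch. It assumes the module is already importable, which a configure-time name mapping in CMake cannot rely on. The helper name `stub_path` is hypothetical and not part of nanobind or stubgen, and the `__path__` check only reflects how regular Python packages behave; submodules defined inside a C++ extension may not expose it.

```python
import importlib
from pathlib import PurePosixPath


def stub_path(module_name: str) -> str:
    """Return the conventional stub location for an importable module.

    Hypothetical helper for illustration; not part of nanobind's stubgen.
    """
    module = importlib.import_module(module_name)
    base = PurePosixPath(*module_name.split("."))
    # Regular packages carry a __path__ attribute; plain modules do not.
    if hasattr(module, "__path__"):
        return str(base / "__init__.pyi")
    return str(base.with_suffix(".pyi"))


# Standard-library modules stand in for my_ext and my_ext.sub here.
print(stub_path("json"))          # json/__init__.pyi
print(stub_path("json.decoder"))  # json/decoder.pyi
```

The answer depends on the imported module object rather than on its dotted name, which is why a pure string replacement cannot choose between the two layouts.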

You were interested in knowing about a project that creates a build-time importable library. See the nanobind_v2 branch of the Dr.Jit project: https://github.com/mitsuba-renderer/drjit/tree/nanobind_v2. This is a project with very complicated bindings, it uses essentially all features of nanobind and is what caused me to start writing nanobind in the first place.

laggykiller · 2024-03-12T12:21:09Z

Two ideas.

Idea 1

I think we don't actually need to know the paths of the stubgen output files in order to let CMake regenerate stub files only when the sources have changed.

From nanobind-config.cmake

    add_custom_command(
      OUTPUT ${NB_STUBGEN_OUTPUTS}
      COMMAND ${NB_STUBGEN_CMD}
      WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}"
      DEPENDS ${ARG_DEPENDS} "${NB_STUBGEN}" "${ARG_PATTERN_FILE}"
      ${NB_STUBGEN_EXTRA}
    )
    add_custom_target(${name} ALL DEPENDS ${NB_STUBGEN_OUTPUTS})

What actually determines whether stubgen.py is run is DEPENDS in add_custom_command(). As seen in drjit, ARG_DEPENDS contains the target drjit-python, which is associated with the C++ source files. Hence, if those source files are modified, the add_custom_command() that depends on drjit-python is re-run.

This means this should work:

    add_custom_command(
      OUTPUT "whatever_file.txt"
      COMMAND ${NB_STUBGEN_CMD}  # stubgen.py also creates whatever_file.txt
      WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}"
      DEPENDS ${ARG_DEPENDS} "${NB_STUBGEN}" "${ARG_PATTERN_FILE}"
      ${NB_STUBGEN_EXTRA}
    )
    add_custom_target(${name} ALL DEPENDS "whatever_file.txt")

Or even better (Less confident about this though):

    add_custom_command(
      OUTPUT whatever
      COMMAND ${NB_STUBGEN_CMD}
      WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}"
      DEPENDS ${ARG_DEPENDS} "${NB_STUBGEN}" "${ARG_PATTERN_FILE}"
      ${NB_STUBGEN_EXTRA}
    )
    set_source_files_properties(whatever PROPERTIES SYMBOLIC true)
    add_custom_target(${name} ALL DEPENDS whatever)

CMake should not panic if the command generates more files than expected, and it probably won't delete those extra files?

Idea 2 (EDIT: This should NOT work)

If above does not work, this shall work:

I searched "cmake add_custom_command with unknown output name" in google, found:

What if we:

Add a flag A to stubgen.py that dumps the stub file paths to stdout, and another flag B for a dry run that only walks through the module to work out which submodules exist and which stub files would be generated.
Run stubgen.py with flags A and B in execute_process() and store its stdout, which contains the list of stub file paths, in OUTPUT_VARIABLE (NB_STUBGEN_OUTPUTS).
Now that NB_STUBGEN_OUTPUTS is known, run add_custom_command() and add_custom_target().

Yes, this means we technically need to run stubgen.py twice, but one of the runs should be fast (?). Maybe it is too greedy to want CMake to be fast and also to be lazy about specifying which files are generated?
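For illustration only, the dry-run listing described in Idea 2 could look roughly like the Python sketch below. The flags A and B are not existing stubgen options, and `planned_stubs` is a hypothetical name. The sketch uses `pkgutil.walk_packages`, which only discovers submodules that exist on disk, so submodules defined inside a C++ extension would need a different traversal. As the heading notes, the approach also needs the module to be importable when it runs.

```python
import importlib
import pkgutil


def planned_stubs(package_name: str) -> list[str]:
    """List the stub files a recursive run would write, without writing any.

    Hypothetical dry-run helper; not part of nanobind's stubgen.
    """
    def to_path(name: str, is_pkg: bool) -> str:
        base = name.replace(".", "/")
        return f"{base}/__init__.pyi" if is_pkg else f"{base}.pyi"

    root = importlib.import_module(package_name)
    search_path = getattr(root, "__path__", [])  # empty for plain modules
    paths = [to_path(package_name, hasattr(root, "__path__"))]
    for info in pkgutil.walk_packages(search_path, prefix=package_name + "."):
        paths.append(to_path(info.name, info.ispkg))
    return paths


# One path per line, a format that execute_process() could capture.
print("\n".join(planned_stubs("json")))
```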

laggykiller · 2024-03-12T14:01:19Z

The first idea worked! It should now be possible to generate stubs recursively even for non-install time, and they are only regenerated if the source has changed.

I have made a new commit. To test, clone my fork and change tests/CMakeLists.txt:

nanobind/tests/CMakeLists.txt

Lines 70 to 76 in df8996a

    
nanobind_add_stub(
  py_stub
  MODULE py_stub_test
  OUTPUT ${PYI_PREFIX}py_stub_test.pyi
  PYTHON_PATH $<TARGET_FILE_DIR:test_stl_ext>
  DEPENDS py_stub_test.py
)

To the following:

nanobind_add_stub(
  py_stub
  MODULE py_stub_test
  # OUTPUT ${PYI_PREFIX}py_stub_test.pyi
  OUTPUT_DIR ${PYI_PREFIX}
  RECURSIVE
  VERBOSE
  PYTHON_PATH $<TARGET_FILE_DIR:test_stl_ext>
  DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/py_stub_test.py
)

Now build nanobind with the tests enabled. py_stub_test.pyi should be regenerated only if py_stub_test.py has changed.

The only catch with this method is that if py_stub_test.pyi is deleted, it is not regenerated unless CMake is run from a clean state, the *_stub.tmp file is deleted, or py_stub_test.py is changed.
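The catch follows from how a stamp (dummy) file stands in for the real outputs. The Python sketch below models only a plain modification-time comparison; it is not how CMake or Ninja decide staleness internally, and the file name `py_stub.tmp` and the helpers `is_stale` and `demo` are hypothetical.

```python
import os
import tempfile


def is_stale(stamp: str, deps: list[str]) -> bool:
    """Return True when the stamp is missing or older than any dependency."""
    if not os.path.exists(stamp):
        return True
    stamp_mtime = os.path.getmtime(stamp)
    return any(os.path.getmtime(dep) > stamp_mtime for dep in deps)


def demo() -> list[bool]:
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "py_stub_test.py")
        stub = os.path.join(tmp, "py_stub_test.pyi")
        stamp = os.path.join(tmp, "py_stub.tmp")  # hypothetical stamp name
        for path in (src, stub, stamp):
            open(path, "w").close()
        # Fix the timestamps explicitly so the stamp is newer than the source.
        os.utime(src, (100, 100))
        os.utime(stamp, (200, 200))
        results.append(is_stale(stamp, [src]))  # up to date
        os.remove(stub)
        results.append(is_stale(stamp, [src]))  # deleted stub goes unnoticed
        os.utime(src, (300, 300))
        results.append(is_stale(stamp, [src]))  # source changed, so rebuild
    return results


print(demo())  # [False, False, True]
```

The second result is the catch: the check never looks at the .pyi file itself, so deleting it does not trigger regeneration.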

Note: the workflow should succeed if DEPENDS py_stub_test.py is changed to DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/py_stub_test.py in tests/CMakeLists.txt.

wjakob · 2024-03-12T14:30:13Z

Interesting, I guess that makes sense and is reminiscent of dummy timestamp files used in Makefiles. One thing this won't do is clean up things correctly (e.g. ninja -t clean).

laggykiller · 2024-03-12T17:07:08Z

One way is to use set_property() to set ADDITIONAL_CLEAN_FILES for the .pyi files, but I can't think of how. Glob patterns in set_property() do not seem to work, and there seems to be no way to store the list of stub file paths generated by add_custom_command() and pass those files to set_property(). Thinking about this has taken more time than I wanted to spend, and I am too busy to think about this problem further...

If there is no way we can mark the .pyi file for deletion during clean, would you still prefer this method? ~~Or would you prefer the method of adding execute_process() before add_custom_command() and add_custom_target()?~~

laggykiller · 2024-03-12T17:37:49Z

This article provides a semi-working solution: https://discourse.cmake.org/t/how-to-cleanup-random-byproducts/3154

Please try the latest commit. Recursively generated files are cleaned reliably (Ninja) or from the second time onwards (Unix Makefiles).

However, note that file(GLOB) has a bad reputation: https://discourse.cmake.org/t/is-glob-still-considered-harmful-with-configure-depends/808

Given that file(GLOB) only gives us a semi-working solution and a potential performance issue, you might not want to add it to the code. Please share your opinion on this.

laggykiller · 2024-03-12T18:23:12Z

After some thought, it seems that the second method, calling execute_process() before add_custom_command() and add_custom_target(), is not an option. execute_process() runs at configure time, when the .py files and nanobind Python modules are not yet present in the build directory, so stubgen cannot be used at that point.

Seems like the first idea is the best we can do...

wjakob · 2024-05-23T14:18:03Z

Hi @laggykiller,

sorry about leaving this dormant for a long time. This PR went through a few different phases and I was wondering:

What are the benefits and downsides of the current approach? As far as I can tell, you write a dummy file that helps compute the staleness of outputs. This makes it unnecessary to specify an output file during stub generation. Are these dummy files only used in certain circumstances or always?
You mentioned having added a solution for cleaning files but with some limitations (Recursively generated files are cleaned reliably (Ninja) or from second time onwards (Unix Makefile)) -- is that still part of the PR? It's a downside if things aren't cleaned up but not a deal-breaker.
Is it still possible to specify explicit file outputs and retain the current functionality?
Why does the commit mention that recursive mode is only available at install time?

laggykiller · 2024-05-23T14:44:37Z

> What are the benefits and downsides of the current approach?

Currently this PR allows RECURSIVE mode to be used whether or not INSTALL_TIME is set (sort of), which is a benefit.

The downsides are:

Recursively generated files are cleaned reliably (Ninja) or only from the second time onwards (Unix Makefiles), which may feel buggy.
file(GLOB) has a bad reputation for being slow (https://discourse.cmake.org/t/is-glob-still-considered-harmful-with-configure-depends/808), but in my opinion the performance cost should not be significant.

> Are these dummy files only used in certain circumstances or always?

Dummy files are only used for non-INSTALL_TIME.

> You mentioned having added a solution for cleaning files but with some limitations (Recursively generated files are cleaned reliably (Ninja) or from second time onwards (Unix Makefile)) -- is that still part of the PR?

Yes, it is part of the PR (5982f36).

> Is it still possible to specify explicit file outputs and retain the current functionality?

Yes, if the user uses OUTPUT instead of OUTPUT_DIR. Everything works as before if the user does not use OUTPUT_DIR.

> Why does the commit mention that recursive mode is only available at install time?

I forgot to remove that sentence from the PR; fixed.

I was very busy and many things happened after my last comment here, so I may not remember all the details about the PR.

BrunoB81HK · 2024-11-05T17:28:24Z

What is the status for this feature? This would be very useful for me and I hope you can find time to add it soon.

laggykiller added 2 commits March 10, 2024 12:55

Allow recurse stubgen through cmake

bd7b599

Update documentation

f0f927e

laggykiller mentioned this pull request Mar 10, 2024

Tracking issue: stub generation #420

Closed

laggykiller added 2 commits March 10, 2024 17:54

Allow RECURSIVE without OUTPUT_DIR in CMake stubgen

e43bf9e

Throw error in stubgen if -o used with -O

c64477f

wjakob force-pushed the master branch 2 times, most recently from 56d7e93 to e80edb1 Compare March 11, 2024 17:04

Allow RECURSIVE for non-INSTALL_TIME

2b523c8

Fix allow RECURSIVE for non-INSTALL_TIME

3553850

laggykiller added 2 commits March 13, 2024 01:36

Clean stub files that were generated recursively

5982f36

Fix test

c4c19f9

laggykiller added 2 commits March 13, 2024 01:43

Fix test

feea86e

Fix test

8446001

wjakob force-pushed the master branch 2 times, most recently from dce91b4 to 183f039 Compare March 18, 2024 12:22

wjakob force-pushed the master branch 2 times, most recently from c30294a to af57451 Compare March 22, 2024 08:46

wjakob force-pushed the master branch from 5dea297 to 4148e83 Compare April 2, 2024 14:23

wjakob force-pushed the master branch 4 times, most recently from d7117a4 to 983d6c0 Compare May 22, 2024 15:28

Remove "RECURSIVE could only be used in INSTALL_TIME mode" from docs

29d1e97

wjakob force-pushed the master branch from d022f72 to d78ccba Compare August 21, 2024 01:48

wjakob force-pushed the master branch 2 times, most recently from f9e5e0b to 30e96b7 Compare September 9, 2024 14:54

Groverkss mentioned this pull request Sep 18, 2024

[libshortfin] Fix python stub generation. nod-ai/SHARK-Platform#195

Merged

wjakob force-pushed the master branch from 96cca6c to ee23846 Compare September 20, 2024 02:01

wjakob force-pushed the master branch 2 times, most recently from f3e2796 to bff96e2 Compare October 4, 2024 03:20

wjakob force-pushed the master branch from 046c7a1 to e262b7c Compare October 16, 2024 13:06
