PNPM: Handle projects with nested subprojects #9863

oheger-bosch · 2025-01-29T15:40:22Z

This PR changes the parsing of the PNPM list output to handle the special case of nested projects; here the command does not produce syntactically correct JSON. This is related to #9784.

I ran a scan with these changes on an internal project that was failing before, and the generated reports looked at least reasonable. However, I am no expert for PNPM and thus cannot be sure whether the nested project is actually handled correctly. The successful fun tests should prove that no unrelated functionality was broken.

codecov · 2025-01-29T15:55:09Z

Codecov Report

Attention: Patch coverage is 75.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 68.21%. Comparing base (26381f0) to head (54167a6).
Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
...package-managers/node/src/main/kotlin/pnpm/Pnpm.kt	62.50%	1 Missing and 2 partials ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #9863      +/-   ##
============================================
+ Coverage     68.12%   68.21%   +0.08%     
- Complexity     1292     1293       +1     
============================================
  Files           250      250              
  Lines          8840     8849       +9     
  Branches        917      920       +3     
============================================
+ Hits           6022     6036      +14     
+ Misses         2431     2424       -7     
- Partials        387      389       +2

Flag	Coverage Δ
funTest-docker	`65.14% <75.00%> (+<0.01%)`	⬆️
funTest-non-docker	`33.37% <ø> (ø)`
test-ubuntu-24.04	`35.98% <33.33%> (+0.22%)`	⬆️
test-windows-2022	`35.96% <33.33%> (+0.22%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

fviernau · 2025-01-29T15:56:52Z

plugins/package-managers/node/src/main/kotlin/pnpm/ModuleInfo.kt

+    JSON.decodeToSequence<List<ModuleInfo>>(
+        ByteArrayInputStream(json.toByteArray()),
+        DecodeSequenceMode.WHITESPACE_SEPARATED
+    ).toList().flatten()


Hm, looking at [1] it seems that "nested project" is just a project which is in a descendant dir of another project.
If ORT analyzes the parent dir or any ancestor, IIUC it would find both definition files / projects and analyze them separately.

Calling flatten, as above, I'd expect the dependencies of the nested project to become dependencies of the outer one. Why is this intended?

(I wonder if instead of flatten(), first() has to be called. (So that the inner project gets ignored, as it will be analyzed separately anyway)

[1] https://github.com/oss-review-toolkit/ort/tree/fbb3a47bb65f25c0b67c4d312cd0adb3e29960c9/plugins/package-managers/node/src/funTest/assets/projects/synthetic/pnpm/nested-project

Hm, looking at [1] it seems that "nested project" is just a project which is in a descendant dir of another project.
If ORT analyzes the parent dir or any ancestor, IIUC it would find both definition files / projects and analyze them separately.

BTW, some analyzers already have code to filter out definition files from nested vendor directories, which is a similar case, also see #9864.

So maybe also for PNPM, nested definition file should be filtered out, as the pnpm CLI takes care by itself to regard them?

BTW, some analyzers already have code to filter out definition files from nested vendor directories, which is a similar case, also see #9864.

Can you give a hint why you think this is a similar case? (I wonder whether these projects are actually independent besides being in subdirs)

So maybe also for PNPM, nested definition file should be filtered out, as the pnpm CLI takes care by itself to regard them?

There is also the option of disregarding the nested subproject wrt to a definition file resolveDependencies().

Note: For workspaces, it is necessary to skip the analysis of the individual definition files and analyze them from the workspace root in some cases, because the workspace root is where the lockfile resides. How does the lockfile situation look here?

I have run tests with both approaches, using flatten() and first(), against the real project that was failing before. The results are almost identical; the only difference is that in case of flatten() the nested project shows up in the list of projects in the ORT result, while this is not the case with first().

So, what do you prefer? For me both options would be fine as long as the Analyzer run is successful.

On a side note, is it safe to use first()? Can we assume that there is a defined order in the output of PNPM?

So, what do you prefer?

I prefer to analyze the nested project separately, if possible (consider lockfile location), and thus use first / single approach.

On a side note, is it safe to use first()? Can we assume that there is a defined order in the output of PNPM?

Note sure, but it certainly should be possible to find the single item which corresponds to the project in the current working dir. I expect this to be simpler and more correct?!

The preferences of the reviewers are somewhat contradictory with regard to the flatten() vs first() approach.
Would the following be an approach everybody can agree with:

parsePnpmList() just returns a Sequence<List<ModuleInfo>>. Then no information is lost at this point.

PNPM filters the sequence for the list of paths of the top-level project and processes only those.

@fviernau, it is not clear to me based on which criteria this filtering can be done. How can the list that corresponds to the definition file be identified? If there is only a single path involved, I would go for the shortest path, but if there are multiple, I am not sure how to handle this.

Would the following be an approach everybody can agree with:

That would work for me.

PNPM filters the sequence for the list of paths of the top-level project and processes only those.

As long as this filtering is done, so that the nested projects deps are not considered for the out project, it's fine for me.

nit: Regarding the sequence return type. Does it safe any JSON parsing?

Regarding the sequence return type. Does it safe any JSON parsing?

The docs of decodeToSequence indicate so: "Transforms the given stream into lazily deserialized sequence of elements of type T".

nit: Regarding the sequence return type. Does it safe any JSON parsing?

I guess this depends on how the filtering logic can be implemented. If all elements of the sequence have to be processed and compared to each other to find the correct one, then probably not.

sschuberth

Since this seemingly is a regression, did you verify that new results for nested projects match the ones from before the regression?

plugins/package-managers/node/src/main/kotlin/pnpm/ModuleInfo.kt

plugins/package-managers/node/src/test/kotlin/pnpm/ModuleInfoTest.kt

...package-managers/node/src/funTest/assets/projects/synthetic/pnpm/nested-project/package.json

fviernau · 2025-01-30T07:44:49Z

plugins/package-managers/node/src/funTest/kotlin/pnpm/PnpmFunTest.kt

+        result.projects.map { it.id } shouldContainExactlyInAnyOrder listOf(
+            Identifier("PNPM::pnpm-nested-sub-project:1.0.0"),
+            Identifier("PNPM::pnpm-package-with-nested-project:1.0.0")
+        )


Can we make the test similar to the Resolve workspace dependencies correctly in YarnFunTest, so that it uses an entire expected result file.

(If the approach is to ignore the entries for the nested project, I'm ok to not add this test)

The test has been changed.

oheger-bosch · 2025-01-30T08:10:45Z

Since this seemingly is a regression, did you verify that new results for nested projects match the ones from before the regression?

Yes, I compared the new results against the ones from the last successful run of the affected project (scanning the same revision). In fact, the new results now contain significantly more NPM packages (472 vs 194). Since the processing logic of PNPM has been fully rewritten, I cannot easily tell what is causing this difference. And given the size of the dependency graph, it is also not trivial to further analyse the changes.

sschuberth · 2025-01-30T08:14:22Z

In fact, the new results now contain significantly more NPM packages (472 vs 194).

But IIUC, the first() vs. flatten() discussion from above does not have an impact on this discrepancy, correct?

oheger-bosch · 2025-01-30T08:17:32Z

In fact, the new results now contain significantly more NPM packages (472 vs 194).

But IIUC, the first() vs. flatten() discussion from above does not have an impact on this discrepancy, correct?

Yes, this is correct. Both approaches return the larger set of dependencies.

plugins/package-managers/node/src/main/kotlin/pnpm/Pnpm.kt

oheger-bosch · 2025-01-30T14:54:31Z

Test failures seem to be unrelated.

addressed,

plugins/package-managers/node/src/main/kotlin/pnpm/Pnpm.kt

plugins/package-managers/node/src/test/kotlin/pnpm/ModuleInfoTest.kt

sschuberth · 2025-01-30T15:10:01Z

plugins/package-managers/node/src/test/kotlin/pnpm/ModuleInfoTest.kt

+
+            val moduleInfos = parsePnpmList(input)
+
+            moduleInfos.toList() shouldContainExactly expectedResults


...you can omit the .toList() conversion here and import io.kotest.matchers.sequences.shouldContainExactly instead of the collection version.

Changed. Both test cases now operate on sequences.

Damn, looks like this now leads to "This sequence can be consumed only once"?

Yes, switched back to returning a List.

I probably lost track a bit, but I'm surprised that the test did not need to change again. Anyway, I'll approve 😄

Ah, that's because parsePnpmList() is tested, not listModules().

fviernau · 2025-01-31T07:07:44Z

plugins/package-managers/node/src/main/kotlin/pnpm/Pnpm.kt

+    val moduleInfoIterator = iterator()
+    val first = moduleInfoIterator.nextOrNull() ?: return emptyList()
+
+    fun List<ModuleInfo>.matchesWorkingDir() = any { File(it.path).absoluteFile == workingDir }


Would the following do the same?

private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> = first { list -> list.any { File(it.path).absoluteFile == workingDir } }

I don't think so. This will not handle the corner case that no element in the sequence satisfies the condition. The current code then returns the first element as fallback, while the proposed one will fail, right?

Hm, and this one?

private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> = firstOrNull { list -> list.any { File(it.path).absoluteFile == workingDir } }.orEmpty()

Another tries:

If we want to assume that there is it least one item:

private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> = firstOrNull { list -> list.any { File(it.path).absoluteFile == workingDir } } ?: first()

Otherwise:

private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> = firstOrNull { list -> list.any { File(it.path).absoluteFile == workingDir } } ?: firstOrNull().orEmpty()

I hope it's equivalent now?!

On the other hand: I cannot really follow the fear why the condition File(it.path).absoluteFile == workingDir would not work in some unknown cases. I mean, the items do have a path property. And thus the condition should work. I would actually prefer to make the assumption that there is a match, but would also like to hear @sschuberth if he agrees too.

I would actually prefer to make the assumption that there is a match, but would also like to hear @sschuberth if he agrees too.

I currently don't have the time to think about this filter logic in detail.

fviernau · 2025-01-31T07:09:28Z

plugins/package-managers/node/src/main/kotlin/pnpm/Pnpm.kt

+ * analyzed. This function tries to detect the corresponding [ModuleInfo]s based on the [workingDir]. If this is not
+ * possible, as a fallback the first list of [ModuleInfo] objects is returned.
+ */
+private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> {


consider naming this findModulesFor

The function was renamed.

fviernau · 2025-01-31T07:19:23Z

plugins/package-managers/node/src/main/kotlin/pnpm/Pnpm.kt

+    val moduleInfoIterator = iterator()
+    val first = moduleInfoIterator.nextOrNull() ?: return emptyList()
+
+    fun List<ModuleInfo>.matchesWorkingDir() = any { File(it.path).absoluteFile == workingDir }


Hm, and this one?

private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> = firstOrNull { list -> list.any { File(it.path).absoluteFile == workingDir } }.orEmpty()

oheger-bosch · 2025-01-31T07:33:58Z

@fviernau: Well, this is still not equivalent to the existing code.

My line of thinking is as follows: I am not 100 percent sure whether the condition File(it.path).absoluteFile == workingDir to find the outer project will work in all constellations. Therefore, if there is no match, I want to at least return some paths that can be analysed. Obviously, in the vast majority of cases, PNPM will return only a single list of paths - otherwise, this regression had been detected earlier. So, in this case, it seems reasonable to do the analysis based on this list. For special cases, like nested projects, returning some list seems to be better than nothing. Or would you want to skip the analysis then completely?

fviernau · 2025-01-31T07:49:59Z

plugins/package-managers/node/src/main/kotlin/pnpm/Pnpm.kt

+    val moduleInfoIterator = iterator()
+    val first = moduleInfoIterator.nextOrNull() ?: return emptyList()
+
+    fun List<ModuleInfo>.matchesWorkingDir() = any { File(it.path).absoluteFile == workingDir }


Another tries:

If we want to assume that there is it least one item:

private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> = firstOrNull { list -> list.any { File(it.path).absoluteFile == workingDir } } ?: first()

Otherwise:

private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> = firstOrNull { list -> list.any { File(it.path).absoluteFile == workingDir } } ?: firstOrNull().orEmpty()

I hope it's equivalent now?!

On the other hand: I cannot really follow the fear why the condition File(it.path).absoluteFile == workingDir would not work in some unknown cases. I mean, the items do have a path property. And thus the condition should work. I would actually prefer to make the assumption that there is a match, but would also like to hear @sschuberth if he agrees too.

oheger-bosch · 2025-01-31T07:59:21Z

@fviernau, it is not guaranteed that the Sequence can be iterated multiple times; therefore, calling first() as fallback can fail. This is the reason why I am using an Iterator; I could not find a better solution that handles all cases.

Please, this issue is bugging us. Could we defer the "find the most elegant Kotlin solution contest" to a later time?

fviernau · 2025-01-31T08:31:45Z

Could we defer the "find the most elegant Kotlin solution contest" to a later time?

You will do it?

sschuberth · 2025-01-31T08:36:42Z

The unrelated test failure will get addressed by #9879.

oheger-bosch · 2025-01-31T10:05:37Z

Rebased on main to get the fixes for the failing tests.

If PNPM encounters nested projects, the output of the `list` command is not well-formed JSON, but consists of multiple arrays. Change the `parsePnpmList()` function to handle this format correctly. In `Pnpm`, only analyze the top-level project, since nested projects will be handled by the `pnpm` command transparently. Fixes oss-review-toolkit#9784. Signed-off-by: Oliver Heger <[email protected]>

The test verifies that nested projects are now handled correctly. Signed-off-by: Oliver Heger <[email protected]>

oheger-bosch requested a review from a team as a code owner January 29, 2025 15:40

fviernau requested changes Jan 29, 2025

View reviewed changes

sschuberth reviewed Jan 29, 2025

View reviewed changes

fviernau previously requested changes Jan 30, 2025

View reviewed changes

oheger-bosch force-pushed the pnpm_nested_projects branch from 78b62f6 to 930e814 Compare January 30, 2025 13:10

github-advanced-security bot found potential problems Jan 30, 2025

View reviewed changes

plugins/package-managers/node/src/main/kotlin/pnpm/Pnpm.kt Fixed Show fixed Hide fixed

oheger-bosch force-pushed the pnpm_nested_projects branch from 930e814 to 9d07cba Compare January 30, 2025 13:58

oheger-bosch requested review from fviernau and sschuberth January 30, 2025 14:00

oheger-bosch force-pushed the pnpm_nested_projects branch from 9d07cba to 3954294 Compare January 30, 2025 14:06

sschuberth reviewed Jan 30, 2025

View reviewed changes

oheger-bosch force-pushed the pnpm_nested_projects branch from 3954294 to 7344b61 Compare January 31, 2025 06:49

oheger-bosch requested a review from sschuberth January 31, 2025 06:50

fviernau requested changes Jan 31, 2025

View reviewed changes

oheger-bosch force-pushed the pnpm_nested_projects branch from 7344b61 to db06ab3 Compare January 31, 2025 07:18

oheger-bosch requested a review from fviernau January 31, 2025 07:19

fviernau requested changes Jan 31, 2025

View reviewed changes

fviernau previously approved these changes Jan 31, 2025

View reviewed changes

sschuberth previously approved these changes Jan 31, 2025

View reviewed changes

sschuberth mentioned this pull request Jan 31, 2025

bug(pnpm): Add test to check/prove that pnpm builds are failing #9771

Closed

oheger-bosch force-pushed the pnpm_nested_projects branch from db06ab3 to 2b1dc53 Compare January 31, 2025 10:05

sschuberth enabled auto-merge (rebase) January 31, 2025 10:07

oheger-bosch added 2 commits January 31, 2025 12:59

test(pnpm): Add a funTest for a project with a nested subproject

54167a6

The test verifies that nested projects are now handled correctly. Signed-off-by: Oliver Heger <[email protected]>

oheger-bosch dismissed stale reviews from sschuberth and fviernau via 54167a6 January 31, 2025 12:00

oheger-bosch force-pushed the pnpm_nested_projects branch from 2b1dc53 to 54167a6 Compare January 31, 2025 12:00

sschuberth approved these changes Jan 31, 2025

View reviewed changes

sschuberth merged commit fe56fb4 into oss-review-toolkit:main Jan 31, 2025
25 checks passed


		val moduleInfos = parsePnpmList(input)

		moduleInfos.toList() shouldContainExactly expectedResults

PNPM: Handle projects with nested subprojects #9863

PNPM: Handle projects with nested subprojects #9863

Conversation

oheger-bosch commented Jan 29, 2025

codecov bot commented Jan 29, 2025 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fviernau Jan 30, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sschuberth left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oheger-bosch commented Jan 30, 2025

sschuberth commented Jan 30, 2025

oheger-bosch commented Jan 30, 2025

oheger-bosch commented Jan 30, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fviernau Jan 31, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oheger-bosch commented Jan 31, 2025

fviernau Jan 31, 2025 • edited Loading

Choose a reason for hiding this comment

oheger-bosch commented Jan 31, 2025

fviernau commented Jan 31, 2025

sschuberth commented Jan 31, 2025

oheger-bosch commented Jan 31, 2025

codecov bot commented Jan 29, 2025 •

edited

Loading

fviernau Jan 30, 2025 •

edited

Loading

fviernau Jan 31, 2025 •

edited

Loading

fviernau Jan 31, 2025 •

edited

Loading