-
Notifications
You must be signed in to change notification settings - Fork 316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PNPM: Handle projects with nested subprojects #9863
PNPM: Handle projects with nested subprojects #9863
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9863 +/- ##
============================================
+ Coverage 68.12% 68.21% +0.08%
- Complexity 1292 1293 +1
============================================
Files 250 250
Lines 8840 8849 +9
Branches 917 920 +3
============================================
+ Hits 6022 6036 +14
+ Misses 2431 2424 -7
- Partials 387 389 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
JSON.decodeToSequence<List<ModuleInfo>>( | ||
ByteArrayInputStream(json.toByteArray()), | ||
DecodeSequenceMode.WHITESPACE_SEPARATED | ||
).toList().flatten() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, looking at [1] it seems that "nested project" is just a project which is in a descendant dir of another project.
If ORT analyzes the parent dir or any ancestor, IIUC it would find both definition files / projects and analyze them separately.
Calling flatten, as above, I'd expect the dependencies of the nested project to become dependencies of the outer one. Why is this intended?
(I wonder if instead of flatten()
, first()
has to be called. (So that the inner project gets ignored, as it will be analyzed separately anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, looking at [1] it seems that "nested project" is just a project which is in a descendant dir of another project.
If ORT analyzes the parent dir or any ancestor, IIUC it would find both definition files / projects and analyze them separately.
BTW, some analyzers already have code to filter out definition files from nested vendor directories, which is a similar case, also see #9864.
So maybe also for PNPM, nested definition file should be filtered out, as the pnpm
CLI takes care by itself to regard them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, some analyzers already have code to filter out definition files from nested vendor directories, which is a similar case, also see #9864.
Can you give a hint why you think this is a similar case? (I wonder whether these projects are actually independent besides being in subdirs)
So maybe also for PNPM, nested definition file should be filtered out, as the pnpm CLI takes care by itself to regard them?
There is also the option of disregarding the nested subproject wrt to a definition file resolveDependencies()
.
Note: For workspaces, it is necessary to skip the analysis of the individual definition files and analyze them from the workspace root in some cases, because the workspace root is where the lockfile resides. How does the lockfile situation look here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have run tests with both approaches, using flatten()
and first()
, against the real project that was failing before. The results are almost identical; the only difference is that in case of flatten()
the nested project shows up in the list of projects in the ORT result, while this is not the case with first()
.
So, what do you prefer? For me both options would be fine as long as the Analyzer run is successful.
On a side note, is it safe to use first()
? Can we assume that there is a defined order in the output of PNPM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, what do you prefer?
I prefer to analyze the nested project separately, if possible (consider lockfile location), and thus use first / single approach.
On a side note, is it safe to use
first()
? Can we assume that there is a defined order in the output of PNPM?
Note sure, but it certainly should be possible to find the single
item which corresponds to the project in the current working dir. I expect this to be simpler and more correct?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The preferences of the reviewers are somewhat contradictory with regard to the flatten()
vs first()
approach.
Would the following be an approach everybody can agree with:
parsePnpmList()
just returns aSequence<List<ModuleInfo>>
. Then no information is lost at this point.PNPM
filters the sequence for the list of paths of the top-level project and processes only those.
@fviernau, it is not clear to me based on which criteria this filtering can be done. How can the list that corresponds to the definition file be identified? If there is only a single path involved, I would go for the shortest path, but if there are multiple, I am not sure how to handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the following be an approach everybody can agree with:
That would work for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PNPM filters the sequence for the list of paths of the top-level project and processes only those.
As long as this filtering is done, so that the nested projects deps are not considered for the out project, it's fine for me.
nit: Regarding the sequence return type. Does it safe any JSON parsing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the sequence return type. Does it safe any JSON parsing?
The docs of decodeToSequence
indicate so: "Transforms the given stream into lazily deserialized sequence of elements of type T".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Regarding the sequence return type. Does it safe any JSON parsing?
I guess this depends on how the filtering logic can be implemented. If all elements of the sequence have to be processed and compared to each other to find the correct one, then probably not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this seemingly is a regression, did you verify that new results for nested projects match the ones from before the regression?
plugins/package-managers/node/src/main/kotlin/pnpm/ModuleInfo.kt
Outdated
Show resolved
Hide resolved
plugins/package-managers/node/src/test/kotlin/pnpm/ModuleInfoTest.kt
Outdated
Show resolved
Hide resolved
...package-managers/node/src/funTest/assets/projects/synthetic/pnpm/nested-project/package.json
Show resolved
Hide resolved
result.projects.map { it.id } shouldContainExactlyInAnyOrder listOf( | ||
Identifier("PNPM::pnpm-nested-sub-project:1.0.0"), | ||
Identifier("PNPM::pnpm-package-with-nested-project:1.0.0") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the test similar to the Resolve workspace dependencies correctly
in YarnFunTest
, so that it uses an entire expected result file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(If the approach is to ignore the entries for the nested project, I'm ok to not add this test)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test has been changed.
Yes, I compared the new results against the ones from the last successful run of the affected project (scanning the same revision). In fact, the new results now contain significantly more NPM packages (472 vs 194). Since the processing logic of PNPM has been fully rewritten, I cannot easily tell what is causing this difference. And given the size of the dependency graph, it is also not trivial to further analyse the changes. |
But IIUC, the |
Yes, this is correct. Both approaches return the larger set of dependencies. |
78b62f6
to
930e814
Compare
930e814
to
9d07cba
Compare
9d07cba
to
3954294
Compare
Test failures seem to be unrelated. |
plugins/package-managers/node/src/test/kotlin/pnpm/ModuleInfoTest.kt
Outdated
Show resolved
Hide resolved
|
||
val moduleInfos = parsePnpmList(input) | ||
|
||
moduleInfos.toList() shouldContainExactly expectedResults |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...you can omit the .toList()
conversion here and import io.kotest.matchers.sequences.shouldContainExactly
instead of the collection version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed. Both test cases now operate on sequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damn, looks like this now leads to "This sequence can be consumed only once"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, switched back to returning a List
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably lost track a bit, but I'm surprised that the test did not need to change again. Anyway, I'll approve 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's because parsePnpmList()
is tested, not listModules()
.
3954294
to
7344b61
Compare
val moduleInfoIterator = iterator() | ||
val first = moduleInfoIterator.nextOrNull() ?: return emptyList() | ||
|
||
fun List<ModuleInfo>.matchesWorkingDir() = any { File(it.path).absoluteFile == workingDir } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the following do the same?
private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> =
first { list ->
list.any { File(it.path).absoluteFile == workingDir }
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. This will not handle the corner case that no element in the sequence satisfies the condition. The current code then returns the first element as fallback, while the proposed one will fail, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, and this one?
private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> =
firstOrNull { list ->
list.any { File(it.path).absoluteFile == workingDir }
}.orEmpty()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another tries:
If we want to assume that there is it least one item:
private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> =
firstOrNull { list ->
list.any { File(it.path).absoluteFile == workingDir }
} ?: first()
Otherwise:
private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> =
firstOrNull { list ->
list.any { File(it.path).absoluteFile == workingDir }
} ?: firstOrNull().orEmpty()
I hope it's equivalent now?!
On the other hand: I cannot really follow the fear why the condition File(it.path).absoluteFile == workingDir
would not work in some unknown cases. I mean, the items do have a path
property. And thus the condition should work. I would actually prefer to make the assumption that there is a match, but would also like to hear @sschuberth if he agrees too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would actually prefer to make the assumption that there is a match, but would also like to hear @sschuberth if he agrees too.
I currently don't have the time to think about this filter logic in detail.
* analyzed. This function tries to detect the corresponding [ModuleInfo]s based on the [workingDir]. If this is not | ||
* possible, as a fallback the first list of [ModuleInfo] objects is returned. | ||
*/ | ||
private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider naming this findModulesFor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function was renamed.
7344b61
to
db06ab3
Compare
val moduleInfoIterator = iterator() | ||
val first = moduleInfoIterator.nextOrNull() ?: return emptyList() | ||
|
||
fun List<ModuleInfo>.matchesWorkingDir() = any { File(it.path).absoluteFile == workingDir } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, and this one?
private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> =
firstOrNull { list ->
list.any { File(it.path).absoluteFile == workingDir }
}.orEmpty()
@fviernau: Well, this is still not equivalent to the existing code. My line of thinking is as follows: I am not 100 percent sure whether the condition |
val moduleInfoIterator = iterator() | ||
val first = moduleInfoIterator.nextOrNull() ?: return emptyList() | ||
|
||
fun List<ModuleInfo>.matchesWorkingDir() = any { File(it.path).absoluteFile == workingDir } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another tries:
If we want to assume that there is it least one item:
private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> =
firstOrNull { list ->
list.any { File(it.path).absoluteFile == workingDir }
} ?: first()
Otherwise:
private fun Sequence<List<ModuleInfo>>.findModules(workingDir: File): List<ModuleInfo> =
firstOrNull { list ->
list.any { File(it.path).absoluteFile == workingDir }
} ?: firstOrNull().orEmpty()
I hope it's equivalent now?!
On the other hand: I cannot really follow the fear why the condition File(it.path).absoluteFile == workingDir
would not work in some unknown cases. I mean, the items do have a path
property. And thus the condition should work. I would actually prefer to make the assumption that there is a match, but would also like to hear @sschuberth if he agrees too.
@fviernau, it is not guaranteed that the Please, this issue is bugging us. Could we defer the "find the most elegant Kotlin solution contest" to a later time? |
You will do it? |
The unrelated test failure will get addressed by #9879. |
db06ab3
to
2b1dc53
Compare
Rebased on main to get the fixes for the failing tests. |
If PNPM encounters nested projects, the output of the `list` command is not well-formed JSON, but consists of multiple arrays. Change the `parsePnpmList()` function to handle this format correctly. In `Pnpm`, only analyze the top-level project, since nested projects will be handled by the `pnpm` command transparently. Fixes oss-review-toolkit#9784. Signed-off-by: Oliver Heger <[email protected]>
The test verifies that nested projects are now handled correctly. Signed-off-by: Oliver Heger <[email protected]>
54167a6
2b1dc53
to
54167a6
Compare
This PR changes the parsing of the PNPM
list
output to handle the special case of nested projects; here the command does not produce syntactically correct JSON. This is related to #9784.I ran a scan with these changes on an internal project that was failing before, and the generated reports looked at least reasonable. However, I am no expert for PNPM and thus cannot be sure whether the nested project is actually handled correctly. The successful fun tests should prove that no unrelated functionality was broken.