Hi, I am currently working on a project to generate plagiarism reports for coding contest submissions. Typically we get around 1-2 thousand submissions per language, but sometimes it exceeds 2,000. I have observed that for a large number of submissions the number of pair combinations is on the order of 10^6, and the time taken to generate the plagiarism report is much higher. I wanted to know which factors might be affecting the report generation and what the average speed of a comparison is (let's assume an average of 30 lines of code). I have also observed that by default … Thanks,
Disabling clusters helps with performance. You can also adjust `--shown-comparisons` if the report generation takes too long. Avoid additional features like `--normalize` or `--match-merging`, as they can increase the runtime significantly. For two submissions with 30 LOC, the comparison should take a few milliseconds. Finally, you can also increase the minimum token match with `-t`, but this also changes the matching sensitivity. I would only do that if the submissions are larger and the results are still good afterward.

At its core, two factors affect the performance of JPlag: the number of submissions (quadratic growth due to pairwise comparison) and the size of the submissions (which especially affects parsing).

Currently, that is not possible. However, you could implement that for your own version of JPlag by adapting the corresponding methods in …
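
To make the effect of pairwise comparison concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of JPlag. The function names are made up for illustration, and the 2 ms per comparison is only an assumed stand-in for "a few milliseconds". The estimate is single-threaded and ignores parsing, clustering, and report writing, so treat it as an order-of-magnitude guide at best.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n submissions: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def estimated_minutes(n: int, ms_per_pair: float) -> float:
    """Naive single-threaded comparison time in minutes.

    Ignores parsing, clustering, and report writing.
    """
    return pair_count(n) * ms_per_pair / 1000 / 60


if __name__ == "__main__":
    MS_PER_PAIR = 2.0  # assumption: stand-in for "a few milliseconds"
    for n in (500, 1000, 2000, 3000):
        pairs = pair_count(n)
        minutes = estimated_minutes(n, MS_PER_PAIR)
        print(f"{n:>5} submissions -> {pairs:>9,} pairs, ~{minutes:.1f} min")
```

With 2,000 submissions this gives 1,999,000 pairs, which matches the 10^6 range mentioned in the question. Doubling the number of submissions roughly quadruples the number of comparisons, so splitting a contest into smaller groups (for example per problem) reduces the work far more than proportionally.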