Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP Multiline strings causing line number to be reported incorrectly #623

Open
plotbox-io opened this issue Jan 30, 2024 · 3 comments
Open
Labels
bug An issue contains information about wrong behaviour

Comments

@plotbox-io
Copy link

plotbox-io commented Jan 30, 2024

Describe the bug
When scanning two PHP files that contain a duplicate block, but where one file also has a multi-line string before the block, the line number for the reported error will be off by the number of newlines within the PHP string (like the string is always assumed to be one line in the code that calculates this).

To Reproduce
Steps to reproduce the behavior:
Create one file with contexnts:

<?php

final class FirstClass
{
    /** @inheritDoc * */
    public function someFunction(): void
    {
        $sql = "SELECT 
                    LINE1,
                    LINE2,
                    LINE3
                FROM mysql.table";
    }

    public function imageUri(mixed $result, string $subdomain): string
    {
        $portPart = '';
        if ($this->environment->isDeveloperEnv()) {
            $port = (int) $this->environment->getHttpPort();
            if (!in_array($port, [80, 443])) {
                $portPart = ":$port";
            }
        }

        return "ABC123";
    }
}

Create a second file with:

<?php

final class SecondClass
{
    public function getImageUriBasePath(): string
    {
        $portPart = '';
        if ($this->environment->isDeveloperEnv()) {
            $port = (int) $this->environment->getHttpPort();
            if (!in_array($port, [80, 443])) {
                $portPart = ":$port";
            }
        }

        $subdomain = $this->environment->getSubDomain();

        return "ABC123";
    }
}

Put these files in same directory and run a scan. Observe the incorrect line numbers given for the 'FirstClass' file:

Clone found (php):

  • var/cpdtest/GenealogyImageData.php [11:11 - 21:7] (10 lines, 85 tokens)
    var/cpdtest/PlotImageService.php [5:2 - 15:11]

Then reduce the multiline string starting line 8 to a single line string and then rerun the scan:

Clone found (php):

  • var/cpdtest/GenealogyImageData.php [11:11 - 21:7] (10 lines, 85 tokens)
    var/cpdtest/PlotImageService.php [5:2 - 15:11]

Now the line numbers are correct (note the output is the same but the second time is correct because the code has changed making the line numbers match).

Expected behavior
Line numbers should still be correct even when a PHP file contains multiline strings

Desktop (please complete the following information):

  • OS: Linux Manjaro
  • OS Version: Rolling
  • NodeJS Version: v20.9.0
  • jscpd version: 3.5.10

Note: I have already tried updating the prism PHP language config by copying it into node_modules/reprism/languages/php.js from https://github.com/PrismJS/prism/blob/master/components/prism-php.js with same defect occurring.

@kucherenko kucherenko added the bug An issue contains information about wrong behaviour label May 27, 2024
@davidwebca
Copy link

Oh darn, I just encountered this too. I'll see if I can dig in the code to help.

@davidwebca
Copy link

Alright, I looked into it and the package reprism is definitely to blame since it probably forked a now very old version of PrismJS. I rolled back Prism versions until I encountered the bug to guess around what version multiline strings weren't working and it was around 1.8.0 which fits the range of time around when reprism forked. Reprism was created with the intend of being an esm compatible port, but wasn't really updated whereas Prism is still updated and they're currently working on v2 which will be the modernized esm version. I'll look into another approach to solve my problem and report my findings.

@davidwebca
Copy link

Found it. Not sure it's related to the version of Prism that much anymore, granted the new syntax definition is more accurate at times because of new PHP language features, but nonetheless, here's the guilty part[: there's a place where the alias gets passed as the lang argument]. (

(t: IToken) => (res = res.concat(createTokens(t, token.alias ? sanitizeLangName(token.alias as string) : lang))),
).

        (t: IToken) => (res = res.concat(createTokens(t, token.alias ? sanitizeLangName(token.alias as string) : lang))),

Not sure why, this piece of code is there, but that why some of the tokens don't go through when they're a few levels deep (string -> heredoc / string -> double-quoted-string + interpolations, etc.). A simple fix for PHP right is to simply replace that line with :

(t: IToken) => (res = res.concat(createTokens(t, lang))),

But that breaks the tests with some other languages so I'd need to investigate further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An issue contains information about wrong behaviour
Projects
None yet
Development

No branches or pull requests

3 participants