-
Notifications
You must be signed in to change notification settings - Fork 0
Improved error detection BEN-1078 #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Merge activity
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧪 Benchify Analysis of PR 28
After analyzing the property-based tests, I have identified some key takeaways.
Passing tests: Most tests have passed, indicating that the code under test can correctly handle various scenarios, such as:
- Detecting and categorizing infrastructure-related issues versus code-related errors
- Splitting input strings by new lines and parsing TypeScript errors
- Filtering out non-critical errors and constructing
BuildError
objects - Writing transformed files to the sandbox's working directory
- Catching JSON parsing errors and returning an empty array
- Excluding base packages from the output
- Formatting output as
pkg@version
Failing tests: However, two tests have failed, suggesting that the code under test may have issues with:
- Correctly identifying and returning
BuildError
objects when both code-related and infrastructure-related errors are present in the input - Handling specific edge cases when extracting new packages from the package.json content
Recommendations: To address the failing tests, I recommend reviewing the implementation of detectCodeErrors
and extractNewPackages
to ensure they can correctly handle the mentioned edge cases. Additionally, consider adding more test cases to cover these scenarios.
/** | ||
* Detects if output contains code-related errors (not infrastructure issues) | ||
*/ | ||
export function detectCodeErrors(output: string): ErrorDetectionResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Detects Infrastructure Issues
The function should detect and correctly identify infrastructure-related issues, ensuring they do not get falsely categorized as code-related errors.
Outcome | Example Input | # Inputs | % of Total |
---|---|---|---|
✅ | superjson.parse('{"json":[["7z"]]}')... view full input |
200 | 100.0% |
view all inputs
The property-based test has passed. The test successfully validated that the detectCodeErrors
function correctly identifies infrastructure-related issues and does not falsely categorize them as code-related errors. In this case, the input string "{\"json\":[[\"7z\"]]}"
was used to generate an output that included the infrastructure error pattern 'EACCES: permission denied', and the function correctly classified it as infrastructure-only with no code errors.
Unit Tests
// Unit Test for "Detects Infrastructure Issues": The function should detect and correctly identify infrastructure-related issues, ensuring they do not get falsely categorized as code-related errors.
function benchify_s(s) {
return s.replace(/[^a-zA-Z0-9]/g, 'a');
}
it('benchify_s_exec_test_passing_0', () => {
const args = superjson.parse('{"json":[["7z"]]}');
benchify_s(...args);
});
/** | ||
* Detects if output contains code-related errors (not infrastructure issues) | ||
*/ | ||
export function detectCodeErrors(output: string): ErrorDetectionResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Identifies Code-Related Errors
The function should correctly identify and return code-related errors based on specific keywords in the output string.
Outcome | Example Input | # Inputs | % of Total |
---|---|---|---|
✅ | superjson.parse('{"json":[["azBaLF"]]}')... view full input |
200 | 100.0% |
view all inputs
The test has passed, indicating that the detectCodeErrors
function correctly identified and handled both code-related errors and infrastructure-related errors. The test provided an input string that triggered both code errors and infrastructure errors, and the function correctly returned hasErrors
as true
for the code errors and false
for the infrastructure errors. The function successfully distinguished between the two types of errors.
Unit Tests
// Unit Test for "Identifies Code-Related Errors": The function should correctly identify and return code-related errors based on specific keywords in the output string.
function benchify_s(s) {
return s.replace(/[^a-zA-Z0-9]/g, 'a');
}
it('benchify_s_exec_test_passing_0', () => {
const args = superjson.parse('{"json":[["azBaLF"]]}');
benchify_s(...args);
});
/** | ||
* Detects if output contains code-related errors (not infrastructure issues) | ||
*/ | ||
export function detectCodeErrors(output: string): ErrorDetectionResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Returns Structured Error Detection Result
The function should return a structured result indicating whether code errors are detected, and if only infrastructure issues are present, this should be flagged.
Outcome | Example Input | # Inputs | % of Total |
---|---|---|---|
❌ | superjson.parse('{"json":[[{"output":"Cannot re... view full input |
95 | 23.8% |
✅ | superjson.parse('{"json":[[{"output":"","expect... view full input |
305 | 76.3% |
view all inputs
Here is a summary of the test results:
The test failed due to an AssertionError. The function detectCodeErrors
did not return the expected result when the input output contained both a code error ("Cannot resolve import") and an infrastructure error ("EACCES: permission denied"). The function should have returned hasErrors: true
and isInfrastructureOnly: false
, but instead returned hasErrors: false
. This indicates that the function does not correctly handle mixed error scenarios.
Stack Trace
Error: expect(received).toBe(expected)
Expected: true
Received: false
at toBe (unknown)
at <anonymous> (/app/repo/lib/pver_346fef25-2a2e-4daa-b5a2-e421ce39beae.test.ts:79:36)
at <anonymous> (/app/configuration/fc.setup.ts:183:11)
at run (/app/node_modules/fast-check/lib/esm/check/property/Property.generic.js:46:33)
at runIt (/app/node_modules/fast-check/lib/esm/check/runner/Runner.js:18:30)
at check (/app/node_modules/fast-check/lib/esm/check/runner/Runner.js:62:11)
at <anonymous> (/app/configuration/fc.setup.ts:197:14)
at assertWithLogging (/app/configuration/fc.setup.ts:125:3)
at <anonymous> (/app/repo/lib/pver_346fef25-2a2e-4daa-b5a2-e421ce39beae.test.ts:38:6)
Unit Tests
// Unit Test for "Returns Structured Error Detection Result": The function should return a structured result indicating whether code errors are detected, and if only infrastructure issues are present, this should be flagged.
function benchify_codeError(codeError) {
return fc.constantFrom('EACCES: permission denied', 'failed to load config from /app/vite.config.ts', 'error when starting dev server', '/app/node_modules/.vite-temp/')
.chain(infraError => fc.tuple(fc.boolean(), fc.boolean()).map(([includeCode, includeInfra]) => ({
output: `${includeCode ? codeError : ''} ${includeInfra ? infraError : ''}`.trim(),
expectCodeError: includeCode && !includeInfra,
expectInfraError: includeInfra && !includeCode,
expectMixedError: includeCode && includeInfra
})));
}
it('benchify_codeError_exec_test_failing_0', () => {
const args = superjson.parse(
'{"json":[[{"output":"Cannot resolve import EACCES: permission denied","expectCodeError":false,"expectInfraError":false,"expectMixedError":true}]]}',
);
benchify_codeError(...args);
});
/** | ||
* Parses TypeScript compilation errors | ||
*/ | ||
export function parseTypeScriptErrors(stderr: string): BuildError[] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Proper Handling of Input Splitting and Iteration
The function should split the input stderr string by new lines and correctly iterate through each line to check for valid TypeScript error patterns.
Outcome | Example Input | # Inputs | % of Total |
---|---|---|---|
✅ | superjson.parse('{"json":[[["example.ts(10,15):... view full input |
135 | 33.8% |
❌ | superjson.parse('{"json":[[["main.ts(20,5): err... view full input |
265 | 66.3% |
view all inputs
The test has failed due to a TypeError. The error occurs when trying to access the result of /d+/.exec(line)[0]
, which is null. This suggests that the regular expression /d+/.exec(line)
is not matching the expected pattern in the input string, causing the exec
method to return null. This is happening when processing the input ["{\"json\":[[[\"main.ts(20,5): error TS2304: Cannot find name\",\"xFeU\",\"05\"]]]}"]
, specifically when trying to extract the error code from the line "main.ts(20,5): error TS2304: Cannot find name"
.
Stack Trace
TypeError: null is not an object (evaluating '/d+/.exec(line)[0]')
at <anonymous> (/app/repo/lib/pver_8ab72219-6782-4fda-8229-96aedc28e4f6.test.ts:66:48)
at <anonymous> (/app/configuration/fc.setup.ts:183:11)
at run (/app/node_modules/fast-check/lib/esm/check/property/Property.generic.js:46:33)
at runIt (/app/node_modules/fast-check/lib/esm/check/runner/Runner.js:18:30)
at check (/app/node_modules/fast-check/lib/esm/check/runner/Runner.js:62:11)
at <anonymous> (/app/configuration/fc.setup.ts:197:14)
at assertWithLogging (/app/configuration/fc.setup.ts:125:3)
at <anonymous> (/app/repo/lib/pver_8ab72219-6782-4fda-8229-96aedc28e4f6.test.ts:38:6)
/** | ||
* Parses TypeScript compilation errors | ||
*/ | ||
export function parseTypeScriptErrors(stderr: string): BuildError[] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Filtering of Non-critical Errors
Filter out lines containing 'deprecated', 'unused', or 'implicit any', treating them as non-critical errors and not including them in the output array.
Outcome | Example Input | # Inputs | % of Total |
---|---|---|---|
✅ | superjson.parse('{"json":[[5,"cZ"]]}')... view full input |
200 | 100.0% |
view all inputs
The test has passed successfully. The property-based test verified that the parseTypeScriptErrors
function correctly filters out lines containing 'deprecated', 'unused', or 'implicit any' and does not include them in the output array. The test generated 5 error lines with random messages, and the function correctly handled these errors, producing an output that meets the expected criteria.
Unit Tests
// Unit Test for "Filtering of Non-critical Errors": Filter out lines containing 'deprecated', 'unused', or 'implicit any', treating them as non-critical errors and not including them in the output array.
function benchify_lineCount(lineCount, message) {
const lines = Array.from({ length: lineCount }, () => {
const file = "someFile.ts";
const line = `${Math.floor(Math.random() * 100)}`;
const column = `${Math.floor(Math.random() * 100)}`;
const tsErrorNumber = `TS${Math.floor(1000 + Math.random() * 9000)}`;
return `${file}(${line},${column}): error ${tsErrorNumber}: ${message}`;
});
const criticalMessageLines = lines.map(line => line.includes('deprecated') || line.includes('unused') || line.includes('implicit any') ? line.replace(/deprecated|unused|implicit any/, '') : line).join('\\n');
const errors = parseTypeScriptErrors(criticalMessageLines);
expect(errors.every(error => !error.message.includes('deprecated') && !error.message.includes('unused') && !error.message.includes('implicit any'))).toBe(true);
}
it('benchify_lineCount_exec_test_passing_0', () => {
const args = superjson.parse('{"json":[[5,"cZ"]]}');
benchify_lineCount(...args);
});
/** | ||
* Parses errors from build/dev server output | ||
*/ | ||
function parseErrorsFromOutput(output: string): BuildError[] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Correctly Parse and Return Genuine Code Errors
The parseErrorsFromOutput function should identify and return an array of BuildError objects representing genuine code-related errors from the given output string while excluding infrastructure-related issues. If no genuine code-related errors are found, the function should return an empty array.
Outcome | Example Input | # Inputs | % of Total |
---|---|---|---|
✅ | superjson.parse('{"json":[["seO"]]}')... view full input |
200 | 100.0% |
view all inputs
The test has passed successfully. The parseErrorsFromOutput
function correctly identified and returned an empty array of BuildError
objects, as there were no genuine code-related errors in the input string. The input string was "{\"json\":[[\"seO\"]]}"
which did not contain any syntax errors or other code-related issues.
Unit Tests
// Unit Test for "Correctly Parse and Return Genuine Code Errors": The parseErrorsFromOutput function should identify and return an array of BuildError objects representing genuine code-related errors from the given output string while excluding infrastructure-related issues. If no genuine code-related errors are found, the function should return an empty array.
function benchify_s(s) {
return s.replace(/[^a-zA-Z0-9]/g, 'a');
}
it('benchify_s_exec_test_passing_0', () => {
const args = superjson.parse('{"json":[["seO"]]}');
benchify_s(...args);
});
if (result.stderr.includes('npm ERR!')) { | ||
// Only treat critical npm errors as build errors (not warnings or peer dep issues) | ||
if (result.stderr.includes('npm ERR!') && | ||
(result.stderr.includes('ENOTFOUND') || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ File Writing Verification
Verify that transformed files are written to the sandbox's working directory at the specified paths.
Outcome | Example Input | # Inputs | % of Total |
---|---|---|---|
✅ | superjson.parse('{"json":[[[{"path":"DL3o5K~z4"... view full input |
200 | 100.0% |
view all inputs
The property-based test has passed, which means that the createSandbox
function successfully wrote the transformed files to the sandbox's working directory with the correct paths. The test checked that the files were written with the prefix /app/
and that the contents were correctly transformed. The test used an array of files with different paths and contents, and the createSandbox
function correctly handled these files and wrote them to the sandbox.
Unit Tests
// Unit Test for "File Writing Verification": Verify that transformed files are written to the sandbox's working directory at the specified paths.
function benchify_s(s) {
return s.replace(/[^a-zA-Z0-9]/g, 'a');
}
it('benchify_s_exec_test_passing_0', () => {
const args = superjson.parse(
'{"json":[[[{"path":"DL3o5K~z4","content":"aaka"},{"path":"+fu:;Yi","content":"taaa4Lraaa3"},{"path":".N? HmM>","content":"jaatya"},{"path":",JdLIOS","content":"maamaa"}]]]}',
);
benchify_s(...args);
});
} | ||
|
||
return errors; | ||
} | ||
|
||
function extractNewPackages(packageJsonContent: string): string[] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Handles Invalid JSON Gracefully
The function should catch JSON parsing errors and return an empty array if the packageJsonContent
is not valid JSON.
Outcome | Example Input | # Inputs | % of Total |
---|---|---|---|
✅ | superjson.parse('{"json":[["TmaaN"]]}')... view full input |
200 | 100.0% |
view all inputs
The test has passed. The function extractNewPackages
successfully handled the input string "{\"json\":[[\"TmaaN\"]]}"
by catching the JSON parsing error and returning an empty array as expected. This demonstrates that the function is correctly implemented to return an empty array when the input is not a valid package.json structure.
Unit Tests
// Unit Test for "Handles Invalid JSON Gracefully": The function should catch JSON parsing errors and return an empty array if the `packageJsonContent` is not valid JSON.
function benchify_s(s) {
return s.replace(/[^a-zA-Z0-9]/g, 'a');
}
it('benchify_s_exec_test_passing_0', () => {
const args = superjson.parse('{"json":[["TmaaN"]]}');
benchify_s(...args);
});
} | ||
|
||
return errors; | ||
} | ||
|
||
function extractNewPackages(packageJsonContent: string): string[] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Excludes Base Packages
The function should exclude any package that is part of basePackages
from its output.
Outcome | Example Input | # Inputs | % of Total |
---|---|---|---|
✅ | superjson.parse('{"json":[["{\"dependencies\":{... view full input |
200 | 100.0% |
view all inputs
The property-based test has passed, indicating that the extractNewPackages
function successfully excluded base packages from its output. The test input was a package JSON content with various dependencies, and the function correctly identified and returned only the non-base packages. The function's behavior aligns with the expected property description, which states that it should exclude any package that is part of basePackages
from its output.
Unit Tests
// Unit Test for "Excludes Base Packages": The function should exclude any package that is part of `basePackages` from its output.
function benchify_dependencies(dependencies) {
return JSON.stringify({ dependencies });
}
it('benchify_dependencies_exec_test_passing_0', () => {
const args = superjson.parse(
'{"json":[["{\\"dependencies\\":{\\"[_1uJT(/+\\":\\"p\\",\\"]LS\\":\\"!}z\\\\\\\\#&\\",\\"i5Jiu\\":\\"SFzp\\",\\"x0\\":\\"-~#VBB\\\\\\"&!&\\",\\"H_-K@\\\\\\"2ru\\":\\"callertoSt\\",\\"J]gHO&Qs>\\":\\"ref\\"}}"]]}',
);
benchify_dependencies(...args);
});
} | ||
|
||
return errors; | ||
} | ||
|
||
function extractNewPackages(packageJsonContent: string): string[] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Output Matches pkg@version Format
Each element in the returned array should be formatted as pkg@version
.
Outcome | Example Input | # Inputs | % of Total |
---|---|---|---|
❌ | superjson.parse('{"json":[[{"#2]|s;48C":"a2z",... view full input |
1121 | 94.8% |
✅ | superjson.parse('{"json":[[{},{"basePackages":[... view full input |
61 | 5.2% |
view all inputs
Here is a summary of the analysis:
The test failed due to an AssertionError
. The extractNewPackages
function returned an array with unexpected elements. Specifically, the function returned an array containing elements in the format of pkg@version
, but with unexpected package names, such as #2]|s;48C@a2z
, =O[@0+rCDzi'
, and HKZw|t\"?L
@"k(YM$n\Ad, instead of the expected
[3#V<@tm]`. This suggests that the function is not correctly filtering out base packages from the dependency list.
Stack Trace
Error: expect(received).toEqual(expected)
[
+ "#2]|s;48C@a2z",
+ "=O[@0+rCDzi'",
+ "HKZw|t\"?L`@\"k(YM$n\\Ad",
"3#V<@Tm"
]
- Expected - 0
+ Received + 3
at toEqual (unknown)
at <anonymous> (/app/repo/lib/pver_78b8a9ce-9877-44b3-818e-1e77a420e42b.test.ts:78:24)
at <anonymous> (/app/configuration/fc.setup.ts:183:11)
at run (/app/node_modules/fast-check/lib/esm/check/property/Property.generic.js:46:33)
at runIt (/app/node_modules/fast-check/lib/esm/check/runner/Runner.js:18:30)
at check (/app/node_modules/fast-check/lib/esm/check/runner/Runner.js:62:11)
at <anonymous> (/app/configuration/fc.setup.ts:197:14)
at assertWithLogging (/app/configuration/fc.setup.ts:125:3)
at <anonymous> (/app/repo/lib/pver_78b8a9ce-9877-44b3-818e-1e77a420e42b.test.ts:38:6)
Unit Tests
// Unit Test for "Output Matches pkg@version Format": Each element in the returned array should be formatted as `pkg@version`.
function benchify_packages(packages, additional) {
const completePackages = { ...packages };
additional.basePackages.forEach(basePkg => {
completePackages[basePkg] = '1.0.0'; // Dummy version for base packages
});
completePackages[additional.newPackageName] = additional.newPackageVersion;
const packageJsonContent = JSON.stringify({ dependencies: completePackages });
const result = extractNewPackages(packageJsonContent);
const expectedOutput = [`${additional.newPackageName}@${additional.newPackageVersion}`];
expect(result).toEqual(expectedOutput);
}
it('benchify_packages_exec_test_failing_0', () => {
const args = superjson.parse(
'{"json":[[{"#2]|s;48C":"a2z","=O[":"0+rCDzi\'","HKZw|t\\"?L`":"\\"k(YM$n\\\\Ad"},{"basePackages":["vite","tailwindcss","tailwindcss","tailwindcss","react-dom","@vitejs/plugin-react"],"newPackageName":"3#V<","newPackageVersion":"Tm"}]]}',
);
benchify_packages(...args);
});
269dac1
to
4b77f60
Compare
37fc5d2
to
4d33441
Compare
4d33441
to
fe3cf2c
Compare
TL;DR
Added improved error detection for code generation with debug mode support.
What changed?
error-detection.ts
module with specialized functions for detecting and parsing code errorsHow to test?
debug = true
inapp/api/generate/route.ts
to test with the sample buggy codeWhy make this change?
The previous error detection system was too strict and would flag infrastructure or non-critical issues as errors. This change improves the user experience by focusing on actual code errors that need fixing, while ignoring harmless warnings or infrastructure-related messages. The addition of debug mode also makes it easier to test and improve the error detection system without generating new code each time.