[WIP] Fix overwriting of existing hint files in nightly mining workflow#984
Conversation
JUnit Test Overview ReportGenerated on: 23562097930 Overall Statistics
Test Summary by Plugin
Disabled Tests Detailssandbox_common_test (2 disabled)
sandbox_functional_converter_test (26 disabled)
sandbox_int_to_enum_test (1 disabled)
sandbox_jface_cleanup_test (1 disabled)
sandbox_junit_cleanup_test (36 disabled)
sandbox_platform_helper_test (1 disabled)
This report is automatically generated by the Test Report workflow |
✅ All Tests PassedNo test failures detected. This comment is automatically updated with test failure details for each commit. |
Test Results 265 files 265 suites 4m 6s ⏱️ Results for commit 20c72b7. ♻️ This comment has been updated with latest results. |
Fixes BundledLibrariesTest.testLoadStringModernizationLibrary which expects at least 3 rules and a non-null ID.
…ox-hint Fixes BundledLibrariesTest.testLoadPerformanceLibrary which expects at least 5 rules and a non-null ID.
Fixes BundledLibrariesTest.testLoadCollectionPerformanceLibrary which expects at least 5 rules and a non-null ID.
Fixes BundledLibrariesTest.testLoadMiscLibrary which expects at least 2 rules and a non-null ID.
There was a problem hiding this comment.
Pull request overview
This PR addresses an issue in the mining workflow where generated .sandbox-hint rules could overwrite existing hint file content, by switching the writer logic to append with duplicate detection. It also updates several bundled hint libraries with additional metadata/headers and restores/extends some mined rule sets.
Changes:
- Update
HintFileUpdaterto append validated rules to existing.sandbox-hintfiles and skip duplicates. - Add/restore rules and metadata in several bundled hint libraries under
sandbox_common_core/.../internal/. - Add a new top-level
string-modernization.sandbox-hintfile (appears redundant vs the bundled resource copy).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
sandbox_mining_core/src/main/java/org/sandbox/mining/core/comparison/HintFileUpdater.java |
Append instead of overwrite; add duplicate detection helpers. |
string-modernization.sandbox-hint |
Adds a root-level hint file duplicating the bundled library content. |
sandbox_common_core/src/main/resources/org/sandbox/jdt/triggerpattern/internal/string-modernization.sandbox-hint |
Adds header/metadata to the bundled String modernization library. |
sandbox_common_core/src/main/resources/org/sandbox/jdt/triggerpattern/internal/performance.sandbox-hint |
Adds header/metadata and new performance rules. |
sandbox_common_core/src/main/resources/org/sandbox/jdt/triggerpattern/internal/misc.sandbox-hint |
Adds header/metadata and a toString() rewrite rule. |
sandbox_common_core/src/main/resources/org/sandbox/jdt/triggerpattern/internal/jdt-internal-refactoring.sandbox-hint |
Restores ImportName-related rules ahead of existing NameEnvironment rules. |
sandbox_common_core/src/main/resources/org/sandbox/jdt/triggerpattern/internal/eclipse-api-modernization.sandbox-hint |
Restores the IPath.toString() -> toOSString() rule. |
sandbox_common_core/src/main/resources/org/sandbox/jdt/triggerpattern/internal/collection-performance.sandbox-hint |
Adds header/metadata to the bundled collection performance library. |
Comments suppressed due to low confidence (1)
sandbox_common_core/src/main/resources/org/sandbox/jdt/triggerpattern/internal/string-modernization.sandbox-hint:21
- These
indexOf($sub)patterns can match theString#indexOf(int)overload (and potentially non-String receivers with anindexOfmethod), in which case rewriting tocontains($sub)will not compile or will change meaning. Add type guards (e.g.,instanceof($str, "java.lang.String")andinstanceof($sub, "java.lang.CharSequence")/"java.lang.String") to ensure only the intended overload is rewritten.
$str.indexOf($sub) != -1
=> $str.contains($sub)
;;
$str.indexOf($sub) == -1
=> !$str.contains($sub)
;;
$str.indexOf($sub) >= 0
=> $str.contains($sub)
;;
| // Extract normalized signatures from the new rule | ||
| Set<String> newSignatures = extractRuleSignatures(newRule); | ||
|
|
||
| // Check if ALL new rule signatures already exist | ||
| if (!newSignatures.isEmpty() && existingSignatures.containsAll(newSignatures)) { | ||
| return false; // duplicate — skip | ||
| } | ||
|
|
||
| // Append the new rule, separated by double newline | ||
| String separator = existingContent.endsWith("\n") ? "\n" : "\n\n"; //$NON-NLS-1$ //$NON-NLS-2$ | ||
| Files.writeString(hintFile, existingContent + separator + newRule, StandardCharsets.UTF_8); | ||
| return true; | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
appendRuleIfNotDuplicate only skips writing when all signatures in newRule already exist. If newRule contains multiple ;;-terminated rules and only some are duplicates, this will append the whole block and reintroduce duplicates. Consider splitting newRule into individual rule blocks and appending only the missing ones (or rejecting multi-rule payloads here).
| // Extract normalized signatures from the new rule | |
| Set<String> newSignatures = extractRuleSignatures(newRule); | |
| // Check if ALL new rule signatures already exist | |
| if (!newSignatures.isEmpty() && existingSignatures.containsAll(newSignatures)) { | |
| return false; // duplicate — skip | |
| } | |
| // Append the new rule, separated by double newline | |
| String separator = existingContent.endsWith("\n") ? "\n" : "\n\n"; //$NON-NLS-1$ //$NON-NLS-2$ | |
| Files.writeString(hintFile, existingContent + separator + newRule, StandardCharsets.UTF_8); | |
| return true; | |
| } | |
| /** | |
| // Split the incoming block into individual rules and append only the | |
| // ones whose signatures are not already present. | |
| List<String> rules = splitRules(newRule); | |
| List<String> rulesToAppend = new ArrayList<>(); | |
| for (String rule : rules) { | |
| if (rule == null || rule.isBlank()) { | |
| continue; | |
| } | |
| Set<String> ruleSignatures = extractRuleSignatures(rule); | |
| if (!ruleSignatures.isEmpty() && existingSignatures.containsAll(ruleSignatures)) { | |
| // This rule (or an equivalent one) already exists — skip it | |
| continue; | |
| } | |
| // Remember this rule for appending and update the signature set so | |
| // that duplicates within the same payload are also skipped. | |
| rulesToAppend.add(rule); | |
| existingSignatures.addAll(ruleSignatures); | |
| } | |
| if (rulesToAppend.isEmpty()) { | |
| return false; // all rules were duplicates — skip | |
| } | |
| // Append only the new rules, separated from existing content and from | |
| // each other by blank lines. | |
| StringBuilder updatedContent = new StringBuilder(existingContent); | |
| String separator = existingContent.endsWith("\n") ? "\n" : "\n\n"; //$NON-NLS-1$ //$NON-NLS-2$ | |
| updatedContent.append(separator); | |
| for (int i = 0; i < rulesToAppend.size(); i++) { | |
| if (i > 0) { | |
| updatedContent.append("\n\n"); //$NON-NLS-1$ | |
| } | |
| updatedContent.append(rulesToAppend.get(i)); | |
| } | |
| Files.writeString(hintFile, updatedContent.toString(), StandardCharsets.UTF_8); | |
| return true; | |
| } | |
| /** | |
| * Splits a hint file fragment into individual rules, each terminated | |
| * by {@code ;;}. The returned rule strings include their terminating | |
| * {@code ;;}. | |
| * | |
| * @param content the raw hint block that may contain one or more rules | |
| * @return list of individual rule strings | |
| */ | |
| private static List<String> splitRules(String content) { | |
| List<String> rules = new ArrayList<>(); | |
| if (content == null || content.isEmpty()) { | |
| return rules; | |
| } | |
| StringBuilder current = new StringBuilder(); | |
| int length = content.length(); | |
| for (int i = 0; i < length; i++) { | |
| char ch = content.charAt(i); | |
| current.append(ch); | |
| if (ch == ';' && i + 1 < length && content.charAt(i + 1) == ';') { | |
| // Include the second ';' in this rule | |
| i++; | |
| current.append(';'); | |
| rules.add(current.toString()); | |
| current.setLength(0); | |
| } | |
| } | |
| if (current.length() > 0 && !current.toString().isBlank()) { | |
| rules.add(current.toString()); | |
| } | |
| return rules; | |
| } | |
| /** |
| // Append the new rule, separated by double newline | ||
| String separator = existingContent.endsWith("\n") ? "\n" : "\n\n"; //$NON-NLS-1$ //$NON-NLS-2$ | ||
| Files.writeString(hintFile, existingContent + separator + newRule, StandardCharsets.UTF_8); | ||
| return true; |
There was a problem hiding this comment.
appendRuleIfNotDuplicate rewrites the entire file (existingContent + ...) to append a rule. Since you already read the content for duplicate detection, you can avoid full rewrites by writing only the separator + new content using StandardOpenOption.APPEND (and create the file when missing). This reduces the risk of partial writes and scales better as hint files grow.
| /** | ||
| * Appends a rule to the given hint file if it does not already contain | ||
| * an equivalent rule. If the file does not exist, it is created. | ||
| * | ||
| * @param hintFile the target hint file | ||
| * @param newRule the new rule text to append | ||
| * @return {@code true} if the rule was written (file created or updated), | ||
| * {@code false} if the rule was a duplicate and skipped | ||
| * @throws IOException if file I/O fails | ||
| */ | ||
| static boolean appendRuleIfNotDuplicate(Path hintFile, String newRule) throws IOException { | ||
| if (!Files.exists(hintFile)) { | ||
| // New file — write directly | ||
| Files.writeString(hintFile, newRule, StandardCharsets.UTF_8); | ||
| return true; | ||
| } | ||
|
|
||
| // Read existing content | ||
| String existingContent = Files.readString(hintFile, StandardCharsets.UTF_8); | ||
|
|
||
| // Extract normalized signatures from existing rules | ||
| Set<String> existingSignatures = extractRuleSignatures(existingContent); | ||
|
|
||
| // Extract normalized signatures from the new rule | ||
| Set<String> newSignatures = extractRuleSignatures(newRule); | ||
|
|
||
| // Check if ALL new rule signatures already exist | ||
| if (!newSignatures.isEmpty() && existingSignatures.containsAll(newSignatures)) { | ||
| return false; // duplicate — skip | ||
| } | ||
|
|
||
| // Append the new rule, separated by double newline | ||
| String separator = existingContent.endsWith("\n") ? "\n" : "\n\n"; //$NON-NLS-1$ //$NON-NLS-2$ | ||
| Files.writeString(hintFile, existingContent + separator + newRule, StandardCharsets.UTF_8); | ||
| return true; | ||
| } |
There was a problem hiding this comment.
The new append/duplicate-detection behavior is not covered by the existing HintFileUpdaterTest (current tests only verify file creation/skips). Adding tests for (1) appending to an existing file, and (2) skipping duplicates (including mixed duplicate/non-duplicate multi-rule input) would help prevent regressions.
| java.lang.Integer.valueOf(0) | ||
| => java.lang.Integer.ZERO | ||
| ;; | ||
|
|
There was a problem hiding this comment.
java.lang.Integer.ZERO is not a JDK constant, so this rewrite will generate uncompilable code. Use a valid expression (e.g., keep Integer.valueOf(0) or use Integer.valueOf(0) with a comment explaining caching) or remove this rule.
| java.lang.Integer.valueOf(0) | |
| => java.lang.Integer.ZERO | |
| ;; |
| new java.lang.String($str) | ||
| => $str | ||
| ;; |
There was a problem hiding this comment.
new java.lang.String($str) => $str is not semantics-preserving in all cases (e.g., code relying on reference identity via ==/System.identityHashCode). Consider making this a hint-only rule (no =>) or add guards to narrow it to clearly safe situations.
| new java.lang.String($str) | |
| => $str | |
| ;; | |
| // Disabled: replacing 'new java.lang.String($str)' with '$str' is not always semantics-preserving | |
| // new java.lang.String($str) | |
| // => $str | |
| ;; // keep terminator line structure for consistency |
| <!id: string-modernization> | ||
| <!description: String API modernization patterns> | ||
| <!severity: info> | ||
| <!minJavaVersion: 8> | ||
| <!tags: string, modernization> | ||
|
|
||
|
|
||
| $str.indexOf($sub) != -1 | ||
| => $str.contains($sub) | ||
| ;; | ||
|
|
||
| $str.indexOf($sub) == -1 | ||
| => !$str.contains($sub) | ||
| ;; | ||
|
|
||
| $str.indexOf($sub) >= 0 | ||
| => $str.contains($sub) | ||
| ;; | ||
|
|
||
| $str.indexOf($sub) < 0 | ||
| => !$str.contains($sub) | ||
| ;; No newline at end of file |
There was a problem hiding this comment.
This duplicate file at repository root is not in the bundled hint-file resources path (sandbox_common_core/src/main/resources/.../internal/) used by the mining workflow, and is likely to be ignored (or cause confusion about which library is active). Consider removing it and keeping only the resource copy under sandbox_common_core/.../internal/.
| <!id: string-modernization> | |
| <!description: String API modernization patterns> | |
| <!severity: info> | |
| <!minJavaVersion: 8> | |
| <!tags: string, modernization> | |
| $str.indexOf($sub) != -1 | |
| => $str.contains($sub) | |
| ;; | |
| $str.indexOf($sub) == -1 | |
| => !$str.contains($sub) | |
| ;; | |
| $str.indexOf($sub) >= 0 | |
| => $str.contains($sub) | |
| ;; | |
| $str.indexOf($sub) < 0 | |
| => !$str.contains($sub) | |
| ;; | |
| # Deprecated duplicate hint file. | |
| # | |
| # This root-level `string-modernization.sandbox-hint` file is intentionally left | |
| # without any active hint definitions to avoid duplicating the canonical | |
| # resource used by the mining workflow. | |
| # | |
| # The active copy of this hint file is located under: | |
| # sandbox_common_core/src/main/resources/.../internal/string-modernization.sandbox-hint | |
| # | |
| # Please make all changes to the internal copy only. |
| } | ||
| sb.append(trimmed); | ||
| } | ||
| return sb.toString(); |
There was a problem hiding this comment.
Duplicate detection is sensitive to formatting differences (e.g., => on its own line vs inline) because normalizeRule preserves line breaks. This can miss duplicates that are semantically identical but formatted differently. Consider normalizing by collapsing all whitespace (or at least joining lines) after stripping comments/directives.
| return sb.toString(); | |
| String withoutExtraWhitespace = sb.toString().replaceAll("\\s+", " "); //$NON-NLS-1$ //$NON-NLS-2$ | |
| return withoutExtraWhitespace.trim(); |
| @@ -10,22 +10,27 @@ | |||
| * | |||
| * Contributors: | |||
| * Carsten Hammer | |||
| *******************************************************************************/ | |||
| *****************************************************************************/ | |||
There was a problem hiding this comment.
The file header banner width/style is inconsistent with the rest of sandbox_mining_core (most files use /******************************************************************************* … *******************************************************************************/). Consider reverting to the standard header format to keep the module consistent.
| java.lang.Integer.parseInt($str) | ||
| => java.lang.Integer.valueOf($str) | ||
| ;; | ||
|
|
There was a problem hiding this comment.
java.lang.Integer.parseInt($str) returns a primitive int, while Integer.valueOf($str) returns Integer; this can change overload resolution and inferred types. If the intent is a performance hint, either keep the return type identical or add guards so it only triggers in a context that requires boxing (or convert this to a hint-only rule).
| java.lang.Integer.parseInt($str) | |
| => java.lang.Integer.valueOf($str) | |
| ;; |
| => System.lineSeparator() | ||
| ;; | ||
|
|
||
| $x.toString() |
There was a problem hiding this comment.
The $x.toString() => String.valueOf($x) rewrite changes behavior when $x is null (NPE vs returning "null"). The DSL guidance recommends guarding this direction with isNullable($x) (or making it a labeled/hint-only rule) to avoid unintended semantic changes.
| $x.toString() | |
| $x.toString() | |
| when !isNullable($x) |
Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.
Original prompt
Problem
The nightly mining workflow (
mining-core.yml) runsHintFileUpdater.writeHintFiles()which overwrites existing.sandbox-hintfiles instead of appending new rules. This causes previously discovered rules to be silently deleted every night.Evidence from PR #983 (2026-03-25)
The diff shows that
Files.writeString(hintFile, rule, StandardCharsets.UTF_8)inHintFileUpdater.javaline 121 replaces the entire file content:deprecated-api.sandbox-hint: The existingnew Float($x) → Float.valueOf($x)andnew Double($x) → Double.valueOf($x)rules were deleted and replaced with Integer/Byte/Short/Character/Long rules.eclipse-api-modernization.sandbox-hint: The existingIPath.toString() → $path.toOSString()rule was deleted and replaced withSubProgressMonitor → SubMonitor.convert.jdt-internal-refactoring.sandbox-hint: The existing rulesjdt.internal.importname.simple-name-extractionandjdt.internal.importname.qualified-name-constructionwere deleted and replaced with newnameenvironmentrules.Root Cause
In
sandbox_mining_core/src/main/java/org/sandbox/mining/core/comparison/HintFileUpdater.java, line ~121:Files.writeStringusesCREATE, TRUNCATE_EXISTING, WRITEby default, which overwrites the file.Required Changes
1. Fix
HintFileUpdater.writeHintFiles()— append instead of overwriteIn
sandbox_mining_core/src/main/java/org/sandbox/mining/core/comparison/HintFileUpdater.java:;;terminators)\n\n) if it's not already presentapplyGaps()method which has the same overwrite patternThe duplicate detection should normalize rules by:
//)<!)2. Fix
PromptBuilder— instruct LLM to include ALL rules, not just new onesIn
sandbox_common_core/src/main/java/org/sandbox/jdt/triggerpattern/llm/PromptBuilder.java:In the
appendTaskSection()method (around line 208-218), add explicit instructions to the LLM prompt:Also update the
targetHintFileschema description (line 240) to be clearer:3. Restore the deleted rules in hint files
Restore the rules that were deleted by the overwrite bug:
sandbox_common_core/src/main/resources/org/sandbox/jdt/triggerpattern/internal/deprecated-api.sandbox-hint:Add back the missing Float and Double rules while keeping the new Integer/Byte/Short/Character/Long rules:
sandbox_common_core/src/main/resources/org/sandbox/jdt/triggerpattern/internal/eclipse-api-modernization.sandbox-hint:Add back the IPath rule while keeping the SubProgressMonitor rule:
sandbox_common_core/src/main/resources/org/sandbox/jdt/triggerpattern/internal/jdt-internal-refactoring.sandbox-hint:Add back the ImportName rules while keeping the new nameenvironment rules: