chore!: refactor PackageURL by moving String functions to StringUtil #220

jeremylong · 2025-03-22T18:30:42Z

Move string validation and encoding functions to the new StringUtil class
Remove the deprecated public method @Nullable String uriDecode(final @Nullable String source)
Update @since 1.6.0 to @since 2.0.0 (this was missed in build: bump major version #219)

ppkarwasz · 2025-03-23T06:26:03Z

➕ 1 for refactoring String-related methods into their own class, but I don't understand why the breaking change is necessary. The only breaking change is the removal of PackageURL#uriDecode, I don't think it is worth scaring users with a major release.

ppkarwasz · 2025-03-23T06:48:46Z

src/main/java/com/github/packageurl/utils/StringUtil.java

+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+package com.github.packageurl.utils;


Since these classes are not meant to be used by third-party libraries, I would suggest:

Using com.github.packageurl.internal or similar as package name.

Documenting that the package is internal in package-info.java.

Due to #201, this package will not be exported through JPMS, since package-info.java is not annotated with @Export.

ppkarwasz · 2025-03-23T07:09:48Z

The only breaking change is the removal of PackageURL#uriDecode, I don't think it is worth scaring users with a major release.

Sorry, I just saw the change to PackageURL#getQualifiers.

jeremylong · 2025-03-23T09:18:51Z

This PR also made minor changes to percentDecode and percentEncode. The benchmarks before and after show an improvement in percentEncode. However, I suspect there is an issue with the percentDecode benchmark: the times didn't change, and they are fast enough that I'm guessing the benchmark only searches for an encoded sequence and never finds one.

Benchmark of updated percent decode/encode:

Benchmark	(nonAsciiProb)	Mode	Cnt	Score	Error	Units
StringUtilBenchmark.baseline	0	avgt	25	689.715	± 7.937	us/op
StringUtilBenchmark.baseline	0.1	avgt	25	1347.950	± 3.308	us/op
StringUtilBenchmark.baseline	0.5	avgt	25	2609.872	± 5.754	us/op
StringUtilBenchmark.percentDecode	0	avgt	25	191.887	± 0.347	us/op
StringUtilBenchmark.percentDecode	0.1	avgt	25	192.141	± 0.208	us/op
StringUtilBenchmark.percentDecode	0.5	avgt	25	192.031	± 0.190	us/op
StringUtilBenchmark.percentEncode	0	avgt	25	1023.150	± 3.235	us/op
StringUtilBenchmark.percentEncode	0.1	avgt	25	3269.102	± 7.088	us/op
StringUtilBenchmark.percentEncode	0.5	avgt	25	7179.152	± 7.827	us/op

Benchmark	(nonAsciiProb)	Mode	Cnt	Score	Error	Units
PercentEncodingBenchmark.baseline	0	avgt	25	686.520	± 6.698	us/op
PercentEncodingBenchmark.baseline	0.1	avgt	25	1344.912	± 3.942	us/op
PercentEncodingBenchmark.baseline	0.5	avgt	25	2614.673	± 3.389	us/op
PercentEncodingBenchmark.percentDecode	0	avgt	25	191.987	± 0.319	us/op
PercentEncodingBenchmark.percentDecode	0.1	avgt	25	192.025	± 0.227	us/op
PercentEncodingBenchmark.percentDecode	0.5	avgt	25	191.950	± 0.293	us/op
PercentEncodingBenchmark.percentEncode	0	avgt	25	1158.468	± 3.644	us/op
PercentEncodingBenchmark.percentEncode	0.1	avgt	25	2172.666	± 6.813	us/op
PercentEncodingBenchmark.percentEncode	0.5	avgt	25	4432.998	± 12.272	us/op

jeremylong · 2025-03-23T10:07:00Z

I figured out the problem with the benchmark and I'm re-running it now.

ppkarwasz · 2025-03-23T10:14:15Z

I figured out the problem with the benchmark and I'm re-running it now.

The setup() method has a bug (encodedData = encodeData(encodedData)). I'll post an improved benchmark soon.

jeremylong · 2025-03-23T10:16:41Z

I can update the benchmark as part of this PR. I'd push the code - but I'm running the benchmark now and I'd like to see the results in another 2 hours after it runs on both pre and post my updates.

ppkarwasz · 2025-03-23T10:34:29Z

I can update the benchmark as part of this PR. I'd push the code - but I'm running the benchmark now and I'd like to see the results in another 2 hours after it runs on both pre and post my updates.

I fixed and extended the benchmark in #222.

ppkarwasz · 2025-03-23T11:15:39Z

Since nonAsciiProb == 0 (i.e. there are no characters to encode or to decode) is in practice the most common case, we should probably aggressively optimize for it. percentDecode can be easily optimized by skipping all processing if there are no % characters:

if (source.indexOf(PERCENT_CHAR) == -1) {
    return source;
}

I am not sure if percentEncode can get much better.
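To make the suggested early return concrete, here is a minimal, self-contained sketch of a decoder built around it. The class name `PercentDecodeSketch` and the decoding loop are assumptions made for illustration; this is not the code in StringUtil, only one way the fast path could sit in front of a conventional byte-wise decoder.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not the project's actual implementation.
final class PercentDecodeSketch {
    private static final char PERCENT_CHAR = '%';

    static String percentDecode(String source) {
        // Fast path: no '%' means nothing to decode, so return the same instance.
        if (source.indexOf(PERCENT_CHAR) == -1) {
            return source;
        }
        byte[] in = source.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[in.length]; // decoded output is never longer than the input
        int n = 0;
        for (int i = 0; i < in.length; i++) {
            if (in[i] == PERCENT_CHAR) {
                if (i + 2 >= in.length) {
                    throw new IllegalArgumentException("Truncated escape at index " + i);
                }
                int hi = Character.digit(in[i + 1], 16);
                int lo = Character.digit(in[i + 2], 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid escape at index " + i);
                }
                out[n++] = (byte) ((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else {
                out[n++] = in[i];
            }
        }
        return new String(out, 0, n, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(percentDecode("pkg:maven/org.example/lib")); // unchanged, fast path
        System.out.println(percentDecode("a%20b"));
    }
}
```

In this sketch the common case costs a single scan of the string and allocates nothing, which is the property the comment above is after.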

jeremylong · 2025-03-23T15:45:17Z

After updating to use the new benchmark, you'll notice that there isn't a lot of change in percentDecode. Again, I think I have the solution to this and I'll include it in this PR. I'll be back in 4+ hours (I really need to go buy a better dev machine ;)).

With my changes:

Benchmark	(nonAsciiProb)	Mode	Cnt	Score	Error	Units
StringUtilBenchmark.baseline	0	avgt	25	44.430	± 0.247	us/op
StringUtilBenchmark.baseline	0.1	avgt	25	44.335	± 0.371	us/op
StringUtilBenchmark.baseline	0.5	avgt	25	44.474	± 0.256	us/op
StringUtilBenchmark.percentDecode	0	avgt	25	191.667	± 0.247	us/op
StringUtilBenchmark.percentDecode	0.1	avgt	25	191.876	± 0.110	us/op
StringUtilBenchmark.percentDecode	0.5	avgt	25	191.632	± 0.216	us/op
StringUtilBenchmark.percentEncode	0	avgt	25	1012.234	± 8.376	us/op
StringUtilBenchmark.percentEncode	0.1	avgt	25	1001.798	± 4.785	us/op
StringUtilBenchmark.percentEncode	0.5	avgt	25	993.283	± 3.431	us/op
StringUtilBenchmark.toLowerCase	0	avgt	25	97.897	± 0.134	us/op
StringUtilBenchmark.toLowerCase	0.1	avgt	25	97.759	± 0.232	us/op
StringUtilBenchmark.toLowerCase	0.5	avgt	25	98.030	± 0.302	us/op
StringUtilBenchmark.toLowerCaseJre	0	avgt	25	910.749	± 3.538	us/op
StringUtilBenchmark.toLowerCaseJre	0.1	avgt	25	911.569	± 3.500	us/op
StringUtilBenchmark.toLowerCaseJre	0.5	avgt	25	907.451	± 2.940	us/op

Legacy version:

Benchmark	(nonAsciiProb)	Mode	Cnt	Score	Error	Units
PercentEncodingBenchmark.baseline	0	avgt	25	44.774	± 0.299	us/op
PercentEncodingBenchmark.baseline	0.1	avgt	25	44.258	± 0.478	us/op
PercentEncodingBenchmark.baseline	0.5	avgt	25	44.443	± 0.321	us/op
PercentEncodingBenchmark.percentDecode	0	avgt	25	191.934	± 0.276	us/op
PercentEncodingBenchmark.percentDecode	0.1	avgt	25	192.135	± 0.171	us/op
PercentEncodingBenchmark.percentDecode	0.5	avgt	25	191.946	± 0.349	us/op
PercentEncodingBenchmark.percentEncode	0	avgt	25	1161.316	± 7.246	us/op
PercentEncodingBenchmark.percentEncode	0.1	avgt	25	1147.482	± 4.225	us/op
PercentEncodingBenchmark.percentEncode	0.5	avgt	25	1149.934	± 5.693	us/op
PercentEncodingBenchmark.toLowerCase	0	avgt	25	97.918	± 0.352	us/op
PercentEncodingBenchmark.toLowerCase	0.1	avgt	25	97.995	± 0.168	us/op
PercentEncodingBenchmark.toLowerCase	0.5	avgt	25	98.058	± 0.355	us/op
PercentEncodingBenchmark.toLowerCaseJre	0	avgt	25	912.258	± 1.517	us/op
PercentEncodingBenchmark.toLowerCaseJre	0.1	avgt	25	912.719	± 1.942	us/op
PercentEncodingBenchmark.toLowerCaseJre	0.5	avgt	25	914.154	± 3.604	us/op

ppkarwasz · 2025-03-23T18:25:03Z

I'll be back in 4+ hours (I really need to go buy a better dev machine ;)).

JMH tests have a duration expressed in seconds and do not depend on the machine. 😉

ppkarwasz · 2025-03-23T20:15:35Z

In #224 I expanded on this PR by optimizing the case when no percent encoding is needed.

Profiling has shown that shouldEncode is the slowest method.

* feat: Improve benchmark (#222)
  Fixes a bug in the benchmark initialization and adds a `toLowerCase` benchmark.
* fix: Benchmark initialization
  The benchmark **must** be initialized in a `@Setup` method, otherwise `nonAsciiProb` will always be `0.0`.
* fix: Improve encoding/decoding performance for ASCII strings
  Since strings that don't require **any** percent encoding are in practice the rule, the encoding/decoding code should be optimized for this case.
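The note about `@Setup` in the commit message above can be illustrated without JMH. The sketch below is a plain-Java simulation under one assumption: that the harness constructs the state object first and only then assigns parameter fields, so anything computed in a field initializer sees the default `0.0`. The class `SetupOrderSketch`, its `generate` helper, and the hand-written "harness" steps in `main` are all hypothetical stand-ins.

```java
import java.util.Random;

// Hypothetical simulation of the initialization order; no JMH involved.
final class SetupOrderSketch {
    // Stand-in for a parameter field that the harness assigns after construction.
    double nonAsciiProb; // still 0.0 while field initializers run

    // Problem: runs during construction, so it always sees nonAsciiProb == 0.0.
    final String eagerData = generate(nonAsciiProb);

    // Populated by setup(), after the parameter has been assigned.
    String setupData;

    // Stand-in for a method the harness calls once parameters are in place.
    void setup() {
        setupData = generate(nonAsciiProb);
    }

    // Builds a 256-character string where each char is non-ASCII with probability prob.
    static String generate(double prob) {
        Random random = new Random(42);
        StringBuilder sb = new StringBuilder(256);
        for (int i = 0; i < 256; i++) {
            sb.append(random.nextDouble() < prob ? '\u00e9' : 'a');
        }
        return sb.toString();
    }

    static long countNonAscii(String s) {
        return s.chars().filter(c -> c > 127).count();
    }

    public static void main(String[] args) {
        SetupOrderSketch state = new SetupOrderSketch(); // 1. construct
        state.nonAsciiProb = 0.5;                        // 2. assign the parameter
        state.setup();                                   // 3. run setup
        System.out.println("field initializer: " + countNonAscii(state.eagerData));
        System.out.println("setup method:      " + countNonAscii(state.setupData));
    }
}
```

Data built in the field initializer contains no non-ASCII characters regardless of the parameter value, which would make every `nonAsciiProb` row of a benchmark measure the same input.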

jeremylong · 2025-03-24T12:04:54Z

@ppkarwasz I think this is good to go. I don't think we can get much more optimization out of the encode/decode and moving the string functions to their own class helps clean up the PackageURL class.

ppkarwasz

LGTM

Around 100 ns per operation on a 256 character long String looks good enough to me.

Maybe we could split the toLowerCase and toLowerCaseJre benchmark methods into a separate benchmark class: right now these methods use the test strings for encoding, so there are no favorable test strings (e.g. a string with only lowercase characters).
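As a rough illustration of why an all-lowercase input counts as "favorable", here is a sketch of an ASCII-only lowercasing routine that returns its argument untouched when there is nothing to change. The class `LowerCaseSketch` is hypothetical and is not claimed to match the toLowerCase method being benchmarked.

```java
// Hypothetical ASCII-only lowercasing; not the project's actual implementation.
final class LowerCaseSketch {
    static String toLowerCase(String s) {
        int first = firstUpperCaseIndex(s);
        if (first == -1) {
            return s; // favorable case: already lowercase, no copy needed
        }
        char[] chars = s.toCharArray();
        for (int i = first; i < chars.length; i++) {
            char c = chars[i];
            if (c >= 'A' && c <= 'Z') {
                chars[i] = (char) (c + ('a' - 'A'));
            }
        }
        return new String(chars);
    }

    // Returns the index of the first ASCII uppercase letter, or -1 if there is none.
    private static int firstUpperCaseIndex(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(toLowerCase("maven")); // returned as-is
        System.out.println(toLowerCase("MaVeN"));
    }
}
```

A dedicated benchmark class could then feed such a method both an all-lowercase string and a mixed-case one, so the two paths are measured separately.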

jeremylong added 2 commits March 22, 2025 14:27

BREAKING CHANGE: refactor PackageURL by moving String functions to St…

613aa3d

…ringUtil

style: spotless apply

d5f136c

jeremylong changed the title ~~BREAKING CHANGE: refactor PackageURL by moving String functions to StringUtil~~ chore!: refactor PackageURL by moving String functions to StringUtil Mar 22, 2025

dwalluck mentioned this pull request Mar 22, 2025

build: bump major version #219

Merged

ppkarwasz requested changes Mar 23, 2025

ppkarwasz reviewed Mar 23, 2025

jeremylong added 2 commits March 23, 2025 06:39

fix: use internal package

c49dfb5

fix: merge conflicts

8f17eee

jeremylong force-pushed the scratch/refactor-stringutils branch from 006e58a to 8f17eee March 23, 2025 21:10

jeremylong and others added 2 commits March 23, 2025 17:12

style: spotless

576f2c6

jeremylong marked this pull request as ready for review March 24, 2025 12:02

jeremylong requested a review from ppkarwasz March 24, 2025 12:02

ppkarwasz approved these changes Mar 24, 2025

fix: add jspecify annotations

ce08baa

stevespringett approved these changes Mar 24, 2025

stevespringett merged commit 9b9cde4 into master Mar 24, 2025
5 checks passed

stevespringett deleted the scratch/refactor-stringutils branch March 24, 2025 19:48

dwalluck mentioned this pull request Apr 2, 2025

Suggest splitting PackageURL.java into multiple files #186

Closed
