Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@

package org.junit.jupiter.params.provider;

import static org.apiguardian.api.API.Status.EXPERIMENTAL;
import static org.apiguardian.api.API.Status.STABLE;

import java.lang.annotation.Documented;
Expand Down Expand Up @@ -235,4 +236,17 @@
@API(status = STABLE, since = "5.10")
boolean ignoreLeadingAndTrailingWhitespace() default true;

/**
* The character used to denote comments when reading the CSV files.
*
* <p>Any line that begins with this character will be treated as a comment and ignored
* during parsing.
*
* <p>Defaults to a hash tag {@code #}.
*
* @since 6.0.1
*/
@API(status = EXPERIMENTAL, since = "6.0.1")
char commentCharacter() default '#';

}
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
import java.nio.charset.Charset;
import java.util.Set;
import java.util.UUID;
import java.util.stream.Stream;

import de.siegmar.fastcsv.reader.CsvCallbackHandler;
import de.siegmar.fastcsv.reader.CsvReader;
Expand Down Expand Up @@ -65,6 +66,7 @@ private static void validateDelimiter(char delimiter, String delimiterString, An

static CsvReader<? extends CsvRecord> createReaderFor(CsvSource csvSource, String data) {
String delimiter = selectDelimiter(csvSource.delimiter(), csvSource.delimiterString());
validateControlCharactersDiffer(delimiter, csvSource.quoteCharacter(), csvSource.commentCharacter());
// @formatter:off
var builder = CsvReader.builder()
.skipEmptyLines(SKIP_EMPTY_LINES)
Expand All @@ -73,7 +75,8 @@ static CsvReader<? extends CsvRecord> createReaderFor(CsvSource csvSource, Strin
.allowMissingFields(ALLOW_MISSING_FIELDS)
.fieldSeparator(delimiter)
.quoteCharacter(csvSource.quoteCharacter())
.commentStrategy(csvSource.textBlock().isEmpty() ? NONE : SKIP);
.commentStrategy(csvSource.textBlock().isEmpty() ? NONE : SKIP)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m concerned that using CsvSource#value results in an exception with delimiter = '#', even though the comment strategy for value is NONE, meaning that all comments ignored:

@ParameterizedTest
@CsvSource(value = "test", delimiter = '#')
void test(String str) {
}

@osiegmar, could we have FastCSV ignore a commentCharacter that duplicates the fieldSeparator or quoteCharacter when CommentStrategy.NONE is used?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vdmitrienko I think so, yes! I can take a look at it tomorrow or the day after and let you know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is possibly a dumb question and maybe not the right place to discuss this, but I'm going to ask anyway... 🙂

@osiegmar Even if CommentStrategy isn't NONE, couldn't FastCSV allow using the same char as commentCharacter and quoteCharacter since lines must start with commentCharacter in order to be treated as comments?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a dumb question at all, because... I was thinking the exact same thing. 🤓

Is there a technical reason that FastCSV does not allow the comment character and delimiter (field separator) to be the same?

Or is that rather a potential enhancement for FastCSV to support that like the Univocity parser did?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting the same character for both field separator and comment character can lead to ambiguity.

Consider content like this:

foo,bar
foo,
,bar

Now, imagine the same content, but using # as both the field separator and the comment character:

foo#bar
foo#
#bar

The meaning in the last line is ambiguous. Is it a comment or a record with an empty first field and "bar" as the second field?

For the sake of clarity and reliability, FastCSV does not allow to use the same character for both purposes at the same time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the example, @osiegmar.

I agree that the latter is ambiguous, even if it was supported previously in JUnit Jupiter.

And I admit: I personally find it a bit odd to use the same character for both (even if it previously worked).

So, perhaps it's best to:

  1. Document the change in behavior as a breaking change.
  2. Introduce support for configuring a custom comment character (as in this PR).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vdmitrienko I’ve created the branch here: https://github.com/osiegmar/FastCSV/tree/feat/ignore-same-comment-char. Like you suggested, the comment character is ignored for CommentStrategy.NONE.

If everything looks good on your end, I can go ahead and release it quickly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@osiegmar, I've just checked the changes you implemented, they work exactly as expected. Thank you!

Could you, please, proceed to releasing them?

In the meantime, I'll update the PR accordingly.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vdmitrienko Awesome! FastCSV 4.1.0 has been released.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@osiegmar Thanks for the super quick turnaround!

Updated in #5046

.commentCharacter(csvSource.commentCharacter());

var callbackHandler = createCallbackHandler(
csvSource.emptyValue(),
Expand All @@ -90,6 +93,7 @@ static CsvReader<? extends CsvRecord> createReaderFor(CsvFileSource csvFileSourc
Charset charset) {

String delimiter = selectDelimiter(csvFileSource.delimiter(), csvFileSource.delimiterString());
validateControlCharactersDiffer(delimiter, csvFileSource.quoteCharacter(), csvFileSource.commentCharacter());
// @formatter:off
var builder = CsvReader.builder()
.skipEmptyLines(SKIP_EMPTY_LINES)
Expand All @@ -98,7 +102,8 @@ static CsvReader<? extends CsvRecord> createReaderFor(CsvFileSource csvFileSourc
.allowMissingFields(ALLOW_MISSING_FIELDS)
.fieldSeparator(delimiter)
.quoteCharacter(csvFileSource.quoteCharacter())
.commentStrategy(SKIP);
.commentStrategy(SKIP)
.commentCharacter(csvFileSource.commentCharacter());

var callbackHandler = createCallbackHandler(
csvFileSource.emptyValue(),
Expand All @@ -121,6 +126,18 @@ private static String selectDelimiter(char delimiter, String delimiterString) {
return DEFAULT_DELIMITER;
}

private static void validateControlCharactersDiffer(String delimiter, char quoteCharacter, char commentCharacter) {
long uniqueCharacterCount = Stream.of(delimiter, quoteCharacter, commentCharacter) //
.map(String::valueOf) //
.distinct() //
.count();

String message = //
"(delimiter|delimiterString): '%s', quoteCharacter: '%s', and commentCharacter: '%s' must all differ";
Preconditions.condition(uniqueCharacterCount == 3,
() -> message.formatted(delimiter, quoteCharacter, commentCharacter));
}

private static CsvCallbackHandler<? extends CsvRecord> createCallbackHandler(String emptyValue,
Set<String> nullValues, boolean ignoreLeadingAndTrailingWhitespaces, int maxCharsPerColumn,
boolean useHeadersInDisplayName) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@

package org.junit.jupiter.params.provider;

import static org.apiguardian.api.API.Status.EXPERIMENTAL;
import static org.apiguardian.api.API.Status.STABLE;

import java.lang.annotation.Documented;
Expand Down Expand Up @@ -296,4 +297,17 @@
@API(status = STABLE, since = "5.10")
boolean ignoreLeadingAndTrailingWhitespace() default true;

/**
* The character used to denote comments in a {@linkplain #textBlock text block}.
*
* <p>Any line that begins with this character will be treated as a comment and ignored
* during parsing.
*
* <p>Defaults to a hash tag {@code #}.
*
* @since 6.0.1
*/
@API(status = EXPERIMENTAL, since = "6.0.1")
char commentCharacter() default '#';

}
Original file line number Diff line number Diff line change
Expand Up @@ -400,6 +400,47 @@ void honorsCommentCharacterWhenUsingTextBlockAttribute() {
assertThat(arguments).containsExactly(array("bar", "#baz"), array("#bar", "baz"));
}

@Test
void honorsCustomCommentCharacter() {
var annotation = csvSource().textBlock("""
*foo
bar, *baz
'*bar', baz
""").commentCharacter('*').build();

var arguments = provideArguments(annotation);

assertThat(arguments).containsExactly(array("bar", "*baz"), array("*bar", "baz"));
}

@ParameterizedTest
@MethodSource("invalidControlCharacterCombinations")
void throwsExceptionWhenControlCharactersNotDiffer(Object delimiter, char quoteCharacter, char commentCharacter) {
var builder = csvSource().textBlock("""
foo""").quoteCharacter(quoteCharacter).commentCharacter(commentCharacter);

var annotation = delimiter instanceof Character c //
? builder.delimiter(c).build() //
: builder.delimiterString(delimiter.toString()).build();

var message = "(delimiter|delimiterString): '%s', quoteCharacter: '%s', and commentCharacter: '%s' must all differ";
assertPreconditionViolationFor(() -> provideArguments(annotation).findAny()) //
.withMessage(message.formatted(delimiter, quoteCharacter, commentCharacter));
}

static Stream<Arguments> invalidControlCharacterCombinations() {
return Stream.of(
// delimiter
Arguments.of('#', '#', '#'), //
Arguments.of('#', '#', '*'), //
Arguments.of('*', '#', '#'), //
Arguments.of('#', '*', '#'), //
// delimiterString
Arguments.of("#", '#', '*'), //
Arguments.of("#", '*', '#') //
);
}

@Test
void supportsCsvHeadersWhenUsingTextBlockAttribute() {
var annotation = csvSource().useHeadersInDisplayName(true).textBlock("""
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -134,6 +134,34 @@ void ignoresCommentedOutEntries() {
assertThat(arguments).containsExactly(array("foo", "bar"));
}

@Test
void honorsCustomCommentCharacter() {
var annotation = csvFileSource()//
.resources("test.csv")//
.commentCharacter(';')//
.delimiter(',')//
.build();

var arguments = provideArguments(annotation, "foo, bar \n;baz, qux");

assertThat(arguments).containsExactly(array("foo", "bar"));
}

@ParameterizedTest
@MethodSource("org.junit.jupiter.params.provider.CsvArgumentsProviderTests#invalidControlCharacterCombinations")
void throwsExceptionWhenControlCharactersNotDiffer(Object delimiter, char quoteCharacter, char commentCharacter) {
var builder = csvFileSource().resources("test.csv") //
.quoteCharacter(quoteCharacter).commentCharacter(commentCharacter);

var annotation = delimiter instanceof Character c //
? builder.delimiter(c).build() //
: builder.delimiterString(delimiter.toString()).build();

var message = "(delimiter|delimiterString): '%s', quoteCharacter: '%s', and commentCharacter: '%s' must all differ";
assertPreconditionViolationFor(() -> provideArguments(annotation, "foo").findAny()) //
.withMessage(message.formatted(delimiter, quoteCharacter, commentCharacter));
}

@Test
void closesInputStreamForClasspathResource() {
var closed = new AtomicBoolean(false);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ static MockCsvFileSourceBuilder csvFileSource() {
protected String[] nullValues = new String[0];
protected int maxCharsPerColumn = 4096;
protected boolean ignoreLeadingAndTrailingWhitespace = true;
private char commentCharacter = '#';

private MockCsvAnnotationBuilder() {
}
Expand Down Expand Up @@ -88,6 +89,11 @@ B ignoreLeadingAndTrailingWhitespace(boolean ignoreLeadingAndTrailingWhitespace)
return getSelf();
}

B commentCharacter(char commentCharacter) {
this.commentCharacter = commentCharacter;
return getSelf();
}

abstract A build();

// -------------------------------------------------------------------------
Expand Down Expand Up @@ -129,6 +135,7 @@ CsvSource build() {
when(annotation.nullValues()).thenReturn(super.nullValues);
when(annotation.maxCharsPerColumn()).thenReturn(super.maxCharsPerColumn);
when(annotation.ignoreLeadingAndTrailingWhitespace()).thenReturn(super.ignoreLeadingAndTrailingWhitespace);
when(annotation.commentCharacter()).thenReturn(super.commentCharacter);

// @CsvSource
when(annotation.value()).thenReturn(this.lines);
Expand Down Expand Up @@ -188,6 +195,7 @@ CsvFileSource build() {
when(annotation.nullValues()).thenReturn(super.nullValues);
when(annotation.maxCharsPerColumn()).thenReturn(super.maxCharsPerColumn);
when(annotation.ignoreLeadingAndTrailingWhitespace()).thenReturn(super.ignoreLeadingAndTrailingWhitespace);
when(annotation.commentCharacter()).thenReturn(super.commentCharacter);

// @CsvFileSource
when(annotation.resources()).thenReturn(this.resources);
Expand Down
Loading