Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom delimiter on array fields #227

Merged
merged 5 commits into from
Sep 3, 2024
Merged

Custom delimiter on array fields #227

merged 5 commits into from
Sep 3, 2024

Conversation

joneubank
Copy link
Contributor

@joneubank joneubank commented Aug 30, 2024

Ticket: #204

Objective

Adds the ability to define a custom delimiter on any field. If the field has an array value, the value will use this custom delimiter to split the string value into an array.

The property is named delimiter and accepts a string value. Delimiter values must be at least one character in length, but can be longer and can contain any combination of characters.

The dictionary-reference documentation has been updated to reflect the array delimiter implementation.

Example

String array split on | character:

const exampleFieldWithCustomDelimiter: SchemaField = {
	name: 'pipe-separated-list',
	isArray: true,
	delimiter: '|',
	valueType: 'string'
};

Then when used to parse data by the function parseFieldValue() from the validation library:

const result = parseFieldValue('abc|def|ghi', exampleFieldWithCustomDelimiter);
// result.data === ['abc', 'def', 'ghi']

@joneubank joneubank force-pushed the feat/delimiter branch 3 times, most recently from 8533d42 to bf5a5fb Compare August 30, 2024 22:42
@@ -300,5 +300,34 @@ describe('Parse Values - parseFieldValue', () => {
it('Boolean array field, rejects array where value is missing (two delimiters are adjacent)');
expect(parseFieldValue(',true,false,TRUE', fieldBooleanArrayRequired).success).false;
});
describe('Custom delimiter', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice feature! "out loud" thoughts about this, not to question your coverage but to narrate my analysis: asking myself "how do you escape the delimiter if it's also meant to be used as a value?" like you would in a comprehensive array of symbols. the answer I'd guess would be "use a compound delimiter", and then I realised it's not clear what happens if I give the delimiter repeatedly. does the array end up with an empty value? or does the function filter empties?

e.g. assume default (i.e. ,), what if I give you symbols = ',",\,%,,,#,* do I end up with [' , " , \ , % , , , # , *].length === 8? or [' , " , \ , % , # , *].length === 6
... and what would be the representation of the quotes in JS, since one of the two first elements here would have to be escaped to make a string.
There's special handling here that could use being documented, methinks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you put the delimiter back-to-back this results in an empty value between them. This is not allowed in arrays where each element is expected to have a value. Validation of the data will fail.

it('Rejects array where value is missing (two delimiters are adjacent)', () => {
const result = parseFieldValue('12, 45,,-111111', fieldIntegerArrayRequired);
expect(result.success).false;

Copy link
Contributor Author

@joneubank joneubank Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no handling for escaping characters. If you are providing data in a TSV, all characters are allowed. If you are providing data in JSON, then the JSON parsing will strip escape characters. In general, parsing raw data handles escape characters, once the string is a variable in memory we do no escape character handling other than splitting on the delimiter.

We do not have a mechanism for allowing the delimiter in the string value. We could investigate this but I prefer pushing for the dictionary to use a delimiter that is not part of the data set (prefer because its simpler for devs and data submitters, though it puts the burden on the dictionary author)

| -------------- | -------- | ---------------------- | --------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------- |
| `name` | Required | | NameString (no whitespace or `.`) | Name of the field. This will be used as the header in TSV files in this field's schema, and in any paths referencing this field. | `"example_field` |
| `valueType` | Required | | [Field Data Type](#field-data-types) | Type of value stored in this field | `"string"` |
| `delimiter` | Optional | `,` | `string` | Character or string that will be used to split multiple values into an array. The default delimiter is a comma `,`. Any characters can be used as a delimiter. The delimiter value can be one or more characters long, but cannot be an empty string. Note: This property has no effect unless the field has `isArray: true`. | `"\|"` |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to double check and confirm, I assume there's validation and TS preventing a submission where valueType: 'integer', isArray: true, delimiter: '0' ?
isArray is true but valueType is not string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not. That is a valid delimiter, and it would prevent numeric values with 0 in them. I would advise against using this delimiter but the spec allows it atm.

@@ -138,7 +138,7 @@ const convertArrayValue = (value: string, fieldDefinition: SchemaField): Result<

/* === Start of convertArrayValue logic === */
const { valueType } = fieldDefinition;
const delimiter = DEFAULT_DELIMITER;
const delimiter = fieldDefinition.delimiter !== undefined ? fieldDefinition.delimiter : DEFAULT_DELIMITER;
Copy link
Member

@justincorrigible justincorrigible Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line as is would allow empty strings, null values or arrays be used as a delimiter; empty strings not being a delimiter at all. am I missing something here?
possible fix/workaround:

Suggested change
const delimiter = fieldDefinition.delimiter !== undefined ? fieldDefinition.delimiter : DEFAULT_DELIMITER;
const delimiter = typeof fieldDefinition.delimiter === 'string' && fieldDefinition.delimiter.length ? fieldDefinition.delimiter : DEFAULT_DELIMITER;

if the argument "TypeScript would not let that happen at this point" may seem tempting, that's not a tool available at PR reviews shrug... also does TS check for string length?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realised zod schema states min 1... code here could still benefit from the additional declarativeness, though

Copy link
Contributor Author

@joneubank joneubank Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 ...

This is good type checking and I am considering how this fits.

  • Any dictionary that has been validated by the lectern server or client will have a minimum delimiter length of 1.
  • Any use of this with a dictionary that hasn't been validated could have '' as a delimiter. Splitting a string by this results in each character being its own string value to be parsed.

If I provide a delimiter of '' to the parsing function, the above behaviour is consistent. It is distinct behaviour from an undefined delimiter value. We also protect against this use case for any dictionaries read from a lectern server or validated using the dictionary package.

Alternately - with the proposed solution - a use case where a non-validated delimiter of '' is provided, no splitting would occur. This is inconsistent since there is a delimiter provided but not used. I can see the situation where a user expects this behaviour, since they consider this as a "no value" case (though, it is not). In this case, some documentation would be good. The documentation does say that it shouldn't be an empty string, but I can make it more explicit.

Both of these outcomes are

Copy link
Member

@justincorrigible justincorrigible left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left comment+suggestion to improve declarativeness as well as triple ensure type safety for the custom delimiters 👍

Nice feature!

@joneubank joneubank merged commit 8ef0ce5 into develop Sep 3, 2024
2 checks passed
@joneubank joneubank deleted the feat/delimiter branch September 3, 2024 15:34
@leoraba leoraba mentioned this pull request Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants