Encoding formatters (all formatters?) are not applied to the header #528

Closed
tiagof opened this issue Jun 21, 2024 · 2 comments


tiagof commented Jun 21, 2024

Bug Report

Importing a file encoded in ISO-8859-15 where both the header and the rows contain accented characters.

Information    Description
Version        9.15
PHP version    8.2
OS Platform    macOS (Sonoma)

Summary

Importing a file with 1 header + 1 row (example), encoded in ISO-8859-15:

Café,Late
Exposé,Now

Standalone code, or other way to reproduce the problem

test.csv

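use League\Csv\CharsetConverter;
use League\Csv\Reader;
use League\Csv\Statement;

// Convert from ISO-8859-15 through a record formatter.
$encoder = (new CharsetConverter())->inputEncoding('ISO-8859-15');
$csv = Reader::createFromPath('test.csv')
    ->addFormatter($encoder)
    ->skipEmptyRecords()
    ->setHeaderOffset(0);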

$stmt = Statement::create();
$records = $stmt->process($csv);

foreach ($records as $row) {
    dump(array_keys($row), array_values($row));
}

Expected result

array:2 [
  0 => "Café"
  1 => "Late"
] 
array:2 [
  0 => "Exposé"
  1 => "Now"
]

Actual result

array:2 [
  0 => b"Café"
  1 => "Late"
] 
array:2 [
  0 => "Exposé"
  1 => "Now"
]

Notice the "b" prefix before "Café": dump() is flagging the string as binary, which shows that the header value was not converted to UTF-8.
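To make the symptom concrete, here is a minimal sketch, independent of the library, of why the unconverted header is flagged. It assumes the mbstring extension is available; the variable names are purely illustrative. In ISO-8859-15, "é" is the single byte 0xE9, which on its own is not a valid UTF-8 sequence.

```php
<?php

// "Café" as raw ISO-8859-15 bytes: "é" is the single byte 0xE9.
$raw = "Caf\xE9";

// A lone 0xE9 byte is not valid UTF-8, so the raw header fails the check.
var_dump(mb_check_encoding($raw, 'UTF-8'));

// Converting explicitly from ISO-8859-15 yields a valid UTF-8 string.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-15');
var_dump(mb_check_encoding($utf8, 'UTF-8'));

// The converted value matches "Café" written with the U+00E9 code point.
var_dump($utf8 === "Caf\u{E9}");
```

The data row passes this check in the report because the formatter converted its values, while the header did not go through the same conversion.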

@nyamsprod
Member

@tiagof thanks for using the library. As explained in the documentation:

Formatting happens AFTER combining the header and the fields value if a header is available and CSV value BUT BEFORE you can access the actual value.

This means that by the time formatting runs, the header record has already been calculated, so no formatting can be applied to it.

To work around this expected behaviour, change how the converter is attached to the CSV document.

<?php

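use League\Csv\CharsetConverter;
use League\Csv\Reader;

$csv = Reader::createFromPath('test.csv');
// Attach the CharsetConverter class as a stream filter; the output encoding is passed explicitly.
CharsetConverter::addTo($csv, 'ISO-8859-15', 'UTF-8');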

$csv
    ->skipEmptyRecords()
    ->setHeaderOffset(0);

foreach ($csv as $row) {
    dump(array_keys($row), array_values($row));
}

In this example, the CharsetConverter is used as a stream filter. Stream filtering is applied before the CSV header is calculated, so ALL your CSV fields, header included, are converted.

Please refer to the documentation for further information on the limitations of my proposed solution.
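The "filter before parsing" idea can also be sketched without the library, using PHP's built-in `convert.iconv.*` stream filter and `fgetcsv()`. This is only an illustration of the ordering, not how the library implements it. It assumes the iconv and mbstring extensions are available, and the in-memory stream is a hypothetical stand-in for test.csv.

```php
<?php

// Hypothetical in-memory stand-in for test.csv, written as ISO-8859-15 bytes.
$stream = fopen('php://temp', 'r+');
fwrite($stream, "Caf\xE9,Late\nExpos\xE9,Now\n");
rewind($stream);

// Built-in iconv read filter: bytes are converted to UTF-8 as they are read,
// before any CSV parsing takes place.
stream_filter_append($stream, 'convert.iconv.ISO-8859-15/UTF-8', STREAM_FILTER_READ);

// Parsing happens on the already-converted data, so the header line is converted too.
$header = fgetcsv($stream, null, ',', '"', '');
$row = fgetcsv($stream, null, ',', '"', '');
fclose($stream);

// Both keys and values of the combined record are now UTF-8.
$record = array_combine($header, $row);
var_dump(mb_check_encoding(implode('', array_keys($record)), 'UTF-8'));
var_dump(mb_check_encoding(implode('', $record), 'UTF-8'));
```

Because the conversion sits at the stream level, nothing downstream needs to know the original encoding. The trade-off is that the filter applies to every byte read from the stream.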

Author

tiagof commented Jun 21, 2024

@nyamsprod, many thanks! It works flawlessly!
Honestly, I did go through the documentation, but apparently not thoroughly enough.

Cheers!

tiagof closed this as completed Jun 21, 2024