Is there a way to filter out duplicated rows from a csv ? #542

Closed
nasser-rawashdeh opened this issue Nov 10, 2024 · 1 comment

nasser-rawashdeh commented Nov 10, 2024


Q       | A
Version | 9.0

Question

Short background:

I accept a CSV file from the user and aim to parse and consume it.
Because the system is agnostic to duplicated lines, they disappear during processing,
and then the numbers I report don't match.

Actual question

What is the best way to detect the number of duplicated rows, or to filter them out?

Checks before submitting

  • Be sure that there isn't already an issue about this. See: Issues list
  • Be sure that there isn't already a pull request about this. See: Pull requests
  • I have read, searched and not found the information on the documentation website.
  • I have read, searched and not found the information on PHP related forums and/or websites.
  • This issue is about 1 question around the package with no business or domain specific logic related to a specific situation.
  • The question has a descriptive title. For example: "Can I use the library with compressed CSV documents?".

nyamsprod (Member) commented Nov 14, 2024

@nasser-rawashdeh thanks for using the package.

Filtering out duplicate rows is, IMHO, a domain-specific issue which is not limited to CSV but applies to any tabular data or collection of records. The CSV provided by the package is an Iterator or an array of records. If you can filter out duplicates from a database, you can apply the same technique to league/csv. In other words, the problem you are trying to resolve is:

  • not specific to CSV
  • not resolved by the package, because it depends on a lot of outside parameters the package will never be knowledgeable about

So no, de-duplicating a CSV is not handled and is considered out of scope for this package.
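
For anyone looking for a concrete starting point, here is a minimal sketch of that technique applied to the Reader, written against the league/csv 9.x API. The file path, the header offset, and keying on the whole row are assumptions to adapt to your data; the de-duplication itself is plain PHP, not a feature of the package.

```php
<?php

require 'vendor/autoload.php';

use League\Csv\Reader;

// Load the CSV; the path and the header assumption are placeholders
// to adapt to your own file.
$csv = Reader::createFromPath('/path/to/file.csv', 'r');
$csv->setHeaderOffset(0);

$seen = [];
$unique = [];
$duplicateCount = 0;

foreach ($csv->getRecords() as $record) {
    // Key on the full record; change this if only some columns
    // should define what counts as a "duplicate".
    $key = implode("\x1f", $record);

    if (isset($seen[$key])) {
        // Row already encountered: count it and skip it.
        $duplicateCount++;
        continue;
    }

    $seen[$key] = true;
    $unique[] = $record;
}

echo $duplicateCount . ' duplicated row(s) skipped' . PHP_EOL;
// $unique now holds the de-duplicated records.
```

If you only need the number of duplicated rows rather than the filtered records, you can drop the $unique array and keep just the counter.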
