Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix discrepancy of empty bed files #775

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kloetzl
Copy link

@kloetzl kloetzl commented Jun 28, 2024

This PR includes two changes.

  1. On latex-line 78 a bed file is allowed to be empty but in line 170 a file is defined as having at least one line. This commit changes the first occurrence to disallow empty files
  2. As specified right now every line including the last line has to be terminated by a newline. I have made the newline optional for the last line. (bedtools2 is fine with that.)

CC: @michaelmhoffman @arq5x

@jmarshall jmarshall added the bed label Jun 29, 2024
@jmarshall
Copy link
Member

Certainly the inconsistency around the minimum number of lines should be fixed. However allowing files with zero lines was an active choice during spec development (see #570 (comment) and the following comment), so the maintainers would likely resolve the inconsistency in the other direction.

An empty file is not the only way to represent an empty set of features, but IMHO it is likely consistent with previous existing practice (which didn't define any header lines or other metadata). Indeed I think it would not be consistent with the previous existing practice that the BED spec is largely codifying to disallow a file with zero lines.

@jmarshall
Copy link
Member

The description of lines as always being terminated by a line separator follows the pattern of similar text in specifications such as POSIX (“A sequence of zero or more non-<newline> characters plus a terminating <newline> character”) and the C Standard (§5.1.1.2, “A source file that is not empty shall end in a new-line character […]”).

Personally I think this is a good way of being clear and specific in exposition, while specifying a simple and transportable form of text file for interoperability between platforms. Implementations (such as bedtools) are of course free to additionally accept e.g. files whose last line is missing its terminator as a quality of implementation convenience feature.

@kloetzl
Copy link
Author

kloetzl commented Jun 29, 2024

Certainly the inconsistency around the minimum number of lines should be fixed. … so the maintainers would likely resolve the inconsistency in the other direction.

Personally I think that en empty file is probably the because of a bug in some shell script and it's more useful to error out than to treat it as empty. But I acknowledge that it can be useful in certain occasions so I'm willing to make the change in the other direction.

The description of lines as always being terminated by a line separator …

Yeah, I wasn't sure if I should mix the two changes into one PR. I see how this is more controversial and am happy to reverse it.

Thanks for you valuable feedback as always.

@jmarshall
Copy link
Member

Empty files may indeed be due to failed programs, and it may be useful to detect such. For example samtools since 1.10 disallows empty SAM files on this basis (see samtools/htslib#261) as a quality of implementation feature, despite the spec saying such files are valid SAM.

However I think the tradeoff is different for BED, not least because the absence of headers in the format discourages representing zero data records in a file that is not zero-length overall (compare a SAM file with an @HD header line and no record lines). (You could write a comment line, but as it doesn't have semantic meaning it's rather artificial.) Nonetheless a BED implementation could take it upon itself to disallow completely empty BED files on the same basis that samtools does with empty SAM.

(Speaking just for myself) I would suggest that at least initially you consider reporting perceived specification problems via issues rather than pull requests. Reports of inconsistencies or lacks of clarity, and suggestions for improvements are gratefully received. However it is the designated specification maintainers who have the responsibility to decide how to address such defects or to lead in formulating the details of improvements. They will likely have their own ideas of how best to adjust the wording based on their experience of the document.

* make an empty file valid
* every line has to be terminated by a newline
@kloetzl
Copy link
Author

kloetzl commented Jun 29, 2024

Thank you for your detailed response. You have convinced me that the spec should allow empty bed files whereas an implementation might give a warning. I have changed my commit with regards to this and the eof issue.

I would suggest that at least initially you consider reporting perceived specification problems via issues rather than pull requests.

Yeah, I was wondering about that. I checked if there is a contributing section in the readme or even a separate contributing.md but that was not the case so I decided to be a good citizen and proactively provide a fix instead of only pointing out that something needs attention. A contributing section might be a worthwhile addition if one way is preferred over the other.

@kloetzl kloetzl changed the title disallow empty bed files but allow unterminated last line fix discrepancy of empty bed files Jun 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants