"Product" formats/natures (formats that are both X and Y)

We have formats that parse ambiguously. For example, a Keynote document is a JPEG "at the head" and a ZIP with a specific structure "at the tail". A CR2 is a TIFF until proven otherwise, and a TIFF looks somewhat CR2-ish until proven otherwise. An Office document is, at first glance, a ZIP...

The number of these is only ever going to increase (see the library grounding principles). Right now we litter the code with workarounds like "if this is _also_ a CR2, bail out" or "if this is _also_ a ZIP, it is a Keynote file, so bail out", and so forth. What if we did the following instead:

* Apply all the low-level parsers, always.
* Apply some "folder" or "matcher" strategy to the flat list of results. For example, if something is matched as both a JPEG and a ZIP and has a specific file structure, we can assume it is Keynote. We then take the two results and merge them into one that states the Keynote file type unambiguously. Likewise, if we see the Office ZIP filenames in the file, we convert the ZIP result into a Word file result.
* Return the folded list to the caller.

So the procedure would look somewhat like this:

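```
# Run every low-level parser; drop non-matches (assuming a parser returns nil when it does not match)
initial_results = parsers.map { |p| p.call(io) }.compact #=> [JPEG, ZIP]
# Fold co-occurring results into a single composite type
results_with_complex_types = fold_complex_filetypes(initial_results) #=> [Keynote]
```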
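
To make the folding step more concrete, here is a minimal sketch of what `fold_complex_filetypes` could look like. Everything in it is an assumption made for illustration: the `Result` and `FoldRule` structs, the `kind` symbols, the `FOLD_RULES` table and the `Index/Document.iwa` marker are hypothetical and are not taken from the library's actual result objects. The only detail relied on as fact is that a Word (`.docx`) archive contains `word/document.xml`.

```ruby
# Hypothetical result object; the library's real parser results may look different.
Result = Struct.new(:kind, :filenames, keyword_init: true)

# A rule names the kinds that must all be present, a predicate over the
# matched results (keyed by kind), and the composite kind it produces.
FoldRule = Struct.new(:requires, :matches, :produces, keyword_init: true)

# Illustrative rules only. Order matters: earlier rules consume results first.
FOLD_RULES = [
  FoldRule.new(
    requires: [:jpg, :zip],
    # Assumed marker for "a ZIP with a specific structure"
    matches: ->(by_kind) { Array(by_kind[:zip].filenames).include?("Index/Document.iwa") },
    produces: :keynote
  ),
  FoldRule.new(
    requires: [:zip],
    matches: ->(by_kind) { Array(by_kind[:zip].filenames).include?("word/document.xml") },
    produces: :docx
  ),
  FoldRule.new(
    requires: [:tif, :cr2],
    matches: ->(_by_kind) { true },
    produces: :cr2
  )
].freeze

# Applies each rule in turn to the flat result list. When a rule matches,
# the results it required are removed and one composite result is appended.
def fold_complex_filetypes(results, rules: FOLD_RULES)
  rules.reduce(results) do |current, rule|
    # First result of each kind, keyed by kind
    by_kind = current.each_with_object({}) { |r, h| h[r.kind] ||= r }
    next current unless rule.requires.all? { |kind| by_kind.key?(kind) }
    next current unless rule.matches.call(by_kind)

    consumed = rule.requires.map { |kind| by_kind[kind] }
    folded = Result.new(
      kind: rule.produces,
      filenames: consumed.flat_map { |r| Array(r.filenames) }.uniq
    )
    current.reject { |r| consumed.any? { |c| c.equal?(r) } } + [folded]
  end
end

initial_results = [
  Result.new(kind: :jpg, filenames: []),
  Result.new(kind: :zip, filenames: ["Index/Document.iwa"])
]
folded_kinds = fold_complex_filetypes(initial_results).map(&:kind)
```

Keeping the rules in a data table rather than in the parsers themselves is the point of the sketch: each low-level parser stays unaware of the others, and the "if this is _also_ a ZIP" knowledge lives in one place. Results that no rule claims pass through unchanged.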

This does clash with the idea of applying "at most as many parsers as were requested", but we would get much more intuitive operation in return, and we could remove quite a few hacks.
