chocolatkey
Member

Work in progress. Given the following input:

<!doctype html>
<html xmlns:epub="http://www.idpf.org/2007/ops"><!-- lang="en" xml:lang="en" -->
<body>
	<p xml:lang="fr">Paragraphe avec image: <img src="src/image.jpg" alt="A cool image" /></p>
	<p>This job requires a certain <em xml:lang="fr">savoir faire</em> that can only be acquired over time.</p>
	<p>This is a paragraph <b>with some very-<em>strong</em> bold</b> text!</p>

	<div>
	<span id="pg04" role="doc-pagebreak" epub:type="pagebreak" title="4"/>
	<p>And the next pagebreak is in the middle <span id="pg05" role="doc-pagebreak" epub:type="pagebreak" title="4"/> of a sentence.</p>
	</div>


	<section role="doc-chapter" epub:type="chapter">
		<h1>Title of the chapter</h1>
	</section>
	<ul>
		<li>First item</li>
		<li>Second item</li>
		<li>Third item</li>
	</ul>
	<p aria-hidden="true">Hidden <b>text!</b> <img src="with_image.jpg" />...</p>

	<img src="image1.avif" alt="Alternative text using the alt attribute">
	<span role="img" aria-label="Rating: 4 out of 5 stars">
		<span></span>
		<span></span>
		<span></span>
		<span></span>
		<span></span>
	</span>
	<figure aria-labelledby="cat-caption"> 
		<pre>
			/\_/\
		( o.o )
				 ^ 
		</pre>
		<figcaption id="cat-caption">
		ASCII Art of a cat face
		</figcaption>
	</figure>
</body>
</html>

the following guided navigation document is generated:

{
    "guided": [
        {
            "children": [
                {
                    "children": [
                        {
                            "text": {
                                "language": "fr",
                                "plain": "Paragraphe avec image: "
                            }
                        },
                        {
                            "description": "A cool image",
                            "imgref": "src/image.jpg",
                            "role": [
                                "image"
                            ]
                        }
                    ],
                    "role": [
                        "paragraph"
                    ]
                },
                {
                    "children": [
                        {
                            "text": "This job requires a certain "
                        },
                        {
                            "text": {
                                "language": "fr",
                                "plain": "savoir faire"
                            }
                        },
                        {
                            "text": " that can only be acquired over time."
                        }
                    ],
                    "role": [
                        "paragraph"
                    ]
                },
                {
                    "children": [
                        {
                            "text": "This is a paragraph with some very-strong bold text!"
                        }
                    ],
                    "role": [
                        "paragraph"
                    ]
                },
                {
                    "children": [
                        {
                            "children": [
                                {
                                    "text": "And the next pagebreak is in the middle of a sentence."
                                }
                            ],
                            "role": [
                                "paragraph"
                            ]
                        }
                    ]
                },
                {
                    "children": [
                        {
                            "children": [
                                {
                                    "text": "Title of the chapter"
                                }
                            ],
                            "role": [
                                "heading"
                            ]
                        }
                    ],
                    "role": [
                        "chapter"
                    ]
                },
                {
                    "children": [
                        {
                            "children": [
                                {
                                    "text": "First item"
                                }
                            ],
                            "role": [
                                "listItem"
                            ]
                        },
                        {
                            "children": [
                                {
                                    "text": "Second item"
                                }
                            ],
                            "role": [
                                "listItem"
                            ]
                        },
                        {
                            "children": [
                                {
                                    "text": "Third item"
                                }
                            ],
                            "role": [
                                "listItem"
                            ]
                        }
                    ],
                    "role": [
                        "list"
                    ]
                },
                {
                    "children": [
                        {
                            "imgref": "with_image.jpg",
                            "role": [
                                "image"
                            ]
                        }
                    ],
                    "role": [
                        "paragraph"
                    ]
                },
                {
                    "description": "Alternative text using the alt attribute",
                    "imgref": "image1.avif",
                    "role": [
                        "image"
                    ]
                },
                {
                    "description": "Rating: 4 out of 5 stars",
                    "role": [
                        "image"
                    ]
                },
                {
                    "description": "ASCII Art of a cat face",
                    "role": [
                        "figure"
                    ]
                }
            ]
        }
    ]
}
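
To make the shape of this output easier to check, here is a minimal sketch (in Python, purely for illustration) that walks such a document and collects the text runs in reading order. It assumes only what the output above shows: a `text` value is either a plain string or an object with `language` and `plain`, and nesting happens through `children`. The helper name `iter_text` and the trimmed `sample` document are hypothetical.

```python
def iter_text(node):
    """Yield (language, text) pairs in document order.

    language is None when the text is a plain string.
    """
    text = node.get("text")
    if isinstance(text, str):
        yield (None, text)
    elif isinstance(text, dict):
        yield (text.get("language"), text.get("plain", ""))
    # Objects without text (images, wrappers) only contribute their children.
    for child in node.get("children", []):
        yield from iter_text(child)


# Trimmed excerpt of the generated document above (first two paragraphs).
sample = {
    "guided": [
        {
            "children": [
                {
                    "children": [
                        {"text": {"language": "fr", "plain": "Paragraphe avec image: "}},
                        {
                            "description": "A cool image",
                            "imgref": "src/image.jpg",
                            "role": ["image"],
                        },
                    ],
                    "role": ["paragraph"],
                },
                {
                    "children": [
                        {"text": "This job requires a certain "},
                        {"text": {"language": "fr", "plain": "savoir faire"}},
                        {"text": " that can only be acquired over time."},
                    ],
                    "role": ["paragraph"],
                },
            ]
        }
    ]
}

runs = [run for top in sample["guided"] for run in iter_text(top)]
for language, text in runs:
    print(language, repr(text))
```

Walking the excerpt this way shows the second paragraph coming out as three separate text runs, because the French phrase is split into its own child.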

@HadrienGardeur
Member

Looking at the results, here are a few early comments:

  • We shouldn't split the text into multiple elements when we encounter another language, as we did with Content Iterator. Instead, we should use SSML in the text and indicate language changes that way.
  • SSML should also handle emphasis, which would cover at least <em> and <i>, and probably <strong> and <b> as well.
  • We seem to use too many children everywhere. For example, the <h1> element should result in a single object with a role (heading), a level (currently missing) and a text.
  • Support for pagebreaks seems to be missing, whether they stand on their own or sit within another element (which would require SSML).
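
As a rough illustration of the first three points, the sketch below (Python, not the toolkit's actual code) turns a single element into one object, keeps inline language changes and emphasis as SSML `<lang>` and `<emphasis>` elements, and adds a `level` to headings. The `ssml` key under `text` and the function names are assumptions made for this example. Pagebreaks, images and element-level `xml:lang` are not covered.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
EMPHASIS_TAGS = {"em", "i", "strong", "b"}
ROLES = {"p": "paragraph"}
HEADING_LEVELS = {f"h{n}": n for n in range(1, 7)}


def inline_to_ssml(node):
    """Serialize the inline content of node as SSML (without <speak>)."""
    parts = [escape(node.text or "")]
    for child in node:
        inner = inline_to_ssml(child)
        lang = child.get(XML_LANG)
        if lang:
            inner = f"<lang xml:lang={quoteattr(lang)}>{inner}</lang>"
        if child.tag in EMPHASIS_TAGS:
            inner = f"<emphasis>{inner}</emphasis>"
        # The tail is the text that follows the child inside its parent.
        parts.append(inner + escape(child.tail or ""))
    return "".join(parts)


def to_guided_object(node):
    """Build a single object (no children) for a heading or paragraph."""
    obj = {}
    if node.tag in HEADING_LEVELS:
        obj["role"] = ["heading"]
        obj["level"] = HEADING_LEVELS[node.tag]
    elif node.tag in ROLES:
        obj["role"] = [ROLES[node.tag]]
    if len(node) == 0:
        # No inline markup: a plain string is enough.
        obj["text"] = node.text or ""
    else:
        # Hypothetical "ssml" key; the real property name is undecided here.
        obj["text"] = {"ssml": f"<speak>{inline_to_ssml(node)}</speak>"}
    return obj


heading = to_guided_object(ET.fromstring("<h1>Title of the chapter</h1>"))
paragraph = to_guided_object(ET.fromstring(
    '<p>This job requires a certain <em xml:lang="fr">savoir faire</em>'
    " that can only be acquired over time.</p>"
))
print(heading)
print(paragraph["text"]["ssml"])
```

With this approach the "savoir faire" paragraph stays a single object, and the language switch lives inside the text rather than in separate children.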
