Open
Labels: bug (an unexpected problem or unintended behavior)
Description
xml2::read_html(x) returns the HTML within a linked data JSON object as expected:
library(xml2)
library(magrittr)
library(rvest)
test_ld <- '<script type="application/ld+json">{"@context":"http://schema.org","@type":"ReproducibleExample", "description":"<p><strong>text within tags</strong>text after closing tag</p>"'
# tags preserved
test_ld %>%
  read_html() %>%
  html_node('script[type="application/ld+json"]') %>%
  as.character()
[1] "<script type=\"application/ld+json\">{\"@context\":\"http://schema.org\",\"@type\":\"ReproducibleExample\", \"description\":\"<p><strong>text within tags</strong>text after closing tag</p>\"</script>"
Here, description contains the HTML <p><strong>text within tags</strong>text after closing tag</p>, with its closing tags intact.
But when calling xml2::read_html(x, options = 'HUGE'), or passing any other single option (I've tested 5 or 6), the closing tags are removed from the HTML text inside the JSON-LD object.
# tags removed
test_ld %>%
  read_html(options = 'HUGE') %>%
  html_node('script[type="application/ld+json"]') %>%
  as.character()
# tags removed
test_ld %>%
  read_html(options = "NOBLANKS") %>%
  html_node('script[type="application/ld+json"]') %>%
  as.character()
# tags removed
test_ld %>%
  read_html(options = '') %>%
  html_node('script[type="application/ld+json"]') %>%
  as.character()
# all return:
[1] "<script type=\"application/ld+json\">{\"@context\":\"http://schema.org\",\"@type\":\"ReproducibleExample\", \"description\":\"<p><strong>text within tagstext after closing tag\"</script
description now becomes <p><strong>text within tagstext after closing tag, with the closing </strong> and </p> tags missing.
Setting options is necessary for some of the HTML I'm parsing. Is it possible to use options and preserve properly formatted HTML from a linked data object?
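One thing that may be worth checking, offered as an unverified sketch rather than a confirmed fix: an explicit `options` argument replaces the default set instead of adding to it, so passing a single option drops whatever defaults `read_html()` normally applies. The code below reads those defaults from the function signature (assumed to be `RECOVER`, `NOERROR`, and `NOBLANKS` in current xml2 releases), appends `HUGE` to them, and compares the results. The names `default_opts`, `extract_ld`, and the `*_out` variables are illustrative only.

```r
library(xml2)

test_ld <- '<script type="application/ld+json">{"@context":"http://schema.org","@type":"ReproducibleExample", "description":"<p><strong>text within tags</strong>text after closing tag</p>"'

# Default parser options, taken from the generic's signature rather than
# hard-coded (assumption: the `options` argument is declared on the generic).
default_opts <- eval(formals(xml2::read_html)$options)

# Hypothetical helper: pull the JSON-LD script node out as a string.
extract_ld <- function(doc) {
  as.character(xml_find_first(doc, '//script[@type="application/ld+json"]'))
}

default_out  <- extract_ld(read_html(test_ld))
explicit_out <- extract_ld(read_html(test_ld, options = default_opts))
huge_out     <- extract_ld(read_html(test_ld, options = c(default_opts, "HUGE")))

# Passing the defaults explicitly should match the no-argument call.
identical(default_out, explicit_out)
# Whether adding HUGE on top of the defaults keeps the closing tags
# is the thing to check here.
identical(default_out, huge_out)
```

If the second comparison is also `TRUE`, the tag loss would come from dropping the defaults rather than from `HUGE` itself.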