Closed
Description
=== What steps will reproduce the problem ===
scala> <foo>{"hi\nthere"}</foo>
res6: scala.xml.Elem =
<foo>hi
there</foo>
scala> new PrettyPrinter(9999,2).format(<foo>{"hi\nthere"}</foo>)
res7: String = <foo>hi there</foo>
scala> new PrettyPrinter(9999,2).format(<foo>{PCData("hi\nthere")}</foo>)
res8: String = <foo><![CDATA[hi there]]></foo>
Activity
scabug commented on Feb 28, 2011
Imported From: https://issues.scala-lang.org/browse/SI-4303?orig=1
Reporter: Ittay Dror (ittayd)
scabug commented on Mar 1, 2011
@axel22 said:
Someone needs to check the correct behaviour against the XML specification. Contributions are, of course, always welcome.
scabug commented on Feb 18, 2014
Francois Armand (fanf) said:
For people with this problem: it seems that simply changing the "doPreserve" method of PrettyPrinter to always return true gives the behaviour we want. I have no idea what the XML spec or a DTD expects here.
Too bad the doPreserve method is private...
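The effect of that change can be sketched with a stdlib-only Scala model. It assumes, based on the res6/res7 outputs above rather than on scala-xml's actual code, that with preservation off PrettyPrinter collapses each run of XML whitespace (space, tab, CR, LF) in the serialized element to a single space and trims the ends. `PreserveModel`, `collapse` and `render` are hypothetical names; `preserve = true` stands in for forcing doPreserve to always return true.

```scala
// Hypothetical stdlib-only model of PrettyPrinter's text step (not scala-xml code).
// Assumption: without preservation, every whitespace run becomes one space, ends trimmed.
object PreserveModel {
  private def isXmlSpace(c: Char): Boolean =
    c == ' ' || c == '\t' || c == '\n' || c == '\r'

  /** Collapse each run of XML whitespace to a single space, then trim. */
  def collapse(s: String): String = {
    val sb = new StringBuilder
    for (c <- s) {
      if (!isXmlSpace(c)) sb.append(c)
      else if (sb.isEmpty || sb.last != ' ') sb.append(' ')
    }
    sb.toString.trim
  }

  /** `preserve = true` models doPreserve always returning true. */
  def render(serialized: String, preserve: Boolean): String =
    if (preserve) serialized else collapse(serialized)

  def main(args: Array[String]): Unit = {
    val serialized = "<foo>hi\nthere</foo>"
    println(render(serialized, preserve = false)) // prints <foo>hi there</foo>
    println(render(serialized, preserve = true))  // prints <foo>hi, then there</foo> on the next line
  }
}
```

Assumption worth verifying against the scala-xml version in use: the private doPreserve appears to return true when the element carries xml:space="preserve", so adding that attribute to the element may give the preserving behaviour without patching the library.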
scabug commented on Dec 22, 2014
Michael Beckerle (mbeckerle.dfdl) said:
I would like to comment on this issue with respect to the XML specification, and on what the right behavior is.
The XML 1.1 spec is very clear that if you insert a CR into text via a character reference in an entity value literal, then that character must be preserved. This suggests to me that the only reasonable implementation would do no whitespace normalization on output, since all the various Unicode line-ending characters can be inserted by this same mechanism.
This is from the XML 1.1 spec (this clarification is not in the original XML 1.0 spec, but I suggest it is the "right thing" to do for XML 1.0 implementations anyway):
2.3 Common Syntactic Constructs
This section defines some symbols used widely in the grammar.
S (white space) consists of one or more space (#x20) characters, carriage returns, line feeds, or tabs.
White Space
[3] S ::= (#x20 | #x9 | #xD | #xA)+
Note: The presence of #xD in the above production is maintained purely for backward compatibility with the First Edition. As explained in 2.11 End-of-Line Handling, all #xD characters literally present in an XML document are either removed or replaced by #xA characters before any other processing is done. The only way to get a #xD character to match this production is to use a character reference in an entity value literal.
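The ordering this note relies on can be sketched in stdlib-only Scala: first the XML 1.1 section 2.11 end-of-line rules (CR LF, CR NEL, lone CR, NEL and LS all become LF), then a deliberately simplified expansion of the `&#xD;` character reference. `EolModel`, `normalizeEol` and `expandCrRef` are hypothetical names written from the spec text, not taken from any parser; a real parser resolves references as part of full parsing. Because end-of-line handling runs first, a literal CR LF collapses to LF while the referenced CR survives.

```scala
// Stdlib-only sketch of XML 1.1 section 2.11 end-of-line handling.
// Assumption: written from the spec's rules, not taken from any real parser.
object EolModel {
  private val Nel: Char = 0x85.toChar   // NEXT LINE
  private val Ls: Char  = 0x2028.toChar // LINE SEPARATOR

  /** Translate CR LF, CR NEL, lone CR, NEL and LS into a single LF. */
  def normalizeEol(raw: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < raw.length) {
      val c = raw.charAt(i)
      if (c == '\r') {
        // A CR followed by LF or NEL is consumed together with its partner.
        if (i + 1 < raw.length && (raw.charAt(i + 1) == '\n' || raw.charAt(i + 1) == Nel)) i += 1
        sb.append('\n')
      } else if (c == Nel || c == Ls) {
        sb.append('\n')
      } else {
        sb.append(c)
      }
      i += 1
    }
    sb.toString
  }

  /** Hypothetical, heavily simplified reference expansion: only "&#xD;" is handled. */
  def expandCrRef(s: String): String = s.replace("&#xD;", "\r")

  def main(args: Array[String]): Unit = {
    val raw = "a\r\nb&#xD;c"
    // End-of-line handling runs first, so only the literal CR LF is rewritten.
    val text = expandCrRef(normalizeEol(raw))
    println(text.map(_.toInt).mkString(" ")) // prints 97 10 98 13 99
  }
}
```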
scabug commented on Dec 22, 2014
@som-snytt said:
Footnote, you don't get incomplete parses from embedded Scala blocks:
scabug commented on Dec 23, 2014
@som-snytt said (edited on Dec 23, 2014 9:46:38 PM UTC):
Took a quick look. First, Utility.serialize is the non-formatting option. Second, the PrettyPrinter is pretty ugly. It's not obvious whether it's trying to minimize verticality. When is GSOC again?
scabug commented on Dec 23, 2014
Michael Beckerle (mbeckerle.dfdl) said:
Sorry, what does GSOC mean?
scabug commented on Dec 23, 2014
@som-snytt said:
I was hoping a Google Summer of Code intern wanted to do a project with XML.
Maybe a student co-majoring in History. The "digital humanities" are huge these days.
scabug commented on Jul 17, 2015
@SethTisue said:
The scala-xml library is now community-maintained. Issues with it are now tracked at https://github.com/scala/scala-xml/issues instead of here in the Scala JIRA.
Interested community members: if you consider this issue significant, feel free to open a new issue for it on GitHub, with links in both directions.
scabug commented on Jul 29, 2015
Michael Beckerle (mbeckerle.dfdl) said:
Issue migrated to scala/scala-xml#76