Error when parsing a valid XML file #44
It seems that the XML parser cannot parse dots in XML files. Won't merge into develop until a fix has been found.
In fact it seems that even that header tag :
I also tried to use a String instead of an InputStream (I cannot find a signature corresponding to the example given in the README!):
String xml = IOUtils.toString(inputData, StandardCharsets.UTF_8);
xmlFormat.merge(xml, ExtensionRegistry.newInstance(), builder);
But I still get the same error. I printed my XML string and it shows the original document, so the problem comes from the parsing process itself.
I identified the error: in fact the
The current regex is:
extension|[a-zA-Z_\s;@][0-9a-zA-Z_\s;@+-]*+|[.]?[0-9+-][0-9a-zA-Z_.+-]*+|<\/|[\\0-9]++|"([^"\n\\]|\\.)*+("|\\?$)|'([^'\n\\]|\\.)*+('|\\?$)
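To make the effect of that regex concrete, here is a minimal sketch that applies it to an email-like value. It assumes the pattern is exactly the one quoted above (with the line break inside the character classes read as `\n`) and that the tokenizer matches it at the start of the remaining input; the class and method names are hypothetical and not taken from the library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the library.
class TokenRegexDemo {
    // The regex quoted in the comment above, written as a Java string literal.
    static final Pattern TOKEN = Pattern.compile(
          "extension|"
        + "[a-zA-Z_\\s;@][0-9a-zA-Z_\\s;@+-]*+|"   // identifier-like text: no '.' allowed
        + "[.]?[0-9+-][0-9a-zA-Z_.+-]*+|"          // number-like text: '.' allowed here
        + "<\\/|"
        + "[\\\\0-9]++|"
        + "\"([^\"\n\\\\]|\\\\.)*+(\"|\\\\?$)|"    // double-quoted string
        + "'([^'\n\\\\]|\\\\.)*+('|\\\\?$)");      // single-quoted string

    // Returns the token matched at the start of the input, or null if none matches.
    // Assumption: the real tokenizer anchors its match the same way.
    static String firstToken(String input) {
        Matcher m = TOKEN.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The identifier branch stops at the first dot.
        System.out.println(firstToken("john.doe@example.com"));
        // Nothing in the pattern can start at a dot followed by a letter.
        System.out.println(firstToken(".doe@example.com"));
        // Without dots the whole value is a single token.
        System.out.println(firstToken("johndoe@examplecom"));
    }
}
```

Under these assumptions, the text after the first dot cannot be tokenized at all, which is consistent with the report that removing the dots makes the document parse.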
Would you mind putting up a fix with a test and updating RELEASE-NOTES.md?
Yep, I'll try, but I need to find the right regex first, and it might not be simple. Once I find a solution, I'll create a pull request.
Is this change integrated into the master branch? I had the same issue and I wonder whether the modifications work, since the code greatly simplifies the tokenization regexps.
No, it's not merged yet, since simplifying the regex has side effects. In fact I think the whole parser should be refactored, so I left it as is for the moment. If you find a way to make it work, feel free to fork my repo :)
One thing I don't understand is why they had to code their own XML parser instead of using a standard one?
I don't know. That's why I gave up trying to debug it: the whole parser is quite restrictive and does not handle every case. I think the only clean solution would be to reimplement everything using an existing parser, but I have not had time to do it yet.
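As a rough illustration of the "existing parser" idea, the sketch below reads a dotted email address with the JDK's built-in DOM parser. The element names (Person, name, email) and the class and method names are assumptions made for this example; the original document and schema mapping are not shown in this thread, and the sketch does not build a Protobuf message.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; element names are assumed, not taken from the issue.
class StandardParserDemo {
    // Returns the text of the first <email> element in the given XML string.
    static String readEmail(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("email").item(0).getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<Person><name>John Doe</name>"
            + "<email>john.doe@example.com</email></Person>";
        // Dots in element text are ordinary character data for a standard parser.
        System.out.println(readEmail(xml));
    }
}
```

A reimplementation along these lines would still need code that walks the DOM and sets the corresponding fields on the message builder.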
Hi, I have the same problem. Is this issue still open?
Hi,
I am trying to parse a sample document into a Protobuf Message, using the AddressBook schema from Google examples:
Here is the document:
Here is the code:
However, I get the following error:
I tried both UTF-8 and ISO-8859-1 encodings but I still get the error. Then I removed the dots from the email address in my XML document, and it now parses successfully.
This is the working XML:
I can also attach the Protobuf schema if you want to try it yourself.