Pre setting enable/disable any unicode character transformation #173

Open
ac-medexter opened this issue Feb 5, 2025 · 4 comments

ac-medexter commented Feb 5, 2025

Description of the situation

Currently, all hex/decimal-encoded characters and special HTML-encoded characters inside an XML file are converted (or at least an attempt is made to convert them) to the characters they represent in the charset.
e.g.
&gt; -> >
&x10; -> new line
&#228; -> error

This does not seem to work correctly for decimal encoded characters, which are part of the xml's defined charset, and often used in other languages:
Original

<?xml version="1.0" encoding="utf-8"?>
<Address>W&#228;hring 21</Address>

After pretty xml

<?xml version="1.0" encoding="utf-8"?>
<Address>W�hring 21</Address>

Requested feature

Experiencing this behaviour, I think it is a generally good idea to provide a setting which will completely disable any character transformation (as described above).

Whether or not the transformation works as intended, at times even a correct transformation is unwanted, because the XML needs to be edited manually (in prettified format ^^) before it is used to test a service, experiment with a framework or interface, etc. In such cases, even though pretty formatting is desired, no unintended changes to the original XML text content should occur.
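One way such a "leave every reference alone" setting could work is to mask the references before formatting and unmask them afterwards. The sketch below is only an illustration in Python, not how Pretty XML is implemented: the names PLACEHOLDER, REFERENCE and format_preserving_references are hypothetical, the placeholder code point is assumed not to occur in the input, and ET.indent needs Python 3.9 or later.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical placeholder: a private-use code point assumed absent from the input.
PLACEHOLDER = "\ue000"

# Matches numeric references (&#228; or &#xE4;) and named ones (&gt;).
REFERENCE = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z_][\w.\-]*);")


def format_preserving_references(xml_text: str) -> str:
    """Pretty-print XML while keeping every character/entity reference as typed."""
    # Swap the leading '&' for the placeholder so the parser sees plain text.
    masked = REFERENCE.sub(lambda m: PLACEHOLDER + m.group(1) + ";", xml_text)
    root = ET.fromstring(masked)
    ET.indent(root)  # in-place indentation, two spaces by default
    pretty = ET.tostring(root, encoding="unicode")
    # Put the original '&' back once formatting is done.
    return pretty.replace(PLACEHOLDER, "&")


source = "<Person><Address>W&#228;hring 21</Address><Note>a &gt; b</Note></Person>"
result = format_preserving_references(source)
print(result)
```

This sketch does not treat CDATA sections or comments specially, so references written inside them would be masked and restored as well.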

ac-medexter changed the title from "Per setting disable any unicode transformation" to "Pre setting enable/disable any unicode character transformation" on Feb 5, 2025
@pmahend1
Owner

pmahend1 commented Feb 5, 2025

Only the characters mentioned are part of the existing setting for Unicode character conversion. &#228; is not part of the current lookup. The parser automatically converts characters unless this lookup is updated.

Did you want to just add &#228;?
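The automatic conversion mentioned above can be reproduced with any conforming XML parser. As a rough illustration, here is Python's standard-library parser standing in for the C# one the extension uses (an assumption made for demonstration only): by the time the document is parsed, the reference has already been replaced by the character it names, and what gets written back depends on the output encoding.

```python
import xml.etree.ElementTree as ET

source = b'<?xml version="1.0" encoding="utf-8"?>\n<Address>W&#228;hring 21</Address>'
root = ET.fromstring(source)

# The parser has already replaced the reference with the character it names.
decoded_text = root.text

# Serializing to a str writes the literal character ...
as_unicode = ET.tostring(root, encoding="unicode")

# ... while an ASCII-only serialization has to fall back to a numeric reference.
as_ascii = ET.tostring(root, encoding="us-ascii")

print(repr(decoded_text))
print(as_unicode)
print(as_ascii)
```

So the original spelling of a reference is not kept in the parsed tree; a formatter has to put it back afterwards if it wants to preserve it.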

@ac-medexter
Author

Thanks for the response. Your conviction that Pretty XML does not cause HTML symbols to be wrongly formatted made me try everything again. After just formatting the file with the Visual Studio Code default formatter, with no extensions enabled, I found that it seems to mess up my HTML symbols too.

I still don't know why this happens. I couldn't find anything online, or any setting in VS Code to regulate HTML symbol formatting, but this means the issue would not be connected to Pretty XML? Sorry for having raised this issue here.

@pmahend1
Owner

Apparently the XML org rules define how these are parsed. This extension uses a C# parser, which converts them accordingly. So I added reconversion code via a lookup for those 3, since I knew they are used in XAML and other formatters allow them. There may be other extensions that format through regex and may allow this. We can also extend the Pretty XML lookup with a few additional Unicode characters, or let the user define an array of lookups.
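For reference, a lookup-based reconversion with a user-defined table might look roughly like the following. This is a hedged Python sketch rather than the extension's actual C# code: RECONVERT and reconvert are hypothetical names, and the single table entry is just the character from this issue, not the existing three.

```python
import xml.etree.ElementTree as ET

# Hypothetical user-defined lookup: decoded character -> reference to write back.
RECONVERT = {"\u00e4": "&#228;"}


def reconvert(serialized: str, lookup: dict[str, str]) -> str:
    """Replace each decoded character with the reference configured for it."""
    for char, reference in lookup.items():
        serialized = serialized.replace(char, reference)
    return serialized


root = ET.fromstring("<Address>W&#228;hring 21</Address>")
decoded = ET.tostring(root, encoding="unicode")  # the parser has decoded the reference
restored = reconvert(decoded, RECONVERT)

print(decoded)
print(restored)
```

A lookup like this rewrites every occurrence of a listed character, including ones that were typed literally in the source, so it cannot distinguish between the two spellings.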

@pmahend1
Owner

@ac-medexter Let me know if you have more questions. If it still needs addressing let me know.


2 participants