Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 1 changed file with 217 additions and 175 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,177 +1,219 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"----------\n", | ||
"\n", | ||
"> **How to Run This Notebook**\n", | ||
"\n", | ||
"You can run this notebook in Google Colab. The cell below should be run only once, and then followed by a change of runtime to \"Java (java)\". Refresh the browser before running any subsequent code. You can also run this notebook locally if you have the IJava kernel for Jupyter installed." | ||
] | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "view-in-github", | ||
"colab_type": "text" | ||
}, | ||
"source": [ | ||
"<a href=\"https://colab.research.google.com/github/sualeh/What-a-Character/blob/java/Notebooks/4_java_encoding.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "XxxqMxcaiYUE" | ||
}, | ||
"source": [ | ||
"----------\n", | ||
"\n", | ||
"> **How to Run This Notebook**\n", | ||
"\n", | ||
"You can run this notebook in Google Colab. The cell below should be run only once, and then followed by a change of runtime to \"Java (java)\". Refresh the browser before running any subsequent code. You can also run this notebook locally if you have the IJava kernel for Jupyter installed." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "tPIzNJECiYUF" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"#@title Prepare Google Colab for IJava Kernel\n", | ||
"\n", | ||
"%%sh\n", | ||
"# Install java kernel\n", | ||
"wget -q https://github.com/SpencerPark/IJava/releases/download/v1.3.0/ijava-1.3.0.zip\n", | ||
"unzip -q ijava-1.3.0.zip\n", | ||
"python install.py\n", | ||
"\n", | ||
"# Install proxy for the java kernel\n", | ||
"wget -qO- https://gist.github.com/SpencerPark/e2732061ad19c1afa4a33a58cb8f18a9/archive/b6cff2bf09b6832344e576ea1e4731f0fb3df10c.tar.gz | tar xvz --strip-components=1\n", | ||
"python install_ipc_proxy_kernel.py --kernel=java --implementation=ipc_proxy_kernel.py" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "YYVz1VZBiYUH" | ||
}, | ||
"source": [ | ||
"----------" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "PI4_voFmiYUH" | ||
}, | ||
"source": [ | ||
"# Encoding" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "OUXHdQ0MiYUH" | ||
}, | ||
"source": [ | ||
"## Converting to Bytes\n", | ||
"\n", | ||
"Always specify encoding to avoid cross-platform surprises." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "4uT_O9zMiYUH" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"String original = \"Aß東𐐀\";\n", | ||
"\n", | ||
"byte[] utf8Bytes = original.getBytes(\"UTF-8\");\n", | ||
"String roundTrip = new String(utf8Bytes, \"UTF-8\");\n", | ||
"\n", | ||
"System.out.println(roundTrip);" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "wU3pGbGciYUI" | ||
}, | ||
"source": [ | ||
"> **Bad decoding**\n", | ||
"\n", | ||
"If an incorrect encoding is specified, no exception may be thrown even if the data gets corrupted." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "Xz9s8mjziYUI" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"original = \"Aß東𐐀\";\n", | ||
"\n", | ||
"utf8Bytes = original.getBytes(\"UTF-8\");\n", | ||
"roundTrip = new String(utf8Bytes, \"UTF-16\");\n", | ||
"\n", | ||
"// NOTE: No encoding errors are reported!\n", | ||
"System.out.println(roundTrip);" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "zw6V8_2diYUI" | ||
}, | ||
"source": [ | ||
"## Writing Files" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "dxvh6UqoiYUI" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"final String original = \"Aß東𐐀\";\n", | ||
"\n", | ||
"final OutputStream fos = new FileOutputStream(\"test.txt\");\n", | ||
"final Writer wtr = new OutputStreamWriter(fos, \"UTF-8\");\n", | ||
"wtr.write(original);\n", | ||
"wtr.close();" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "BG8qjkGaiYUI" | ||
}, | ||
"source": [ | ||
"## Reading Files" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "boCblFt7iYUI" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"InputStream fis = new FileInputStream(\"test.txt\");\n", | ||
"Reader rdr = new InputStreamReader(fis, \"UTF-8\");\n", | ||
"BufferedReader brdr = new BufferedReader(rdr);\n", | ||
"String text = brdr.readLine();\n", | ||
"brdr.close();\n", | ||
"\n", | ||
"System.out.println(text);" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "xN-u1x9uiYUJ" | ||
}, | ||
"source": [ | ||
"If you specify an incorrect encoding when reading a file, you can get gibberish." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "gCr646zviYUJ" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"fis = new FileInputStream(\"test.txt\");\n", | ||
"rdr = new InputStreamReader(fis, \"UTF-16\");\n", | ||
"brdr = new BufferedReader(rdr);\n", | ||
"text = brdr.readLine();\n", | ||
"brdr.close();\n", | ||
"\n", | ||
"System.out.println(text);" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Java", | ||
"language": "java", | ||
"name": "java" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": "java", | ||
"file_extension": ".jshell", | ||
"mimetype": "text/x-java-source", | ||
"name": "java", | ||
"pygments_lexer": "java", | ||
"version": "17.0.6+9-LTS-190" | ||
}, | ||
"colab": { | ||
"provenance": [], | ||
"include_colab_link": true | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"#@title Prepare Google Colab for IJava Kernel\n", | ||
"\n", | ||
"%%sh\n", | ||
"# Install java kernel\n", | ||
"wget -q https://github.com/SpencerPark/IJava/releases/download/v1.3.0/ijava-1.3.0.zip\n", | ||
"unzip -q ijava-1.3.0.zip\n", | ||
"python install.py\n", | ||
"\n", | ||
"# Install proxy for the java kernel\n", | ||
"wget -qO- https://gist.github.com/SpencerPark/e2732061ad19c1afa4a33a58cb8f18a9/archive/b6cff2bf09b6832344e576ea1e4731f0fb3df10c.tar.gz | tar xvz --strip-components=1\n", | ||
"python install_ipc_proxy_kernel.py --kernel=java --implementation=ipc_proxy_kernel.py" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"----------" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Encoding" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Converting to Bytes\n", | ||
"\n", | ||
"Always specify encoding to avoid cross-platform surprises." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"String original = \"Aß東𐐀\";\n", | ||
"\n", | ||
"byte[] utf8Bytes = original.getBytes(\"UTF-8\");\n", | ||
"String roundTrip = new String(utf8Bytes, \"UTF-8\");\n", | ||
"\n", | ||
"System.out.println(roundTrip);" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"> **Bad decoding**\n", | ||
"\n", | ||
"If an incorrect encoding is specified, no exception may be thrown even if the data gets corrupted." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"original = \"Aß東𐐀\";\n", | ||
"\n", | ||
"utf8Bytes = original.getBytes(\"UTF-8\");\n", | ||
"roundTrip = new String(utf8Bytes, \"UTF-16\");\n", | ||
"\n", | ||
"// NOTE: No encoding errors are reported!\n", | ||
"System.out.println(roundTrip);" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Writing Files" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"final String original = \"Aß東𐐀\";\n", | ||
"\n", | ||
"final OutputStream fos = new FileOutputStream(\"test.txt\");\n", | ||
"final Writer wtr = new OutputStreamWriter(fos, \"UTF-8\");\n", | ||
"wtr.write(original);\n", | ||
"wtr.close();" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Reading Files" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"InputStream fis = new FileInputStream(\"test.txt\");\n", | ||
"Reader rdr = new InputStreamReader(fis, \"UTF-8\");\n", | ||
"BufferedReader brdr = new BufferedReader(rdr);\n", | ||
"String text = brdr.readLine();\n", | ||
"brdr.close();\n", | ||
"\n", | ||
"System.out.println(text);" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"If you specify an incorrect encoding when reading a file, you can get gibberish." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"fis = new FileInputStream(\"test.txt\");\n", | ||
"rdr = new InputStreamReader(fis, \"UTF-16\");\n", | ||
"brdr = new BufferedReader(rdr);\n", | ||
"text = brdr.readLine();\n", | ||
"brdr.close();\n", | ||
"\n", | ||
"System.out.println(text);" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Java", | ||
"language": "java", | ||
"name": "java" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": "java", | ||
"file_extension": ".jshell", | ||
"mimetype": "text/x-java-source", | ||
"name": "java", | ||
"pygments_lexer": "java", | ||
"version": "17.0.6+9-LTS-190" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |
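
The notebook's "Bad decoding" cell notes that decoding with the wrong charset may not raise any exception. As a minimal sketch of one way to surface such errors, a `CharsetDecoder` configured with `CodingErrorAction.REPORT` throws on malformed input instead of silently substituting replacement characters. The class name `StrictDecoding` and the helper `isValidUtf8` are hypothetical and are not part of the notebook.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecoding {

    // Hypothetical helper: returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of inserting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Same text as the notebook ("Aß東𐐀"), written with Unicode escapes
        // so the result does not depend on the source file encoding.
        String original = "A\u00DF\u6771\uD801\uDC00";

        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        byte[] utf16Bytes = original.getBytes(StandardCharsets.UTF_16);

        System.out.println(isValidUtf8(utf8Bytes));   // true
        System.out.println(isValidUtf8(utf16Bytes));  // false: the UTF-16 BOM is not valid UTF-8

        // The lenient String constructor does not throw; it substitutes U+FFFD.
        String lenient = new String(utf16Bytes, StandardCharsets.UTF_8);
        System.out.println(lenient.contains("\uFFFD")); // true
    }
}
```

A strict decoder only catches byte sequences that are invalid in the target charset. The notebook's own case (UTF-8 bytes read as UTF-16) may still pass without error, because those particular bytes happen to form valid UTF-16 code units, so strict decoding complements rather than replaces knowing the correct encoding.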