Created using Colab
sualeh committed May 8, 2024
1 parent a3c5c65 commit 01beb77
Showing 1 changed file with 217 additions and 175 deletions.
392 changes: 217 additions & 175 deletions Notebooks/4_java_encoding.ipynb
@@ -1,177 +1,219 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----------\n",
"\n",
"> **How to Run This Notebook**\n",
"\n",
"You can run this notebook in Google Colab. The cell below should be run only once, and then followed by a change of runtime to \"Java (java)\". Refresh the browser before running any subsequent code. You can also run this notebook locally if you have the IJava kernel for Jupyter installed."
]
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/sualeh/What-a-Character/blob/java/Notebooks/4_java_encoding.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XxxqMxcaiYUE"
},
"source": [
"----------\n",
"\n",
"> **How to Run This Notebook**\n",
"\n",
"You can run this notebook in Google Colab. The cell below should be run only once, and then followed by a change of runtime to \"Java (java)\". Refresh the browser before running any subsequent code. You can also run this notebook locally if you have the IJava kernel for Jupyter installed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tPIzNJECiYUF"
},
"outputs": [],
"source": [
"#@title Prepare Google Colab for IJava Kernel\n",
"\n",
"%%sh\n",
"# Install java kernel\n",
"wget -q https://github.com/SpencerPark/IJava/releases/download/v1.3.0/ijava-1.3.0.zip\n",
"unzip -q ijava-1.3.0.zip\n",
"python install.py\n",
"\n",
"# Install proxy for the java kernel\n",
"wget -qO- https://gist.github.com/SpencerPark/e2732061ad19c1afa4a33a58cb8f18a9/archive/b6cff2bf09b6832344e576ea1e4731f0fb3df10c.tar.gz | tar xvz --strip-components=1\n",
"python install_ipc_proxy_kernel.py --kernel=java --implementation=ipc_proxy_kernel.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YYVz1VZBiYUH"
},
"source": [
"----------"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PI4_voFmiYUH"
},
"source": [
"# Encoding"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OUXHdQ0MiYUH"
},
"source": [
"## Converting to Bytes\n",
"\n",
"Always specify the encoding explicitly to avoid cross-platform surprises; the platform default charset can differ between systems."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4uT_O9zMiYUH"
},
"outputs": [],
"source": [
"String original = \"Aß東𐐀\";\n",
"\n",
"byte[] utf8Bytes = original.getBytes(\"UTF-8\");\n",
"String roundTrip = new String(utf8Bytes, \"UTF-8\");\n",
"\n",
"System.out.println(roundTrip);"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wU3pGbGciYUI"
},
"source": [
"> **Bad decoding**\n",
"\n",
"If an incorrect encoding is specified, no exception may be thrown even though the data gets corrupted."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Xz9s8mjziYUI"
},
"outputs": [],
"source": [
"original = \"Aß東𐐀\";\n",
"\n",
"utf8Bytes = original.getBytes(\"UTF-8\");\n",
"roundTrip = new String(utf8Bytes, \"UTF-16\");\n",
"\n",
"// NOTE: No encoding errors are reported!\n",
"System.out.println(roundTrip);"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zw6V8_2diYUI"
},
"source": [
"## Writing Files"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dxvh6UqoiYUI"
},
"outputs": [],
"source": [
"final String original = \"Aß東𐐀\";\n",
"\n",
"final OutputStream fos = new FileOutputStream(\"test.txt\");\n",
"final Writer wtr = new OutputStreamWriter(fos, \"UTF-8\");\n",
"wtr.write(original);\n",
"wtr.close();"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BG8qjkGaiYUI"
},
"source": [
"## Reading Files"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "boCblFt7iYUI"
},
"outputs": [],
"source": [
"InputStream fis = new FileInputStream(\"test.txt\");\n",
"Reader rdr = new InputStreamReader(fis, \"UTF-8\");\n",
"BufferedReader brdr = new BufferedReader(rdr);\n",
"String text = brdr.readLine();\n",
"brdr.close();\n",
"\n",
"System.out.println(text);"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xN-u1x9uiYUJ"
},
"source": [
"If you specify an incorrect encoding when reading a file, you can get gibberish."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "gCr646zviYUJ"
},
"outputs": [],
"source": [
"fis = new FileInputStream(\"test.txt\");\n",
"rdr = new InputStreamReader(fis, \"UTF-16\");\n",
"brdr = new BufferedReader(rdr);\n",
"text = brdr.readLine();\n",
"brdr.close();\n",
"\n",
"System.out.println(text);"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Java",
"language": "java",
"name": "java"
},
"language_info": {
"codemirror_mode": "java",
"file_extension": ".jshell",
"mimetype": "text/x-java-source",
"name": "java",
"pygments_lexer": "java",
"version": "17.0.6+9-LTS-190"
},
"colab": {
"provenance": [],
"include_colab_link": true
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#@title Prepare Google Colab for IJava Kernel\n",
"\n",
"%%sh\n",
"# Install java kernel\n",
"wget -q https://github.com/SpencerPark/IJava/releases/download/v1.3.0/ijava-1.3.0.zip\n",
"unzip -q ijava-1.3.0.zip\n",
"python install.py\n",
"\n",
"# Install proxy for the java kernel\n",
"wget -qO- https://gist.github.com/SpencerPark/e2732061ad19c1afa4a33a58cb8f18a9/archive/b6cff2bf09b6832344e576ea1e4731f0fb3df10c.tar.gz | tar xvz --strip-components=1\n",
"python install_ipc_proxy_kernel.py --kernel=java --implementation=ipc_proxy_kernel.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Encoding"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Converting to Bytes\n",
"\n",
"Always specify the encoding explicitly to avoid cross-platform surprises; the platform default charset can differ between systems."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"String original = \"Aß東𐐀\";\n",
"\n",
"byte[] utf8Bytes = original.getBytes(\"UTF-8\");\n",
"String roundTrip = new String(utf8Bytes, \"UTF-8\");\n",
"\n",
"System.out.println(roundTrip);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> **Bad decoding**\n",
"\n",
"If an incorrect encoding is specified, no exception may be thrown even though the data gets corrupted."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"original = \"Aß東𐐀\";\n",
"\n",
"utf8Bytes = original.getBytes(\"UTF-8\");\n",
"roundTrip = new String(utf8Bytes, \"UTF-16\");\n",
"\n",
"// NOTE: No encoding errors are reported!\n",
"System.out.println(roundTrip);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Writing Files"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"final String original = \"Aß東𐐀\";\n",
"\n",
"final OutputStream fos = new FileOutputStream(\"test.txt\");\n",
"final Writer wtr = new OutputStreamWriter(fos, \"UTF-8\");\n",
"wtr.write(original);\n",
"wtr.close();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading Files"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"InputStream fis = new FileInputStream(\"test.txt\");\n",
"Reader rdr = new InputStreamReader(fis, \"UTF-8\");\n",
"BufferedReader brdr = new BufferedReader(rdr);\n",
"String text = brdr.readLine();\n",
"brdr.close();\n",
"\n",
"System.out.println(text);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you specify an incorrect encoding when reading a file, you can get gibberish."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fis = new FileInputStream(\"test.txt\");\n",
"rdr = new InputStreamReader(fis, \"UTF-16\");\n",
"brdr = new BufferedReader(rdr);\n",
"text = brdr.readLine();\n",
"brdr.close();\n",
"\n",
"System.out.println(text);"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Java",
"language": "java",
"name": "java"
},
"language_info": {
"codemirror_mode": "java",
"file_extension": ".jshell",
"mimetype": "text/x-java-source",
"name": "java",
"pygments_lexer": "java",
"version": "17.0.6+9-LTS-190"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
"nbformat": 4,
"nbformat_minor": 0
}
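
A note on the "Bad decoding" cell in this notebook: `new String(bytes, charset)` does not throw on malformed input (the `Charset` overload is documented to substitute the charset's replacement string instead). The following is a minimal sketch of one way to surface such errors, assuming a strict `CharsetDecoder` suits the use case. The class name `StrictDecoding` and the helper `decodeStrict` are hypothetical names introduced for illustration; they are not part of the notebook.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the notebook.
class StrictDecoding {

    // Decodes bytes with the given charset and throws a
    // CharacterCodingException instead of silently substituting a
    // replacement character when the bytes are invalid in that charset.
    static String decodeStrict(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .decode(ByteBuffer.wrap(bytes))
            .toString();
    }

    public static void main(String[] args) throws Exception {
        // Same four characters as the notebook cells, written as escapes
        // so this source file stays pure ASCII.
        String original = "A\u00DF\u6771\uD801\uDC00";

        // Matching encoder and decoder: the round trip succeeds.
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        String roundTrip = decodeStrict(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println("Round trip intact: " + roundTrip.equals(original));

        // Mismatched charsets: UTF-16 bytes are not valid UTF-8, so the
        // strict decoder reports the problem instead of hiding it.
        byte[] utf16Bytes = original.getBytes(StandardCharsets.UTF_16);
        try {
            decodeStrict(utf16Bytes, StandardCharsets.UTF_8);
            System.out.println("Decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("Strict decoding failed: " + e);
        }
    }
}
```

Strict decoding only catches byte sequences that are invalid in the target charset. In the opposite direction, UTF-8 bytes read as UTF-16 can still decode without error, because almost any pair of bytes forms a valid UTF-16 code unit, so the check is a safeguard rather than a guarantee.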
