Created using Colab
sualeh committed May 7, 2024
1 parent 8c5d7ad commit 6668363
Showing 1 changed file with 181 additions and 156 deletions.
337 changes: 181 additions & 156 deletions Notebooks/4_go_encoding.ipynb
@@ -1,158 +1,183 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Encoding"
]
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "y5-0qPlZRNRv"
},
"source": [
"# Encoding"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EOz9sGI2RNRw"
},
"source": [
"## Converting to Bytes\n",
"\n",
"Always specify encoding to avoid cross-platform surprises."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yNU6pAmXRNRx"
},
"outputs": [],
"source": [
"%%\n",
"original := \"Aß東𐐀\"\n",
"\n",
"// Encode the string's runes as UTF-16 code units ([]uint16, not bytes).\n",
"utf16Units := utf16.Encode([]rune(original))\n",
"\n",
"// Decode the code units back to runes so surrogate pairs are recombined.\n",
"roundTrip := string(utf16.Decode(utf16Units))\n",
"\n",
"fmt.Println(roundTrip)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kElPJp6oRNRy"
},
"source": [
"> **Bad decoding**\n",
"\n",
"If an incorrect encoding is specified, Go may report no error even though the decoded data is corrupted."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6uMstf9XRNRy"
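},
"outputs": [],
"source": [
"%%\n",
"original := \"Aß東𐐀\"\n",
"\n",
"utf16Units := utf16.Encode([]rune(original))\n",
"\n",
"// Serialize each UTF-16 code unit as two little-endian bytes.\n",
"var raw []byte\n",
"for _, u := range utf16Units {\n",
"    raw = append(raw, byte(u), byte(u>>8))\n",
"}\n",
"\n",
"// NOTE: Read the bytes back as UTF-8.\n",
"roundTrip := string(raw)\n",
"\n",
"fmt.Println(roundTrip)"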
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2AtIUt7TRNRy"
},
"source": [
"## Writing Files"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "F5IkwmzRRNRy"
},
"outputs": [],
"source": [
"%%\n",
"str := \"Aß東𐐀\"\n",
"\n",
"// The string literal is UTF-8, so []byte(str) writes the file as UTF-8.\n",
"err := os.WriteFile(\"test.txt\", []byte(str), 0644)\n",
"if err != nil {\n",
"    panic(err)\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3bLKcVTgRNRy"
},
"source": [
"## Reading Files"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YfuallDiRNRy"
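},
"outputs": [],
"source": [
"%%\n",
"data, err := os.ReadFile(\"test.txt\")\n",
"if err != nil {\n",
"    panic(err)\n",
"}\n",
"\n",
"// The file holds UTF-8 bytes, so a direct conversion recovers the text.\n",
"fmt.Println(string(data))"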
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "P3xW3MjYRNRz"
},
"source": [
"If you specify an incorrect encoding when reading a file, you can get gibberish."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "d5y1y6cbRNRz"
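},
"outputs": [],
"source": [
"%%\n",
"file, err := os.Open(\"test.txt\")\n",
"if err != nil {\n",
"    panic(err)\n",
"}\n",
"defer file.Close()\n",
"\n",
"// unicode here is golang.org/x/text/encoding/unicode, not the standard library package.\n",
"// Decoding the UTF-8 file as UTF-16LE typically yields gibberish rather than an error.\n",
"reader := transform.NewReader(file, unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM).NewDecoder())\n",
"content, err := io.ReadAll(reader)\n",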
"if err != nil {\n",
" panic(err)\n",
"}\n",
"\n",
"fmt.Println(string(content))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
},
"colab": {
"provenance": []
}
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Converting to Bytes\n",
"\n",
"Always specify encoding to avoid cross-platform surprises."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%\n",
"original := \"Aß東𐐀\"\n",
"\n",
"// Encode the string's runes as UTF-16 code units ([]uint16, not bytes).\n",
"utf16Units := utf16.Encode([]rune(original))\n",
"\n",
"// Decode the code units back to runes so surrogate pairs are recombined.\n",
"roundTrip := string(utf16.Decode(utf16Units))\n",
"\n",
"fmt.Println(roundTrip)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> **Bad decoding**\n",
"\n",
"If an incorrect encoding is specified, Go may report no error even though the decoded data is corrupted."
]
},
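{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%\n",
"original := \"Aß東𐐀\"\n",
"\n",
"utf16Units := utf16.Encode([]rune(original))\n",
"\n",
"// Serialize each UTF-16 code unit as two little-endian bytes.\n",
"var raw []byte\n",
"for _, u := range utf16Units {\n",
"    raw = append(raw, byte(u), byte(u>>8))\n",
"}\n",
"\n",
"// NOTE: Read the bytes back as UTF-8.\n",
"roundTrip := string(raw)\n",
"\n",
"fmt.Println(roundTrip)"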
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Writing Files"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%\n",
"str := \"Aß東𐐀\"\n",
"\n",
"// The string literal is UTF-8, so []byte(str) writes the file as UTF-8.\n",
"err := os.WriteFile(\"test.txt\", []byte(str), 0644)\n",
"if err != nil {\n",
"    panic(err)\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading Files"
]
},
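{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%\n",
"data, err := os.ReadFile(\"test.txt\")\n",
"if err != nil {\n",
"    panic(err)\n",
"}\n",
"\n",
"// The file holds UTF-8 bytes, so a direct conversion recovers the text.\n",
"fmt.Println(string(data))"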
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you specify an incorrect encoding when reading a file, you can get gibberish."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%\n",
"file, err := os.Open(\"test.txt\")\n",
"if err != nil {\n",
"    panic(err)\n",
"}\n",
"defer file.Close()\n",
"\n",
"// unicode here is golang.org/x/text/encoding/unicode, not the standard library package.\n",
"// Decoding the UTF-8 file as UTF-16LE typically yields gibberish rather than an error.\n",
"reader := transform.NewReader(file, unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM).NewDecoder())\n",
"content, err := io.ReadAll(reader)\n",
"if err != nil {\n",
" panic(err)\n",
"}\n",
"\n",
"fmt.Println(string(content))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
"nbformat": 4,
"nbformat_minor": 0
}
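
The encoding behaviour the notebook describes can also be checked outside a notebook with a small standalone program. What follows is a minimal sketch and not part of this commit: the helper names `encodeUTF16LE` and `decodeUTF16LE` are hypothetical, little-endian byte order is an assumption, and only the standard library is used. It shows that decoding with the matching encoding restores the string, while reading the same bytes as UTF-8 silently yields different text.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeUTF16LE (hypothetical helper) converts a Go string, which is UTF-8,
// into UTF-16 little-endian bytes.
func encodeUTF16LE(s string) []byte {
	units := utf16.Encode([]rune(s))
	out := make([]byte, 0, 2*len(units))
	for _, u := range units {
		out = append(out, byte(u), byte(u>>8))
	}
	return out
}

// decodeUTF16LE (hypothetical helper) converts UTF-16 little-endian bytes
// back into a Go string. A trailing odd byte is ignored in this sketch.
func decodeUTF16LE(b []byte) string {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, uint16(b[i])|uint16(b[i+1])<<8)
	}
	return string(utf16.Decode(units))
}

func main() {
	original := "Aß東𐐀"
	raw := encodeUTF16LE(original)

	// Three BMP characters take one code unit each; 𐐀 needs a surrogate pair.
	fmt.Println(len(raw)) // 10

	// Decoding with the matching encoding restores the string.
	fmt.Println(decodeUTF16LE(raw) == original) // true

	// Treating the same bytes as UTF-8 raises no error but changes the text.
	fmt.Println(string(raw) == original) // false
	fmt.Printf("%q\n", string(raw))
}
```

Keeping the byte-order handling in two small functions makes the assumption explicit: a big-endian source would need the two bytes of each code unit swapped in both helpers.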
