Convert regular expression examples to Go

sualeh · May 7, 2024 · 5aaca82
1 parent ec90e53
commit 5aaca82
Showing 1 changed file with 259 additions and 0 deletions.
diff --git a/Notebooks/5_go_unicode_pattern_matching.ipynb b/Notebooks/5_go_unicode_pattern_matching.ipynb
@@ -0,0 +1,259 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/sualeh/What-a-Character/blob/go/Notebooks/5_go_unicode_pattern_matching.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "----------\n",
+        "\n",
+        "## Google Colab\n",
+        "\n",
+        "You can run this notebook in Google Colab. Run the cell below only once, then change the runtime to `Go (gonb)` and refresh the browser before running any subsequent code. If you are not running the notebook in Google Colab, skip this section."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "vscode": {
+          "languageId": "polyglot-notebook"
+        }
+      },
+      "outputs": [],
+      "source": [
+        "#@title Prepare Google Colab for Go Kernel\n",
+        "\n",
+        "# Install Go.\n",
+        "!echo -n \"Installing go ...\"\n",
+        "!mkdir -p cache\n",
+        "!wget -q -O cache/go.tar.gz 'https://go.dev/dl/go1.22.2.linux-amd64.tar.gz'\n",
+        "!tar xzf cache/go.tar.gz\n",
+        "%env GOROOT=/content/go\n",
+        "!ln -sf \"/content/go/bin/go\" /usr/bin/go\n",
+        "!echo \" done.\"\n",
+        "!go version\n",
+        "\n",
+        "# Install gonb, goimports, gopls.\n",
+        "!echo -n \"Installing gonb ...\"\n",
+        "!go install github.com/janpfeifer/gonb@latest >& /tmp/output || cat /tmp/output\n",
+        "!echo \" done.\"\n",
+        "!ln -sf /root/go/bin/gonb /usr/bin/gonb\n",
+        "\n",
+        "!echo -n \"Installing goimports ...\"\n",
+        "!go install golang.org/x/tools/cmd/goimports@latest >& /tmp/output || cat /tmp/output\n",
+        "!echo \" done.\"\n",
+        "!ln -sf /root/go/bin/goimports /usr/bin/goimports\n",
+        "\n",
+        "!echo -n \"Installing gopls ...\"\n",
+        "!go install golang.org/x/tools/gopls@latest >& /tmp/output || cat /tmp/output\n",
+        "!echo \" done.\"\n",
+        "!ln -sf /root/go/bin/gopls /usr/bin/gopls\n",
+        "\n",
+        "# Install gonb kernel configuration.\n",
+        "!gonb --install --logtostderr\n",
+        "!echo \"Done!\""
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "----------"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Auhkou2CdVVl"
+      },
+      "source": [
+        "# Unicode Pattern Matching"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "ioVqHQLkdVVm"
+      },
+      "source": [
+        "## Case Insensitive Matching"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "qoWvPXGJdVVo"
+      },
+      "source": [
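+        "In Greek, the word for dog in lowercase is \"σκύλος\". Notice that the first and last letters are both sigma, written as \"σ\" at the start and in its word-final form \"ς\" at the end. Both forms correspond to the same uppercase letter, \"Σ\"."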
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "Rvfkcl8JdVVo"
+      },
+      "outputs": [],
+      "source": [
+        "%%\n",
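+        "// In Go, (?i) alone gives Unicode-aware case folding; the U flag means ungreedy, not Unicode.\n",
+        "patternGreek := regexp.MustCompile(\"(?i)σκύλος\")\n",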
+        "matches := patternGreek.MatchString(\"ΣΚΎΛΟΣ\")\n",
+        "\n",
+        "fmt.Println(matches)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "cjQ2WOGNdVVp"
+      },
+      "source": [
+        "When a lowercase character maps to more than one uppercase character, as \"ß\" does to \"SS\", case-insensitive matching finds no match."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "xNnL9Xf7dVVp"
+      },
+      "outputs": [],
+      "source": [
+        "%%\n",
+        "patternGerman := regexp.MustCompile(\"(?i)straße\")\n",
+        "matches := patternGerman.MatchString(\"STRASSE\")\n",
+        "\n",
+        "fmt.Println(matches)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "detk1R7DdVVq"
+      },
+      "source": [
+        "## Matching Numbers"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "Er1YlQy3dVVq"
+      },
+      "outputs": [],
+      "source": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "2xJQaHyxdVVr"
+      },
+      "source": [
+        "A naive match with a range of ASCII digits, `[0-9]`, does not match the Devanagari digits used in Hindi."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "CVdo8t6ndVVr"
+      },
+      "outputs": [],
+      "source": [
+        "%%\n",
+        "hindiNumber := \"१२३४५६७८९०\"\n",
+        "digitRegex := regexp.MustCompile(\"[0-9]+\")\n",
+        "matches := digitRegex.MatchString(hindiNumber)\n",
+        "\n",
+        "fmt.Println(matches)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "kdtZo30XdVVr"
+      },
+      "source": [
+        "In Go, the `\\d` pattern is ASCII-only and equivalent to `[0-9]`, so it does not match these digits either."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "DsNUXh3jdVVr"
+      },
+      "outputs": [],
+      "source": [
+        "%%\n",
+        "hindiNumber := \"१२३४५६७८९०\"\n",
+        "digitRegex := regexp.MustCompile(\"\\\\d+\")\n",
+        "matches := digitRegex.MatchString(hindiNumber)\n",
+        "\n",
+        "fmt.Println(matches)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "ang1w7yGdVVs"
+      },
+      "source": [
+        "A reliable way to match digits in any script is to match against the Unicode Decimal Number category (Nd), using the Unicode category pattern `\\p{Nd}`."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "Shx_V-xsdVVs"
+      },
+      "outputs": [],
+      "source": [
+        "%%\n",
+        "hindiNumber := \"१२३४५६७८९०\"\n",
+        "digitRegex := regexp.MustCompile(\"\\\\p{Nd}+\")\n",
+        "matches := digitRegex.MatchString(hindiNumber)\n",
+        "\n",
+        "fmt.Println(matches)"
+      ]
+    }
+  ],
+  "metadata": {
+    "colab": {
+      "include_colab_link": true,
+      "provenance": []
+    },
+    "kernelspec": {
+      "display_name": "base",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.11.5"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}
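
The notebook cells rely on gonb's `%%` shortcut and automatic imports, so they do not compile outside that kernel. For checking the same behaviour with a plain `go run`, the following is a minimal standalone sketch and is not part of this commit. The helper names `matchesFold` and `containsDigits` are hypothetical and chosen only for illustration. The sketch assumes the accented letters are the precomposed code points (for example U+03CD for "ύ").

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesFold (hypothetical helper) reports whether text contains literal,
// compared case-insensitively. Go's (?i) flag applies Unicode simple case
// folding, which maps one code point to one code point.
func matchesFold(literal, text string) bool {
	// QuoteMeta keeps the literal from being read as regex syntax.
	re := regexp.MustCompile("(?i)" + regexp.QuoteMeta(literal))
	return re.MatchString(text)
}

// containsDigits (hypothetical helper) reports whether text contains
// a match for the given digit pattern.
func containsDigits(digitPattern, text string) bool {
	return regexp.MustCompile(digitPattern).MatchString(text)
}

func main() {
	// Both sigma forms fold to the same uppercase letter, so this matches.
	fmt.Println(matchesFold("σκύλος", "ΣΚΎΛΟΣ")) // true

	// "ß" uppercases to two characters, which simple folding cannot express.
	fmt.Println(matchesFold("straße", "STRASSE")) // false

	hindiNumber := "१२३४५६७८९०"
	fmt.Println(containsDigits(`[0-9]+`, hindiNumber))  // false: ASCII digits only
	fmt.Println(containsDigits(`\d+`, hindiNumber))     // false: \d is [0-9] in Go
	fmt.Println(containsDigits(`\p{Nd}+`, hindiNumber)) // true: any decimal digit
}
```

Raw string literals (backquotes) are used for the patterns so that `\d` and `\p{Nd}` need no double escaping, unlike the JSON-encoded cells above.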