Convert regular expression examples to Go
Showing 1 changed file with 259 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,259 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"colab_type": "text", | ||
"id": "view-in-github" | ||
}, | ||
"source": [ | ||
"<a href=\"https://colab.research.google.com/github/sualeh/What-a-Character/blob/go/Notebooks/5_go_unicode_pattern_matching.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"----------\n", | ||
"\n", | ||
"## Google Colab\n", | ||
"\n", | ||
"You can run this notebook in Google Colab. Run the cell below only once, then change the runtime to `Go (gonb)` and refresh the browser before running any subsequent code. If you are not running the notebook in Google Colab, skip this section." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"vscode": { | ||
"languageId": "polyglot-notebook" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"#@title Prepare Google Colab for Go Kernel\n", | ||
"\n", | ||
"# Install Go.\n", | ||
"!echo -n \"Installing go ...\"\n", | ||
"!mkdir -p cache\n", | ||
"!wget -q -O cache/go.tar.gz 'https://go.dev/dl/go1.22.2.linux-amd64.tar.gz'\n", | ||
"!tar xzf cache/go.tar.gz\n", | ||
"%env GOROOT=/content/go\n", | ||
"!ln -sf \"/content/go/bin/go\" /usr/bin/go\n", | ||
"!echo \" done.\"\n", | ||
"!go version\n", | ||
"\n", | ||
"# Install gonb, goimports, gopls.\n", | ||
"!echo -n \"Installing gonb ...\"\n", | ||
"!go install github.com/janpfeifer/gonb@latest >& /tmp/output || cat /tmp/output\n", | ||
"!echo \" done.\"\n", | ||
"!ln -sf /root/go/bin/gonb /usr/bin/gonb\n", | ||
"\n", | ||
"!echo -n \"Installing goimports ...\"\n", | ||
"!go install golang.org/x/tools/cmd/goimports@latest >& /tmp/output || cat /tmp/output\n", | ||
"!echo \" done.\"\n", | ||
"!ln -sf /root/go/bin/goimports /usr/bin/goimports\n", | ||
"\n", | ||
"!echo -n \"Installing gopls ...\"\n", | ||
"!go install golang.org/x/tools/gopls@latest >& /tmp/output || cat /tmp/output\n", | ||
"!echo \" done.\"\n", | ||
"!ln -sf /root/go/bin/gopls /usr/bin/gopls\n", | ||
"\n", | ||
"# Install gonb kernel configuration.\n", | ||
"!gonb --install --logtostderr\n", | ||
"!echo \"Done!\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"----------" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "Auhkou2CdVVl" | ||
}, | ||
"source": [ | ||
"# Unicode Pattern Matching" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "ioVqHQLkdVVm" | ||
}, | ||
"source": [ | ||
"## Case Insensitive Matching" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "qoWvPXGJdVVo" | ||
}, | ||
"source": [ | ||
"In Greek, the word for dog in lowercase is \"σκύλος\". Notice that the first and last letters are both sigma: σ at the start and the word-final form ς at the end." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "Rvfkcl8JdVVo" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"%%\n", | ||
"patternGreek := regexp.MustCompile(\"(?i)σκύλος\")\n", | ||
"matches := patternGreek.MatchString(\"ΣΚΎΛΟΣ\")\n", | ||
"\n", | ||
"fmt.Println(matches)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "cjQ2WOGNdVVp" | ||
}, | ||
"source": [ | ||
"When uppercasing a lowercase character yields more than one character, as ß does with SS, there is no match, because Go folds case one character at a time." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "xNnL9Xf7dVVp" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"%%\n", | ||
"patternGerman := regexp.MustCompile(\"(?i)straße\")\n", | ||
"matches := patternGerman.MatchString(\"STRASSE\")\n", | ||
"\n", | ||
"fmt.Println(matches)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "detk1R7DdVVq" | ||
}, | ||
"source": [ | ||
"## Matching Numbers" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "Er1YlQy3dVVq" | ||
}, | ||
"outputs": [], | ||
"source": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "2xJQaHyxdVVr" | ||
}, | ||
"source": [ | ||
"A naive match with a range of ASCII digits `[0-9]` does not match the Devanagari digits used in Hindi." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "CVdo8t6ndVVr" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"%%\n", | ||
"hindiNumber := \"१२३४५६७८९०\"\n", | ||
"digitRegex := regexp.MustCompile(\"[0-9]+\")\n", | ||
"matches := digitRegex.MatchString(hindiNumber)\n", | ||
"\n", | ||
"fmt.Println(matches)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "kdtZo30XdVVr" | ||
}, | ||
"source": [ | ||
"In Go's RE2 syntax, `\\d` is ASCII-only and equivalent to `[0-9]`, so this pattern does not match the Devanagari digits either." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "DsNUXh3jdVVr" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"%%\n", | ||
"hindiNumber := \"१२३४५६७८९०\"\n", | ||
"digitRegex := regexp.MustCompile(\"\\\\d+\")\n", | ||
"matches := digitRegex.MatchString(hindiNumber)\n", | ||
"\n", | ||
"fmt.Println(matches)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "ang1w7yGdVVs" | ||
}, | ||
"source": [ | ||
"The best way to match digits is by matching against the Unicode Decimal Number Category (Nd), using a Unicode Category pattern `\\p{Nd}`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "Shx_V-xsdVVs" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"%%\n", | ||
"hindiNumber := \"१२३४५६७८९०\"\n", | ||
"digitRegex := regexp.MustCompile(\"\\\\p{Nd}+\")\n", | ||
"matches := digitRegex.MatchString(hindiNumber)\n", | ||
"\n", | ||
"fmt.Println(matches)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"colab": { | ||
"include_colab_link": true, | ||
"provenance": [] | ||
}, | ||
"kernelspec": { | ||
"display_name": "base", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.5" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |
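
For readers who want to try the notebook's patterns outside gonb, here is a minimal standalone sketch, assuming only Go's standard `regexp` package. The helper `matches` and the constant names are hypothetical and exist only for this illustration. The strings use `\u` escapes so the exact code points are unambiguous. In Go's syntax the `U` flag means ungreedy rather than Unicode, so the sketch uses plain `(?i)`, which already folds case across Unicode.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical constants for this sketch; the values mirror the notebook cells.
const (
	// Devanagari digits U+0967..U+096F followed by U+0966 (1 through 9, then 0).
	devanagariDigits = "\u0967\u0968\u0969\u096A\u096B\u096C\u096D\u096E\u096F\u0966"
	// The Greek word for dog in lowercase, ending in final sigma (U+03C2).
	greekLower = "\u03C3\u03BA\u03CD\u03BB\u03BF\u03C2"
	// The same word in uppercase; both sigmas become U+03A3.
	greekUpper = "\u03A3\u039A\u038E\u039B\u039F\u03A3"
	// The German word for street, containing sharp s (U+00DF).
	germanLower = "stra\u00DFe"
)

// matches reports whether pattern matches anywhere in s.
// It panics on an invalid pattern, which is acceptable for a demo.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	// Digit classes: only the Unicode category matches non-ASCII digits.
	fmt.Println(matches(`[0-9]+`, devanagariDigits))  // false: ASCII range only
	fmt.Println(matches(`\d+`, devanagariDigits))     // false: \d is [0-9] in Go
	fmt.Println(matches(`\p{Nd}+`, devanagariDigits)) // true: Decimal Number category

	// Case folding: one-to-one folds match, one-to-many expansions do not.
	fmt.Println(matches("(?i)"+greekLower, greekUpper)) // true: both sigma forms fold to U+03A3
	fmt.Println(matches("(?i)"+germanLower, "STRASSE")) // false: sharp s does not fold to "SS"
}
```

Keeping every check behind one small helper makes it easy to compare patterns side by side. Handling the ß case would need a different approach, such as normalizing both strings with full case folding before matching, which this sketch does not attempt.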