diff --git a/README.md b/README.md
index 9651e15..ec9cd78 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@
 * [Tiếng Việt](translations/README-vn.md)
 * [فارسی](translations/README-fa.md)
 * [עברית](translations/README-he.md)
-
+* [Bahasa Indonesia](translations/README-id.md)
 
 ## What is Regular Expression?
 
diff --git a/translations/README-id.md b/translations/README-id.md
new file mode 100644
index 0000000..7fd6340
--- /dev/null
+++ b/translations/README-id.md
@@ -0,0 +1,601 @@
+
+
+
+
+
+
+
+
+
+
+
+
+"the" => The fat cat sat on the mat.
+
+[Test the regular expression](https://regex101.com/r/dmRygT/1)
+
+The regular expression `123` matches the string `123`. A regular expression is
+matched against an input string by comparing each character in the regular
+expression to each character in the input string, one after another. Regular
+expressions are normally case-sensitive, so the regular expression `The` would
+not match the string `the`.
+
+"The" => The fat cat sat on the mat.
+
+[Test the regular expression](https://regex101.com/r/1paXsy/1)
+
+## 2. Meta Characters
+
+Meta characters are the building blocks of regular expressions. Meta
+characters do not stand for themselves but instead are interpreted in some
+special way. Some meta characters have a special meaning and are written inside
+square brackets. The meta characters are as follows:
+
+|Meta character|Description|
+|:----:|----|
+|.|The full stop matches any single character except a line break.|
+|[ ]|Character class. Matches any character contained between the square brackets.|
+|[^ ]|Negated character class. Matches any character that is not contained between the square brackets.|
+|*|Matches 0 or more repetitions of the preceding symbol.|
+|+|Matches 1 or more repetitions of the preceding symbol.|
+|?|Makes the preceding symbol optional.|
+|{n,m}|Braces. Matches at least "n" but not more than "m" repetitions of the preceding symbol.|
+|(xyz)|Character group. Matches the characters xyz in that exact order.|
+|&#124;|Alternation. Matches either the characters before or the characters after the symbol.|
+|&#92;|Escapes the next character. This allows you to match reserved characters <code>[ ] ( ) { } . * + ? ^ $ \ &#124;</code>|
+|^|Matches the beginning of the input.|
+|$|Matches the end of the input.|
+
+## 2.1 The Full Stop
+
+The full stop `.` is the simplest example of a meta character. The meta character `.`
+matches any single character. It will not match return or newline characters.
+For example, the regular expression `.ar` means: any character, followed by the
+letter `a`, followed by the letter `r`.
+
+".ar" => The car parked in the garage.
+
+[Test the regular expression](https://regex101.com/r/xc9GkU/1)
+
+## 2.2 Character Sets
+
+Character sets are also called character classes. Square brackets are used to
+specify character sets. Use a hyphen inside a character set to specify a
+range of characters. The order of the characters inside the square brackets
+doesn't matter. For example, the regular expression `[Tt]he` means: an uppercase
+`T` or lowercase `t`, followed by the letter `h`, followed by the letter `e`.
+
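The character-set description above can be tried outside regex101 as well. The following is a minimal sketch that assumes Python's standard `re` module (the README itself does not prescribe a language); the `"Room 42"` string and the variable names are made up for illustration.

```python
import re

text = "The car parked in the garage."

# [Tt]he: an uppercase T or a lowercase t, followed by h, followed by e.
the_matches = re.findall(r"[Tt]he", text)

# A hyphen inside the brackets specifies a range; [0-9] is any single digit.
# "Room 42" is an illustrative string, not one taken from the README.
digit_matches = re.findall(r"[0-9]", "Room 42")

print(the_matches)    # ['The', 'the']
print(digit_matches)  # ['4', '2']
```
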
+"[Tt]he" => The car parked in the garage. ++ +[Test the regular expression](https://regex101.com/r/2ITLQ4/1) + +A period inside a character set, however, means a literal period. The regular +expression `ar[.]` means: a lowercase character `a`, followed by the letter `r`, +followed by a period `.` character. + +
+"ar[.]" => A garage is a good place to park a car. ++ +[Test the regular expression](https://regex101.com/r/wL3xtE/1) + +### 2.2.1 Negated Character Sets + +In general, the caret symbol represents the start of the string, but when it is +typed after the opening square bracket it negates the character set. For +example, the regular expression `[^c]ar` means: any character except `c`, +followed by the character `a`, followed by the letter `r`. + +
+"[^c]ar" => The car parked in the garage. ++ +[Test the regular expression](https://regex101.com/r/nNNlq3/1) + +## 2.3 Repetitions + +The meta characters `+`, `*` or `?` are used to specify how many times a +subpattern can occur. These meta characters act differently in different +situations. + +### 2.3.1 The Star + +The `*` symbol matches zero or more repetitions of the preceding matcher. The +regular expression `a*` means: zero or more repetitions of the preceding lowercase +character `a`. But if it appears after a character set or class then it finds +the repetitions of the whole character set. For example, the regular expression +`[a-z]*` means: any number of lowercase letters in a row. + +
+"[a-z]*" => The car parked in the garage #21. ++ +[Test the regular expression](https://regex101.com/r/7m8me5/1) + +The `*` symbol can be used with the meta character `.` to match any string of +characters `.*`. The `*` symbol can be used with the whitespace character `\s` +to match a string of whitespace characters. For example, the expression +`\s*cat\s*` means: zero or more spaces, followed by a lowercase `c`, +followed by a lowercase `a`, followed by a lowercase `t`, +followed by zero or more spaces. + +
+"\s*cat\s*" => The fat cat sat on the concatenation.
+
+[Test the regular expression](https://regex101.com/r/gGrwuz/1)
+
+### 2.3.2 The Plus
+
+The `+` symbol matches one or more repetitions of the preceding character. For
+example, the regular expression `c.+t` means: a lowercase `c`, followed by
+at least one character, followed by a lowercase `t`. Note that, because `+` is
+greedy, the `t` that ends the match is the last `t` in the sentence.
+
+"c.+t" => The fat cat sat on the mat. ++ +[Test the regular expression](https://regex101.com/r/Dzf9Aa/1) + +### 2.3.3 The Question Mark + +In regular expressions, the meta character `?` makes the preceding character +optional. This symbol matches zero or one instance of the preceding character. +For example, the regular expression `[T]?he` means: Optional uppercase +`T`, followed by a lowercase `h`, followed by a lowercase `e`. + +
+"[T]he" => The car is parked in the garage. ++ +[Test the regular expression](https://regex101.com/r/cIg9zm/1) + +
+"[T]?he" => The car is parked in the garage. ++ +[Test the regular expression](https://regex101.com/r/kPpO2x/1) + +## 2.4 Braces + +In regular expressions, braces (also called quantifiers) are used to +specify the number of times that a character or a group of characters can be +repeated. For example, the regular expression `[0-9]{2,3}` means: Match at least +2 digits, but not more than 3, ranging from 0 to 9. + +
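As a rough illustration of the `{2,3}` quantifier described above, here is a small sketch assuming Python's standard `re` module (an assumption; the README only links to regex101). The variable name `brace_matches` is hypothetical.

```python
import re

text = "The number was 9.9997 but we rounded it off to 10.0."

# [0-9]{2,3}: at least 2 but not more than 3 consecutive digits.
# The quantifier is greedy, so "9997" yields "999" and the leftover "7"
# is too short to match on its own.
brace_matches = re.findall(r"[0-9]{2,3}", text)

print(brace_matches)  # ['999', '10']
```
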
+"[0-9]{2,3}" => The number was 9.9997 but we rounded it off to 10.0. ++ +[Test the regular expression](https://regex101.com/r/juM86s/1) + +We can leave out the second number. For example, the regular expression +`[0-9]{2,}` means: Match 2 or more digits. If we also remove the comma, the +regular expression `[0-9]{3}` means: Match exactly 3 digits. + +
+"[0-9]{2,}" => The number was 9.9997 but we rounded it off to 10.0. ++ +[Test the regular expression](https://regex101.com/r/Gdy4w5/1) + +
+"[0-9]{3}" => The number was 9.9997 but we rounded it off to 10.0. ++ +[Test the regular expression](https://regex101.com/r/Sivu30/1) + +## 2.5 Capturing Groups + +A capturing group is a group of subpatterns that is written inside parentheses +`(...)`. As discussed before, in regular expressions, if we put a quantifier +after a character then it will repeat the preceding character. But if we put a quantifier +after a capturing group then it repeats the whole capturing group. For example, +the regular expression `(ab)*` matches zero or more repetitions of the character +"ab". We can also use the alternation `|` meta character inside a capturing group. +For example, the regular expression `(c|g|p)ar` means: a lowercase `c`, +`g` or `p`, followed by `a`, followed by `r`. + +
+"(c|g|p)ar" => The car is parked in the garage. ++ +[Test the regular expression](https://regex101.com/r/tUxrBG/1) + +Note that capturing groups do not only match, but also capture, the characters for use in +the parent language. The parent language could be Python or JavaScript or virtually any +language that implements regular expressions in a function definition. + +### 2.5.1 Non-Capturing Groups + +A non-capturing group is a capturing group that matches the characters but +does not capture the group. A non-capturing group is denoted by a `?` followed by a `:` +within parentheses `(...)`. For example, the regular expression `(?:c|g|p)ar` is similar to +`(c|g|p)ar` in that it matches the same characters but will not create a capture group. + +
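The difference between capturing and non-capturing groups becomes visible in a parent language. The sketch below assumes Python's standard `re` module, where `re.findall` returns the captured group when a pattern contains exactly one capturing group; the variable names are illustrative only.

```python
import re

text = "The car is parked in the garage."

# With one capturing group, re.findall returns what the group captured.
captured = re.findall(r"(c|g|p)ar", text)

# With a non-capturing group there is nothing to capture,
# so re.findall returns the whole match instead.
whole = re.findall(r"(?:c|g|p)ar", text)

print(captured)  # ['c', 'p', 'g']
print(whole)     # ['car', 'par', 'gar']
```

Other languages expose captures differently, so this behavior should be read as specific to Python's `re.findall`.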
+"(?:c|g|p)ar" => The car is parked in the garage. ++ +[Test the regular expression](https://regex101.com/r/Rm7Me8/1) + +Non-capturing groups can come in handy when used in find-and-replace functionality or +when mixed with capturing groups to keep the overview when producing any other kind of output. +See also [4. Lookaround](#4-lookaround). + +## 2.6 Alternation + +In a regular expression, the vertical bar `|` is used to define alternation. +Alternation is like an OR statement between multiple expressions. Now, you may be +thinking that character sets and alternation work the same way. But the big +difference between character sets and alternation is that character sets work at the +character level but alternation works at the expression level. For example, the +regular expression `(T|t)he|car` means: either (an uppercase `T` or a lowercase +`t`, followed by a lowercase `h`, followed by a lowercase `e`) OR +(a lowercase `c`, followed by a lowercase `a`, followed by +a lowercase `r`). Note that I included the parentheses for clarity, to show that either expression +in parentheses can be met and it will match. + +
+"(T|t)he|car" => The car is parked in the garage.
+
+[Test the regular expression](https://regex101.com/r/fBXyX0/1)
+
+## 2.7 Escaping Special Characters
+
+A backslash `\` is used in regular expressions to escape the next character. This
+allows us to include reserved characters such as `{ } [ ] / \ + * . $ ^ | ?` as
+matching characters. To use one of these special characters as a matching
+character, prepend it with `\`.
+
+For example, the regular expression `.` is used to match any character except a
+newline. To match a literal `.` in an input string, escape it as `\.`. The regular
+expression `(f|c|m)at\.?` means: a lowercase `f`, `c` or `m`, followed by a lowercase
+`a`, followed by a lowercase `t`, followed by an optional `.`
+character.
+
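To make the effect of the backslash concrete, the following sketch compares the escaped and unescaped period. It assumes Python's standard `re` module (not something the README specifies), and the variable names are hypothetical.

```python
import re

text = "The fat cat sat on the mat."

# \. matches a literal period; the trailing ? makes it optional.
escaped = [m.group(0) for m in re.finditer(r"(f|c|m)at\.?", text)]

# An unescaped . matches any character, including the space after "fat".
unescaped = [m.group(0) for m in re.finditer(r"(f|c|m)at.?", text)]

print(escaped)    # ['fat', 'cat', 'mat.']
print(unescaped)  # ['fat ', 'cat ', 'mat.']
```
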
+"(f|c|m)at\.?" => The fat cat sat on the mat. ++ +[Test the regular expression](https://regex101.com/r/DOc5Nu/1) + +## 2.8 Anchors + +In regular expressions, we use anchors to check if the matching symbol is the +starting symbol or ending symbol of the input string. Anchors are of two types: +The first type is the caret `^` that checks if the matching character is the first +character of the input and the second type is the dollar sign `$` which checks if a matching +character is the last character of the input string. + +### 2.8.1 The Caret + +The caret symbol `^` is used to check if a matching character is the first character +of the input string. If we apply the following regular expression `^a` (meaning 'a' must be +the starting character) to the string `abc`, it will match `a`. But if we apply +the regular expression `^b` to the above string, it will not match anything. +Because in the string `abc`, the "b" is not the starting character. Let's take a look +at another regular expression `^(T|t)he` which means: an uppercase `T` or +a lowercase `t` must be the first character in the string, followed by a +lowercase `h`, followed by a lowercase `e`. + +
+"(T|t)he" => The car is parked in the garage. ++ +[Test the regular expression](https://regex101.com/r/5ljjgB/1) + +
+"^(T|t)he" => The car is parked in the garage. ++ +[Test the regular expression](https://regex101.com/r/jXrKne/1) + +### 2.8.2 The Dollar Sign + +The dollar sign `$` is used to check if a matching character is the last character +in the string. For example, the regular expression `(at\.)$` means: a +lowercase `a`, followed by a lowercase `t`, followed by a `.` +character and the matcher must be at the end of the string. + +
+"(at\.)" => The fat cat. sat. on the mat. ++ +[Test the regular expression](https://regex101.com/r/y4Au4D/1) + +
+"(at\.)$" => The fat cat. sat. on the mat.
+
+[Test the regular expression](https://regex101.com/r/t0AkOd/1)
+
+## 3. Shorthand Character Sets
+
+There are a number of convenient shorthands for commonly used character sets/
+regular expressions:
+
+|Shorthand|Description|
+|:----:|----|
+|.|Any character except new line|
+|\w|Matches alphanumeric characters: `[a-zA-Z0-9_]`|
+|\W|Matches non-alphanumeric characters: `[^\w]`|
+|\d|Matches digits: `[0-9]`|
+|\D|Matches non-digits: `[^\d]`|
+|\s|Matches whitespace characters: `[\t\n\f\r\p{Z}]`|
+|\S|Matches non-whitespace characters: `[^\s]`|
+
+## 4. Lookarounds
+
+Lookbehinds and lookaheads (also called lookarounds) are specific types of
+***non-capturing groups*** (used to match a pattern but without including it in the matching
+list). Lookarounds are used when a pattern must be
+preceded or followed by another pattern. For example, imagine we want to get all
+numbers that are preceded by the `$` character from the string
+`$4.44 and $10.88`. We will use the following regular expression `(?<=\$)[0-9\.]*`
+which means: get all the numbers which contain the `.` character and are preceded
+by the `$` character. These are the lookarounds that are used in regular
+expressions:
+
+|Symbol|Description|
+|:----:|----|
+|?=|Positive Lookahead|
+|?!|Negative Lookahead|
+|?<=|Positive Lookbehind|
+|?<!|Negative Lookbehind|
+
+### 4.1 Positive Lookahead
+
+Positive lookaheads are used when we need to get all matches from an input string
+that are followed by a certain pattern. A positive lookahead is written `(?=...)`.
+For example, the regular expression `(T|t)he(?=\sfat)` means: get all `The` or `the`
+words from the input string that are followed by a space character and the word `fat`.
+
+"(T|t)he(?=\sfat)" => The fat cat sat on the mat.
+
+[Test the regular expression](https://regex101.com/r/IDDARt/1)
+
+### 4.2 Negative Lookahead
+
+Negative lookaheads are used when we need to get all matches from an input string
+that are not followed by a certain pattern. A negative lookahead is written the same way as a
+positive lookahead. The only difference is, instead of an equals sign `=`, we
+use an exclamation mark `!` to indicate negation i.e. `(?!...)`. Let's take a look at the following
+regular expression `(T|t)he(?!\sfat)` which means: get all `The` or `the` words
+from the input string that are not followed by a space character and the word `fat`.
+
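The two lookaheads can be compared side by side. The sketch below assumes Python's standard `re` module (an assumption, since the README is language-neutral) and records the start offset of each match so it is clear which occurrence was found; the helper `matches_with_offsets` is a hypothetical name introduced for this example.

```python
import re

text = "The fat cat sat on the mat."

def matches_with_offsets(pattern, string):
    """Return (matched text, start offset) pairs for every match."""
    return [(m.group(0), m.start()) for m in re.finditer(pattern, string)]

# Positive lookahead: keep The/the only when " fat" follows.
followed_by_fat = matches_with_offsets(r"(T|t)he(?=\sfat)", text)

# Negative lookahead: keep The/the only when " fat" does NOT follow.
not_followed_by_fat = matches_with_offsets(r"(T|t)he(?!\sfat)", text)

print(followed_by_fat)      # [('The', 0)]
print(not_followed_by_fat)  # [('the', 19)]
```

In both cases the matched text is only `The` or `the`; the lookahead checks what comes next without consuming it.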
+"(T|t)he(?!\sfat)" => The fat cat sat on the mat. ++ +[Test the regular expression](https://regex101.com/r/V32Npg/1) + +### 4.3 Positive Lookbehind + +Positive lookbehinds are used to get all the matches that are preceded by a +specific pattern. Positive lookbehinds are written `(?<=...)`. For example, the +regular expression `(?<=(T|t)he\s)(fat|mat)` means: get all `fat` or `mat` words +from the input string that come after the word `The` or `the`. + +
+"(?&lt;=(T|t)he\s)(fat|mat)" => The fat cat sat on the mat.
+
+[Test the regular expression](https://regex101.com/r/avH165/1)
+
+### 4.4 Negative Lookbehind
+
+Negative lookbehinds are used to get all the matches that are not preceded by a
+specific pattern. Negative lookbehinds are written `(?<!...)`. For example, the
+regular expression `(?<!(T|t)he\s)(cat)` means: get all `cat` words from the input
+string that are not preceded by the word `The` or `the`.
+
+"(?&lt;!(T|t)he\s)(cat)" => The cat sat on cat.
+
+[Test the regular expression](https://regex101.com/r/8Efx5G/1)
+
+## 5. Flags
+
+Flags are also called modifiers because they modify the output of a regular
+expression. These flags can be used in any order or combination, and are an
+integral part of the RegExp.
+
+|Flag|Description|
+|:----:|----|
+|i|Case insensitive: Match will be case-insensitive.|
+|g|Global Search: Match all instances, not just the first.|
+|m|Multiline: Anchor meta characters work on each line.|
+
+### 5.1 Case Insensitive
+
+The `i` modifier is used to perform case-insensitive matching. For example, the
+regular expression `/The/gi` means: an uppercase `T`, followed by a lowercase
+`h`, followed by an `e`. And at the end of regular expression
+the `i` flag tells the regular expression engine to ignore the case. As you can
+see, we also provided `g` flag because we want to search for the pattern in the
+whole input string.
+
+"The" => The fat cat sat on the mat. ++ +[Test the regular expression](https://regex101.com/r/dpQyf9/1) + +
+"/The/gi" => The fat cat sat on the mat.
+
+[Test the regular expression](https://regex101.com/r/ahfiuh/1)
+
+### 5.2 Global Search
+
+The `g` modifier is used to perform a global match (finds all matches rather than
+stopping after the first match). For example, the regular expression `/.(at)/g`
+means: any character except a new line, followed by a lowercase `a`,
+followed by a lowercase `t`. Because we provided the `g` flag at the end of
+the regular expression, it will now find all matches in the input string, not just
+the first one (which is the default behavior).
+
+"/.(at)/" => The fat cat sat on the mat. ++ +[Test the regular expression](https://regex101.com/r/jnk6gM/1) + +
+"/.(at)/g" => The fat cat sat on the mat. ++ +[Test the regular expression](https://regex101.com/r/dO1nef/1) + +### 5.3 Multiline + +The `m` modifier is used to perform a multi-line match. As we discussed earlier, +anchors `(^, $)` are used to check if a pattern is at the beginning of the input or +the end. But if we want the anchors to work on each line, we use +the `m` flag. For example, the regular expression `/at(.)?$/gm` means: a lowercase +`a`, followed by a lowercase `t` and, optionally, anything except +a new line. And because of the `m` flag, the regular expression engine now matches patterns +at the end of each line in a string. + +
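The effect of the `m` flag can be sketched as follows, assuming Python's standard `re` module, where the flag is spelled `re.MULTILINE` rather than a trailing `m`. The pattern used here is the `.at(.)?$` form, with a leading `.` so the whole word shows up in the result; the three-line string and variable names are illustrative.

```python
import re

text = "The fat\ncat sat\non the mat."
pattern = r".at(.)?$"

# Without the flag, $ only matches at the very end of the input.
single_line = [m.group(0) for m in re.finditer(pattern, text)]

# With re.MULTILINE, $ also matches at the end of each line.
multi_line = [m.group(0) for m in re.finditer(pattern, text, re.MULTILINE)]

print(single_line)  # ['mat.']
print(multi_line)   # ['fat', 'sat', 'mat.']
```
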
+"/.at(.)?$/" => The fat + cat sat + on the mat. ++ +[Test the regular expression](https://regex101.com/r/hoGMkP/1) + +
+"/.at(.)?$/gm" => The fat + cat sat + on the mat. ++ +[Test the regular expression](https://regex101.com/r/E88WE2/1) + +## 6. Greedy vs Lazy Matching +By default, a regex will perform a greedy match, which means the match will be as long as +possible. We can use `?` to match in a lazy way, which means the match should be as short as possible. + +
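A short sketch of the greedy and lazy variants, assuming Python's standard `re` module (the README does not tie this to any language); the variable names are made up for the example.

```python
import re

text = "The fat cat sat on the mat."

# Greedy: .* grabs as much as possible, so the match ends at the last "at".
greedy = re.search(r"(.*at)", text).group(1)

# Lazy: .*? grabs as little as possible, so the match ends at the first "at".
lazy = re.search(r"(.*?at)", text).group(1)

print(greedy)  # The fat cat sat on the mat
print(lazy)    # The fat
```
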
+"/(.*at)/" => The fat cat sat on the mat.+ + +[Test the regular expression](https://regex101.com/r/AyAdgJ/1) + +
+"/(.*?at)/" => The fat cat sat on the mat.+ + +[Test the regular expression](https://regex101.com/r/AyAdgJ/2) + + +## Contribution + +* Open a pull request with improvements +* Discuss ideas in issues +* Spread the word +* Reach out with any feedback [](https://twitter.com/ziishaned) + +## License + +MIT © [Zeeshan Ahmad](https://twitter.com/ziishaned)