Skip to content

Files

Latest commit

Aug 24, 2024
34b7b0c · Aug 24, 2024

History

History
44 lines (33 loc) · 2.98 KB

README.md

File metadata and controls

44 lines (33 loc) · 2.98 KB

String-matching


Overview

CLRS Topic
32.1 Naive String-matching
32.2 Rabin-Karp Algorithm
32.3 String-matching Automata
32.4 Knuth-Morris-Pratt Algorithm
32.5 [ed4] Suffix Arrays

String-matching Problem

String matching is the problem of finding all occurrences of a string pattern P of length m in a text T of lenght n , where m n . The characters in the pattern and the text come from a finite set Σ called the alphabet .

We say that pattern P occurs with shift s in text T if 0 s n m and T [ s : s + m 1 ] = P [ 0 : m 1 ] . If P occurs with shift s in T , then s is called a valid shift for P in T , otherwise s is called an invalid shift for P in T .

In other words, the string-matching problem is to find all valid shifts for P in T , i.e. the shift set of P in T . Using the terminology below, the string-matching problem is to find all shifts  s such that  P T s + m , where 0 s n m and T s + m is the prefix of T of length s + m .


Terminology and properties

  • Σ : the set of all finite-length strings over Σ
  • ϵ : the empty string, i.e. the string of length 0
  • | x | : the length of string x
  • x y : the concatenation of strings x and y , with | x y | = | x | + | y |
  • T q : the q-character prefix of string T , i.e. T q = T [ 0 : q 1 ]
  • x y : string x is a prefix of string y , i.e. y = x z for some string z Σ
  • x y : string x is a suffix of string y , i.e. y = z x for some string z Σ
  • x y | x | | y |
  • x y | x | | y |
  • x y | x | < | y | : x is a proper prefix of y
  • x y | x | < | y | : x is a proper suffix of y
  • x Σ ϵ x
  • x Σ ϵ x
  • a Σ : x y x a y a
  • z Σ : x y y z x z
  • z Σ : x y y z x z