String Matching Analysis¶
Consider a string of finite length Let it be
.
Finding whether a string
of length
exsists in
is known as String Matching, Following is some of the
comparision based String Matching Algorithms.
- Brute Force String Matching Algorithm
- Horspool String Matching
- Boyer - Moore String Matching
Before looking at the analysis part, we shall examine the Language in built methods to sorting
The in
operator and str.index()
¶
We have already seen the in
operator in several contexts. Let’s see
the working of in
operator again
In [1]:
x = 'this is some random text used for illustrative purposes'
In [2]:
x
Out[2]:
'this is some random text used for illustrative purposes'
In [3]:
'this' in x
Out[3]:
True
In [4]:
'not' in x
Out[4]:
False
In [5]:
x.index('is')
Out[5]:
2
In [6]:
x.index('not')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-a1f052cc5af7> in <module>()
----> 1 x.index('not')
ValueError: substring not found
Standard import
statement¶
In [7]:
from openanalysis.string_matching import StringMatchingAlgorithm,StringMatchingAnalyzer
StringMatchingAlgorithm
is the base class providing the standards to
implement sorting algorithms, SearchVisualizer
visualizes and
analyses the algorithm
StringMatchingAlgorithm
class¶
Any String Matching Algorithm, which has to be implemented, has to be derived from this class. Now we shall see data members and member functions of this class.
Data Members¶
name
- Name of the Searching Algorithmcount
- Holds the number of basic operations performed
Member Functions¶
__init__(self, name):
- Initializes algorithm with aname
match(self, text, pattern)
_ The base String Matching function. Setscount
to 0.
An example …. Horspool String Matching Algorithm¶
Now we shall implement the class Horspool
In [10]:
class Horspool(StringMatchingAlgorithm): # Must derive from StringMatchingAlgorithm
def __init__(self):
StringMatchingAlgorithm.__init__(self, "Hosrpool String Matching")
self.shift_table = {}
self.pattern = ''
def generate_shift_table(self, pattern): # class is needed to include helper methods
self.pattern = pattern
for i in range(0, len(pattern) - 1):
self.shift_table.update({pattern[i]: len(pattern) - 1 - i})
def match(self, text: str, pattern: str):
StringMatchingAlgorithm.match(self, text, pattern)
self.generate_shift_table(pattern)
i = len(self.pattern) - 1
while i < len(text):
j = 0
while j < len(self.pattern) and text[i-j] == self.pattern[len(self.pattern)-1-j]:
j += 1
self.count += j # Increment count here
if j == len(self.pattern):
return i-len(self.pattern)+1
if text[i] in self.shift_table:
i += self.shift_table[text[i]]
else:
i += len(self.pattern)
return -1
StringMatchingAnalyzer
class¶
This class provides the visualization and analysis methods. Let’s see its methods in detail
__init__(self, matching):
Initializes visualizer with a String Matching Algorithm.searcher
is a class, which is derived fromStringMatchingAlgorithm
analyze(self):
- Plots the number of basic operations performed
- Both Text length and Pattern Length are varied
- Samples are chosen randomly from pre defined large data
In [11]:
StringMatchingAnalyzer(Horspool).analyze()
please wait while analysing...
Note performance of algorithm