boyer moore search Algorithm
The Boyer-Moore search algorithm is an efficient string matching technique, used to search for the occurrence of a pattern within a given text. This algorithm, developed by Robert S. Boyer and J Strother Moore in 1977, has become one of the standard algorithms for string searching due to its impressive performance in practice. It works by pre-processing the pattern and using this information to shift the pattern more intelligently, skipping over portions of the text that cannot contain the pattern. This results in fewer character comparisons and faster search times compared to other traditional string matching algorithms, such as the naïve approach or the Knuth-Morris-Pratt algorithm.
The Boyer-Moore algorithm's efficiency comes from two main heuristics: the bad character rule and the good suffix rule. The bad character rule allows the algorithm to shift the pattern by using information about the occurrence of a mismatched character in the pattern itself. This is done by pre-processing the pattern and creating a lookup table that shows how far the pattern can be shifted when a mismatch occurs. The good suffix rule, on the other hand, deals with the situation when a partial match has been found. It uses the information about the matched portion of the pattern to determine how far the pattern can be shifted to align with a potential match in the text. By combining these two heuristics, the Boyer-Moore algorithm can achieve better performance in terms of the number of character comparisons and overall search time, especially on large texts or when searching for relatively rare patterns.
#include <stdio.h>
#include <string.h>
#define NUM_OF_CHARS 256
int max(int a, int b) {return (a>b)? a:b;}
void computeArray(char *pattern, int size, int arr[NUM_OF_CHARS])
{
int i;
for(i = 0; i < NUM_OF_CHARS; i++)
arr[i] = -1;
/* Fill the actual value of last occurrence of a character */
for(i = 0; i < size; i++)
arr[(int) pattern[i]] = i;
}
/* Boyer Moore Search algorithm */
void boyer_moore_search(char *str, char *pattern)
{
int n = strlen(str);
int m = strlen(pattern);
int shift = 0;
int arr[NUM_OF_CHARS];
computeArray(pattern, m, arr);
while(shift <= (n - m))
{
int j = m - 1;
while (j >= 0 && pattern[j] == str[shift + j])
j--;
if (j < 0)
{
printf("--Pattern is found at: %d\n", shift);
shift += (shift + m < n) ? m - arr[str[shift + m]] : 1;
} else {
shift += max(1, j - arr[str[shift +j]]);
}
}
}
int main()
{
char str[] = "AABCAB12AFAABCABFFEGABCAB";
char pat1[] = "ABCAB";
char pat2[] = "FFF"; /* not found */
char pat3[] = "CAB";
printf("String test: %s\n", str);
printf("Test1: search pattern %s\n", pat1);
boyer_moore_search(str, pat1);
printf("Test2: search pattern %s\n", pat2);
boyer_moore_search(str, pat2);
printf("Test3: search pattern %s\n", pat3);
boyer_moore_search(str, pat3);
return 0;
}