FIND2_C

See all articles from QL Hacker's Journal 2

I have written a C program that searches a file for a particular string. The first version of the program worked only on text files. The second version was designed to work on Quill files, or any other binary file.

The reason for these two programs is that I was trying to help a friend index some Quill files he has. Not wanting to write a long, complex program in a language I was just learning, I decided to write a short program that searches a binary file for every occurrence of a string and reports where it found the string.

The listings below are the result of my efforts. Both programs start by asking for the file to search and the string to search for. If the file cannot be opened (for example, because it does not exist), the program aborts. Otherwise, the program reads the file one character at a time.
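As a rough illustration of the open-then-read loop described above, here is a minimal sketch in standard C (using <stdio.h> rather than the QL's stdio_h). The function name count_bytes is hypothetical and not part of the listings. The detail worth noting is that the value returned by getc is held in an int, so that EOF can be told apart from a data byte of 255, which matters when reading binary files.

```c
#include <stdio.h>

/* Hypothetical helper, not from the original listings:
   returns the number of bytes in the file, or -1 if the
   file cannot be opened. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* "rb": read the bytes unchanged */
    long n = 0;
    int c;                          /* int, so EOF differs from byte 255 */

    if (fp == NULL)
        return -1;                  /* the listings abort at this point */

    while ((c = getc(fp)) != EOF)   /* one character at a time */
        ++n;

    fclose(fp);
    return n;
}
```

A caller would treat a result of -1 the same way the listings treat a failed fopen: report the problem and stop.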

The search algorithm is quick but weak: there are some cases that it will miss. Essentially, it keeps a pointer to the character of the search string that is to be compared next. The pointer initially points at the first character of the string (index 0 in the listings). If the input character is equal to that character, the pointer is incremented so that the next input character is compared with the second character, and so on. When the pointer moves past the last character of the string, the string comparison has been successful. If at any point a single character comparison fails, the pointer is set back to the first character.
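The scan described above can be sketched as a small function in standard C. This is only an illustration working on a buffer in memory instead of a file; the name simple_scan and its parameters are assumptions, not part of the listings.

```c
#include <stddef.h>
#include <ctype.h>

/* Hypothetical helper: returns how many matches the simple
   single-pointer scan reports for pattern in text[0..len). */
int simple_scan(const char *text, size_t len, const char *pattern)
{
    size_t i, plen = 0;
    size_t count = 0;   /* index of the next pattern character to compare */
    int found = 0;

    while (pattern[plen] != '\0')
        ++plen;
    if (plen == 0)
        return 0;

    for (i = 0; i < len; ++i) {
        int c = toupper((unsigned char)text[i]);

        if (c == toupper((unsigned char)pattern[count]))
            ++count;            /* advance to the next pattern character */
        else
            count = 0;          /* mismatch: back to the first character */

        if (count == plen) {    /* pointer moved past the end: a match */
            ++found;
            count = 0;
        }
    }
    return found;
}
```

For example, simple_scan("xxabcxx", 7, "abc") returns 1.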

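This algorithm will miss strings like abc in ababc: after matching ab, the second a fails the comparison with c and is thrown away, even though it could have started a new match. This can be fixed, but I’ll leave it up to the reader to work out how. The Knuth-Morris-Pratt algorithm is a well-known string search that handles these cases in a single pass, but it is not simple to implement in a new language (sometimes not even in an old language).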
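For readers who want to see what such a fix might look like, the following is a minimal sketch of a Knuth-Morris-Pratt search in standard C. It is not the author's code: kmp_find, MAX_PATTERN and the in-memory buffer interface are assumptions made for illustration, and unlike the listings it compares characters case-sensitively.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PATTERN 64   /* assumed limit for this sketch */

/* Hypothetical helper: returns the 1-based offset of the first
   match of pattern in text[0..len), or 0 if there is none. */
long kmp_find(const char *text, size_t len, const char *pattern)
{
    size_t fail[MAX_PATTERN];
    size_t plen = strlen(pattern);
    size_t i, k;

    if (plen == 0 || plen > MAX_PATTERN)
        return 0;

    /* fail[i] = length of the longest proper prefix of
       pattern[0..i] that is also a suffix of it */
    fail[0] = 0;
    k = 0;
    for (i = 1; i < plen; ++i) {
        while (k > 0 && pattern[i] != pattern[k])
            k = fail[k - 1];
        if (pattern[i] == pattern[k])
            ++k;
        fail[i] = k;
    }

    /* Scan the text once; on a mismatch fall back to the longest
       prefix that can still match instead of starting from zero. */
    k = 0;
    for (i = 0; i < len; ++i) {
        while (k > 0 && text[i] != pattern[k])
            k = fail[k - 1];
        if (text[i] == pattern[k])
            ++k;
        if (k == plen)
            return (long)(i + 1 - plen) + 1;
    }
    return 0;
}
```

With this sketch, kmp_find("ababc", 5, "abc") returns 3, the 1-based offset at which abc starts. Because the text is still consumed one character at a time, the same idea could be adapted to a getc loop.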

The second program prints out the byte offset of the beginning of the string; that is, the start of the string was found at the Nth byte of the file. Since the program works on binary files (such as Quill files), it cannot return the line number of the string. The first version of the program does print out the line number of the matching line.
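To illustrate the difference between reporting a byte offset and reporting a line number, here is a minimal sketch in standard C of how a text-only search might track line numbers by counting newline characters. The function first_matching_line is hypothetical, works on a NUL-terminated buffer in memory, and compares case-sensitively; it is an assumption about one way to do this, not a reconstruction of the original program.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the 1-based number of the first
   line on which pattern starts, or 0 if it is not found. */
int first_matching_line(const char *text, const char *pattern)
{
    size_t plen = strlen(pattern);
    int line = 1;
    const char *p;

    if (plen == 0)
        return 0;

    for (p = text; *p != '\0'; ++p) {
        /* strncmp stops at the terminating NUL of text,
           so it never reads past the end of the buffer */
        if (strncmp(p, pattern, plen) == 0)
            return line;
        if (*p == '\n')
            ++line;     /* every newline starts a new line */
    }
    return 0;
}
```

This only makes sense for text: in a binary file a byte with the value of a newline does not necessarily end a line, which is why a byte offset is the more useful result there.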

I have also modified the second program to append its results to a “log” file, so that a permanent record of the searches can be kept.

Program Version 1:

/*  Find_c
    This program searches any file for a certain string
    that the user has input. Every time the program
    finds the string in the file, it prints to the
    screen the offset of where the string started.
    The program is case insensitive. All characters
    are converted to upper case.
    The algorithm does have some problems. It will miss
    the string abc in ababc.
*/

#include <stdio_h>

main() {
    int  c;             /* int, not char, so EOF differs from byte 255 */
    char string[20],    /* search string, at most 19 characters */
         file[30];      /* file name, at most 29 characters */

    int count,          /* index of the next character of string to match */
        file_count,     /* number of bytes read so far */
        str_len,
        fd;

    /* abcdefghijk test string*/

    count = 0;
    file_count = 0;

    printf("Enter File Name : \n");
    gets(file);

    printf("Enter Search String : \n");
    gets(string);

    fd = fopen(file,"r");
    if (fd == NULL) {
        printf("Did not open file\n");
        abort(1);
    }

    str_len = strlen(string);

    while (( c = getc(fd)) != EOF) {
        ++file_count;
        /* make sure a character is upper case */
        if (isascii(c)) c=toupper(c);
        if ( c == toupper(string[count]))
            ++count;
        else
            count = 0;

        if (count == str_len) {
            printf("String found at %d\n",
                   file_count-str_len+1);
            /* start over, so string[count] never runs past the end */
            count = 0;
        }
    }
    fclose(fd);
}

Program Version 2:

/*  Find2_c
    This program searches any file for a certain string
    that the user has input. Every time the program
    finds the string in the file, it prints to a
    file the keyword and offsets of where the string
    started.
    The program is case insensitive. All characters
    are converted to upper case.
    The algorithm does have some problems. It will miss
    the string abc in ababc.
    The program expects a file called find2_log to exist
    on flp1_.
*/

#include <stdio_h>

main() {
    int  c;             /* int, not char, so EOF differs from byte 255 */
    char string[20],    /* search string, at most 19 characters */
         file[30];      /* file name, at most 29 characters */

    int count,          /* index of the next character of string to match */
        file_count,     /* number of bytes read so far */
        str_len,
        fd,             /* file being searched */
        fd2;            /* log file */

    /* abcdefghijk test string */

    count = 0;
    file_count = 0;

    printf("Enter File Name : \n");
    gets(file);

    printf("Enter Search String : \n");
    gets(string);

    fd = fopen(file,"r");
    if (fd == NULL) {
        printf("Did not open file\n");
        abort(1);
    }

    /* "a" appends, so earlier searches stay in the log */
    fd2 = fopen("flp1_find2_log","a");
    if (fd2 == NULL) {
        printf("Did not open log file\n");
        abort(1);
    }

    /* log entry: the search string, then the file name
       followed by the offsets of all matches */
    fputs(string,fd2);
    fprintf(fd2,"\n ");
    fputs(file,fd2);

    str_len = strlen(string);

    while (( c = getc(fd)) != EOF) {
        ++file_count;
        /* make sure a character is upper case */
        if (isascii(c)) c=toupper(c);
        if ( c == toupper(string[count]))
            ++count;
        else
            count = 0;

        if (count == str_len) {
            printf(" String found at %d\n",
                   file_count-str_len+1);
            fprintf(fd2,", %d",file_count-str_len+1);
            /* start over, so string[count] never runs past the end */
            count = 0;
        }
    }
    fprintf(fd2,"\n");
    fclose(fd2);
    fclose(fd);

}
