Strings

A string is simply an array of characters with one key difference.
Strings in C are null-terminated.
This means the last character of a string is the null character: '\0'.
This is very useful because we don’t need to know the length of the string in order to traverse it.
Recall that to traverse an array we use a for loop, starting from index i = 0 and stopping before N, where N is the length of the array.
With strings, however, we simply start at the beginning and traverse the string until we reach the null character.
To make a string we do the following:

   char* myStr = (char*)malloc(N*sizeof(char));

Note that the last character will be the null character. Hence, you need to account for it by reserving 1 extra character: a buffer of N characters holds a string of at most N - 1 visible characters.

Reading into Strings
Okay, so now that we have allocated a string, we can ask a user for input.
Previously we’ve used scanf() for input; however, with a plain %s it is a poor option when reading in strings, for 2 reasons.
Firstly, scanf() with %s stops reading at white-space. That means if you try to input a phrase like “Hello World”, scanf() will only read the word “Hello”.
The second issue is that scanf() with a plain %s does not check bounds.
What that means is that if we use malloc() to reserve space for 5 elements, we can safely store a 4-character string. The last element is for the null character.
However, if we use scanf() and enter “abcdefghijk” as an input, scanf() will accept all of it. It writes all 11 characters, followed by the null character, starting at the beginning of your buffer and running past the 5 elements you allocated. This is undefined behaviour and might overwrite existing information. To avoid this issue we use a different input function.
fgets().
The fgets() function accepts 3 arguments.
1) Pointer to char. This is the string you want to read into.
2) Integer. This is the size of the buffer you’re reading into, and it includes the null character. Hence, if you do N * sizeof(char) in malloc(), you would provide fgets() with N as the size, and fgets() will read at most N - 1 characters.
3) Input source. Here you specify where the input is coming from: it can be a specific file in the system, or it can be entered from the keyboard. We will use the latter, so for this parameter you enter: stdin.
Note that fgets() stops after a newline and, if there is room, stores that newline character in the string.
So let’s look at an example.

    char* myStr = (char*)malloc(5 * sizeof(char));
    fgets(myStr, 5, stdin);

Examples
Okay, now we can look at some examples.
Write a function to calculate the length of the string, and return it as an integer.
Note that a buffer can hold 30 characters but have only the first 4 used. Hence we want to return the length of the text actually stored, stopping at the null character or at a newline that fgets() may have kept.

int stringLen(char * s)
{
	int i=0;
	/* stop at the null character or at a newline */
	while (*s != '\0' && *s != '\n')
	{
		s++;
		i++;
	}
	return i;
}

This code is very straightforward. We advance the string pointer, s, while it’s not pointing to a newline character or the null character. While doing this we increment a counter, i.
If we want to shorten the code, we can write the loop in a single line using a for-loop.

   int i;
   for (i = 0; *s != '\0' && *s != '\n'; i++, s++);
   return i;

Another example.
Write a program that accepts a char* string as an input from a user, and checks if the string is a palindrome.
A palindrome is a string that is identical forward and backward. For example: radar is a palindrome, as it reads the same left to right and right to left.
The function should return true if the string is a palindrome, false otherwise.
Note, here we will use the function we just wrote to find the length of the string.
We then assign one pointer to the head of the string, and one to the tail of the string. We compare the characters these pointers point to, looping while the head pointer is less than the tail pointer, because once they meet or cross, we have checked every character. After each comparison, increment head and decrement tail.
How do you assign a pointer to the tail of a string?
Use the above function to get the length, then add that to the head and subtract 1.
Let’s look at the code.

  bool isPalindrome(char* s)
  {
	char* tail = s + stringLen(s) - 1;
	char* head = s;
	while (head < tail)
	{
		if (*head++ != *tail--)
			return false;
	}
	return true;
  } 

As you can see, head is a pointer to the beginning of the string, and tail points to the last character before the null character.
See diagram below.
[Diagram: palindrome1]

Okay, let’s say the user enters “radar” as the input.
Let’s see how the program works using diagrams.
First we have the following:
[Diagram palindrome2: head points at the first ‘r’ (index 0), tail points at the last ‘r’ (index 4)]
Okay, let’s look at the main part of our code.

	while (head < tail)
	{
		if (*head++ != *tail--)
			return false;
	}
	return true;

So, looking at the diagram, is head’s position to the left of tail?
Yes it is.
So let’s enter the loop and check the values of the head and tail.
*head is ‘r’
*tail is ‘r’
Since the if-condition does not evaluate to true, we don’t return false.
Just perform the post-increment and post-decrement.
head++ and tail--
Let’s see the diagram again.
[Diagram palindrome3: head points at the first ‘a’ (index 1), tail points at the second ‘a’ (index 3)]
Okay, now we’ve advanced head and brought tail back.
Let’s check that while-loop condition again.
Is head’s position to the left of tail?
Yes it is.
Let’s enter the loop and compare their values.
*head is ‘a’
*tail is ‘a’
Once again, the if-condition does not evaluate to true, so we don’t return anything.
Just perform the post-increment and post-decrement.
head++ and tail--
Let’s see what happened.
[Diagram palindrome4: head and tail both point at ‘d’ (index 2)]
Now, we’ve advanced head and brought tail back to a point at which they are pointing to the same position in the array.
Hence, head < tail evaluates to false. We exit the loop and immediately return true.

Exam Example
Here is an example from the 2012 APS105 exam.

Question 12 [8 Marks]
Write a C function called checkPlagiarism, the prototype of which is given below, that returns true if two suspected input codes (code1) and (code2) have high similarity. High similarity is defined as matching exactly, but ignoring any spaces (' ') or return characters ('\n'). For example, the function checkPlagiarism() returns true when comparing the example strings c1 and c2 below. Do not use recursion in your solution. You can assume that code1 and code2 are null-terminated strings. Hint: Your code should return false as soon as it finds evidence of a mis-match.

#include <stdio.h>
#include <stdbool.h>
bool checkPlagiarism(char *code1, char *code2);
int main(void)
{
	char c1[] = "int main(void){\n int x = 10;\n int z = x + 5;\n return 0;\n}\n";
	char c2[] = "int main(void){\n int x=10;\n int z=x+5;\n\n\n return 0;\n}\n";
	printf("%d\n", checkPlagiarism(c1, c2));
}

The solution is very straightforward. Take both strings, put them in a while-loop.
While at least one of them has not reached the terminating null character, compare characters. If they are identical, advance both. When you get to a point where they are not equal, check to see if a newline character or a space is causing the mismatch. If so, ignore it and advance just that pointer. Otherwise, the strings aren’t equal, so return false.
If we exit this loop without having returned false, both strings have been fully consumed, which means they match and we can return true.
Here’s the code:

bool checkPlagiarism(const char *code1, const char *code2)
{
	/* keep going until BOTH strings are finished, so that leftover
	   characters in the longer string are still checked */
	while ((*code1 != '\0') || (*code2 != '\0'))
	{
		if (*code1 == *code2)
		{
			code1++;
			code2++;
		}
		else
		{
			if (*code1 == '\n' || *code1 == ' ')
				code1++;
			else if (*code2 == '\n' || *code2 == ' ')
				code2++;
			else
				return false;
		}
	}
	return true;
}

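The only other thing to note is that in the function header I have the keyword “const” before the char* for both arguments. This simply ensures that I don’t modify either string in the function. If you compile this together with the main() from the exam, change the prototype there to use const char* as well so that the two declarations match.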
