Moved to WordPress

This blog has moved to WordPress. GOTO the new blog now.

There will be no new updates here. Most of the posts have also been expanded.

2010/08/31

What In The Hell Are Strings

Strings are a data type consisting of a sequence of characters. The simplest way to think about strings is that they are text. Strings are usually written by putting quotes around some text, like so: "This is a string".

Strings are commonly stored in one of two ways. Terminated strings use a special character to mark the end of the string. That character cannot appear inside the string, because it would end the string early and cut off the rest. Length-prefixed strings instead store the length alongside the characters, so they need no terminating character. In many languages you cannot tell which representation you are using, and it usually does not matter. Some languages, like C, expose it directly: when you build a string character by character, you must place the terminating character ('\0') yourself.
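To make the difference concrete, here is a minimal sketch in Python 3 that simulates both layouts with raw bytes. The helper names (make_terminated, read_length_prefixed, and so on) and the one-byte length field are assumptions chosen for illustration; real implementations differ in how wide the length field is and where it is stored.

```python
# A minimal sketch (Python 3) of the two representations described above,
# simulated with raw bytes. The function names are hypothetical.

TERMINATOR = 0  # the '\0' byte that ends a C-style string


def make_terminated(text):
    """Characters followed by one terminating zero byte."""
    data = text.encode("ascii")
    if TERMINATOR in data:
        raise ValueError("a terminated string cannot contain its terminator")
    return data + bytes([TERMINATOR])


def read_terminated(buf):
    """Scan forward until the terminator; the length is unknown until then."""
    end = buf.index(TERMINATOR)
    return buf[:end].decode("ascii")


def make_length_prefixed(text):
    """A one-byte length field (so at most 255 characters), then the characters."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("a one-byte length field holds at most 255")
    return bytes([len(data)]) + data


def read_length_prefixed(buf):
    """Read the stored length, then take exactly that many characters."""
    length = buf[0]
    return buf[1:1 + length].decode("ascii")


t = make_terminated("Hello")
p = make_length_prefixed("Hello")
print(list(t))                    # [72, 101, 108, 108, 111, 0]
print(list(p))                    # [5, 72, 101, 108, 108, 111]
print(read_terminated(t))         # Hello
print(read_length_prefixed(p))    # Hello

# An embedded terminator ends the string early and the rest is lost.
print(read_terminated(b"Hel\x00lo\x00"))  # Hel
```

The last line shows why the terminator is forbidden inside a terminated string: everything after it is silently dropped. The length-prefixed version has no such restriction, at the cost of limiting the maximum length to whatever the length field can hold.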

Strings can use a variety of encodings for the characters they contain. Two of the most common are ASCII, which can represent 128 different characters, and Unicode, which defines over 100,000 characters and allows strings to hold non-English characters. Unicode text is usually stored using an encoding such as UTF-8 or UTF-16. Both are widely used today, but most applications are moving toward Unicode as the need for international software grows.
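The sketch below (Python 3, where every str holds Unicode text) shows the practical difference: plain English text encodes to the same bytes in ASCII and UTF-8, while a character like "é" has no ASCII code and takes two bytes in UTF-8. The word "café" is just an example.

```python
word = "café"

# Plain English text has the same bytes in ASCII and UTF-8.
print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # True

encoded = word.encode("utf-8")
print(encoded)       # b'caf\xc3\xa9'
print(len(word))     # 4 characters
print(len(encoded))  # 5 bytes: 'é' takes two bytes in UTF-8

# ASCII has no code for 'é', so encoding fails.
try:
    word.encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII cannot represent:", word[err.start])  # ASCII cannot represent: é
```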

A closely related term is "string literal". A string literal is a string (usually enclosed in quotes) that appears directly in the source code. String literals are usually immutable, which means that you cannot modify them. More information on string literals can be found here.
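Python is one language where this is easy to see, since all of its strings, literals included, are immutable. The minimal sketch below assumes Python 3. In C, by contrast, attempting to modify a string literal is undefined behavior, so the same experiment there may crash or may appear to work.

```python
s = "Hello"

# Strings cannot be changed in place; item assignment raises TypeError.
try:
    s[0] = "J"
except TypeError:
    print("cannot modify a string")

# "Changing" a string really means building a new one.
t = "J" + s[1:]
print(s, t)  # Hello Jello
```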

One common problem with strings is representing characters that the language would otherwise interpret as something else. For instance, the string "Have you read the book "1984" by George Orwell?" would be interpreted as three expressions: (1) the string "Have you read the book ", (2) the number 1984, and (3) the string " by George Orwell?". This is obviously not what we want. In this case we use an escape character to stop the quotes around "1984" from terminating the string. The new string looks something like this (the exact syntax varies from language to language): "Have you read the book \"1984\" by George Orwell?". Here the '\' character tells the language not to end the string when it sees the " that follows.
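One detail worth noticing is that the backslash only exists in the source code: the string itself stores a plain quote character. This minimal Python 3 check compares the escaped form against a single-quoted version of the same sentence.

```python
escaped = "Have you read the book \"1984\" by George Orwell?"
single = 'Have you read the book "1984" by George Orwell?'

print(escaped)            # Have you read the book "1984" by George Orwell?
print(escaped == single)  # True: both hold the same characters
print(len(escaped))       # 47: the backslashes are not counted

# "\\" is itself an escape sequence meaning one backslash character.
print("\\" in escaped)    # False: no backslash is stored
```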


Terminology for Strings


Concatenation:
Joining two strings together to form one string is called concatenation.
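In Python 3, for example, the + operator concatenates two strings, and str.join concatenates a whole list with a separator between the pieces. The strings used here are arbitrary examples.

```python
first = "Hello"
second = "World"

greeting = first + " " + second  # concatenate three strings
print(greeting)                  # Hello World

words = ["Have", "you", "read", "1984?"]
sentence = " ".join(words)       # concatenate with a space between each pair
print(sentence)                  # Have you read 1984?
```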

Substring:
A substring is a part of a larger string. For example, in the string "Hello World" one possible substring would be "Hello", and another could be "llo Wo".
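In Python 3, slicing extracts a substring by position (the start index is included, the end index is not), and the in operator or str.find searches for one. Using the same "Hello World" example:

```python
s = "Hello World"

print(s[0:5])           # Hello   (indices 0 through 4)
print(s[2:8])           # llo Wo  (indices 2 through 7)
print("llo Wo" in s)    # True
print(s.find("World"))  # 6: index where the substring starts
print(s.find("xyz"))    # -1: not a substring
```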



C

Strings in C are represented as null-terminated arrays of characters. To create a string, you create an array, fill it with characters, and place a '\0' character directly after the last character. When you initialize an array from a string literal, you do not have to supply the size or the '\0' character; the compiler does that for you.

Create a string:
char s[6] = {'H', 'e', 'l', 'l', 'o', '\0'};  /* terminator placed by hand */
char sl[] = "Hello World";                     /* compiler sizes the array (12) and appends '\0' */

Escape Character:
char e[50] = "Have you read the book \"1984\" by George Orwell?";

Python

Strings in Python are enclosed in quotes, using one of three quoting styles: double quotes, single quotes, or triple quotes. Single quotes are useful when the string contains double quote characters. Triple quotes allow you to place newlines directly in the string.

Create a string:
a = "Hello World"
b = 'Have you read "1984" by George Orwell?'
c = """Materials:
Pen
Paper"""

Escape Character:
d = "Have you read the book \"1984\" by George Orwell?"

Scheme

Strings in Scheme are enclosed in quotes or created with the 'string' function. Newlines may be embedded in any string.

Create a string:
(define a "Hello World")
(define b (string #\H #\e #\l #\l #\o))

Escape Character:
(define c "Have you read the book \"1984\" by George Orwell?")



Links

What in the heck is: a string - Dan Sugalski has a much more thorough discussion of strings on his blog "Squawks of the Parrot". Incidentally, it was his "What in the heck" series that inspired me to begin writing the WITH series.

Wikipedia also has an article on strings with lots of links to concepts related to them.
