Moved to Wordpress

This blog has moved to Wordpress. GOTO the new blog now.

There will be no new updates here. Most of the posts have also been expanded.
Showing posts with label What In The Hell Series. Show all posts
Showing posts with label What In The Hell Series. Show all posts

2011/01/29

What In The Hell Are Functions

Functions go by many names depending largely upon the programming language you use. You may hear them referred to as subroutines, procedures, methods, routines and probably more. There are many names to describe functions, but in the end they all boil down to just about the same thing. A function is a method of grouping a set of instructions to preform a certain task.

The simplest way to explain this concept is with an example. Suppose you wanted a program that displays the procedure for getting the area of a circle. First let’s explore the most straightforward way to achieve this.



When you run this program it will output:

The formula for the area of a circle is Pi*r2
The radius of the circle is: 5
The area of the circle is: 78.5

OK. We’re done. But wait, what if you want to show the area of more than one circle? Three circles you say? Ok that’s easy, just cut and paste the code above three times and we get the following program.




Which will print out:

The formula for the area of a circle is Pi*r2
The radius of the circle is: 5
The area of the circle is: 78.5

The formula for the area of a circle is Pi*r2
The radius of the circle is: 10
The area of the circle is: 314.0

The formula for the area of a circle is Pi*r2
The radius of the circle is: 10
The area of the circle is: 314.0


That wasn’t too bad. But what about 1,000 circles? That certainly is a lot of copying and pasting. Fortunately for us there is a better way. It is called a function.

If you look at the code that we copied and pasted above you’ll notice a lot of similarities. We’ll use these similarities to create a function called circleRadius to print this information for us. Our new program now looks like this:




And the output is exactly the same as last time:

The formula for the area of a circle is Pi*r2
The radius of the circle is: 5
The area of the circle is: 78.5

The formula for the area of a circle is Pi*r2
The radius of the circle is: 10
The area of the circle is: 314.0

The formula for the area of a circle is Pi*r2
The radius of the circle is: 15
The area of the circle is: 706.5


The code using the function is a lot shorter than the code we copied and pasted. It is also a little more clear than the previous version. However the real benefit comes when we want to make a change to how the information is displayed. Suppose that you just wanted to print the radius and the area, like this:

radius = 5
area = 78.5

It would take you quite a lot of work to change all those copy/paste versions to make this work. But with a function you only have to change it in one place and it will work all over. Our new version looks like this:




And the output is:

radius = 5
area = 78.5

radius = 10
area = 314.0

radius = 15
area = 706.5


Notice how the “circleRadius(x)” has not changed? The power of functions extends well beyond this small example, but this should get you started writing smaller, modifiable and reusable code.



Check back for articles that cover some of the more advanced features of functions, including recursion, nesting and closures.

2010/09/15

What In The Hell Is A Hash Table

A hash table is a combination of two parts; an array, to store the data, and a hash function that computes the index of the array from a key. The best analogy to a hash table is a dictionary. When you use a dictionary you lookup a word (the key) and find the definition (the value).

Hash tables are an improvement over arrays and linked lists for large data sets. With a hash table you don’t have to search through every element in the collection (O(n) access time). Instead you pay a one time cost (the hash function) to retrieve data (O(1) access time).

The Hash Function


The purpose of the hash function is to compute an index from a key. That index is then used to store and retrieve data from the array. Many higher-level languages include some type of hash table and hash function.



As you can see from the diagram above the hash(“string”) function transforms a string into an integer which is then used as the index into the array. This index is used to store or retrieve data in the array.


The amount of time it takes to set or retrieve a value from a hash table is the time it takes to run the hash function plus the time it takes to access an element of the array. For small data sets this could take longer than simply iterating over each element of the array. However, since it is a fixed cost, for larger data sets using a hash table could be much faster.

Assuming we have a function ‘set’ that takes a key to be hashed as its first argument and a value to be stored as its second argument, we can create the following hash table.



As you can see from the above diagram we store both the key and the value in the table. The reason for this is collisions, which will be explained below.

Collisions


A collision occurs when two different keys hash to the same value. Collisions will occur in even the best designed hash tables, however there are a few reasons why they can occur more frequently. The first reason is a poor hash function that does not distribute the indices evenly throughout the table, this will be explained in more detail below. The second reason is that the array is too small. If you have a 10 element array and you are trying to store 100 key/values in the hash table, there will obviously be a lot of collisions.


In the diagram above “Puff” hashed to the same value as “Peter” causing a collision. Both “Peter” and “Puff” are now stored in the same index of the array. A certain amount of collisions are unavoidable in a hash table, so we’ll need a way to deal with them.

The simplest way to deal with collisions is to make each element of the array a linked list. Then, when a collision occurs, we simply search through the linked list to find the data we want. This way the hash function gives us a ‘starting point’ to begin our search.

With a good hash function and an appropriately sized array collisions are a rare occurrence. However it is possible, though very unlikely, that all the key/values will be stored in one index and we will end up doing a search on a linked list.

What is a good hash function


A good hash function has two properties, computing unique index values and uniform distribution. Obviously a hash function that computes the same index value for a large number of keys will cause a lot of collisions. As we saw above, collisions can break our hash table down into a search of a linked list which is O(n) compared to the hash table’s O(1). A good hash function must also produce a uniform distribution of index values. If it does not it increases the chances of collisions and wastes space.

2010/08/31

What In The Hell Are Strings

Strings are a data type that consist of a sequence of characters. The simplest way to think about strings is that they are text. Strings are usually formed by putting quotes around some text like so, "This is a string".

Strings come in two varieties. Terminated strings use a special character to signal the end of the string. This character is not allowed to appear in the string, as it would cause the string to end, leaving out the rest. Strings can also be implemented using a length field. These strings do not need a terminating character as the length of the string is stored with the string. In many languages you will not be able to tell which type of string you are using and it is usually irrelevant. However some languages, like C, require you to place the termination character ('\0') manually.

Strings can use a variety of encodings for the characters contained within. The two most common are ASCII, which can represent 128 different characters, and Unicode, which has support for over 100,000 different characters and allows strings to hold non-English characters. Both forms of encoding are popular today, however most applications are moving toward Unicode characters as the need for international software grows.

Another common type of string is called a "string literal". A string literal is a string (usually enclosed in quotes) that appears directly in the source code. String literals are usually immutable, which means that you can not modify them. More information on string literals can be found here.

One common problem with strings is representing characters that would otherwise be interpreted by the language to mean something else. For instance the string "Have you read the book "1984" by George Orwell?" would be interpreted as 3 expressions; (1) The string: "Have you read the book " (2) The number: 1984 (3) The string: " by George Orwell?". This is obviously not what we want. In this case we would use an escape character to stop the quotes around "1984" from terminating the string. Our new string would look something like this (this may vary from language to language): "Have you read the book \"1984\" by George Orwell?". In this string the '\' character tells the language not to end the string when it sees the following ".


Terminology for Strings


Concatenation:
Joining two strings together to form one string is called concatenation.

Substring:
A substring is a part of a larger string. For example, in the string "Hello World" one possible substring would be "Hello", and another could be "llo Wo".



C

Strings in C are represented as a null-terminated array of characters. To create a string you create an array, fill it with characters and place a '\0' character directly after the last character in the string. If you are using string literals to create a string you do not have to supply the size or the '\0' character, the compiler will do that for you.

Create a string:
char s[6] = {'H', 'e', 'l', 'l', 'o', '\0'}
char sl[] = "Hello World"

Escape Character:
char e[50] = "Have you read the book \"1984\" by George Orwell?"

Python

Strings in Python are enclosed in quotes. You can use one of three different quoting styles to create a string. Single quotes are useful when you have double quote characters in the string. Triple quotes allow you to place newlines in the string.

Create a string:
a = "Hello World"
b = 'Have you read "1984" by George Orwell?'
c = """Materials:
Pen
Paper"""

Escape Character:
d = "Have you read the book \"1984\" by George Orwell?"

Scheme

Strings in Scheme are enclosed in quotes or created with the 'string' function. Newlines may be embedded in any string.


Create a string:

(define a "Hello World")

(define b (string #\H #\e #\l #\l #\o))


Escape Character:

(define c "Have you read the book \"1984\" by George Orwell?")



Links

What in the heck is: a string - Dan Sugalski has a much more through discussion of strings on his blog "Squawks of the Parrot". Incidentally it was his "What in the heck" series that inspired me to begin writing the WITH series.

Wikipedia also has an article on strings with lots of links to concepts related to them.

What In The Hell Are Arrays

Arrays are one of the oldest and simplest data structures. An array is a collection of elements that can be accessed by an index. An array’s index usually starts at 0 and stops at the number of elements in the array minus 1. The diagram below shows an array of the characters in the string “HELLO”.



In the array above the ‘0’ element is the character “H”, the ‘1’ element is the character “E” and so on.

Arrays have constant time access to the elements stored within. This means that it takes the same amount of time to access any element. For example, it would take the same amount of time to access the last element of an array as it would the first, or even an element in the middle.

In lower level languages an array can only contain elements of a single type. This makes storing and retrieving elements in an array very easy. In an array of fixed sized elements the next element is simply the size of the individual elements away. For instance an array of integers, where the integers are 4 bytes long, would have the first element at address ‘0’, the second at address 4, the third at address 8 and so on. So when you want the element referenced by the index 3 you can just multiply 3 * 4 and get the address of the 3rd element.

Multi-Dimensional Array



A multi-dimensional array is simply an array of arrays. A spreadsheet is a form of multi-dimensional array. The diagram above shows a 3 element array where each element is also a 3 element array.

As mentioned above a multi-dimensional array can be useful in implementing a spreadsheet, where the first array is an array of rows and each element is an array of columns. They can be used for many things from tracking pieces in a chess game to drawing graphics on your screen and much more.



C

An array in C can contain elements of only one type/size and is not resizable.

Create an array:
int a[3] = {5, 10, 15};

Access an array element:
a[1] ==> 10

Python

An array in Python is called a list. It can contain elements of any type/size and is resizable.

Create an array:
a = [5, 10, 15]

Access an array element:
a[1] ==> 10

Scheme

An array in Scheme is called a vector. It can contain elements of any type/size and may be resizable.

Create an array:
(define a (vector 5 10 15))
(define b #(5 10 15))

Access an array element:
(vector-ref a 1) ==> 10

2009/12/10

What In The Hell Are Linked Lists?

What In The Hell Are Linked Lists?


Singly-Linked Lists


This is the first in a series of articles describing programming concepts. This series assumes you have some basic experience in programming, but attempts to describe these concepts as simplistically as possible. This series will mainly use Python as the language of choice for examples. This is not because of any perceived superiority of Python over other languages, rather Python is basically executable psudo-code so we can express the concepts simply and give you a chance to play with them inside an interpreter.

A linked list is one of the simplest data structures, at its heart it is simply a two element node (or cell). The first slot contains the data and the second slot contains a pointer to the next node. A simple illustration demonstrates this concept more concretely...


The graphic above represents the list (1, 2, 3). Each node in the list has two parts, data and pointer. The last node has a special symbol 'Nil' to tell both us and the computer that we have reached the end of the list. The python code for each node in the list is below...

class Node:

def __init__(self, data=None, next=None):
self.data = data
self.next = next


def __repr__(self):
return repr(self.data)


To make a list out of these nodes we would need to make a node for each element in the list (1, 2 and 3) and link them together.

>>> a = Node(1, Node(2, Node(3, None)))
>>> print a
1
>>> print a.next
2
>>> print a.next.next
3
>>> print a.next.next.next
None

Notice the nesting of the calls to Node above. Linked lists are an inherently nested data structure. To construct the list without having to nest the calls to Node we would have to work backwards. We have to do this because if we try to create the first node (1) there is not currently a second node to which we can point to.

c = Node(3, None)
b = Node(2, c)
a = Node(1, b)

The graphic below shows another way to look at Nodes that reflects their nested nature.


Programming languages like Lisp and Scheme make heavy use of linked lists. Consequently they have a rich library of functions for dealing with them. The code below shows how you create a linked list in Scheme using the 'cons' function. 'cons' is short for construct and the nodes are called 'cons cells'.

> (define a (cons 1 (cons 2 (cons 3 '()))))
> a
(1 2 3)

Now that we have created a list with cons (note its similarity to the Python example of Node) we can use the built-in procedures car and cdr to access elements of that list.

> (car a)
1
> (cdr a)
(2 3)
> (car (cdr a))
2
> (cdr (cdr a))
(3)
> (car (cdr (cdr a)))
3
> (cdr (cdr (cdr a)))
()

Most Scheme implementations usually define shorthand functions for performing these operations, like cadr and caddr (and many more). Many also provide more descriptive names such as first, rest, second, third...

> (cadr a)
2
> (caddr a)
3
> (cddr a)
(3)
> (first a)
1
> (rest a)
(2 3)
> (second a)
2
> (third a)
3

Constructing linked lists and accessing their elements is usually most elegantly accomplished using recursion. The Scheme code below shows how we would recursively construct a list of integers, we have to do it backward to avoid a call to (reverse).

> (define (make-list-of-integers start stop)
(let loop ((s stop) (aList '()))
(if (< s start)
aList
(loop (- s 1) (cons s aList)))))
> (make-list-of-integers 0 5)
(0 1 2 3 4 5)



Circular Linked Lists


A circular list is a linked-list where the last element's pointer is set to the first element of the list. There are many uses for circular lists (EDIT: list some uses).

The graphic below shows the nodes of a circular list.




Using the Node class that we defined above we can create a circular list in Python as follows...

> a = Node(1)
> a.next = Node(2, Node(3, a))
> a.next
2
> a.next.next
3
> a.next.next.next
1
> a.next.next.next.next
2

The Scheme code for defining a circular list is below...

> (define a (cons 1 (cons 2 (cons 3 '()))))
> (set-cdr! (cddr a) (car a))
> a
(1 2 3 . 1)



Doubly-Linked Lists


Usually the term linked list refers to singly-linked lists as we have implemented them above. However there are other ways of implementing linked lists as well. The next most common form of linked list is the doubly-linked list. This is very similar to singly-linked lists except that each node in the list also contains a pointer to the previous node as shown in the graphic below.


Below is the Python code to implement doubly-linked lists.

class dNode:
def __init__(self, prev=None, data=None, next=None):
self.prev = prev
self.data = data
self.next = next

def __repr__(self):
return str(self.data)

> a = dNode(data=1)
> b = dNode(data=2)
> c = dNode(data=3)
> d = dNode(data=4)
> a.next = b
> b.prev, b.next = a, c
> c.prev, c.next = b, d
> d.prev = c

> a
1
> a.next
2
> a.next.next
3
> a.next.next.prev
2
> a.next.next.next.prev
3

To make the doubly-linked list easier to use we would have to add some helper functions. As you can see it is fairly difficult to create a doubly-linked list, so we'll define an append function here that will make creating one a little easier.

def dAppend(dLinkedList, nodeToAppend):
if dLinkedList.next == None:
dLinkedList.next = nodeToAppend
nodeToAppend.prev = dLinkedList
else:
dAppend(dLinkedList.next, nodeToAppend)

> a = dNode(data=1)
> dAppend(a, dNode(data=2)
> dAppend(a, dNode(data=3)
> a
1
> a.next
2
> a.next.next
3
> a.next.next.prev
2
> a.next.prev
1

The simplest way to achieve this is through a recursive function as defined above. However Python does not perform tail-call optimization and will either run out of memory or reach its recursion limit (default is 1000) on large lists. Below is a non-recursive version of the same function.

def dNodeAppend(dLinkedList, nodeToAppend):
node = dLinkedList
while True:
if node.next == None:
node.next = nodeToAppend
nodeToAppend.prev = node
break
else:
node = node.next

>>> a = dNode(data=1)
>>> dNodeAppend(a, dNode(data=2))
>>> dNodeAppend(a, dNode(data=3))
>>> dNodeAppend(a, dNode(data=4))
>>> a
1
>>> a.next
2
>>> a.next.next
3
>>> a.next.next.next
4
>>> a.next.next.next.prev
3
>>> a.next.next.prev
2
>>> a.next.prev
1






Links

How to Think Like A Computer Scientist - Chapter 17 - Linked Lists
This chapter describes a way of implementing linked lists in Python. It also covers circular lists.
http://greenteapress.com/thinkpython/html/chap17.html

An Introduction to Scheme and its Implementation - Lists
http://www-pu.informatik.uni-tuebingen.de/users/sperber/pfg-2001/scheme/schintro-v14/schintro_93.html

Wikipedia Article on Linked Lists
http://en.wikipedia.org/wiki/Linked_list