Binary Search Trees (BST)

23.01

•	The elements in a BST are characterized by the values of their key 
	and we assume that any key value that occurs occurs at most once.
	This assumption guarantees that the elements are uniquely 
	characterized by the value of their keys. 
•	The nodes in a BST, say T = [L:E:R], are ordered according to the
	following principle:
	•	all nodes in L have keys smaller than the key of root element E
	•	all nodes in R have keys larger than the key of root element E
	•	and this ordering property holds for all nodes in the tree.

Thus, for any node, characterized by a key value K, it must be true 
that all descendants on the left have keys smaller than K, and 
that all descendants on the right have key values larger than K.

An operation like getData(k, T) then satisfies
	getData(k, []) = Error(Empty tree)
	getData(k, [L:E:R]) = 
	   if k < key(E) then getData(k, L)
	   if k = key(E) return(theDataOf(E))
	   if k > key(E) then getData(k, R)

API



23.01

TABLE OF CONTENTS HC 23

23.02

 

insert BST

23.02

The operations create and construct are now extended by insert- and delete 
operations that are adapted to the BST properties (and preserve BST!!)

The insert operation takes an element F with given key value K and a tree T
and inserts the element at the proper location in the tree, preserving the 
properties of BST. If an element with key K already exists then no action! Or
the frequency field of the treenode is increased by 1.
This proper location is found in the following way. If the element F to be 
inserted has a key with value K, this key is compared with the key of the root 
of the tree. Then: 
•	if K < the root key, element F must be inserted in subtree L, 
and conversely, 
•	if K > the root key, element F must be inserted in subtree R. 
Hence:

	insert(F, []) = [[]:F:[]]
	insert(F, [L:E:R])  = 
	   if key(F) < key(E) then [insert(F,L):E:R]
	   if key(F) = key(E) then no action or increase freq
	   if key(F) > key(E) then [L:E:insert(F,R)]

API



23.01

TABLE OF CONTENTS HC 23

23.03

 

delete BST (1)

23.03

The delete operation takes a given key value K and a tree T and intends to 
delete the element with this key from the tree, preserving BST properties.
Of course, if no such element exists, nothing will be deleted.

	delete(K, []) = no action (element is not present)
	delete(K, [L:E:R])	= 
	   if K < key(E) [delete(K, L):E:R]
	   if K = key(E) element found: actual delete follows
	   if K > key(E) [L:E:delete(K, R)]

The actual delete operation that must be executed at the moment that the
element with the given key value has been found, is somewhat complicated. 
Cases: the node that must be deleted has 0 children, 
a left child only, a right child only, or it has two 2 children.
 
	delete([[ ]:E:[ ]])  = [ ]
	delete([ L :E:[ ]])  =  L
	delete([[ ]:E: R] )  =  R
	delete( [L :E: R] )  =  ?

The first three cases are simple. 

API



23.02

TABLE OF CONTENTS HC 23

23.04

 

delete BST (2)

23.04

 
but the fourth case, where the node that must be deleted is decorated
with two non-empty subtrees, requires a little thinking.

•	In order to preserve the order of the key values, we might replace the 
	element that must be deleted by its successor or by its predecessor. 
	Its successor is the leftmost node in its right subtree, and 
	its predecessor is the rightmost node in its left subtree. 
	Stated otherwise:
	•	the leftmost node in the right subtree of is the node with the
		smallest key larger than the key of the node that must be deleted, 
		and
	•	the rightmost node in the left subtree of is the node with the 
		largest key smaller than the key of the node that must be deleted.

This observation leads to two equivalent strategies. 
Let us concentrate on the successor of the element that must be deleted 
and call this successor min(R) (minimal key on the right subtree). 
The following tree transformation must now be performed:

	[L:E:R] => [L:min(R):delete(min(R),R)]

API



23.03

TABLE OF CONTENTS HC 23

23.05

 

Auxiliary functions

23.05

This transformation first deletes the leftmost node in the right subtree R
ie node min(R). What remains from R is a tree R' used in  
	[L:min(R):R'].
Thus
	delete([L:E:R])  = construct(L, min(R), delete(min(R),R))
A completely equivalent transformation is given by
	delete([L:E:R])  = construct(delete(max(L),L),max(L),R))

Some auxiliary functions:

	eltPtr createElt(keyType k, dataPtr pd);
	eltPtr createElt (keyType k, dataPtr pd) {
		eltPtr pe = (eltPtr)malloc(sizeof(eltType));
		pe->key = k; pe->data = pd;
		return(pe);
	}
	treePtr constructTree(tree lhs, eltType e, tree rhs);
	treePtr  constructTree (tree lhs, eltType e, tree rhs) {
		treePtr t = newTreeNode(e);
		t->lhs = lhs; t->rhs = rhs;
		return(t);
	}

API



23.04

TABLE OF CONTENTS HC 23

23.06

 

Interface for BST (1)

23.06



tree createTree();
/*
//	pre    true
//	action create a new empty tree and return this tree
//	res    a new empty tree exists
*/
bool isEmptyTree(tree t);
/*
//	pre    t exists
//	action determine whether tree t is empty or not
//	res    true is tree t is empty otherwise false
*/
tree insert(eltType e, tree t);
/*
//	pre    t exists and e is well-defined
//	action insert e into t and preseve BST properties
//	res    the new tree is BST and has been returned
*/

API



23.05

TABLE OF CONTENTS HC 23

23.07

 

Interface for BST (2)

23.07


tree delete(keyType k, tree t);
/*
//	pre    t exists and k is well-defined
//	action remove node with key k from tree t and preserve BST
//	       if no node with key k exists then no action
//	res    the new tree is BST and has been returned
*/
dataPtr getData(keyType k, tree t);
/*
//	pre    t exists and k is well-defined
//	action find node with key k in tree t and return its data
//	       if no node with key k exists then NULL is returned
//	res    acccording to action
*/
void setData(keyType k, dataPtr pd, tree t);
/*
//	pre    t exists and k is well-defined
//	action find node with key k in tree t and update its data
//	       if no node with key k exists then no action
//	res    according to action
*/

API



23.06

TABLE OF CONTENTS HC 23

23.08

 

Interface for BST (3)

23.08


void preOrderTraversal(tree t);
/*
//	pre    t exists
//	action visit the nodes in pre order (and print key values)
//	res    true
*/
void inOrderTraversal(tree t);
/*
//	pre    t exists
//	action visit the nodes in in order (and print key values)
//	res    true
*/
void postOrderTraversal(tree t);
/*
//	pre    t exists
//	action visit the nodes in post order (and print key values)
//	res    true
*/

API



23.07

TABLE OF CONTENTS HC 23

23.09

 

Traversals

23.09


tree processTree(tree t, eltType f(eltType));
/*
//	pre    t exists
//	action visit the nodes and process the elements using f
//	res    the processed tree is returned
*/
Only the different traversal functions are really new. 
The hierarchical structure leads to three systematic ways 
in which the nodes of the tree, [lhs:root:rhs] , can be visited. 

•	preOrderTraversal:
	•	visit the root 
	•	visit the nodes of the subtree lhs in pre-order 
	•	visit the nodes of the subtree rhs in pre-order 
•	inOrderTraversal:
	•	visit the nodes of the subtree lhs in in-order 
	•	visit the root 
	•	visit the nodes of the subtree rhs in in-order 
•	postOrderTraversal:
	•	visit the nodes of the subtree lhs in post-order 
	•	visit the nodes of the subtree rhs in post-order 
	•	visit the root 

API



23.08

TABLE OF CONTENTS HC 23

23.10

 

getPointerToNode()

23.10

To get an idea how to implement the operations from the interface
we will first discuss a general approach. 
Functions like insert, delete, getData and setData all need to find 
the location where the action must take place. 
This location is found by comparing keys, repeatedly. 
Consider a function like

	treePtr getPointerToNode(keyType k, treePtr t);

Compare k with kt  the key of the elt stored in the node pointed at by t
and then decide:

•	if (k < kt) continue the search in the lhs-subtree of t
•	if (k == kt) the node has been found  
•	if (k > kt) continue the search in the rhs-subtree of t 

and, if the search ends with t == NULL 
then no node exists with key equal to k.

API



23.09

TABLE OF CONTENTS HC 23

23.11

 

getPointerToNodeRec()

23.11


There are two basic ways to implement this function. 
I discuss the recursive and the iterative approach.

•	recursive implementation: getPointerToNodeRec
	this function returns p such that p->elt->key is equal to k
	or NULL if no node exists that stores an element with key k


	treePtr getPointerToNodeRec(keyType k, treePtr t) {
	   if (t)
	     switch(sign(k - t->elt->key)) {
	       case -1: return(getPointerToNodeRec(k, t->lhs));
	       case  0: return(t);
	       case +1: return(getPointerToNodeRec(k, t->rhs));
	     }
	  return(NULL);
	}

API



23.10

TABLE OF CONTENTS HC 23

23.12

 

getPointerToNodeIter()

23.12

iterative implementation: getPointerToNodeIter

	treePtr getPointerToNodeIter(keyType k, treePtr t) {
	  while (t)
	    switch(sign(k - t->elt->key)) {
	      case -1: t = t->lhs; break;
	      case  0: return(t);
	      case +1: t = t->rhs; break;
	    }
	  return(NULL);
	}

NB:	the code while(t) ... is shorthand for while (t != NULL) ...
	     and if (!t) ...  is shorthand for if (t == NULL) ...
	               
Of course you might have decided to write void-returning functions 
that export their result by reference:

	void getPointerToNode(keyType k, treePtr* t);

The code given above must be modified to deal with this decision.

API



23.11

TABLE OF CONTENTS HC 23

23.13

 

Intermezzo: implementation details

23.13

An interesting question is the following: 
Is it possible to use getPointerToNode in the implementation of

	tree insert(eltPtr pe, tree t);
say
	tree insert(eltPtr pe, tree t) {
	  treePtr p = getPointerToNode(pe->key, t);
	  if (p)         // element with key pe->key present 
	    (p->freq)++;
	  else          // element with key pe->key not yet present
	    p = newTree(NULL, pe, NULL);
	  return(t);
	}

No, this is not possible! Try to find out why not!
Experimental programming: replace the return stament by return(p);
and now you get a tree with only 1 element! Whew!
Indeed, what you need is the future parent! 
Remember lists (HC22.23,24,32: you need last to insert a new last node!) 

API



23.12

TABLE OF CONTENTS HC 23

23.14

 

getAddressOfPointerToNode()

23.14

Even better, the address of the future location of the new node!
For syntax see:  HC23.29  
This implies that you must use a function that returns the address of
the future location if you want to use this function to implement insert:

	treePtr* getAddressOfPointerToNode(keyType k, treePtr t);
	treePtr* getAddressOfPointerToNode(keyType k, treePtr t) {
	  treePtr *address = &t;
	  while (*address)
	    switch(sign(k - (*address)->elt->key)) {
	      case -1: address = &(*address)->lhs; break;
	      case  0: return(address);
	      case +1: address = &(*address)->rhs; break;
	    }
	  return(address);
	}

API



23.13

TABLE OF CONTENTS HC 23

23.15

 

insert BST again

23.15

And now insert can be implemented using getAddressOfPointerToNode:
 

	tree insert(eltPtr pe, tree t);
	tree insert(eltPtr pe, tree t) {
	  treePtr *location = getAddressOfPointerToNode(pe->key, t);
	  if (*location)          // element already present
	    ((*location)->freq)++;
	  else                    // insert the element
	    *location = newTree(NULL, pe, NULL);
	  return(t);
	}

It is a useful and rewarding exercise to rewrite the two functions given above
as procedures (void returning functions that export their result by reference):

	void getAddressOfPointerToNode(keyType k, treePtr *t);
	void insert(eltPtr pe, tree *t);

See  SRC  after PR22 has passed. 

API


SRC

23.14

TABLE OF CONTENTS HC 23

23.16

 

delete from BST (1)

23.16

Above (sheets 23.03-05) we have seen that deletion from a BST
requires some care if the node to be deleted has two children.

Consider a bst t with non-empty subtrees t->lhs and t->rhs
and let kt denote the key of the element stored in the node pointed at by t.
Observation:
•	the largest key < kt is found in the leftmost node on the rhs
	and this node has less than 2 children by definition!!
and
•	the smallest key > kt is found in the rightmost node on the lhs
	and this node has less than 2 children by definition!!
	
Now, if node t is to be deleted, you might proceed as follows:

•	determine pointer p' to the leftmost node on the rhsswap the elements of p' and tdelete node p' (this node has less than 2 children)
or
•	determine pointer p" to the rightmost node on the lhsswap the elements of p" and tdelete node p" (this node has less than 2 children)

API



23.15

TABLE OF CONTENTS HC 23

23.17

 

delete from BST (2)

23.17

In my implementation I will use p' 
if the lhs subtree of t is larger in size than the rhs subtree, 
otherwise I will use p". Why? I will try to keep subtrees balanced in size.

From these considerations it follows that some auxiliary functions 
might be of help in the implementation of the delete operation:

	int theNumberOfChildren(treePtr t);
	int theNumberOfChildren(treePtr t) {
	  return(((t->lhs) ? 1 : 0) + ((t->rhs) ? 1 : 0));
	}
	treePtr lmn(treePtr t); // ptr to leftmost node in tree t
	treePtr lmn(treePtr t)  {
	  return((!t->lhs) ? t : lmn(t->lhs));
	}
	treePtr rmn(treePtr t); // ptr to rightmost node in tree t
	treePtr rmn(treePtr t)  {
	  return((!t->rhs) ? t : rmn(t->rhs));
	}

API



23.16

TABLE OF CONTENTS HC 23

23.18

 

delete from BST (3)

23.18


treePtr transform(treePtr t); // from 2 to 1 or 0 children
treePtr transform(treePtr t) {
  int sL = theSizeOf(t->lhs),
      sR = theSizeOf(t->rhs);
  treePtr p = (sL < sR) ? lmn(t->rhs) : rmn(t->lhs);
  eltPtr pe;
  pe = p->elt; p->elt = t->elt; t->elt = pe;
  return(p);
}
treePtr del(treePtr t);  // node with less than 2 children
treePtr del(treePtr t) {
  treePtr p = t;
  t = (p->lhs) ? p->lhs : p->rhs; // link t to child (if any) 
  free(p->elt);                   // release the element 
  free(p);                        // release the node
  return(t);
}

Using these auxiliary functions it is not difficult to write the delete.

API



23.17

TABLE OF CONTENTS HC 23

23.19

 

BST, QST and OST

23.19

Generalizations of BST (bintree) to QST (quadtree) and OST (octtree).
The BST has been based on a domain of keyvalues. The key of the root,
say k, divides this domain into two parts: 
•	the lhs subtree contains keys smaller than k
•	the rhs subtree contains keys larger  than k.

The idea can be immediately generalized to 2-dimensional keys say [x,y]. 
Then the key [k',k"] of the root element divides this domain into 
4 parts (x smaller or larger than k', and also y smaller or larger than k". 
These 4 parts are called quadrants. 
Each treenode now has 4 subtrees, one for each of the quadrants, and 
the tree is called a QST(quad search tree).
Similarly, 3-dimensional keys [k,k',k"] are from a domain of search of 
the form [x,y,z] and a specific key [k,k',k"] divides this domain into 
8 octants. 
Each treenode now has 8 subtrees, one for each of the octants, and 
the tree is called an OST(oct search tree).

Modifying the interface and the implementation of BST to QST and OST is
not difficult although it takes some time from setup to endoftesting.

API



23.18

TABLE OF CONTENTS HC 23

23.20

 

General Tree (1)

23.20

The general tree is a tree in which each parent (node) can have 
an arbitrary number of children (subtrees). Each node must be able 
to accomodate as many pointers as it has children:
•	If the number of children is known a priori, then 
	the treenode can be designed as an array of treepointers. 
•	Here, however we are interested in an arbitrarily number of children 
	and  have to invent other ways to represent such a structure. 

Again we design a treenode that only contains two pointers. 
Let the treenode itself be called parent,
•	its first pointer is called lmc and points to the leftmost child,
•	its second pointer is called sib(ling) and this pointer refers 
	to a brother or sister of the parent.
•	Thus the relation between the pointers is that between 
	nephew and uncle or nephew and aunt. 
•	Following sibling-links you visit all brothers and sisters 
	of the parent implying that all children of the parent can be found 
	by first going to the lmc (left most child) and then following the 
	sibling links of this child.

API



23.19

TABLE OF CONTENTS HC 23

23.21

 

General Tree (2)

23.21

In many problems it is advantageous to be able to move from any node to its 
parent. Therefore the treenodes are further decorated by a parent field. 
These parent links are not shown in the picture that follows!
Declaration of General Tree with header node:

	typedef struct treeNode* treePtr; 
	typedef struct treeNode {
	  eltPtr elt;
	  treePtr lmc;    // pointer to leftmost child
	  treePtr sib;    // pointer to sibling 
	  treePtr parent; // pointer to the parent 
	} treeNode;

	typedef struct treeHeader* tree;
	typedef struct treeHeader {
	  treePtr root; // pointer to the root of the tree
	  treePtr  cur; // pointer to the current node of interest 
	  int     size; // number of nodes currently present 
	} treeHeader;

	tree t;  // t is a variable of type tree 
	         // t is a pointer to the header node

API



23.20

TABLE OF CONTENTS HC 23

23.22

 

General Tree (3)

23.22



 
NB:	The lcm pointers are drawn down to the left,
	the sibling pointers are drawn horizontally,
	and the parent pointers are not shown.

Points of interest are the following:
•	The general tree is a binary tree! since each node has 2 essential pointers. 
•	Just rotate the picture over 45 degrees, clockwise!
•	With 2 pointer fields we have constructed DLL, BST and GENTREE!

API



23.21

TABLE OF CONTENTS HC 23

23.23

 

Look back

23.23

Let us look back for a single moment what we have achieved by playing 
around with the datastructure toolkit.

•	The simplest self-referential structure was constructed from the most 
	primitive parts, called listnodes. 
	Each listnode has the capability to store the address of another listnode 
	and in that way we were able to build our first datastructure: SLL
•	This type of list has some properties that restrict its usefulness. 
	In particular the list can only be traversed easily in one direction only. 
	Restrictions were removed by the introduction of nodes that contain 
	two pointers. These pointers have been interpreted in several ways:
	•	the notion of prev and next lead us to the design of DLL
	•	the notion of lhs and rhs lead us to the design of BST, and 
	•	the introduction of lmc and sib made it possible to construct 
		completely General Trees.

The datastructure toolkit can be used in many more ways, some of which lead 
to sophisticated datastructures that have many desired properties concerning 
structured organization of information. Moreover the datastructures can often 
be designed in ways that support excellent performance of many useful 
operations such as insertion, removal, updating, retrieving and searching. 

API



23.22

TABLE OF CONTENTS HC 23

23.24

 

Table (1)

23.24

 
A table is essentially an array of lists (often called hashtable).



The declaration can be written from this picture.

API



23.23

TABLE OF CONTENTS HC 23

23.25

 

Table (2)

23.25

Or it is designed as an array of binary search trees:



and again, the declaration can be written from this picture.

API



23.24

TABLE OF CONTENTS HC 23

23.26

 

List of trees (1)

23.26

A list of trees might be of interest:
Simply choose any list type and modify the listnode in order to store 
a key, a tree, and one or two links to connect to other listnodes. 
The key should be used to characterize the corresponding tree and for the 
tree you can choose between a bare tree or a tree with treeheader. 



The declaration of the trees can be taken from the previous section and 
the list is defined on the next sheet:

API



23.25

TABLE OF CONTENTS HC 23

23.27

 

List of trees (2)

23.27

 
	typedef struct listNode* listPtr;
	typedef struct listNode {
		keyType  key;
		tree       t;
		listPtr next;
	} listNode;

	typedef struct listHeader {
		listPtr first, cur, last;
		int     listSize;
	} listHeader;

	typedef listHeader* table;


The last item on the datastructure menu are trees of lists.
Here you start with treenodes that contain fields to store a key, a list, 
and two links to connect to the subtrees on the left and on the right. 
The key should be used to characterize the corresponding list and for the 
list you can choose between a bare list or a list with listheader. 
Draw a nice picture and write a complete declaration!

API



23.26

TABLE OF CONTENTS HC 23

23.28

 

Success!!

23.28


Some programs related to binary search trees can 
be found under SRC after PR22 has passed!

 
I wish you a lot of fun (and success) building 
dynamic datastructures in C 
during the courses OSN and Graphics.

API


SRC

23.27

TABLE OF CONTENTS HC 23

23.29

 

p->next and &(*p)->next

23.29

On complicated syntax:
•	the direct field-access operator .	
	if s is a struct with field f 
	then s.f stands for the value of field f
•	the indirect field-access operator ->
	if p is a pointer to a struct s with field f then *p stands for s and
	(*p).f stands for the value of field f (braces are needed!)
	and p->f is a shorthand with exactly the same meaning as (*p).f
•	the address operator & and the dereferencing operator *
	if v denotes any variable then &v stands for its address, and
	if p is a pointer to v 
	then the value of p is equal to &v and *p is equal to the value of v
•	a combination of &, . and ->
	if q denotes a pointer to a pointer p to a struct s with field f
	then the address of field f is given by &((*q)->f). Why?
	•	(*q)->f == p->f     since the value of q is p          == (*p).f   since p->f is shorthand for (*p).f
	*	        == s.f      since *p stands for s.
Then since -> binds stronger than & 
you can omit outer braces and get &(*q)->f.
 Back to HC23.14  

API



23.28

TABLE OF CONTENTS HC 23

23.30

 

xx

23.30

WEB BROWSERS

De HC's PRG2 nummers 16 tot en met 20 nog een keer doorgelopen
en de volgende pagina's gevonden die zich niet goed gedragen:

-	Explorer		18.11
-	Netscape		17.02 en 17.23
				18.08, 18.11 en 18.21
				19.10 en 19.26
				20.27
-	Omniweb 4.2		alle pagina's oke
-	Omniweb 4.5		foute tab setting
-	Safari			tabs 2 keer te groot
	                en ik kan niet vinden waar ik ze kan zetten
					

API



23.29

TABLE OF CONTENTS HC 23

23.31

 

xx

23.31


API



23.30

TABLE OF CONTENTS HC 23

23.32

 

xx

23.32


API



23.31

TABLE OF CONTENTS HC 23

23.01

 

TABLE OF CONTENT HC 23

TOC

01: Binary Search Trees (BST)
02: insert BST
03: delete BST (1)
04: delete BST (2)
05: Auxiliary functions
06: Interface for BST (1)
07: Interface for BST (2)
08: Interface for BST (3)
09: Traversals
10: getpointerToNode()
11: getPointerToNodeRec()
12: getPointerToNodeIter()
13: Intermezzo: implementation details
14: getAddressOfPointerToNode()
15: insert BST again
16: delete from BST (1)
17: delete from BST (2)
18: delete from BST (3)
19: BST, QST and OST
20: General Tree (1)
21: General Tree (2)
22: General Tree (3)
23: Look back ...
24: Table (1)
25: Table (2)
26: List of trees (1)
27: List of trees (2)
28: Success!!
29: p->next and &(*p)->next
30:
31:
32:

API



TOC