| Binary Search Trees (BST) | 23.01 |
![]() |
• The elements in a BST are characterized by the values of their key and we assume that any key value that occurs occurs at most once. This assumption guarantees that the elements are uniquely characterized by the value of their keys. • The nodes in a BST, say T = [L:E:R], are ordered according to the following principle: • all nodes in L have keys smaller than the key of root element E • all nodes in R have keys larger than the key of root element E • and this ordering property holds for all nodes in the tree. Thus, for any node, characterized by a key value K, it must be true that all descendants on the left have keys smaller than K, and that all descendants on the right have key values larger than K. An operation like getData(k, T) then satisfies getData(k, []) = Error(Empty tree) getData(k, [L:E:R]) = if k < key(E) then getData(k, L) if k = key(E) return(theDataOf(E)) if k > key(E) then getData(k, R) |
API |
23.01 | | 23.02 |
| insert BST | 23.02 |
![]() |
The operations create and construct are now extended by insert- and delete operations that are adapted to the BST properties (and preserve BST!!) The insert operation takes an element F with given key value K and a tree T and inserts the element at the proper location in the tree, preserving the properties of BST. If an element with key K already exists then no action! Or the frequency field of the treenode is increased by 1. This proper location is found in the following way. If the element F to be inserted has a key with value K, this key is compared with the key of the root of the tree. Then: • if K < the root key, element F must be inserted in subtree L, and conversely, • if K > the root key, element F must be inserted in subtree R. Hence: insert(F, []) = [[]:F:[]] insert(F, [L:E:R]) = if key(F) < key(E) then [insert(F,L):E:R] if key(F) = key(E) then no action or increase freq if key(F) > key(E) then [L:E:insert(F,R)] |
API |
23.01 | | 23.03 |
| delete BST (1) | 23.03 |
![]() |
The delete operation takes a given key value K and a tree T and intends to delete the element with this key from the tree, preserving BST properties. Of course, if no such element exists, nothing will be deleted. delete(K, []) = no action (element is not present) delete(K, [L:E:R]) = if K < key(E) [delete(K, L):E:R] if K = key(E) element found: actual delete follows if K > key(E) [L:E:delete(K, R)] The actual delete operation that must be executed at the moment that the element with the given key value has been found, is somewhat complicated. Cases: the node that must be deleted has 0 children, a left child only, a right child only, or it has two 2 children. delete([[ ]:E:[ ]]) = [ ] delete([ L :E:[ ]]) = L delete([[ ]:E: R] ) = R delete( [L :E: R] ) = ? The first three cases are simple. |
API |
23.02 | | 23.04 |
| delete BST (2) | 23.04 |
![]() |
but the fourth case, where the node that must be deleted is decorated with two non-empty subtrees, requires a little thinking. • In order to preserve the order of the key values, we might replace the element that must be deleted by its successor or by its predecessor. Its successor is the leftmost node in its right subtree, and its predecessor is the rightmost node in its left subtree. Stated otherwise: • the leftmost node in the right subtree of is the node with the smallest key larger than the key of the node that must be deleted, and • the rightmost node in the left subtree of is the node with the largest key smaller than the key of the node that must be deleted. This observation leads to two equivalent strategies. Let us concentrate on the successor of the element that must be deleted and call this successor min(R) (minimal key on the right subtree). The following tree transformation must now be performed: [L:E:R] => [L:min(R):delete(min(R),R)] |
API |
23.03 | | 23.05 |
| Auxiliary functions | 23.05 |
![]() |
This transformation first deletes the leftmost node in the right subtree R ie node min(R). What remains from R is a tree R' used in [L:min(R):R']. Thus delete([L:E:R]) = construct(L, min(R), delete(min(R),R)) A completely equivalent transformation is given by delete([L:E:R]) = construct(delete(max(L),L),max(L),R)) Some auxiliary functions: eltPtr createElt(keyType k, dataPtr pd); eltPtr createElt (keyType k, dataPtr pd) { eltPtr pe = (eltPtr)malloc(sizeof(eltType)); pe->key = k; pe->data = pd; return(pe); } treePtr constructTree(tree lhs, eltType e, tree rhs); treePtr constructTree (tree lhs, eltType e, tree rhs) { treePtr t = newTreeNode(e); t->lhs = lhs; t->rhs = rhs; return(t); } |
API |
23.04 | | 23.06 |
| Interface for BST (1) | 23.06 |
![]() |
tree createTree();
/*
// pre true
// action create a new empty tree and return this tree
// res a new empty tree exists
*/
bool isEmptyTree(tree t);
/*
// pre t exists
// action determine whether tree t is empty or not
// res true is tree t is empty otherwise false
*/
tree insert(eltType e, tree t);
/*
// pre t exists and e is well-defined
// action insert e into t and preseve BST properties
// res the new tree is BST and has been returned
*/
|
API |
23.05 | | 23.07 |
| Interface for BST (2) | 23.07 |
![]() |
tree delete(keyType k, tree t);
/*
// pre t exists and k is well-defined
// action remove node with key k from tree t and preserve BST
// if no node with key k exists then no action
// res the new tree is BST and has been returned
*/
dataPtr getData(keyType k, tree t);
/*
// pre t exists and k is well-defined
// action find node with key k in tree t and return its data
// if no node with key k exists then NULL is returned
// res acccording to action
*/
void setData(keyType k, dataPtr pd, tree t);
/*
// pre t exists and k is well-defined
// action find node with key k in tree t and update its data
// if no node with key k exists then no action
// res according to action
*/
|
API |
23.06 | | 23.08 |
| Interface for BST (3) | 23.08 |
![]() |
void preOrderTraversal(tree t);
/*
// pre t exists
// action visit the nodes in pre order (and print key values)
// res true
*/
void inOrderTraversal(tree t);
/*
// pre t exists
// action visit the nodes in in order (and print key values)
// res true
*/
void postOrderTraversal(tree t);
/*
// pre t exists
// action visit the nodes in post order (and print key values)
// res true
*/
|
API |
23.07 | | 23.09 |
| Traversals | 23.09 |
![]() |
tree processTree(tree t, eltType f(eltType)); /* // pre t exists // action visit the nodes and process the elements using f // res the processed tree is returned */ Only the different traversal functions are really new. The hierarchical structure leads to three systematic ways in which the nodes of the tree, [lhs:root:rhs] , can be visited. • preOrderTraversal: • visit the root • visit the nodes of the subtree lhs in pre-order • visit the nodes of the subtree rhs in pre-order • inOrderTraversal: • visit the nodes of the subtree lhs in in-order • visit the root • visit the nodes of the subtree rhs in in-order • postOrderTraversal: • visit the nodes of the subtree lhs in post-order • visit the nodes of the subtree rhs in post-order • visit the root |
API |
23.08 | | 23.10 |
| getPointerToNode() | 23.10 |
![]() |
To get an idea how to implement the operations from the interface we will first discuss a general approach. Functions like insert, delete, getData and setData all need to find the location where the action must take place. This location is found by comparing keys, repeatedly. Consider a function like treePtr getPointerToNode(keyType k, treePtr t); Compare k with kt the key of the elt stored in the node pointed at by t and then decide: • if (k < kt) continue the search in the lhs-subtree of t • if (k == kt) the node has been found • if (k > kt) continue the search in the rhs-subtree of t and, if the search ends with t == NULL then no node exists with key equal to k. |
API |
23.09 | | 23.11 |
| getPointerToNodeRec() | 23.11 |
![]() |
There are two basic ways to implement this function. I discuss the recursive and the iterative approach. • recursive implementation: getPointerToNodeRec this function returns p such that p->elt->key is equal to k or NULL if no node exists that stores an element with key k treePtr getPointerToNodeRec(keyType k, treePtr t) { if (t) switch(sign(k - t->elt->key)) { case -1: return(getPointerToNodeRec(k, t->lhs)); case 0: return(t); case +1: return(getPointerToNodeRec(k, t->rhs)); } return(NULL); } |
API |
23.10 | | 23.12 |
| getPointerToNodeIter() | 23.12 |
![]() |
• iterative implementation: getPointerToNodeIter treePtr getPointerToNodeIter(keyType k, treePtr t) { while (t) switch(sign(k - t->elt->key)) { case -1: t = t->lhs; break; case 0: return(t); case +1: t = t->rhs; break; } return(NULL); } NB: the code while(t) ... is shorthand for while (t != NULL) ... and if (!t) ... is shorthand for if (t == NULL) ... Of course you might have decided to write void-returning functions that export their result by reference: void getPointerToNode(keyType k, treePtr* t); The code given above must be modified to deal with this decision. |
API |
23.11 | | 23.13 |
| Intermezzo: implementation details | 23.13 |
![]() |
An interesting question is the following: Is it possible to use getPointerToNode in the implementation of tree insert(eltPtr pe, tree t); say tree insert(eltPtr pe, tree t) { treePtr p = getPointerToNode(pe->key, t); if (p) // element with key pe->key present (p->freq)++; else // element with key pe->key not yet present p = newTree(NULL, pe, NULL); return(t); } No, this is not possible! Try to find out why not! Experimental programming: replace the return stament by return(p); and now you get a tree with only 1 element! Whew! Indeed, what you need is the future parent! Remember lists (HC22.23,24,32: you need last to insert a new last node!) |
API |
23.12 | | 23.14 |
| getAddressOfPointerToNode() | 23.14 |
![]() |
Even better, the address of the future location of the new node! For syntax see: HC23.29 This implies that you must use a function that returns the address of the future location if you want to use this function to implement insert: treePtr* getAddressOfPointerToNode(keyType k, treePtr t); treePtr* getAddressOfPointerToNode(keyType k, treePtr t) { treePtr *address = &t; while (*address) switch(sign(k - (*address)->elt->key)) { case -1: address = &(*address)->lhs; break; case 0: return(address); case +1: address = &(*address)->rhs; break; } return(address); } |
API |
23.13 | | 23.15 |
| insert BST again | 23.15 |
![]() |
And now insert can be implemented using getAddressOfPointerToNode: tree insert(eltPtr pe, tree t); tree insert(eltPtr pe, tree t) { treePtr *location = getAddressOfPointerToNode(pe->key, t); if (*location) // element already present ((*location)->freq)++; else // insert the element *location = newTree(NULL, pe, NULL); return(t); } It is a useful and rewarding exercise to rewrite the two functions given above as procedures (void returning functions that export their result by reference): void getAddressOfPointerToNode(keyType k, treePtr *t); void insert(eltPtr pe, tree *t); See SRC after PR22 has passed. |
API |
23.14 | | 23.16 |
| delete from BST (1) | 23.16 |
![]() |
Above (sheets 23.03-05) we have seen that deletion from a BST requires some care if the node to be deleted has two children. Consider a bst t with non-empty subtrees t->lhs and t->rhs and let kt denote the key of the element stored in the node pointed at by t. Observation: • the largest key < kt is found in the leftmost node on the rhs and this node has less than 2 children by definition!! and • the smallest key > kt is found in the rightmost node on the lhs and this node has less than 2 children by definition!! Now, if node t is to be deleted, you might proceed as follows: • determine pointer p' to the leftmost node on the rhs • swap the elements of p' and t • delete node p' (this node has less than 2 children) or • determine pointer p" to the rightmost node on the lhs • swap the elements of p" and t • delete node p" (this node has less than 2 children) |
API |
23.15 | | 23.17 |
| delete from BST (2) | 23.17 |
![]() |
In my implementation I will use p' if the lhs subtree of t is larger in size than the rhs subtree, otherwise I will use p". Why? I will try to keep subtrees balanced in size. From these considerations it follows that some auxiliary functions might be of help in the implementation of the delete operation: int theNumberOfChildren(treePtr t); int theNumberOfChildren(treePtr t) { return(((t->lhs) ? 1 : 0) + ((t->rhs) ? 1 : 0)); } treePtr lmn(treePtr t); // ptr to leftmost node in tree t treePtr lmn(treePtr t) { return((!t->lhs) ? t : lmn(t->lhs)); } treePtr rmn(treePtr t); // ptr to rightmost node in tree t treePtr rmn(treePtr t) { return((!t->rhs) ? t : rmn(t->rhs)); } |
API |
23.16 | | 23.18 |
| delete from BST (3) | 23.18 |
![]() |
treePtr transform(treePtr t); // from 2 to 1 or 0 children treePtr transform(treePtr t) { int sL = theSizeOf(t->lhs), sR = theSizeOf(t->rhs); treePtr p = (sL < sR) ? lmn(t->rhs) : rmn(t->lhs); eltPtr pe; pe = p->elt; p->elt = t->elt; t->elt = pe; return(p); } treePtr del(treePtr t); // node with less than 2 children treePtr del(treePtr t) { treePtr p = t; t = (p->lhs) ? p->lhs : p->rhs; // link t to child (if any) free(p->elt); // release the element free(p); // release the node return(t); } Using these auxiliary functions it is not difficult to write the delete. |
API |
23.17 | | 23.19 |
| BST, QST and OST | 23.19 |
![]() |
Generalizations of BST (bintree) to QST (quadtree) and OST (octtree). The BST has been based on a domain of keyvalues. The key of the root, say k, divides this domain into two parts: • the lhs subtree contains keys smaller than k • the rhs subtree contains keys larger than k. The idea can be immediately generalized to 2-dimensional keys say [x,y]. Then the key [k',k"] of the root element divides this domain into 4 parts (x smaller or larger than k', and also y smaller or larger than k". These 4 parts are called quadrants. Each treenode now has 4 subtrees, one for each of the quadrants, and the tree is called a QST(quad search tree). Similarly, 3-dimensional keys [k,k',k"] are from a domain of search of the form [x,y,z] and a specific key [k,k',k"] divides this domain into 8 octants. Each treenode now has 8 subtrees, one for each of the octants, and the tree is called an OST(oct search tree). Modifying the interface and the implementation of BST to QST and OST is not difficult although it takes some time from setup to endoftesting. |
API |
23.18 | | 23.20 |
| General Tree (1) | 23.20 |
![]() |
The general tree is a tree in which each parent (node) can have an arbitrary number of children (subtrees). Each node must be able to accomodate as many pointers as it has children: • If the number of children is known a priori, then the treenode can be designed as an array of treepointers. • Here, however we are interested in an arbitrarily number of children and have to invent other ways to represent such a structure. Again we design a treenode that only contains two pointers. Let the treenode itself be called parent, • its first pointer is called lmc and points to the leftmost child, • its second pointer is called sib(ling) and this pointer refers to a brother or sister of the parent. • Thus the relation between the pointers is that between nephew and uncle or nephew and aunt. • Following sibling-links you visit all brothers and sisters of the parent implying that all children of the parent can be found by first going to the lmc (left most child) and then following the sibling links of this child. |
API |
23.19 | | 23.21 |
| General Tree (2) | 23.21 |
![]() |
In many problems it is advantageous to be able to move from any node to its parent. Therefore the treenodes are further decorated by a parent field. These parent links are not shown in the picture that follows! Declaration of General Tree with header node: typedef struct treeNode* treePtr; typedef struct treeNode { eltPtr elt; treePtr lmc; // pointer to leftmost child treePtr sib; // pointer to sibling treePtr parent; // pointer to the parent } treeNode; typedef struct treeHeader* tree; typedef struct treeHeader { treePtr root; // pointer to the root of the tree treePtr cur; // pointer to the current node of interest int size; // number of nodes currently present } treeHeader; tree t; // t is a variable of type tree // t is a pointer to the header node |
API |
23.20 | | 23.22 |
| General Tree (3) | 23.22 |
![]() |
|
API |
23.21 | | 23.23 |
| Look back | 23.23 |
![]() |
Let us look back for a single moment what we have achieved by playing around with the datastructure toolkit. • The simplest self-referential structure was constructed from the most primitive parts, called listnodes. Each listnode has the capability to store the address of another listnode and in that way we were able to build our first datastructure: SLL • This type of list has some properties that restrict its usefulness. In particular the list can only be traversed easily in one direction only. Restrictions were removed by the introduction of nodes that contain two pointers. These pointers have been interpreted in several ways: • the notion of prev and next lead us to the design of DLL • the notion of lhs and rhs lead us to the design of BST, and • the introduction of lmc and sib made it possible to construct completely General Trees. The datastructure toolkit can be used in many more ways, some of which lead to sophisticated datastructures that have many desired properties concerning structured organization of information. Moreover the datastructures can often be designed in ways that support excellent performance of many useful operations such as insertion, removal, updating, retrieving and searching. |
API |
23.22 | | 23.24 |
| Table (1) | 23.24 |
![]() |
A table is essentially an array of lists (often called hashtable). |
API |
23.23 | | 23.25 |
| Table (2) | 23.25 |
![]() |
Or it is designed as an array of binary search trees: |
API |
23.24 | | 23.26 |
| List of trees (1) | 23.26 |
![]() |
A list of trees might be of interest: Simply choose any list type and modify the listnode in order to store a key, a tree, and one or two links to connect to other listnodes. The key should be used to characterize the corresponding tree and for the tree you can choose between a bare tree or a tree with treeheader. |
API |
23.25 | | 23.27 |
| List of trees (2) | 23.27 |
![]() |
typedef struct listNode* listPtr; typedef struct listNode { keyType key; tree t; listPtr next; } listNode; typedef struct listHeader { listPtr first, cur, last; int listSize; } listHeader; typedef listHeader* table; The last item on the datastructure menu are trees of lists. Here you start with treenodes that contain fields to store a key, a list, and two links to connect to the subtrees on the left and on the right. The key should be used to characterize the corresponding list and for the list you can choose between a bare list or a list with listheader. Draw a nice picture and write a complete declaration! |
API |
23.26 | | 23.28 |
| Success!! | 23.28 |
![]() |
Some programs related to binary search trees can be found under SRC after PR22 has passed! I wish you a lot of fun (and success) building dynamic datastructures in C during the courses OSN and Graphics. |
API |
23.27 | | 23.29 |
| p->next and &(*p)->next | 23.29 |
![]() |
On complicated syntax: • the direct field-access operator . if s is a struct with field f then s.f stands for the value of field f • the indirect field-access operator -> if p is a pointer to a struct s with field f then *p stands for s and (*p).f stands for the value of field f (braces are needed!) and p->f is a shorthand with exactly the same meaning as (*p).f • the address operator & and the dereferencing operator * if v denotes any variable then &v stands for its address, and if p is a pointer to v then the value of p is equal to &v and *p is equal to the value of v • a combination of &, . and -> if q denotes a pointer to a pointer p to a struct s with field f then the address of field f is given by &((*q)->f). Why? • (*q)->f == p->f since the value of q is p • == (*p).f since p->f is shorthand for (*p).f * == s.f since *p stands for s. Then since -> binds stronger than & you can omit outer braces and get &(*q)->f. Back to HC23.14 |
API |
23.28 | | 23.30 |
| xx | 23.30 |
![]() |
WEB BROWSERS De HC's PRG2 nummers 16 tot en met 20 nog een keer doorgelopen en de volgende pagina's gevonden die zich niet goed gedragen: - Explorer 18.11 - Netscape 17.02 en 17.23 18.08, 18.11 en 18.21 19.10 en 19.26 20.27 - Omniweb 4.2 alle pagina's oke - Omniweb 4.5 foute tab setting - Safari tabs 2 keer te groot en ik kan niet vinden waar ik ze kan zetten |
API |
23.29 | | 23.31 |
| xx | 23.31 |
![]() |
|
API |
23.30 | | 23.32 |
| xx | 23.32 |
![]() |
|
API |
23.31 | | 23.01 |