Week 7: Binary Search Trees, Sets (ADT) and Maps (ADT)
Objectives
- Learn about graph structures in a glimpse
- Specializing a graph into trees
- Specializing trees into binary trees
- Specializing trees into binary search trees (BST)
- Implementing set (ADT) using BST
Trees
In previous weeks, we have learned about:
- Arrays
- Linked Lists
- Queues
- Stacks
These structures are conceptually linear structures. However, there also exist very important kind of structures that are inherently non-linear (e.g trees and graphs). Trees are special case of graphs, where nodes and edges (links) do not form a cycle.
Tree Glossary
- Root: is the top node in the tree.
- Child: any node that is emerged from an upper node.
- Parent/Internal Node: node with at least one child.
- Siblings: nodes sharing the same parent.
- Leaf: node with no children.
- Edge: the link between two nodes (i.e a parent and its child).
- Path: the sequence of links and nodes to reach from one node to a descedant.
- Height of node: the number of links between a node and the furthest leaf.
- Depth of node: the number of links between a node and the root.
Synonyms
- Node = Vertex = Point
- Edge = Link = Arc
Violating Tree Structure
wikipedia
The following structures are not trees:
Graphs that are not trees | Why? |
---|---|
cycle B→C→E→D→B. B has more than one parent (inbound edge). |
|
undirected cycle 1-2-4-3. 4 has more than one parent (inbound edge). |
|
two non-connected parts, A→B and C→D→E. There is more than one root. |
Binary Search Trees (BST)
Binary trees is a special case of trees where each node can have at most 2 children. Also, these children are named: left child or right child. A very useful specialization of binary trees is binary search tree (BST) where nodes are conventionally ordered in a certain manner. By convention, the \(\text{left children} < \text{parent} < \text{right children}\), and this rule propagates recursively across the tree.
A binary search tree (BST) |
Specialization from graph->tree->BT->BST |
Motivation
Efficient search & insertion/deletion in logarithmic time $O(\log(n))$
- Arrays:
- (+) efficient search on sorted arrays $O(\log(n))$,
- (-) ineffiecient insertion/deletion $O(n)$.
- Linked lists:
- (-) inefficient search $O(n)$,
- (+) efficient insertion/deletion $O(1)$.
Intuition
- Tree combines the advantages of arrays and linked lists.
- The nature of BST (i.e being ordered) makes it potential for extensive applications. For example, implementing set (ADT).
Design and Implementation: Using Linked Structures
Tree structure allows us to use recursive routines to implement the basic operations. Think of each node in a tree as a separate standalone tree. Trees can be embedded in arrays or implemented as a linked nodes (i.e using pointers). In this tutorial, we will implement the BST using linked nodes. Any node in the tree (including the root) will be represented by the type BSTNode
that has the following C++ `struct:
struct BSTNode
{
int data;
BSTNode *left;
BSTNode *right;
};
Operations
Is Empty
bool isEmpty( BSTNode *tree )
{
return tree == nullptr;
}
Is Leaf
bool isLeaf( BSTNode *tree )
{
return tree->left == nullptr && tree->right == nullptr;
}
Size
int size( BSTNode *tree )
{
if ( !isEmpty( tree ))
return 1 + size( tree->left ) + size( tree->right );
else return 0;
}
Insertion
Insertion in BST, realize recursion. Author |
void insert( BSTNode *&tree, int data )
{
if ( isEmpty( tree ))
tree = new BSTNode{ data , nullptr , nullptr };
else
{
if ( data < tree->data )
insert( tree->left, data );
else insert( tree->right, data );
}
}
Search
Search in BST, realize recursion. Author |
bool find( BSTNode *tree, int data )
{
if ( isEmpty( tree ))
return false;
else
{
if ( data == tree->data )
return true;
else if ( data < tree->data )
return find( tree->left , data );
else return find( tree->right , data );
}
}
BST Traversal
Pre- In- Post-order Traversals |
---|
Made using: Khan Academy Computer Science |
Traversal: In-order
In-order Traversal in BST, realize recursion. Source |
void inorder( BSTNode *tree )
{
if( tree )
{
inorder( tree->left );
std::cout << "[" << tree->data << "]";
inorder( tree->right );
}
}
Traversal: Pre-order
Pre-order Traversal in BST, realize recursion. Source |
void preorder( BSTNode *tree )
{
if( tree )
{
std::cout << "[" << tree->data << "]";
preorder( tree->left );
preorder( tree->right );
}
}
Traversal: Post-order
Post-order Traversal in BST, realize recursion. Source |
void postorder( BSTNode *tree )
{
if( tree )
{
postorder( tree->left );
postorder( tree->right );
std::cout << "[" << tree->data << "]";
}
}
Traversal: Breadth-first
Breadth-first Traversal in BST. Created by Stephanie Wong |
Clear the whole tree
void clear( BSTNode *&tree )
{
if ( !isEmpty( tree ))
{
clear( tree->left );
clear( tree->right );
delete tree;
tree = nullptr;
}
}
Removal of element
Case | Action | Example |
---|---|---|
Case I: Node to be removed has no children | Simplest case. Algorithm sets corresponding link of the parent to NULL and deletes the node | remove( tree , -4 ) |
Case II: Node to be removed has one child | It this case, node is cut from the tree and algorithm links single child (with it’s subtree) directly to the parent of the removed node | remove( tree , 18 ) |
Case III: Node to be removed has two children | This is the most complex case. To solve it, (1) find a minimum value in the right subtree; (2) replace value of the node to be removed with found minimum. Now, right subtree contains a duplicate! (3) apply remove to the right subtree to remove a duplicate. |
|
Source: http://www.algolist.net/ |
void remove( BSTNode *&tree, int data )
{
if ( isEmpty( tree )) return;
if ( data == tree->data )
{
if ( !isEmpty( tree->left ) && !isEmpty( tree->right ))
{
BSTNode *minRight = minNode( tree->right );
tree->data = minRight->data;
remove( tree->right, minRight->data );
} else
{
BSTNode *discard = tree;
if ( isLeaf( tree ))
tree = nullptr;
else if ( !isEmpty( tree->left ))
tree = tree->left;
else
tree = tree->right;
delete discard;
}
} else if ( data < tree->data )
remove( tree->left, data );
else remove( tree->right, data );
}
Asbtract Data Types Built Upon BST
Set
We can harness the sorted property of BST to make efficient insertions and removals. Also, with a slight modification to the insert
function, we can make this function only insert unique values, hence our new set (ADT) will always contain unique values.
Operations
isEmpty
: same as BSTsize
: same as BSTcontains
: same asfind
of BSTremove
: same as BSTinsert
: slight modification ofbst::insert
, such that insertion is done when the element isn’t a duplicate of existing element, possible implementation:- use
contains
to check if element doesn’t already exist, - then, if the condition holds, use
bst::insert
, otherwise, do nothing.
- use
union
: given two sets $S_1$ and $S_2$ make a new data structure $S_3 = S_1 \cup S_2$, possible implementation:- make an empty set
S3
, - iterate over elements of
S1
inserting each element toS3
, and similarly forS2
.
- make an empty set
intersect
: given two sets $S_1$ and $S_2$ make a new data structure $S_3 = S_1 \cap S_2$, possible implementation:- make an empty set
S3
, - iterate over elements of
S1
inserting each element that also exists inS2
intoS3
.
- make an empty set
equals
: given two sets $S_1$ and $S_2$, check the equality of the two sets, possible implementation:- first, check that $S_1$ and $S_2$ sizes are equal,
- then, iterating in-order in parallel in both $S_1$ and $S_2$ to validate the equality of traversed elements.
Map
Synonyms: Associative containers, dictionary, symbol table.
A map is a collection of searchable key-value pairs, where each key has a value.
Example 1, we can have a map representing count of words in a page or textbook, so the key here is the word, while the value is the count of this word.
Example 2, for the function that counts characters in DNA:
int countCharacter( std::string dna, char query )
{
int count = 0;
for ( int i = 0; i < dna.size(); ++i)
{
if ( query == dna[i] )
++count;
}
return count;
}
int main( int argc, char **argv )
{
if( argc == 2 )
{
std::string dna = getDNA( argv[1] );
int countA = countCharacter( dna , 'A');
int countC = countCharacter( dna , 'C');
int countG = countCharacter( dna , 'G');
int countT = countCharacter( dna , 'T');
std::cout << "A:" << countA << std::endl
<< "C:" << countC << std::endl
<< "G:" << countG << std::endl
<< "T:" << countT << std::endl;
}
return 0;
}
This function was called four times (i.e to count A, C, G, and T). However, by using map data structure we can run this function to count all characters in a single run!
int main( int argc, char **argv )
{
if( argc == 2 )
{
std::string dna = getDNA( argv[1] );
char_map::CharMap dnaCounter = char_map::create();
for( int i = 0 ; i < dna.size() ; ++i )
map::value( dnaCounter , dna[i] )++;
char_map::printAll( dnaCounter );
}
return 0;
}
Implementing a Dictionary (i.e Map) Using BST
Map implementation using BST would be as easy as implementing a set using the concrete routines of BST.
create
: returns empty dictionary.isEmpty
: checks if dictionary is empty.size
: returns the size of the dictionary.insert
: inserts new key to the dictionary.remove
: remove an element by its key.at
: returns a reference to the value associated with a given dictionary. Crashes if the key is not found.value
: returns a reference to the value associated with a given key. If key not found, then insert a new key with the given key then returns a reference to the newly created value.contains
: checks if a key exists in the dictionary.
struct MapNode
{
std::string key;
int value;
MapNode *left;
MapNode *right;
};