Heaps

  • Heap is a very useful data structure with many applications (e.g Heapsort and Priority Queues (ADT)). Heap elements are typically allocated as a dynamic array. However, the elements are conceptually forming a complete tree.
Max-Heap Logical Representation
heaptree
Max-Heap Storage
heapconcrete
Max-Heap
heap1
Creative Commons - Maxinator

Heap Applications

  • Sorting Algorithms (Heapsort)
  • The Shortest Path Problem (Dijkstra’s Algorithm)
  • Data Compression Algorithms (Huffman Tree)
  • Unsupervised Machine Learning (Agglomerative Clustering)

Glossary

  • Complete Tree: A balanced tree in which the distance from the root to any leaf is either $\lfloor \log(n) \rfloor$ or $\lfloor \log(n)-1 \rfloor$. source.

Operations

Insert

Inserting an element at the end of the heap typically violates the heap property (i.e each parent is greater than its children for max-heaps), so after each insertion we need to recover back the heap property.

The algorithm of insertion (source: wikipedia):

  1. Insert the new element to the bottom level of the heap.
  2. Compare the added element with its parent; if they are in the correct order, stop.
  3. If not, swap the element with its parent and repeat step (2) recursively.
Example: Insert 15
Steps Layout
We first place the new element 15 in the position marked by the X as a leaf.
However, the heap property is violated since 15 > 8, so we need to swap the 15 and the 8
The heap property is still violated since 15 > 11, so we need to swap again
Source: wikipedia  

Extract

In a similar way to insertion, when we pop the maximum of max-heap (its root), we violate the heap property by replacing the last children in the heap to be the new root. In order to recover the heap property, we use the following procedures (source: wikipedia):

  1. Replace the root of the heap with the last element on the last level.
  2. Compare the new root with its children; if they are in the correct order, stop.
  3. If not, swap the element with one of its children and repeat step (2). (Swap with its smaller child in a min-heap and its larger child in a max-heap.)
Steps Layout
Extract element 11.
11 is replaced by the the left-most leaf 4.
Heap property is violated (8 is greater than 4). Swapping the two elements 4 and 8 is enough to recover the heap.
Source: wikipedia  

Min-heap Implementation Using Arrays

Implementation: Class Members

template< typename T >
class Heap
{
public:
    size_t size() const
    {
        // Return heap size
    }

    void insert(T value)
    {
        // 1. Insert as leaf
        // 2. Recover heap properties
    }

    T extract()
    {
        // 1. Extract the root
        // 2. Recover heap properties
    }
private:
    // Private methods
private: 
    // Private data members
};

Implementation: Storage Array

template< typename T >
class Heap
{
public:
    // Return heap size
    size_t size() const {}

    // 1. Insert as leaf
    // 2. Recover heap properties
    void insert(T value){}

    // 1. Extract the root
    // 2. Recover heap properties
    T extract(){}
private:
    // Private methods
private: 
    // Private data members
    std::vector< T > data;
};

Implementation: From Parent Index to Child Index + Vice versa

template< typename T >
class Heap
{
public:
    // Return heap size
    size_t size() const {}

    // 1. Insert as leaf
    // 2. Recover heap properties
    void insert(T value){}

    // 1. Extract the root
    // 2. Recover heap properties
    T extract(){}
private:
    // Private methods
    static size_t leftChildIdx(size_t parent)
    {
        return parent * 2 + 1;
    }

    static size_t rightChildIdx(size_t parent)
    {
        return parent * 2 + 2;
    }

    static size_t parentIdx(size_t child)
    {
        if (child % 2 == 1)
            return (child - 1) / 2;
        else
            return (child - 2) / 2;
    }
private: 
    // Private data members
    std::vector< T > data;
};

Implementation: Heap size

template< typename T >
class Heap
{
public:
    // Return heap size
    size_t size() const { return data.size();}

    // 1. Insert as leaf
    // 2. Recover heap properties
    void insert(T value){}

    // 1. Extract the root
    // 2. Recover heap properties
    T extract(){}
private:
    // Private methods
    static size_t leftChildIdx(size_t parent){... }
    static size_t rightChildIdx(size_t parent){... }
    static size_t parentIdx(size_t child){... }
private: 
    // Private data members
    std::vector< T > data;
};

Implementation: Insert & SiftUp

  1. The insert operation adds a new node as the left-most leaf
  2. To recover the heap properties the siftUp is applied on the new node: if node is greater than its parent, then heap is satisfied and terminate, otherwise, swap the node with parent and repeat recursively.
  • Worst case time: to go up along all levels $h= \lfloor \log(n) \rfloor$.
  • $O(T(n)) = O(h) = O(\log(n))$
template< typename T >
class Heap
{
public:
    // Return heap size
    size_t size() const { return data.size();}

    void insert(T value)
    {
        data.push_back(value);
        size_t childIdx = size() - 1;
        siftUp( childIdx ); // Recover heap
    }

    // 1. Extract the root
    // 2. Recover heap properties
    T extract(){}
private:
    // Private methods
    void siftUp( size_t child )
    {
        auto parent = parentIdx(child);
        if( child > 0  && data[child] < data[parent])
        {
            std::swap(data[child], data[parent]);
            siftUp( parent );
        }
    }

    static size_t leftChildIdx(size_t parent){... }
    static size_t rightChildIdx(size_t parent){... }
    static size_t parentIdx(size_t child){... }
private: 
    // Private data members
    std::vector< T > data;
};

Implementation: Extract & SiftDown

  1. The extract operation removes the root and replace it by the left-most leaf.
  2. To recover the heap properties the siftDown is applied on the new root: if root is less than its children, then heap is satisfied and terminate, otherwise, swap the root with the minimum children and repeat recursively.
  • Worst case time: to go down along all levels $h= \lfloor \log(n) \rfloor$.
  • $O(T(n)) = O(h) = O(\log(n))$
template< typename T >
class Heap
{
public:
    // Return heap size
    size_t size() const { return data.size();}

    void insert(T value){... }

    // 1. Extract the root
    // 2. Recover heap properties
    T extract()
    {
        if( data.empty())
        {
            std::cout << "Empty Heap!\n";
            exit( 1 ); // Crash
        }
        size_t child = size() - 1;
        std::swap(data[child], data[0]);
        T value = data.back();
        data.pop_back();
        siftDown(0);
        return value;
    }
private:
    // Private methods
    void siftUp( size_t child ){... }

    void siftDown( size_t parent)
    {
        size_t left = leftChildIdx(parent);
        size_t right = rightChildIdx(parent);
        size_t length = size();
        size_t minimum = parent;

        if (left < length && data[left] < data[minimum])
            minimum = left;

        if (right < length && data[right] < data[minimum])
            minimum = right;

        if (minimum != parent)
        {
            std::swap(data[minimum], data[parent]);
            siftDown( minimum );
        }
    }

    static size_t leftChildIdx(size_t parent){... }
    static size_t rightChildIdx(size_t parent){... }
    static size_t parentIdx(size_t child){... }
private: 
    // Private data members
    std::vector< T > data;
};

Implementation: Heapifying Arbitrary Array

template< typename T >
class Heap
{
public:
    size_t size() const{... }
    void insert(T value){... }
    T extract(){... }

    static Heap make( std::vector< T > data )
    {
        Heap h;
        h.data.swap( data ); // O(1)
        if( h.size() <= 1 ) return h;

        auto lastChild = h.size() - 1;
        for( int subHp = parentIdx( lastChild ); subHp >= 0 ; --subHp )
            h.siftDown( subHp );
        return h;
    }
private:
    // Private methods
    void siftUp( size_t child ){... }
    void siftDown( size_t parent){... }
    static size_t leftChildIdx(size_t parent){... }
    static size_t rightChildIdx(size_t parent){... }
    static size_t parentIdx(size_t child){... }
private: 
    // Private data members
    std::vector< T > data;
};

Final Implementation of Heap

Heap Applications: Heapsort

std::vector< int > heapSort( std::vector< int > a )
{
    auto h = Heap<int>::make( a ); // Heapify: O(n)
    a.clear();
    while( h.size() > 0 ) // O( n * log(n) )
        a.push_back( h.extract()); // O(log(n))
    return a;
}

Time Complexity: \(O(T(n)) = O(n) + O(n\log(n)) = O(n\log(n))\)

  Advanced: avoiding $O(n)$ deep copy
Heapsort

Download the source files

mkdir -p section07/Heap
cd section07/Heap
wget -i https://raw.githubusercontent.com/sbme-tutorials/sbme-tutorials.github.io/master/2020/data-structures/snippets/section07/Heap/download.txt

To build:

mkdir build
cd build
cmake ..
make