Saturday, September 20, 2014

Tree Data Structure

Siblings - Nodes with the same parent
Depth of a node- number of edges from the root to the node.
Height of a node - number of edges from the node to the deepest leaf. 
Height of a tree - height of a root.

Binary Tree- tree where each node can have no more than two children (a node can have just one)
full binary tree - A binary tree in which each node has exactly zero or two children. There are no nodes with exactly one child.
Complete binary - completely filled, with the possible exception of the bottom level, which is filled from left to right.
Binary Search Tree - A binary tree is a binary search tree (BST) if and only if an inorder traversal of the binary tree results in a sorted sequence.

Binary Search Tree

  • The keys in a binary search tree are always stored in such a way as to satisfy the binary-search-tree property:
Let x be a node in a binary search tree. If y is a node in the left subtree of x, then y:key <= x:key. If y is a node in the right subtree of x, then y:key >= x:key.
  • The expected height of a randomly built binary search tree on n distinct keys is O(lg n)

Tree Traversals

  • In-Order - Left , Root, Right
  • Pre-Order - Root, Left, Right
  • Post- Order  Left, Right ,Root

INORDER-TREE-WALK(x)
if x !=NIL
INORDER-TREE-WALK(x:left)
print x:key
INORDER-TREE-WALK(x:right)

Complexity θ(n)


Tree Search

TREE-SEARCH(x,k)
if x == NIL or k == x:key
return x
if k < x:key
return TREE-SEARCH(x:left)
else
return TREE-SEARCH(x:right)


ITERATIVE-TREE-SEARCH(x, k)
while x != NIL and k != x:key
if k < x:key
x = x:left
else
x = x:right
return x

Complexity O(h) - h = Height of the tree

Minimum And Maximum

TREE-MINIMUM(x)
while x:left != NIL
x = x:left
return x

TREE-MAXIMUM(x)
while x:right !=NIL
x = x:right
return x

Complexity O(h)

Height of Tree

int height(node)
{
if (node = null)
    return 0;
else
    left = height(node->left);
    right = height(node->right);
    return 1 + max(left, right);
}

Complexity θ(n)

Tree Insert /Delete

Complexity O(h)


Red Black Trees

  • One of many search-tree schemes that are “balanced” in order to guarantee that basic dynamic-set operations take O(lg n) time in the worst case.
  • By constraining the node colors on any simple path from the root to a leaf, red-black trees ensure that no such path is more than twice as long as any other, so that the tree is approximately balanced.
  • A red-black tree with n internal nodes has height at most 2 lg(n +1)
  • Insert and delete will run in O(lg n) time

  • A red-black tree is a binary tree that satisfies the following red-black properties:
    • Every node is either red or black.
    • The root is black.
    • Every leaf (NIL) is black.
    • If a node is red, then both its children are black.
    • For each node, all simple paths from the node to descendant leaves contain the same number of black nodes.

Trie


  • Trie is an efficient information retrieval data structure.
  • Using trie, search complexities can be brought to optimal limit (key length).
  • If we store keys in binary search tree, a well balanced BST will need time proportional to O(M * log N), where M is maximum string length and N is number of keys in tree.
  • Using trie, we can search the key in O(M) time.
  • However the penalty is on trie storage requirements. Memory requirements of trie is O(ALPHABET_SIZE * key_length * N) where N is number of keys in trie.
  • There are efficient representation of trie nodes (e.g. compressed trie, ternary search tree, etc.) to minimize memory requirements of trie.
  • TRIE has a wide variety of applications in :
    • Spell checking. Word completion
    • Data compression
    • Computational biology
    • Routing table for IP addresses
    • Storing/Querying XML documents etc.
  • For alphabetical keys, each node has an array of (27) pointers to its branches, one for each of the 26 alphabet characters and one for blank (“ ”). The keys are stored in leaf (information) nodes. To access an information node containing a key, we need to move down a series of branch nodes following appropriate branch based on the alphabetical characters composing the key. All trie_node fields that neither intern node nor to an leaf node are represented using null pointers.

Standard STL Containers (Data Structurs)

A container is a holder object that stores a collection of other objects (its elements). They are implemented as class templates, which allows a great flexibility in the types supported as elements.

Container class templates

Sequence containers: (Elements in sequence containers are ordered in a strict linear sequence. Individual elements are accessed by their position in this sequence.)
  • array  - Array class vector
  • Vector
  • Deque -Double ended queue
  • forward_list
  • List

Container adaptors:
  • Stack - LIFO stack
  • Queue -FIFO queue
  • priority_queue -Priority queue

Associative containers: (Elements in associative containers are referenced by their key and not by their absolute position in the container.)
  • set
  • Multiset-Multiple-key set
  • map
  • Multimap-Multiple-key map

Unordered associative containers:
  • unordered_set
  • unordered_multiset
  • unordered_map
  • unordered_multimap


Array
  • Arrays are fixed-size sequence containers: they hold a specific number of elements ordered in a strict linear sequence.
  • The elements are stored in contiguous memory locations, allowing constant time random access to elements.
  • The container uses implicit constructors and destructors to allocate the required space statically. Its size is compile-time constant.
Complexity :
  • Almost all the operations are O(1)
  • Remove/Insert operations that handles element between elements require O(n),because the elements need to be shifted.
Note! Hence, an array as a data structure is used where remove/insert operations are not required.

Vector
  • Vectors are sequence containers representing arrays that can change in size.
  • Vectors use contiguous storage locations for their elements.
  • vectors use an array as their underlying storage
  • size can change dynamically, with their storage being handled automatically by the container.
  • If a reallocation happens, all iterators, pointers and references related to the container are invalidated,
  • vector containers may allocate some extra storage to accommodate for possible growth.
vectors consume more memory than arrays in exchange for the ability to manage storage and grow dynamically in an efficient way.

Complexity :
  • Accessors : [], at, size,empty, begin,end,front,back,capacity   - Constant time
  • Modifiers : pushback (ammortized),popback  - constant time        
insert,assign,erase - Linear

Deque

  • sequence containers with dynamic sizes that can be expanded or contracted on both ends
  • provide a functionality similar to vectors, but with efficient insertion and deletion of elements also at the beginning of the sequence, and not only at its end.
  • Generally implemented as a dynamic array. (Vector of vectors)
  • unlike vectors, deques are not guaranteed to store all its elements in contiguous storage locations
  • vectors use a single array that needs to be occasionally reallocated for growth, the elements of a deque can be scattered in different chunks of storage, with the container keeping the necessary information internally to provide direct access to any of its elements in constant time and with a uniform sequential interface (through iterators).

Complexity :
  • Accessors : [], at, size,empty, begin,end,front,back   - Constant time
  • Modifiers : pushback (ammortized),pushpopback  - constant time        
insert,assign,erase - Linear

List
  • Lists are sequence containers that allow constant time insert and erase operations anywhere within the sequence, and iteration in both directions.
  • List containers are implemented as doubly-linked lists;
  • contain in different and unrelated storage locations.
  • Each element keeps information on how to locate the next and the previous elements, allowing constant time insert and erase operations before or after a specific element (even of entire ranges), but no direct random access.
  • Lists perform generally better in inserting, extracting and moving elements in any position within the container for which an iterator has already been obtained, and therefore also in algorithms that make intensive use of these, like sorting algorithms.
  • they lack direct access to the elements by their position
  • They also consume some extra memory to keep the linking information associated to each element

Complexity :
  • Accessors : begin,end, empty, front,back   - Constant time
Size -Linear
  • Modifiers :popfront, pushfront, popback, pushback - constant time        
insert, erase - Linear

-------------------------------------------------------------------------------------------------------------
Stack
  • Stacks are a type of container adaptor, specifically designed to operate in a LIFO context (last-in first-out), where elements are inserted and extracted only from one end of the container.
  • The container shall support the following operations:
    • empty
    • size
    • back
    • push_back
    • pop_back
  • The standard container classes vector, deque and list fulfill these requirements. By default, if no container class is specified for a particular stack class instantiation, the standard container deque is used.

Complexity : Operations on the underline container class will be called and complexity depend on them. (Constant)

Queue:
  • queues are a type of container adaptor, specifically designed to operate in a FIFO context (first-in first-out), where elements are inserted into one end of the container and extracted from the other.
  • This underlying container shall support at least the following operations:
    • empty
    • size
    • front
    • back
    • push_back
    • pop_front
  • The standard container classes deque and list fulfill these requirements. By default, if no container class is specified for a particular queue class instantiation, the standard container deque is used.

Complexity : Operations on the underline container class will be called and complexity depend on them. (Constant)

Priority Queue
  • Priority queues are a type of container adaptors, specifically designed such that its first element is always the greatest of the elements it contains, according to some strict weak ordering criterion.
  • Elements are popped from the "back" of the specific container, which is known as the top of the priority queue.
  • The underline container shall be accessible through random access iterators and support the following operations:
    • empty()
    • size()
    • front()
    • push_back()
    • pop_back()

  • The standard container classes vector and deque fulfill these requirements. By default, if no container class is specified for a particular priority_queue class instantiation, the standard container vector is used.

Complexity : top, size, empty - Operations on the underline container class will be called and complexity depend on them. (Constant)
  push, pop - log n

----------------------------------------------------------------------------------------------------------------------

Set / MultiSet

  • Unique keys - No two elements in the container can have equivalent keys. (only in set)
  • Ordered  - The elements in the container follow a strict order at all times.
  • Associative - Elements in associative containers are referenced by their key and not by their absolute position in the container.
  • Sets are typically implemented as binary search trees.(balanced binary search trees, typically red-black trees) Thus, they provide logarithmic storage and retrieval times.
  • MultiSet  -multiple elements can have equivalent values.



Complexity : size, empty, begin, end - Constant
insert, erase - log n
clear - Linear

Map:
  • Maps are associative containers that store elements formed by a combination of a key value and a mapped value, following a specific order.
  • key values - sort and uniquely identify the elements
  • mapped values - store the content associated to this key.
  •  implemented as binary search trees.

Complexity : size, empty, begin, end - Constant
[], at, insert, delete,find,upper bound ,lower bound - log n
Clear - linear




References