MyCodeIdeas: 2014

Saturday, September 20, 2014

Tree Data Structure

Siblings - Nodes with the same parent

Depth of a node- number of edges from the root to the node.

Height of a node - number of edges from the node to the deepest leaf.

Height of a tree - height of a root.

Binary Tree- tree where each node can have no more than two children (a node can have just one)

full binary tree - A binary tree in which each node has exactly zero or two children. There are no nodes with exactly one child.

Complete binary - completely filled, with the possible exception of the bottom level, which is filled from left to right.

Binary Search Tree - A binary tree is a binary search tree (BST) if and only if an inorder traversal of the binary tree results in a sorted sequence.

Binary Search Tree

The keys in a binary search tree are always stored in such a way as to satisfy the binary-search-tree property:

Let x be a node in a binary search tree. If y is a node in the left subtree of x, then y:key <= x:key. If y is a node in the right subtree of x, then y:key >= x:key.

The expected height of a randomly built binary search tree on n distinct keys is O(lg n)

Tree Traversals

In-Order - Left , Root, Right
Pre-Order - Root, Left, Right
Post- Order Left, Right ,Root

INORDER-TREE-WALK(x)

if x !=NIL

INORDER-TREE-WALK(x:left)

print x:key

INORDER-TREE-WALK(x:right)

Complexity θ(n)

Tree Search

TREE-SEARCH(x,k)

if x == NIL or k == x:key

return x

if k < x:key

return TREE-SEARCH(x:left)

else

return TREE-SEARCH(x:right)

ITERATIVE-TREE-SEARCH(x, k)

while x != NIL and k != x:key

if k < x:key

x = x:left

else

x = x:right

return x

Complexity O(h) - h = Height of the tree

Minimum And Maximum

TREE-MINIMUM(x)

while x:left != NIL

x = x:left

return x

TREE-MAXIMUM(x)

while x:right !=NIL

x = x:right

return x

Complexity O(h)

Height of Tree

int height(node)

{

if (node = null)

return 0;

else

left = height(node->left);

right = height(node->right);

return 1 + max(left, right);

}

Complexity θ(n)

Tree Insert /Delete

Complexity O(h)

Red Black Trees

One of many search-tree schemes that are “balanced” in order to guarantee that basic dynamic-set operations take O(lg n) time in the worst case.
By constraining the node colors on any simple path from the root to a leaf, red-black trees ensure that no such path is more than twice as long as any other, so that the tree is approximately balanced.
A red-black tree with n internal nodes has height at most 2 lg(n +1)
Insert and delete will run in O(lg n) time

A red-black tree is a binary tree that satisﬁes the following red-black properties:

Every node is either red or black.
The root is black.
Every leaf (NIL) is black.
If a node is red, then both its children are black.
For each node, all simple paths from the node to descendant leaves contain the same number of black nodes.

Trie

Trie is an efficient information retrieval data structure.
Using trie, search complexities can be brought to optimal limit (key length).
If we store keys in binary search tree, a well balanced BST will need time proportional to O(M * log N), where M is maximum string length and N is number of keys in tree.
Using trie, we can search the key in O(M) time.
However the penalty is on trie storage requirements. Memory requirements of trie is O(ALPHABET_SIZE * key_length * N) where N is number of keys in trie.
There are efficient representation of trie nodes (e.g. compressed trie, ternary search tree, etc.) to minimize memory requirements of trie.
TRIE has a wide variety of applications in :

Spell checking. Word completion
Data compression
Computational biology
Routing table for IP addresses
Storing/Querying XML documents etc.

For alphabetical keys, each node has an array of (27) pointers to its branches, one for each of the 26 alphabet characters and one for blank (“ ”). The keys are stored in leaf (information) nodes. To access an information node containing a key, we need to move down a series of branch nodes following appropriate branch based on the alphabetical characters composing the key. All trie_node fields that neither intern node nor to an leaf node are represented using null pointers.

Standard STL Containers (Data Structurs)

A container is a holder object that stores a collection of other objects (its elements). They are implemented as class templates, which allows a great flexibility in the types supported as elements.

Container class templates

Sequence containers: (Elements in sequence containers are ordered in a strict linear sequence. Individual elements are accessed by their position in this sequence.)

array - Array class vector
Vector
Deque -Double ended queue
forward_list
List

Container adaptors:

Stack - LIFO stack
Queue -FIFO queue
priority_queue -Priority queue

Associative containers: (Elements in associative containers are referenced by their key and not by their absolute position in the container.)

set
Multiset-Multiple-key set
map
Multimap-Multiple-key map

Unordered associative containers:

unordered_set
unordered_multiset
unordered_map
unordered_multimap

Array

Arrays are fixed-size sequence containers: they hold a specific number of elements ordered in a strict linear sequence.
The elements are stored in contiguous memory locations, allowing constant time random access to elements.
The container uses implicit constructors and destructors to allocate the required space statically. Its size is compile-time constant.

Complexity :

Almost all the operations are O(1)
Remove/Insert operations that handles element between elements require O(n),because the elements need to be shifted.

Note! Hence, an array as a data structure is used where remove/insert operations are not required.

Vector

Vectors are sequence containers representing arrays that can change in size.
Vectors use contiguous storage locations for their elements.
vectors use an array as their underlying storage
size can change dynamically, with their storage being handled automatically by the container.
If a reallocation happens, all iterators, pointers and references related to the container are invalidated,
vector containers may allocate some extra storage to accommodate for possible growth.

vectors consume more memory than arrays in exchange for the ability to manage storage and grow dynamically in an efficient way.

Complexity :

Accessors : [], at, size,empty, begin,end,front,back,capacity - Constant time
Modifiers : pushback (ammortized),popback - constant time

insert,assign,erase - Linear

Deque

sequence containers with dynamic sizes that can be expanded or contracted on both ends
provide a functionality similar to vectors, but with efficient insertion and deletion of elements also at the beginning of the sequence, and not only at its end.
Generally implemented as a dynamic array. (Vector of vectors)
unlike vectors, deques are not guaranteed to store all its elements in contiguous storage locations
vectors use a single array that needs to be occasionally reallocated for growth, the elements of a deque can be scattered in different chunks of storage, with the container keeping the necessary information internally to provide direct access to any of its elements in constant time and with a uniform sequential interface (through iterators).

Complexity :

Accessors : [], at, size,empty, begin,end,front,back - Constant time
Modifiers : pushback (ammortized),pushpopback - constant time

insert,assign,erase - Linear

List

Lists are sequence containers that allow constant time insert and erase operations anywhere within the sequence, and iteration in both directions.
List containers are implemented as doubly-linked lists;
contain in different and unrelated storage locations.
Each element keeps information on how to locate the next and the previous elements, allowing constant time insert and erase operations before or after a specific element (even of entire ranges), but no direct random access.
Lists perform generally better in inserting, extracting and moving elements in any position within the container for which an iterator has already been obtained, and therefore also in algorithms that make intensive use of these, like sorting algorithms.
they lack direct access to the elements by their position
They also consume some extra memory to keep the linking information associated to each element

Complexity :

Accessors : begin,end, empty, front,back - Constant time

Size -Linear

Modifiers :popfront, pushfront, popback, pushback - constant time

insert, erase - Linear

-------------------------------------------------------------------------------------------------------------

Stack

Stacks are a type of container adaptor, specifically designed to operate in a LIFO context (last-in first-out), where elements are inserted and extracted only from one end of the container.
The container shall support the following operations:

empty
size
back
push_back
pop_back

The standard container classes vector, deque and list fulfill these requirements. By default, if no container class is specified for a particular stack class instantiation, the standard container deque is used.

Complexity : Operations on the underline container class will be called and complexity depend on them. (Constant)

Queue:

queues are a type of container adaptor, specifically designed to operate in a FIFO context (first-in first-out), where elements are inserted into one end of the container and extracted from the other.
This underlying container shall support at least the following operations:

empty
size
front
back
push_back
pop_front

The standard container classes deque and list fulfill these requirements. By default, if no container class is specified for a particular queue class instantiation, the standard container deque is used.

Complexity : Operations on the underline container class will be called and complexity depend on them. (Constant)

Priority Queue

Priority queues are a type of container adaptors, specifically designed such that its first element is always the greatest of the elements it contains, according to some strict weak ordering criterion.
Elements are popped from the "back" of the specific container, which is known as the top of the priority queue.
The underline container shall be accessible through random access iterators and support the following operations:

empty()
size()
front()
push_back()
pop_back()

The standard container classes vector and deque fulfill these requirements. By default, if no container class is specified for a particular priority_queue class instantiation, the standard container vector is used.

Complexity : top, size, empty - Operations on the underline container class will be called and complexity depend on them. (Constant)

push, pop - log n

----------------------------------------------------------------------------------------------------------------------

Set / MultiSet

Unique keys - No two elements in the container can have equivalent keys. (only in set)
Ordered - The elements in the container follow a strict order at all times.
Associative - Elements in associative containers are referenced by their key and not by their absolute position in the container.
Sets are typically implemented as binary search trees.(balanced binary search trees, typically red-black trees) Thus, they provide logarithmic storage and retrieval times.
MultiSet -multiple elements can have equivalent values.

Complexity : size, empty, begin, end - Constant

insert, erase - log n

clear - Linear

Map:

Maps are associative containers that store elements formed by a combination of a key value and a mapped value, following a specific order.
key values - sort and uniquely identify the elements
mapped values - store the content associated to this key.
implemented as binary search trees.

Complexity : size, empty, begin, end - Constant

[], at, insert, delete,find,upper bound ,lower bound - log n

Clear - linear

References

http://www.cs.northwestern.edu/~riesbeck/programming/c++/stl-summary.html

Thursday, July 24, 2014

Security Issues in IPV6

Following are some security issues in IPv6.

Security practitioners need education/training on IPv6.

IPv6 will come to the networks under administrator's control – it's only a matter of time. As with any new networking technology, it's essential that admins learn the basics of IPv6, especially the addressing scheme and protocols, in order to facilitate incident handling and related activities.

Security tools need to be upgraded.

IPv6 is not backwards compatible. The hardware and software used to route traffic across networks and perform security analyses won't work with IPv6 traffic unless they are upgraded to versions that support the protocol. This is especially important to remember when it comes to perimeter-protection devices. Routers, firewalls and intrusion-detection systems may require software and/or hardware upgrades in order to "speak" IPv6. Many manufacturers already have these upgrades available. For example, Cisco networking devices support IPv6 as of IOS release 12.0S.

Existing equipment may require additional configuration.

The devices that do support IPv6 typically treat it as an entirely separate protocol (as they should). Therefore, the access control lists, rule bases and other configuration parameters may need to be reevaluated and translated to support an IPv6 environment. Contact the appropriate manufacturers for specific instructions.

Tunneling protocols create new risks.

The networking and security communities have invested time and energy in ensuring that IPv6 is a security-enabled protocol. However, one of the greatest risks inherent in the migration is the use of tunneling protocols to support the transition to IPv6. These protocols allow the encapsulation of IPv6 traffic in an IPv4 data stream for routing through non-compliant devices. Therefore, it's possible that users on the network can begin running IPv6 using these tunneling protocols before admins are ready to officially support it in production. If this is a concern, block IPv6 tunneling protocols (including SIT, ISATAP, 6to4 and others) at the perimeter.

IPv6 autoconfiguration creates addressing complexity.

Autoconfiguration, another interesting IPv6 feature, allows systems to automatically gain a network address without administrator intervention. IPv6 supports two different autoconfiguration techniques. Stateful autoconfiguration uses DHCPv6, a simple upgrade to the current DHCP protocol, and doesn't reflect much of a difference from a security perspective. On the other hand, keep an eye on stateless autoconfiguration. This technique allows systems to generate their own IP addresses and checks for address duplication. This decentralized approach may be easier from a system administration perspective, but it raises challenges for those of us charged with tracking the use (and abuse!) of network resources.

Effective rate limiting is hard to achieve

Rate limiting is a straightforward tactic which is probably use to protect the network from automated attack tools. This works on IPv4 networks, making automated attacks less likely to succeed or harder to launch by forcing hackers to deliberately slow their automated attack tools, or to use multiple hosts from which to launch attacks on the network.

The tactic doesn't really work on IPv6 networks. That's because IPv6 networks are so vast that it's impractical to rate limit at the 128-bit address level, Vyncke pointed out. In any case, hackers may be allotted millions or even billions of IPv6 addresses, meaning that to rate limit effectively admins would need to limit addresses at the 48-bit or 64-bit level. Right now it's simply not clear what practical approach should be used to provide the same level of protection. "The industry has yet to learn how to do it," Vyncke warned.

Reputation-based protection does not (yet) exist

Many security software vendors use the reputation of IP addresses to filter out malicious websites that are known sources of malware. While reputation systems for IPv4 addresses already exist, it's a bit of a chicken-and-egg situation when it comes to IPv6. No one has established an IPv6 reputation database, so no one is using reputation-based security with IPv6 addresses -- and therefore no one is building a reputation database. It's something the security industry will surely eventually adopt, but for now it’s a missing piece in the security puzzle.

Logging systems may not work properly

The key feature of IPv6 is that it uses 128-bit addresses, which are stored as a 39-digit string. IPv4 addresses, on the other hand, are written in the form 192.168.211.255 and may therefore be stored in a 15-character field. If the logging systems expect 15-character IP addresses, they may crash when they encounter "monster" 39 -digit IPv6 addresses (creating possible buffer overflow error-related security problems) or they may only store only the first 15 characters, rendering the logged information useless. The only solution is to upgrade all the logging systems to support IPv6 addresses.

IPv6 may run by default

Although admins may think that they are running an IPv4-only data center, with IPv4-only IDS, monitoring and so on, but IPv6 could be activated and running without knowledge. That's because in some circumstances (such as an attacker on the network sending router advertisements), devices on he network can start communicating with each other by default over IPv6 using link-local addresses. (For more information, see the IETF Rogue IPv6 Router Advertisement Problem Statement.) "Your IDS will see none of this traffic, so you should definitely upgrade it to IPv6 now, and make sure that its operators are trained to use IPv6," warned Vyncke.

SIEM systems may not work properly

Another problem with IPv6 is that every host -- inside or outside the network perimeter -- can have multiple IPv6 addresses simultaneously. This is not usual in the IPv4 world, and it can cause serious problems. "For example, how do admins know by looking at the logs that different entries refer to the same host?" asked Vyncke. In order to make sense of logs it is needed to be able to correlate addresses to hosts, but Vyncke warned that thus far no SIEM system fully supports IPv6 fully. It may support it at the network level, for example, but the correlation engine may not.

Simple log analysis using grep won't work

Yet another problem is that the same IPv6 address can be written in multiple ways, for example: 2001:0DB8:0BAD::0DAD

2001:DB8:BAD:0:0:0:0:DAD

2001:db8:bad::dad (this is the canonical RFC 5952 format)

As a result, a grep search through the log files is not going to work as before. If devices log in using different IPv6 formats, it may have to reconfigure the way they log or change the way it is search to catch all the information in the logs about a device.

References:

[1]Security on IPv6 by Dequan Yang et al - IEEE Advanced Computer Control (ICACC), 2010 2nd International Conference on (Volume:3 )
[1]Global Information Assurance Certification Paper- SANS Institute.
[2]Understanding Data Security - Trends and Predictions -By Paul Rubens October 18, 2012
http://www.esecurityplanet.com/network-security/7-ipv6-security-risks.html

Improving network reliability of Wireless Sensor Networks

When talking about network reliability, we should consider the different types of networks as well. For example, Wireless Sensor Networks (WSNs) have the potential of revolutionizing the way wireless technology has been used over the decades.Monitoring critical structures such as high speed railway bridges

requires the monitoring network to be highly reliable. However, there is still a lack of reliability studies of WSNs since,

No attempt to define an accurate fault model from experimental evidence.
Fault forecasting methodologies have never been applied to WSNs.
The majority of research results are proved by means of simulation.
No attempt to make the sensor nodes intelligent enough to recall the information of interest despite of corrupted signal sensed at the destination and hence to enhance the reliability of communication

Reliability in WSN reflects a functional unit’s ability to meet performance specifications over a specified period of time and this is often expressed as a probability or mean time to failure (MTTF). Factors affectio the reliablity in WSNs are as follows.

Hardware failure
Inappropriate communication scheme
Constrained resources in sensor nodes
Error prone wireless communication medium

Various models and techniques are used to improve reliability in WSNs.

1) Clustering is a key technique used to extend the lifetime of a sensor network by reducing energy consumption. Sensor nodes are considered to be homogeneous since the researches in the field of WSNs have been evolved, but some nodes may be of different energy to prolong the lifetime of a WSN and its reliability.

2) Also routing algorithms have been proposed for real-time wireless sensor networks using a hybrid algorithm that can increase reliability and network lifetime criterions[1]

3) A Reliability framework also build for data transport based on the different operational phases of the WSN protocols. For this, a fault model was established to capture the possible failures along with generalized data transport and reliability semantics and also developed a reliability block model based approach that exploits the decomposition of the complex data transport problem into operations and simplifies the investigation of the overall reliability of data transport. [2]

References
[1] Sanaz Naziri, Majid Haghparast and Somayeh Hasanpoor, “Improving Lifetime and Reliability in Routing Real-Time Wireless Sensor Networks based on Hybrid Algorithm” at Australian Journal of Basic and Applied Sciences, 5(9): 1105-1109, 2011

[2] Faisal Karim Shaikh, Abdelmajid Khelil and Neeraj Suri, “On Modeling the Reliability of Data Transport in Wireless Sensor Networks”, in IEEE international conference in parallel, distribute and network based processing, pp. 395-402, 2007