Tuesday, December 15, 2015

Getting Started With WSO2 ESB

The Need for Integration:

Today's business applications rarely live in isolation. Users expect instant access to all business functions an enterprise can offer, regardless of which system the functionality may reside in. This requires disparate applications to be connected into a larger, integrated solution, which is generally achieved through some form of "middleware". Middleware provides the "plumbing" such as data transport, data transformation, and routing.

What is ESB?

An enterprise service bus (ESB) is a software architecture model used for designing and implementing communication between mutually interacting software applications in a service-oriented architecture (SOA). There are two major uses of an ESB:
  • For enterprise integration - the connection of disparate resources to fulfill customer and business scenarios.
  • As an infrastructure backbone for service-oriented architecture (SOA)


WSO2 ESB is a fast and lightweight enterprise service bus. It is based on Apache Synapse, the lightweight ESB from the ASF, and WSO2 Carbon, the OSGi-based components framework for SOA. WSO2 Carbon makes it possible to easily install and configure additional features into the ESB runtime.

  • Full XML and Web Services Support with major protocol/standard support.
    • Transport - HTTP/S, POP/IMAP, SMTP, JMS, AMQP, FIX, Raw TCP, Raw UDP, SAP, File transports (FTP/SFTP/CIFS)
    • Content Interchange Formats - SOAP 1.1, SOAP 1.2, POX, HTML, Plain text, binary, JSON, Hessian
    • WS-* Standards -   WS-Addressing, WS-Security, WS-Reliable Messaging, WS-Policy, WS-Discovery, MTOM/SwA
  • Has built in support for many common requirements
    • Content based routing
    • Service virtualization
    • Load balancing
    • Fail-over sending
    • Protocol switching
    • Message transformation
    • Logging & monitoring
    • Message splitting and aggregation
    • Enterprise integration patterns
  • Internationalized Graphical Console - WSO2 ESB provides a set of management services and a graphical user interface to configure/manage/monitor the running ESB server.    
  • Server Management and System Monitoring 
Modes of Operation:

WSO2 ESB supports four modes of operation:

  • Message mediation (ESB as a message router. It can filter, transform, drop messages or forward them to remote endpoints for further processing when operating in this mode.)
  • Service mediation (Expose service endpoints on ESB. It acts as a proxy to a real Web Service hosted in a Web Services container like Apache Axis2 or WSO2 WSAS.)
  • Task scheduling (Run periodic tasks on ESB)
  • Eventing (ESB as an event broker)
Most real world scenarios require the ESB to operate in multiple modes at the same time.

ESB Functional Components:
Each functional component serves a specific purpose, and components can be mixed and matched to implement various integration scenarios and patterns. Configuring WSO2 ESB for a given scenario requires:
  • Identifying the right set of components
  • Putting them together in the optimal manner
The basic functional components of WSO2 ESB are as follows.
  • Mediators: The fundamental component of any message flow implementation in WSO2 ESB. A mediator has an input message, an output message and some associated configuration. This configuration is a pure XML-based configuration that can be used to transform, alter or manipulate the incoming message of a mediator to form the outgoing message.
  • Sequences: A sequence is a sequential arrangement of a set of mediators, resembling an industrial pipeline. A given sequence may have an arbitrary number of mediators, and messages flow through them in sequential order.
  • Endpoints: A logical representation of an actual back-end service or a group of back-end services (e.g. load-balance and fail-over groups).
  • Proxy Services: A 'virtual service' that provides the required business functionality through seamless integration of back-end services. It is a combination of three sequences and a target endpoint.
    • In-sequence: All the incoming requests to the proxy service are dispatched to the in-sequence.
    • Out-sequence: All the responses coming back from the backend service implementation are dispatched to this sequence. 
    • Fault sequence: If an error occurs during service mediation, the faulty message is handed to the fault sequence for error handling.
  • REST API: A REST API allows you to expose RESTful interfaces and mediate RESTful invocations by mapping REST concepts to SOAP via proxy services.
  • Message Stores/Processors: A message store is used to temporarily store messages before they are delivered to their destination by a message processor. This approach is useful for serving traffic to back-end services that can only accept messages at a given rate, whereas incoming traffic to the ESB arrives at different rates. A message processor is used to deliver messages that have been temporarily stored in a message store.
  • Templates: A very large number of configuration files in the form of sequences, endpoints, proxies and transformations can be required to satisfy all the mediation requirements of a system. In such cases the configuration files will be scattered all over and would be extremely hard to manage. ESB templates minimize this redundancy by creating prototypes that users can reuse as and when needed.
  • Tasks: A task in WSO2 ESB allows you to run a piece of code triggered by a timer.
  • Local Entries: The local registry acts as a memory registry where you can store text strings, XML strings, and URLs. These entries can be retrieved from a mediator. They are top-level entries which are globally visible within the entire system.
  • Priority Executors: Priority executors can be used to execute sequences with a given priority. This allows the user to control the resources allocated to executing sequences and prevent high priority messages from getting delayed and dropped.
  • Registry: WSO2 ESB makes use of a registry to store various configurations and artifacts such as sequences and endpoints. Simply put, a registry is a content store and a metadata repository. 
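To make the components above concrete, here is a minimal Synapse-style proxy service configuration sketch. The service name, mediators and back-end address are illustrative (the endpoint follows the WSO2 sample services), not a production configuration:

```xml
<proxy xmlns="http://ws.apache.org/ns/synapse" name="StockQuoteProxy">
  <target>
    <inSequence>
      <!-- incoming requests to the proxy are dispatched here -->
      <log level="full"/>
      <send>
        <endpoint>
          <!-- logical representation of the actual back-end service -->
          <address uri="http://localhost:9000/services/SimpleStockQuoteService"/>
        </endpoint>
      </send>
    </inSequence>
    <outSequence>
      <!-- responses coming back from the back end are dispatched here -->
      <send/>
    </outSequence>
    <faultSequence>
      <!-- errors during mediation are handled here -->
      <log level="full"/>
      <drop/>
    </faultSequence>
  </target>
</proxy>
```

The three sequences map directly onto the in-sequence, out-sequence and fault sequence described above.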



Saturday, October 3, 2015

Clustering Techniques

Clustering techniques are used to improve the performance and availability of a complex system. Generally speaking, a cluster is a redundant set of services providing the same set of functionality.

Cluster quality can be measured by:
  • Reliability: the ability to successfully provide a response to each incoming request
  • Availability: the uptime of the server (usually measured as a percentage of annual uptime)
  • Performance: measured by the average time the service takes to provide responses, or by the throughput
  • Scalability: the ability to handle a growing amount of work in a capable manner without degradation in the quality of service (e.g. non-decreasing throughput)

High-availability (HA) clusters are groups of services that can be reliably utilized with minimal (or zero) downtime. Without clustering, if a service crashes or is too busy, the user asking for that service will never get a quick response. HA clustering is designed to remedy this situation by detecting when a service is down (using a watchdog) and immediately restarting it. During this operation the service is provided by a failover instance of the same service.

Scaling software usually means adding more instances of the product. We can distinguish two different ways of scaling:
  • Horizontal scalability, also known as scaling out, which is performed by adding additional hardware resources to the existing pool. In our context this means adding more physical or virtual machines with GeoServer installed.
  • Vertical scalability, also known as scaling up, which is performed by getting more powerful hardware (a single box with more CPUs/memory); this normally needs to be compounded by adding more software instances on the same boxes as well (in practice no software is 100% linearly scalable). In our context this means installing more GeoServer instances on the existing hardware to fully utilize the available resources, in particular the extra CPUs.

Cluster Configurations
  • Active/active:
All the nodes of the cluster are active at the same time, and traffic is distributed across all the services. This set-up provides maximum performance, but if a single hardware instance fails, the overall throughput and response time may suffer.

  • Active/passive :
Provides a fully redundant instance of each node, which is only brought on-line when its associated primary node fails. This gives maximum availability but does not fully utilize all the available resources, since at most times part of them will sit idle.

References : http://geoserver.geo-solutions.it/

Monday, June 1, 2015

Socket Programming

Sockets allow communication between two different processes on the same or different machines. To be more precise, it's a way to talk to other computers using standard Unix file descriptors.

Types of sockets:
  • Internet Sockets (DARPA Internet addresses)
  • Unix Sockets (path names on a local node)
  • X.25 Sockets (CCITT X.25 addresses)

Types of Internet Sockets:
  • Stream Sockets (SOCK_STREAM)
  • Datagram Sockets (SOCK_DGRAM) - connectionless sockets
  • Raw Sockets
  • Sequenced Packet Sockets

Stream Sockets (SOCK_STREAM)
  • reliable two-way connected communication streams
  • Same order will be maintained on both sides
  • error-free
  • e.g. telnet
  • Use TCP, so data arrives sequentially and error-free

Datagram Sockets (SOCK_DGRAM) - connectionless sockets
  • if you send a datagram, it may arrive. It may arrive out of order.
  • If it arrives, the data within the packet will be error-free.
  • use IP for routing.
  • But not TCP. It is UDP
  • used either when a TCP stack is unavailable or when a few dropped packets here and there are not a problem
  • Ex: tftp (trivial file transfer protocol, a little brother to FTP), dhcpcd (a DHCP client), multiplayer games, streaming audio, video conferencing
  • Why would you use an unreliable underlying protocol?  speed
  • How to implement reliable SOCK_DGRAM applications:  tftp and similar programs have their own protocol on top of UDP. For example, the tftp protocol says that for each packet that gets sent, the recipient has to send back a packet that says, "I got it!" (an "ACK" packet.) If the sender of the original packet gets no reply in, say, five seconds, he'll re-transmit the packet until he finally gets an ACK.
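As a minimal sketch of SOCK_DGRAM usage, the following C function (the name is my own) sends a datagram to itself over the loopback interface with sendto() and reads it back with recvfrom(); no connection is ever established:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send a datagram to ourselves over loopback and read it back.
 * Returns 0 on success, -1 on any error.  A sketch only: error
 * handling is reduced to early returns. */
int udp_loopback_demo(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);   /* connectionless socket */
    if (sock < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                            /* let the kernel pick a port */
    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    /* Find out which port the kernel assigned, so we can send to it. */
    socklen_t len = sizeof(addr);
    if (getsockname(sock, (struct sockaddr *)&addr, &len) < 0)
        return -1;

    const char msg[] = "ping";
    if (sendto(sock, msg, sizeof(msg), 0,
               (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    char buf[16] = {0};
    if (recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL) < 0)
        return -1;

    close(sock);
    return strcmp(buf, "ping") == 0 ? 0 : -1;
}
```

A tftp-style reliability layer would add the ACK/re-transmit logic described above on top of calls like these.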

Raw Sockets
  • These provide users access to the underlying communication protocols, which support socket abstractions.
  • normally datagram oriented, though their exact characteristics are dependent on the interface provided by the protocol.
  • provided mainly for those interested in developing new communication protocols, or for gaining access to some of the more cryptic facilities of an existing protocol

Sequenced Packet Sockets:
  • similar to a stream socket, with the exception that record boundaries are preserved

Hostname Resolution:
  • The process of finding the dotted IP address corresponding to a given alphanumeric host name.
  • Done by the Domain Name System (DNS).
  • The correspondence between host names and IP addresses is maintained in the file /etc/hosts.
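A small C sketch of hostname resolution using the modern getaddrinfo() call, which consults both /etc/hosts and DNS; the function name is illustrative:

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve a host name to a dotted IPv4 address string.  Writes the
 * address into `out` and returns 0, or returns -1 on failure. */
int resolve_ipv4(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;          /* IPv4 only, for the example */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;

    /* Convert the binary address to dotted-decimal notation. */
    struct sockaddr_in *sin = (struct sockaddr_in *)res->ai_addr;
    const char *p = inet_ntop(AF_INET, &sin->sin_addr, out, outlen);
    freeaddrinfo(res);
    return p ? 0 : -1;
}
```

For example, resolving "localhost" normally yields "127.0.0.1" via /etc/hosts.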

Layered Architecture:

A layered model more consistent with Unix might be:
  • Application Layer (telnet, ftp, etc.)
  • Host-to-Host Transport Layer (TCP, UDP)
  • Internet Layer (IP and routing)
  • Network Access Layer (Ethernet, wi-fi, or whatever)

  • All you have to do for stream sockets is send() the data out.
  • All you have to do for datagram sockets is encapsulate the packet in the method of your choosing and sendto() it out.
  • The kernel builds the Transport Layer and Internet Layer for you, and the hardware handles the Network Access Layer

How to Make Client
The steps involved in establishing a socket on the client side are as follows:

  • Create a socket with the socket() system call.
  • Connect the socket to the address of the server using the connect() system call.
  • Send and receive data. There are a number of ways to do this, but the simplest way is to use the read() and write() system calls.

How to make a Server:
The steps involved in establishing a socket on the server side are as follows:

  • Create a socket with the socket() system call.
  • Bind the socket to an address using the bind() system call. For a server socket on the Internet, an address consists of a port number on the host machine.
  • Listen for connections with the listen() system call.
  • Accept a connection with the accept() system call. This call typically blocks until a client connects to the server.
  • Send and receive data using the read() and write() system calls
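The client and server step lists above can be sketched together in one C function: the parent plays the server (socket, bind, listen, accept) and a forked child plays the client (socket, connect), exchanging data with read()/write(). An ephemeral loopback port is used and error handling is kept minimal:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a one-shot TCP echo over loopback.  Returns 0 when the round
 * trip succeeds. */
int tcp_echo_demo(void)
{
    /* Server steps: create, bind to an ephemeral loopback port, listen. */
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                         /* kernel picks the port */
    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;
    socklen_t len = sizeof(addr);
    getsockname(srv, (struct sockaddr *)&addr, &len);
    listen(srv, 1);

    pid_t pid = fork();
    if (pid == 0) {
        /* Client steps: create a socket and connect to the server. */
        int cli = socket(AF_INET, SOCK_STREAM, 0);
        connect(cli, (struct sockaddr *)&addr, sizeof(addr));
        write(cli, "hello", 5);               /* send */
        char buf[8] = {0};
        read(cli, buf, sizeof(buf));          /* receive the echo */
        close(cli);
        _exit(strcmp(buf, "hello") == 0 ? 0 : 1);
    }

    /* accept() blocks until the client connects. */
    int conn = accept(srv, NULL, NULL);
    char buf[8] = {0};
    ssize_t n = read(conn, buf, sizeof(buf));
    write(conn, buf, n);                      /* echo it back */
    close(conn);
    close(srv);

    int status;
    waitpid(pid, &status, 0);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```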

  • To resolve the problem of identifying a particular server process running on a host, both TCP and UDP have defined a group of well-known ports.
  • A port is defined as an integer between 0 and 65535. All port numbers smaller than 1024 are considered well-known and are reserved for standard services.
  • The port assignments to network services can be found in the file /etc/services.
  • It is normal practice to assign application port numbers above 5000.

Network Byte Order
  • not all computers store the bytes that comprise a multibyte value in the same order.
    • Little Endian: In this scheme, low-order byte is stored on the starting address (A) and high-order byte is stored on the next address (A + 1).
    • Big Endian: In this scheme, high-order byte is stored on the starting address (A) and low-order byte is stored on the next address (A+1).
  • Network Byte Order: To allow machines with different byte order conventions to communicate with each other, the Internet protocols specify a canonical byte order convention for data transmitted over the network.
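A short C sketch of the standard byte-order conversions: htons()/htonl() convert host order to network (big-endian) order, and ntohs()/ntohl() convert back. The helper names are illustrative:

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Convert 16- and 32-bit values from host to network byte order.
 * On a big-endian host these are no-ops; on a little-endian host
 * they swap bytes.  Either way, a round trip through ntohs()/ntohl()
 * always restores the original value. */
uint16_t to_network_16(uint16_t host_value) { return htons(host_value); }
uint32_t to_network_32(uint32_t host_value) { return htonl(host_value); }

/* Check whether this machine stores the high-order byte first. */
int host_is_big_endian(void)
{
    uint16_t probe = 0x0102;
    return *(unsigned char *)&probe == 0x01;
}
```

Regardless of the host's endianness, the first byte of the network representation of 0x0102 is always 0x01.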

The select Function
  • The select function indicates which of the specified file descriptors is ready for reading, ready for writing, or has an error condition pending.
  • When an application calls recv or recvfrom, it is blocked until data arrives for that socket.
  • An application could be doing other useful processing while the incoming data stream is empty. Another situation is when an application receives data from multiple sockets.
  • Calling recv or recvfrom on a socket that has no data in its input queue prevents immediate reception of data from other sockets.
  • The select function call solves this problem by allowing the program to poll all the socket handles to see if they are available for non-blocking reading and writing operations.
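A minimal C sketch of select(): a pipe stands in for a socket here (select() treats any descriptor the same way), polled first while empty and then again after a byte has been written:

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Use select() to check whether a descriptor is ready for reading.
 * Returns 0 if select() correctly reports "not ready" on an empty
 * pipe and "ready" once data has been written. */
int select_readable_demo(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    /* Nothing written yet: poll with a zero timeout, expect 0 ready. */
    fd_set readfds;
    struct timeval tv = {0, 0};
    FD_ZERO(&readfds);
    FD_SET(fds[0], &readfds);
    if (select(fds[0] + 1, &readfds, NULL, NULL, &tv) != 0)
        return -1;

    /* Write a byte; select() should now report one ready descriptor. */
    write(fds[1], "x", 1);
    FD_ZERO(&readfds);
    FD_SET(fds[0], &readfds);
    tv.tv_sec = 1;
    tv.tv_usec = 0;
    int ready = select(fds[0] + 1, &readfds, NULL, NULL, &tv);

    int ok = (ready == 1 && FD_ISSET(fds[0], &readfds));
    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```

A real server would put all its socket descriptors in the fd_set and service whichever ones select() marks ready.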

Blocking vs Non Blocking Sockets

  • By default, TCP sockets are in "blocking" mode. It's possible to set a descriptor so that it is placed in "non-blocking" mode.

Blocking Mode:
  • When you call recv() to read from a stream, control isn't returned to your program until at least one byte of data is read from the remote site.
  • This process of waiting for data to appear is referred to as "blocking". The same applies to connect() and write().

Non Blocking Mode:
  • When placed in non-blocking mode, you never wait for an operation to complete.
  • If you call recv() in non-blocking mode, it will return any data that the system has in its read buffer for that socket.
  • But it won't wait for that data.
  • If the read buffer is empty, the system will return from recv() immediately with an "operation would block" error (EWOULDBLOCK).
  • Non-blocking sockets can also be used in conjunction with the select() API. In fact, if you reach a point where you actually WANT to wait for data on a socket that was previously marked as "non-blocking", you could simulate a blocking recv() just by calling select() first, followed by recv().
  •  Programs that use non-blocking sockets typically use one of two methods when sending and receiving data.

  1. Polling: the program periodically attempts to read or write data from the socket, using a timer.
  2. Asynchronous notification: the program is notified whenever a socket event takes place, and in turn can respond to it.
  • When designing a high performance networking application with non-blocking socket I/O, the architect needs to decide which polling method to use to monitor the events generated by those sockets.

  1. Polling with select()
  2. Polling with poll()
  3. Polling with epoll()
  4. Polling with libevent
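The non-blocking behaviour described above can be sketched in C with fcntl(): recv() on an empty non-blocking socket returns -1 with errno set to EWOULDBLOCK (or EAGAIN) instead of waiting. A socketpair stands in for a network connection:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Put one end of a socket pair into non-blocking mode and show that
 * recv() on an empty buffer returns immediately with EWOULDBLOCK
 * instead of blocking.  Returns 0 when the behaviour matches. */
int nonblocking_recv_demo(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Switch sv[0] to non-blocking mode with fcntl(). */
    int flags = fcntl(sv[0], F_GETFL, 0);
    fcntl(sv[0], F_SETFL, flags | O_NONBLOCK);

    /* No data yet: recv() must fail immediately, not block. */
    char buf[4];
    ssize_t n = recv(sv[0], buf, sizeof(buf), 0);
    int would_block = (n == -1 && (errno == EWOULDBLOCK || errno == EAGAIN));

    /* After the peer writes, the same call succeeds at once. */
    send(sv[1], "x", 1, 0);
    n = recv(sv[0], buf, sizeof(buf), 0);

    close(sv[0]);
    close(sv[1]);
    return (would_block && n == 1) ? 0 : -1;
}
```

Pairing this with select(), as noted above, lets a program wait only when it chooses to.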


Multi-threaded Programming

  • A process is a program in execution.
  • It has
    • text section - program code
    • program counter - current activity
    • registers - the contents of the processor's registers
    • stack - temporary data ( function parameters, local variables, return values)
    • data section - global variables
    • heap - memory which is dynamically allocated

  • Process states
    • new
    • running
    • waiting
    • ready
    • terminated

  • A thread is the basic unit of CPU utilization.
  • It has
    • Thread id
    • program counter
    • register set
    • Stack
    • Signal mask
    • Priority
    • Return value

  • It shares with other threads belonging to the same process
    • code section
    • data section
    • other operating system resources (open files, signals)
    • Process instructions
    • Current working directory
    • User and group id

Posix(pthreads) Threads

Why Pthreads
  • Light Weight - When compared to the cost of creating and managing a process, a thread can be created with much less operating system overhead. Managing threads requires fewer system resources than managing processes.
  • Efficient communications/data exchange - threads share the address space of their process, which helps achieve optimum performance on a multi-processor architecture.

Programs having the following characteristics may be well suited for pthreads:
  • Work that can be executed, or data that can be operated on, by multiple tasks simultaneously:
  • Block for potentially long I/O waits
  • Use many CPU cycles in some places but not others
  • Must respond to asynchronous events
  • Some work is more important than other work (priority interrupts)

Thread-safeness: an application's ability to execute multiple threads simultaneously without "clobbering" shared data or creating "race" conditions. A race condition occurs when two or more threads can access shared data and they try to change it at the same time.

Thread limits: the maximum number of threads permitted and the default thread stack size are two important limits to consider when designing your program.

Pthread API

The subroutines which comprise the Pthreads API can be informally grouped into four major groups:

  • Thread management: Routines that work directly on threads - creating, detaching, joining, etc. They also include functions to set/query thread attributes (joinable, scheduling etc.)
    • pthread_create - create a new thread
    • pthread_join - wait for termination of another thread
    • pthread_exit - terminate the calling thread
  • Mutexes: Routines that deal with synchronization, called a "mutex", which is an abbreviation for "mutual exclusion". Mutex functions provide for creating, destroying, locking and unlocking mutexes. These are supplemented by mutex attribute functions that set or modify attributes associated with mutexes.
  • Condition variables: Routines that address communications between threads that share a mutex. Based upon programmer specified conditions. This group includes functions to create, destroy, wait and signal based upon specified variable values. Functions to set/query condition variable attributes are also included.
  • Synchronization: Routines that manage read/write locks and barriers. The threads library provides three synchronization mechanisms:
    • mutexes - Mutual exclusion lock: Block access to variables by other threads. This enforces exclusive access by a thread to a variable or set of variables.
    • joins - Make a thread wait till others are complete (terminated).
    • condition variables - data type pthread_cond_t. The condition variable mechanism allows threads to suspend execution and relinquish the processor until some condition is true. A condition variable must always be associated with a mutex, to avoid the race condition in which one thread prepares to wait while another signals the condition before the first thread actually waits, resulting in a deadlock.
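The thread-management and mutex routines above can be sketched in one short C fragment: two threads increment a shared counter under a pthread mutex, and pthread_join() waits for both to terminate. With the mutex held around each increment, the final count is deterministic:

```c
#include <pthread.h>

#define INCREMENTS_PER_THREAD 100000

/* Shared state protected by a mutex. */
static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&counter_lock);    /* enter critical section */
        counter++;
        pthread_mutex_unlock(&counter_lock);  /* leave critical section */
    }
    return NULL;
}

/* Create two workers, join them, and return the final counter value. */
long run_counter_demo(void)
{
    counter = 0;
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);  /* thread management */
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);                   /* wait for termination */
    pthread_join(t2, NULL);
    return counter;
}
```

Without the mutex the two threads would race on counter++ and the result would be unpredictable; with it, the total is always 2 × INCREMENTS_PER_THREAD.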