Monday, June 1, 2015

Socket Programming

Sockets allow communication between two different processes on the same or different machines. To be more precise, a socket is a way to talk to other computers using standard Unix file descriptors.

Types of sockets:
  • Internet Sockets (DARPA Internet addresses)
  • Unix Sockets (path names on a local node)
  • X.25 Sockets (CCITT X.25 addresses)

Types of Internet Sockets:
  • Stream Sockets (SOCK_STREAM)
  • Datagram Sockets (SOCK_DGRAM) - connectionless sockets
  • Raw Sockets
  • Sequenced Packet Sockets

Stream Sockets (SOCK_STREAM)
  • reliable two-way connected communication streams
  • Same order will be maintained on both sides
  • error-free
  • e.g. telnet
  • Use TCP - (So data arrives sequentially and error-free.)

Datagram Sockets (SOCK_DGRAM) - connectionless sockets
  • if you send a datagram, it may arrive. It may arrive out of order.
  • If it arrives, the data within the packet will be error-free.
  • Use IP for routing, but the transport is UDP rather than TCP.
  • Used either when a TCP stack is unavailable or when a few dropped packets here and there are not a problem
  • Ex: tftp (trivial file transfer protocol, a little brother to FTP), dhcpcd (a DHCP client), multiplayer games, streaming audio, video conferencing
  • Why would you use an unreliable underlying protocol?  speed
  • How to implement reliable SOCK_DGRAM applications:  tftp and similar programs have their own protocol on top of UDP. For example, the tftp protocol says that for each packet that gets sent, the recipient has to send back a packet that says, "I got it!" (an "ACK" packet.) If the sender of the original packet gets no reply in, say, five seconds, he'll re-transmit the packet until he finally gets an ACK.
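
The ACK-and-retransmit idea above can be sketched in a few lines of Python. This is only an illustration of the stop-and-wait pattern, not the actual tftp protocol: the helper names (reliable_send, lossy_receiver), the b"ACK" payload and the 0.5 second timeout are all assumptions made for the demo (the text above uses five seconds as its example). The receiver deliberately ignores the first datagram to simulate a lost packet.

```python
import socket
import threading

def reliable_send(sock, payload, addr, timeout=0.5, max_tries=5):
    """Send one datagram and retransmit until an ACK arrives (stop-and-wait)."""
    sock.settimeout(timeout)
    for attempt in range(1, max_tries + 1):
        sock.sendto(payload, addr)
        try:
            reply, _ = sock.recvfrom(16)
        except socket.timeout:
            continue  # no ACK in time: retransmit
        if reply == b"ACK":
            return attempt  # number of transmissions that were needed
    raise TimeoutError("no ACK received")

def lossy_receiver(sock, drop_first, received):
    """Hypothetical receiver that ignores the first `drop_first` datagrams."""
    seen = 0
    while True:
        data, addr = sock.recvfrom(1024)
        seen += 1
        if seen <= drop_first:
            continue  # simulate a lost packet: no ACK is sent
        received.append(data)
        sock.sendto(b"ACK", addr)
        return

# Receiver bound to a kernel-chosen port on the loopback interface.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
received = []
t = threading.Thread(target=lossy_receiver, args=(receiver, 1, received))
t.start()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
attempts = reliable_send(sender, b"block 1", receiver.getsockname())
t.join()
sender.close()
receiver.close()
```

Because the first datagram is dropped on purpose, the sender times out once and the second transmission is the one that gets acknowledged.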

Raw Sockets
  • These provide users access to the underlying communication protocols, which support socket abstractions.
  • normally datagram oriented, though their exact characteristics are dependent on the interface provided by the protocol.
  • provided mainly for those interested in developing new communication protocols, or for gaining access to some of the more cryptic facilities of an existing protocol

Sequenced Packet Sockets:
  • similar to a stream socket, with the exception that record boundaries are preserved

Hostname Resolution:
  • The process of finding the dotted-decimal IP address for a given alphanumeric host name.
  • Done by the Domain Name System (DNS)
  • A local table of host names and their IP addresses is also kept in the file /etc/hosts
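
As a small illustration of hostname resolution, Python's socket.gethostbyname() asks the system resolver (which consults the hosts file and DNS) for an IPv4 address. The resolve() wrapper is just a name chosen for this sketch, and "localhost" is used so that the example does not depend on network access.

```python
import socket

def resolve(hostname):
    """Return the dotted-decimal IPv4 address for a host name."""
    return socket.gethostbyname(hostname)

address = resolve("localhost")
print(address)  # a loopback address on most systems
```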

Layered Architecture:



A layered model more consistent with Unix might be:
  • Application Layer (telnet, ftp, etc.)
  • Host-to-Host Transport Layer (TCP, UDP)
  • Internet Layer (IP and routing)
  • Network Access Layer (Ethernet, wi-fi, or whatever)

  • All you have to do for stream sockets is send() the data out.
  • All you have to do for datagram sockets is encapsulate the packet in the method of your choosing and sendto() it out.
  • The kernel builds the Transport Layer and Internet Layer headers for you, and the hardware handles the Network Access Layer




How to Make a Client:
The steps involved in establishing a socket on the client side are as follows:

  • Create a socket with the socket() system call.
  • Connect the socket to the address of the server using the connect() system call.
  • Send and receive data. There are a number of ways to do this, but the simplest way is to use the read() and write() system calls.


How to make a Server:
The steps involved in establishing a socket on the server side are as follows:

  • Create a socket with the socket() system call.
  • Bind the socket to an address using the bind() system call. For a server socket on the Internet, an address consists of a port number on the host machine.
  • Listen for connections with the listen() system call.
  • Accept a connection with the accept() system call. This call typically blocks the calling process until a client connects to the server.
  • Send and receive data using the read() and write() system calls
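
The client and server steps above can be sketched together in one small Python program. Python's socket methods are thin wrappers over the system calls of the same names; sendall() and recv() stand in for write() and read() here. The loopback address, the kernel-chosen port, the serve_once() helper and the upper-casing "protocol" are assumptions made only so the example is self-contained.

```python
import socket
import threading

def serve_once(listener):
    """Accept one connection and send back what the client sent, upper-cased."""
    conn, _peer = listener.accept()   # blocks until a client connects
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

# Server side: socket(), bind(), listen()
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))       # port 0: let the kernel choose a free port
listener.listen(1)
host, port = listener.getsockname()
server_thread = threading.Thread(target=serve_once, args=(listener,))
server_thread.start()

# Client side: socket(), connect(), then send and receive
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, port))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()

server_thread.join()
listener.close()
```

A real server would normally call accept() in a loop and handle each connection in turn (or in its own thread or process); a single accept keeps the sketch short.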


Ports:
  • To resolve the problem of identifying a particular server process running on a host, both TCP and UDP have defined a group of well-known ports.
  • A port is defined as an integer number between 0 and 65535. Port numbers smaller than 1024 are considered well-known and are reserved for standard services.
  • The port assignments to network services can be found in the file /etc/services.
  • For your own servers, it is common practice to pick a port number above 5000.
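
A short sketch of the port ranges described above. The is_well_known() helper and the WELL_KNOWN_LIMIT constant are names invented for this example; binding to port 0 is a standard way to ask the kernel for any free port outside the well-known range.

```python
import socket

WELL_KNOWN_LIMIT = 1024  # ports below this value are the well-known range

def is_well_known(port):
    """Return True if the port lies in the well-known range 0..1023."""
    if not 0 <= port <= 65535:
        raise ValueError("port must fit in 16 bits")
    return port < WELL_KNOWN_LIMIT

# Ask the kernel for a free port by binding to port 0.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))
chosen_port = sock.getsockname()[1]
sock.close()
```

On systems that ship a services database, socket.getservbyname("http", "tcp") looks up the port assigned to a named service.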

Network Byte Order
  • not all computers store the bytes that comprise a multibyte value in the same order.
    • Little Endian: In this scheme, low-order byte is stored on the starting address (A) and high-order byte is stored on the next address (A + 1).
    • Big Endian: In this scheme, high-order byte is stored on the starting address (A) and low-order byte is stored on the next address (A+1).
  • Network Byte Order: To allow machines with different byte order conventions to communicate with each other, the Internet protocols specify a canonical byte order convention for data transmitted over the network. This network byte order is big-endian.
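
The difference between the two layouts, and the conversion to network byte order, can be seen with Python's struct and socket modules. The value 0x1234 is an arbitrary example: its high-order byte is 0x12 and its low-order byte is 0x34.

```python
import socket
import struct
import sys

value = 0x1234

big = struct.pack(">H", value)      # big-endian: high-order byte first
little = struct.pack("<H", value)   # little-endian: low-order byte first
network = struct.pack("!H", value)  # "!" means network byte order

# htons()/ntohs() convert a 16-bit value between host and network order.
round_trip = socket.ntohs(socket.htons(value))

print(sys.byteorder, big.hex(), little.hex(), network.hex())
```

The first field printed is the byte order of the machine running the script; the three hex strings are the same on every machine.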


The select Function
  • The select function indicates which of the specified file descriptors is ready for reading, ready for writing, or has an error condition pending.
  • When an application calls recv or recvfrom, it is blocked until data arrives for that socket.
  • An application could be doing other useful processing while the incoming data stream is empty. Another situation is when an application receives data from multiple sockets.
  • Calling recv or recvfrom on a socket that has no data in its input queue prevents immediate reception of data from other sockets.
  • The select function call solves this problem by allowing the program to poll all the socket handles to see if they are available for non-blocking reading and writing operations.
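
A minimal sketch of select() watching two sockets at once. socket.socketpair() is used here only to get connected sockets without setting up a server, and the one second timeout is an arbitrary choice; data is written to the second pair only, so select() should report just that socket as readable.

```python
import select
import socket

a_local, a_remote = socket.socketpair()
b_local, b_remote = socket.socketpair()

b_remote.sendall(b"ping")   # only the second pair has data pending

# Wait up to one second for either socket to become readable.
readable, writable, errored = select.select([a_local, b_local], [], [], 1.0)

# recv() on a socket reported as readable will not block.
messages = [s.recv(1024) for s in readable]

for s in (a_local, a_remote, b_local, b_remote):
    s.close()
```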

Blocking vs Non Blocking Sockets

  • By default, TCP sockets are in "blocking" mode. It's possible to set a descriptor so that it is placed in "non-blocking" mode.

Blocking Mode:
  • When you call recv() to read from a stream, control isn't returned to your program until at least one byte of data is read from the remote site.
  • This process of waiting for data to appear is referred to as "blocking". The same applies to connect() and write().

Non Blocking Mode:
  • When placed in non-blocking mode, you never wait for an operation to complete.
  • If you call recv() in non-blocking mode, it will return any data that the system has in its read buffer for that socket.
  • But it won't wait for that data.
  • If the read buffer is empty, the system will return from recv() immediately with the error "Operation would block" (EWOULDBLOCK or EAGAIN).
  • Non-blocking sockets can also be used in conjunction with the select() API. In fact, if you reach a point where you actually WANT to wait for data on a socket that was previously marked as "non-blocking", you could simulate a blocking recv() just by calling select() first, followed by recv().
  • Programs that use non-blocking sockets typically use one of two methods when sending and receiving data.

  1. polling - the program periodically attempts to read or write data from the socket using a timer.
  2. asynchronous notification - the program is notified whenever a socket event takes place, and in turn can respond to that
  • When designing a high performance networking application with non-blocking socket I/O, the architect needs to decide which polling method to use to monitor the events generated by those sockets.

  1. Polling with select()
  2. Polling with poll()
  3. Polling with epoll()
  4. Polling with libevent
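
The non-blocking behaviour described above can be sketched as follows. In Python the "operation would block" error surfaces as a BlockingIOError exception; the socketpair and the one second select() timeout are assumptions made to keep the example self-contained. The second half shows the "select() first, then recv()" trick for simulating a blocking read.

```python
import select
import socket

local, remote = socket.socketpair()
local.setblocking(False)          # put the socket in non-blocking mode

# Nothing has been sent yet, so recv() returns immediately with an error.
try:
    local.recv(1024)
    would_block = False
except BlockingIOError:
    would_block = True            # EWOULDBLOCK/EAGAIN: read buffer is empty

remote.sendall(b"data")

# Simulate a blocking read: wait in select() until the socket is readable.
ready, _, _ = select.select([local], [], [], 1.0)
data = local.recv(1024) if ready else b""

local.close()
remote.close()
```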




Multi-threaded Programming

Process  
  • Program in execution.
  • It has
    • text section - program code
    • program counter - current activity
    • contents of the processor's registers
    • stack - temporary data ( function parameters, local variables, return values)
    • data section - global variables
    • heap - memory which is dynamically allocated

  • Process states
    • new
    • running
    • waiting
    • ready
    • terminated

Thread
  • basic unit of CPU utilization.
  • It has
    • Thread id
    • program counter
    • register set
    • Stack
    • Signal mask
    • Priority
    • Return value

  • It shares with other threads belonging to the same process
    • code section
    • data section
    • other operating system resources (open files, signals)
    • Process instructions
    • Current working directory
    • User and group id

POSIX Threads (pthreads)

Why Pthreads
  • Light Weight - When compared to the cost of creating and managing a process, a thread can be created with much less operating system overhead. Managing threads requires fewer system resources than managing processes.
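  • Efficient Communications/Data Exchange - threads within a process share the same address space, so they can exchange data without an intermediate copy. On a multi-processor architecture, the main motivation for using Pthreads is to achieve optimum performance.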

Programs having the following characteristics may be well suited for pthreads:
  • Work that can be executed, or data that can be operated on, by multiple tasks simultaneously
  • Block for potentially long I/O waits
  • Use many CPU cycles in some places but not others
  • Must respond to asynchronous events
  • Some work is more important than other work (priority interrupts)

Thread-safeness: an application's ability to execute multiple threads simultaneously without "clobbering" shared data or creating "race" conditions. A race condition occurs when two or more threads can access shared data and they try to change it at the same time.
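
To make the idea concrete, here is a minimal sketch using Python's threading module as a stand-in for pthreads: threading.Lock plays the role of a mutex. The counter, the add_many() helper and the thread and iteration counts are all invented for the example. Without the lock, the read-modify-write in `counter += 1` could interleave between threads and lose updates.

```python
import threading

counter = 0                        # shared data, visible to every thread
counter_lock = threading.Lock()    # plays the role of a mutex

def add_many(n):
    """Increment the shared counter n times, one locked update at a time."""
    global counter
    for _ in range(n):
        with counter_lock:         # lock on entry, unlock on exit
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                       # wait for every worker to finish
```

With the lock in place, each increment is exclusive, so the final count is exactly the number of increments requested.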

Thread Limits: the maximum number of threads permitted and the default thread stack size are two important limits to consider when designing your program.


Pthread API

The subroutines which comprise the Pthreads API can be informally grouped into four major groups:

  • Thread management: Routines that work directly on threads - creating, detaching, joining, etc. They also include functions to set/query thread attributes (joinable, scheduling etc.)
    • pthread_create - create a new thread
    • pthread_join - wait for termination of another thread
    • pthread_exit - terminate the calling thread
  • Mutexes: Routines that deal with synchronization, called a "mutex", which is an abbreviation for "mutual exclusion". Mutex functions provide for creating, destroying, locking and unlocking mutexes. These are supplemented by mutex attribute functions that set or modify attributes associated with mutexes.
  • Condition variables: Routines that address communications between threads that share a mutex. Based upon programmer specified conditions. This group includes functions to create, destroy, wait and signal based upon specified variable values. Functions to set/query condition variable attributes are also included.
  • Synchronization: Routines that manage read/write locks and barriers. The threads library provides three synchronization mechanisms:
    • mutexes - Mutual exclusion lock: Block access to variables by other threads. This enforces exclusive access by a thread to a variable or set of variables.
    • joins - Make a thread wait till others are complete (terminated).
    • condition variables - data type pthread_cond_t. The condition variable mechanism allows threads to suspend execution and relinquish the processor until some condition is true. A condition variable must always be associated with a mutex to avoid a race condition created by one thread preparing to wait and another thread which may signal the condition before the first thread actually waits on it resulting in a deadlock.
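
The three mechanisms above can be seen working together in a small producer/consumer sketch. It uses Python's threading module as an analogue of the Pthreads calls rather than the C API itself: Thread.start() and Thread.join() roughly correspond to pthread_create and pthread_join, and threading.Condition bundles a condition variable with its mutex (wait() and notify() roughly correspond to pthread_cond_wait and pthread_cond_signal). The items list and the "job-1" value are invented for the example.

```python
import threading

items = []                      # shared queue protected by the condition's lock
consumed = []
cond = threading.Condition()    # condition variable plus its associated mutex

def consumer():
    with cond:                  # acquire the mutex
        while not items:        # re-check the condition after every wakeup
            cond.wait()         # releases the mutex while suspended
        consumed.append(items.pop(0))

def producer():
    with cond:
        items.append("job-1")
        cond.notify()           # wake one waiting thread

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
c.join()                        # join: wait for both threads to terminate
p.join()
```

Because the consumer tests `items` while holding the lock before it waits, it does not matter whether the producer runs first: the signal cannot be missed, which is the race the mutex association is meant to prevent.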