Erasure Codes

Typhoon-like System:

The following summary gives only a high-level overview of the algorithm. The algorithm is still under study, and further developments will be documented soon.

1) Introduction

A system to distribute archival copies of files across an arbitrary set of servers. Its features are:


 * Distribution of copies of files to other servers in a network.


 * It only places a portion of the file on each server.
 * Before distribution, each file is encoded using an erasure code that allows the Typhoon system to automatically recover the file if one or more servers (up to one-half) become inoperable.
 * The encoding and decoding algorithms incorporated in this system are called Tornado Codes (TC). TC are a special type of erasure code. The basic idea of an erasure code is to partition data into blocks and augment the blocks with redundant information.
 * Tornado codes are based on probabilistic assumptions: recovery is expected with high probability rather than guaranteed in every case.
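
As a minimal illustration of the basic erasure-code idea described above (not of the Tornado construction itself), the sketch below splits data into blocks and adds a single XOR parity block, which is enough to rebuild any one lost block. The function names, the zero-byte padding and the block size are assumptions made for this example only.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bit-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_blocks(data, block_size):
    """Partition data into equal-size blocks, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


def parity_block(blocks):
    """Redundant block: the XOR of all data blocks."""
    return reduce(xor_bytes, blocks)


blocks = split_blocks(b"archival copy of a file", 8)
parity = parity_block(blocks)

# Simulate the loss of block 1 and rebuild it from the survivors plus parity.
survivors = [b for i, b in enumerate(blocks) if i != 1]
rebuilt = reduce(xor_bytes, survivors, parity)
```

A single parity block tolerates only one loss; Tornado codes use many overlapping XOR checks so that several losses can be repaired.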

2) Encoding Algorithm


 * A file is partitioned into a set of equal-size fragments called data nodes.


 * The algorithm also creates a set of "check nodes" that are equal in size and number to the data nodes.
 * Using specially designed bipartite graphs (help required!!), each check node is assigned two or more nodes to be its neighbours.
 * The contents of each check node are set to the bit-wise XOR of the values of its neighbours.
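
The encoding steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the graph here is a plain random bipartite graph with a fixed degree, standing in for the specially designed graphs that real Tornado codes use, and the names `partition`, `random_graph` and `encode` are hypothetical.

```python
import random


def xor_bytes(a, b):
    """Bit-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def partition(data, n_nodes):
    """Split data into n_nodes equal-size fragments (zero-padded)."""
    size = max(1, -(-len(data) // n_nodes))  # ceiling division
    padded = data.ljust(size * n_nodes, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(n_nodes)]


def random_graph(n_data, n_check, degree, seed):
    """Placeholder graph: each check node gets `degree` distinct data-node
    neighbours chosen at random. Not the real Tornado graph design."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_data), degree)) for _ in range(n_check)]


def encode(data_nodes, graph):
    """Each check node is the XOR of the data nodes it is connected to."""
    check_nodes = []
    for neighbours in graph:
        value = bytes(len(data_nodes[0]))  # all-zero starting value
        for j in neighbours:
            value = xor_bytes(value, data_nodes[j])
        check_nodes.append(value)
    return check_nodes


data_nodes = partition(b"The quick brown fox jumps over the lazy dog", 8)
graph = random_graph(n_data=8, n_check=8, degree=3, seed=1)
check_nodes = encode(data_nodes, graph)
```

The check nodes come out equal in size and number to the data nodes, matching the description above.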

3) Decoding Algorithm


 * The check nodes created at the time of encoding are used to restore their missing neighbours.


 * A check node can be used for recovery only if its own contents are known and exactly one of its left-neighbours is missing.
 * To restore the missing node, the contents of the check node are XORed with the contents of its remaining left-neighbours, and the resulting value is assigned to the missing neighbour.
 * Decoding succeeds as long as, at each step of the decode process, at least one check node is missing exactly one neighbour; the graphs are designed so that this holds with high probability.
 * Ideally, Tornado nodes should be transferred in a random order over the network to minimize the effect of certain types of network data-loss behaviour.
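
The decoding rule above can be sketched as a simple loop that keeps repairing nodes until nothing is missing. This is a minimal sketch, not the Typhoon implementation: the four-node graph is hand-picked for the example, missing nodes are represented by `None`, and the names `encode` and `decode` are hypothetical.

```python
def xor_bytes(a, b):
    """Bit-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_nodes, graph):
    """Each check node is the XOR of its neighbouring data nodes."""
    checks = []
    for neighbours in graph:
        value = bytes(len(data_nodes[0]))
        for j in neighbours:
            value = xor_bytes(value, data_nodes[j])
        checks.append(value)
    return checks


def decode(data_nodes, check_nodes, graph):
    """Recover missing data nodes (None) using the available check nodes."""
    data = list(data_nodes)
    progress = True
    while progress and any(d is None for d in data):
        progress = False
        for check, neighbours in zip(check_nodes, graph):
            if check is None:
                continue  # a lost check node cannot repair anything
            missing = [j for j in neighbours if data[j] is None]
            if len(missing) != 1:
                continue  # usable only when exactly one neighbour is missing
            value = check
            for j in neighbours:
                if data[j] is not None:
                    value = xor_bytes(value, data[j])
            data[missing[0]] = value
            progress = True
    if any(d is None for d in data):
        raise ValueError("decoding stalled: no check node is missing exactly one neighbour")
    return data


# Hand-picked example graph: check node i covers the listed data nodes.
graph = [[0, 1], [1, 2], [2, 3], [0, 3]]
original = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
checks = encode(original, graph)

# Two data nodes and one check node are lost in transit.
received_data = [original[0], None, None, original[3]]
received_checks = [checks[0], None, checks[2], checks[3]]
recovered = decode(received_data, received_checks, graph)
```

If too many nodes are lost, no check node has exactly one missing neighbour and this sketch raises `ValueError` instead of returning a partial file.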

4) Components

i) Logical Components of a Typhoon-like System:


 * Replication Mechanism to deal with the encoding and decoding of a file. It communicates with the filestores to distribute (or gather) the pieces of a file.


 * Filestore to store, or transmit, fragments of a file.
 * Naming & Location Service (NLS) to track which pool should be used to store or retrieve a particular file.

Client machines generate read/write requests. These are served by one or more sets of servers (called 'data pools'), and each server runs a Replication Mechanism and a Filestore.

ii) Other features which may be included:


 * A caching server to increase the speed at which client requests are serviced.


 * For file updates, a versioning scheme in which the NLS assigns a unique internal name to each version of a file.
 * Multithreading to obtain reasonable levels of performance: data is received and decoded simultaneously, so the retrieval process can stop as soon as the entire file has been recovered.
 * For load balancing, a server may ignore requests.
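
One way to picture the NLS and the versioning scheme is sketched below. Everything in it is an assumption made for illustration: the class name, the `name@vN` internal-name format and the CRC-based choice of pool are hypothetical and are not taken from the Typhoon design.

```python
import zlib


class NamingLocationService:
    """Hypothetical NLS: assigns internal names to file versions and
    maps each internal name to a data pool."""

    def __init__(self, pools):
        if not pools:
            raise ValueError("at least one data pool is required")
        self._pools = list(pools)
        self._history = {}  # file name -> internal names, oldest first

    def new_version(self, file_name):
        """Register an update and return its unique internal name."""
        history = self._history.setdefault(file_name, [])
        internal_name = f"{file_name}@v{len(history) + 1}"
        history.append(internal_name)
        return internal_name

    def latest(self, file_name):
        """Internal name of the most recent version of a file."""
        return self._history[file_name][-1]

    def pool_for(self, internal_name):
        """Deterministically pick the pool that stores this version."""
        index = zlib.crc32(internal_name.encode("utf-8")) % len(self._pools)
        return self._pools[index]


nls = NamingLocationService(["pool-a", "pool-b"])
first = nls.new_version("report.txt")
second = nls.new_version("report.txt")
```

Because every update gets a fresh internal name, older versions stay retrievable while `latest` always points at the newest one.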

5) Advantages over Reed-Solomon:


 * The Reed-Solomon family of erasure codes requires quadratic computation time, which limits practical implementations to small files. Tornado codes, by contrast, give up some efficiency in exchange for linear-time encoding and decoding.


 * Software-based implementations of Tornado codes are about 100 times faster on small lengths and about 10,000 times faster on larger lengths than Reed-Solomon erasure codes.
 * A Reed-Solomon based system requires an average of 1877 seconds to gather and decode a 3 MB file. In contrast, a multithreaded Typhoon system based on Tornado codes takes only 1.5 seconds to retrieve a 3 MB file.
 * Tornado codes require a larger amount of data to be retrieved than Reed-Solomon codes. The net result is a slight increase in network traffic in exchange for a drastic drop in computational requirements.

(*More benefits in store)