Raptor Codes - Part 2

Describes the fountain code and the math behind it.

Fountain Code Overview

We introduce the general use case for fountain codes, describe ideal abstract properties of a fountain code, describe its application to the scenarios described in the previous part.

Suppose we have a block of data, hereafter called a source block, that is to be reliably transmitted over a packet network.
The source block is typically partitioned into equal sized portions of data, hereafter called source symbols, that are typically sized to fit into a packet.
In the following, we let $k$ be the number of source symbols in the source block.

An effective approach to reliably transmitting the source block is to use an encoder at senders to generate encoded symbols from the source symbols of the source block and then to use decoders at receivers to decode the source symbols of the source block from received encoded symbols, where typically each encoded symbol is the same size as a source symbol.
The basic idea is for senders to send encoded symbols in packets, and then a receiver can use the encoded symbols received in packets to try and decode the original source block even if some of the packets are not received.
There are a variety of reasons a receiver may not receive some of the packets:
- Transmission over a wireless network that intermittently experiences interference or noise, causing unrecoverable packet errors that are discarded at the receiver
- Packet losses due to intermittent congestion that causes packet buffer overflows in routers
- The receiver being only intermittently subscribed to the sessions in which packets are transmitted
Packets that arrive too late at the receiver to be useful when there are time constraints on consumption of the data in the source block

Fountain Codes in particular are especially effective codes that can be used to provide reliable transmission.
The ideal abstract properties of a fountain code are as follows:
- A sender should be able to use a fountain encoder to generate as many encoded symbols required from a source block.
- A receiver that receives any subset of $k$ encoded symbols should be able to use fountain decoder to decode an exact copy of the original source block, independent of which subset of the generated encoded symbols is received and independent of whether the subset was generated by one sender or generated by morer than one sender from the original block of source data.
- The computation time for encoding and decoding should be linear, i.e., the time to generate each encoded symbol should be linearly proportional to its size, and similarly the time to decode an original source block from encoded symbols should be linearly proportional to the size of the source block.

These properties bring to mind a "fountain": When filling a bucket from a fountain water, which particular drops fill the bucket does not matter, only the amount of water in the bucket matters.
Similarly, with a fountain c ode, which particular encoded symbols are received does not matter, only the number of encoded symbols received matters.

In the point-to-point scenario, the sender can generate encoded symbols using a fountain encoder from the source block, and place the encoded symbols into packets, which are transmitted via the UDP protocol, for example.
In a real-time application, the packets can be sent at any rate that is below the rate at which the encoded symbols can be generated by the fountain encoder.
Provided that this rate is very high, there will essentially be no limit on the transmission speed.
Reliability of this transmission method is provided by the fountain property: as soon as the receiver collects $k$ encoded symbols, it can decode the source symbols of the original source block.
As $k$ encoded syymbols are the absolute bare minimum the receiver needs to collect to be able to decode the $k$ source symbols, the transmission is optimal from an information point of view.

In the case of point-to-multipoint transmission, the sender generates encoded symbols and places them into packets and transmits the packets via, for example, broadcast or multicast.
The fundamental properties of the fountain code guarantee that each receiver is capable of decoding the original data from reception of the minimal amount of data possible.
Thus, one sender is capable of efficiently and reliably delivering to a potentially limitless number of receivers.

In case of multipoint-to-point transmission, the various senders use fountain encoders applied to the common copy of the source block they each possess.
The receiver collects encoded symbols from the various senders, by the properties of the fountain code, from the point of view of the receiver, the mix of senders from which it receives encoded symbols does not matter.
As soon as the receiver has collected $k$ encoded symbols from the combined set of encoded symbols from the various senders, the original source block can be decoded.

Now that we know that fountain codes provide an elegant solution to various reliable transmission problem, we need to understand how to construct them.
We need to understand how to construct them.
We now outline an approach that eventually leads to realizing almost ideal fountain codes.
For a given vector $(x_1, \ldots ,x_k)$ of source symbols, a fountain encoder produces potentially limitless stream of encoded symbols $y_1, y_2, \ldots$
Here a symbol refers to a bit or a sequence of bits.
In many applications, symbols are of the same size as the payloadj of the transmitted packets, though this is not necessarily the case.
In general, the size of the symbols is often dictated by the underlying application and requirements.

The fountain codes that we initially describe operate on 1-bit symbols.
Note that codes for larger symbols can be obtained using simple parallel concatenation, i.e., to generate a code that operates on $t-\text{bit}$ symbols simply perform the same operations as would be performed on 1-bit symbols to each of the $t$ symbols in parallel.

These fountain codes are governed by a probability distribution $\mathcal{D}$ on the vector space $\mathbb{F}_2^k$ .
The encoding procedure for generating encoded symbols $y_j$ is as follows:
- Sample $\mathcal{D}$ to obtain a vector $(a_{1}, \ldots, a_{k}) \in \mathbb{F}_2^k$
- Compute $y_j = \sum_{i} a_i x_i$

The samplings of the fountain encoder are independent from encoded symbol to encoded symbol; this is extremely important as it induces a uniformity property on the encoded symbols generated and ensures that the code has the fountain properties.
Note that when the encoded symbols are placed into packets for transmission, typically an identifier is also placed in the header of each packet, called ESI (encoded symbol identifier), that uniquely identifies the encoded symbols contained in the packet.
The ESI is used by the decoder to determine the vector $(a_{1}, \ldots, a_{k})$ corresponding to each encoded symbol received in the packet.