Reed-Solomon Encoder

This article describes how to make a performance optimized Reed-Solomon encoder.  The actual encoder is for an full-length, narrow-sense RS code.  In this case, the code generated was an code that could allow at least 8 bytes to be corrected, depending on the decoder.  The encoding logic is quite small, and very fast.  An updated structure can be found here.

The generation of the parity information for a RS encoder is done simply by polynomial division.  This is similar to the CRC calculations, except it is done in a larger field.  While the CRC calculates the division in GF2[x] — polynomials with coefficients in GF2 — the RS encoder performs the division on GF256[x] (for 8bit symbols).  This means the coefficients aren’t just trivial XORs.  Luckily, the resulting logic does reduce down to just a few XORs.

The articles on Galois Field LFSRs, Galois Field Math, and Multiple Shifts of Galois Field LFSRs should be consulted first, as they provide the context for what follows.

The polynomial division is just like in other fields, and can be done with the normal long-division method.  The RS encoder conceptually works as follows:

  1. Generate a polynomial from the message.  c(x) = m(x) \cdot x^k.  Where “k” is the number of parity bytes, and m(x) is the data to be transmitted.  The bytes of m(x) will map to GF256 elements.
  2. A RS generator polynomial, g(x), will be provided.  Perform polynomial division to find the remainder of c(x) / g(x).  The remainder provides the parity bytes directly.

The generator is determined by multiplying (x + \alpha^b) for b = 1, 2, … k.  This can be done using a program, or by consulting a table.  One property that this format has is that the highest order coefficient will be 1 — thus the generator is known to be monic.  This property will become useful for the implementation.

Here is a quick example of a short long division method.  The values are left a bit abstract.  The polynomials were arbitrary:

(\alpha^3 x^3 + \alpha^{20} x^2 + 0 x + \alpha^{190}) / (x^2 + \alpha^{10} x + \alpha^{100})

For GF256, each element is its own additive inverse.  Thus, to cancel out the \alpha^3 leading term, we multiply the divisor by \alpha^3 — this works because the divisor is monic.  This gives:

(\alpha^3 x^3 + \alpha^{20} x^2 + 0 x + \alpha^{190}) - (\alpha^3 x^3 + \alpha^{13} x^2 + \alpha^{103} x) = (\alpha^{20}-\alpha^{13}) x^2 + \alpha^{103} x + \alpha^{190}

When resolved, (\alpha^{20}-\alpha^{13}) will have a value based on how GF256 was generated.  For now, lets assume this value resolves to \alpha^r.  The next step is to cancel out this leading coefficient of \alpha^r.  Thus the monic divisor is multiplied by \alpha^r

(\alpha^r x^2 + \alpha^{103} x + \alpha^{190}) - (\alpha^r x^2 + \alpha^{10+r} x + \alpha^{100+r}) = (\alpha^{103} - \alpha^{10+r}) + (\alpha^{190} - \alpha^{100+r})

Again, these will be resolvable based on how GF256 was generated.  The above result will be the remainder of this division.

Reed-Solomon Encoder

Reed-Solomon Encoder

The interesting thing to realize is that in each case, the divisor was multiplied by the MSB of the remaining quotient.  Or, put another way, the MSB of the remaining quotient is multiplied by each value of the constant-valued divisor.  Thus an implementation can use a Galois Field “multiply by constant”.  This ends up being a multi-bit shift of the MSB, as opposed to a more intensive general GF multiply.  The remaining operation is the subtraction of this product with the appropriate value of the remaining quotient. This is simply another XOR.

Thus, the RS polynomial division can be performed with a minimum amount of XOR based logic.  FPGAs are very good with XOR based logic, as both the LUT’s and the carry chains can perform XORs, allowing for very wide XOR operations to operate very fast, and in a small area.  The multi-shift operation can be easily generated in a script.

The remaining part of the design is a fancy shift register that can shift values in, perform the logic operation, and shift parity information out.  Because my goal was to allow a valid output on each cycle, I decided to allow each stage to have its own logic for shifting in an input, performing the multiply-add, or retaining its value.  This allows the user to start loading data into the register while the parity data is being shifted out.  A second delay line was used for the message data path.

For the 16 byte parity case, the RS encoder occupies only 81 slices.  A quick post-PAR timing analysis shows the worst path is due to routing, but still is able to meet the 500MHz clock rate on a Virtex-6.  Next I will have to build a RS decoder.  That’s a substantially more difficult project though.

Update:  The VHDL code for the above encoder is now available:

This entry was posted in FPGA, Math and tagged , . Bookmark the permalink.

Comments are closed.