Reed Solomon Decoder, Overview

Recently, I had written a Reed-Solomon encoder module out of curiosity.  The next step was to write the decoder.  The resulting decoder was written to work with my encoder — allowing a valid input every cycle.  The project was very rewarding.  Each step of the decoding process is similar, but presents different challenges.  My first cut did not live up to my expectations, but with relatively minor changes I was able to cut both the clock period and area used in half.

Reed-Solomon Decoder, Block Diagram

Reed-Solomon Decoder, Block Diagram

The RS decoder that I decided to go with is a fairly classic design.  It works on “syndrome decoding” and has five steps:

  1. Calculate the syndromes.
  2. Use the syndromes to determine the “error locater polynomial.”
  3. Find the roots of the error locater polynomial.  The inverses of these roots give the locations of errors.
  4. Use the syndromes, roots, and error locater polynomial to determine the error magnitudes.
  5. Use the information about error location and magnitude to actually correct the errors.

Each of these is a somewhat complicated process, and I plan to write articles about each of the steps with more details on how they work and implementation details.

In the above block diagram, the previous locater polynomial, syndromes, and roots are registered at each block.  This allows full pipelining of the decoding — each block can work on a different codeword at any given time.  This allows the input codeword data to be valid on every clock cycle.

Normally, I do not like parallel paths in my block diagrams.  The more interconnected the block diagram is, the more points of failure exist in the connections of the blocks.  In this case, I didn’t see any natural way to distribute the buffer over the decoding stages.  The data could be passed through each stage anyways, but this wouldn’t have any meaning — the codewords passed through would not be related to the current codeword that stage is working on.

The first attempt at the decoder weight in at around 3000 slices on a Virtex-6 -1 part.  The timing results were heavily build dependent, but typically between 166MHz and 200MHz.  Some basic optimizations allowed me to trim this down to around 2000 slices, and a 250MHz clock rate.

I then integrated a new Galois Field multiply, which provided much better performance.  The final decoder weighed in at around 1650 slices on a Virtex-6 -1 part.  The resulting performance could be 333MHz, though to achieve this I had turned on some aggressive optimization.

The VHDL code for the decoder is available below:

This entry was posted in FPGA and tagged , . Bookmark the permalink.

Comments are closed.