Misusing Simulations

Simulations are a wonderful tool for verification and for debugging. There is a fine line, though, between using simulation for debugging and misusing it as a design aid. When simulation is used for debugging, the intent is to find things like typos, omissions, or accidental deviations from your design. When it is used as a design aid, the intent is to find a fix for a logical problem.

Of course, it’s unrealistic to expect that sims will never be used to find a fix for a logical problem. They will. The real issue is that simulations can be very misleading and give a false sense of security. The mindset should be aimed at finding why the HDL doesn’t match the original design, or why the original design fails to solve the problem. Too often, the focus is on just getting the design to stop failing for the current test case.

The general flow goes like this: too little thought is put into detailing a design before the HDL is written. The HDL is then written, and some detail in the design is wrong, either because of a coding error or a design oversight. Simulations are then used to diagnose the problem AND find a solution. The solution fixes the symptoms but misses the original problem. Later, the inputs differ from the test case and the design fails. Now the design contains code that doesn’t make sense and differs from the original concept.

Pipelining is a common source of these problems. An operation that should be defined only in terms of one set of inputs ends up defined in terms of several different delays of those inputs. This happens when the inputs or control signals are not delayed correctly for each stage where they are used.
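As an illustration (a hypothetical example, not taken from a real design), consider y = a*b + c split across two register stages. The product only exists one clock after a and b arrive, so c and the valid flag have to be delayed by that same clock before stage 2 uses them. The entity, port, and signal names are made up for this sketch, and it assumes unsigned types from ieee.numeric_std:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: y = a*b + c computed over two register stages
entity mac_pipe is
  port (
    Clk   : in  std_logic;
    vld   : in  std_logic;                 -- input sample valid
    a, b  : in  unsigned(7 downto 0);
    c     : in  unsigned(15 downto 0);
    y     : out unsigned(16 downto 0);
    y_vld : out std_logic                  -- valid flag, aligned with y
  );
end entity;

architecture rtl of mac_pipe is
  signal prod   : unsigned(15 downto 0) := (others => '0');
  signal c_d1   : unsigned(15 downto 0) := (others => '0');
  signal vld_d1 : std_logic := '0';
begin
  p_pipe : process (Clk) is
  begin
    if rising_edge(Clk) then
      -- Stage 1: multiply, and delay everything stage 2 will need
      prod   <= a * b;
      c_d1   <= c;        -- c travels with the product it belongs to
      vld_d1 <= vld;
      -- Stage 2: every operand now comes from the same input sample.
      -- Using c here instead of c_d1 would add c from the NEXT sample.
      y      <= resize(prod, 17) + resize(c_d1, 17);
      y_vld  <= vld_d1;
    end if;
  end process;
end architecture;
```

If stage 2 used c instead of c_d1, y would combine a and b from one sample with c from the next one. Any single test where c happens to stay constant would still pass, which is exactly how this kind of bug survives simulation.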

A very common example involves counters, where “off by one” problems show up constantly. This is shown below by code whose intent is clear:

-- Attempt to output a pulse every 256 inputs to mark block/packet
-- boundaries.  (assume all values reset to 0)
p_oops : process (Clk) is
begin
  if Clk'event and Clk = '1' then
    if wr = '1' then
      cnt <= cnt + 1;
    end if;
    if cnt = x"FF" then
      pkt_done <= '1';
    else
      pkt_done <= '0';
    end if;
    EOF  <= pkt_done and wr;
    Dout <= Din;
  end if;
end process;
-- The code fails for the streaming case
-- wr     1  1  1  1  1
-- cnt   FD FE FF 00 01
-- done   0  0  0  1  0
-- EOF    0  0  0  0  1
-- din   FD FE FF 00 01
-- dout  FC FD FE FF 00

The above code is not correct. EOF should be asserted together with Dout = FF, the 256th word, but it arrives with Dout = 00 instead, so it is late by “one cycle”. Simulation will clearly show that this is the case.

-- Attempted fix
p_badfix : process (Clk) is
begin
  if Clk'event and Clk = '1' then
    if wr = '1' then
      cnt <= cnt + 1;
    end if;
    if cnt = x"FE" then -- the incorrect fix, why is this done?
      pkt_done <= '1';
    else
      pkt_done <= '0';
    end if;
    EOF  <= pkt_done and wr;
    Dout <= Din;
  end if;
end process;
-- The code is fixed for the streaming case
-- wr     1  1  1  1  1
-- cnt   FD FE FF 00 01
-- done   0  0  1  0  0
-- EOF    0  0  0  1  0
-- din   FD FE FF 00 01
-- dout  FC FD FE FF 00

The above code is an attempted fix. It does fix the symptom: simulation of the streaming case now looks correct, so the code works for at least the one use case the author tested. There isn’t any explanation for why cnt is compared to 254 (x"FE"), though, and someone reading the code later might wonder about that. Comparing to FE only works because, when wr is high during the FE cycle, cnt reaches FF on the same clock edge that sets pkt_done. The flaw is that the fix assumes wr stays high for several consecutive cycles.

-- Other cadence (for above incorrect code)
-- wr     1  0  1  0  1  0  1
-- cnt   FD FE FE FF FF 00 00
-- done   0  0  1  1  0  0  0
-- EOF    0  0  0  1  0  0  0
-- din   FD xx FE xx FF xx 00
-- dout  xx FD xx FE xx FF xx

Notice that the “fix” didn’t work. For this write cadence, EOF is asserted with Dout = FE, so the 255th input is marked as the end of packet. There are other cases where nothing is marked as the end of packet.
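Here is one such case for the same p_badfix code, worked out by hand with the conventions of the tables above. The only change is that wr drops for one cycle just after cnt reaches FF:

```vhdl
-- Another cadence (for the p_badfix code)
-- wr     1  1  0  1  0
-- cnt   FD FE FF FF 00
-- done   0  0  1  0  0
-- EOF    0  0  0  0  0
-- din   FD FE xx FF xx
-- dout  FC FD FE xx FF
```

pkt_done is high only during a cycle where wr is low, so when FF, the 256th word, reaches Dout, EOF stays at 0 and the packet has no end marker.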

The real issue is the use of pkt_done. Because it is a register, it reports whether cnt was FF one clock earlier, and EOF then combines that stale flag with the current wr. There are multiple ways to fix this; some make the code more complex, others make it simpler. Trying to fix symptoms tends to produce the more complex solutions. They can work, but they make the code less readable. One example fix is shown below:

-- There are many ways to fix the design.  This is one example.
p_fix : process (Clk) is
begin
  if Clk'event and Clk = '1' then
    if wr = '1' then
      cnt <= cnt + 1;
    end if;
    EOF  <= pkt_done and wr;
    Dout <= Din;
  end if;
end process;
pkt_done <= '1' when (cnt = x"FF") else '0';
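If pkt_done has to stay a register (for example, for timing reasons), it can be kept, but it has to be decoded from the value cnt will hold after the clock edge rather than the value it holds now. This is a minimal sketch, assuming cnt is unsigned(7 downto 0) from ieee.numeric_std; the entity name pkt_mark_reg and its port list are hypothetical and only there so the fragment compiles on its own:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper: same behavior as p_fix, but pkt_done stays registered
entity pkt_mark_reg is
  port (
    Clk  : in  std_logic;
    wr   : in  std_logic;
    Din  : in  std_logic_vector(7 downto 0);
    Dout : out std_logic_vector(7 downto 0);
    EOF  : out std_logic
  );
end entity;

architecture rtl of pkt_mark_reg is
  signal cnt      : unsigned(7 downto 0) := (others => '0');
  signal pkt_done : std_logic := '0';  -- always equal to (cnt = x"FF")
begin
  p_fix_reg : process (Clk) is
  begin
    if rising_edge(Clk) then
      if wr = '1' then
        cnt <= cnt + 1;
      end if;
      -- Decode the value cnt will hold after this edge, not its current value:
      -- it becomes FF if it is FE and a write happens, or stays FF without one.
      if (wr = '1' and cnt = x"FE") or (wr = '0' and cnt = x"FF") then
        pkt_done <= '1';
      else
        pkt_done <= '0';
      end if;
      EOF  <= pkt_done and wr;
      Dout <= Din;
    end if;
  end process;
end architecture;
```

This version is correct for any write cadence, but only because the extra terms are derived from what the counter will do next. Without a comment like the one above, the two-term condition is as puzzling to a future reader as the bare x"FE" compare, which is the readability cost of the more complex fixes.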

The point of this is that simulation results are used to verify logic.  Logic isn’t designed to get correct sims.  It is important to look at the original design and try to work out why it was wrong, and why the suggested fix would make sense.  Simply looking at sim waveforms and finding logic expressions to get the waveforms to do what you want isn’t a reliable way to “fix” logic.  Further, even when it does work, it often leads to more confusing code that becomes difficult to maintain or extend.
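A self-checking testbench is one way to keep simulation in the verification role: instead of checking a waveform for one cadence, it checks the rule itself (EOF marks every 256th written word and nothing else) against a pseudo-random write cadence. This is a sketch under assumptions: p_fix is wrapped in a hypothetical entity pkt_mark, cnt is unsigned(7 downto 0), and the testbench drives Din with the word index so the expected EOF position is known. The cadence comes from the uniform procedure in ieee.math_real:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper around the p_fix code
entity pkt_mark is
  port (
    Clk  : in  std_logic;
    wr   : in  std_logic;
    Din  : in  std_logic_vector(7 downto 0);
    Dout : out std_logic_vector(7 downto 0);
    EOF  : out std_logic
  );
end entity;

architecture rtl of pkt_mark is
  signal cnt      : unsigned(7 downto 0) := (others => '0');
  signal pkt_done : std_logic;
begin
  p_fix : process (Clk) is
  begin
    if rising_edge(Clk) then
      if wr = '1' then
        cnt <= cnt + 1;
      end if;
      EOF  <= pkt_done and wr;
      Dout <= Din;
    end if;
  end process;
  pkt_done <= '1' when (cnt = x"FF") else '0';
end architecture;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;  -- uniform() for a pseudo-random write cadence

entity tb_pkt_mark is
  generic (SEED : positive := 42);
end entity;

architecture sim of tb_pkt_mark is
  signal Clk, wr, EOF : std_logic := '0';
  signal Din, Dout    : std_logic_vector(7 downto 0) := (others => '0');
begin
  dut : entity work.pkt_mark
    port map (Clk => Clk, wr => wr, Din => Din, Dout => Dout, EOF => EOF);

  p_stim_check : process
    variable s1    : positive := SEED;
    variable s2    : positive := 7;
    variable r     : real;
    variable wr_v  : std_logic;
    variable idx   : natural := 0;  -- words written so far (reference model)
    variable n_eof : natural := 0;
  begin
    for cyc in 1 to 4000 loop
      uniform(s1, s2, r);                  -- r is in (0.0, 1.0)
      if r < 0.5 then wr_v := '1'; else wr_v := '0'; end if;
      wr  <= wr_v;
      Din <= std_logic_vector(to_unsigned(idx mod 256, 8));  -- data = word index
      wait for 5 ns;
      Clk <= '1';
      wait for 5 ns;
      -- Check the rule, not a waveform: EOF marks exactly the 256th word
      if wr_v = '1' then
        assert (EOF = '1') = (idx mod 256 = 255)
          report "EOF on wrong word" severity failure;
        idx := idx + 1;
      else
        assert EOF = '0' report "EOF without a write" severity failure;
      end if;
      if EOF = '1' then n_eof := n_eof + 1; end if;
      Clk <= '0';
    end loop;
    assert n_eof = idx / 256 report "missing EOF" severity failure;
    report "passed: " & integer'image(n_eof) & " packets checked";
    wait;
  end process;
end architecture;
```

Pointing the same testbench at p_oops or p_badfix, wrapped the same way, should for practically any seed trip an assertion, because their EOF placement depends on the write cadence. Changing SEED gives a new cadence without touching the checks.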

This entry was posted in FPGA, Fundamentals.
