Overpipelining

There are a handful of popular coding styles for VHDL/Verilog.  The two most prevalent ones are easiest to see in state machines.  Academic books like to show everything as two processes: one combinatorial, one sequential.  A lot of people prefer to avoid this and just write one sequential process.  That choice has led to numerous problems, the most common being overpipelining.

The biggest argument against using two processes is that it adds a significant number of signals to the design AND splits the code up across the file.  It is very nice when a logical structure can be defined and seen all on one screen.  Having two processes makes this difficult.  As a result, it is common to see code written mostly in clocked processes.

This leads to a phenomenon that I call “overpipelining”.  Here, overpipelining refers to adding register stages as a coding convenience.  This differs from regular pipelining, where the registers are added for performance reasons.  While legitimate pipelining breaks logic up for timing reasons, overpipelining breaks logic up for typographical reasons.

Here is an example of code that is written in one process, and could easily be converted to use two processes:

-- original code example:
p_original : process (Clk) is
begin
  if Clk'event and Clk = '1' then
    if    (wcnt=3 and wr='1') and not (rcnt=3 and rd='1') then
      cnt <= cnt + 1;
    elsif (rcnt=3 and rd='1') and not (wcnt=3 and wr='1') then
      cnt <= cnt - 1;
    end if;
  end if;
end process;

Notice that there are common subexpressions that are fairly large (for the small text width on this webpage at least).  It would be great if these common subexpressions could have their own names:

-- overpipelined example
p_overpipe : process (Clk) is
begin
  if Clk'event and Clk = '1' then
    -- (valid in vhdl2008)
    wr_done_ev <= wr when (wcnt=3) else '0';
    rd_done_ev <= rd when (rcnt=3) else '0';
    if    wr_done_ev = '1' and rd_done_ev = '0' then
      cnt <= cnt + 1;
    elsif wr_done_ev = '0' and rd_done_ev = '1' then
      cnt <= cnt - 1;
    end if;
  end if;
end process;

Notice that the behavior is mostly the same.  The code is a bit more readable, as “(wcnt=3 and wr='1')” now has a more descriptive name.  The downside is that the logic is now pipelined: wr_done_ev and rd_done_ev are themselves registers, so a change in ‘wr’ or ‘rd’ will result in cnt changing on the cycle AFTER the next cycle.  The original code had cnt changing on the next cycle.
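The extra cycle can be checked in simulation.  What follows is only a minimal testbench sketch under stated assumptions: the entity name tb_overpipe, the signal widths, the names cnt_direct and cnt_piped, and holding wcnt and rcnt at 3 are all made up for illustration and are not part of the original design.  It uses plain if statements instead of the VHDL-2008 conditional assignment so that it should also compile under older standards.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tb_overpipe is
end entity tb_overpipe;

architecture sim of tb_overpipe is
  signal Clk        : std_logic := '0';
  signal wr         : std_logic := '0';
  signal rd         : std_logic := '0';
  -- assumption: both sub-counters are held at 3 so only wr/rd matter
  signal wcnt       : unsigned(1 downto 0) := "11";
  signal rcnt       : unsigned(1 downto 0) := "11";
  signal wr_done_ev : std_logic := '0';
  signal rd_done_ev : std_logic := '0';
  signal cnt_direct : unsigned(3 downto 0) := (others => '0');
  signal cnt_piped  : unsigned(3 downto 0) := (others => '0');
  signal done       : boolean := false;
begin
  -- 10 ns clock that stops once the checks are finished
  p_clk : process is
  begin
    while not done loop
      Clk <= '0'; wait for 5 ns;
      Clk <= '1'; wait for 5 ns;
    end loop;
    wait;
  end process;

  -- same logic as the original single-process example
  p_direct : process (Clk) is
  begin
    if rising_edge(Clk) then
      if    (wcnt = 3 and wr = '1') and not (rcnt = 3 and rd = '1') then
        cnt_direct <= cnt_direct + 1;
      elsif (rcnt = 3 and rd = '1') and not (wcnt = 3 and wr = '1') then
        cnt_direct <= cnt_direct - 1;
      end if;
    end if;
  end process;

  -- same logic as the overpipelined example: the events are registered
  p_piped : process (Clk) is
  begin
    if rising_edge(Clk) then
      if wcnt = 3 then wr_done_ev <= wr; else wr_done_ev <= '0'; end if;
      if rcnt = 3 then rd_done_ev <= rd; else rd_done_ev <= '0'; end if;
      if    wr_done_ev = '1' and rd_done_ev = '0' then
        cnt_piped <= cnt_piped + 1;
      elsif wr_done_ev = '0' and rd_done_ev = '1' then
        cnt_piped <= cnt_piped - 1;
      end if;
    end if;
  end process;

  p_stim : process is
  begin
    wait until rising_edge(Clk);
    wr <= '1';                     -- wr is high for exactly one cycle
    wait until rising_edge(Clk);   -- this edge samples wr = '1'
    wr <= '0';
    wait for 1 ns;
    assert cnt_direct = 1 and cnt_piped = 0
      report "one edge after wr: expected direct=1, piped=0" severity error;
    wait until rising_edge(Clk);   -- this edge samples wr_done_ev = '1'
    wait for 1 ns;
    assert cnt_direct = 1 and cnt_piped = 1
      report "two edges after wr: expected direct=1, piped=1" severity error;
    done <= true;
    wait;
  end process;
end architecture sim;
```

If the description in the text is right, neither assertion should fire: cnt_direct moves on the first edge that sees wr, and cnt_piped follows one edge later.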

VHDL offers two (other) solutions — use variables, or place the code outside of the process.

-- Using Variables Example
p_variables : process (Clk) is
  variable wr_done_ev : std_logic;
  variable rd_done_ev : std_logic;
begin
  if Clk'event and Clk = '1' then
    -- must be referenced only after this line,
    -- otherwise these will make registers.
    wr_done_ev := wr when (wcnt=3) else '0';
    rd_done_ev := rd when (rcnt=3) else '0';
    if    wr_done_ev = '1' and rd_done_ev = '0' then
      cnt <= cnt + 1;
    elsif wr_done_ev = '0' and rd_done_ev = '1' then
      cnt <= cnt - 1;
    end if;
  end if;
end process;

Now the code works the same as the original.  It has the named common subexpressions, and it has the expected one cycle of latency.  But variables have several issues.  Variable assignments take effect immediately (like blocking assignments in Verilog), so each variable must be assigned before it is used in the process.  Otherwise the value left over from the previous clock edge is read, and a register will be added.  This also means that subexpressions that contain other subexpressions must be assigned in the correct order.  Overall, it trades one way for things to go wrong for another.
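To make the ordering hazard concrete, here is a small sketch of what the wrong order looks like.  The entity name var_order_demo, the port widths, and the internal signal cnt_i are assumptions for illustration only; the read side is also omitted to keep it short.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- hypothetical entity, used only to show the variable-ordering pitfall
entity var_order_demo is
  port (
    Clk  : in  std_logic;
    wr   : in  std_logic;
    wcnt : in  unsigned(1 downto 0);
    cnt  : out unsigned(3 downto 0)
  );
end entity var_order_demo;

architecture rtl of var_order_demo is
  signal cnt_i : unsigned(3 downto 0) := (others => '0');
begin
  p_wrong_order : process (Clk) is
    variable wr_done_ev : std_logic := '0';
  begin
    if rising_edge(Clk) then
      -- wr_done_ev is read BEFORE it is assigned in this pass, so it
      -- still holds the value from the previous clock edge.  That
      -- remembered value has to live in a register.
      if wr_done_ev = '1' then
        cnt_i <= cnt_i + 1;
      end if;
      -- assigned too late: this value is only seen on the next edge
      if wcnt = 3 then
        wr_done_ev := wr;
      else
        wr_done_ev := '0';
      end if;
    end if;
  end process;

  cnt <= cnt_i;
end architecture rtl;
```

Moving the assignment of wr_done_ev above the if that reads it should remove the extra register and restore the one-cycle latency.  The code compiles either way, which is what makes this mistake easy to miss.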

-- Using External Assignment Example
p_external : process (Clk) is
begin
  if Clk'event and Clk = '1' then
    if    wr_done_ev = '1' and rd_done_ev = '0' then
      cnt <= cnt + 1;
    elsif wr_done_ev = '0' and rd_done_ev = '1' then
      cnt <= cnt - 1;
    end if;
  end if;
end process;
wr_done_ev <= wr when (wcnt=3) else '0';
rd_done_ev <= rd when (rcnt=3) else '0';

Above is the other solution: place the subexpressions below (or above) the process as concurrent assignments.  This works, and is fairly safe.  But it does place these assignments further away from where they will be used.  Overall, it’s not a pretty style.  Essentially, it has the same issue as just using two processes: reading any piece of code requires reading multiple sections of the file.
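For comparison, this is roughly what the same counter might look like in the two-process style mentioned at the start.  It is a sketch, not code from a real design: the entity name updown_two_process, the port widths, and the signals cnt_r and cnt_next are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- hypothetical two-process version of the counter from the examples
entity updown_two_process is
  port (
    Clk  : in  std_logic;
    wr   : in  std_logic;
    rd   : in  std_logic;
    wcnt : in  unsigned(1 downto 0);
    rcnt : in  unsigned(1 downto 0);
    cnt  : out unsigned(3 downto 0)
  );
end entity updown_two_process;

architecture rtl of updown_two_process is
  signal cnt_r    : unsigned(3 downto 0) := (others => '0');  -- register
  signal cnt_next : unsigned(3 downto 0);                     -- extra signal
begin
  -- combinatorial process: computes the next value, no clock involved
  p_comb : process (cnt_r, wr, rd, wcnt, rcnt) is
    variable wr_done_ev : boolean;
    variable rd_done_ev : boolean;
  begin
    wr_done_ev := (wcnt = 3) and (wr = '1');
    rd_done_ev := (rcnt = 3) and (rd = '1');
    cnt_next <= cnt_r;  -- default: hold, which avoids inferring a latch
    if wr_done_ev and not rd_done_ev then
      cnt_next <= cnt_r + 1;
    elsif rd_done_ev and not wr_done_ev then
      cnt_next <= cnt_r - 1;
    end if;
  end process;

  -- sequential process: only registers the result
  p_seq : process (Clk) is
  begin
    if rising_edge(Clk) then
      cnt_r <= cnt_next;
    end if;
  end process;

  cnt <= cnt_r;
end architecture rtl;
```

The latency matches the original single-process code, but one register now needs two signals, a hand-written sensitivity list (or process (all) in VHDL-2008), and a default assignment, and the logic is spread over two processes.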

Because both alternatives have drawbacks, the overpipelined method is more likely to be used.  Overpipelining then leads to issues with things like state machines.  For example, consider a state machine that uses “cnt” from the examples in a state that occurs right after a ‘rd’ or ‘wr’:

-- in combinatorial process
case (csm) is
  -- states
  when WR_STATE =>
    wr  <= '1';
    next_csm <= CHECK_STATE_WAIT;
  when CHECK_STATE_WAIT =>
    -- this state is needed because wr_done_ev has just been set
    -- to '1'.  cnt will not be updated until next cycle.
    wr <= '0';
    next_csm <= CHECK_STATE;
  when CHECK_STATE =>
    wr <= '0';
    if (cnt = 5) then
      next_csm <= DONE_STATE;
    else
      next_csm <= LOOP_AGAIN_STATE;
    end if;
  -- other states
end case;

The value of “cnt” will not be ready if it is overpipelined.  This might lead to an unnecessary addition of a wait state to fix the problem.  It might also lead to more creative solutions.
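If cnt is instead updated on the edge right after ‘wr’ (as in the original, variable, or external-assignment versions), the wait state should not be needed.  The following is a sketch under that assumption.  The entity name fsm_no_wait, the port list, the reset-free register, and the transitions out of LOOP_AGAIN_STATE and DONE_STATE are all hypothetical, since the original state machine elides its other states.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- hypothetical state machine; cnt is assumed to come from a counter
-- that updates on the clock edge immediately following wr = '1'
entity fsm_no_wait is
  port (
    Clk : in  std_logic;
    cnt : in  unsigned(3 downto 0);
    wr  : out std_logic
  );
end entity fsm_no_wait;

architecture rtl of fsm_no_wait is
  type csm_t is (WR_STATE, CHECK_STATE, LOOP_AGAIN_STATE, DONE_STATE);
  signal csm      : csm_t := WR_STATE;
  signal next_csm : csm_t;
begin
  p_comb : process (csm, cnt) is
  begin
    -- defaults, so every output is driven in every state
    wr       <= '0';
    next_csm <= csm;
    case csm is
      when WR_STATE =>
        wr       <= '1';
        next_csm <= CHECK_STATE;   -- no CHECK_STATE_WAIT in between
      when CHECK_STATE =>
        -- cnt already reflects the write issued in WR_STATE
        if cnt = 5 then
          next_csm <= DONE_STATE;
        else
          next_csm <= LOOP_AGAIN_STATE;
        end if;
      when LOOP_AGAIN_STATE =>
        next_csm <= WR_STATE;      -- assumption: loop back and write again
      when DONE_STATE =>
        null;                      -- assumption: stay here
    end case;
  end process;

  p_seq : process (Clk) is
  begin
    if rising_edge(Clk) then
      csm <= next_csm;
    end if;
  end process;
end architecture rtl;
```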
