Forum Discussion

Altera_Forum
14 years ago

AES-128 Sbox - Rom Issue

Hello,

I am a student, and for my final year project I am investigating the hardware implementation of AES-128 on an Altera FPGA using SystemVerilog.

The AES algorithm requires two lookup tables, known as the SBox and Inverse-SBox, which each hold 256 x 8-bit values.

Currently I am using two separate arrays of bytes to store these values. However, I want to look at the possibility of using some of the on-FPGA memory bits, probably in the form of a ROM. I have very limited knowledge of this and am looking for a push in the right direction.

My concern is that ROMs have a one clock cycle delay, which will obviously slow down my overall design, although for my project it would make a good point of discussion. Is there a way to implement a ROM in the memory bits without this clock cycle delay? Some kind of asynchronous ROM?

Any replies would be greatly welcome :)

8 Replies

  • Altera_Forum

    You can refer to Altera's cookbook. I don't think you can infer an asynchronous ROM in the memory blocks. I don't understand what you mean by "slow design". If you mean latency, a registered ROM adds one cycle, but proper pipelining usually raises the achievable clock rate and the overall throughput rather than lowering them.

  • Altera_Forum

    The M*K blocks in Altera's FPGAs can only implement synchronous RAMs/ROMs, i.e. they have the 1 cycle delay.

    Asynchronous ROMs need to be implemented in LUTs.
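
    The difference between the two read styles can be sketched as below. This is only an illustration: the module names sbox_async and sbox_sync are hypothetical, and the file "sbox.hex" is an assumed 256-entry table in $readmemh format. Whether the second form actually lands in a memory block is up to the synthesis tool.

    ```systemverilog
    // Hypothetical module: combinational (asynchronous) read.
    module sbox_async (
        input  logic [7:0] addr,
        output logic [7:0] data
    );
        logic [7:0] rom [0:255];
        initial $readmemh("sbox.hex", rom);   // assumed file name

        // data follows addr within the same cycle; with no clocked
        // address or output this is typically built from LUTs.
        assign data = rom[addr];
    endmodule

    // Hypothetical module: registered (synchronous) read.
    module sbox_sync (
        input  logic       clk,
        input  logic [7:0] addr,
        output logic [7:0] data
    );
        logic [7:0] rom [0:255];
        initial $readmemh("sbox.hex", rom);   // assumed file name

        // data appears one clock edge after addr is presented;
        // this is the style that can map onto a memory block.
        always_ff @(posedge clk)
            data <= rom[addr];
    endmodule
    ```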
  • Altera_Forum

    Thank you both for your replies - very much appreciated.

    My main issue with using a Synchronous ROM is as follows...

    My design uses an FSM with 13 states, and the state advances on every clock cycle (generally speaking, each state performs 1 round of encryption/decryption). When using an array, and therefore LUTs, the loop

        for (shortint c = 0; c < 4; c++) begin
            State[r][c] = Sbox[State[r][c] >> 4][State[r][c] & 8'h0f];
        end

    carries out all 4 substitutions within a single clock cycle, whereas with a ROM that step alone would take 4 clock cycles.

    How would I synchronise my current FSM with a ROM? Would I have to advance the state every X clock cycles instead of every cycle?
  • Altera_Forum

    The solution is simply to use 4 synchronous ROMs, one for each "c". Or actually 2, since each M*K block has two independent read ports.

    So, as far as I can see, you can use M*K blocks for your problem without a performance penalty.

    To use them, you basically have two options.

    One, you can use a ROM function such as LPM_ROM or ALT_ROM.

    LPM_ROM is portable, ALT_ROM has more features, such as 2 ports.

    It will require some changes to your code and you need to write a .HEX/.MIF file for the ROM's contents.

    The other solution, which I prefer, is to infer the ROM from your code.

    This will require you to change your code to follow a ROM template Quartus can recognize, which may involve a bit of trial and error.

    Take a look at the HDL coding guidelines to see which templates are supported, and start from a simple case.

    Note that Quartus may still decide, despite your efforts, that your ROM is best implemented in LUTs.
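
    As a rough idea of what an inferable dual-port ROM might look like, here is a minimal sketch. The module name sbox_rom_dp and the file "sbox.hex" are assumptions for illustration, and the exact template the tool recognizes should be checked against the coding guidelines.

    ```systemverilog
    // Hypothetical dual-port synchronous ROM: two lookups per clock.
    module sbox_rom_dp (
        input  logic       clk,
        input  logic [7:0] addr_a,
        input  logic [7:0] addr_b,
        output logic [7:0] q_a,
        output logic [7:0] q_b
    );
        logic [7:0] rom [0:255];
        initial $readmemh("sbox.hex", rom);   // assumed file name

        // Both ports read the same table, each with a registered output.
        always_ff @(posedge clk) begin
            q_a <= rom[addr_a];
            q_b <= rom[addr_b];
        end
    endmodule
    ```

    Two instances of such a module would cover the 4 bytes of one row, with addr_a and addr_b driven by the bytes for c = 0..3.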
  • Altera_Forum

    Great reply, thanks!

    However, even if I were to use 4 x ROM (or 2 x dual-port), I don't understand how there can be no performance penalty. Don't you still need to wait a full clock cycle to get a result?
  • Altera_Forum

    You need to be able to generate the ROM's read address in the clock cycle before you need the data.

    I was looking at your code snippet and it looked possible. But I may be wrong here.
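
    One way to read this, sketched under assumptions: drive the ROM address from the value the state register is about to take (its combinational next value), so the ROM's internal register samples it on the same edge as the state register would. The names subbyte_lane, next_byte and sub_byte are hypothetical, as is the file "sbox.hex".

    ```systemverilog
    // Hypothetical lane: one S-box lookup with the address issued a cycle early.
    module subbyte_lane (
        input  logic       clk,
        input  logic [7:0] next_byte,  // value the state byte will hold after this edge
        output logic [7:0] sub_byte    // Sbox of the value the state byte holds now
    );
        logic [7:0] rom [0:255];
        initial $readmemh("sbox.hex", rom);   // assumed file name

        // The ROM register captures next_byte on the same edge as the state
        // register, so sub_byte is valid in the cycle that uses that state.
        always_ff @(posedge clk)
            sub_byte <= rom[next_byte];
    endmodule
    ```

    This only works if next_byte can be computed combinationally in the preceding cycle, which depends on how the rest of the round logic is arranged.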
  • Altera_Forum

    Why don't you use a single ROM and change the state of your state machine every 4 cycles?
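
    A minimal sketch of that serial approach, with all names (subbytes_row_serial, start, done, row_in, row_out) and the file "sbox.hex" being assumptions: one ROM is read once per cycle while a small counter walks over the 4 bytes of a row. Because of the one-cycle read latency, this version uses five steps per row unless the first address is issued a cycle early.

    ```systemverilog
    // Hypothetical serial SubBytes for one row using a single synchronous ROM.
    module subbytes_row_serial (
        input  logic       clk,
        input  logic       rst,          // synchronous reset
        input  logic       start,        // one-cycle pulse; row_in must stay stable
        input  logic [7:0] row_in  [4],
        output logic [7:0] row_out [4],
        output logic       done          // one-cycle pulse when row_out is complete
    );
        logic [7:0] rom [0:255];
        initial $readmemh("sbox.hex", rom);   // assumed file name

        logic [7:0] q;      // registered ROM output
        logic [2:0] step;   // 0..3 issue addresses, 1..4 capture results
        logic       busy;

        // Synchronous ROM read: one byte per clock cycle.
        always_ff @(posedge clk)
            q <= rom[row_in[step[1:0]]];

        always_ff @(posedge clk) begin
            done <= 1'b0;
            if (rst) begin
                busy <= 1'b0;
                step <= 3'd0;
            end else if (!busy) begin
                if (start) begin
                    busy <= 1'b1;
                    step <= 3'd0;
                end
            end else begin
                // During steps 1..4, q holds the S-box value of byte step-1.
                if (step != 3'd0)
                    row_out[step - 3'd1] <= q;
                if (step == 3'd4) begin
                    busy <= 1'b0;
                    done <= 1'b1;
                end else begin
                    step <= step + 3'd1;
                end
            end
        end
    endmodule
    ```

    The enclosing FSM would then hold its state until done is asserted instead of advancing on every clock.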