The per-byte CRC16 update can be reduced to the following C:
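#include <stdint.h>

/* Advance a 16-bit CRC by one input byte. */
uint32_t
crc_step(uint32_t crc, uint32_t byte_val)
{
    uint32_t t = crc ^ (byte_val & 0xff);
    t = (t ^ (t << 4)) & 0xff;   /* keep only the low 8 bits */
    return (crc >> 8) ^ (t << 8) ^ (t << 3) ^ (t >> 4);
}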
This converts trivially to VHDL:
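-- Assumes t1 and t2 are 8 bits wide and crc_out is 32 bits wide with the
-- upper half zero (mirroring the uint32_t return value in the C version).
t1 <= crc_in(7 downto 0) xor data(7 downto 0);
t2 <= t1 xor (t1(3 downto 0) & B"0000");
-- The extra parentheses keep all four xor operands 16 bits wide; without
-- them X"0000" & ... binds first and the operand lengths no longer match.
crc_out <= X"0000" & ((X"00" & crc_in(15 downto 8)) xor (t2 & X"00")
    xor (B"00000" & t2 & B"000") xor (X"000" & t2(7 downto 4)));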
That is 4 levels of XOR per input byte.
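If you want to sanity-check the reduced form before committing it to hardware, here is a rough sketch that compares it against the usual bit-at-a-time update. I'm assuming the CRC in question is the reflected CCITT one (polynomial 0x8408), which is what the shifts by 4, 3 and 8 appear to correspond to. The names crc_step_bitwise and check_crc_step are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced byte step, as given above. */
uint32_t crc_step(uint32_t crc, uint32_t byte_val)
{
    uint32_t t = crc ^ (byte_val & 0xff);
    t = (t ^ (t << 4)) & 0xff;
    return (crc >> 8) ^ (t << 8) ^ (t << 3) ^ (t >> 4);
}

/* Bit-at-a-time reference. ASSUMPTION: reflected polynomial 0x8408. */
uint32_t crc_step_bitwise(uint32_t crc, uint32_t byte_val)
{
    crc ^= byte_val & 0xff;
    for (int i = 0; i < 8; i++)
        crc = (crc & 1) ? (crc >> 1) ^ 0x8408 : crc >> 1;
    return crc;
}

/* Compare both versions for every 16-bit CRC value and every input byte. */
void check_crc_step(void)
{
    for (uint32_t crc = 0; crc <= 0xffff; crc++)
        for (uint32_t b = 0; b <= 0xff; b++)
            assert(crc_step(crc, b) == crc_step_bitwise(crc, b));
}
```

Calling check_crc_step() once from a test program is enough; it covers all 65536 x 256 combinations.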
As I said earlier, if you really need to generate the CRC of a 53-byte buffer in parallel every clock (I can't imagine why!) then you probably need to make use of the linearity of CRC calculations.
Basically, if you CRC random data, then change a single bit, the difference in the CRC is independent of the original data.
So, for a fixed-length packet, you can easily determine which CRC bits each input bit changes, then start from the CRC of an all-zero packet and xor in those values for every input bit that is set.
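As a software model of that idea, here is a minimal sketch for the 53-byte case. It precomputes, for each of the 424 input bits, which CRC bits that input bit toggles, then rebuilds the CRC from those masks. The start value CRC_INIT of 0xffff is my assumption (the approach works for any fixed start value), and the names crc_serial, build_masks, crc_linear and check_packet are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PKT_LEN  53        /* fixed packet length, in bytes */
#define CRC_INIT 0xffffu   /* ASSUMPTION: start value, not given in the post */

/* Reduced byte step, as given above. */
uint32_t crc_step(uint32_t crc, uint32_t byte_val)
{
    uint32_t t = crc ^ (byte_val & 0xff);
    t = (t ^ (t << 4)) & 0xff;
    return (crc >> 8) ^ (t << 8) ^ (t << 3) ^ (t >> 4);
}

/* Serial reference: run the byte step over the whole packet. */
uint32_t crc_serial(const uint8_t *pkt)
{
    uint32_t crc = CRC_INIT;
    for (int i = 0; i < PKT_LEN; i++)
        crc = crc_step(crc, pkt[i]);
    return crc;
}

uint16_t zero_crc;               /* CRC of the all-zero packet */
uint16_t bit_mask[PKT_LEN * 8];  /* CRC bits toggled by each input bit */

/* Fill zero_crc and bit_mask by setting one input bit at a time. */
void build_masks(void)
{
    uint8_t pkt[PKT_LEN];
    memset(pkt, 0, sizeof pkt);
    zero_crc = (uint16_t)crc_serial(pkt);
    for (int bit = 0; bit < PKT_LEN * 8; bit++) {
        pkt[bit / 8] = (uint8_t)(1u << (bit % 8));
        bit_mask[bit] = (uint16_t)(crc_serial(pkt) ^ zero_crc);
        pkt[bit / 8] = 0;
    }
}

/* Linear form: xor the mask of every set input bit onto zero_crc. */
uint32_t crc_linear(const uint8_t *pkt)
{
    uint32_t crc = zero_crc;
    for (int bit = 0; bit < PKT_LEN * 8; bit++)
        if ((pkt[bit / 8] >> (bit % 8)) & 1)
            crc ^= bit_mask[bit];
    return crc;
}

/* Sanity check: both computations must agree on the given packet. */
void check_packet(const uint8_t *pkt)
{
    assert(crc_linear(pkt) == crc_serial(pkt));
}
```

Call build_masks() once, then crc_linear() on any 53-byte buffer. In hardware the loop disappears: bit_mask is constant, so each of the 16 CRC bits becomes a fixed XOR over a subset of the 424 input bits, inverted where the matching bit of zero_crc is set.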