Wow. That's a lot of logic. Check out:
http://www.altera.com/literature/manual/stx_cookbook.pdf?gsa_pos=1&wt.oss_r=1&wt.oss=cookbook or check out:
http://www.easics.com/webtools/crctool How many clock cycles do you need the result in? The fastest and smallest CRC is to just do 1 bit at a time, but that would take 64 clock cycles. If your system has no problem with that, it should be pretty timing. These algorithms roll up the output of 8 or 16 or whatever operations into one clock cycle.
By the way, CRC generation has probably been one of the most vexing things I've worked on. Even with a known polynominal, there are some other parameters like everything starts at 1, or you invert the input and/or the output. If you're off from those, then the output just looks like noise(CRC's are similar to random number generators). Just an FYI to have a good test in place. (Unless you're calculating the CRC to send and using the same algorithm to calculate the CRC on the other side to see if they match. In that case, as long as you're consistent it all works out. My problem involved creating a CRC to match the output of some other hardware that didn't fully describe how they calculated their CRC)