So you want a variable shifter that can shift by up to 255 bits? That is not a trivial thing as far as resources go. At the most basic level, a variable index would work, but you'll end up with 256 individual 256:1 muxes (one per output bit, each choosing among the 256 possible shift amounts, 0 through 255). It's big and slow.
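For speed improvements, you can pipeline it in stages of 4:1 muxes: shift by 128 and/or 64 bits (so 0, 64, 128, or 192) and then register. Then shift by 32 and/or 16 bits and register again, and so on down to 2 and/or 1. This is faster, but the pipeline registers take up resources and add latency. Another idea is to run the shifter at a faster clock rate (or accept a slower data rate) and pass the word through it twice. That way you only need roughly 128:1 muxes, since each pass covers about half of the shift range.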
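To make the staged idea a bit more concrete, here's a rough software model in Python. It is only a behavioral sketch, not HDL and not synthesizable. It assumes a logical right shift on a 256-bit word (the direction isn't stated above), and the names `staged_shift_right`, `WIDTH`, and `MASK` are made up for illustration.

```python
WIDTH = 256                 # assumed word width in bits (512 would work the same way)
MASK = (1 << WIDTH) - 1     # keeps values within WIDTH bits


def staged_shift_right(word: int, amount: int) -> int:
    """Model a 4-stage shifter where each stage is a 4:1 mux.

    Each stage looks at 2 bits of `amount`, so it shifts by
    0, 1, 2 or 3 times that stage's base weight.
    Assumption: logical right shift, zeros shifted in.
    """
    if not 0 <= amount < 256:
        raise ValueError("amount must be in 0..255")
    value = word & MASK
    # Stage weights: (128, 64), then (32, 16), then (8, 4), then (2, 1).
    for low_bit in (6, 4, 2, 0):
        select = (amount >> low_bit) & 0b11   # 2-bit mux select for this stage
        value >>= select << low_bit           # shift by select * 2**low_bit
        # In hardware, a pipeline register would sit here between stages.
    return value
```

Comparing it against a plain `(word & MASK) >> amount` for every amount from 0 to 255 is a quick sanity check that the four small stages add up to the one big shift.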
Not sure if anyone else has a good idea. You might want to re-post (as many won't make it this far) and say that you are looking for the best way to do a variable shift (up to 255 bits) on a 256/512-bit word.