Interesting. Do you have another FIFO that stores how wide each word is? (Specifically, if you write in words of widths 32, 6, 78, 133, then how do you know to read out 32 the first time, 6 on the next read, then 78 and then 133?
Anyway, you're limited by the memory as much as the FIFO. You can't dynamically change your write widths on the memory, so there's no FIFO that can take advantage of it.
The only thing I can think of is to have an "aggregator" block before the FIFO and the reverse after it, whereby you fill up 256 bits of however many words and then do a write. On the reverse side you read out 256 at a time and then however you know the widths, use that to strip out what you need. Of course this could theoretically add a lot of delay, i.e. if you're writing 256 words all of length 1, then it will require 256 writes before that data gets into the FIFO and you can access the first bit. You could add logic that looks at your aggregator when the FIFO is empty, but it gets more complicated.