Hi baldur,
If possible, I really, really **strongly** suggest that you use a Cyclone 3 device (EP3C5 or EP3C10, for example). It has a passive parallel configuration scheme which does not exist in Cyclone devices. The FPGA configuration data is supplied one byte at a time and can be clocked at up to at least 100 MHz... Very simple to implement, yet very efficient.
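To give an idea of how little code the data phase needs, here is a minimal sketch of a bit-banged byte-wide transfer from a microcontroller. The pin mapping (DATA[7:0] on bits 0..7 and DCLK on bit 8 of a single GPIO output register) and the name `pp_send_bytes` are assumptions made for illustration, not something taken from a real board. The nCONFIG / nSTATUS / CONF_DONE handshake around the transfer is left out on purpose; take that sequence and its timing from the device handbook.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed (hypothetical) pin mapping on one GPIO output register:
 * bits 0..7 -> DATA[7:0], bit 8 -> DCLK. Adjust to your board. */
#define PP_DATA_MASK 0xFFu
#define PP_DCLK_BIT  (1u << 8)

/* Clock `len` bytes into the FPGA, one byte per DCLK rising edge.
 * Returns the number of bytes sent and leaves DCLK low. */
size_t pp_send_bytes(volatile uint32_t *gpio_out, const uint8_t *data, size_t len)
{
    /* Preserve the pins of the register that are not used here. */
    uint32_t other = *gpio_out & ~(PP_DATA_MASK | PP_DCLK_BIT);

    for (size_t i = 0; i < len; ++i) {
        uint32_t word = other | data[i];
        *gpio_out = word;               /* present the byte, DCLK low   */
        *gpio_out = word | PP_DCLK_BIT; /* rising edge: data is sampled */
        *gpio_out = word;               /* DCLK back low                */
    }
    return len;
}
```

On real hardware `gpio_out` would point at the port's output register, and the loop may need a few wait states to respect the DCLK high/low times.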
For example, the uncompressed bitstream for an EP3C10 is about 3.3 Mbits (~412 Kbytes). Let's say your ARM7 clocks data bytes out at 10 MHz (optimistic?): you can expect a configuration time of about 41 ms. The bad news is that in passive parallel mode the bitstream cannot be compressed. The good news is that the configuration time is then constant, so this is a guaranteed time.
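The arithmetic behind that estimate can be sketched as a small helper. The function name is made up for this example, and the result covers only the data transfer itself (one byte per clock); the handshake before it and the device initialization after it are not included.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated passive parallel transfer time in microseconds.
 * bitstream_bits: uncompressed bitstream size in bits
 * byte_clock_hz:  number of bytes clocked into the FPGA per second */
uint32_t pp_config_time_us(uint32_t bitstream_bits, uint32_t byte_clock_hz)
{
    assert(byte_clock_hz > 0);

    /* One byte per clock, so round the bit count up to whole bytes. */
    uint64_t bytes = ((uint64_t)bitstream_bits + 7u) / 8u;

    /* Integer division: the result is truncated to whole microseconds. */
    return (uint32_t)((bytes * 1000000u) / byte_clock_hz);
}
```

With the figures above, `pp_config_time_us(3300000u, 10000000u)` gives 41250, i.e. about 41 ms for 3.3 Mbits at 10 Mbytes/s.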
I use the passive parallel configuration scheme on a custom board with an EP3C10, but since I have no microcontroller on the board (I use a NIOS inside the FPGA), I use a CPLD (from Xilinx :D ) which converts the bit stream from an SPI serial flash to 8-bit data, at an equivalent bit rate of 96 Mbits/s (12 Mbytes/s). I get a configuration time of about 35 ms. Works very well.
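The serial-to-parallel conversion is essentially an 8-bit shift register that hands over a byte every eighth serial clock, which is why 96 Mbits/s on the flash side becomes 12 Mbytes/s on the FPGA side. Below is a small C model of that behaviour, only as an illustration: the real thing is CPLD logic, the names are invented, and the MSB-first bit order is an assumption (it is what SPI flashes normally shift out, but the order expected on DATA[7:0] should be checked against how the bitstream was written to the flash).

```c
#include <stdint.h>

/* Software model of an 8-bit serial-to-parallel shift register. */
typedef struct {
    uint8_t shift; /* bits collected so far               */
    uint8_t count; /* number of bits currently in `shift` */
} s2p_t;

/* Feed one serial bit (assumed MSB first).
 * Returns 1 and writes the assembled byte to *out on every 8th bit,
 * otherwise returns 0 and leaves *out untouched. */
int s2p_push_bit(s2p_t *s, int bit, uint8_t *out)
{
    s->shift = (uint8_t)((s->shift << 1) | (bit & 1));
    if (++s->count < 8) {
        return 0;
    }
    *out = s->shift; /* byte complete: this is what goes to DATA[7:0] */
    s->shift = 0;
    s->count = 0;
    return 1;
}
```

In the hardware version, the "byte complete" event is also the moment to generate one clock pulse towards the FPGA.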
Cyclone 3 devices are a pure pleasure... really low cost, low power, and cool configuration modes.
Hope this helps.