I understand that you are referring to Figure 10–7, "In-System Programming of Serial Configuration Devices". It has nothing to do with JTAG; it uses the dedicated AS programming interface. JTAG programming of the AS device is shown in Figure 10–29, "Programming Serial Configuration Devices In-System Using the JTAG Interface". Both variants are possible. The latter requires an additional step (programming file conversion) but can use an existing JTAG interface, e.g. one already used for debug or boundary scan.
I also understand that you use a circuit according to Figure 10–6, "Multi-Device AS Configuration in Which Devices Receive the Same Data with a Single SRAM Object File". From Altera publications, it's clear that AS devices must use 3.3V VCCIO for bank 1. This setting is also checked by the Quartus software.
I'm not aware of any Altera specification regarding the necessary AS buffers. On the other hand, many warnings have been issued against using buffers in (single-device) AS configuration. I would probably use a fast buffer with low input capacitance, e.g. 74AUP1G34. The suitable number of buffers depends on the PCB topology; there should be no long stubs. The Altera comments regarding source series termination should also be considered.
Regarding the intended 2-layer design: I think it's not completely impossible but very difficult, given the large percentage of connected I/O pins. Apart from possible EMC and signal quality issues, I wonder whether the additional effort in PCB routing will pay off in the end. The PCB could probably also be somewhat smaller with better routing on a multilayer board. It's the size of a baking tray anyway.
Another important point is SSO (simultaneous switching outputs) noise. It would be a big issue with 128 outputs in any case, and it gets worse with a 2-layer PCB and the large QFP240 package. It could make sense to take some precautionary measures, such as driving the clock differentially to the chip. Even so, I see a risk of design failure due to SSO noise.
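
To get a feeling for the order of magnitude, here is a rough first-order ground bounce estimate, V = (L_pin / n_gnd) * N * dI/dt. It's only a sketch: the 128 outputs are taken from your design, but the number of ground pins, the per-pin lead inductance and the output current slew are assumed placeholder values, not Altera data. Take the real figures from the device datasheet and an IBIS simulation.

```python
def ground_bounce_volts(n_outputs, n_ground_pins, pin_inductance_h, di_dt_a_per_s):
    """First-order ground bounce estimate.

    Treats the ground pins as equal inductors in parallel and assumes
    all outputs switch in the same direction at the same time.
    """
    if n_ground_pins < 1:
        raise ValueError("at least one ground pin is required")
    effective_inductance_h = pin_inductance_h / n_ground_pins
    return effective_inductance_h * n_outputs * di_dt_a_per_s


N_OUTPUTS = 128              # from the design discussed above
N_GROUND_PINS = 16           # ASSUMED number of ground pins sharing the return current
PIN_INDUCTANCE_H = 8e-9      # ASSUMED QFP lead plus bond wire inductance per pin
DI_DT_A_PER_S = 10e-3 / 2e-9 # ASSUMED 10 mA current change within 2 ns per output

estimate = ground_bounce_volts(N_OUTPUTS, N_GROUND_PINS, PIN_INDUCTANCE_H, DI_DT_A_PER_S)
print(f"Estimated ground bounce: {estimate:.2f} V")
```

The model ignores the PCB return path, which on a 2-layer board without a solid ground plane adds further inductance, so the real value is likely to be worse. Reducing drive strength and slew rate or staggering the output switching lowers dI/dt and the effective N respectively.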