--- Quote Start ---
I said in my initial post that I implemented the delay line by chaining RAMs, i.e. the output of one RAM is fed to the next RAM. Suppose you need an 8-channel filter with 6 taps. You then need 6 RAMs connected as described, each with a depth of 8. For a particular channel, we place the channel number on the address inputs. This way the RAMs behave like a delay line. For another channel, we simply change the address on all RAMs.
A systolic architecture implemented using only DSP blocks is for a single channel, as far as I know.
--- Quote End ---
Any filter structure used for a single channel can be extended to multiple channels.
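As a rough illustration of that point, here is a behavioural Python sketch (not HDL) in which every delay register of a single-channel filter is replaced by a small RAM addressed by channel number, as described in the quote. It assumes a direct-form FIR and time-interleaved input samples (ch0, ch1, ..., then ch0 again); the function names are made up for the example.

```python
def single_channel_fir(samples, coeffs):
    """Direct-form FIR with one register per tap (reference model)."""
    delay = [0] * len(coeffs)
    out = []
    for x in samples:
        delay = [x] + delay[:-1]          # shift the delay line by one
        out.append(sum(c * d for c, d in zip(coeffs, delay)))
    return out


def multichannel_fir(samples, coeffs, num_channels):
    """Same filter, but each tap register becomes a RAM of depth num_channels.

    samples is interleaved by channel; the RAM address is the channel number.
    """
    taps = len(coeffs)
    rams = [[0] * num_channels for _ in range(taps)]
    out = []
    for n, x in enumerate(samples):
        ch = n % num_channels             # address applied to all RAMs
        # Output of RAM k-1 is written into RAM k at the same address.
        for k in range(taps - 1, 0, -1):
            rams[k][ch] = rams[k - 1][ch]
        rams[0][ch] = x
        out.append(sum(c * rams[k][ch] for k, c in enumerate(coeffs)))
    return out


if __name__ == "__main__":
    coeffs = [1, 2, 3, 4, 5, 6]           # 6 taps, arbitrary values
    interleaved = list(range(1, 81))      # 8 channels x 10 samples
    y = multichannel_fir(interleaved, coeffs, 8)
    # Each channel's output matches a dedicated single-channel filter.
    for ch in range(8):
        assert y[ch::8] == single_channel_fir(interleaved[ch::8], coeffs)
```

The computation part (multipliers and adder) is unchanged between the two functions; only the storage behind each tap differs, which is why the same idea should carry over to other structures.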
Anyway, you haven't described your structure (not the delay line but the computation structure; I assume it is direct form). In that case you can put as many registers on the RAM data outputs as you like, provided all data lines are delayed equally. That way your filter sees all stages arriving with the same latency.
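A small behavioural model of the equal-latency argument, again in Python rather than HDL and assuming direct form. `latencies[k]` is the number of extra registers on tap k's data line; the function name and parameter are invented for the sketch.

```python
from collections import deque


def fir_with_tap_registers(samples, coeffs, latencies):
    """Direct-form FIR with latencies[k] pipeline registers on tap k's data."""
    taps = len(coeffs)
    delay = [0] * taps
    # One register chain per tap data line, initialised to zero.
    pipes = [deque([0] * r) for r in latencies]
    out = []
    for x in samples:
        delay = [x] + delay[:-1]
        registered = []
        for k in range(taps):
            pipes[k].append(delay[k])              # value enters the chain
            registered.append(pipes[k].popleft())  # value from r cycles ago
        out.append(sum(c * d for c, d in zip(coeffs, registered)))
    return out


if __name__ == "__main__":
    coeffs = [1, 2, 3, 4, 5, 6]
    x = list(range(1, 21))
    ref = fir_with_tap_registers(x, coeffs, [0] * 6)
    piped = fir_with_tap_registers(x, coeffs, [2] * 6)
    # Equal delay on every line: same result, just 2 cycles later.
    assert piped == [0, 0] + ref[:-2]
    # Unequal delay on one line changes the filter response.
    uneven = fir_with_tap_registers(x, coeffs, [0, 1, 0, 0, 0, 0])
    assert uneven != ref
```

With equal latency the output is simply the original output shifted in time, so in a time-multiplexed design the only thing to adjust would be which channel slot the result is attributed to.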