--- Quote Start ---
I want the memory bank to be implemented as a bank of parallel BRAMs and not replicated.
--- Quote End ---
Can you clarify what you mean by a "bank of parallel BRAMs"? Each BRAM on the FPGA has only two ports, so if you have a large local buffer implemented in BRAMs with dynamic (non-statically-resolvable) accesses, the compiler has to replicate the whole buffer enough times to satisfy all reads from and writes to the buffer in parallel. It would also help if you archived the report folder and posted it here, so that we can see exactly how many times the buffer is replicated and why.
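To make the port limitation concrete, here is a rough back-of-the-envelope sketch in C. It is only a simplified port-counting model that I am assuming for illustration, not the compiler's actual algorithm: every write has to reach every replica, so each write uses up one port on each copy, and the remaining ports on each copy serve reads. The function name `replicas_needed` and its parameters are hypothetical.

```c
/*
 * Simplified port-counting model (an assumption for illustration only,
 * not the compiler's real replication algorithm).
 *
 * reads          - number of simultaneous read accesses to the buffer
 * writes         - number of simultaneous write accesses to the buffer
 * ports_per_bram - ports available per BRAM copy (2 for a plain
 *                  dual-port BRAM)
 *
 * Returns the number of buffer copies needed, or -1 if the writes alone
 * use up every port, in which case replication cannot help.
 */
int replicas_needed(int reads, int writes, int ports_per_bram)
{
    /* Every write must update every copy, so it costs a port on each one. */
    int read_ports_per_copy = ports_per_bram - writes;

    if (reads <= 0)
        return 1;               /* no reads: a single copy is enough */
    if (read_ports_per_copy <= 0)
        return -1;              /* no ports left over for reads */

    /* Ceiling division: spread the reads over the free ports of each copy. */
    return (reads + read_ports_per_copy - 1) / read_ports_per_copy;
}
```

Under this model, a buffer with one write and four parallel reads on two-port BRAMs gives `replicas_needed(4, 1, 2) == 4`, i.e. the whole buffer is stored four times. The report is still the only reliable source for the actual replication factor in your design.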