Depending on the size of your data and how much data movement each step involves, your bottleneck could be the PCI-E link, the DDR memory, or the compute throughput of the FPGA itself. If each byte is transferred once over PCI-E to DDR, read once from DDR, and processed once on the FPGA, your bottleneck is going to be the PCI-E transfer, since it has the lowest bandwidth. If you transfer once over PCI-E but read the data from DDR multiple times, processing it only once per read, then the bottleneck shifts to the DDR transfer. Finally, if you transfer once over PCI-E and read once from DDR but do a lot of computation on the FPGA for every byte you read, then you might actually be able to saturate the compute capabilities of the FPGA. In your case, since your data size is very small, unless you are processing the same data a couple million times, your bottleneck is going to be the PCI-E transfer.
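
To make the three cases concrete, here is a rough back-of-the-envelope sketch in Python. It estimates the time spent in each stage and reports the slowest one. The function name `find_bottleneck` and all bandwidth and throughput figures are hypothetical placeholders, not measurements of any particular board; substitute the numbers for your own hardware. The model also assumes the stages are independent and ignores latency and any overlap between transfer and compute.

```python
# Hypothetical figures for illustration only; replace with your board's numbers.
PCIE_BW = 8e9            # bytes/s over PCI-E (assumed)
DDR_BW = 20e9            # bytes/s from on-board DDR (assumed)
FPGA_OPS_PER_S = 500e9   # operations/s the FPGA design can sustain (assumed)


def find_bottleneck(data_bytes, ddr_reads, ops_per_byte):
    """Return (slowest stage, per-stage times in seconds).

    data_bytes   -- bytes transferred once over PCI-E
    ddr_reads    -- how many times the whole data set is read from DDR
    ops_per_byte -- operations performed for every byte read from DDR
    """
    bytes_read = data_bytes * ddr_reads
    times = {
        "PCI-E": data_bytes / PCIE_BW,
        "DDR": bytes_read / DDR_BW,
        "FPGA compute": bytes_read * ops_per_byte / FPGA_OPS_PER_S,
    }
    # The stage that takes the longest limits overall throughput.
    return max(times, key=times.get), times


one_mib = 1 << 20
# Transfer once, read once, process once.
print(find_bottleneck(one_mib, ddr_reads=1, ops_per_byte=1)[0])
# Transfer once, read ten times, light processing per read.
print(find_bottleneck(one_mib, ddr_reads=10, ops_per_byte=1)[0])
# Transfer once, read once, heavy processing per byte.
print(find_bottleneck(one_mib, ddr_reads=1, ops_per_byte=1000)[0])
```

With these assumed figures, the three calls report PCI-E, DDR, and FPGA compute respectively, matching the three scenarios described above. The crossover points depend entirely on the ratios between the numbers you plug in.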