Connecting the CPU master to a pipeline bridge, and then connecting the bridge to the slaves, might help. SOPC Builder and Qsys use different interconnect structures, so some system architectures will achieve a higher Fmax in SOPC Builder, but in general Qsys produces faster systems. With 97% memory utilization, though, the memory is the first place I would look for potential optimizations: that is fairly heavy use of the FPGA RAM blocks, and it will hinder the fitter. Does your SOPC Builder design also result in 97% memory utilization?