The difference between bb_I and bb_Q may be due to your NCO phase. For now, assume it is OK and focus on removing the f1+f2 term in each branch.
The wanted signal will land close to dc, occupying a band around dc whose width depends on your signal bandwidth. Set the low-pass cutoff clear of your signal bandwidth, but low enough to kill the f1+f2 copy.
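Here is a minimal pure-Python sketch of the I branch. It uses illustrative values that are not from the question: fs = 48 kHz, input f1 = 10 kHz and NCO f2 = 9 kHz, so the wanted term sits at 1 kHz and the unwanted copy at 19 kHz. The helper names are hypothetical. A 101-tap Hamming-windowed sinc low-pass with a 3 kHz cutoff passes the f1-f2 term and suppresses the f1+f2 term. The Q branch works the same way with -sin in place of cos.

```python
import math


def windowed_sinc_lowpass(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc FIR low-pass, scaled for unity gain at dc."""
    m = num_taps - 1
    fc = cutoff_hz / fs_hz  # cutoff in cycles/sample
    taps = []
    for n in range(num_taps):
        k = n - m / 2
        h = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)  # Hamming window
        taps.append(h * w)
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]


def fir_filter(taps, x):
    """Direct-form FIR; samples before x[0] are taken as zero."""
    return [sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]


def tone_amplitude(x, f_hz, fs_hz):
    """Amplitude of a real tone at f_hz (exact when x spans whole cycles)."""
    re = sum(v * math.cos(2 * math.pi * f_hz * n / fs_hz) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * f_hz * n / fs_hz) for n, v in enumerate(x))
    return 2 * math.hypot(re, im) / len(x)


fs, f1, f2 = 48_000.0, 10_000.0, 9_000.0  # illustrative input and NCO frequencies
n_total = 4900
x = [math.cos(2 * math.pi * f1 * n / fs) for n in range(n_total)]
bb_i = [v * math.cos(2 * math.pi * f2 * n / fs) for n, v in enumerate(x)]  # I-branch mixer
taps = windowed_sinc_lowpass(101, 3_000.0, fs)
y = fir_filter(taps, bb_i)[100:]  # drop the 100-sample start-up transient

print("before:", tone_amplitude(bb_i[100:], f1 - f2, fs), tone_amplitude(bb_i[100:], f1 + f2, fs))
print("after: ", tone_amplitude(y, f1 - f2, fs), tone_amplitude(y, f1 + f2, fs))
```

Before filtering, both mixer terms have amplitude 0.5. After filtering, the 1 kHz term should stay close to 0.5 while the 19 kHz copy falls to a small fraction of that. In your receiver, pick the cutoff from your own signal bandwidth and sample rate.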
So far you have downconverted your signal to dc. However, the signal centre may shift either side of dc because of oscillator uncertainties between Tx and Rx, so you need to keep forcing it to dc all the time. The error detector is just a multiplier, as you have found out. Apply an IIR LPF (integrator) to the error, because the error will be very noisy. The IIR gain needs to be adjusted manually until the error shows damped oscillations heading towards zero, with some jitter.
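Here is a sketch of such a loop. It assumes the signal is already at complex baseband (bb_I + j*bb_Q) with only a residual frequency offset left, so the arm filters are left out. The multiplier error detector is taken to be I*Q, which is Costas-style and equals 0.5*sin(2*phase error) for unit amplitude. The "IIR LPF (integrator)" is modelled here as a one-pole leaky integrator. The function name, alpha, gain and the 48 kHz rate are all illustrative assumptions.

```python
import cmath
import math


def track_carrier(offset_hz, fs_hz=48_000.0, num_samples=2000,
                  alpha=0.04, gain=0.1, start_phase=0.5):
    """Carrier loop on complex baseband with a one-pole IIR loop filter.

    Returns (phase_errors, nco_step): the phase of each derotated sample
    in radians, and the final NCO frequency correction in rad/sample.
    """
    w_off = 2 * math.pi * offset_hz / fs_hz  # residual offset left after downconversion
    phi = 0.0       # NCO phase
    v = 0.0         # IIR (leaky integrator) state
    nco_step = 0.0  # NCO frequency correction, rad/sample
    phase_errors = []
    for n in range(num_samples):
        z = cmath.exp(1j * (start_phase + w_off * n))  # input still rotating away from dc
        y = z * cmath.exp(-1j * phi)                   # derotate with the NCO
        e = y.real * y.imag                            # multiplier detector: 0.5*sin(2*phase error)
        v += alpha * (e - v)                           # one-pole IIR smooths the noisy error
        nco_step = gain * v                            # filtered error steers the NCO frequency
        phi = (phi + nco_step) % (2 * math.pi)
        phase_errors.append(cmath.phase(y))
    return phase_errors, nco_step


errors, step = track_carrier(offset_hz=50.0)
print(step, errors[-1])
```

With these settings, the run should settle with the NCO step equal to the offset (2*pi*50/48000 rad/sample). A small constant phase error of roughly 0.066 rad should remain, because a leaky integrator has finite gain at dc. Raising gain shrinks that error but reduces damping. Together, alpha and gain set how much the error rings before it settles.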
For testing, move your RF centre either side of nominal and check that the loop does not lose lock. Otherwise, the radio user will have to keep adjusting the tuning knob and will lose interest in the music.
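One way to automate that check is to run a simple carrier loop at a range of offsets and see whether the NCO ends up tracking each one. This is a hypothetical sketch: it assumes a complex-baseband input, an I*Q multiplier error, a one-pole IIR loop filter and illustrative gains. The lock criterion only checks the final NCO step against the true offset.

```python
import cmath
import math


def residual_offset_hz(offset_hz, fs_hz=48_000.0, num_samples=4000,
                       alpha=0.04, gain=0.1):
    """Run the loop at one RF offset; return |NCO step - offset| in Hz at the end."""
    w_off = 2 * math.pi * offset_hz / fs_hz
    phi = v = step = 0.0
    for n in range(num_samples):
        y = cmath.exp(1j * (w_off * n - phi))  # offset input derotated by the NCO
        e = y.real * y.imag                    # multiplier error detector
        v += alpha * (e - v)                   # one-pole IIR loop filter
        step = gain * v
        phi += step
    return abs(step - w_off) * fs_hz / (2 * math.pi)


def lock_sweep(offsets_hz, tolerance_hz=1.0):
    """Map each RF offset (Hz) to True if the NCO ended up tracking it."""
    return {f: residual_offset_hz(f) < tolerance_hz for f in offsets_hz}


print(lock_sweep([-600, -150, -100, -50, 0, 50, 100, 150, 600]))
```

Because |I*Q| never exceeds 0.5, the NCO step in this sketch can never exceed 0.5*gain rad/sample, which is about 382 Hz at 48 kHz. As a result, the ±600 Hz cases are reported as unlocked, while the ±150 Hz cases should lock.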
If your loop fails to lock, try adding a proportional term to the output of the IIR: the error itself, passed through an adjustable scaler.
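Here is a sketch of why the proportional term helps. It assumes the loop filter is a pure accumulator (ki) rather than a leaky one, with a complex-baseband input and an I*Q multiplier error. The function name and all values are hypothetical. With kp = 0, the two integrators (the accumulator and the NCO) leave the loop with no damping, so the phase error keeps swinging back and forth. A small kp adds damping, and the error settles to zero.

```python
import cmath
import math


def run_pi_loop(kp, ki=0.002, offset_hz=50.0, fs_hz=48_000.0,
                num_samples=3000, start_phase=0.5):
    """Loop filter = accumulator (ki) plus proportional path (kp * error).

    Returns (phase_errors, acc): per-sample phase error in radians and the
    final accumulator value (the NCO frequency estimate, rad/sample).
    """
    w_off = 2 * math.pi * offset_hz / fs_hz
    phi = acc = 0.0
    phase_errors = []
    for n in range(num_samples):
        y = cmath.exp(1j * (start_phase + w_off * n - phi))  # derotated input
        e = y.real * y.imag                                  # multiplier error detector
        acc += ki * e                                        # integral path
        phi += acc + kp * e                                  # NCO step = integrator + kp * error
        phase_errors.append(cmath.phase(y))
    return phase_errors, acc


no_kp, _ = run_pi_loop(kp=0.0)
with_kp, freq = run_pi_loop(kp=0.05)
print(max(abs(p) for p in no_kp[-500:]), with_kp[-1], freq)
```

Without kp, the error in the last 500 samples is still swinging by about half a radian. With kp = 0.05, the error ends near zero and the accumulator holds the offset frequency. As with the IIR gain, kp is worth tuning by watching the error settle.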