1- Isn't that the expected situation? I mean, I am expected to provide a fixed frequency, and the carrier synchronizer's job is to recover the phase, isn't it?
2- Do I understand correctly that the first design (sin*cos) tracks phase only, while the above design (using the sign circuitry) tracks frequency?
3- Surprisingly, before you posted I was trying to simulate this design with my original BPSK signal, and the above design failed to demodulate properly even though the loop locked. I eventually did as you suggested: I used the sin*cos approach, eliminated the "Phase Accumulator", and used the PhErr signal to control the NCO. It demodulated and locked! I also tried shifting the local NCO phase and setting fc to 0.050001 (the TX's fc is 0.05), and it still locked.
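
For anyone who wants to reproduce the test in point 3, here is a minimal Python sketch of a sin*cos (I*Q) loop in which PhErr steers the NCO with no separate phase accumulator. It is not the simulation described above: the loop gain `kp`, the one-pole arm filters (`alpha`), the 200 samples per symbol, the 1 rad starting offset and all function and variable names are assumptions picked for illustration. Only the two carrier frequencies (0.05 and 0.050001) come from the post.

```python
import math
import random

def costas_bpsk(rx, f_nco, kp=0.02, alpha=0.05, phase0=0.0):
    """First-order Costas loop: the I*Q product (PhErr) nudges the NCO phase."""
    phi = phase0                # NCO phase in radians
    i_f = q_f = 0.0             # one-pole low-pass state for each arm
    i_out, phi_log = [], []
    for r in rx:
        phi_log.append(phi)     # NCO phase used for this sample
        i_mix = r * math.cos(phi)
        q_mix = -r * math.sin(phi)
        i_f += alpha * (i_mix - i_f)    # arm filters remove the 2*fc term
        q_f += alpha * (q_mix - q_f)
        ph_err = i_f * q_f              # about sin(2*e)/8, e = carrier - NCO phase
        phi += 2 * math.pi * f_nco + kp * ph_err
        i_out.append(i_f)
    return i_out, phi_log

def wrap_pi(e):
    """Wrap a phase error into [-pi/2, pi/2); BPSK lock is ambiguous by pi."""
    return (e + math.pi / 2) % math.pi - math.pi / 2

F_TX, F_NCO = 0.05, 0.050001    # cycles/sample, values from the post
SPS, N_SYM = 200, 200           # assumed samples per symbol and symbol count
THETA0 = 1.0                    # assumed initial carrier phase offset (rad)

rng = random.Random(1)
bits = [rng.choice((-1, 1)) for _ in range(N_SYM)]
rx = [bits[n // SPS] * math.cos(2 * math.pi * F_TX * n + THETA0)
      for n in range(N_SYM * SPS)]

i_out, phi_log = costas_bpsk(rx, F_NCO)

# Residual phase error at the last sample, modulo the pi ambiguity.
n_last = len(rx) - 1
resid = wrap_pi(2 * math.pi * F_TX * n_last + THETA0 - phi_log[n_last])

# Hard decisions taken at the end of each symbol, after the arm filter settles.
decided = [1 if i_out[k * SPS + SPS - 1] > 0 else -1 for k in range(N_SYM)]
skip = 50                       # ignore symbols received during pull-in
same = sum(a != b for a, b in zip(decided[skip:], bits[skip:]))
inverted = sum(a != -b for a, b in zip(decided[skip:], bits[skip:]))
bit_errors = min(same, inverted)    # accept either polarity

print("residual phase error (rad):", resid)
print("bit errors after lock:", bit_errors)
```

Because PhErr is added straight to the NCO phase, this is a first-order loop: it absorbs a small frequency offset such as 0.050001 versus 0.05 by settling at a small constant phase error rather than at exactly zero. A larger offset would need more loop gain or an integrating term.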