Hi Siraj,
I don't want to discourage you, but I think scaling your error down until it has no effect is no good.
It looks like the QPSK/QAM carrier tracking I did years ago was based on different error circuitry (it also involved slicing), so we had better ignore its details.
I have thought about your BPSK design. It is practically new to me, but it doesn't look specific to BPSK; it looks like a basic form of PLL.
You might have to do MATLAB modelling to prove the concept in a simpler setting, away from BPSK, as follows:
1) Make your input a clean tone, say at 0.1 Fs.
2) Apply the NCO at -0.1 Fs and do the rest as usual. At the error detector, I and Q should be DC (if all is OK) and the error should be zero (or settle to zero).
3) Repeat the above with the NCO initial frequency set to -0.11 Fs. The mixer output now sits at a negative offset (0.1 - 0.11 = -0.01 Fs) rather than at DC, so the loop should push I and Q back to DC.
4) Repeat with the NCO at -0.09 Fs. I and Q will now be at a positive offset (+0.01 Fs), and the loop should again push them back to DC. The error should change sign compared with step 3.
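The open-loop part of this test (clean tone at 0.1 Fs, NCO held fixed at -0.1, -0.11 and -0.09 Fs) can be sketched as follows, in Python rather than MATLAB. This is only an illustration under assumptions: frequencies are normalised to Fs (cycles/sample), the tone is an ideal complex exponential with an arbitrary phase of 0.3 rad, and `mix` and `residual_freq` are hypothetical helper names. It checks that I and Q are DC at -0.1 Fs and measures the leftover frequency (about -0.01 and +0.01 Fs) at the other two settings.

```python
import cmath
import math

def mix(nco_freq, tone_freq=0.1, n_samples=200, phase=0.3):
    """Hypothetical helper: clean complex tone at tone_freq (cycles/sample)
    multiplied by a fixed NCO exp(j*2*pi*nco_freq*n). Returns the I/Q samples."""
    return [cmath.exp(1j * (2 * math.pi * tone_freq * n + phase)) *
            cmath.exp(1j * 2 * math.pi * nco_freq * n) for n in range(n_samples)]

def residual_freq(y):
    """Hypothetical helper: mean phase step between consecutive samples,
    taken from the angle of sum(y[n] * conj(y[n-1])), in cycles/sample."""
    acc = sum(y[n] * y[n - 1].conjugate() for n in range(1, len(y)))
    return cmath.phase(acc) / (2 * math.pi)

for f_nco in (-0.1, -0.11, -0.09):
    y = mix(f_nco)
    i_span = max(s.real for s in y) - min(s.real for s in y)  # ~0 when I is DC
    print(f_nco, round(residual_freq(y), 4), round(i_span, 6))
```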
Thus the whole issue is deciding whether a complex (I/Q) frequency is positive, zero or negative.
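One common way to make that decision (named here as a swapped-in example, not as your design) is the cross-product frequency discriminator: I[n-1]*Q[n] - Q[n-1]*I[n] equals |z|^2 times the sine of the phase step per sample, so its sign follows the sign of the frequency as long as the step stays within half a cycle. It needs only two multiplies and a subtraction, no arctangent. A minimal sketch, assuming a unit-amplitude tone at 0.1 Fs and hypothetical helper names `cross_product_error`, `freq_sign` and `mix`:

```python
import cmath
import math

def cross_product_error(i_prev, q_prev, i_curr, q_curr):
    """I[n-1]*Q[n] - Q[n-1]*I[n] = Im(z[n] * conj(z[n-1])) = |z|^2 * sin(phase step).
    Positive for a positive frequency, negative for a negative one, zero at DC,
    as long as the step stays within +/- half a cycle per sample."""
    return i_prev * q_curr - q_prev * i_curr

def freq_sign(y, tol=1e-9):
    """Hypothetical helper: average the error over a block, reduce it to +1, 0 or -1."""
    errs = [cross_product_error(y[n - 1].real, y[n - 1].imag, y[n].real, y[n].imag)
            for n in range(1, len(y))]
    mean = sum(errs) / len(errs)
    return 0 if abs(mean) < tol else (1 if mean > 0 else -1)

def mix(nco_freq, tone_freq=0.1, n_samples=100):
    """Hypothetical helper: clean tone mixed with a fixed NCO (cycles/sample)."""
    return [cmath.exp(1j * 2 * math.pi * tone_freq * n) *
            cmath.exp(1j * 2 * math.pi * nco_freq * n) for n in range(n_samples)]

for f_nco in (-0.1, -0.11, -0.09):
    print(f_nco, freq_sign(mix(f_nco)))
```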
The actual value of the error is not that important, but its sign (the direction in which it pushes the NCO) is.
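To illustrate that the sign matters more than the size, here is a sketch of a hypothetical first-order frequency loop in which a cross-product error steers the NCO frequency through a loop gain. The function name `run_loop`, the gain of 0.05 and the 300-sample run are illustrative assumptions; frequencies are in cycles/sample and the input is a clean tone at 0.1 Fs. Starting from -0.11 or -0.09 Fs, the NCO settles at about -0.1 Fs; with the gain's sign flipped, the same loop drives the mixer output away from DC.

```python
import cmath
import math

def run_loop(nco_start, gain, tone_freq=0.1, n_samples=300):
    """Hypothetical first-order frequency loop. A cross-product error steers
    the NCO frequency (cycles/sample); returns the NCO frequency per sample."""
    f_nco = nco_start
    theta = 0.0   # NCO phase accumulator (radians)
    prev = None   # previous mixer output, needed for the cross product
    history = []
    for n in range(n_samples):
        y = cmath.exp(1j * 2 * math.pi * tone_freq * n) * cmath.exp(1j * theta)
        # I[n-1]*Q[n] - Q[n-1]*I[n]: its sign follows the residual frequency
        err = 0.0 if prev is None else prev.real * y.imag - prev.imag * y.real
        prev = y
        f_nco -= gain * err   # positive residual -> lower the NCO frequency
        theta = (theta + 2 * math.pi * f_nco) % (2 * math.pi)
        history.append(f_nco)
    return history

for start in (-0.11, -0.09):
    print(start, round(run_loop(start, gain=0.05)[-1], 6))  # both settle at about -0.1
print(round(run_loop(-0.11, gain=-0.05)[-1] + 0.1, 3))      # wrong sign: pushed away from DC
```

In this model the magnitude of the gain only sets how fast the loop settles (with no delay in the loop, the linearised model is stable for 0 < 2*pi*gain < 2); the sign decides whether it settles at all. With the sign flipped, the residual ends up about half the sample rate away from DC.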
There are other issues you need to be aware of when modelling your loop, such as the effect of filter delay (group delay) and the actual design latency between the error and the NCO update, but you can deal with those later.
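To see why that latency matters, here is a sketch of a hypothetical first-order frequency loop (cross-product error driving the NCO) with a `delay` parameter: each error value reaches the NCO only `delay` samples after it is computed. The gains 0.12 and 0.05 and the 3-sample delay are arbitrary illustrative values, not taken from your design, and `run_loop` and `residual` are hypothetical names. In this model a gain that settles quickly with no delay stops settling once the delay is added, and a smaller gain settles again.

```python
import cmath
import math

def run_loop(nco_start, gain, delay=0, tone_freq=0.1, n_samples=600):
    """Hypothetical first-order frequency loop in which each cross-product
    error reaches the NCO only `delay` samples after it was computed."""
    f_nco = nco_start
    theta = 0.0                 # NCO phase accumulator (radians)
    prev = None                 # previous mixer output
    pipeline = [0.0] * delay    # errors still in flight towards the NCO
    history = []
    for n in range(n_samples):
        y = cmath.exp(1j * 2 * math.pi * tone_freq * n) * cmath.exp(1j * theta)
        err = 0.0 if prev is None else prev.real * y.imag - prev.imag * y.real
        prev = y
        pipeline.append(err)
        f_nco -= gain * pipeline.pop(0)   # apply the error from `delay` samples ago
        theta = (theta + 2 * math.pi * f_nco) % (2 * math.pi)
        history.append(f_nco)
    return history

def residual(f_nco, tone_freq=0.1):
    """Leftover frequency after mixing, wrapped to [-0.5, 0.5) cycles/sample."""
    return (tone_freq + f_nco + 0.5) % 1.0 - 0.5

for gain, delay in ((0.12, 0), (0.12, 3), (0.05, 3)):
    tail = run_loop(-0.11, gain, delay)[-100:]
    print(gain, delay, max(abs(residual(f)) for f in tail))
```

For the linearised model x[k+1] = x[k] - K*x[k-D], with K = 2*pi*gain and D the delay, the stability limit is K < 2*cos(D*pi/(2D+1)): K < 2 with no delay, K < 1 for D = 1 and K < about 0.445 for D = 3. A gain of 0.12 (K of about 0.754) is fine without delay but too large with 3 samples of delay, while 0.05 (K of about 0.314) tolerates it.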
But for now, the crucial point is finding a way to get good error detection (don't worry about its scaling).
I believe this loop needs a lot more work to get right; hopefully you will let me know how it goes.