--- Quote Start ---
Hhhhhhhh What do you want Mr Kazem!!! Matlab is meant to prove math through math... not like ModelSim prove math through hardware, even though it's not giving correct results. Problem is either in the idea, or in our predictions. Idea is true since it's not that hard to write mathematical equations in Matlab.
--- Quote End ---
I don't quite follow your reasoning. Unless proved otherwise, I now strongly believe that the free, simplified, documented design for BPSK carrier tracking does not work efficiently, because the sin*cos error is vague except at the very start of the sine wave cycle. That start is marred by filter delay and vanishes quickly.
I suggest you use the Costas loop for QPSK/QAM, which is what I did. The error in this loop is based on the slicer output.
You will need to look at diagrams of that loop. It is based on a complex multiplication of the sliced I/Q values with the signal I/Q values to produce the error.
In your case you have only two sliced states, either -1 or +1. So the concept of I/Q sliced data doesn't strictly apply, but maybe you can still use it and ignore the Q channel at the end.
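As a rough illustration of that error term, here is a minimal Python sketch (it ports to Matlab almost line for line). It assumes the common decision-directed form, error = Im(sample * conj(decision)). The names `slice_level` and `dd_phase_error` are only illustrative, and treating "ignore the Q channel" as setting the sliced Q value to zero for BPSK is an assumption, not something taken from a specific diagram.

```python
import cmath


def slice_level(x):
    """Hard decision: map a real value to +1.0 or -1.0 (threshold at zero)."""
    return 1.0 if x >= 0 else -1.0


def dd_phase_error(sample, bpsk=False):
    """Decision-directed phase error: Im(sample * conj(decision)).

    For QPSK the decision is (+/-1) + j(+/-1).
    For BPSK (assumption) the sliced Q is set to zero, so the error
    reduces to Q * sign(I).
    """
    d_i = slice_level(sample.real)
    d_q = 0.0 if bpsk else slice_level(sample.imag)
    decision = complex(d_i, d_q)
    return (sample * decision.conjugate()).imag


# Example: a +1 BPSK symbol seen with a 0.1 rad residual phase error.
# The error comes out as sin(0.1), i.e. roughly proportional to the phase error.
err = dd_phase_error(cmath.exp(1j * 0.1), bpsk=True)
```

For small phase errors this term is close to the phase error itself and has the same sign for both the +1 and the -1 symbol, which is what makes it usable as a loop error.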
For QPSK it was particularly efficient, so I would guess the same holds for BPSK. Try your skill again by modelling it that way.
To slice the symbols you need a threshold after the matched RRC filter, then feed the results back into the error detector.
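To make the feedback path concrete, here is a small Python sketch of the whole loop, under several simplifying assumptions: one sample per symbol (so the matched RRC filter and symbol timing are taken as already done), no noise, and a plain proportional-plus-integral loop filter whose gains `kp` and `ki` are arbitrary illustrative values, not tuned ones. The function names are hypothetical.

```python
import cmath
import random


def make_bpsk_samples(n, phase0, dphase, seed=1):
    """Noise-free +/-1 symbols rotated by a carrier phase phase0 + dphase*k."""
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    samples = [s * cmath.exp(1j * (phase0 + dphase * k))
               for k, s in enumerate(symbols)]
    return symbols, samples


def track_bpsk_carrier(samples, kp=0.1, ki=0.005):
    """Decision-directed carrier loop: derotate, slice, feed the error back."""
    phase = 0.0   # current carrier phase estimate (NCO phase)
    integ = 0.0   # integrator state, tracks a constant frequency offset
    decisions, phases = [], []
    for r in samples:
        phases.append(phase)              # estimate used for this symbol
        z = r * cmath.exp(-1j * phase)    # remove the estimated carrier phase
        d = 1.0 if z.real >= 0 else -1.0  # slicer: threshold at zero
        err = z.imag * d                  # error from slicer output (Q * sign(I))
        integ += ki * err                 # integral branch
        phase += kp * err + integ         # proportional branch + integrator
        decisions.append(d)
    return decisions, phases


# 0.5 rad initial phase offset plus a 0.002 rad/symbol frequency offset
symbols, samples = make_bpsk_samples(2000, phase0=0.5, dphase=0.002)
decisions, phases = track_bpsk_carrier(samples)
```

In this setup `phases` should settle onto the true carrier phase and `decisions` should match `symbols`. Keep in mind that BPSK has a 180 degree ambiguity: if the initial phase error is beyond 90 degrees, a loop of this kind locks with inverted symbols.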