Okay. The counter is naturally no problem, as its output is a register. Coming into the cordic (U2), there is a path that is 40 levels of logic deep. I think the only thing that will fix this is pipelining, i.e. adding registers along this path, which will increase your latency. Going down the path, you pass through several levels of hierarchy (u0, u1, u2, u4 and u6), so probably the most logical thing to do is add registers at the boundaries between them. (If this is pure dataflow, it's pretty straightforward. If any of these blocks loop back into the datapath, it gets much more complicated, as you need to balance the pipelining so that all the data still lines up.)
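To make the latency and balancing point concrete, here is a rough behavioral sketch in Python (not HDL, and not your actual design). The three stage functions are made-up stand-ins for the logic inside the hierarchies, and `DelayLine` and `run_pipelined` are names I invented for the illustration. It puts one register after every stage and carries a tag on a side path, which only lines up with the data if it is delayed by the same number of registers.

```python
from collections import deque


class DelayLine:
    """`depth` registers in series; a value clocked in comes out `depth` clocks later."""

    def __init__(self, depth, reset=None):
        self.depth = depth
        self.regs = deque([reset] * depth)

    def clock(self, d):
        if self.depth == 0:
            return d                 # no registers: behaves like a plain wire
        q = self.regs.popleft()      # oldest register drives the output
        self.regs.append(d)          # new value enters the first register
        return q


def run_pipelined(stages, samples, tag_delay):
    """Clock `samples` through `stages` with one register after every stage.

    A tag (here just the sample index) travels on a side path delayed by
    `tag_delay` registers. Returns the (tag, data) pair visible at the
    output on each clock.
    """
    regs = [None] * len(stages)      # pipeline registers, None = nothing valid yet
    tags = DelayLine(tag_delay)
    seen = []
    for i, x in enumerate(samples):
        data_out = regs[-1]          # what the last register holds before this edge
        tag_out = tags.clock(i)
        seen.append((tag_out, data_out))
        inputs = [x] + regs[:-1]     # each stage reads the previous register
        regs = [None if v is None else f(v) for f, v in zip(stages, inputs)]
    return seen


# Hypothetical stage logic, purely for illustration.
stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
samples = [1, 2, 3, 4, 5, 0, 0, 0]

balanced = run_pipelined(stages, samples, tag_delay=len(stages))
unbalanced = run_pipelined(stages, samples, tag_delay=len(stages) - 1)
```

With three registers in the data path, the first result shows up three clocks after its input, and `balanced` pairs every result with the tag of the sample that produced it. In `unbalanced` the tag arrives one clock early, so each result is paired with the wrong sample, which is the kind of misalignment you have to hunt down when a pipelined path feeds back or merges with another one.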
The reason this never showed up in an Fmax report is that when the block was compiled by itself, the path started at an input pin and so wasn't analyzed as part of Fmax (which is generally considered register-to-register only, and is part of the reason Fmax gives a limited picture).
For higher-speed designs, and 150 MHz is getting up there, you need to consciously pipeline throughout your code.
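As a back-of-the-envelope check on why 40 levels of logic can't make 150 MHz, the arithmetic can be sketched like this. The 0.5 ns per logic level is purely an assumed figure for illustration (real delays depend on the device, speed grade and routing), and `register_ranks_needed` is a hypothetical helper, not anything Quartus provides.

```python
import math


def period_ns(fmax_mhz):
    """Clock period in nanoseconds for a target frequency in MHz."""
    return 1000.0 / fmax_mhz


def register_ranks_needed(levels, fmax_mhz, ns_per_level, overhead_ns=0.0):
    """Estimate how many extra register ranks split `levels` of logic to fit the period.

    `ns_per_level` (logic plus routing per level) and `overhead_ns`
    (clock-to-out, setup, skew) are assumed numbers supplied by the caller.
    """
    budget = period_ns(fmax_mhz) - overhead_ns
    levels_per_stage = math.floor(budget / ns_per_level)
    if levels_per_stage < 1:
        raise ValueError("a single logic level does not fit in the clock period")
    segments = math.ceil(levels / levels_per_stage)
    return segments - 1              # N segments need N - 1 register ranks between them


period = period_ns(150)              # about 6.67 ns
per_level_budget = period / 40       # about 0.17 ns per level if nothing is pipelined
ranks = register_ranks_needed(levels=40, fmax_mhz=150, ns_per_level=0.5)
```

Under that assumed 0.5 ns per level, about 13 levels fit in one period, so the 40-level path would need to be cut into four segments, i.e. three added register ranks. Treat the result as a rough guide to how much pipelining to plan for; the timing report for your device is the real authority.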
Finally, and I almost fear bringing it up, if you go to Tools -> Advisors -> Timing Optimization Advisor, there are some suggestions to help you move the levers/settings in Quartus to get better results. Bottom line is that I don't think any of them will get you the improvement you need, and I worry about relying on them too much (I assume you're early in the design, and as you add more logic, meeting timing becomes more and more difficult). They could be useful AFTER you modify the code and get much closer, if you need to squeeze out a bit more. It depends on the critical path, though.