lpm_mux was written a long time ago, and I'm not sure how well optimized it is. It will insert pipeline registers nicely, but I would be surprised if it identifies the "optimal" chunk of mux logic that fits into a single LUT and then builds the rest of the tree from there. I'm guessing it does a decent job, but 500 MHz may be tough. As josyb said, though, throw it down and see what happens.
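To make the "fits into a single LUT" idea concrete, here is a rough Python model (not HDL, and not what lpm_mux actually generates). It assumes a 6-input LUT, where one 4:1 mux (4 data inputs + 2 select inputs) fills exactly one LUT, and builds the big mux as a tree of those. The function names are made up for illustration. Each tree level is a natural place for one pipeline register stage.

```python
from math import ceil


def mux_tree_cost(n_inputs: int, width: int = 4) -> tuple[int, int]:
    """Return (levels, luts_per_bit) for an n_inputs:1 mux built as a tree of width:1 muxes.

    Assumes each width:1 mux fits in exactly one LUT (e.g. a 4:1 mux uses all
    six inputs of a 6-input LUT: 4 data + 2 select).
    """
    levels, luts, remaining = 0, 0, n_inputs
    while remaining > 1:
        remaining = ceil(remaining / width)  # one LUT per group of `width` inputs
        luts += remaining
        levels += 1
    return levels, luts


def mux_tree_select(data: list[int], sel: int, width: int = 4) -> int:
    """Evaluate the tree level by level; each level would be one pipeline stage.

    `width` must be a power of two. Each level consumes log2(width) select bits,
    least significant first, so the result equals data[sel] for 0 <= sel < len(data).
    """
    shift = width.bit_length() - 1  # select bits per level (2 for a 4:1 mux)
    level = list(data)
    while len(level) > 1:
        digit = sel & (width - 1)  # this level's select bits
        sel >>= shift              # remaining bits go to later levels
        level += [0] * (-len(level) % width)  # pad the last group; never selected if sel is in range
        level = [level[g + digit] for g in range(0, len(level), width)]
    return level[0]


levels, luts = mux_tree_cost(64)
print(levels, luts, luts * 64)  # 64:1 mux on 64-bit data -> 3 21 1344
```

Under that assumption a 64:1 mux is 16 + 4 + 1 = 21 LUTs per data bit in 3 levels, i.e. at most one LUT between registers if you pipeline every level. In a pipelined version the select bits used by later levels have to be delayed along with the data.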
Is it a 64:1 mux, or 64-bit data that is being muxed? How many channels? If the latter, your problem will probably be the select lines, which fan out all over the place. I would manually replicate them in your code (or put a max fanout assignment on them) to get better control. (Actually, I wouldn't do anything besides writing it in HDL, seeing what fails, and recoding from there...)
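A quick way to see why the select lines hurt: in the tree model above (still assuming one 4:1 mux per LUT, select bits consumed LSB-first), every select bit feeding a level drives every mux at that level for every data bit. This Python sketch counts those loads and how many register copies you would need for a given fanout limit. The limit of 64 used below is an arbitrary illustrative number, not a recommended setting, and the function names are hypothetical.

```python
from math import ceil


def select_fanout(n_inputs: int, data_bits: int, width: int = 4) -> list[int]:
    """Loads driven by each select bit at each tree level, first level first.

    Every select bit feeding a level goes to every width:1 mux (one LUT each,
    by assumption) at that level, for every data bit.
    """
    loads, remaining = [], n_inputs
    while remaining > 1:
        remaining = ceil(remaining / width)  # muxes per data bit at this level
        loads.append(remaining * data_bits)
    return loads


def replicas_needed(loads: list[int], max_fanout: int) -> list[int]:
    """Register copies of each select bit needed to keep every copy at or below max_fanout."""
    return [ceil(n / max_fanout) for n in loads]


loads = select_fanout(64, 64)
print(loads)                       # [1024, 256, 64]
print(replicas_needed(loads, 64))  # [16, 4, 1]
```

The first-level select bits carry by far the most load (1024 LUTs each for 64 inputs of 64-bit data), so that is where manual replication or a fanout constraint matters most.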