--- Quote Start ---
Doesn't the "increase the latency of the algorithm as a whole" part mean that the results come out slower? And doesn't that defeat the purpose of having faster calculations?
--- Quote End ---
Usually, a pipelined algorithm has a latency of n clock cycles, so each output appears a fixed delay after its input:
Input A --- time delay n ---> Output A
But, assuming it's a pipeline, 1 clock after Input A you can feed in Input B, and 1 clock after Output A you get Output B. Here is a quick diagram with time increasing left to right:
Ip A ------- time n ------> Output A
 Ip B ------- time n ------> Output B
  Ip C ------- time n ------> Output C
etc.
So at this rate you get 1 output per clock cycle. You initially have to wait n clocks for the first output to arrive, but once it does, you get 1 output every clock. If you clock the pipeline faster, the output rate goes up accordingly.
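To make the timing concrete, here is a small Python sketch of the idea (the function name and structure are my own, purely illustrative): one input is fed per clock into a pipeline with a latency of 3 clocks, and you can watch the first output appear at clock 3, then one output per clock after that.

```python
def simulate_pipeline(inputs, latency):
    """Feed one input per clock; each output emerges `latency` clocks
    after its input, but outputs then stream out once per clock."""
    pipe = [None] * latency                   # pipeline registers (None = empty stage)
    log = []                                  # (clock, input_fed, output_seen)
    stream = list(inputs) + [None] * latency  # extra clocks to flush the pipe
    for clock, item in enumerate(stream):
        out = pipe[-1]                # value leaving the last stage this clock
        pipe = [item] + pipe[:-1]     # shift every value down one stage
        log.append((clock, item, out))
    return log

for clock, fed, out in simulate_pipeline(["A", "B", "C"], 3):
    print(f"clock {clock}: in={fed}, out={out}")
# First output ("A") appears at clock 3; "B" and "C" follow at clocks 4 and 5,
# i.e. one result per clock once the pipeline is full.
```

The key point the simulation shows: the latency only affects how long you wait for the *first* result; the sustained throughput is one result per clock regardless of how many stages you add.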