say you have a design with a pipeline depth of 10 that runs at 100 MHz. you put your data in at time = 0, and at time = 10 * (1/100,000,000) seconds = 100 ns, you receive your output result. once the pipeline is full, a new result comes out every 1/100,000,000 seconds (10 ns), assuming you feed in new data on every clock.
in another version of the design, you only have a pipeline depth of 1, and it only runs at 10 MHz. you put your data in at time = 0, and at time = 1 * (1/10,000,000) seconds = 100 ns, you receive your result. in this case a new result is calculated every 1/10,000,000 seconds (100 ns).
the 100 MHz design has 10x the throughput and the same "real-world" latency (100 ns, though that is 10 clock cycles instead of 1) as the 10 MHz design, at the expense of roughly 10x the register usage.
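
if you want to check the arithmetic, here is a small python sketch. the function name `pipeline_timing` and its parameters are made up for illustration, and it assumes an idealized pipeline that accepts one new input per clock and never stalls. it uses `Fraction` so the numbers come out exact instead of picking up floating-point noise.

```python
from fractions import Fraction

def pipeline_timing(depth, clock_hz):
    """Return (latency in ns, results per second) for an idealized pipeline.

    Assumes one new input is accepted every clock and the pipeline never stalls.
    """
    # latency = depth clock periods; one period is 1/clock_hz seconds = 1e9/clock_hz ns
    latency_ns = Fraction(depth * 10**9, clock_hz)
    # once the pipeline is full, one result comes out per clock
    results_per_second = clock_hz
    return latency_ns, results_per_second

# the two designs from the example above
fast_latency, fast_rate = pipeline_timing(depth=10, clock_hz=100_000_000)
slow_latency, slow_rate = pipeline_timing(depth=1, clock_hz=10_000_000)

print("100 MHz, depth 10:", float(fast_latency), "ns latency,", fast_rate, "results/s")
print(" 10 MHz, depth  1:", float(slow_latency), "ns latency,", slow_rate, "results/s")
print("throughput ratio:", fast_rate // slow_rate)
```

both designs come out at 100 ns of latency, and the only difference is how often a new result shows up.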