TimeQuest does not have Tsu/Th/Tco constraints, and you should always constrain external delays with set_input_delay/set_output_delay instead. From the original post, your concept of source and network latency is exactly the way I think of them. (Basically, source latency is something TimeQuest can't know, while network latency is what it does know, since it's all in the FPGA.)
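As a minimal sketch of what that looks like (all port names and delay numbers here are made up for illustration):

```tcl
# Hypothetical 100 MHz board clock into the FPGA
create_clock -name sys_clk -period 10.0 [get_ports clk_in]

# The external device's Tco plus board trace becomes set_input_delay,
# e.g. external Tco(max) = 3.0 ns plus 0.5 ns of trace:
set_input_delay -clock sys_clk -max 3.5 [get_ports data_in]
set_input_delay -clock sys_clk -min 1.0 [get_ports data_in]

# The external device's Tsu/Th becomes set_output_delay,
# e.g. external Tsu = 2.0 ns, Th = 0.5 ns:
set_output_delay -clock sys_clk -max  2.0 [get_ports data_out]
set_output_delay -clock sys_clk -min -0.5 [get_ports data_out]
```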
For -reference_pin, I never use it (and if I did, it would make more sense on a set_output_delay for something source synchronous). The reason I never use it is that it's shorthand for putting a create_generated_clock on an output port and then referencing that clock. I personally think the longer method makes more sense, and it gives the user more control, such as being able to name the clock going off chip.
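The longer method looks roughly like this (the PLL pin path, port names, and numbers are hypothetical; the pin name in particular depends on your device and PLL instance):

```tcl
# Name the clock that leaves the chip by putting a generated clock
# directly on the output port, instead of using -reference_pin:
create_generated_clock -name tx_clk_out \
    -source [get_pins {pll_inst|clk[0]}] \
    [get_ports tx_clk]

# Then reference that named clock in the source-synchronous
# output constraints:
set_output_delay -clock tx_clk_out -max  2.0 [get_ports tx_data]
set_output_delay -clock tx_clk_out -min -0.5 [get_ports tx_data]
```

Because the clock has its own name, it also shows up cleanly in report_timing and can be targeted by other constraints.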
Where it would be used with a set_input_delay is if you send a clock off chip and have a round-trip delay back relative to your main clock (think of a read from external flash). But again, put a generated clock on that output port and use that clock for your external delay constraints. For the create_generated_clock part you quoted, I think it's talking about the fact that if you put a master clock in the middle of your design, then the latency starts there. For example, if you put a create_clock assignment on the output of a clock mux, your latency will be calculated from that point, as if the clock magically appeared at the mux output, rather than being traced back to the source input ports. Of course, you shouldn't be putting a create_clock on the output of a mux anyway; do a create_generated_clock assignment instead. (The general rule of thumb is to use create_clock for input clock ports and external clocks, and create_generated_clock for anything internal to the FPGA or clocks going off chip.)
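For the mux case, a sketch of the create_generated_clock approach (the mux pin names are hypothetical and will depend on how the mux is implemented in your design):

```tcl
# Two real clocks on input ports: latency traces from these ports.
create_clock -name clk_a -period 10.0 [get_ports clk_a_in]
create_clock -name clk_b -period  8.0 [get_ports clk_b_in]

# Generated clocks on the mux output, one per possible source.
# Latency is still traced back through the mux to the input ports,
# which a create_clock at the mux output would not do.
create_generated_clock -name muxed_clk_a -master_clock clk_a \
    -source [get_pins clk_mux|a_in] [get_pins clk_mux|y_out]
create_generated_clock -name muxed_clk_b -master_clock clk_b \
    -source [get_pins clk_mux|b_in] -add [get_pins clk_mux|y_out]
```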
Be careful about reading too much into the documentation. Not that it shouldn't all be correct, but you can tie yourself in mental knots. If you want to use the constraint, add it to your .sdc, do a report_timing, and then see how TimeQuest uses it. The weird corner cases usually don't matter, and the way it's used is pretty straightforward. (For as complicated as TimeQuest can seem, at a basic level it takes the numbers you give it and either adds or subtracts them in its slack calculations. I find looking at real examples is so much easier than talking about theoreticals.)
And speaking of corner cases, the one gotcha I've seen with set_clock_latency is if you apply two constraints, an -early and a -late, to the same clock into the FPGA. For example, say you're trying to mimic some variance and do a set_clock_latency -early 3.0 on fpga_clk, and also a set_clock_latency -late 3.3 on fpga_clk. The problem is that internal paths will use both of these numbers, i.e. for a setup analysis, it will use 3.3 ns as the source latency to your launching registers (I'm not saying source registers, to avoid confusion), and it will use 3.0 ns as the source latency to your destination registers. So if your clock period is 10 ns, you've essentially cut your setup requirement by 300 ps. Hold requirements will also get 300 ps more strict. The bottom line is that you only want to use one number for source latency into the FPGA. (For external clocks, it's not a problem.)
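Spelled out as constraints (clock name and numbers from the example above; this is the pattern to avoid for an FPGA input clock):

```tcl
create_clock -name fpga_clk -period 10.0 [get_ports fpga_clk]

# The gotcha: two different source-latency numbers on one input clock.
set_clock_latency -source -early 3.0 [get_clocks fpga_clk]
set_clock_latency -source -late  3.3 [get_clocks fpga_clk]

# For internal reg-to-reg paths, setup launches with the -late 3.3 ns
# and captures with the -early 3.0 ns, so the 10 ns requirement
# effectively shrinks to 9.7 ns (and hold tightens by 0.3 ns).
# Safer: a single number, e.g.
#   set_clock_latency -source 3.0 [get_clocks fpga_clk]
```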
Finally, I think set_clock_latency is pretty cool, but I don't see anyone using it. The more common practice is to just roll the number into the set_input_delay/set_output_delay constraints.
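A sketch of what rolling it in looks like for an input (hypothetical numbers: external device Tco(max) = 2.0 ns, data trace = 0.4 ns, board clock trace to the external device = 0.6 ns):

```tcl
create_clock -name sys_clk -period 10.0 [get_ports clk_in]

# Instead of set_clock_latency, fold the board-level clock delay into
# the I/O constraint: Tco(max) + data trace - clock trace
#   = 2.0 + 0.4 - 0.6 = 1.8 ns
set_input_delay -clock sys_clk -max 1.8 [get_ports din]
```

Same arithmetic TimeQuest would do, just expressed in one constraint.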