I don't have the diagram you refer to at hand. Anyway, I think the first connection is correct: port s1 connected to the TC instruction master and s2 to the data master.
The connection to the data master is mandatory so that the memory can be loaded at boot.
After that, usually only port s1 is used, to fetch the code being executed.
Regarding the other question, TC memory and cache are independent devices: you can have either or both.
From the point of view of performance, you can consider TCM very similar to cache, since both rely on a dedicated data transfer channel which is not subject to delays due to bus arbitration. The difference is that cache contents are variable and change automatically depending on the code being executed, while TCM content is fixed, but you decide what gets loaded into it.
So TCM is usually convenient if you have a few functions or data items that are accessed frequently: placing them there makes access to them consistently fast.
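
As a rough sketch of how you could pin a hot function and its data into TCM with GCC section attributes: the section names .tc_instr_mem and .tc_data_mem, the USE_TCM switch, and the dot4 function are all made up for illustration. The section names must match whatever your own linker script defines for the tightly coupled memories, so check that before relying on this.

```c
#include <stdint.h>

/* Hypothetical section names: replace them with the sections your
   linker script maps onto the tightly coupled memories.
   USE_TCM is an assumed build switch, not a toolchain macro. */
#ifdef USE_TCM
#define TCM_CODE __attribute__((section(".tc_instr_mem")))
#define TCM_DATA __attribute__((section(".tc_data_mem")))
#else
#define TCM_CODE
#define TCM_DATA
#endif

/* Frequently read table, placed in the data TCM when USE_TCM is set. */
TCM_DATA static int16_t coeffs[4] = {1, 2, 3, 4};

/* Frequently called routine, placed in the instruction TCM when USE_TCM is set. */
TCM_CODE int32_t dot4(const int16_t *x)
{
    int32_t acc = 0;
    for (int i = 0; i < 4; i++)
        acc += (int32_t)coeffs[i] * x[i];
    return acc;
}
```

Without USE_TCM the macros expand to nothing, so the same source still builds for a system that has no TCM. Everything else in the program stays in normal memory and goes through the cache, if you have one.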