You can't really see which part of the design took how long to fit, because that's not how the fitter works. It looks at the design as a whole and uses a random seed along with timing-driven heuristics to get the design to fit.
As designs get larger, they take longer to fit, especially if you have high clock rates, long combinational chains, or lots of logic trying to pack into one small area. This is partly due to resource congestion and partly due to the effort of finding a solution that meets timing. The more you pack into a small space, the more effort it takes to fit everything in without running out of resources.
One thing you can do is partition the design. By splitting the design into smaller groups of related logic, you give the fitter a helping hand by showing it which parts are intended to be closely related to each other, which helps it optimise how it fits the design. You can then also use LogicLock regions to place sections of the design in specific parts of the FPGA, which further helps it work out what goes where.
Since you mention adding components to a NIOS processor, I assume you are building the system in Qsys. If so, and you are adding lots of components to the data (or instruction) bus of the NIOS processor, you increase the amount of behind-the-scenes logic that has to be generated. Qsys adds a lot of Avalon-MM fabric (glue logic) for address decoding, bus arbitration, and other mapping logic. Most of this ends up as a large cloud of combinational logic, all trying to pack in as close to the NIOS processor as possible. This is pretty much a perfect storm for increasing fitting times.
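To give a feel for why every extra peripheral adds fabric logic, here is a rough Python model of the address decoding an interconnect has to do. It is a simplified sketch under my own assumptions, not what Qsys actually generates; the `Slave` class, the function names and the example addresses are all hypothetical.

```python
"""Toy model of the address decoding a memory-mapped interconnect performs.

Simplified illustration only, not the logic Qsys generates.
Slave names, base addresses and spans are hypothetical.
"""
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Slave:
    name: str
    base: int  # first byte address of the slave's window
    span: int  # window size in bytes


def decode(slaves: List[Slave], address: int) -> Optional[str]:
    """Return the slave whose address window contains `address`, if any.

    In hardware every one of these range checks is evaluated in parallel,
    so each extra slave adds another comparator to the master's decoder
    and another input to its read-data multiplexer.
    """
    hits = [s.name for s in slaves if s.base <= address < s.base + s.span]
    if len(hits) > 1:
        raise ValueError(f"overlapping address windows: {hits}")
    return hits[0] if hits else None


def comparator_count(num_masters: int, slaves: List[Slave]) -> int:
    """Rough decoder size, assuming every master connects to every slave."""
    return num_masters * len(slaves)


if __name__ == "__main__":
    system = [
        Slave("onchip_ram", 0x0000, 0x8000),
        Slave("uart", 0x9000, 0x20),
        Slave("pio", 0x9020, 0x10),
    ]
    print(decode(system, 0x9004))        # uart
    print(decode(system, 0x8800))        # None (unmapped address)
    print(comparator_count(2, system))   # 6 (instruction + data master)
```

The point is the scaling: each peripheral you hang off the bus adds a range check and a mux input for every master that can reach it, and all of that logic wants to sit near the processor.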
If your system can tolerate the extra latency, you can reduce this issue somewhat by adding Avalon-MM pipeline bridges to split some of the peripherals off onto smaller buses. These bridges add some access latency, but they also break up the glue logic by inserting extra register stages. Shorter combinational paths let the fitter place logic further away from the NIOS processor without adversely affecting timing, which in turn reduces congestion and therefore compile time.
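The effect of a pipeline bridge on timing can be sketched with a toy Python model: a path is a list of combinational delays, and a register cut splits it into stages that each have to fit within one clock period. All delays, the 0.5 ns register overhead and the function names are made-up assumptions for illustration, not figures from a real device or timing report.

```python
"""Toy model of how a pipeline register splits a long combinational path.

All delays are illustrative numbers in nanoseconds, not taken from any
real device or Quartus timing report.
"""
from typing import List, Sequence

REG_OVERHEAD_NS = 0.5  # assumed clock-to-out + setup cost per stage


def stage_delays(segments: Sequence[float], cuts: Sequence[int]) -> List[float]:
    """Group segment delays into register-to-register stages.

    A cut at index i models a pipeline register placed after segments[i].
    Cuts at the final index are ignored (that register already exists).
    """
    cut_set = {c for c in cuts if 0 <= c < len(segments) - 1}
    stages: List[float] = []
    current = 0.0
    for i, delay in enumerate(segments):
        current += delay
        if i in cut_set:
            stages.append(current)
            current = 0.0
    stages.append(current)
    return stages


def worst_slack_ns(segments: Sequence[float], cuts: Sequence[int], period_ns: float) -> float:
    """Slack of the slowest stage; negative means timing fails."""
    return period_ns - (max(stage_delays(segments, cuts)) + REG_OVERHEAD_NS)


if __name__ == "__main__":
    # Hypothetical master-to-peripheral path through the fabric.
    path = [1.5, 2.0, 1.0, 2.5, 1.0]
    period = 6.0  # roughly a 166 MHz clock

    print(worst_slack_ns(path, [], period))   # -2.5: one long stage fails
    print(worst_slack_ns(path, [1], period))  # 1.0: bridge splits it, passes

    # Peripheral placed further away: +0.5 ns routing on the slave side.
    far_path = [1.5, 2.0, 1.0, 3.0, 1.0]
    print(worst_slack_ns(far_path, [1], period))  # 0.5: still meets timing
```

In this model the bridge turns one failing 8.5 ns stage into two passing stages, and the slack it frees up is what lets the slave-side logic move further from the processor and still meet timing. The cost, which the model does not count, is the extra cycle of latency per register stage.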
Of course, you could also turn off some fitter optimisations to reduce compile time further. However, this usually doesn't achieve the desired outcome, and it generally increases the likelihood of timing failures.