I've had some experience writing a processor-free Ethernet interface, so if you're interested in going down that path I can point you to the protocol documentation you'll need. Writing your own state machines for UDP, ICMP, and ARP takes about 3 weeks. UDP really is worlds simpler than TCP.
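To give a sense of why UDP is so much simpler: the entire header is four 16-bit fields (RFC 768), with no connection state to track. Here's a rough sketch in Python rather than HDL, purely to show the layout. The names `parse_udp` and `build_udp` are made up for this example, and the checksum is left at zero (allowed for UDP over IPv4) instead of being computed.

```python
import struct

# UDP header per RFC 768: source port, destination port, length, checksum.
# All four fields are 16-bit, big-endian (network byte order).
UDP_HEADER = struct.Struct("!HHHH")


def build_udp(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Build a UDP datagram (hypothetical helper for illustration).

    The checksum is written as 0, which means "not computed" and is
    permitted for UDP over IPv4.
    """
    length = UDP_HEADER.size + len(payload)  # length covers header + payload
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_udp(datagram: bytes) -> dict:
    """Split a raw UDP datagram into its header fields and payload."""
    if len(datagram) < UDP_HEADER.size:
        raise ValueError("datagram shorter than the 8-byte UDP header")
    src, dst, length, checksum = UDP_HEADER.unpack_from(datagram)
    if length < UDP_HEADER.size or length > len(datagram):
        raise ValueError("bad UDP length field")
    return {
        "src_port": src,
        "dst_port": dst,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }
```

In hardware this amounts to a short counter-driven state machine that latches four fields and then streams the payload, which is a big part of why the UDP/ICMP/ARP set is manageable without a processor.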
The huge benefit of running communications through a processor is that it's much easier to add support for additional protocols. I'm not familiar with InterNiche, but you may want to start lowering expectations about how versatile the system will be.
The InterNiche device I checked out supports all the protocols you'll need, but an important part of TCP is the ability to split data into segments and reassemble it, even when the segments arrive out of order. It's not clear to me whether the InterNiche will be able to do this in the FPGA or whether it expects software to do the reassembly. If you're receiving small packets, this shouldn't matter.
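To make the out-of-order reassembly concern concrete, here's a minimal Python sketch of the bookkeeping involved. It is not how InterNiche or any particular stack does it; the `Reassembler` class and its `receive` method are hypothetical names for this example, and offsets are plain byte offsets rather than real TCP sequence numbers (no wraparound handling).

```python
class Reassembler:
    """Buffers pieces that arrive out of order and releases bytes in order."""

    def __init__(self, start: int = 0):
        self.next_offset = start  # next byte offset we can deliver
        self.pending = {}         # offset -> bytes held back for later

    def receive(self, offset: int, data: bytes) -> bytes:
        """Accept one piece; return whatever is now deliverable in order."""
        # Keep the longest piece seen at a given offset.
        if len(data) > len(self.pending.get(offset, b"")):
            self.pending[offset] = data

        out = bytearray()
        while True:
            # Pieces starting at or before next_offset can be consumed.
            ready = [o for o in self.pending if o <= self.next_offset]
            if not ready:
                break
            for o in ready:
                chunk = self.pending.pop(o)
                skip = self.next_offset - o  # bytes already delivered
                if skip < len(chunk):
                    out += chunk[skip:]
                    self.next_offset += len(chunk) - skip
        return bytes(out)
```

For example, calling `receive(5, b"world")` first returns nothing, and a later `receive(0, b"hello")` releases all ten bytes at once. The `pending` buffer is the expensive part: in an FPGA it has to live in block RAM or external memory, which is why it matters whether the hardware or the software is expected to do this.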
The Design-Reuse document is very thorough, but the RARP it mentions has since been superseded by DHCP.