Thank you Kazuyasu and mschnell!
I've made some progress. There was a hardware issue with the SSRAM on our board as well as a Linux issue. With the hardware issue fixed, I was getting an EFAULT ("bad address") when trying to send a buffer from SSRAM. It turns out the kernel checks that user pointers fall within program memory, in _access_ok in include/asm-nios2/uaccess.h. Adding the SSRAM area to that check fixed the EFAULT, but in the end I just took the check out entirely. Since I can access all of memory from userspace anyway, it's not really protecting me from anything, and the error was a pain to track down.
-#define access_ok(type,addr,size) _access_ok((unsigned long)(addr),(size))
+#define access_ok(type,addr,size) 1
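For anyone who would rather keep the check than remove it, here is a minimal sketch of accepting a second memory region. It is plain user-space C, not the actual kernel code. The region bases, sizes and helper names are all made up for illustration and are not the real values from uaccess.h or from this board.

```c
#include <stdbool.h>

/* Hypothetical memory map: placeholder values, not the real board layout. */
#define DDR_BASE    0x00000000ul
#define DDR_SIZE    0x04000000ul   /* assumed 64 MiB of DDR SDRAM */
#define SSRAM_BASE  0x05000000ul   /* assumed SSRAM base address */
#define SSRAM_SIZE  0x00200000ul   /* assumed 2 MiB of SSRAM */

/* True if [addr, addr + size) lies entirely inside [base, base + len).
 * Written so that addr + size is never computed and cannot overflow. */
static bool range_in_region(unsigned long addr, unsigned long size,
                            unsigned long base, unsigned long len)
{
    return addr >= base && size <= len && addr - base <= len - size;
}

/* Accept a user buffer if it fits in either region (hypothetical helper). */
static bool user_range_ok(unsigned long addr, unsigned long size)
{
    return range_in_region(addr, size, DDR_BASE, DDR_SIZE) ||
           range_in_region(addr, size, SSRAM_BASE, SSRAM_SIZE);
}
```

With these placeholder values, a 1024-byte buffer at 0x05000000 passes, while a buffer that starts in the gap between the two regions or runs past the end of the SSRAM is rejected.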
Putting the buffer in external SSRAM instead of DDR SDRAM got me from 14Mbps to 16Mbps. I then changed the linker script to add an .ssram section and to make sure everything else goes in main memory (DDR). The diff is attached in case anyone is interested. I then went through the trace of sending a message (17 functions from the system call to the driver for UDP!) and added
__attribute__ ((section (".ssram")))
to all the functions. For anyone else trying it who hasn't used attributes before either: that goes right before the function definition like so:
__attribute__ ((section (".ssram")))
static int atse_hard_start_xmit(struct sk_buff *skb, struct net_device *ndev)
{
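Repeating the attribute on every function gets tedious, so one option is to wrap it in a macro that can be switched off. This is only a sketch: the macro name `__ssram_text` and the config symbol `CONFIG_HYPOTHETICAL_SSRAM_TEXT` are invented for illustration and do not exist in the kernel tree.

```c
/* Hypothetical wrapper around the section attribute. When the (made-up)
 * config symbol is not defined, the macro expands to nothing and the
 * function is linked wherever it would normally go. */
#ifdef CONFIG_HYPOTHETICAL_SSRAM_TEXT
#define __ssram_text __attribute__ ((section (".ssram")))
#else
#define __ssram_text
#endif

/* Toy function standing in for a hot-path routine. */
__ssram_text
static int fast_path_add(int a, int b)
{
    return a + b;
}
```

The function behaves the same either way. Only its link address changes, which is why the placement still has to be checked in the map file.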
linux-2.6.x/System.map confirms that they get linked at the right addresses (I've also started on the TCP and rx paths here):
05000000 T irq_exit
05000088 T do_sync_write
050001ac T vfs_write
050002a8 T sys_write
05000340 t atse_hard_start_xmit
05000684 T sock_sendmsg
0500076c T sys_connect
0500080c T sys_sendto
05000918 T sys_send
05000938 T sys_socketcall
05000b20 T release_sock
05000c3c T dev_hard_start_xmit
05000f00 T dev_queue_xmit
050012ec T netif_receive_skb
0500163c t process_backlog
05001748 t net_rx_action
05001908 T neigh_resolve_output
05001c7c t neigh_timer_handler
05002160 T __qdisc_run
0500249c t dst_output
050024bc t ip_finish_output2
050027ac T ip_queue_xmit
05002be0 T ip_output
05002cb4 T ip_push_pending_frames
05003120 t __tcp_ack_snd_check
050031c0 t tcp_transmit_skb
050039a0 T tcp_connect
05003db8 T tcp_send_ack
05003ea8 T tcp_v4_connect
05004330 T tcp_v4_rcv
05004ae8 T udp_flush_pending_frames
05004b1c t udp_push_pending_frames
05004f90 T udp_sendmsg
050056b0 T arp_xmit
050056c8 T arp_send
05005724 t arp_solicit
050059e4 T inet_stream_connect
05005d44 T inet_sendmsg
05005dc0 t packet_sendmsg_spkt
05005ff8 t packet_sendmsg
(What do those t's and T's mean?)
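System.map uses the nm symbol-type letters: `T` marks a global symbol in the text (code) section and lowercase `t` marks a local one, such as a `static` function like atse_hard_start_xmit. Checking the listing by eye gets old quickly, so below is a rough sketch of a helper that decides whether a System.map line is a code symbol inside the SSRAM window. The base address comes from the listing above. The window size and the function names are assumptions.

```c
#include <stdbool.h>
#include <stdio.h>

#define SSRAM_BASE 0x05000000ul  /* first address in the listing above */
#define SSRAM_SIZE 0x00200000ul  /* assumed window size, placeholder */

/* Parse one System.map line of the form "<hex address> <type> <name>".
 * Returns true only if all three fields were read. */
static bool parse_map_line(const char *line, unsigned long *addr,
                           char *type, char name[64])
{
    return sscanf(line, "%lx %c %63s", addr, type, name) == 3;
}

/* True if the line describes a text symbol (t or T) inside the SSRAM window. */
static bool is_ssram_text(const char *line)
{
    unsigned long addr;
    char type;
    char name[64];

    if (!parse_map_line(line, &addr, &type, name))
        return false;
    if (type != 't' && type != 'T')
        return false;
    return addr >= SSRAM_BASE && addr - SSRAM_BASE < SSRAM_SIZE;
}
```

Feeding each line of System.map through `is_ssram_text` and printing the names that fail would show which functions in the send path are still linked in DDR.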
This along with 8k caches got me up to 19Mbps sending 1KB UDP messages. Still dismal for a gigabit connection, but it's an improvement. Unfortunately tightly coupled/internal memory is likely not an option because FPGA resources are limited, but now that I've got this figured out I'll try that once I know if that memory is available.
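To put the 19Mbps figure in per-message terms, here is a back-of-the-envelope sketch. It assumes 1KB means 1024 payload bytes and that the rate counts payload only, ignoring UDP/IP/Ethernet header overhead. Both are assumptions, and the function names are made up.

```c
/* Messages per second for a given payload throughput in Mbps. */
static double messages_per_second(double mbps, unsigned payload_bytes)
{
    return (mbps * 1e6) / (payload_bytes * 8.0);
}

/* Average time spent per message, in microseconds. */
static double microseconds_per_message(double mbps, unsigned payload_bytes)
{
    return 1e6 / messages_per_second(mbps, payload_bytes);
}
```

Under those assumptions, 19Mbps with 1024-byte messages works out to roughly 2300 messages per second, or about 430 microseconds per message through the whole send path.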
There are still problems though. This configuration only works with kernel debugging enabled. There are three options there that are enabled by default (CONFIG_DETECT_SOFTLOCKUP, CONFIG_DETECT_HUNG_TASK, and CONFIG_SCHED_DEBUG) and if I disable any of them the system doesn't boot with the modified linkage.
Also, I've double- and triple-checked the MAC and PHY registers and everything seems to be configured for gigabit, but as the timings in my original post showed, I still seem to be getting only 100Mbps out of the PHY.
I'd like to submit (or at least bring up) some of these changes on the -devel list, but I'm not sure how to make them more general.