Then you should focus on the altera_avalon_uart driver. Your fprintf certainly hands the string to the UART HAL driver immediately, but the data stays in the transmit buffer until the usleep completes, because this blocking function prevents the HAL driver from sending data to the actual serial port hardware, as I described above.
Which altera_avalon_uart implementation are you using: small or fast?
If it's the small one, you should have a UART poll call somewhere in a loop; you must make periodic calls to it from inside the sleep function, too.
If it's the fast implementation, you probably need to fix the IRQ priorities.
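For the small-driver case, here is a minimal sketch of what "periodic poll calls inside the sleep" could look like. It is only an illustration: sleep_with_poll, demo_poll and demo_delay are hypothetical names, not part of the Altera HAL. On the target you would pass your own UART poll routine and a usleep-style delay function instead of the two stubs, which exist only so the sketch compiles off-target.

```c
#include <stdint.h>

/* Hypothetical callback types; neither is part of the Altera HAL. */
typedef void (*poll_fn_t)(void);
typedef void (*delay_fn_t)(uint32_t microseconds);

/* Stand-ins for the real UART poll routine and the real delay function.
 * They only count, so the sketch can be built and checked on a PC. */
static uint32_t demo_poll_count;
static uint32_t demo_slept_us;

static void demo_poll(void)          { demo_poll_count++; }
static void demo_delay(uint32_t us)  { demo_slept_us += us; }

/* Sleep for total_us microseconds in slices of at most slice_us,
 * calling poll() after every slice so the UART keeps being serviced.
 * Returns the number of poll calls made. */
uint32_t sleep_with_poll(uint32_t total_us, uint32_t slice_us,
                         poll_fn_t poll, delay_fn_t delay)
{
    uint32_t polls = 0;

    if (slice_us == 0) {
        slice_us = 1;               /* avoid an endless loop */
    }

    while (total_us > 0) {
        uint32_t step = (total_us < slice_us) ? total_us : slice_us;

        delay(step);                /* short blocking delay */
        total_us -= step;

        if (poll != 0) {
            poll();                 /* let the driver push pending bytes */
            polls++;
        }
    }
    return polls;
}
```

The slice length is a trade-off: it should be short enough that the transmit buffer is serviced often, but not so short that the call overhead distorts the total delay.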