Updated information.
For now I have found a workaround for the problem in my application.
I slightly changed the code to avoid the double answer: upon receipt of the command data from the client, the application now waits for the data from the upper layer and transmits everything with a single send() call. As I expected, the delay disappeared, at the price of somewhat less efficient code.
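To illustrate the idea, here is a minimal sketch of the change, not my actual application code. The helper name send_combined() and its parameters are hypothetical, and the headers are the POSIX ones; the Niche stack has its own socket headers, so the includes would need adapting there.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (name and signature are assumptions): copy the
 * immediate reply and the upper-layer data into one buffer and hand
 * them to the stack with a single send() call, instead of issuing two
 * separate send() calls. */
ssize_t send_combined(int fd, const void *reply, size_t reply_len,
                      const void *data, size_t data_len)
{
    /* A NULL pointer is only acceptable for an empty part. */
    assert(reply != NULL || reply_len == 0);
    assert(data != NULL || data_len == 0);

    size_t total = reply_len + data_len;
    if (total == 0)
        return 0;                      /* nothing to transmit */

    char *buf = malloc(total);
    if (buf == NULL)
        return -1;                     /* out of memory */

    if (reply_len > 0)
        memcpy(buf, reply, reply_len);
    if (data_len > 0)
        memcpy(buf + reply_len, data, data_len);

    /* One call for both parts. send() may still accept fewer bytes
     * than requested, so the caller should check the return value. */
    ssize_t sent = send(fd, buf, total, 0);

    free(buf);
    return sent;
}
```

The extra copy and the wait for the upper layer are the inefficiency I mentioned above.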
I still need to test whether this works for me under all conditions.
In fact, the TCP stack is now working in a very inefficient, synchronous way, because the Niche stack requires every single packet to be acknowledged, instead of sending data until the TCP window goes to zero.
A wide TCP window is useless: it behaves as if the stack zeroed the TCP window whenever it sends some data!
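To give a rough idea of what this costs, here is a small back-of-the-envelope sketch. The function names and the numbers in the usage note are purely illustrative assumptions, not measurements from my setup. It compares the upper bound on throughput when only one packet can be in flight per round trip with the bound when a whole window can be in flight.

```c
/* Upper bound on throughput, in bytes per second, when every packet
 * must be acknowledged before the next one is sent: one packet per
 * round trip. rtt_ms must be non-zero. */
unsigned long per_packet_ack_bps(unsigned long packet_bytes,
                                 unsigned long rtt_ms)
{
    return packet_bytes * 1000UL / rtt_ms;
}

/* Upper bound when the sender may keep a full TCP window in flight:
 * one window per round trip (link bandwidth is ignored). */
unsigned long windowed_bps(unsigned long window_bytes,
                           unsigned long rtt_ms)
{
    return window_bytes * 1000UL / rtt_ms;
}
```

For example, assuming 1460-byte packets, an 8192-byte window and a 10 ms round trip, the first bound is 146000 bytes/s and the second is 819200 bytes/s.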
Cris