You need a combination of hardware and software. What we did was to run a hardware counter on a fast clock. The hardware monitors the MII interface, samples the counter value at the beginning of each packet, and places that timestamp in a FIFO. The software can then read the FIFO contents back through an Avalon-MM interface.
You also need to store a bit more information in the FIFO with each timestamp so that the software can know which timestamp comes from which packet, but I think you got the main idea.
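To make the matching step concrete, here is a minimal software-side sketch in C. It is not our actual driver: the entry layout, the names `ts_entry_t`, `ts_ring_push` and `ts_ring_find`, and the ring size are all made up for illustration, and it assumes the ID stored with each timestamp is something the software also knows for the packet (for example the PTP sequenceId). Reading the entries out of the hardware FIFO over Avalon-MM is left out because it depends on your register map.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FIFO entry: one timestamp plus the ID stored alongside it. */
typedef struct {
    uint64_t timestamp;  /* counter value sampled at the start of the packet */
    uint16_t packet_id;  /* assumed: an ID the software can also derive */
} ts_entry_t;

/* Small software ring holding the most recent entries drained from the FIFO. */
#define TS_RING_SIZE 8u

typedef struct {
    ts_entry_t slots[TS_RING_SIZE];
    size_t head;   /* next slot to write */
    size_t count;  /* number of valid slots, at most TS_RING_SIZE */
} ts_ring_t;

/* Store one entry, overwriting the oldest one when the ring is full. */
void ts_ring_push(ts_ring_t *r, ts_entry_t e)
{
    r->slots[r->head] = e;
    r->head = (r->head + 1u) % TS_RING_SIZE;
    if (r->count < TS_RING_SIZE) {
        r->count++;
    }
}

/* Look up the timestamp for a given packet ID, newest entry first.
 * Returns true and writes *out on a match, false otherwise. */
bool ts_ring_find(const ts_ring_t *r, uint16_t packet_id, uint64_t *out)
{
    for (size_t i = 0; i < r->count; i++) {
        size_t idx = (r->head + TS_RING_SIZE - 1u - i) % TS_RING_SIZE;
        if (r->slots[idx].packet_id == packet_id) {
            *out = r->slots[idx].timestamp;
            return true;
        }
    }
    return false;
}
```

In use, the software would push every entry it drains from the FIFO into the ring, then call `ts_ring_find` with the ID of the packet it is interested in. Searching newest first means that if an ID wraps around, the most recent packet wins.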
You don't need to modify the packet contents, because the timestamp that you need to place in the sync packet is the timestamp of the previous packet. The software has plenty of time to read the timestamp value and prepare the next sync packet to send, and you can use the regular UDP/IP stack to generate the packet.
And yes, reading a GMII interface is really easy. The packet starts when the data valid signal is asserted, and then you read, in order, the preamble, the SFD (the timestamping point), the Ethernet header, the IP header, the UDP header and the UDP data. Our VHDL module reads the whole packet to detect whether it is a UDP packet on the expected port, and puts the timestamp in the FIFO only if it is a PTP sync packet, together with an ID number that lets the software know which packet the timestamp came from.
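If it helps, here is a rough software model in C of the checks such a module has to make on the byte stream. It is only a sketch and not our VHDL: the function name `classify_ptp_sync` is made up, and it assumes plain IPv4 with no VLAN tag, the standard PTP event port 319, messageType 0 for Sync, and the PTP sequenceId (offset 30 in the PTP header) as the ID. In hardware you would presumably latch the counter when the SFD goes by and only commit it to the FIFO once these checks pass.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN     14u      /* destination MAC, source MAC, EtherType */
#define ETHERTYPE_IPV4  0x0800u
#define IP_MIN_HDR_LEN  20u
#define IP_PROTO_UDP    17u
#define UDP_HDR_LEN     8u
#define PTP_EVENT_PORT  319u     /* assumed "expected port" */
#define PTP_MSG_SYNC    0x0u     /* messageType value for Sync */
#define PTP_SEQ_ID_OFF  30u      /* sequenceId offset in the PTP header */

/* Read a big-endian 16-bit field. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* rx holds the bytes seen while data valid is asserted, preamble first.
 * On a PTP Sync over UDP/IPv4, returns true and reports the index of the
 * SFD byte (the timestamping point) and the sequenceId. */
bool classify_ptp_sync(const uint8_t *rx, size_t len,
                       size_t *sfd_index, uint16_t *seq_id)
{
    size_t i = 0;
    while (i < len && rx[i] == 0x55) {   /* skip preamble bytes */
        i++;
    }
    if (i >= len || rx[i] != 0xD5) {     /* the SFD must follow */
        return false;
    }

    const uint8_t *eth = rx + i + 1u;
    size_t n = len - (i + 1u);
    if (n < ETH_HDR_LEN + IP_MIN_HDR_LEN) return false;
    if (be16(eth + 12) != ETHERTYPE_IPV4) return false;

    const uint8_t *ip = eth + ETH_HDR_LEN;
    size_t ihl = (size_t)(ip[0] & 0x0Fu) * 4u;   /* IP header length in bytes */
    if ((ip[0] >> 4) != 4u || ihl < IP_MIN_HDR_LEN) return false;
    if (ip[9] != IP_PROTO_UDP) return false;
    if (n < ETH_HDR_LEN + ihl + UDP_HDR_LEN + PTP_SEQ_ID_OFF + 2u) return false;

    const uint8_t *udp = ip + ihl;
    if (be16(udp + 2) != PTP_EVENT_PORT) return false;   /* destination port */

    const uint8_t *ptp = udp + UDP_HDR_LEN;
    if ((ptp[0] & 0x0Fu) != PTP_MSG_SYNC) return false;

    *sfd_index = i;
    *seq_id = be16(ptp + PTP_SEQ_ID_OFF);
    return true;
}
```

The hardware version does the same comparisons one byte per clock with a byte counter, instead of indexing into a buffer, but the offsets are the same.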