Control packets are placed in the stream before the actual video packet. In general, you should not assume that there will be only one control packet, because most of the VIP cores propagate packets from upstream blocks. The last control packet preceding the video packet contains the information pertaining to that video packet.
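To make the "last control packet wins" rule concrete, here is a minimal Python sketch. It is only an illustration, not part of any VIP core: the function name and the (type, payload) tuple layout are made up for the example, and the type values (15 for control, 0 for video) are my assumption about the Avalon-ST Video packet type identifiers.

```python
from typing import Any, List, Optional, Tuple

# Assumed Avalon-ST Video packet type identifiers (check your VIP user guide).
CONTROL_TYPE = 15
VIDEO_TYPE = 0

Packet = Tuple[int, Any]  # hypothetical layout: (type identifier, payload)


def pair_video_with_control(packets: List[Packet]) -> List[Tuple[Optional[Any], Any]]:
    """Attach to each video packet the most recent control packet before it."""
    last_control: Optional[Any] = None
    paired = []
    for type_id, payload in packets:
        if type_id == CONTROL_TYPE:
            # A later control packet overrides any earlier one.
            last_control = payload
        elif type_id == VIDEO_TYPE:
            paired.append((last_control, payload))
        # Any other packet type is ignored in this sketch.
    return paired


# Two control packets arrive before the frame; only the second one applies.
stream = [(CONTROL_TYPE, "stale"), (CONTROL_TYPE, "current"), (VIDEO_TYPE, "frame0")]
result = pair_video_with_control(stream)
```

Here `result` pairs "frame0" with "current", and the "stale" control packet is simply superseded.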
As this is really a pretty trivial component, I'm going to give you the attached module. Sorry it's in Verilog; I know how fond Europeans are of "Very Hard Description Language." :) Basically, the component just removes any packets from the stream that are not video packets. It also (*MAKE NOTE*) removes the leading zero (the packet type identifier) from the video packet, so all you are left with is the raw video data. The startofpacket and endofpacket signals coincide with the first and last pixels of the frame, respectively.
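If it helps to see the behavior without reading HDL, below is a rough Python model of what the component does. This is a sketch of the behavior described above, not a translation of the attached Verilog: the `Beat` type and the function name are invented for the example, and I am assuming the packet type sits in the low 4 bits of the first beat, with 0 meaning video.

```python
from typing import Iterable, List, NamedTuple


class Beat(NamedTuple):
    """One transfer on the streaming interface (hypothetical model type)."""
    data: int
    sop: bool  # startofpacket
    eop: bool  # endofpacket


VIDEO_TYPE = 0  # assumed type identifier of a video packet


def strip_to_raw_video(beats: Iterable[Beat]) -> List[Beat]:
    """Drop non-video packets and the video packet's header beat."""
    out: List[Beat] = []
    keep = False         # True while inside a video packet
    first_pixel = False  # True until the first pixel has been emitted
    for beat in beats:
        if beat.sop:
            # The header beat carries the packet type; it is never forwarded.
            keep = (beat.data & 0xF) == VIDEO_TYPE
            first_pixel = keep
            continue
        if keep:
            # Re-assert sop on the first pixel; eop already sits on the last one.
            out.append(Beat(beat.data, first_pixel, beat.eop))
            first_pixel = False
        if beat.eop:
            keep = False
    return out


# A control packet (type 15) followed by a three-pixel video packet.
stream = [
    Beat(15, True, False), Beat(1, False, False), Beat(2, False, True),
    Beat(0, True, False), Beat(0xAA, False, False),
    Beat(0xBB, False, False), Beat(0xCC, False, True),
]
raw = strip_to_raw_video(stream)
```

In this model `raw` holds only the three pixel beats, with sop set on 0xAA and eop set on 0xCC, which matches the startofpacket/endofpacket behavior described above.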
Jake