Moving to the software section since this doesn't have anything to do with C2H....
Also, if code or compiler setting changes don't give you enough read speed, I recommend using a DMA engine to place the samples directly into memory so the CPU can read them from there. You could have an interrupt fire when 'x' number of bytes have been stored locally, or just poll the DMA engine to find out how many samples have been buffered.
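To give a rough idea of the polling approach, here is a minimal sketch in C. It assumes the DMA engine writes 16-bit samples into a ring buffer and exposes a free-running count of samples written. The names dma_model_t, dma_model_push and poll_samples are made up for illustration, and the "DMA" is just a software stand-in so the code can be run on a PC. Your real DMA core will have its own registers and driver calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_LEN 8u  /* hypothetical ring size; keep it a power of two */

/* Software stand-in for a DMA engine that fills a ring buffer in memory. */
typedef struct {
    uint16_t ring[RING_LEN];
    volatile uint32_t written;  /* total samples written so far (free-running) */
} dma_model_t;

/* Test helper: pretends the DMA engine stored one more sample. */
static void dma_model_push(dma_model_t *d, uint16_t sample)
{
    d->ring[d->written % RING_LEN] = sample;
    d->written++;
}

/*
 * Poll for new samples: copy whatever the DMA has written since *read_count
 * into dst (up to dst_len samples) and return how many were copied.
 * This sketch does not detect overruns, so the CPU must poll before the
 * DMA gets more than RING_LEN samples ahead.
 */
static size_t poll_samples(const dma_model_t *d, uint32_t *read_count,
                           uint16_t *dst, size_t dst_len)
{
    assert(d != NULL && read_count != NULL && dst != NULL);

    /* Unsigned subtraction stays correct when the counter wraps. */
    uint32_t avail = d->written - *read_count;
    size_t n = 0;

    while (avail > 0 && n < dst_len) {
        /* Keep only the 12 sample bits, as in the 0xFFF masking discussed. */
        dst[n++] = (uint16_t)(d->ring[*read_count % RING_LEN] & 0x0FFFu);
        (*read_count)++;
        avail--;
    }
    return n;
}
```

On real hardware you would also have to make sure the CPU doesn't read stale data from its data cache, for example by using an uncached buffer.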
A simpler but less efficient hardware method would be to buffer and post-process the samples in a FIFO built into your ADC interface. First you would read a FIFO watermark that tells you how much data is buffered (or use an interrupt to signal a fill level), and once you know that, just read the data as fast as possible. The 0xFFF data masking could be done in hardware if you made the interface 16 bits wide. Also, if you want to store a lot of samples in memory, you can make the FIFO interface 32 bits wide and read two samples at a time.
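Something like the following shows the software side of the FIFO method, again only as a sketch. It assumes 12-bit samples packed two per 32-bit word with the first sample in the lower half. That packing order, along with the names fifo_model_t, fifo_level, fifo_pop and drain_fifo, is my own assumption. The FIFO here is modelled in software so the code runs anywhere; on the real thing, fifo_level and fifo_pop would be reads of your interface's watermark and data registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_MASK 0x0FFFu  /* 12-bit ADC samples */

/* Software stand-in for the hardware FIFO in the ADC interface. */
typedef struct {
    const uint32_t *words;  /* buffered 32-bit words (two samples each) */
    size_t count;           /* number of words in the model */
    size_t next;            /* index of the next word to pop */
} fifo_model_t;

/* Stand-in for reading the watermark register: words currently buffered. */
static uint32_t fifo_level(const fifo_model_t *f)
{
    return (uint32_t)(f->count - f->next);
}

/* Stand-in for reading the data register: returns and removes one word. */
static uint32_t fifo_pop(fifo_model_t *f)
{
    return f->words[f->next++];
}

/*
 * Read the watermark once, then pull that many words as fast as possible,
 * splitting each word into two masked samples. Stops early if dst cannot
 * hold another pair. Returns the number of samples stored in dst.
 */
static size_t drain_fifo(fifo_model_t *f, uint16_t *dst, size_t dst_len)
{
    assert(f != NULL && dst != NULL);

    uint32_t level = fifo_level(f);
    size_t n = 0;

    while (level > 0 && n + 2 <= dst_len) {
        uint32_t w = fifo_pop(f);
        dst[n++] = (uint16_t)(w & SAMPLE_MASK);          /* first sample  */
        dst[n++] = (uint16_t)((w >> 16) & SAMPLE_MASK);  /* second sample */
        level--;
    }
    return n;
}
```

If the masking is moved into hardware as described above, the two `& SAMPLE_MASK` operations drop out and the loop is just a shift and two stores per word.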
So start with the suggestions DSL has, since making the changes on the software side should be easier. If that doesn't give you enough speed, changing the hardware to be more efficient might be your only remaining choice.