How about a simple HW acceleration demonstration?
Pick some real-world-ish problem you can solve as a C++ class or thread on Linux. Then create the equivalent hardware and write a device driver that feeds the HW your inputs. For example, if you're calculating a CRC, you'd want your hardware to have its own master port so it can read memory directly. The registers would include a pointer to where the CRC calculation should start, a byte-count register, a control register, a status register, and a result register. Your Linux device driver would write the starting address, the byte count, and a 'go' command, poll or wait for the status to indicate completion, and then read out the result.
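To make the register sequence concrete, here is a rough C++ sketch of both halves: a plain software CRC-32 as the baseline, and the write-address / write-count / go / poll / read-result sequence against a register block. Everything named here (`CrcRegs`, `kCtrlGo`, `kStatusDone`, `simulate_hw_step`, `crc32_hw`) is made up for illustration, and the "hardware" is just a software stand-in so the sketch runs on a PC. A real driver would map the registers from the device and hand the HW a physical/DMA address rather than a user-space pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Software baseline: bitwise CRC-32 (reflected polynomial 0xEDB88320,
// initial value 0xFFFFFFFF, final inversion).
std::uint32_t crc32_sw(const std::uint8_t* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Mask is all ones when the low bit is set, zero otherwise.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical register layout for the CRC block (an assumption, not a
// real device). In real life this would overlay memory-mapped I/O.
struct CrcRegs {
    volatile std::uint64_t src_addr;    // where the HW starts reading
    volatile std::uint32_t byte_count;  // how many bytes to process
    volatile std::uint32_t control;     // bit 0 = GO
    volatile std::uint32_t status;      // bit 0 = DONE
    volatile std::uint32_t result;      // CRC once DONE is set
};

constexpr std::uint32_t kCtrlGo = 1u << 0;
constexpr std::uint32_t kStatusDone = 1u << 0;

// Stand-in for the FPGA logic: when GO is set, "read memory" through the
// pointer in src_addr, compute the CRC, then clear GO and raise DONE.
void simulate_hw_step(CrcRegs& regs) {
    if ((regs.control & kCtrlGo) == 0) {
        return;
    }
    regs.status = 0;
    const auto* src = reinterpret_cast<const std::uint8_t*>(
        static_cast<std::uintptr_t>(regs.src_addr));
    regs.result = crc32_sw(src, regs.byte_count);
    regs.control = 0;
    regs.status = kStatusDone;
}

// Driver-side sequence: start address, byte count, 'go', poll, read result.
std::uint32_t crc32_hw(CrcRegs& regs, const std::uint8_t* data, std::size_t len) {
    assert(len <= 0xFFFFFFFFu);  // byte_count is only 32 bits wide here
    regs.src_addr = reinterpret_cast<std::uintptr_t>(data);
    regs.byte_count = static_cast<std::uint32_t>(len);
    regs.control = kCtrlGo;
    do {
        simulate_hw_step(regs);  // real HW runs on its own; this is the stand-in
    } while ((regs.status & kStatusDone) == 0);
    return regs.result;
}
```

Calling `crc32_sw` and `crc32_hw` on the same buffer should give the same value, which is also how you'd sanity-check the real hardware against the software version before timing the two. In an actual driver the busy-poll would usually be replaced by an interrupt or a sleep-and-check loop.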
When it's all done, depending on the design trade-offs you made in your HW architecture, you'll likely have a system that runs much faster than the SW on the processor, potentially by orders of magnitude. Such is the power of FPGAs.
Although similar things have been done before, a demonstration like this may still draw interest at trade shows or academic conferences, where the results can be published.