Hello Clark,
I don't think an asynchronous load is a good solution. Try to perform the load inside the clocked part of the process, i.e. under the C'event or rising_edge(C) condition. If your load signal stays active for too long (more than one clock period), you can use edge detection so that it only triggers a single load. As a first step, I would try without an inverted clock (no action at falling_edge(C)).
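To illustrate what I mean by a synchronous load with edge detection, here is a minimal sketch. C, S and Q are taken from your post; the data input D, the 8-bit width, the up-counting direction and the entity name are my assumptions, since I don't know your exact design. It also assumes S is already synchronous to C.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sync_load_counter is
  port (
    C : in  std_logic;                      -- clock
    S : in  std_logic;                      -- load request, assumed synchronous to C
    D : in  std_logic_vector(7 downto 0);   -- load value (hypothetical name and width)
    Q : out std_logic_vector(7 downto 0)    -- counter output
  );
end entity sync_load_counter;

architecture rtl of sync_load_counter is
  signal count  : unsigned(7 downto 0) := (others => '0');
  signal s_prev : std_logic := '0';         -- S as seen at the previous clock edge
begin
  process (C)
  begin
    if rising_edge(C) then
      s_prev <= S;
      if S = '1' and s_prev = '0' then
        -- rising edge of S detected: load exactly once,
        -- even if S stays high for several clock periods
        count <= unsigned(D);
      else
        count <= count + 1;                 -- assumed: up-counter
      end if;
    end if;
  end process;

  Q <= std_logic_vector(count);
end architecture rtl;
```

Everything happens at rising_edge(C) only, so there is no asynchronous path into the counter registers.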
It is important to know whether S is synchronous to C, i.e. whether it has defined setup and hold timing relative to the clock. If it doesn't, the counter may occasionally load unexpected values. An asynchronous load signal should be synchronized to the clock first. Basically, this has nothing to do with HDL programming; the same effects occur with discrete logic as well.
Which solution is best depends on the timing of the load signal relative to the clock, and I can't tell that exactly from your post.
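If S turns out to be asynchronous, the usual approach is a two-stage register chain in front of the counter logic. The following is only a sketch; the entity name and the output name S_sync are hypothetical, and whether two stages are sufficient depends on your clock frequency and device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity load_sync is
  port (
    C      : in  std_logic;   -- clock
    S      : in  std_logic;   -- asynchronous load request
    S_sync : out std_logic    -- load request synchronized to C (hypothetical name)
  );
end entity load_sync;

architecture rtl of load_sync is
  signal meta   : std_logic := '0';  -- first stage, may go metastable
  signal stable : std_logic := '0';  -- second stage, used by the design
begin
  process (C)
  begin
    if rising_edge(C) then
      meta   <= S;       -- sample the asynchronous input
      stable <= meta;    -- give the first stage one clock period to settle
    end if;
  end process;

  S_sync <= stable;
end architecture rtl;
```

Note that S_sync follows S with a delay of up to two clock periods, so the load data must still be valid when the synchronized signal finally arrives at the counter.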
As for the simple VHDL question: to bring an internal signal out to an output port, you can either declare the port as buffer or use an intermediate signal, as you did with Q.
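Both variants side by side, as a small sketch: a 4-bit counter whose value has to be read back inside the architecture. The entity names, the width and the initial values are only assumptions for illustration (the initial values mainly serve simulation; a real design would normally have a reset).

```vhdl
-- Variant 1: port of mode buffer, which may be read inside the architecture
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity cnt_buffer is
  port (
    C : in     std_logic;
    Q : buffer unsigned(3 downto 0) := (others => '0')
  );
end entity cnt_buffer;

architecture rtl of cnt_buffer is
begin
  process (C)
  begin
    if rising_edge(C) then
      Q <= Q + 1;              -- reading Q is legal because of mode buffer
    end if;
  end process;
end architecture rtl;

-- Variant 2: plain out port driven from an internal signal
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity cnt_signal is
  port (
    C : in  std_logic;
    Q : out unsigned(3 downto 0)
  );
end entity cnt_signal;

architecture rtl of cnt_signal is
  signal count : unsigned(3 downto 0) := (others => '0');
begin
  process (C)
  begin
    if rising_edge(C) then
      count <= count + 1;      -- only the internal signal is read
    end if;
  end process;

  Q <= count;                  -- the port is only driven, never read
end architecture rtl;
```

I usually prefer the second variant, because it keeps the port a plain out. If your tools support VHDL-2008, reading an out port directly is allowed there as well.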
Regards and Happy New Year!
Frank