--- Quote Start ---
If I continue down the same path using std_logic_vector, I understand that the product will be a Q62.2 value. How would I scale that value to be 32 bits? Would I use the upper 32-bits?
--- Quote End ---
It 'depends' :)
I'll explain why with a simplified example ...
If you have 3 bits, you can represent the signed values -4, -3, -2, -1, 0, 1, 2, 3. If you divide them by 4 (i.e., treat the two LSBs as fractional bits), you get the Q0.2 numbers -1.0, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75.
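If you multiply two of these numbers together, the largest value you can get is -1.0 x -1.0 = +1.0, and the smallest non-zero magnitude is 0.25 x 0.25 = 0.0625, so your product needs to have a Q1.4 representation (a sign bit, one integer bit to hold +1.0, and four fractional bits to hold 0.0625).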
In general, Qm1.n1 x Qm2.n2 = Q(m1+m2+1).(n1+n2), where m counts the integer bits not including the sign bit.
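If you want to convince yourself of the 3-bit case, here is a quick brute-force check in Python (just a sketch of the arithmetic, not HDL; the helper name signed_bits is my own). It multiplies every pair of 3-bit signed values and works out how many bits the product needs.

```python
from fractions import Fraction

FRAC_BITS = 2                      # Q0.2: sign bit + 2 fractional bits
raw = range(-4, 4)                 # all 3-bit two's-complement integers

# Raw integer products; the real value is raw_product / 2**(2*FRAC_BITS)
products = [a * b for a in raw for b in raw]
lo, hi = min(products), max(products)

def signed_bits(lo, hi):
    """Smallest two's-complement width that holds every value in [lo, hi]."""
    n = 1
    while not (-(1 << (n - 1)) <= lo and hi <= (1 << (n - 1)) - 1):
        n += 1
    return n

width = signed_bits(lo, hi)
int_bits = width - 1 - 2 * FRAC_BITS   # bits left over after sign and fraction

print(Fraction(lo, 16), Fraction(hi, 16))   # -3/4 1
print(width, int_bits)                      # 6 1
```

So the product needs 6 bits, i.e., Q1.4, and the only reason for the integer bit is the single +1.0 result from -1.0 x -1.0.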
However, if you don't like asymmetry, or don't like wasting bits, you can use a slight variation on the numbering scheme where you eliminate the most negative value, i.e., your signed-symmetric Q0.2 numbers are -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75 (any -1.0 values in an input data stream are replaced with -0.75 values - a quantization error no worse than the one that happens to positive values). With this numeric representation at the input to your multiplier, you never get the product 1.0, so you can eliminate that bit when you bit-slice to convert your Q1.4 product back into Q0.2 format (or whatever narrower bit-width you plan on using).
In that case, you'd keep the MSB (the sign bit), drop the next bit (the bit needed to represent +1.0, which is now always a copy of the sign bit), keep as many of the following bits as you want, and discard the remaining LSBs.
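Here is the same bit-slice written out for the 3-bit example, again as a Python sketch rather than HDL (the function name to_q0_2 is made up for illustration, and it assumes plain truncation of the LSBs with no rounding).

```python
def to_q0_2(product):
    """Slice a 6-bit Q1.4 product (as a signed int) down to 3-bit Q0.2.

    Assumes both multiplier inputs were in the symmetric range -3..3,
    so bit 4 always equals the sign bit and carries no information.
    """
    bits = product & 0x3F              # 6-bit two's-complement pattern
    sign = (bits >> 5) & 1             # keep the MSB
    frac = (bits >> 2) & 0x3           # keep bits 3..2, drop bit 4 and the 2 LSBs
    sliced = (sign << 2) | frac        # 3-bit result pattern
    return sliced - 8 if sign else sliced   # back to a signed integer

symmetric = range(-3, 4)
for a in symmetric:
    for b in symmetric:
        p = a * b
        # With the LSBs truncated, the slice equals floor(p / 4)
        assert to_q0_2(p) == p >> 2
```

If you let -1.0 through at the input, the slice breaks: to_q0_2(16), i.e., -1.0 x -1.0, comes out as 0 rather than +1.0, which is exactly the case the symmetric representation rules out.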
Depending on your application, you might actually want to round the result before discarding the LSBs ... but that's another discussion.
Cheers,
Dave