Sorry, I didn't notice that your first code also included std_logic_arith. You should avoid it, especially when using both signed and unsigned signals in the same design; use ieee.numeric_std instead.
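For reference, here is a minimal sketch of a context clause that relies only on numeric_std. The small entity `mix_demo` and its ports are hypothetical and not taken from your code. It mixes an unsigned and a signed operand using explicit conversions only:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- standard IEEE package for SIGNED and UNSIGNED
-- Do not also add "use ieee.std_logic_arith.all;": that package declares
-- its own SIGNED and UNSIGNED types, which conflict with numeric_std's.

-- Hypothetical example: add an unsigned and a signed 12-bit value.
entity mix_demo is
  port (
    a   : in  unsigned(11 downto 0);
    b   : in  signed(11 downto 0);
    sum : out signed(12 downto 0)
  );
end entity mix_demo;

architecture rtl of mix_demo is
begin
  -- Prepend '0' so a stays non-negative when reinterpreted as signed,
  -- and sign-extend b to the same 13-bit width before adding.
  sum <= signed('0' & a) + resize(b, 13);
end architecture rtl;
```

With numeric_std, every change between signed and unsigned is written out explicitly, so the intent of each operation stays visible.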
Now that vin is signed, you can replace this line:
elsif vin = 1 or vin = x"FFF" then
by this one:
elsif vin = 1 or vin = -1 then
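That way, you won't have to modify this comparison if you later want to work with sizes other than 12 bits: x"FFF" is -1 only for a 12-bit vector, whereas the integer -1 is interpreted at whatever width vin has.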
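A minimal sketch of a width-independent version, assuming vin is declared with numeric_std's signed type. The entity `unit_detect`, its generic `WIDTH` and the output `is_unit` are hypothetical names chosen for illustration:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: flags whether a signed input equals +1 or -1,
-- for any input width selected through the generic WIDTH.
entity unit_detect is
  generic (
    WIDTH : positive := 12
  );
  port (
    vin     : in  signed(WIDTH - 1 downto 0);
    is_unit : out std_logic
  );
end entity unit_detect;

architecture rtl of unit_detect is
begin
  -- numeric_std overloads "=" for (signed, integer), so the integer
  -- literals are compared at the width of vin: -1 matches all ones.
  is_unit <= '1' when (vin = 1 or vin = -1) else '0';
end architecture rtl;
```

Changing `WIDTH` is then enough to move to another size; the comparison itself never has to be edited.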
Could you also tell us exactly what you mean by "is not correct"?
I also see another problem in your code. You do the whole calculation within one clock cycle, but since you never reinitialize your counters, on every clock cycle they are increased again by the amount you computed, so the results keep accumulating. This is probably not what you intended.
You should either add something that stops the process once the result has been calculated, or reset the counters at the start of each calculation. In any case, with the current code, only the values produced by the first clock cycle are meaningful in your simulation.
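A minimal sketch of the second option. Since the original process isn't shown here, it assumes the counter tallies how many of 16 signed 12-bit vectors equal +1 or -1; the package `vec_pkg`, the type `vec_array` and the ports `start`, `vecs`, `count` and `done` are all hypothetical. Because `cnt` is a variable cleared at the beginning of every calculation, repeated clock cycles give the same result instead of accumulating:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical type holding all 16 signed 12-bit vectors.
package vec_pkg is
  type vec_array is array (0 to 15) of signed(11 downto 0);
end package vec_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.vec_pkg.all;

entity count_all is
  port (
    clk   : in  std_logic;
    start : in  std_logic;
    vecs  : in  vec_array;
    count : out unsigned(4 downto 0);  -- 0 to 16 needs 5 bits
    done  : out std_logic
  );
end entity count_all;

architecture rtl of count_all is
begin
  process (clk)
    variable cnt : unsigned(4 downto 0);
  begin
    if rising_edge(clk) then
      if start = '1' then
        -- Reinitialize the counter before every calculation so that
        -- results from earlier clock cycles are not added again.
        cnt := (others => '0');
        for i in vecs'range loop
          if vecs(i) = 1 or vecs(i) = -1 then
            cnt := cnt + 1;
          end if;
        end loop;
        count <= cnt;
        done  <= '1';
      else
        done <= '0';
      end if;
    end if;
  end process;
end architecture rtl;
```

Note that the loop is unrolled by synthesis, so this still builds 16 comparators and a chain of adders.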
It may also be better to read only one vector per clock cycle instead of all 16 at once. The calculation will take more clock cycles, but the design will synthesize into much smaller hardware, since the comparison and counting logic is built once instead of 16 times.
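A sketch of the one-vector-per-clock approach, under the same hypothetical assumption (counting the +1 and -1 values among 16 signed 12-bit vectors). Here the vectors arrive one per clock cycle on `vin` after a one-cycle `start` pulse. The entity `count_serial` and its ports are illustrative names, not taken from your code:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical serial version: after a start pulse, one vector is read
-- from vin on each of the next 16 clock cycles.
entity count_serial is
  port (
    clk   : in  std_logic;
    start : in  std_logic;
    vin   : in  signed(11 downto 0);
    count : out unsigned(4 downto 0);
    done  : out std_logic
  );
end entity count_serial;

architecture rtl of count_serial is
  signal idx     : integer range 0 to 15 := 0;
  signal cnt     : unsigned(4 downto 0)  := (others => '0');
  signal running : std_logic             := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      done <= '0';
      if start = '1' and running = '0' then
        -- Begin a new calculation: clear the counter and the index.
        idx     <= 0;
        cnt     <= (others => '0');
        running <= '1';
      elsif running = '1' then
        -- A single comparator and adder, reused on every clock cycle.
        if vin = 1 or vin = -1 then
          cnt <= cnt + 1;
        end if;
        if idx = 15 then
          -- Last vector processed: stop and flag the result as valid.
          running <= '0';
          done    <= '1';
        else
          idx <= idx + 1;
        end if;
      end if;
    end if;
  end process;

  count <= cnt;
end architecture rtl;
```

The result is only valid while `done` is high; this also takes care of stopping the calculation once it is finished.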