One last thing about the wait for 'time' statement: it doesn't fit into the FPGA view of things. In general, you want everything moving as fast as possible, so you usually want a decoded value available on the next clock edge, for every clock edge (I'm ignoring multicycle cases). So expect it to always be there, decoded on the next edge. Time shouldn't play much into your coding at first. (There are often higher-level timing issues, like "I need the data through the chip in 1 us, I have a latency of 40 clocks, and the clock is running at 100 MHz, so...", but that's a much higher-level analysis and often not necessary.)
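For what it's worth, that higher-level check is just arithmetic. Here's a minimal sketch in Python using the numbers from the example (1 us budget, 40 clocks of latency, 100 MHz clock). The variable and function names are made up for illustration, not taken from any tool.

```python
# Figures from the example above; all names here are illustrative only.
CLOCK_HZ = 100_000_000   # 100 MHz clock
LATENCY_CLOCKS = 40      # pipeline latency in clock cycles
BUDGET_NS = 1_000        # 1 us requirement, expressed in nanoseconds


def latency_ns(clocks, clock_hz):
    """Convert a latency given in clock cycles to nanoseconds."""
    return clocks * 1e9 / clock_hz


total_ns = latency_ns(LATENCY_CLOCKS, CLOCK_HZ)
meets_budget = total_ns <= BUDGET_NS

print(f"latency = {total_ns:.0f} ns, budget = {BUDGET_NS} ns, "
      f"meets budget: {meets_budget}")
```

With these numbers the latency comes out to 400 ns, comfortably inside the 1 us budget, which is why this kind of analysis usually doesn't need to drive how you write the decode itself.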
Whether to use data_internal versus data_out is purely a preference. I actually like the first case better since I don't like having too many lines that don't do anything, as they can add clutter. The big benefit of what you're doing is that you can use data_internal elsewhere inside the code, while data_out can only drive the port and can't be read back inside the code, since it's mode OUT (VHDL-2008 relaxes this). (You could make it mode BUFFER, which would solve that problem.)
As for how to write up the decode, that's usually a function of where the decode comes from. Maybe it's a look-up table in the data sheet of another part, where it's written as 1's and 0's. Maybe it's the output of a Perl or C script that generates a cosine table (which this clearly is not), which might emit binary but might emit ASCII. Maybe it's Matlab output. But with large or complex decodes, there's usually a 'source' that prescribes the syntax, and with HDL you usually have enough flexibility to use whatever that source outputs.
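To make the "source prescribes the syntax" point concrete, here's a rough sketch of the kind of generator script I mean, written in Python rather than Perl or C. Everything in it (the function names, the 8-entry depth, the 8-bit width, the unsigned offset scaling) is an assumption for illustration only. It builds a small cosine table and emits each entry both as a binary string and as decimal text.

```python
import math


def cosine_table(depth, width):
    """Quantize one period of cosine into `depth` unsigned `width`-bit codes."""
    full_scale = (1 << width) - 1
    table = []
    for i in range(depth):
        sample = math.cos(2 * math.pi * i / depth)           # -1.0 .. 1.0
        table.append(round((sample + 1) / 2 * full_scale))   # 0 .. full_scale
    return table


def as_binary(values, width):
    """Format each code as a fixed-width string of 1's and 0's."""
    return [format(v, f"0{width}b") for v in values]


def as_decimal(values):
    """Format each code as plain decimal text."""
    return [str(v) for v in values]


# Hypothetical table size; a real one would come from your requirements.
table = cosine_table(depth=8, width=8)
for bits, text in zip(as_binary(table, 8), as_decimal(table)):
    print(bits, text)
```

Whichever column the script happens to produce is the one you'd mirror in the HDL, rather than converting it by hand.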