@ vernmid
Thanks for the reply. As far as I know, the latching does take place on the rising edge. That is why I am using a 16 kHz clock for the communication while the process itself is triggered by a 32 kHz clock: this way I can change or sample the data line in the middle of the bit, which should (hopefully) avoid a race condition. During the "write" sequence, I can see both the bits that I transmit and the ACKs from the device, with proper timing.
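To make the timing I am describing concrete, here is a minimal Python sketch of it. It is only a model, not my actual design: the function name `clock_out`, the idle-high starting clock level, and the assumption that the data line is changed on the tick where the clock goes low are all illustrative choices. Each loop iteration stands for one 32 kHz tick, so the 16 kHz clock toggles on every tick.

```python
def clock_out(bits):
    """Model a 32 kHz process driving a 16 kHz serial clock (hypothetical sketch).

    One bit period spans two ticks: the data line is changed on the tick
    where the clock goes low, and the modelled device latches it on the
    tick where the clock goes high.
    """
    trace = []            # (clk, data) after each 32 kHz tick
    latched = []          # what the modelled device samples on rising edges
    clk = 1               # assumption: clock idles high before the transfer
    data = 0
    pending = list(bits)

    for _ in range(2 * len(bits)):
        clk ^= 1          # toggling every tick halves 32 kHz to 16 kHz
        if clk == 0:
            # Falling edge: safe moment to put the next bit on the line.
            data = pending.pop(0)
        else:
            # Rising edge: data has already been stable for one full tick.
            latched.append(data)
        trace.append((clk, data))

    return trace, latched


if __name__ == "__main__":
    trace, latched = clock_out([1, 0, 1, 1])
    for clk, data in trace:
        print(f"clk={clk} data={data}")
    print("latched:", latched)
```

In this model the data line never changes on the same tick as a rising clock edge, which is the property I am relying on to avoid the race.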
The problem is that when I try to read data from the device, it sends something different than what its spec defines (for example, the zeroes that I mentioned earlier), which makes me think that I've done something wrong in the read process.