Hi all,
I managed to implement the protocol. You were right, my problem was with the timing. I used a small trick to implement it with only 2 clock phases per bit.
I divided the main clock by 2 using the falling edge as a trigger, and used the resulting clock for the SCL line.
In another process I implemented the communication on the SDA line; that process uses the main clock divided by 2 with its rising edge as the trigger. This way all of the protocol's timing requirements were met. It came out a little "ugly" in some places, but it works well.
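For anyone reading this later, here is a rough Python model of the clocking scheme described above. It is only a sketch, not the actual hardware description, and it rests on a few assumptions that are not stated in the post: SDA is updated on the rising edge of a second divide-by-2 clock that toggles on rising edges of the main clock, both divided clocks start low, SDA idles high, the receiver samples SDA on the rising edge of SCL, and the timing rule being checked is the I2C-style one that SDA must not change while SCL is high. Start and stop conditions are left out. The names `simulate`, `sample_on_scl_rising` and `sda_stable_while_scl_high` are made up for this illustration.

```python
def simulate(bits):
    """Model the two divided clocks; return one (main, scl, sda) sample per main-clock edge."""
    main = scl = sda_clk = 0   # assumption: all clocks start low
    sda = 1                    # assumption: SDA idles high
    queue = list(bits)
    trace = []
    for _ in range(4 * len(bits)):       # in this model each bit spans 4 main-clock edges
        main ^= 1
        if main == 1:                    # rising edge of the main clock
            sda_clk ^= 1                 # divide-by-2 clock for the SDA process
            if sda_clk == 1 and queue:   # rising edge of that divided clock
                sda = queue.pop(0)       # drive the next bit onto SDA
        else:                            # falling edge of the main clock
            scl ^= 1                     # divide-by-2 clock used as SCL
        trace.append((main, scl, sda))
    return trace


def sample_on_scl_rising(trace):
    """Receiver side: capture SDA on every rising edge of SCL."""
    received = []
    prev_scl = 0
    for _, scl, sda in trace:
        if prev_scl == 0 and scl == 1:
            received.append(sda)
        prev_scl = scl
    return received


def sda_stable_while_scl_high(trace):
    """Return True if SDA never changes between two samples where SCL is high."""
    prev_scl, prev_sda = 0, 1            # initial state before the first edge
    for _, scl, sda in trace:
        if prev_scl == 1 and scl == 1 and sda != prev_sda:
            return False
        prev_scl, prev_sda = scl, sda
    return True


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0]
    trace = simulate(data)
    print(sample_on_scl_rising(trace) == data)
    print(sda_stable_while_scl_high(trace))
```

In this model SDA changes in the middle of each SCL low phase, one main-clock edge after SCL falls and one before it rises again, which is why offsetting the two divided clocks by half a main-clock period leaves margin on both sides.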
Thanks for all the help.