i don't fully understand what your algorithm looks like
i would use a dual port RAM. port 1 writes audio_in sequentially to each address of the RAM, then wraps back to the top. port 2 reads audio_out from the RAM, but its address pointer can jump based on the modulation signal. when the modulation signal is large, your read pointer may go all the way back to the address right "in front of" the write pointer, which holds the oldest sample (maximum delay). when the modulation signal is small, your read pointer may sit right "behind" the write pointer, which holds the newest sample (minimum delay). this is the most basic delay structure
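to make the pointer arithmetic concrete, here is a rough software model of that structure in python. it is only a sketch, not HDL: the names DelayLine, step, depth and delay are made up for illustration, and it assumes the write lands before the read in the same cycle (real dual port RAMs differ on read-during-write behaviour, so check your part).

```python
class DelayLine:
    """Software model of a dual port RAM delay (illustrative names only)."""

    def __init__(self, depth):
        self.depth = depth
        self.ram = [0] * depth   # RAM contents, cleared at start
        self.wr = 0              # write pointer (port 1)

    def step(self, audio_in, delay):
        # Port 1: write the new sample at the write pointer.
        self.ram[self.wr] = audio_in

        # Clamp the requested delay to what the buffer can hold.
        delay = max(0, min(self.depth - 1, delay))

        # Port 2: read `delay` samples behind the write pointer, wrapping.
        # delay = 0 reads the sample just written (assumed write-first);
        # delay = depth - 1 reads the address right "in front of" the
        # write pointer, which holds the oldest sample.
        rd = (self.wr - delay) % self.depth
        audio_out = self.ram[rd]

        # Advance the write pointer and loop back to the top at the end.
        self.wr = (self.wr + 1) % self.depth
        return audio_out
```

in the modulated case you would pass a different delay value on every call, derived from the modulation signal.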
since your read pointer is jumping around, you can get discontinuities in the audio_out waveform, which sound pretty bad (usually clicks or crackle). i'm not sure if this is what you're hearing. you'll want to think about interpolating between neighboring samples so the discontinuities are smaller. a search term would be "interpolated delay"
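one common form of this is linear interpolation between the two samples on either side of a fractional delay. a minimal sketch in python, assuming `wr` is the address of the most recently written sample and `delay` is at most len(ram) - 2 (the function name and arguments are hypothetical, and this is just one interpolation scheme among several):

```python
import math


def read_interpolated(ram, wr, delay):
    """Read `delay` samples behind `wr`; `delay` may be fractional."""
    depth = len(ram)
    whole = math.floor(delay)   # integer part of the delay
    frac = delay - whole        # fractional part, 0.0 <= frac < 1.0

    a = ram[(wr - whole) % depth]       # sample at the integer delay
    b = ram[(wr - whole - 1) % depth]   # one sample older

    # Weighted blend: frac = 0 returns a, frac close to 1 approaches b.
    return (1.0 - frac) * a + frac * b
```

in hardware the same idea is two RAM reads, a multiply by the fractional bits of the delay, and an add, so the modulation signal needs a few fractional bits below the address bits.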
you will also want to verify that your delay buffer is working in the first place. try a 50/50 mix of the input signal and the delayed signal, with a fixed delay of maximum length (read pointer is 1 address ahead of the write pointer)
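a quick way to see what that test should produce is to feed an impulse through a software model. this is a hedged sketch in python with a made-up function name and a tiny depth of 4, assuming the write happens before the read in each cycle:

```python
def mix_with_fixed_delay(samples, depth):
    """50/50 mix of input and a fixed, maximum-length delayed copy."""
    ram = [0.0] * depth
    wr = 0
    out = []
    for x in samples:
        ram[wr] = x                 # write port
        rd = (wr + 1) % depth       # 1 address ahead of the write pointer
        delayed = ram[rd]           # oldest sample, depth - 1 samples old
        out.append(0.5 * x + 0.5 * delayed)
        wr = (wr + 1) % depth       # advance and wrap
    return out


# A single impulse should come out twice: once directly and once
# depth - 1 samples later, both at half amplitude.
impulse = [1.0] + [0.0] * 9
result = mix_with_fixed_delay(impulse, depth=4)
```

if the second copy shows up at the wrong position, or not at all, the problem is in the buffer addressing rather than in the modulation.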