You have essentially answered your own question: you need to compute value1 before computing value2, and so on (chained computations). Your adders/multipliers (let us call them the computation engine) can be shared if the engine can run faster than your data rate.
Having 256 x 32-bit adders is quite large but possible. You can go for any TDM figure (TDM = time-division multiplexing, i.e. time sharing of a resource). If your computation engine can run at least twice as fast as the data, you can use a TDM of 2, i.e. share one resource between every two data items (128 adders needed), and so on. At the extreme, a TDM of 256 needs only one computation engine. So it all depends on your speed environment.
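To make the trade-off concrete, here is a rough sketch of the arithmetic in Python. The function names and the 1 MHz data rate are made up for illustration. It assumes each engine completes one operation per clock cycle, and it ignores any extra latency from the chained dependency between values.

```python
def engines_needed(num_items: int, tdm: int) -> int:
    """Engines required when each engine is time-shared by `tdm` data items."""
    if tdm < 1:
        raise ValueError("tdm must be >= 1")
    # Ceiling division: a partly filled group still needs its own engine.
    return (num_items + tdm - 1) // tdm


def required_engine_clock_mhz(data_rate_mhz: float, tdm: int) -> float:
    """Minimum engine clock, assuming one operation per clock cycle."""
    return data_rate_mhz * tdm


if __name__ == "__main__":
    NUM_ITEMS = 256        # number of adders from the question
    DATA_RATE_MHZ = 1.0    # hypothetical data rate, replace with your own

    for tdm in (1, 2, 4, 256):
        engines = engines_needed(NUM_ITEMS, tdm)
        clock = required_engine_clock_mhz(DATA_RATE_MHZ, tdm)
        print(f"TDM {tdm}: {engines} engine(s), engine clock >= {clock} MHz")
```

A higher TDM figure trades area for clock speed, so the largest TDM you can use is set by the fastest clock your device can close timing at.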