Given a cosine input in the range 0 to 4095, representing +1 down to -1,
you can use one of the following options:
1) Use a full LUT. Simple, but it needs 4096 * bitwidth bits of memory, as you know. Use the cosine input directly as the address.
2) Use a half LUT plus logic, since the upper half is anti-symmetric with the lower half about the midpoint, e.g.
if address > 2047
  address = 4095 - address
  data = pi - lut(address)
else
  data = lut(address)
end
Note that pi must be scaled to the same units as the LUT output.
3) Use a smaller LUT, e.g. 256 locations, then add logic to linearly interpolate the values in between. Practical, intuitive, and a good approximation.
4) Explore the possibility of using a CORDIC approach.
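
To make options 2 and 3 concrete, here is a minimal Python model of both, meant as a sketch for checking the arithmetic before writing HDL rather than as a definitive implementation. Everything named here is an assumption for illustration: the 16-bit output scaling (PI_SCALED = 65535 standing for pi), the 4 fraction bits, the extra 257th table entry clamped to pi, and all function and table names.

```python
import math

ADDR_BITS = 12
ADDR_MAX = (1 << ADDR_BITS) - 1    # 4095
HALF = 1 << (ADDR_BITS - 1)        # 2048 entries in the half LUT
PI_SCALED = 65535                  # assumption: pi maps to 16-bit full scale


def acos_scaled(a):
    """Float reference: address a (0..4095 <-> cos +1..-1) to scaled angle."""
    x = 1.0 - 2.0 * a / ADDR_MAX
    x = max(-1.0, min(1.0, x))     # clamp so a = 4096 (table endpoint) is legal
    return math.acos(x) / math.pi * PI_SCALED


# Option 1: full table, one entry per address.
FULL_LUT = [round(acos_scaled(a)) for a in range(ADDR_MAX + 1)]

# Option 2: keep only the lower half and fold the upper half onto it.
HALF_LUT = FULL_LUT[:HALF]


def acos_half_lut(a):
    if a > HALF - 1:                               # address > 2047
        return PI_SCALED - HALF_LUT[ADDR_MAX - a]  # pi - lut(4095 - address)
    return HALF_LUT[a]


# Option 3: 256 segments; top 8 address bits pick the segment,
# low 4 bits are the interpolation fraction. 257 entries so that
# the last segment has a right-hand endpoint.
FRAC_BITS = 4
COARSE_LUT = [round(acos_scaled(i << FRAC_BITS)) for i in range(257)]


def acos_interp(a):
    i = a >> FRAC_BITS
    frac = a & ((1 << FRAC_BITS) - 1)
    y0, y1 = COARSE_LUT[i], COARSE_LUT[i + 1]
    # Multiply then shift, as a hardware multiplier plus truncation would do.
    return y0 + (((y1 - y0) * frac) >> FRAC_BITS)
```

In this model the folded half LUT agrees with the full LUT to within one LSB (any difference comes from rounding). The interpolated version is close over the middle of the range, but the error grows sharply near both ends, where the slope of the inverse cosine becomes very steep, so it is worth sweeping all 4096 addresses against the reference before settling on the table size.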