Hello friends. I'm trying to implement floating-point addition and subtraction. My guide is the book "Computer Arithmetic and Verilog HDL Fundamentals" by Cavanagh. Inside the module he uses the following code for aligning exponents:

always @ (oper_1 or oper_2)
begin
    exp_a = oper_1 [31:24];
    exp_b = oper_2 [31:24];
    fract_a = oper_1 [23:0];
    fract_b = oper_2 [23:0];

    // bias exponents
    exp_a_bias = exp_a + 8'b0111_1111;
    exp_b_bias = exp_b + 8'b0111_1111;

    // align fractions
    if (exp_a_bias < exp_b_bias)
        ctrl_align = exp_b_bias - exp_a_bias;
    while (ctrl_align)
    begin
        fract_a = fract_a >> 1;
        exp_a_bias = exp_a_bias + 1;
        ctrl_align = ctrl_align - 1;
    end

    if (exp_b_bias < exp_a_bias)
        ctrl_align = exp_a_bias - exp_b_bias;
    while (ctrl_align)
    begin
        fract_b = fract_b >> 1;
        exp_b_bias = exp_b_bias + 1;
        ctrl_align = ctrl_align - 1;
    end

Quartus II gives me the following error:

Error (10119): Verilog HDL Loop Statement error at ADD_SUB_FLO.v(40): loop with non-constant loop condition must terminate within 250 iterations

I searched, and it seems that since Quartus can't be sure what the resulting value of "ctrl_align" will be, it won't synthesize. The Quartus site says I can edit the .qsf file, but I couldn't find any line like this in my file:

set_global_assignment -name VERILOG_NON_CONSTANT_LOOP_LIMIT 300

Moreover, I'm somewhat surprised that the book teaches that step with wrong, non-synthesizable code. I also want to actually fix the problem rather than work around it. What can I do?

You realise there is a big difference between HDL and procedural languages right? Loops make little sense in hardware - it doesn't run as a loop, the synthesizer breaks it all up into hardware - so if you did a loop that added 1 four times, it would probably make a chain of four adders each adding 1. In your case, how does the synthesizer know when to stop? If ctrl_align was say 10, it would require 10 copies of the hardware. If it was 20, it would require 20 copies of the hardware. So how does it know how many to put?
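To make the "copies of the hardware" point concrete, here is a minimal sketch of a loop the synthesizer can handle, because its bound is a constant. The module and port names (`align_bounded_loop`, `fract_in`, `shift_amt`, `fract_out`) are made up for illustration and are not from the book. The loop always unrolls into 24 conditional one-bit shift stages, however large `shift_amt` is at run time.

```verilog
// Hypothetical example: shift a 24-bit fraction right by a run-time amount
// using a loop whose bound is a compile-time constant.
module align_bounded_loop (
    input  wire [23:0] fract_in,
    input  wire [7:0]  shift_amt,   // e.g. the exponent difference
    output reg  [23:0] fract_out
);
    integer i;

    always @* begin
        fract_out = fract_in;
        // Constant bound of 24: the synthesizer unrolls this into exactly
        // 24 stages, each one a conditional 1-bit right shift.
        for (i = 0; i < 24; i = i + 1) begin
            if (i < shift_amt)
                fract_out = fract_out >> 1;
        end
    end
endmodule
```

With a 24-bit fraction, any shift of 24 or more leaves zero, so 24 stages cover every possible value of `shift_amt`.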

--- Quote Start ---
You realise there is a big difference between HDL and procedural languages right? Loops make little sense in hardware - it doesn't run as a loop, the synthesizer breaks it all up into hardware - so if you did a loop that added 1 four times, it would probably make a chain of four adders each adding 1. In your case, how does the synthesizer know when to stop? If ctrl_align was say 10, it would require 10 copies of the hardware. If it was 20, it would require 20 copies of the hardware. So how does it know how many to put?
--- Quote End ---

Yes sir, I have been reading more and I understand what you say. Still, I'd like to point out that this code is from the book itself. I've only recently started learning Verilog, and the literature is very confusing and contradictory for a beginner. I was trying to implement a floating-point operation, and Cavanagh's book takes this behavioral path. So I wonder: if it makes no sense to implement such operations this way, why write a whole book around this approach when it should really be implemented differently? This is not a general book; it is about arithmetic in Verilog. My first impression was that it would be OK to use this model and that there was no better way. What approach would you personally suggest, considering this as a hypothetical real-life implementation?

One more thing: it seems possible to set a different limit in the .qsf file. Quartus complains that 250 is the maximum, and my ctrl_align is 8 bits, which allows up to 255 iterations, so do I just need to change that setting in the file?

The code from the book looks like it was written for use in a simulator. This code clearly cannot be synthesized. Are you studying how to develop hardware for doing floating point? Unless that is the case, using floating point in an FPGA is not the best solution. If your goal is to do a parallel calculation, you should convert your problem domain to integer or fixed point. Nearly every computation using physical-world data can be done with integers or fixed point. Study numerical methods to see how this is done. Once you've converted your problem space, you can start your FPGA work.

Just because you can write it in HDL doesn't mean it makes sense in hardware. HDL is also meant for writing behavioral code to aid simulation, code that is not for use in a real design.
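For the alignment step in the original post, one possible hardware-friendly rewrite is to replace each `while` loop with a single shift by `ctrl_align`, which a synthesizer would typically build as a barrel shifter. This is only a sketch under assumptions: the operand layout (exponent in bits 31:24, fraction in bits 23:0) and the bias constant are copied from the posted code, while the module name `align_exponents` and the choice of ports are mine, not Cavanagh's.

```verilog
// Hypothetical loop-free version of the posted alignment code.
module align_exponents (
    input  wire [31:0] oper_1,
    input  wire [31:0] oper_2,
    output reg  [23:0] fract_a,
    output reg  [23:0] fract_b,
    output reg  [7:0]  exp_a_bias,
    output reg  [7:0]  exp_b_bias
);
    reg [7:0] ctrl_align;

    always @* begin
        fract_a = oper_1[23:0];
        fract_b = oper_2[23:0];

        // bias exponents (same 8-bit arithmetic as the posted code)
        exp_a_bias = oper_1[31:24] + 8'b0111_1111;
        exp_b_bias = oper_2[31:24] + 8'b0111_1111;

        // default assignment so no latch is inferred
        ctrl_align = 8'd0;

        // align fractions: shift the operand with the smaller exponent
        // right by the exponent difference, in one step
        if (exp_a_bias < exp_b_bias) begin
            ctrl_align = exp_b_bias - exp_a_bias;
            fract_a    = fract_a >> ctrl_align;
            exp_a_bias = exp_b_bias;
        end else if (exp_b_bias < exp_a_bias) begin
            ctrl_align = exp_a_bias - exp_b_bias;
            fract_b    = fract_b >> ctrl_align;
            exp_b_bias = exp_a_bias;
        end
    end
endmodule
```

Shifting by `ctrl_align` and copying the larger exponent gives the same end state as shifting one bit at a time while incrementing the smaller exponent, but the amount of hardware no longer depends on a run-time value.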

I want to ask here: is the while keyword part of the synthesizable subset? Is it possible to use a generic in this code, or even lpm_constant? Thanks

Forum Discussion

Altera_Forum

Honored Contributor

10 years ago

Quartus complaint: My LOOP is too big? But the code is from the book, what's wrong?


15 Replies

Altera_Forum
Honored Contributor
10 years ago
--- Quote Start ---
I think in this case it is mandatory to use floating point. I need iterative divisions, and divisions will produce fractional numbers of different sizes, for example 2.9 - 2.1899. It is necessary to align exponents before performing any operation, so the exponents must vary according to the difference in input size, and this is exactly what floating point is about.
--- Quote End ---

In that case - stick with the Altera floating point IP cores rather than trying to write your own floating point arithmetic.
This is, of course, if there is no chance you can redesign the algorithm to fit an FPGA. If you want lots of floating point (and especially division - which is particularly expensive in an FPGA), why not use a DSP?
Altera_Forum
Honored Contributor
10 years ago
--- Quote Start ---
In that case - stick with the Altera floating point IP cores rather than trying to write your own floating point arithmetic.
This is, of course, if there is no chance you can redesign the algorithm to fit an FPGA. If you want lots of floating point (and especially division - which is particularly expensive in an FPGA), why not use a DSP?
--- Quote End ---

Tricky, sorry for the question: is the DSP you mentioned an accessory chip on the development board, or is it the NIOS-II?

In theory, I would need about 57,600 processing elements (240 x 240 pixels from a black and white picture). Each processing element calculates an equation like this:

http://www.alteraforum.com/forum/attachment.php?attachmentid=11768&stc=1

where A is a positive integer from 0 to 255 and 1 pixel = 1 processing element. (B, X and Y start as random numbers and are updated by adding the results of the other pixels' PEs, but that's a different story.) I expected to implement this in hardware somehow.

In fact, since this is a school project, the requirements are pretty flexible, so I have the freedom to use any technology (but I will have to explain my reasons later).
Altera_Forum
Honored Contributor
10 years ago
A DSP chip is a processor designed for digital signal processing. You can get ones that work in fixed or floating point: https://en.wikipedia.org/wiki/digital_signal_processor

If you try to make 57600 PEs in a single FPGA, you're doomed to failure. For a start, your image (240x240) will not arrive in parallel. You will get the data as a streaming set of pixels. Then you need to calculate the results in series.
Also, the algorithm looks about ready for some re-design, and I don't understand why it cannot be fixed point. If you know the width of the inputs, then you can work out the max width of U.
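As a rough illustration of the fixed-point idea, here is a minimal sketch of a subtraction in an assumed unsigned Q8.16 format (8 integer bits, 16 fractional bits). The format, the module name `fixed_sub_q8_16` and the port names are my own choices for the example and have nothing to do with the actual equation in the attachment. Because both operands share the same binary point, no exponent alignment is needed, and the result width follows directly from the input widths.

```verilog
// Hypothetical Q8.16 subtractor: value = raw_bits / 65536.
// For example, 2.9 is stored as round(2.9 * 65536) = 190054 and
// 2.1899 as round(2.1899 * 65536) = 143517, so the raw difference is
// 46537, which represents roughly 0.7101.
module fixed_sub_q8_16 (
    input  wire [23:0] a,      // unsigned Q8.16
    input  wire [23:0] b,      // unsigned Q8.16
    output wire [24:0] diff    // two's-complement result, one extra bit for the sign
);
    // Zero-extend both operands by one bit so a negative result
    // (b > a) is representable without overflow.
    assign diff = {1'b0, a} - {1'b0, b};
endmodule
```

The same reasoning extends to other operations: an addition grows the result by one bit, and a multiplication of two 24-bit values needs 48 bits, so the maximum width can be worked out on paper before writing any HDL.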
Altera_Forum
Honored Contributor
10 years ago
--- Quote Start ---
A DSP chip is a processor designed for digital signal processing. You can get ones that work in fixed or floating point: https://en.wikipedia.org/wiki/digital_signal_processor

If you try to make 57600 PEs in a single FPGA, you're doomed to failure. For a start, your image (240x240) will not arrive in parallel. You will get the data as a streaming set of pixels. Then you need to calculate the results in series.
Also, the algorithm looks about ready for some re-design, and I don't understand why it cannot be fixed point. If you know the width of the inputs, then you can work out the max width of U.
--- Quote End ---

Tricky, thanks as always for your support. 57,600 PEs might be an exaggeration (I could actually implement only 1,000 pixels for the sake of simplicity), but the way the data reaches each PE is not actually relevant (according to the specs of my project). The processing must start once each piece of data is in "position". On the other hand, I'm thinking of reformulating my project to use the resources of my DE1-SoC board. Do you think I could use the NIOS-II and C or assembly language to compute the equation above on (let's say) 1,000 independent, parallel cores (in a way that makes sense)?
Altera_Forum
Honored Contributor
10 years ago
I don't think that you need a DSP or other processor to perform the calculation. You can also write HDL code that processes data sequentially, using block RAM to hold the array elements. You have to set up a state machine that controls the calculation flow.

I presume that the parallel cores idea doesn't fit any FPGA hardware available to you.
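A minimal sketch of what the sequential approach might look like: one processing element, a block RAM holding the pixels, and a small state machine stepping through the addresses. Everything here is an assumption for illustration. The module name `pixel_sequencer`, the ports, and the state names are invented, and the per-pixel operation is just a running sum standing in for the real equation, which is not reproduced in this thread.

```verilog
// Hypothetical sequential pixel processor: one PE, one RAM, one FSM.
module pixel_sequencer #(
    parameter N_PIXELS = 57600,          // 240 x 240
    parameter ADDR_W   = 16              // enough bits for N_PIXELS - 1
)(
    input  wire              clk,
    input  wire              rst,
    input  wire              start,
    input  wire              wr_en,      // port for loading the image
    input  wire [ADDR_W-1:0] wr_addr,
    input  wire [7:0]        wr_data,
    output reg  [23:0]       acc,        // placeholder result (sum of pixels)
    output reg               done        // one-cycle pulse when finished
);
    localparam IDLE = 2'd0, READ = 2'd1, CALC = 2'd2, FINISH = 2'd3;

    reg [7:0]        pixel_ram [0:N_PIXELS-1];  // intended to infer block RAM
    reg [7:0]        pixel_q;
    reg [ADDR_W-1:0] addr;
    reg [1:0]        state;

    // Synchronous write and registered read, one cycle of read latency.
    always @(posedge clk) begin
        if (wr_en)
            pixel_ram[wr_addr] <= wr_data;
        pixel_q <= pixel_ram[addr];
    end

    always @(posedge clk) begin
        if (rst) begin
            state <= IDLE;
            addr  <= 0;
            acc   <= 0;
            done  <= 1'b0;
        end else begin
            case (state)
                IDLE: begin
                    done <= 1'b0;
                    if (start) begin
                        addr  <= 0;
                        acc   <= 0;
                        state <= READ;
                    end
                end
                READ: state <= CALC;      // wait for the RAM read to land in pixel_q
                CALC: begin
                    acc <= acc + pixel_q; // placeholder for the real per-pixel equation
                    if (addr == N_PIXELS - 1)
                        state <= FINISH;
                    else begin
                        addr  <= addr + 1'b1;
                        state <= READ;
                    end
                end
                FINISH: begin
                    done  <= 1'b1;
                    state <= IDLE;
                end
            endcase
        end
    end
endmodule
```

This takes two clock cycles per pixel. A pipelined version could reach one pixel per cycle, and a handful of such units working on separate RAM banks would give some parallelism without needing thousands of cores.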
