Hi dear Altera friends. Sorry for the simple questions, I'm a beginner. My FPGA is a DE1-SoC University board, and I'm learning how to use it. My question today is the following: so far, with the help of Altera's student material, I have learned how to implement simple circuits, an ALU and registers. Now, is it possible to implement complicated equations in hardware? Like 1 divided by a big number, or square roots; I mean, working with real numbers like 0.0031416 and so on. I know how to do this in both C and assembler, but I don't know whether it is possible in an FPGA with VHDL. A senior friend told me this is not possible in VHDL; he said I have to use the ARM chip in my FPGA and program it in C, since an FPGA and VHDL alone are not able to perform such calculations. What can you say about this? Example: http://www.alteraforum.com/forum/attachment.php?attachmentid=11555&stc=1

Your friend doesn't know much. Altera provides floating-point cores to do floating-point arithmetic: https://www.altera.com/content/dam/altera-www/global/en_us/pdfs/literature/ug/ug_altfp_mfug.pdf While you wouldn't be writing much VHDL, you would still be designing a circuit; the VHDL would just glue the cores together. But floating point in an FPGA does have its issues: it has high resource usage and high latency. If you have a constant data stream, then the FPGA can process far more than a processor could handle (it can do it in real time - you basically build a custom co-processor). But if it's just a few calculations, then it's probably easier to let an ARM do the work. So the answer is "Yes, it can do it". But you will need to think seriously about how to implement it.

Tricky is right, you can. But your equation can be optimized further for parallel computation and should be rewritten, since you compute powers. It depends on how the data stream arrives. You either have to use floating point, because you need the exponent and the logarithm, or you can build your own versions for an integer or fixed-point data type.

y**x = 2**(x*log2 y) if y > 0. Avoid division and use a rational approach: 1/x = x**(-1) = 2**(-log2 x), x > 0. You can rewrite your equation entirely in terms of the 2**x and log2 x functions. If you need to compute a sum, the accuracy problem still exists: remember that the result depends on the order when you add unsorted floating-point numbers. Does that summation take place in the FPGA?
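A minimal Python sketch of the identities above (a software model only, not FPGA code; the function names are made up for illustration). It also shows why the order of a floating-point sum matters:

```python
import math

def power_via_log2(y, x):
    """Compute y**x as 2**(x * log2(y)); only valid for y > 0."""
    if y <= 0:
        raise ValueError("y must be positive")
    return 2.0 ** (x * math.log2(y))

def reciprocal_via_log2(x):
    """Compute 1/x as 2**(-log2(x)); only valid for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    return 2.0 ** (-math.log2(x))

def naive_sum(values):
    """Add the values left to right, the way a simple accumulator would."""
    total = 0.0
    for v in values:
        total += v
    return total

# Same three numbers, two different orders: the small term is lost
# when it is added to the huge one first.
lost = naive_sum([1e16, 1.0, -1e16])
kept = naive_sum([1e16, -1e16, 1.0])
```

In a hardware version the 2**x and log2 x blocks would typically be lookup tables or vendor cores; that part is outside this sketch.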

When the proper units are chosen, very few calculations in the physical world cannot be done with 64-bit (or wider) integers. For a long time DSPs were strictly integer, and many techniques developed then also apply to FPGAs. Look into these techniques and take another look at how your problem might be solved with them. I don't recommend using the floating-point FPGA blocks that are available. They waste FPGA resources and aren't needed after some careful analysis of the problem. The same goes for floating point in OpenCL and Vivado HLS.
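As a rough illustration of the scaled-integer idea (a Python model, not HDL; the 24 fractional bits and the helper names are assumptions picked for this example), a value such as 0.0031416 is stored as an integer count of 2**-24 steps, and every operation is plain integer arithmetic:

```python
FRAC_BITS = 24            # assumed number of fractional bits
SCALE = 1 << FRAC_BITS    # one unit of the integer equals 2**-24

def to_fixed(value):
    """Convert a real number to its scaled-integer representation."""
    return round(value * SCALE)

def to_float(raw):
    """Convert a scaled integer back to a real number (for inspection only)."""
    return raw / SCALE

def fixed_mul(a, b):
    """Multiply two scaled integers; the product carries 2*FRAC_BITS, so shift back."""
    return (a * b) >> FRAC_BITS

def fixed_div(a, b):
    """Divide two scaled integers; pre-shift the dividend to keep FRAC_BITS of result."""
    return (a << FRAC_BITS) // b

# Addition and subtraction need no helper: they are ordinary integer + and -.
small = to_fixed(0.0031416)
one_over_1000 = fixed_div(to_fixed(1.0), to_fixed(1000.0))
```

The choice of word length and fractional bits is the "careful analysis" part: it depends on the range and precision the problem actually needs.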

Oh. If Galfonz recommends using DSP techniques, I suggest you look for documentation on FFT-based transforms for arithmetic on big integers. It is not easy to find; even a web search does not give a quick answer.

Me and my simple questions: Can I do this in FPGA?

37 Replies

Altera_Forum
Honored Contributor
10 years ago
I read that there is a conversation about number manipulation. What are you trying to design? Digital image processing, or something else?
Altera_Forum
Honored Contributor
10 years ago
--- Quote Start ---
I wanted to ask you guys one more thing: with the Nios II soft core, would it be easier to use this CPU for my purpose by programming custom logic? In that case I'd use the C language, right? So in the end it would be easier?

rromano001, yes, I will start by programming as you said. Regarding your questions:

But now, how large is the image:

There is no actual image. I want to use a list of numbers; every (binary) number indicates a pixel intensity. I'm reducing everything to clustering a list of numbers, for example 1000 numbers, to be grouped into 3 clusters by calculating their membership values and centroids.
--- Quote End ---

Hi, sorry, but we are speaking different languages. We cannot help if you don't tell us what you want to learn or build. Please understand that we don't have a crystal ball to see what is in your mind, and your project seems to be so secret.

So is this a linear image, for example from a linear CCD, or a two-dimensional raster image?
What is your cluster? Or better, how does it relate to the numbers in the image memory?
Why do the two indices on uik, the target of the computation, also appear as internal indices of the coefficients?
Are you proficient in mathematical terms? Then express it in a usable form; this equation without any detail just wastes the resources, bandwidth and time of all of us.

--- Quote Start ---

This is managed by loading only one number into a PE to do the calculation of the equation above. It will produce the first Uki and Vk (centroid); by communicating with the other cloned PEs it will update Vk and calculate Uki again, and so on.
--- Quote End ---
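The equation itself is only in the attachment, so the following is a guess: the description (membership values Uki, centroids Vk, repeated updates) reads like the standard fuzzy c-means iteration. Assuming that, and assuming a fuzzifier m = 2, a plain Python reference model on a list of numbers could look like this (all names are hypothetical, and this is not the poster's PE design):

```python
def fuzzy_c_means_1d(xs, centroids, m=2.0, iterations=20, eps=1e-12):
    """Alternate membership (u) and centroid (v) updates on a list of numbers."""
    v = list(centroids)
    exponent = 2.0 / (m - 1.0)
    u = []
    for _ in range(iterations):
        # u[k][i]: membership of sample i in cluster k.
        u = []
        for vk in v:
            row = []
            for x in xs:
                dk = abs(x - vk) + eps  # eps avoids division by zero
                total = sum((dk / (abs(x - vl) + eps)) ** exponent for vl in v)
                row.append(1.0 / total)
            u.append(row)
        # v[k]: mean of the samples weighted by u**m.
        v = [
            sum((uki ** m) * x for uki, x in zip(row, xs))
            / sum(uki ** m for uki in row)
            for row in u
        ]
    return u, v

# Hypothetical test data: three obvious groups of "pixel intensities".
samples = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
memberships, centers = fuzzy_c_means_1d(samples, [1.0, 4.0, 9.0])
```

A model like this is mainly useful as a golden reference: the hardware version only needs subtraction, multiplication, division and accumulation, and its outputs can be compared against the model's.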

As before, this doesn't tell us where the input memory array and the output memory array reside, so again we don't know how the numbers get into the PE [aka Processing Element], or whether some of them need buffering to prevent them from being overwritten when both arrays share the same memory area. It makes no difference whether this runs on a hardwired processor, in an FPGA or in discrete logic: the memory channel caveats are always the same!
Assuming the image size you finally declared, 1 Kpx, this small amount of memory can simply be allocated in two separate fast internal block RAMs (M10K on the DE1-SoC's Cyclone V; M9K on older Cyclone devices), which avoids caching and SDRAM arbitration logic...
If you want help, please start communicating.
The equation and indices by themselves say nothing, and a system with too many unknowns remains unsolvable by the rules of mathematics.

From what you wrote I infer that the first pixel generates the others??? That is again impossible: at least 4 different numbers appear in the equation...

--- Quote Start ---

So, the difficult part here is to make the PE perform subtraction, addition, multiplication and division of fractional numbers.
--- Quote End ---

This is quite simple; not trivial, but not a problem at all.
Remember that this was once done mechanically, so why do you keep asserting that it is not feasible in modern, fast FPGA logic???
See this piece of history to learn how such machines were built:

https://en.wikipedia.org/wiki/z1_%28computer%29

So everything that was feasible on old machinery is no longer feasible now?
Boolean algebra and number theory go back a long, long way.

--- Quote Start ---

And how do uki, xi, vk and vl interact with each other? As in the equation.
--- Quote End ---

So you keep disregarding my question: how are the indices related to the input and output memory? Are they separate memories [different arrays, in terms of C or other programming languages] or the same string of memory cells?

--- Quote Start ---

And where do the inputs come from, and where do the outputs go?
I think I will use the memory of the development board to load a table with the 1000 numbers to be distributed to each PE; the output goes to the neighbor PE for the update, and when the clustering is finished, it will store the results in memory, I guess.
--- Quote End ---

Does PE stand for Processing Element or for Px, pixel element?
Does neighbour mean the first element, so that a PE, after computing one "cluster" of three elements, stores the result back to the first element? This needs some form of parallel-addressable FIFO with a depth of at least the number of processing elements plus two, to keep the data moving in parallel.

And now the caveats of your system:
The memory is shared, so there is one access to read and one to write. The first processing pass has a latency of at least the number of processing-element reads; after processing ends, the results have to be written back, and this again burdens the memory channel...
This needs a plan to read and write memory in bursts and fill the cache.

Given that a dual-core ARM is on board, you can prepare two tasks, one working on the first "cluster" and the second working on the third cluster, so:
you need to read as many memory cells as there are processing elements and store them in a buffer; the buffer has to hold the number of processing elements plus two cells (every PE needs 3 elements, from what you wrote):

set write_index to 0
set read_index to 0
repeat 2 times:
{   .comment this cannot be done in parallel because of RAM access
    shift cell_buffer right by one cell                     .comment can run in parallel with the store operation
    read array[read_index] and store it in the last cell    .comment can run in parallel with the shift operation
    increment read_index                                    .comment can be parallelized with great care
}
.comment the element index is now at PEn
loop:
    repeat 2 times:
    {   .comment again, this cannot be done in parallel because of RAM access
        shift cell_buffer right by one cell                     .comment can run in parallel with the store operation
        read array[read_index] and store it in the last cell    .comment can run in parallel with the shift operation
        increment read_index                                    .comment can be parallelized with great care
    }
    .comment all PEn+2 elements are now loaded
    pass cell_buffer to task1 and task2 in parallel
    store result 1 to memory[write_index]
    increment write_index    .comment can run in parallel with the store operation
    store result 2 to memory[write_index]
    increment write_index    .comment can run in parallel with the store operation
    if the last element has not been reached then
        continue to loop
    else
        done
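One possible reading of the pseudocode above, as a sequential Python model (the function and parameter names are assumptions, and `pe_func` stands in for whatever the PE really computes). It keeps a buffer of "number of PEs plus two" cells, shifts in one new cell per PE on each pass, and hands every PE its three neighbouring cells:

```python
from collections import deque

def process_stream(samples, pe_func, num_pe=2):
    """Model of the shift-buffer loop: each PE sees 3 neighbouring cells."""
    buf = deque(maxlen=num_pe + 2)  # appending to a full deque drops the oldest cell
    results = []                    # stands in for memory[write_index]
    read_index = 0

    # Preload two cells before the main loop starts.
    while read_index < 2 and read_index < len(samples):
        buf.append(samples[read_index])
        read_index += 1

    # Each pass shifts in num_pe new cells, then runs all PEs on the buffer.
    # Leftover samples that do not fill a whole pass are ignored in this sketch.
    while read_index + num_pe <= len(samples):
        for _ in range(num_pe):
            buf.append(samples[read_index])
            read_index += 1
        cells = list(buf)
        for pe in range(num_pe):    # in hardware these would run in parallel
            results.append(pe_func(cells[pe:pe + 3]))
    return results

# Placeholder PE operation: just add the three cells.
window_sums = process_stream([1, 2, 3, 4, 5, 6], sum)
```

The model runs the PEs one after another; only the data movement is meant to match the pseudocode, not the timing.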

From here we can plan two ways of reducing processor starvation due to memory channel congestion...
Starvation was a great concern on Cray computers, but it can still plague current systems, even on modern devices with fast communication and RAM access.
Starvation affects new parallel systems too, and new techniques can be explored; the old batch-processing mode of doing one thing at a time STILL leaves a processor cluster starving a lot.
Read more memory elements from the array using DMA during computation (this works when no other main memory access is required; you have two memories, so you can separate the FPGA from the ARM and do it in super-parallel fashion).

Evaluate the point at which reading ahead reduces performance through bandwidth saturation...
Again, I assume yours is an exercise, and 1K units is so small that it needs no optimization except in special cases...
The cell buffer can be a special memory with parallel shift logic, and possibly a pipeline that feeds new data and shifts by the number of PE stages...
Everything can be built, but remember:
Only problems we can solve by manual computation can be solved by an automaton.
We can apply tricks and clever logic that we have learnt and think of as new, but this cannot help solve the unsolvable.
Communication is the first skill.
You can appear clever at first, or just communicate that you have no intention of doing it.
Happy new year.
Altera_Forum
Honored Contributor
10 years ago
--- Quote Start ---
The fixed point package is part of the VHDL 2008 language spec. But Quartus does not fully support 2008 yet - but David Bishop wrote a '93 compatible version of the fixed_pkg that compiles well with Quartus (at least it worked just fine about 6 years ago and I don't see why it would stop working now - I inferred RAMs and multipliers with it just fine). You can download it from here: http://www.vhdl.org/fphdl/
This package doesn't really do anything other than integer arithmetic - it just holds the numbers in an easier to understand (and modify) format. There is nothing you can do with this package that you cannot do with integers (but that takes a little more careful thought). The logic created is identical (as fixed point is simply integer arithmetic with an offset).

--- Quote End ---

Tricky, regarding the VHDL 2008 library: is it confirmed that it is not supported by Quartus Prime? I googled it, and it says Quartus Prime has VHDL-2008 support. Would you clarify? Do I have to follow a specific procedure to load the package?

Sorry for my poor English, I'm new to both English and VHDL :P
Altera_Forum
Honored Contributor
10 years ago
Sorry for the misunderstanding, I'm new to VHDL and English is not my first language. Right now I'm only focused on writing something in VHDL that performs the equation (I call it a PE, Processing Element). I'm starting with the addition of fixed-point binary numbers.

Regarding the VHDL-2008 library, is it confirmed that Quartus can't support it? I read that Quartus Prime has support for VHDL-2008.
Altera_Forum
Honored Contributor
10 years ago
Quartus has had "support" for VHDL 2008 for about 6 years. But it only supports specific features. It doesn't support the fixed point package that is part of the VHDL 2008 spec.
If you want to use the fixed point library, you need to include it in your project as if it were another design file, and use the '93 version from the website www.vhdl.org/fphdl

This version is almost identical to the 2008 version, except that it doesn't support package generics.
Altera_Forum
Honored Contributor
10 years ago
--- Quote Start ---
Quartus has had "support" for VHDL 2008 for about 6 years. But it only supports specific features. It doesn't support the fixed point package that is part of the VHDL 2008 spec.
If you want to use the fixed point library, you need to include it in your project as if it were another design file, and use the '93 version from the website www.vhdl.org/fphdl

This version is almost identical to the 2008 version, except that it doesn't support package generics.
--- Quote End ---

Indeed, I have made an "Adder" using the fixed point package, and I have also created another file for a testbench in which I load a pair of random numbers to be added. Would you please tell me how I can see the result of that addition? Is there some kind of waveform screen?
Altera_Forum
Honored Contributor
10 years ago
# ** Error: C:/Users/..../Downloads/altera/fixed_pkg_c.vhdl(22): Library ieee_proposed not found.
# ** Error: C:/Users/..../Downloads/altera/fixed_pkg_c.vhdl(23): (vcom-1136) Unknown identifier "IEEE_PROPOSED".
#
# ** Error: C:/Users/..../Downloads/altera/fixed_pkg_c.vhdl(25): VHDL Compiler exiting

Does ModelSim not like VHDL 2008? Quartus II seems to compile, but when I try to simulate in ModelSim it gives me this error.

Tricky, I've seen some of your answers to this question on the web.

I have written a file for adding unsigned fixed-point binary numbers, and also a testbench that loads two numbers to be added. Would you please tell me how to see the result of this addition?
Altera_Forum
Honored Contributor
10 years ago
-- Programming SUMADOR (adder) as a register
library ieee;

use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.std_logic_unsigned.all;
use ieee.fixed_pkg.all;

entity SUMADOR is
  port
  (
    -- Input ports
    numero1   : in  ufixed (2 downto -7);
    numero2   : in  ufixed (2 downto -7);
    -- Output port
    resultado : out ufixed (2 downto -8);
    clk       : in  std_logic
  );
end SUMADOR;

architecture arc_sum of SUMADOR is
begin

  process (clk) begin
    if (clk'event and clk='1') then -- on the rising clock edge
      resultado <= numero1 + numero2;
    end if;
  end process;

end arc_sum;
Altera_Forum
Honored Contributor
10 years ago
--- Quote Start ---
# ** Error: C:/Users/..../Downloads/altera/fixed_pkg_c.vhdl(22): Library ieee_proposed not found.
# ** Error: C:/Users/..../Downloads/altera/fixed_pkg_c.vhdl(23): (vcom-1136) Unknown identifier "IEEE_PROPOSED".
#
# ** Error: C:/Users/..../Downloads/altera/fixed_pkg_c.vhdl(25): VHDL Compiler exiting

Does ModelSim not like VHDL 2008? Quartus II seems to compile, but when I try to simulate in ModelSim it gives me this error.

--- Quote End ---

This is because you didn't create the ieee_proposed library in ModelSim. Quartus is quite relaxed when it comes to libraries unless you specify them - it just searches all design units for the correct packages.
As you might have guessed, ieee_proposed was just a placeholder name for the code before it was officially released, and it still holds for the '93 compatible code. You can change it to ieee or work if you wish. But ModelSim already has the fixed-point libraries (in 2008 format) in the ieee library.

As for your code, resultado needs to be declared:

resultado : out ufixed (3 downto -7);

Otherwise you miss the carry bit (numbers never get smaller when added).
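The sizing rule behind this can be checked with a small Python model (not VHDL; the helper names are invented for this sketch). As far as I know, fixed_pkg sizes the result of an addition as max(left indices)+1 downto min(right indices):

```python
def ufixed_add_range(left, right):
    """Index range (high, low) needed for the sum of two ufixed operands."""
    (left_high, left_low), (right_high, right_low) = left, right
    return max(left_high, right_high) + 1, min(left_low, right_low)

def ufixed_width(index_range):
    """Number of bits in a ufixed(high downto low)."""
    high, low = index_range
    return high - low + 1

def ufixed_max_raw(index_range):
    """Largest raw (integer) value that fits in the given range."""
    return (1 << ufixed_width(index_range)) - 1

operand = (2, -7)                                   # ufixed(2 downto -7), 10 bits
result = ufixed_add_range(operand, operand)         # range needed for the sum
worst_case_sum = 2 * ufixed_max_raw(operand)        # both inputs at their maximum
```

Note that ufixed(2 downto -8) is also 11 bits wide, but the extra bit sits at the fractional end, so the carry out of bit 2 still has nowhere to go.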
Altera_Forum
Honored Contributor
10 years ago
--- Quote Start ---
This is because you didn't create the ieee_proposed library in ModelSim. Quartus is quite relaxed when it comes to libraries unless you specify them - it just searches all design units for the correct packages.
As you might have guessed, ieee_proposed was just a placeholder name for the code before it was officially released, and it still holds for the '93 compatible code. You can change it to ieee or work if you wish. But ModelSim already has the fixed-point libraries (in 2008 format) in the ieee library.

As for your code, resultado needs to be declared:

resultado : out ufixed (3 downto -7);

Otherwise you miss the carry bit (numbers never get smaller when added).
--- Quote End ---

Tricky, thanks for the fast response. I edited the downloaded files so that they use IEEE instead of IEEE_PROPOSED. Quartus Prime compiles without errors, but ModelSim gives several lines of this error:

# ** Error: C:........Downloads/altera/fixed_pkg_c.vhdl(1420): (vcom-1295) Function "to_ufixed" has already been defined in this region.
# ** =====> Prior declaration of "to_ufixed" is at C:/Users/José/Downloads/altera/fixed_pkg_c.vhdl(1047).

Similar to what happened to this user:

http://www.alteraforum.com/forum/showthread.php?t=49993

What do you recommend?

Also, I'd like to ask: if Quartus Prime compiles without errors, does that mean the project is fine?
How can I see the result of the addition I'm doing? Is it through ModelSim?
