Altera_Forum
Honored Contributor
14 years ago
C2H is something different: it basically lets you use C code as an HDL, to ease porting existing C algorithms to FPGA-resident implementations. For an algorithm that is well understood, an FPGA architecture that is well understood, and a developer who knows how to work efficiently in VHDL or Verilog, there may be no great need for C2H, since it only automates the mapping of algorithm (as defined by existing code) -> HDL/FPGA implementation, just as you would do manually when writing an HDL port of the algorithm. You may benefit from C2H if you have a highly optimized C implementation that is difficult to port to HDL, but it isn't the custom instruction mechanism for NIOS that I was referring to.
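As an illustration of the kind of C that a C-to-hardware flow tends to map well, here is a small integer FIR filter with a compile-time tap count, no floating point, no recursion and no dynamic memory. It is a generic sketch, not code from the C2H documentation; the function name `fir_int` and the `NTAPS` constant are hypothetical. The fixed-trip-count inner loop of independent multiply-accumulates is the sort of pattern hardware can unroll into parallel multipliers, whereas a CPU runs it sequentially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tap count, fixed at compile time so the loop has a known
 * trip count that a hardware flow could unroll or pipeline. */
#define NTAPS 4

/* Integer-only FIR filter: out[i] = sum over k of coeff[k] * in[i + k].
 * Produces n_in - NTAPS + 1 outputs. A 64-bit accumulator is used because
 * NTAPS products of two int16 values can exceed the int32 range. */
void fir_int(const int16_t *in, size_t n_in,
             const int16_t coeff[NTAPS], int64_t *out)
{
    assert(n_in >= NTAPS);
    for (size_t i = 0; i + NTAPS <= n_in; i++) {
        int64_t acc = 0;
        for (size_t k = 0; k < NTAPS; k++) {
            acc += (int64_t)coeff[k] * in[i + k];
        }
        out[i] = acc;
    }
}
```

With input `{1, 2, 3, 4, 5}` and all-ones coefficients this yields the two outputs 10 and 14.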
NIOS custom instructions are a mechanism that takes a set of reserved NIOS opcodes and links their execution to a user-defined 'execution unit'. When one of those opcodes is encountered in the code, the custom logic is triggered: it can read input operands from the NIOS and return a result to the NIOS, just as an instruction like ADD Src1, Src2, Dest adds two source operands and stores the result in a destination location, except that instead of 'ADD' it would be whatever function you want to implement.
Since you say your algorithm is math-dense relative to memory load/store traffic, you may be able to get it running quite fast on x86 using C/SSE assembler and register/L1 cache resources, or on a GPU using OpenCL/CUDA with heavy use of on-chip cache/local memory rather than general GDDR RAM or host RAM. If you can reach giga-ops/second execution on a CPU or GPU with an efficient implementation of the algorithm, that sets a fairly high bar for an FPGA: it would have to beat a $400 PC, perhaps with a $200 GPU added, and most mid-range FPGAs cost similar money for far less GIPS of compute capacity. There are certainly many things a good FPGA can do faster than a mid-range CPU/GPU, particularly if the system needs a hardware I/O interface to a data acquisition system, but for a purely computational problem I'd look at GPGPU / x86 SSE / ASM before going to an FPGA.
MATLAB/SCILAB is handy for prototyping mathematical algorithms (e.g. linear algebra, matrix math, FFT) that are harder to prototype in C or assembly; once you have decided how to implement the algorithm efficiently, you can move to lower-level but faster C/ASM.
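To make the custom instruction idea concrete, here is a minimal PC-side C model of it, assuming two hypothetical user operations (a rotate-and-XOR step and a saturating add). The selector and operand names follow the `n`, `dataa` and `datab` signals of the Nios II custom-instruction interface, but the operations themselves are illustrative only. In real hardware each `case` would be custom logic attached to the processor; a bit-exact C model like this also serves as a reference when you later check the HDL in a simulator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical custom op 0: rotate dataa left by (datab mod 32), then XOR
 * with datab. In hardware this would be a small block of custom logic. */
static uint32_t ci_rotxor(uint32_t a, uint32_t b)
{
    uint32_t s = b & 31u;
    uint32_t rot = (s == 0u) ? a : ((a << s) | (a >> (32u - s)));
    return rot ^ b;
}

/* Hypothetical custom op 1: unsigned add that saturates instead of wrapping. */
static uint32_t ci_addsat(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;
    return (sum < a) ? UINT32_MAX : sum;
}

/* Software model of the dispatch: the selector n picks which user-defined
 * "execution unit" handles the opcode, much as ADD selects the adder. */
uint32_t custom_exec(unsigned n, uint32_t dataa, uint32_t datab)
{
    switch (n) {
    case 0:
        return ci_rotxor(dataa, datab);
    case 1:
        return ci_addsat(dataa, datab);
    default:
        assert(!"unimplemented custom instruction selector");
        return 0;
    }
}
```

On the target itself, the Nios II GCC port provides built-ins such as `__builtin_custom_inii(n, a, b)` for issuing a custom instruction; check your toolchain's documentation for the exact form it expects.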
If your C is already efficient relative to the bound set by the algorithm's complexity, you may not need to consider MATLAB/SCILAB (though they can sometimes be fast in their own right when they hand most of the core calculations to GPGPU, LAPACK or similar efficient execution engines).
First, though, I'd look at FPGAs in general (their clock rates, memory/register density, etc.) and work out with napkin/ballpark calculations whether, even in the best case, the hardware could run your algorithm fast enough compared to the alternatives to justify implementing it in Cyclone/Arria/Stratix rather than on x86/GPU/whatever. If so, moving to synthesizable Verilog might be a good next step, and there are good PC simulators for that before you need to run anything on an FPGA.
Here is some of the NIOS material I'd suggest looking at:
ftp://ftp.altera.com/outgoing/download/support/ip/processors/nios2/niosii_docs_11_0.zip
http://www.altera.com/literature/ds/ds_nios2_perf.pdf
http://www.altera.com/devices/processor/nios2/benefits/performance/ni2-high-performance.html
http://www.altera.com/devices/processor/nios2/cores/fast/ni2-fast-core.html
--- Quote Start ---
Following up...
@dwh, that sample is a great starting point for the next step in my learning. Thanks. The recommended platform is the NIOS II Embedded Evaluation kit. I don’t mind spending the money on the kit but it looks like it has several features (LCD for example) that I really don’t need right now. Is there a less feature rich/more focused product I could use to implement this example?
@af1010, is the ‘custom instruction mechanism’ the C-2-Hardware code optimization that Altera produces?
For the implementation testing, I’ve already got the routine working in multiple high level languages (C, java, c#) just to test the performance of various systems. It looks like Matlab and then Verilog are the next places to go.
There should not be any floating point calculations and there should be low amounts of data transfer relative to the algorithmic work applied to the data, so I am really hoping that the FPGA will be the platform to target to achieve good speed.
--- Quote End ---
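The napkin/ballpark calculation suggested in the reply can be sketched in a few lines of C. This is a generic estimate, not an Altera tool: the function names are hypothetical, and every clock rate, unit count and bandwidth in the figures below is a made-up placeholder to be replaced with numbers from the device datasheets and your own algorithm. The second function applies a simple roofline-style bound, which captures the "low data transfer relative to the algorithmic work" point: a kernel is limited either by peak compute or by how fast its data can be fed in.

```c
#include <assert.h>
#include <stdint.h>

/* Best-case peak throughput in operations per second:
 * clock (Hz) x parallel processing units x ops per unit per cycle. */
uint64_t peak_ops_per_sec(uint64_t clock_hz, uint64_t units,
                          uint64_t ops_per_cycle)
{
    return clock_hz * units * ops_per_cycle;
}

/* Roofline-style bound on sustained throughput (ops/s). total_ops/total_bytes
 * is the kernel's arithmetic intensity; the I/O path can feed at most
 * bytes_per_sec * intensity ops/s, and the compute side at most peak_ops. */
uint64_t attainable_ops(uint64_t peak_ops, uint64_t bytes_per_sec,
                        uint64_t total_ops, uint64_t total_bytes)
{
    assert(total_bytes > 0);
    /* Keep the intermediate product within 64 bits. */
    assert(total_ops == 0 || bytes_per_sec <= UINT64_MAX / total_ops);
    uint64_t feed_bound = bytes_per_sec * total_ops / total_bytes;
    return feed_bound < peak_ops ? feed_bound : peak_ops;
}
```

With hypothetical figures of 150 MHz x 64 pipelined units x 1 op/cycle for the FPGA (9.6 G ops/s) against a 3 GHz quad-core doing 8 SIMD ops/cycle (96 G ops/s), the FPGA design would need roughly ten times as many parallel units, or a faster clock, just to break even. If each 8 bytes moved carried 1000 operations, a 100 MB/s link would still leave the design compute-bound; at only 2 operations per 8 bytes, the same link would cap it at 25 M ops/s regardless of its compute peak.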