New IP development

Honored Contributor

12 years ago

--- Quote Start ---

This thread has almost (but not quite) gone full circle. :p

With pure software optimization, I guess the ideal goal for one 1024-bit operation on a 32-bit CPU boils down to 64 loads for the two operands (32 words each), 32 add/subtract/whatever operations, and 32 stores of the result. Let's call it 128 instructions.

If you are not an HDL person, and this ideal performance is more than adequate for your needs, you can attack your problem with software alone and at least have a chance of achieving your desired performance.

--- Quote End ---

Akhil>> Please note that this is an academic thesis project, so I plan to implement hardware modules for RSA and BIGNUM that can assist an application developer (who may program in C/C++ in the NIOS II SBT). It is not the other way around, which would make a circle as you pointed out :P. I have example code that implements RSA in software. I was wondering whether there is any means of converting that software implementation into an HDL RTL design so that I can compare the performance of the two. Also, please remember that my tentative design diagram is not finalized. As of now I plan to generate the RSA primes inside my RSA IP core, so I should not have to load the NIOS II CPU with that work. The communication between RSA <-> BIGNUM should also be okay (I guess?), since I plan to make both data buses 1024 bits wide (and the ALUs as well).
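For reference, the quoted estimate of about 128 instructions corresponds to a word-by-word addition like the sketch below. It is only an illustration, not my actual RSA code: the names `BN_WORDS` and `bn_add` and the little-endian word layout are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 1024 bits stored as 32 words of 32 bits, least significant word first. */
#define BN_WORDS 32

/* Hypothetical helper: r = a + b over 1024 bits. Returns the carry out (0 or 1). */
uint32_t bn_add(uint32_t r[BN_WORDS],
                const uint32_t a[BN_WORDS],
                const uint32_t b[BN_WORDS])
{
    uint32_t carry = 0;

    assert(r != NULL && a != NULL && b != NULL);

    for (size_t i = 0; i < BN_WORDS; i++) {
        /* Two loads and one add per word, widened so the carry is not lost. */
        uint64_t sum = (uint64_t)a[i] + b[i] + carry;
        r[i] = (uint32_t)sum;            /* one store per word */
        carry = (uint32_t)(sum >> 32);   /* carry into the next word */
    }
    return carry;
}
```

The loop does 64 loads, 32 adds and 32 stores in the ideal case. The carry handling and loop overhead cost extra instructions on a real 32-bit CPU, so 128 should be read as a lower bound.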

--- Quote Start ---

The shortcoming of C2H in this specific example is that I believe it is structured as an optimization tool for 32-bit operations, and there is no way to tell it that you want 1024-bit operations.

--- Quote End ---

Akhil>> C2H has been discontinued for QSYS and, as you pointed out, it is specific to NIOS II, which is a 32-bit architecture. So I will probably not use it to convert my C code into a hardware accelerator.

--- Quote Start ---

Back to the current topic of software optimization: I think what you want to do is use the profiler to identify your functions with a high execution count or execution time, and then use techniques like those dsl has described to zero in on the big bottlenecks. For acquiring timing data, I like the Performance Counter IP block.

--- Quote End ---

Akhil >> I am interested in profiling my code to find the bottlenecks. However, the challenge is to convert my C code into hardware efficiently once the bottlenecks have been identified. Please advise me if you know of an efficient tool, or whether I have to hand-code everything.
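As a starting point for the measurement side, here is a minimal sketch of timing one candidate function with the standard C `clock()` call. The names `candidate_kernel` and `time_kernel` are hypothetical and the kernel body is only a placeholder. On the NIOS II target I assume the Performance Counter block would take the place of `clock()`; that substitution is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Placeholder for a hot routine identified by the profiler (hypothetical). */
uint32_t candidate_kernel(uint32_t x)
{
    for (int i = 0; i < 32; i++) {
        x = x * 1664525u + 1013904223u;  /* arbitrary integer work */
    }
    return x;
}

/* Calls fn `iterations` times and returns the elapsed CPU time in seconds.
 * The accumulated result is written to *sink so the calls are not optimized away. */
double time_kernel(uint32_t (*fn)(uint32_t), unsigned long iterations, uint32_t *sink)
{
    assert(fn != NULL && sink != NULL);

    uint32_t acc = 0;
    clock_t start = clock();
    for (unsigned long i = 0; i < iterations; i++) {
        acc ^= fn((uint32_t)i);
    }
    clock_t end = clock();

    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Running the same harness against the software routine and later against a driver for the hardware block would give comparable numbers, provided the iteration count is large enough to exceed the timer resolution.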

Best,

Akhil Kalathungal
