Altera_Forum
Honored Contributor
10 years ago

Hi again,
you may wonder why I make such a proposal that looks very complicated at first. I can explain it: it has to do with the design that I'm working on.

== the design ==

It all began with a bit-serial CORDIC unit for Cartesian-to-polar conversion. That unit has to process only 12,000 complex inputs per second, so a parallel design would be a waste (doing trigonometric functions in 12,000 IRQs per second was not an option). My little coprocessor is therefore serial with respect to CORDIC rounds and also uses serial adders to minimize resource usage.

Real and imaginary part both have 23 bits of precision. Each round takes 23 cycles and I have 22 rounds (always precision - 1). That gives a total of 506 cycles per computation. 12,000 inputs take 6,072,000 cycles, no problem at 100 MHz. (The choice of 23 bits comes from the restriction that I want fewer than 512 cycles; more cycles would need more M9K blocks.)

As internal tristate logic is not available in the Cyclones, the bit shifter for CORDIC is a pain. Ray Andraka gives the hint to use dual-port memory (http://www.andraka.com/files/crdcsrvy.pdf) and that was the solution. An M9K block in true dual-port mode, reading on both ports and additionally writing on one, can do the magic. It needs a very complicated addressing scheme, but it can also implement sign extension for the shifted values on the fly.

I quickly decided to implement RAM address generation by simply reading the addresses from a ROM block. That was a good decision, as more control signals had to be added later for a divider, some scalers and some filters (each interacting with its own RAM block and also needing addressing, read/write enables and so on). By now, I generate 28 bits of control signals from this ROM, all periodic over 506 cycles.

== how it's done in qsys ==

Divide and conquer: everything has its own component, some as QSYS subsystems, some with a composition callback, some with an elaboration callback.
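The cycle budget from the design section can be double-checked with a few lines of Python. This is only a back-of-the-envelope model, not HDL; the constant and function names are made up for illustration, and the 512-cycle limit is simply the bound quoted in the post.

```python
# Back-of-the-envelope model of the cycle budget (not HDL).
# All names here are illustrative assumptions, not taken from the real design.

BITS = 23                     # precision of real and imaginary part
INPUTS_PER_SECOND = 12_000    # complex inputs per second
CLOCK_HZ = 100_000_000        # 100 MHz clock
CYCLE_LIMIT = 512             # schedule must stay below this many cycles


def cycles_per_computation(bits: int) -> int:
    """Each round takes `bits` cycles and there are `bits - 1` rounds."""
    return bits * (bits - 1)


def widest_precision(limit: int) -> int:
    """Largest bit width whose schedule is still shorter than `limit` cycles."""
    bits = 2
    while cycles_per_computation(bits + 1) < limit:
        bits += 1
    return bits


cycles = cycles_per_computation(BITS)            # cycles for one conversion
cycles_per_second = cycles * INPUTS_PER_SECOND   # total load per second
utilization = cycles_per_second / CLOCK_HZ       # fraction of the clock used

print(f"{cycles} cycles per computation")
print(f"{cycles_per_second} cycles per second ({utilization:.1%} of the clock)")
print(f"widest precision below {CYCLE_LIMIT} cycles: {widest_precision(CYCLE_LIMIT)} bits")
```

With these numbers the serial unit is busy for roughly 6% of the available cycles, and 24 bits would already need 552 cycles, which is why 23 bits is the widest choice under the 512-cycle bound.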
What I now have is a central "synchronizer" component in my design that exports those control signals to every component that needs some of them. There are 20 conduits for the control signals. The synchronizer uses an elaboration callback and has a table for the conduits, so I can add one every time I need it.

== the problem ==

The problem now is that the same set of control signals is routed to every receiver. If I add a signal, all receivers must be changed to include it, even if they do not use it. And, by the way, the system looks very complicated with all those control conduits.

The data paths between all those components are also made up of conduits. I would really like to use avalon_streaming, but I often have components that need data from two bit trains with identical timing (related to the control signals, of course). The only way to do this with avalon_streaming is to slice a multi-bit data role, so that both bit trains share the same valid and ready signals. Slicing is a pain when it comes to Verilog generation.

It would be possible to use some Altera-provided IP in some places, but I would need two components for the two bit trains travelling on the same streaming interface. So I often end up writing my own components for those standard tasks, most of the time with a TABLE parameter to make the names of the data trains configurable.

The beatsPerCycle property gave me some hope, as it has a little more documentation in the recent Avalon spec. The docs are confusing though (is the default of 8 correct?!?), and testing shows that beatsPerCycle does not do what I need.

== solution? ==

A 1:N connect would be very helpful, lazy role checking (sink is a subset of source) even more. Associating control signals with data streams would be a dream.

Greets,
Andreas
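To make the "1:N connect with lazy role checking" wish concrete, here is a minimal Python sketch of the rule "sink is a subset of source". It is a model only: the functions and signal names below are hypothetical and do not correspond to any existing Qsys or _hw.tcl API.

```python
# Model of lazy role checking for a 1:N connection (hypothetical, not a Qsys API).
# An interface is described as a mapping from role name to width in bits.


def sink_is_satisfied(source_roles: dict, sink_roles: dict) -> bool:
    """True if every role the sink declares exists in the source with the same width."""
    return all(source_roles.get(name) == width for name, width in sink_roles.items())


def connect_one_to_many(source_roles: dict, sinks: dict) -> dict:
    """Connect one source to many sinks; each sink receives only the roles it declares.

    Raises ValueError if a sink asks for a role the source does not provide.
    """
    wiring = {}
    for sink_name, sink_roles in sinks.items():
        if not sink_is_satisfied(source_roles, sink_roles):
            raise ValueError(f"sink '{sink_name}' needs roles the source does not export")
        wiring[sink_name] = sorted(sink_roles)
    return wiring


# Example with made-up control signals exported by a synchronizer.
synchronizer = {"ram_addr": 9, "ram_we": 1, "round_start": 1, "div_enable": 1}
receivers = {
    "cordic": {"ram_addr": 9, "ram_we": 1, "round_start": 1},
    "divider": {"div_enable": 1},
}
print(connect_one_to_many(synchronizer, receivers))
```

Under such a rule, adding a new signal to the synchronizer would leave every existing receiver untouched, because a receiver only has to name the roles it actually uses.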