HW Multiplier

Honored Contributor

21 years ago

--- Quote Start ---

originally posted by timbr@Nov 8 2004, 09:27 AM

Hi to everyone who wants a fast multiply now,

Even without adding a hardware multiplier, it is possible to gain some performance on the multiply. I disassembled the multiply routine (__mulsi3):

__mulsi3:
        addi    sp,sp,-8        # <- useless, can be removed
        stw     fp,4(sp)        # <- useless, can be removed
        mov     r3,zero
        mov     fp,sp           # <- useless, can be removed
        beq     r4,zero,mul_30
mul_14:
        andi    r2,r4,1
        cmpeq   r2,r2,zero
        srli    r4,r4,1
        bne     r2,zero,mul_28
        add     r3,r3,r5
mul_28:
        slli    r5,r5,1
        bne     r4,zero,mul_14
mul_30:
        mov     r2,r3
        ldw     fp,4(sp)        # <- useless, can be removed
        addi    sp,sp,8         # <- useless, can be removed
        ret

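This routine can be optimized a lot. The most important optimization is removing the stack frame setup and teardown, which removes 5 instructions without any problem. Together with two minor optimizations (accumulating the result directly in r2, which drops the final mov, and branching with beq on the masked bit, which drops the cmpeq), this results in the following function: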

__mulsi3:
        mov     r2,zero
        beq     r4,zero,mul_30
mul_14:
        andi    r3,r4,1
        srli    r4,r4,1
        beq     r3,zero,mul_28
        add     r2,r2,r5
mul_28:
        slli    r5,r5,1
        bne     r4,zero,mul_14
mul_30:
        ret

--- Quote End ---
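The loop in the quoted routine is a plain shift-and-add multiply: it walks the bits of r4 from the least significant end and adds the correspondingly shifted r5 into the result whenever a bit is set. As a rough sketch, here is a C model of the optimized version. The function name shift_add_mul is hypothetical, and the register mapping in the comments is my reading of the assembly, not code taken from libgcc.

```c
#include <stdint.h>

/* Hypothetical C model of the optimized __mulsi3 loop quoted above.
 * a plays the role of r4 (bits consumed LSB first), b plays r5
 * (shifted left each step) and result plays r2. Unsigned 32-bit
 * arithmetic wraps modulo 2^32, so this yields the same low 32 bits
 * the routine returns for signed and unsigned operands alike. */
uint32_t shift_add_mul(uint32_t a, uint32_t b)
{
    uint32_t result = 0;            /* mov  r2,zero                          */
    while (a != 0) {                /* beq r4,zero,mul_30 / bne r4,zero,mul_14 */
        uint32_t bit = a & 1u;      /* andi r3,r4,1                          */
        a >>= 1;                    /* srli r4,r4,1                          */
        if (bit != 0)               /* beq  r3,zero,mul_28                   */
            result += b;            /* add  r2,r2,r5                         */
        b <<= 1;                    /* slli r5,r5,1                          */
    }
    return result;                  /* ret, result left in r2                */
}
```

For example, shift_add_mul(6, 7) returns 42. The number of loop iterations equals the bit length of the first operand (r4), so the routine is fastest when that operand is small.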

I think you would get much the same result (at least the removal of the stack frame code) if you could compile the libraries with the -fomit-frame-pointer compiler option.

Has anyone done this already? Is it hard to do for someone who has practically no experience porting or building the GNU toolchain?

Stefaan
