Slow memcpy speed

Honored Contributor

9 years ago

I think my problem is related to the high address that is used (ALT_LWFPGASLVS_OFST = 0xff200000), and this might have to be fixed in kernel space…
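The post does not show how the bridge address is mapped into the process. One common approach on Linux, assumed here, is to mmap /dev/mem at the bridge's physical base. This is a minimal sketch: map_phys is a hypothetical helper, and the span to map is left to the caller.

```c
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t so 0xFF200000 fits on 32-bit ARM */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Physical base of the lightweight HPS-to-FPGA bridge (ALT_LWFPGASLVS_OFST). */
#define LW_BRIDGE_BASE 0xFF200000UL

/* Hypothetical helper (assumption: the region is reached through /dev/mem).
 * Maps `span` bytes starting at the page-aligned physical address
 * `phys_base` through the device file `dev` (normally "/dev/mem", which
 * needs root). Returns NULL on failure; release with munmap(p, span). */
static void *map_phys(const char *dev, off_t phys_base, size_t span)
{
    int fd = open(dev, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, span, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys_base);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

On the board this would be called as `map_phys("/dev/mem", LW_BRIDGE_BASE, span)`, with `span` chosen to cover the FPGA slaves being accessed.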

While waiting for someone to fix this for me :), I wrote an assembly version of memcpy based on the “NEON memory copy with preload” example from the ARM Infocenter.

I had to add “SUBS r2,r2,#0x40” before the loop; otherwise the loop copies 64 bytes too many, overwriting memory past the end of the destination.
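The 64-byte overrun follows from the loop testing its counter at the bottom (SUBS then BGE), so the body always runs once more than the count alone would suggest. A small C model of only the counter logic makes this visible; loop_bytes_copied is a hypothetical name, and the model assumes n is a positive multiple of 64.

```c
#include <assert.h>

/* Model of the loop control in neon_memcpy (not the copy itself).
 * Returns how many bytes the loop would copy for a byte count n.
 * Each iteration moves one 64-byte block (VLDM/VSTM of d0-d7), and the
 * test is at the bottom, so the body always runs at least once.
 * presubtract != 0 models the extra SUBS r2,r2,#0x40 before the loop. */
static long loop_bytes_copied(long n, int presubtract)
{
    long r2 = n;
    long copied = 0;

    assert(n > 0 && n % 64 == 0);
    if (presubtract)
        r2 -= 0x40;           /* SUBS r2,r2,#0x40 before the loop */
    do {
        copied += 0x40;       /* VLDM r1!,{d0-d7} / VSTM r0!,{d0-d7} */
        r2 -= 0x40;           /* SUBS r2,r2,#0x40 */
    } while (r2 >= 0);        /* BGE neon_copy_loop */
    return copied;
}
```

For n = 128 the model reports 192 bytes without the pre-subtraction and 128 bytes with it: without it, every call copies exactly one block too many.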

Using this NEON memcpy I got somewhat higher throughput (62 MBytes/s), and I could compile the rest of the code with the -Ofast flag.

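This function is called the same way as memcpy, but the data must be 64-byte aligned: both pointers should be aligned to 64 bytes, and n must be a positive multiple of 64 (VLDM/VSTM also fault on addresses that are not word-aligned).

void *neon_memcpy(void *ut, const void *in, size_t n);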

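neon_memcpy.S:

    .arch armv7-a
    .fpu neon
    .syntax unified
    .text
    .global neon_memcpy
    .type neon_memcpy, %function

@ void *neon_memcpy(void *ut, const void *in, size_t n)
@ r0 = destination, r1 = source, r2 = byte count (a multiple of 64)
neon_memcpy:
    MOV     r3, r0              @ save the destination; memcpy returns it
    SUBS    r2, r2, #0x40       @ pre-subtract one block so the loop does not copy 64 bytes too many
    BLT     neon_copy_done      @ n < 64: no full block, skip the loop
neon_copy_loop:
    PLD     [r1, #0xC0]         @ preload the source 192 bytes ahead
    VLDM    r1!, {d0-d7}        @ load 64 bytes, advance r1
    VSTM    r0!, {d0-d7}        @ store 64 bytes, advance r0
    SUBS    r2, r2, #0x40       @ one block done
    BGE     neon_copy_loop      @ repeat while another full block remains
neon_copy_done:
    MOV     r0, r3              @ return the original destination, as memcpy does
    BX      lr
    .size neon_memcpy, .-neon_memcpy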
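Because neon_memcpy does not check its arguments, a caller may want to guard it and fall back to the C library otherwise. A minimal sketch, assuming the 64-byte alignment and length rules stated above; neon_copy_ok and copy_checked are hypothetical names, and on the target neon_memcpy would be passed as `fast`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Alignment required by neon_memcpy as stated in the post (64 bytes). */
#define NEON_COPY_ALIGN 64u

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical precondition check: n is a positive multiple of 64 and
 * both pointers are NEON_COPY_ALIGN-aligned. */
static int neon_copy_ok(const void *ut, const void *in, size_t n)
{
    return n >= 64 && n % 64 == 0 &&
           (uintptr_t)ut % NEON_COPY_ALIGN == 0 &&
           (uintptr_t)in % NEON_COPY_ALIGN == 0;
}

/* Hypothetical wrapper: uses the fast routine only when its preconditions
 * hold, otherwise falls back to the C library memcpy. */
static void *copy_checked(void *ut, const void *in, size_t n, copy_fn fast)
{
    if (fast != NULL && neon_copy_ok(ut, in, n))
        return fast(ut, in, n);
    return memcpy(ut, in, n);
}
```

Usage on the board: `copy_checked(dst, src, n, neon_memcpy)`. Unaligned or odd-sized copies then still work, just at memcpy speed.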
