Showing posts with label demo. Show all posts

Monday, June 28, 2010

Weekly Report - Week 5

Originally, I was quite happy to have achieved a 5.5x speedup in multiplying long vectors of complex floats, but this week I managed to increase that to between 11x and 18x. My changes were quite subtle, but I suspect the main problem was issuing vmul instructions after vmla, which the Cortex-A8 reference manual specifically notes will produce a 4-cycle stall. What's odd, though, is that the q-reg implementation seems to perform slightly faster than the d-reg implementation in most cases. In any case, these speedups are most certainly going to make a major difference once implemented in FFTW, compared to the code generated from the generic C code.
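To make the instruction ordering concrete, here is a rough, portable C sketch of the arithmetic involved. It is not the code from my repository: the function name `cmul_lanes`, the interleaved (re, im) layout, and the mapping of each step to a NEON instruction in the comments are all assumptions for illustration. The point is simply that each block does its plain multiplies first and the multiply-accumulate steps afterwards, so no plain multiply directly follows an accumulate within a block.

```c
#include <stddef.h>

#define LANES 4 /* assumed: four floats per q register */

/* Hypothetical helper: out[k] = a[k] * b[k] for n complex floats,
 * each stored as an interleaved (re, im) pair. */
void cmul_lanes(const float *a, const float *b, float *out, size_t n)
{
    size_t k = 0;

    for (; k + LANES <= n; k += LANES) {
        float ar[LANES], ai[LANES], br[LANES], bi[LANES];
        float re[LANES], im[LANES];
        int l;

        /* De-interleave, roughly what a vld2.32 load would do. */
        for (l = 0; l < LANES; l++) {
            ar[l] = a[2 * (k + l)];
            ai[l] = a[2 * (k + l) + 1];
            br[l] = b[2 * (k + l)];
            bi[l] = b[2 * (k + l) + 1];
        }

        /* Plain multiplies first (vmul in the NEON version). */
        for (l = 0; l < LANES; l++) re[l] = ar[l] * br[l];
        for (l = 0; l < LANES; l++) im[l] = ar[l] * bi[l];

        /* Accumulating steps afterwards (vmls, then vmla). */
        for (l = 0; l < LANES; l++) re[l] -= ai[l] * bi[l];
        for (l = 0; l < LANES; l++) im[l] += ai[l] * br[l];

        /* Re-interleave, roughly what a vst2.32 store would do. */
        for (l = 0; l < LANES; l++) {
            out[2 * (k + l)]     = re[l];
            out[2 * (k + l) + 1] = im[l];
        }
    }

    /* Scalar tail for lengths that are not a multiple of LANES. */
    for (; k < n; k++) {
        float ar = a[2 * k], ai = a[2 * k + 1];
        float br = b[2 * k], bi = b[2 * k + 1];
        out[2 * k]     = ar * br - ai * bi;
        out[2 * k + 1] = ar * bi + ai * br;
    }
}
```

A compiler will not necessarily turn this into the instruction sequence in the comments; the real gains came from hand-scheduled assembly.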

In case you aren't watching them already, I have two repositories set up. The first one is specifically for FFTW, and the second is for snippets of C code and assembly, for benchmarking purposes.


Please feel free to check out 'demo2' from misc and run it on your BeagleBoard to observe the speedups firsthand.

Sunday, June 20, 2010

Demo #2

Hi all,

It's far past my bedtime, so I'm going to put up a link to some sample code that shows a 5.5x speedup for multiplying long vectors of complex numbers using NEON as opposed to the default libc implementation. Here it is. This has been an "educational" experience in familiarizing myself with the NEON instruction set. Tomorrow, I'm moving on to more FFTW-related things, and will finally commit some demo code and FFTW changes to my git repository.
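For anyone wondering what the non-NEON side of that comparison looks like, here is a minimal sketch of a scalar baseline. I'm assuming the C99 `float complex` type and its built-in `*` operator here; the name `cmul_ref` is made up for this post and the linked sample code may be organized differently.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical scalar baseline: element-wise product of two vectors
 * of n complex floats, using the C99 complex multiply operator. */
void cmul_ref(const float complex *a, const float complex *b,
              float complex *out, size_t n)
{
    size_t k;

    for (k = 0; k < n; k++) {
        out[k] = a[k] * b[k];
    }
}
```

Timing a loop like this against the NEON routine over the same long input vectors is the kind of measurement a speedup figure is based on.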