Monday, August 9, 2010

Meanwhile

I'm waiting for another benchmark to finish before publishing my (rather lengthy) weekly update. It should be done within the next hour. Meanwhile, here's a small sample of the many graphs to come.
update-20100812 (am): right before I was about to publish the rest of the graphs, I ran into a really strange (probably alignment-related) bug that caused fftwff to stop at N=32. I've since fixed the bug, but I also realized that I had completely neglected optimized memory copies! FFTW calls these 'rank-0' transforms, and as you can probably imagine, they're essential for pretty much everything. This morning I identified the files that need fixing, and I'll get to them tonight after spending the rest of the day on my thesis. For the impatient, I graciously refer you to my git repository.
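To make the rank-0 idea concrete, here is a minimal sketch in C, assuming a single vector dimension of interleaved `double complex` data with strides counted in elements. The name `rank0_copy` is hypothetical and this is not FFTW's code; it only illustrates that a rank-0 transform is pure data movement, with a fast path for the contiguous case.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of a rank-0 "transform": no arithmetic at all, just
 * moving n complex values from in (stride is) to out (stride os).
 * Strides are counted in elements, not bytes. */
static void rank0_copy(size_t n,
                       const double complex *in, ptrdiff_t is,
                       double complex *out, ptrdiff_t os)
{
    assert(n == 0 || (in != NULL && out != NULL));
    if (in == out && is == os)
        return;                     /* in-place with equal strides: no-op */
    if (is == 1 && os == 1) {
        /* contiguous case: hand off to the C library's optimized copy;
         * memmove rather than memcpy in case the buffers overlap */
        memmove(out, in, n * sizeof *out);
        return;
    }
    /* general strided case; assumes in and out do not overlap */
    for (size_t i = 0; i < n; i++)
        out[(ptrdiff_t)i * os] = in[(ptrdiff_t)i * is];
}
```

The contiguous branch is where an "optimized memory copy" pays off; the strided loop is the fallback that everything else ends up in.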

update-20100814 (pm): ...found the bug. Oddly enough, it was not due to misalignment: it was at libavcodec/fft.c:75, in ff_init_ff_cos_tabs(). I rebuilt ffmpeg with hardcoded tables.

update-20100816 (am): I've since fixed several other bugs related to alignment and vector strides, added the beginnings of memory copy acceleration (for rank-0 transforms and buffered operations), and am (yet again) generating lots of lovely graphs.
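Whether a transform can go straight to a vectorized power-of-two routine or has to be buffered first comes down to checks on length, stride, and alignment. The following is a hypothetical sketch: the function names and the 16-byte alignment requirement (typical of 128-bit SIMD units) are assumptions, not taken from fftwff or ffmpeg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment for 128-bit vector loads/stores. */
#define SIMD_ALIGN ((size_t)16)

/* True if n is a nonzero power of two. */
static bool is_pow2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* True if p sits on an `align`-byte boundary (align must be a power of two). */
static bool is_aligned(const void *p, size_t align)
{
    assert(is_pow2(align));
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical dispatch test: the direct path needs a power-of-two length,
 * unit stride, and aligned input and output; anything else gets buffered. */
static bool can_use_direct_path(const void *in, const void *out,
                                size_t n, ptrdiff_t stride)
{
    return is_pow2(n) && stride == 1 &&
           is_aligned(in, SIMD_ALIGN) && is_aligned(out, SIMD_ALIGN);
}
```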

2 comments:

NIC1138 said...

I have some plans to use FFT in ARM chips in the near future, so I am following your work with much attention! :)

One question: what is the reason for the peak in these graphs? Delay due to more memory accesses?...

Christopher Friedt said...

The method with the highest performance (ffmpeg) is from benchmarking ffmpeg with benchfft-3.1. See my misc repository.

The method with the second-best performance (also ffmpeg) is from code I committed to my main repository. It links to ffmpeg dynamically, but as you can tell, there is a slight bit of overhead.

FFMPEG has highly optimized routines for power-of-two transforms (i.e. X = F{x}, where x has length N = 2^p), but FFTW has several algorithms to compute non-power-of-two transforms, some of which can actually reuse the FFMPEG routines. Buffering needs to be done in order to push data to the FFMPEG routines. It's not quite done, but will be soon. Just keep an eye on the git repository.
