This project was created for educational purposes, to learn more about synthesizing 3D spatialized audio using HRTFs (head-related transfer functions). The result is a continuously playing beep that moves around the listener along the horizontal plane in 5-degree increments.
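The following is a minimal sketch of stepping the source around the listener and choosing the matching measurement. It assumes a table with one HRIR pair per 5-degree step (72 positions, index 0 straight ahead, azimuth increasing clockwise so that 90 degrees is the listener's right). All names are hypothetical. The `compactLookup` helper also assumes the layout of the MIT KEMAR "compact" set, which stores only azimuths 0 to 180 and derives the left side by swapping the two channels.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: one HRIR pair per 5-degree step on the horizontal plane,
// index 0 straight ahead, azimuth increasing clockwise (90 = listener's right).
constexpr int kStepDegrees = 5;
constexpr int kNumAzimuths = 360 / kStepDegrees;  // 72 positions

// Wrap any integer angle into [0, 360).
int wrapDegrees(int degrees) {
    return ((degrees % 360) + 360) % 360;
}

// Advance the source by one 5-degree step, wrapping from 355 back to 0.
int nextAzimuth(int azimuthDegrees) {
    assert(azimuthDegrees % kStepDegrees == 0);
    return wrapDegrees(azimuthDegrees + kStepDegrees);
}

// Index into a full 72-entry table of HRIR pairs.
std::size_t azimuthToIndex(int azimuthDegrees) {
    const int idx = wrapDegrees(azimuthDegrees) / kStepDegrees;
    assert(idx < kNumAzimuths);
    return static_cast<std::size_t>(idx);
}

// Assumed "compact" layout: only azimuths 0..180 are stored (source on the
// right); a source at a > 180 uses the file for 360 - a with the ears swapped.
struct CompactLookup {
    int fileAzimuth;  // azimuth of the stored measurement to load
    bool swapEars;    // true if the left/right channels must be exchanged
};

CompactLookup compactLookup(int azimuthDegrees) {
    const int az = wrapDegrees(azimuthDegrees);
    if (az <= 180) return {az, false};
    return {360 - az, true};
}
```

With this layout, a source at 270 degrees (the listener's left) reuses the 90-degree measurement with the channels exchanged. This relies on the dummy head being treated as left/right symmetric.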
The current result isn't quite satisfying. There is certainly a sense of spatialization, but at around 30-35 degrees (taking N as straight in front of the listener, this is between N and NE) the beep sounds as if it is off to the right, and at 90 degrees it sounds as if it is at 0 or 180 degrees. There is likely a bug somewhere, but I haven't spent enough time on it to track it down. Also, since the perception of spatialization is subjective, it's hard to know whether there is a bug or whether the HRTF data just isn't a good fit for me.
The HRTF data comes from the MIT KEMAR dummy-head measurements, available at http://sound.media.mit.edu/resources/KEMAR.html. The audio data is converted to 2-channel, 32-bit little-endian floating point for consistency with the other audio data used in the project.
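The following is a sketch of that conversion. It assumes the source WAV data is 16-bit signed little-endian PCM and omits WAV header parsing. Samples are scaled by 1/32768 into [-1, 1), then written out as raw IEEE-754 32-bit little-endian floats. Interleaving (L, R, L, R, ...) passes through unchanged. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Decode interleaved 16-bit little-endian PCM bytes into floats in [-1, 1).
std::vector<float> pcm16leToFloat(const std::vector<std::uint8_t>& bytes) {
    assert(bytes.size() % 2 == 0);
    std::vector<float> out(bytes.size() / 2);
    for (std::size_t i = 0; i < out.size(); ++i) {
        // Assemble low byte then high byte, independent of the host's byte order.
        const std::uint16_t raw =
            static_cast<std::uint16_t>(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        const std::int16_t sample = static_cast<std::int16_t>(raw);
        out[i] = static_cast<float>(sample) / 32768.0f;
    }
    return out;
}

// Serialize floats as 32-bit IEEE-754 little-endian bytes (raw F32LE data).
std::vector<std::uint8_t> floatToF32le(const std::vector<float>& samples) {
    std::vector<std::uint8_t> out;
    out.reserve(samples.size() * 4);
    for (float s : samples) {
        std::uint32_t bits;
        std::memcpy(&bits, &s, sizeof bits);  // copy the float's bit pattern
        for (int b = 0; b < 4; ++b) {
            out.push_back(static_cast<std::uint8_t>(bits >> (8 * b)));  // lowest byte first
        }
    }
    return out;
}
```

The bytes are assembled explicitly and `memcpy` is used for the float bits. This way, the output is little-endian even on a big-endian host, and the code avoids type-punning through pointer casts.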
The IRCAM HRTF database (http://recherche.ircam.fr/equipes/salles/listen/) was also tried, but the resulting audio did not seem convincing to me. That could also have been caused by a bug in the application.
- SDL is used for input and audio
- KISSFFT is used for FFT computations
- All FFTs use 512 points
- The HRTF files are 128-sample stereo WAV files (more precisely, they hold the HRIRs from which the HRTFs are computed) that are zero-padded to 512 samples to match the rest of the data. Zero-padding is often described as increasing DFT resolution, although strictly it only samples the spectrum more finely. Here some padding is needed regardless: the HRIR spectrum must have as many bins as the 512-point input spectrum so the two can be multiplied bin by bin. Padding to 512 specifically was chosen for consistency with the IRCAM data I had originally used.
- The input is a monophonic audio clip streamed in 512-sample chunks. At runtime, stereo output is generated by convolving the monophonic audio with the left and right channels of the HRIR for the current azimuth.
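The FFT-based convolution step can be sketched with overlap-add for one ear. Multiplying two N-point spectra gives circular convolution, not linear convolution. Convolving a 512-sample chunk with a 128-tap HRIR yields 512 + 128 - 1 = 639 samples. If a 512-point FFT is used for that product, the last 127 samples wrap around onto the start of the block.

The sketch avoids this by rounding the FFT size up to the next power of two that holds the full result (1024 here) and carrying the overflow into the next block. For self-containment, it uses a small radix-2 FFT instead of KISSFFT. The class and function names are hypothetical, and this is not the project's actual code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// Iterative radix-2 FFT, in place; size must be a power of two.
// Stands in for KISSFFT so this sketch compiles on its own.
void fft(std::vector<cplx>& a, bool inverse) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    // Reorder elements into bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    // Butterfly passes over successively larger sub-transforms.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double ang = 2.0 * pi / static_cast<double>(len) * (inverse ? 1.0 : -1.0);
        const cplx wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            cplx w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cplx u = a[i + k];
                const cplx v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
    if (inverse) {
        for (cplx& x : a) x /= static_cast<double>(n);
    }
}

// Hypothetical overlap-add convolver for one ear's HRIR.
class OverlapAddConvolver {
public:
    OverlapAddConvolver(const std::vector<float>& hrir, std::size_t blockSize)
        : block_(blockSize), fftSize_(1) {
        assert(!hrir.empty() && blockSize > 0);
        // B input samples convolved with an M-tap filter give B + M - 1 output
        // samples; a shorter FFT would wrap the tail onto the start of the block.
        while (fftSize_ < blockSize + hrir.size() - 1) fftSize_ <<= 1;
        // Zero-pad the HRIR to the FFT size and precompute its spectrum once.
        H_.assign(fftSize_, cplx(0.0, 0.0));
        for (std::size_t i = 0; i < hrir.size(); ++i) H_[i] = hrir[i];
        fft(H_, false);
        tail_.assign(fftSize_ - block_, 0.0f);
    }

    std::size_t fftSize() const { return fftSize_; }

    // Convolve one block; the output includes overflow carried from earlier blocks.
    std::vector<float> process(const std::vector<float>& in) {
        assert(in.size() == block_);
        std::vector<cplx> X(fftSize_, cplx(0.0, 0.0));
        for (std::size_t i = 0; i < block_; ++i) X[i] = in[i];
        fft(X, false);
        for (std::size_t k = 0; k < fftSize_; ++k) X[k] *= H_[k];  // multiply spectra
        fft(X, true);

        std::vector<float> out(block_);
        for (std::size_t i = 0; i < block_; ++i) {
            const float carried = i < tail_.size() ? tail_[i] : 0.0f;
            out[i] = static_cast<float>(X[i].real()) + carried;
        }
        // New tail: the old tail advanced by one block plus this block's overflow.
        std::vector<float> next(tail_.size(), 0.0f);
        for (std::size_t i = 0; i < next.size(); ++i) {
            if (i + block_ < tail_.size()) next[i] = tail_[i + block_];
            next[i] += static_cast<float>(X[i + block_].real());
        }
        tail_ = std::move(next);
        return out;
    }

private:
    std::size_t block_;
    std::size_t fftSize_;
    std::vector<cplx> H_;
    std::vector<float> tail_;
};
```

One convolver per ear produces the stereo pair. With 512-sample chunks and 128-tap HRIRs, `fftSize()` comes out as 1024. Another option is to keep 512-point FFTs and stream chunks of at most 385 samples (385 + 128 - 1 = 512).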
TODO:
- Interpolate between HRTFs for smooth movement
- Add OVR (Oculus VR) support for head tracking and visuals
- Many small performance optimizations (the goal so far was just to get something working)
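For the interpolation and head-tracking items above, the following is one naive sketch. It linearly blends the time-domain HRIRs of the two nearest measured azimuths. This is a swapped-in technique, not the project's method, and it can smear the interaural time difference because the two HRIRs' onsets differ. Delay-aligned or frequency-domain schemes are common refinements.

The `relativeAzimuth` helper shows how tracked head yaw would feed in: the HRIR is selected by the source's direction relative to the head. It assumes both angles are in degrees with the same clockwise convention; no particular sign convention of the OVR SDK is assumed. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical table: hrirs[i] is one ear's HRIR measured at i * stepDegrees
// (e.g. 72 entries for 5-degree spacing); all entries have the same length.
std::vector<float> interpolateHrir(const std::vector<std::vector<float>>& hrirs,
                                   double stepDegrees, double azimuthDegrees) {
    assert(!hrirs.empty() && stepDegrees > 0.0);
    double az = std::fmod(azimuthDegrees, 360.0);
    if (az < 0.0) az += 360.0;
    const double pos = az / stepDegrees;
    // The two neighbouring measurements, wrapping past the last entry to 0.
    const std::size_t i0 = static_cast<std::size_t>(std::floor(pos)) % hrirs.size();
    const std::size_t i1 = (i0 + 1) % hrirs.size();
    const float t = static_cast<float>(pos - std::floor(pos));  // blend weight in [0, 1)
    std::vector<float> out(hrirs[i0].size());
    for (std::size_t n = 0; n < out.size(); ++n) {
        out[n] = (1.0f - t) * hrirs[i0][n] + t * hrirs[i1][n];
    }
    return out;
}

// Source direction relative to the listener's head, wrapped to [0, 360).
double relativeAzimuth(double sourceAzimuth, double headYaw) {
    const double rel = std::fmod(sourceAzimuth - headYaw, 360.0);
    return rel < 0.0 ? rel + 360.0 : rel;
}
```

Because the HRIR now changes continuously, swapping filters between blocks can still click. Crossfading the outputs of the old and new filters over one block is a common complement.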
Open questions:
- How can audio be spatialized outside the range covered by the HRTFs? I haven't found an open data set that includes measurements directly below the listener (the MIT KEMAR set, for example, only goes down to -40 degrees elevation). I imagine the anthropometry below the listener's neck plays an even larger role there.
- Is there any research on interpolating between full HRTF data sets to cover a much larger range of listeners?