Why is this a challenge?
If all processors and detectors were part of the same "package", this would probably not be a challenge at all. But even before getting started, I am aware of three separate systems that I will have to integrate with:
- libav: input and basic preprocessing of audio and video content
- OpenCV: most video-related detectors will use this
- CMU Sphinx: speech recognition
I have looked far and wide for a solution that can interface this diverse mix of specialized software packages. The closest candidate was libavfilter, but I decided against it because it fails to hide many complexities that follow directly from it being written in C while also aiming to be maximally efficient and flexible. In my humble opinion, you can pick any two of those three (written in C, efficient, flexible) and still end up with something that is easy to use, but not all three at once.
So what options remain?
Making my own, of course! I have some experience with this from previous projects where maximum efficiency was a goal (and before libavfilter was an option). Unfortunately, I won't be able to reuse that code since it is proprietary to one of my previous employers.
Goals of the project:
- Make it easy to integrate the software packages I will use
- Write it in standard C++/STL/Boost
- Use templates to hide complexities and keep efficiency at a maximum
- Keep it simple and clean
- Make it somewhat easy to get started with
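To make the template goal a bit more concrete, here is a minimal sketch of one way it could look. This is only an illustration under my own assumptions, not a finished design: the names Frame, Gain, Offset, Chain, chain and example are all hypothetical and are not part of libav, OpenCV or CMU Sphinx. The idea is that processing stages are plain function objects and get combined at compile time, so the compiler can inline the whole chain instead of going through virtual calls.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical frame type. Real code would wrap a libav or OpenCV buffer.
struct Frame {
    std::vector<float> samples;
};

// A "stage" is any type that provides: Frame operator()(Frame) const.

// Hypothetical stage: multiplies every sample by a constant factor.
struct Gain {
    float factor;
    Frame operator()(Frame f) const {
        for (float& s : f.samples) s *= factor;
        return f;
    }
};

// Hypothetical stage: adds a constant to every sample.
struct Offset {
    float amount;
    Frame operator()(Frame f) const {
        for (float& s : f.samples) s += amount;
        return f;
    }
};

// Combines two stages into one. The stage types are template parameters,
// so the calls are resolved at compile time and no virtual dispatch is used.
template <typename First, typename Second>
struct Chain {
    First first;
    Second second;
    Frame operator()(Frame f) const {
        return second(first(std::move(f)));
    }
};

// Helper that deduces the stage types so the caller never spells them out.
template <typename First, typename Second>
Chain<First, Second> chain(First a, Second b) {
    return Chain<First, Second>{std::move(a), std::move(b)};
}

// Usage sketch: computes (x * 2) + 1 for each sample.
inline Frame example() {
    auto pipeline = chain(Gain{2.0f}, Offset{1.0f});
    Frame out = pipeline(Frame{{1.0f, 2.0f}});
    assert(out.samples.size() == 2);  // stages keep the sample count
    return out;
}
```

Since a Chain is itself a stage, longer pipelines can be built by nesting, for example chain(chain(a, b), c). The trade-off of this approach is that the pipeline layout is fixed at compile time. A pipeline assembled at runtime would need some form of type erasure on top.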