I have turned my attention to the software side of the robot. I have decided to attack the challenge of efficiently distributing audio/video buffers to the various detectors and processors that the software stack will require.

Why is this a challenge?

If all processors and detectors were part of the same "package", this would probably not be a challenge at all. But even before getting started, I already know of three separate systems I will have to integrate against:
  • libav: input and basic preprocessing of audio and video content
  • OpenCV: most video-related detectors will use this
  • CMU Sphinx: speech recognition
I suspect this list will continue to grow rapidly as I extend my ambitions in the future.
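One way to keep that growing list manageable is a neutral frame type that each package gets an adapter for, so detectors never depend on libav, OpenCV, or Sphinx types directly. Here is a minimal sketch of that idea; the `VideoFrame` name and layout are my own invention for illustration, not anything from these libraries:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical neutral frame type. A libav adapter would fill it from an
// AVFrame; an OpenCV adapter would wrap pixels.data() in a cv::Mat header
// without copying. Packed BGR24 is assumed here purely for the sketch.
struct VideoFrame {
    int width = 0;
    int height = 0;
    int stride = 0;                    // bytes per row; may exceed width * 3
    std::vector<std::uint8_t> pixels;  // stride * height bytes
};
```

The point of the neutral type is that adding a fourth or fifth package later means writing one adapter, not touching every detector.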

I have looked far and wide for solutions that could interface this diverse mix of specialized software packages. My closest bet was libavfilter, but I have decided against using it: it exposes many complexities that are a direct result of being written in C while aiming to be maximally efficient and flexible at the same time. In my humble opinion, you may choose any two of those three and succeed in making something that is easy to use.

So what options remain?

Making my own, of course! I have some experience with this from previous projects where ultimate efficiency was a goal (and before libavfilter was an option). Unfortunately, I won't be able to reuse that code, since it is proprietary to one of my previous employers.

Goals of the project:
  • Make it easy to integrate the software packages I will use
  • Write it in standard C++/STL/Boost
  • Use templates to hide complexity and keep efficiency at a maximum
  • Keep it simple and clean
  • Make it reasonably easy to get started with
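To make the "templates plus efficiency" goal concrete, here is one possible shape such a library could take: a templated fan-out that hands every subscriber the same reference-counted, immutable buffer, so frames are shared rather than copied per consumer. This is my own sketch of the concept, not the author's actual design; all names here are hypothetical:

```cpp
#include <functional>
#include <memory>
#include <vector>

// Hypothetical distributor: detectors and processors subscribe a callback,
// and each pushed buffer is delivered to all of them. shared_ptr<const T>
// gives zero-copy sharing with automatic lifetime management.
template <typename Buffer>
class Distributor {
public:
    using Sink = std::function<void(std::shared_ptr<const Buffer>)>;

    void subscribe(Sink sink) { sinks_.push_back(std::move(sink)); }

    // Every subscriber sees the same refcounted buffer; no copies are made.
    void push(std::shared_ptr<const Buffer> buf) {
        for (auto& sink : sinks_) sink(buf);
    }

private:
    std::vector<Sink> sinks_;
};
```

Because `Distributor` is a template over the buffer type, the same machinery could serve audio frames to Sphinx and video frames to OpenCV detectors without any virtual dispatch on the buffer itself.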
Maybe it will result in a reusable library someday, but don't hold your breath! I will release the code for it when it's usable.
