2012-06-29

ALSA Source

I just implemented a rudimentary audio source in my filtergraph using ALSA capture directly. I also implemented an "n-channel to mono" node, since for some reason ALSA refuses to capture in mono on my development computer. It simply discards all but the selected channel, extracting that channel's samples from the interleaved capture buffer into a buffer of its own before passing it on.

Filtergraph first version complete

After working for weeks on the filtergraph, testing out all sorts of approaches, I ended up with a fairly simple and loose design.

In the schematics of my previous posts I talked about sources, sinks, pads, nodes and so forth. In my code I have avoided making the pads explicit. Each node is either a sink, a source, or both. Sink and Source are implemented as abstract classes that provide the means to transfer buffers and to notify/get notified when buffers are available.


I decided against using boost::signals in the end because it introduced a lot of unnecessary complexity and added a few hundred kilobytes of extra bloat to my error messages that I really didn't need. Instead I opted to write my own lean and fairly naive implementation of the observer pattern.

Source.hpp

/*
 * Source.hpp
 *
 *  Created on: Jun 12, 2012
 *      Author: lennart
 */

#ifndef SOURCE_HPP_
#define SOURCE_HPP_

#include "SimpleObserver.hpp"

namespace filtergraph {
 using namespace std;
 using namespace simple_observer;

 template<class T>
 class Sink;
 template<class T>
 class Source: public Observee {
  public:
   Source() :
     Observee() {
   }

   virtual ~Source() {
   }

   // Connect the given sink
   void addSubscriber(Sink<T> &sink) {
    //cout << "addSubscriber()\n";
    registerObserver((Observer &) sink);
   }

   // Disconnect the given sink
   void removeSubscriber(Sink<T> &sink) {
    //cout << "removeSubscriber()\n";
    removeObserver((Observer &) sink);
   }

   // Tell sinks connected to this source that a new buffer has arrived
   void broadcastSubscribers() {
    //cout << "broadcastSubscribers()\n";
    notifyObservers();
   }
   // Block until a new buffer object is ready
   virtual void pumpSource() = 0;
   // Borrow current buffer object
   virtual T &borrowBuffer() = 0;
   // Return a copy of the current buffer object
   T copyBuffer() {
    return borrowBuffer();
   }

 };

} /* namespace filtergraph */
#endif /* SOURCE_HPP_ */

Sink.hpp

#ifndef SINK_HPP_
#define SINK_HPP_
#include "SimpleObserver.hpp"
#include "Source.hpp"
namespace filtergraph {

 using namespace std;
 using namespace simple_observer;
 template<class T>
 class Sink: public Observer {

  public:

   // Connect to the given source
   void subscribeTo(Source<T> &source) {
    source.addSubscriber(*this);
   }
   //Disconnect from the given source
   void unsubscribeFrom(Source<T> &source) {
    source.removeSubscriber(*this);
   }

  public:
   Sink() :
     Observer() {
   }

   virtual ~Sink() {

   }

   void handleObserverNotification(Observee &observee) {
    handleSource((Source<T> &) observee);
   }

   // Called by sources when new buffer is available
   virtual void handleSource(Source<T> &source)=0;

 };

} /* namespace filtergraph */
#endif /* SINK_HPP_ */

2012-06-06

Filtergraph documentation

The previous post introduced my plan of creating a filter graph library to make it easy to integrate different audio/video systems.

This post will contain design notes on the filter graph library for my own reference.

Notes:
  • I will specify a precise terminology to make it simple to talk about the various parts of the system. 
  • I will use some "plumber" analogies as a basis for my terminology.
  • I will use boost::signal for propagating events about changes to the graph structure as well as the flow of data through the graph.
  • I will separate the concerns of graph structure from the concerns about data passing through the graph so that the filter graph code may be used to build graphs that can handle buffers of any kind.
Filtergraph terminology illustration

Terminology:
  • Graph: A system of one or more nodes linked together.
  • Node: A single component in the graph.
  • Pad: A connection point on a node which can be connected to exactly one other pad on another node. Can be either an input or an output.
  • Source: A node that has only output pad(s) and no input pads.
  • Sink: A node that has only input pad(s) and no output pads.
  • Filter: A node that has both input and output pad(s)
  • Connection: A link from an output pad of one node to an input pad of another node.
  • Pump: A node that determines the flow of control throughout the graph. A node that drives the graph, either by pushing data to node(s) connected to its output(s) or by pulling data from node(s) connected to its input(s), is said to act as a pump. Usually a single source or sink in the graph acts as the pump.
  • Simple: A node with only one input pad and/or one output pad is said to be simple. This makes it possible to talk about such things as "simple sink", or "simple filter pump".
  • Buffer: A piece of data traveling through the graph. For video, the content of the buffer usually corresponds to one frame of video. For audio, it usually corresponds to a certain length of time in audio samples.


2012-06-05

Filtergraph

I have turned my attention to the software part of the robot. I have decided to attack the challenge of efficiently distributing buffers of audio/video to the different detectors and processors that will be required in the software stack.

Why is this a challenge?

If all processors and detectors were part of the same "package" then this would probably not be a challenge at all, but even before getting started I am aware of three separate systems that I will have to integrate against:
  • libav: input and basic preprocessing of audio and video content
  • OpenCV: Most video related detectors will use this
  • CMU Sphinx: speech recognition
I suspect this list will continue to grow rapidly as I extend my ambitions in the future.

I have looked far and wide for solutions to accommodate this diverse mix of specialized software packages. My closest bet was libavfilter, but I have decided against using it because it fails to hide many complexities that are a direct result of libavfilter being written in C while aiming to be both ultimately efficient and flexible at the same time. In my humble opinion, you may choose any two out of those three and succeed in making something that is easy to use.

So what options remain?

Making my own of course! I have some experience with this from previous projects where ultimate efficiency was a goal (and before libavfilter was an option). Unfortunately I won't be able to reuse the code since it is proprietary to one of my previous employers.

Goals of the project:
  • Make it easy to integrate between the software packages I will use
  • Write it in standard C++/STL/boost
  • Use templates to hide complexities and keep efficiency at a maximum
  • Keep it simple and clean.
  • Make it somewhat easy to get started with.
Maybe it will result in a reusable library someday, but don't hold your breath! I will release the code when it's usable.


2012-06-02

Useful sites

Mostly for my own reference, here is a list of links to sites that offer hardware, software and data related to video processing, SLAM, natural language processing, logic/reasoning and other topics of interest to the DEVOL project.

  • Reasoning
  • Language
  • Video
  • OpenSLAM
  • 4-stroke Engine

Software stack schematic

As in my previous post, here is a draft of how I plan to lay out the software in the DEVOL robot.

DEVOL Software Stack schematic draft


It can be broken down into the following components:

Audio and Video inputs are filtered through a graph of detectors, recognizers and other more or less involved processes that translate them into useful events such as people, objects, facial expressions, distances, location and so forth.

These events are gathered together with sensor inputs in an event inventory where all events are classified, sorted, persisted, enriched and refined in real-time.

The core of the system is the "consciousness driver", which is simply the main loop of the program. It relies heavily on an array of close assistants, each responsible for its own area of expertise: keeping up communications with HQ, logical inference, keeping track of objectives, keeping track of ongoing dialogues, keeping track of appearance in the form of pose and avatar, and so on.

The consciousness driver will base its decisions on the content of the event inventory and its decisions will result in changes to pose, additions to the local logic facts database, output of audio dialogue and display of avatar on screen.




Power and electronics schematic

I made a first sketch for the power and electronics diagram for DEVOL.
DEVOL power and electronics schematic
In essence the robot will rely on a 12V lead-acid battery as its main source of power. A small gasoline-powered generator will serve as a means of keeping this battery charged when in the field.

Power will be distributed from the battery via two separate regulators: a delicate, stable regulator for controllers and logic, and a more robust, protective regulator for the power-hungry actuator motors.

The system is kept modular so that the different components, such as visual input, strategic planning and real-time control, may each be handled by a separate computer (I suspect that especially the vision part will require a rather powerful computer).

The actuators are connected to a serial bus that distributes commands for each actuator from the real-time controller. Power is distributed along a separate power rail.

Visual input is provided by a calibrated stereo pair (two Logitech 9000 Pro) and another "long range" camera (Logitech 9000 Pro with mounted tele-lens). The whole camera rig is supposed to be put on a pan-tilt rig, guarded from the elements by a glass/plastic dome.

Audio will be handled by a hand-held Zoom H1 stereo recorder, which provides a high-quality, low-latency sound card and good microphones while requiring very little power. It also has a third input where I intend to plug in a "long range", so-called "shotgun", microphone.

This is the first draft, expect drastic changes!

CNC Mill

When working with my first prototype robot limb, I decided to make it from plastic tubing. This decision was made mostly because of cost since plastic tubing is very cheap. Another aspect was availability and space. It became clear after creating this first prototype that I would need other materials to construct my prototypes from.

Right now I don't have space in my apartment for many tools. I am looking to buy a house soon, and hopefully it will have plenty of room for my workshop in the basement.

Looking further into the future, a lot of the parts for the robot will inevitably be machined from metal, and that requires me to get hold of a mill and other metalworking tools. Since a CNC mill is really expensive, and since it is basically just a robot with 3 actuators (or 4 if you get fancy), I have decided to create my own CNC mill as part of this project.

Since the creation of this CNC mill is not the primary goal of the project, I have decided to cut many corners to speed up its construction. I have cheated and bought a book on the subject. Basically, the book is a guide written by someone like me who wanted to make their own CNC.

I will take its advice and use parts from the book directly, but change some things to better suit my own needs.