Android x86

One realization I had while working on the software stack for the robot was that the software will have to be distributed over a pretty heterogeneous mix of hardware, with each device running a piece of software that performs its part of the processing needed for the robot.

Some of the hardware/software components are low-level and need to run in real time with a very narrow and predictable set of responsibilities, while others operate on a higher, more administrative level.

Deciding how much authority each component is given, and how the components communicate, will be an ongoing and difficult task. Several strategies exist. One is to push decision making as far down as possible, so that each component has as much authority as possible to solve problems on its own, giving the robot more resilience to communication errors.

The opposite approach is of course to centralize as much of the decision making as possible in one central device that then commands the other subsystems, making the decisions more informed and quicker, but leaving the solution more vulnerable to a single point of failure.

I think both strategies have strengths and weaknesses, and that I will have to decide case by case whether to put authority centrally or distribute it. In other words, an intelligently configured hybrid may be the best solution.

"What does this have to do with Android?", you may ask.

Well, the software that will be put on the embedded low-end hardware will be mostly purpose-built, in some cases even firmware, but at the higher levels we will need something really flexible, yet efficient and modern. Something awesome. And that is, in one word: Android.

Illustration photo HP tx1000 convertible tablet

That's why I am building Android for x86 to fit my HP tx1000 convertible tablet computer right now. Hopefully it will let me develop a central, top-level, main brain app that can command all the other subsystems big and small in perfect unison.


Devol Speaks!

I have made a new node for the filtergraph. This time it is a wrapper for the excellent open source eSpeak text-to-speech synthesis system, which basically allows my robot software to utter words and sentences in English. My test code for the new node looks like this:
void test_speech_synthesis() {
 cout << "allocin --------------------- \n";
 AudioSpeechSynth tts;
 AudioFileSink afs_mono("mono_tts.raw");
 cout << "buildin --------------------- \n";
 afs_mono.subscribeTo(tts);
 cout << "openin --------------------- \n";
 cout << "pumpin --------------------- \n";
 string s("hello devold");
 tts.speak(s); // (method name assumed)
 tts.pumpSource();
 cout << "done pumpin, breaking --------------------- \n";
 afs_mono.unsubscribeFrom(tts);
 cout << "done breaking, closin --------------------- \n";
 cout << "done closin, COMPLETE!\n";
}
I have yet to design a proper voice that matches the robot's personality. More on that later.


Honda GX25 Unboxing

The original box it came in was surprisingly light and small. The engine itself weighed just over 3 kilograms on my scale.

The engine was fitted with the inner part of a centrifugal clutch attached to the flywheel/fan. Looking at the parts lists online, I found that there is an alternative bare-axle version. Making an axle for this engine will be the perfect first project for my CNC mill!


GX25 in plastic

Surprisingly small and light box.
GX25 From the righthand side
GX25 From the back

GX25 From the front

GX25 From the lefthand side


CNC Mill and Honda GX25 engine ordered

I had previously decided to make my own CNC mill as part of this project. That was before I had money on my hands, and before I found a great offer on a used CNC mill.

EMCO F1P-CNC Mill (to the right) with matching
EMCO tronic M1 controller (to the left).
Image shamelessly copied off the internet.

I have ordered it, and it will be arriving at my apartment in a week. It is an old "EMCO F1P" with a matching control/monitor unit "M1". More details will follow as I "unbox" it.

I also bought a Honda GX25 mini 4-stroke 1-cylinder petrol engine. If you have seen the post about power and electronics you will know that I intended to use a generator on board to provide longer range.

The engine arrived just 1 day after I ordered it, the unboxing of which will follow in a later post.


Insights from an expert

While shamelessly asking Mr. Samir Shaker for the code accompanying his paper on implementing the SLAM algorithm with OpenCL, I received very insightful and extremely relevant advice based on his hard-earned experience, which I have reproduced in part here (with his permission, of course):

 [...] it seems that you are using the AMD implementation of OpenCL. I have worked with both the AMD and Nvidia implementations extensively, and it would be safe to say that Nvidia's implementation is much faster and much more complete. The biggest flaw in the AMD implementation I would say is the lack of support for images in OpenCL. This is a driver issue, and they plan on supporting images eventually, but after all the time that has passed since the OpenCL standard was released they still haven't done so! My code uses images, so it would only run on an Nvidia implementation (for now).

Also, as a general remark, I would like to tell you that from experience (and a lot of reading), not all algorithms are faster on the GPU, even those that can be parallelized. Whether or not you get faster speeds relies on many factors. For example, off the top of my head, performance on the GPU depends on:

1- The speed of the GPU (seems obvious but): Most integrated GPUs and those on standard laptops (not high-end ones) are slower than the CPUs on board. So running an algorithm on those GPUs will prove much slower than running them on the CPUs available.

2- Type of algorithm: If the algorithm requires a lot of data transfer between the CPU and the GPU, this will likely be a huge bottleneck.

3- The GPU manufacturer: For now, Nvidia's implementation is much better than AMD's or Intel's, and this is natural since they got into the GPU computing game much earlier than the rest, and they kind of drew the path for all the rest.

4- If you are working on a mobile robot and computation is done on-board the robot (as opposed to wirelessly on a desktop PC), having a fast-enough GPU on-board is probably not feasible since those consume a lot of power, so it would be hard to procure a battery powerful enough to handle it.

5- In practice (at least in today's technology), the best time to use GPU computation is when you have a desktop PC with a high-end GPU from Nvidia, those that require a larger system power supply, and when you have an algorithm that can be easily parallelized.



Thanks to Mr. Erik Smistad's excellent minimalist introduction to OpenCL, I have managed to set up AMD's OpenCL implementation on my development computer.

Soon after, I had clsurf up and running. It required some massaging in order to work with a recent OpenCL version, but it eventually compiled and ran successfully using the CPU device (I don't have a dedicated GPU on my dev computer) to produce this lovely image:

Lena with SURF features marked with clsurf

Not bad for a midnight hack!


Flexibility, performance and scalability. Yes please!

I am very excited to have discovered that the paths of three distinct fields of interest may intersect in an "almost too good to be true" combination of performance, scalability, flexibility and developer-friendliness.

I am talking about the vision code for my robot combined with OpenCL and LLVM. It turns out that many common vision algorithms, such as the SURF algorithm that I will need in the vision code for my robot, may use OpenCL as a way to reach new levels of performance and scalability through modern massively parallel hardware such as GPUs and APUs. Since OpenCL is inherently quite involved, the idea of an LLVM backend that automatically translates C/C++ code directly to compact and optimized OpenCL kernels is really appealing.

And Simon Moll might just have made this possible through his thesis work, which he has released. Thanks, dude!

I hope to look into this as soon as I get more experience working with OpenCL in general.



Now that the filtergraph is operating, it's time to start implementing some detector nodes. Here are some "sourcing links" I have gathered as inspiration and potential starting points (for my own reference).

Performance will be an issue with many concurrently active detectors, and I have given some thought to how to solve that. One optimization strategy would be to lower the number of invocations of each detector to the minimum. Another is to adaptively disable detectors. For example, face detection may run every 100 frames (4 seconds) until a face is detected, after which it may run more often for as long as faces are present.

Another optimization strategy is to use a saliency (areas of interest) detection algorithm and increase the rate of the other detectors in areas with high interest.
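The adaptive idea can be sketched as a small scheduler. This is just an illustration; the `AdaptiveScheduler` class and its parameters are hypothetical, not part of the filtergraph:

```cpp
// Hypothetical sketch of adaptive detector scheduling: run an expensive
// detector every idle_interval frames until it fires, then every
// active_interval frames for as long as it keeps firing.
class AdaptiveScheduler {
public:
	AdaptiveScheduler(int idle_interval, int active_interval) :
			idle_interval(idle_interval),
			active_interval(active_interval),
			interval(idle_interval),
			frames_since_run(0) {
	}

	// Called once per frame; returns true when the detector should run now.
	bool shouldRun() {
		if (++frames_since_run < interval)
			return false;
		frames_since_run = 0;
		return true;
	}

	// Report whether the last run detected anything, adapting the rate.
	void reportResult(bool detected) {
		interval = detected ? active_interval : idle_interval;
	}

private:
	const int idle_interval;
	const int active_interval;
	int interval;
	int frames_since_run;
};
```

With idle_interval = 100 and active_interval = 5, face detection would run on every 100th frame while no face is around, and on every 5th frame once one shows up.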

I think the approach now is just to get a simple detector up and working, and take it from there.


First glimpse

My robot has finally been given the gift of visual perception. In other words, my filtergraph code can now record video. Behold DEVOL's very first visual impression:


ALSA Source

I just implemented a rudimentary audio source in my filtergraph using ALSA capture directly. I also implemented an "n-channel to mono" node, since for some reason ALSA refuses to capture in mono on my development computer. It simply discards all but the selected channel, extracting the interleaved samples into its own buffer before passing that on.
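As a standalone sketch of that extraction step (simplified; the real node works on the filtergraph's own buffer objects and gets the channel count from ALSA):

```cpp
#include <vector>
#include <cstddef>

// Extract a single channel from an interleaved multi-channel buffer.
// ALSA's interleaved capture lays samples out frame-major, i.e.
// [ch0, ch1, ..., chN-1, ch0, ch1, ...], so channel c of frame f
// sits at index f * channels + c.
std::vector<short> extract_channel(const std::vector<short> &interleaved,
		std::size_t channels, std::size_t channel) {
	std::vector<short> mono;
	mono.reserve(interleaved.size() / channels);
	for (std::size_t i = channel; i < interleaved.size(); i += channels)
		mono.push_back(interleaved[i]);
	return mono;
}
```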

Filtergraph first version complete

After working for weeks on the filtergraph, testing out all sorts of designs, I ended up with a fairly simple and loose approach.

In the schematics of my previous posts I talked about sources, sinks, pads, nodes and so forth. In my code I have avoided making the pads explicit. Each node is either a sink, a source, or both. Sink and Source are implemented as abstract classes which provide the means to transfer buffers and to notify/get notified when buffers are available.

I decided against using boost::signals in the end because it really introduced a lot of unnecessary complexity and added a few hundred kilobytes of extra swag to my error messages that I really didn't need. Instead I opted for writing my own lean and pretty naive implementation of the observer pattern.


/*
 * Source.hpp
 *
 *  Created on: Jun 12, 2012
 *      Author: lennart
 */

#ifndef SOURCE_HPP_
#define SOURCE_HPP_

#include "SimpleObserver.hpp"

namespace filtergraph {
 using namespace std;
 using namespace simple_observer;

 template<class T>
 class Sink;

 template<class T>
 class Source: public Observee {
 public:
   Source() :
     Observee() {
   }

   virtual ~Source() {
   }

   // Connect the given sink
   void addSubscriber(Sink<T> &sink) {
    //cout << "addSubscriber()\n";
    registerObserver((Observer &) sink);
   }

   // Disconnect the given sink
   void removeSubscriber(Sink<T> &sink) {
    //cout << "removeSubscriber()\n";
    removeObserver((Observer &) sink);
   }

   // Tell sinks connected to this source that a new buffer has arrived
   void broadcastSubscribers() {
    //cout << "broadcastSubscribers()\n";
    notifyObservers();
   }

   // Block until a new buffer object is ready
   virtual void pumpSource() = 0;

   // Borrow current buffer object
   virtual T &borrowBuffer() = 0;

   // Return a newly allocated copy of the current buffer object
   T *copyBuffer() {
    return new T(borrowBuffer());
   }
 };

} /* namespace filtergraph */
#endif /* SOURCE_HPP_ */


#ifndef SINK_HPP_
#define SINK_HPP_

#include "SimpleObserver.hpp"
#include "Source.hpp"

namespace filtergraph {
 using namespace std;
 using namespace simple_observer;

 template<class T>
 class Sink: public Observer {
 public:
   Sink() :
     Observer() {
   }

   virtual ~Sink() {
   }

   // Connect to the given source
   void subscribeTo(Source<T> &source) {
    source.addSubscriber(*this);
   }

   // Disconnect from the given source
   void unsubscribeFrom(Source<T> &source) {
    source.removeSubscriber(*this);
   }

   void handleObserverNotification(Observee &observee) {
    handleSource((Source<T> &) observee);
   }

   // Called by sources when a new buffer is available
   virtual void handleSource(Source<T> &source) = 0;
 };

} /* namespace filtergraph */
#endif /* SINK_HPP_ */
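SimpleObserver.hpp itself isn't shown in this post, but the API used by Source and Sink implies roughly the shape below. This is my sketch of it: registerObserver, removeObserver and handleObserverNotification appear in the headers above, while notifyObservers is an assumed name for the broadcast call.

```cpp
#ifndef SIMPLEOBSERVER_HPP_
#define SIMPLEOBSERVER_HPP_

#include <list>

namespace simple_observer {

 class Observee;

 // An Observer gets a callback whenever an Observee it is
 // registered with broadcasts a notification.
 class Observer {
 public:
   virtual ~Observer() {
   }
   virtual void handleObserverNotification(Observee &observee) = 0;
 };

 // An Observee keeps a list of registered Observers and can
 // notify them all in registration order.
 class Observee {
 public:
   virtual ~Observee() {
   }

   void registerObserver(Observer &observer) {
    observers.push_back(&observer);
   }

   void removeObserver(Observer &observer) {
    observers.remove(&observer);
   }

   void notifyObservers() {
    for (std::list<Observer *>::iterator it = observers.begin();
        it != observers.end(); ++it)
     (*it)->handleObserverNotification(*this);
   }

 private:
   std::list<Observer *> observers;
 };

} /* namespace simple_observer */
#endif /* SIMPLEOBSERVER_HPP_ */
```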


Filtergraph documentation

The previous post introduced my plan of creating a filter graph library to make it easy to integrate different audio/video systems.

This post will contain design notes on the filter graph library for my own reference.

  • I will specify a precise terminology to make it simple to talk about the various parts of the system. 
  • I will use some "plumber" analogies as a basis for my terminology.
  • I will use boost::signal for propagating events about changes to the graph structure as well as the flow of data through the graph.
  • I will separate the concerns of graph structure from the concerns about data passing through the graph so that the filter graph code may be used to build graphs that can handle buffers of any kind.
Filtergraph terminology illustration

  • Graph: A system of one or more nodes linked together.
  • Node: A single component in the graph.
  • Pad: A connection point on a node which can be connected to exactly one other pad on another node. Can be either an input or an output.
  • Source: A node that has only output pad(s) and no input pads.
  • Sink: A node that has only input pad(s) and no output pads.
  • Filter: A node that has both input and output pad(s)
  • Connection: A link from an output pad of one node to an input pad of another node.
  • Pump: A node that determines the flow of control throughout the graph. A node that drives the graph, either by pushing data to node(s) connected to its output(s) or by pulling data from node(s) connected to its input(s), is said to act as a pump. Usually a single source or sink in the graph acts as the pump.
  • Simple: A node with only one input pad and/or one output pad is said to be simple. This makes it possible to talk about such things as "simple sink", or "simple filter pump".
  • Buffer: A piece of data traveling through the graph. For video, the content of the buffer usually corresponds to one frame of video. For audio, it usually corresponds to a certain length of time in audio samples.
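To make the terminology concrete, here is a toy, self-contained graph (an illustration only, not the actual filtergraph code): a simple source acting as pump pushes integer buffers through a simple filter into a simple sink.

```cpp
#include <vector>

// Toy illustration of the terminology above, not the real library:
// a CounterSource (simple source acting as pump) pushes integer
// buffers through a DoubleFilter (simple filter) into a CollectSink
// (simple sink).
class IntSink {
public:
	virtual ~IntSink() {}
	virtual void receive(int buffer) = 0; // accept a buffer on the input pad
};

// Simple filter: one input pad, one output pad.
class DoubleFilter: public IntSink {
public:
	explicit DoubleFilter(IntSink &out) : out(out) {}
	void receive(int buffer) { out.receive(buffer * 2); }
private:
	IntSink &out;
};

// Simple sink: one input pad, no output pads.
class CollectSink: public IntSink {
public:
	std::vector<int> buffers;
	void receive(int buffer) { buffers.push_back(buffer); }
};

// Simple source acting as pump: no input pads; it drives the whole
// graph by pushing one buffer per pump cycle.
class CounterSource {
public:
	explicit CounterSource(IntSink &out) : out(out), next(0) {}
	void pump() { out.receive(next++); }
private:
	IntSink &out;
	int next;
};
```

Pumping three times sends buffers 0, 1, 2 through the filter, so the sink collects 0, 2, 4.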



I have turned my attention to the software part of the robot. I have decided to attack the challenge of efficiently distributing buffers of audio/video to the different detectors and processors that will be required in the software stack.

Why is this a challenge?

If all processors and detectors were part of the same "package" then this would probably not be a challenge at all, but even before getting started I am aware of 3 separate systems that I will have to integrate against:
  • libav: input and basic preprocessing of audio and video content
  • OpenCV: Most video related detectors will use this
  • CMU Sphinx: speech recognition
I suspect this list will continue to grow rapidly as I extend my ambitions in the future.

I have looked wide and far for solutions to accommodate the need to interface this diverse mix of specialized software packages. My closest bet was libavfilter, but I have decided against using it because it fails to hide many complexities that are a direct result of libavfilter being written in C while aiming to be ultimately efficient and flexible at the same time. In my humble opinion you may choose any two out of those three and succeed in making something that is easy to use.

So what options remain?

Making my own of course! I have some experience with this from previous projects where ultimate efficiency was a goal (and before libavfilter was an option). Unfortunately I won't be able to reuse the code since it is proprietary to one of my previous employers.

Goals of the project:
  • Make it easy to integrate between the software packages I will use
  • Write it in standard C++/STL/boost
  • Use templates to hide complexities and keep efficiency at a maximum
  • Keep it simple and clean.
  • Make it somewhat easy to get started with.
Maybe it will result in a reusable library someday, but don't hold your breath! I will release the code for it when it's usable.


Useful sites

Mostly for my own reference, here is a list of links to sites that offer hardware,  software and data related to video processing, SLAM, natural language processing and logic/reasoning and other topics of interest to the DEVOL project.


4-stroke Engine

Software stack schematic

As with my previous post, here is a draft of how I plan to lay out the software in the DEVOL robot.

DEVOL Software Stack schematic draft

It can be broken down to the following components:

Audio and Video inputs are filtered through a graph of detectors, recognizers and other more or less involved processes that translate them into useful events such as people, objects, facial expressions, distances, location and so forth.

These events are gathered together with sensor inputs in an event inventory where all events are classified, sorted, persisted, enriched and refined in real-time.

The core of the system is the "consciousness driver", which is simply the main loop of the program. It relies heavily on an array of close assistants, each responsible for its own area of expertise: keeping up communications with HQ, logical inference, keeping track of objectives, keeping track of ongoing dialogues, keeping track of appearance in the form of pose and avatar, and so on.

The consciousness driver will base its decisions on the content of the event inventory and its decisions will result in changes to pose, additions to the local logic facts database, output of audio dialogue and display of avatar on screen.
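In very simplified pseudo-real C++, that driver loop could look something like the sketch below. Every name here (Event, EventInventory, drive_consciousness_once) is a hypothetical placeholder for illustration, not actual DEVOL code.

```cpp
#include <deque>
#include <string>

// Hypothetical sketch of the "consciousness driver" main loop:
// consume classified events from the inventory and turn them into
// decisions (pose changes, new logic facts, dialogue, avatar updates).
struct Event {
	std::string kind; // e.g. "face", "object", "distance"
};

class EventInventory {
public:
	void add(const Event &e) { events.push_back(e); }
	bool empty() const { return events.empty(); }
	Event next() {
		Event e = events.front();
		events.pop_front();
		return e;
	}
private:
	std::deque<Event> events;
};

// One decision cycle; returns the number of decisions made.
int drive_consciousness_once(EventInventory &inventory) {
	int decisions = 0;
	while (!inventory.empty()) {
		Event e = inventory.next();
		// ... consult the objective, dialogue and logic assistants here ...
		++decisions;
	}
	return decisions;
}
```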

Power and electronics schematic

I made a first sketch for the power and electronics diagram for DEVOL.
DEVOL power and electronics schematic
In essence the robot will rely on a 12V lead-acid battery as the main source of power. A small gasoline-powered generator will serve as a means of keeping this battery charged when in the field.

Power will be distributed from the battery via two separate regulators: a delicate and stable regulator for controllers and logic, and a more robust and protective regulator for the power-hungry actuator motors.

The system is kept modular so that the different components such as visual input, strategic planning and real-time control may be handled by a separate computer (I suspect that especially the vision part will require a rather powerful computer).

The actuators are connected to a serial bus that distributes commands for each actuator from the real-time controller. Power is distributed along a separate power rail.

Visual input is provided by a calibrated stereo pair (two Logitech 9000 Pro) and another "long range" camera (a Logitech 9000 Pro with a mounted tele-lens). The whole camera assembly will be mounted on a pan-tilt rig, guarded from the elements by a glass/plastic dome.

For audio, I will use a hand-held Zoom H1 stereo recorder, which provides a high-quality, low-latency sound card and high-quality microphones while requiring very little power. It also has a third input where I intend to plug in a long-range, so-called "shotgun" microphone.

This is the first draft, expect drastic changes!

CNC Mill

When working with my first prototype robot limb, I decided to make it from plastic tubing. This decision was made mostly because of cost since plastic tubing is very cheap. Another aspect was availability and space. It became clear after creating this first prototype that I would need other materials to construct my prototypes from.

Right now I don't have space in my apartment for many tools. I am looking to buy a house soon, and hopefully it will have plenty of room for my workshop in the basement.

Looking farther into the future, a lot of the parts for the robot will inevitably be machined from metal, and that requires me to get hold of a mill and other metal-working tools. Since a CNC mill is really expensive, and since it is basically just a robot with 3 actuators (or 4 if you get fancy), I have decided to create my own CNC mill as part of this project.

Since the creation of this CNC mill is not the primary goal of the project, I have decided to cut many corners to speed up its construction. I have cheated and bought a book on the subject: basically a guide written by someone like me who wanted to make their own CNC mill.

I will take its advice, and use parts from the book directly, but change some things to better suit my own needs.


The best robots are free

As you may have noticed, there has been a gap of over 9 months in my blog. This has its valid reasons: you see, I made (with a little help) an autonomous, waterproof, self-aware, self-repairing biped robot with stereo vision and extremely advanced psycho-visual and psycho-aural perception capabilities connected to its extreme high-dynamic-range stereo visual and aural inputs, that runs solely on biological waste and oxygen from the atmosphere.

In other words, I just got my baby boy, and I can tell you that even if it doesn't come with a manual, it is pretty sweet! He and I will probably continue this blog, albeit at a lower pace, when time and budget allow!

My most advanced robot thus far...