The OctoMY™ Blog: July 2012

2012-07-27

CNC Mill and Honda GX25 engine ordered

I had previously decided to build my own CNC mill as part of this project. That was before I had money to spend, and before I found a great offer on a used CNC mill.

EMCO F1P-CNC Mill (to the right) with matching
EMCO tronic M1 controller (to the left).
Image shamelessly copied off the internet.

I have ordered it, and it will be arriving at my apartment in a week. It is an old "EMCO F1P" with a matching control/monitor unit "M1". More details will follow as I "unbox" it.

I also bought a Honda GX25 mini 4-stroke 1-cylinder petrol engine. If you have seen the post about power and electronics you will know that I intended to use a generator on board to provide longer range.

The engine arrived just one day after I ordered it. The unboxing will follow in a later post.

2012-07-13

Insights from an expert

While shamelessly asking Mr. Samir Shaker for the code accompanying his paper on using OpenCL while implementing the SLAM algorithm, I received very insightful and extremely relevant advice based on his hard-earned experience, which I have reproduced in part here (with his permission, of course):

[...] it seems that you are using the AMD implementation of OpenCL. I have worked with both the AMD and Nvidia implementations extensively, and it would be safe to say that Nvidia's implementation is much faster and much more complete. The biggest flaw in the AMD implementation I would say is the lack of support for images in OpenCL. This is a driver issue, and they plan on supporting images eventually, but after all the time that passed since the OpenCL standard they still haven't done so! My code uses images, so it would only run on an Nvidia implementation (for now).

Also, as a general remark, I would like to tell you that from experience (and a lot of reading), not all algorithms are faster on the GPU, even those that can be parallelized. Whether or not you get faster speeds relies on many factors. For example, off the top of my head, performance on the GPU depends on:

1- The speed of the GPU (seems obvious but): Most integrated GPUs and those on standard laptops (not high-end ones) are slower than the CPUs on board. So running an algorithm on those GPUs will prove much slower than running them on the CPUs available.

2- Type of algorithm: If the algorithm requires a lot of data transfer between the CPU and the GPU, this will likely be a huge bottleneck.

3- The GPU manufacturer: For now, Nvidia's implementation is much better than AMD's or Intel's, and this is natural since they got into the GPU computing game much earlier than the rest, and they kind of drew the path for all the rest.

4- If you are working on a mobile robot and computation is done on-board the robot (as opposed to wirelessly on a desktop PC), having a fast-enough GPU on-board is probably not feasible since those consume a lot of power, so it would be hard to procure a battery powerful enough to handle it.

5- In practice (at least in today's technology), the best time to use GPU computation is when you have a desktop PC with a high-end GPU from Nvidia, those that require a larger system power supply, and when you have an algorithm that can be easily parallelized.
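To make points 1 and 2 above concrete for myself, here is a rough back-of-the-envelope sketch (mine, not Mr. Shaker's). It assumes you have already measured three timings for one frame of work; the struct, the function names and the numbers in the usage note are all hypothetical. It only captures the simplest rule of thumb: the GPU wins when kernel time plus transfer time beats the CPU time.

```cpp
#include <cassert>

// Hypothetical container for measured timings, all in milliseconds.
struct OffloadEstimate {
    double cpuMs;       // time to run the algorithm on the CPU
    double gpuKernelMs; // time spent in the GPU kernel(s) alone
    double transferMs;  // host-to-device plus device-to-host copies
};

// Total cost of the GPU path: compute plus data transfer.
double gpuTotalMs(const OffloadEstimate& e) {
    return e.gpuKernelMs + e.transferMs;
}

// True when offloading is expected to be faster than staying on the CPU.
bool offloadPaysOff(const OffloadEstimate& e) {
    assert(e.cpuMs >= 0.0 && e.gpuKernelMs >= 0.0 && e.transferMs >= 0.0);
    return gpuTotalMs(e) < e.cpuMs;
}
```

With made-up numbers: a kernel that takes 5 ms but needs 50 ms of copying loses against a 40 ms CPU implementation, while the same kernel with 10 ms of copying wins comfortably. Power draw (point 4) is not modelled at all.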

2012-07-11

clsurf

Thanks to Mr. Erik Smistad's excellent minimalist introduction to OpenCL, I have managed to set up AMD's OpenCL implementation on my development computer.

Soon after, I had clsurf up and running. It required some massaging in order to work with a recent OpenCL version:

#define CL_USE_DEPRECATED_OPENCL_1_1_APIS

But it eventually compiled and ran successfully using the CPU device (I don't have a dedicated GPU on my dev computer) to produce this lovely image:

Lena with SURF features marked with clsurf

Not bad for a midnight hack!

2012-07-08

Flexibility, performance and scalability. Yes please!

I am very excited to have discovered that the paths of three distinct fields of interest may intersect in an "almost too good to be true" combination of performance, scalability, flexibility and developer-friendliness.

I am talking about the vision code for my robot combined with OpenCL and LLVM. It turns out that many common vision algorithms that I will need, such as SURF, may use OpenCL to reach new levels of performance and scalability on modern massively parallel hardware such as GPUs and APUs. Since OpenCL is inherently quite involved, the idea of an LLVM backend that automatically translates C/C++ code directly to compact and optimized OpenCL kernels is really appealing.

And this is just what Simon Moll may have made possible through his thesis work, which he has released. Thanks dude!

I hope to look into this as soon as I get more experience working with OpenCL in general.

2012-07-04

Detectors

Now that the filtergraph is operating, it's time to start implementing some detector nodes. Here are some "sourcing links" I have gathered as inspiration and potential starting points (for my own reference).

Performance will be an issue with many concurrently active detectors, and I have been giving some thought to how to solve that. One optimization strategy would be to lower the number of invocations of each detector to the minimum. Another is to adaptively disable detectors. For example, face detection may run every 100 frames (4 seconds) until a face is detected, upon which it may be run more often as long as faces are present.
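Here is a minimal sketch of that adaptive idea, just to pin down what I mean. The class name `AdaptiveSchedule` and its methods are hypothetical and not part of the filtergraph code; it assumes frame numbers only ever increase and that the detector reports back after every run.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper that decides on which frames a detector should run.
// It uses a long interval while nothing is detected and a short one
// for as long as the detector keeps finding something.
class AdaptiveSchedule {
public:
    AdaptiveSchedule(std::uint32_t idleInterval, std::uint32_t activeInterval)
        : idle_(idleInterval), active_(activeInterval) {
        assert(idleInterval > 0 && activeInterval > 0);
    }

    // Returns true if the detector should run on this frame.
    bool shouldRun(std::uint64_t frame) const {
        if (!everRun_) {
            return true; // always run once to get an initial answer
        }
        const std::uint32_t interval = detected_ ? active_ : idle_;
        return frame - lastRun_ >= interval;
    }

    // Records the outcome of a detector run on the given frame.
    void report(std::uint64_t frame, bool found) {
        lastRun_ = frame;
        everRun_ = true;
        detected_ = found;
    }

private:
    std::uint32_t idle_;
    std::uint32_t active_;
    std::uint64_t lastRun_ = 0;
    bool everRun_ = false;
    bool detected_ = false;
};
```

For the face detection example, `AdaptiveSchedule faces(100, 5);` would check every 100 frames while no face is visible and every 5 frames while one is. The value 5 is an arbitrary pick on my part.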

Another optimization strategy is to use a saliency (areas of interest) detection algorithm and increase the rate of the other detectors in the areas with high interest.
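One possible way to turn saliency into a detector rate is sketched below. It assumes the saliency algorithm yields a score between 0 and 1 per region, which is my assumption and not a property of any particular algorithm; the function name and the linear mapping are likewise just placeholders to experiment with.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical mapping from a region's saliency score in [0, 1] to the
// number of frames between detector runs for that region.
// Score 0 gives maxInterval (rarely checked), score 1 gives minInterval
// (checked often), with linear interpolation in between.
std::uint32_t intervalForSaliency(double saliency,
                                  std::uint32_t minInterval,
                                  std::uint32_t maxInterval) {
    assert(minInterval > 0 && minInterval <= maxInterval);
    // Clamp out-of-range scores instead of trusting the caller.
    const double s = std::max(0.0, std::min(1.0, saliency));
    const double span = static_cast<double>(maxInterval - minInterval);
    const double interval = static_cast<double>(maxInterval) - s * span;
    // Round to the nearest whole frame.
    return static_cast<std::uint32_t>(interval + 0.5);
}
```

A region with no interest and intervals of 5 and 100 frames would be checked every 100 frames, and a maximally interesting one every 5 frames. A non-linear curve may well turn out to work better in practice.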

I think the approach now is just to get a simple detector up and working, and take it from there.

2012-07-01

First glimpse

My robot has finally been given the gift of visual perception. In other words, my filtergraph code can now record video. Behold DEVOL's very first visual impression: