Showing posts with label vision. Show all posts

2017-10-04

Announcement: I am 94% person!

I am happy to announce that darknet has found me to be 94% person. That should put the case to rest.





I built it with CPU support only, since the CUDA install failed completely on my system, so it only runs the detection at 0.1 fps.
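For anyone wanting to reproduce the CPU-only build, something like the following should do it. These commands are a sketch from memory against the pjreddie/darknet repository; the config and weight file names may have changed since, so treat them as assumptions rather than a recipe.

```shell
# Clone and build darknet without CUDA. GPU=0 is the Makefile default;
# GPU=1 would require a working CUDA install.
git clone https://github.com/pjreddie/darknet
cd darknet
make GPU=0
# Run detection on a test image (weights assumed downloaded separately)
./darknet detect cfg/yolo.cfg yolo.weights data/person.jpg
```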

2016-01-04

OctoMY™ on github

I decided to wrap my current code into a project and put it on GitHub. I also made a project page on Google Sites and bought a domain name for it. The logo I sketched out real quick in Inkscape looks like this:


Logo for eight-legged madness!
The logo in SVG format can be downloaded here

So far I have only posted my work-in-progress code, which compiles without errors on Ubuntu but does not yet do anything useful. I will keep this updated as I make the code more useful!





2013-08-13

Riftcam

I have previously written about OpenGL code for the Oculus Rift, and now, after receiving my own, I have yet to go further with the perspective-correction code. The truth is that even though the Rift is one awesome gadget that I believe will change the way we interact with computers in the near future, I still have a strong mandate in this project that I have to prioritize: building my robot. Until I actually need the Rift for something in the project, it will be sitting in its box.

Oculus Rift illustration
That being said, I have contacted several surveillance-camera manufacturers on Alibaba with requests about creating a stereo-vision PTZ dome camera with Oculus-compatible wide-angle lenses, and I have actually received some responses. I will keep this blog updated.


2013-02-09

OpenGL source code for Oculus Rift

If you are at all interested in gadgets, graphics, or games, you have surely picked up news about the up-and-coming disruptive technology that is the Oculus Rift. If you haven't, you should definitely check it out.

I have already ordered my developer kit, and I am looking forward to using the head-mounted display as a way to present the UI for my robot.

Since there isn't really much to go on yet when it comes to code examples and such, I decided to create example source code for the Oculus Rift using OpenGL; a little tutorial, if you will, for rendering a scene in OpenGL in a way that will be adaptable to the VR gadget when it arrives.

First, some explanation. When you render in 3D, you create what is called a frustum: a truncated pyramid that defines the view volume, protruding from your viewpoint in the scene in the direction you are viewing.


Regular perspective frustum in OpenGL

This is fine when you are rendering to a regular monitor. However, when you want to display the result on any form of stereoscopic display, such as a 3D TV or VR goggles, you have to render TWO frustums, one for each eye.

Stereoscopic perspective frustum in OpenGL

Since most stereoscopic displays today have a moderate field of view, the image will not be very immersive at all. The Oculus Rift changes this by boosting the field of view (also known as the view angle) to 110 degrees. This fills most of our natural field of view and, together with the stereoscopic 3D effect, gives a very immersive result.

Wide angled stereoscopic perspective frustum (Oculus Rift style) in OpenGL

So how is this done in OpenGL? This entry in the OpenGL FAQ sums it up really nicely.

What are the pros and cons of using glFrustum() versus gluPerspective()? Why would I want to use one over the other?
glFrustum() and gluPerspective() both produce perspective projection matrices that you can use to transform from eye coordinate space to clip coordinate space. The primary difference between the two is that glFrustum() is more general and allows off-axis projections, while gluPerspective() only produces symmetrical (on-axis) projections. Indeed, you can use glFrustum() to implement gluPerspective(). However, aside from the layering of function calls that is a natural part of the GLU interface, there is no performance advantage to using matrices generated by glFrustum() over gluPerspective().
Since glFrustum() is more general than gluPerspective(), you can use it in cases when gluPerspective() can't be used. Some examples include projection shadows, tiled renderings, and stereo views.
Tiled rendering uses multiple off-axis projections to render different sections of a scene. The results are assembled into one large image array to produce the final image. This is often necessary when the desired dimensions of the final rendering exceed the OpenGL implementation's maximum viewport size.
In a stereo view, two renderings of the same scene are done with the view location slightly shifted. Since the view axis is right between the “eyes”, each view must use a slightly off-axis projection to either side to achieve correct visual results.

In other words, the glFrustum call lets you set up a projection matrix with the necessary offset. But how should we go about rendering the scene? The Oculus Rift expects the image for each eye to be rendered side by side, so we simply render the scene twice, using the proper viewport each time. Again, from the OpenGL FAQ:

9.060 How can I draw more than one view of the same scene?
You can draw two views into the same window by using the glViewport() call. Set glViewport() to the area that you want the first view, set your scene’s view, and render. Then set glViewport() to the area for the second view, again set your scene’s view, and render.
You need to be aware that some operations don't pay attention to the glViewport, such as SwapBuffers and glClear(). SwapBuffers always swaps the entire window. However, you can restrain glClear() to a rectangular window by using the scissor rectangle.
Your application might only allow different views in separate windows. If so, you need to perform a MakeCurrent operation between the two renderings. If the two windows share a context, you need to change the scene’s view as described above. This might not be necessary if your application uses separate contexts for each window.
Without further ado, here is my working code for a stereoscopic view, which I think will work pretty well with the Oculus Rift from what I have gathered. It might need some tweaking with respect to the projection, as they have been talking about "adjusting for fisheye effect"; however, I assume that will be easy to do with a custom projection matrix.

/*
 * StereoView.hpp
 *
 *  Created on: Feb 7, 2013
 *      Author: Lennart Rolland
 */

#ifndef STEREO_VIEW_HPP_
#define STEREO_VIEW_HPP_

#include "GLStuff.hpp"
#include "View.hpp"

#include <cmath>

using namespace std;
// Degrees-to-radians conversion factor (pi / 180)
const float DTR = 0.0174532925f;
// Intraocular distance (distance between eyes, should match the real distance between the eyes of the viewer when realism is a goal)
const float IOD = 0.5f;

class StereoView: public View {
private:

 class Eye {
 private:
  float left;
  float right;
  float bot;
  float top;
  float translation;
  float near;
  float far;
 public:

  Eye(float lf, float rf, float bf, float tf, float mt, float near, float far) :
    left(lf), right(rf), bot(bf), top(tf), translation(mt), near(near), far(far) {
  }

  void apply() {
   glMatrixMode (GL_PROJECTION);
   glLoadIdentity();
   //Set view frustum
   glFrustum(left, right, bot, top, near, far);
   //Translate to cancel parallax
   glTranslatef(translation, 0.0, 0.0);
   glMatrixMode (GL_MODELVIEW);
  }
 };

 int w, h;
 float aspect, top, right, shift, distance;
 Eye eyeLeft, eyeRight;
 bool useViewports;

 void init(void) {
  glMatrixMode (GL_PROJECTION);
  glLoadIdentity();
  glMatrixMode (GL_MODELVIEW);
  glLoadIdentity();
 }

 void drawSceneInstance(Scene &scene, Engine &e) {
  glPushMatrix();
  //Translate to screen plane
  glTranslatef(0.0, 0.0, distance);
  scene.render(e);
  glPopMatrix();
 }

 void selectEye(bool left) {
  //Use viewports
  if (useViewports) {
   const int w2 = w / 2;
   glViewport(left ? 0 : w2, 0, w2, h);
   glScissor(left ? 0 : w2, 0, w2, h);

   glEnable (GL_SCISSOR_TEST);
   glClearColor(left ? 1.0 : 0, 0, left ? 0 : 1.0, 1.0);
   glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
   glDisable(GL_SCISSOR_TEST);

   //cout << "viewport left:" << left << "\n";
  }
  //Use native OpenGL stereo back buffers
  else {
   glDrawBuffer(left ? GL_BACK_LEFT : GL_BACK_RIGHT);
   //cout << "buffer left:" << left << "\n";
  }
 }

public:
 StereoView(int w = 1280, int h = 720, bool useViewports = true, float near = 3.0, float far = 30.0, float fov = 110, float screenZ = 10.0, float distance = -10.0) :
   w(w), h(h), aspect(double(w) / double(h)), top(near * tan(DTR * fov / 2)), right(aspect * top), shift((IOD / 2) * near / screenZ), distance(distance), eyeLeft(-right + shift, right + shift, -top, top, IOD / 2, near, far), eyeRight(-right - shift, right - shift, -top, top, -IOD / 2, near, far), useViewports(useViewports) {
 }

 virtual ~StereoView() {
 }

 void resize(int w, int h) {
  float fAspect, fHalfWorldSize = (float) (1.4142135623730950488016887242097 / 2);
  glViewport(0, 0, w, h);
  glMatrixMode (GL_PROJECTION);
  glLoadIdentity();
  if (w <= h) {
   fAspect = (GLfloat) h / (GLfloat) w;
   glOrtho(-fHalfWorldSize, fHalfWorldSize, -fHalfWorldSize * fAspect, fHalfWorldSize * fAspect, -10 * fHalfWorldSize, 10 * fHalfWorldSize);
  } else {
   fAspect = (GLfloat) w / (GLfloat) h;
   glOrtho(-fHalfWorldSize * fAspect, fHalfWorldSize * fAspect, -fHalfWorldSize, fHalfWorldSize, -10 * fHalfWorldSize, 10 * fHalfWorldSize);
  }
  glMatrixMode (GL_MODELVIEW);
 }

 void renderView(Scene &scene, Engine &e) {
  init();
  gluLookAt(pos.x, pos.y, pos.z, dir.x, dir.y, dir.z, up.x, up.y, up.z);
  //Clear color and depth for all buffers
  glDrawBuffer (GL_BACK);
  glViewport(0, 0, w, h);
  //Left eye
  selectEye(true);
  eyeLeft.apply();
  drawSceneInstance(scene, e);
  //Right eye
  selectEye(false);
  eyeRight.apply();
  drawSceneInstance(scene, e);
  glDrawBuffer(GL_BACK);
  glViewport(0, 0, w, h);
  glDisable (GL_SCISSOR_TEST);
 }

};

#endif /* STEREO_VIEW_HPP_ */



2012-07-13

Insights from an expert

While shamelessly asking Mr. Samir Shaker for the code accompanying his paper on implementing the SLAM algorithm with OpenCL, I received very insightful and extremely relevant advice based on his hard-earned experience, which I have reproduced in part here (with his permission, of course):

 [...] it seems that you are using the AMD implementation of OpenCL. I have worked with both the AMD and Nvidia implementations extensively, and it would be safe to say that Nvidia's implementation is much faster and much more complete. The biggest flaw in the AMD implementation I would say is the lack of support for images in OpenCL. This is a driver issue, and they plan on supporting images eventually, but after all the time that has passed since the OpenCL standard they still haven't done so! My code uses images, so it would only run on an Nvidia implementation (for now).

Also, as a general remark, I would like to tell you that from experience (and a lot of reading), not all algorithms are faster on the GPU, even those that can be parallelized. Whether or not you get faster speeds relies on many factors. For example, off the top of my head, performance on the GPU depends on:

1- The speed of the GPU (seems obvious but): Most integrated GPUs and those on standard laptops (not high-end ones) are slower than the CPUs on board. So running an algorithm on those GPUs will prove much slower than running them on the CPUs available.

2- Type of algorithm: If the algorithm requires a lot of data transfer between the CPU and the GPU, this will likely be a huge bottleneck.

3- The GPU manufacturer: For now, Nvidia's implementation is much better than AMD's or Intel's, and this is natural since they got into the GPU computing game much earlier than the rest, and they kind of drew the path for all the rest.

4- If you are working on a mobile robot and computation is done on-board the robot (as opposed to wirelessly on a desktop PC), having a fast-enough GPU on-board is probably not feasible since those consume a lot of power, so it would be hard to procure a battery powerful enough to handle it.

5- In practice (at least in today's technology), the best time to use GPU computation is when you have a desktop PC with a high-end GPU from Nvidia, those that require a larger system power supply, and when you have an algorithm that can be easily parallelized.



2012-07-11

clsurf

Thanks to Mr. Erik Smistad's excellent minimalist introduction to OpenCL, I have managed to set up AMD's OpenCL implementation on my development computer.

Soon after, I had clsurf up and running. It required some massaging in order to work with a recent OpenCL version:
#define CL_USE_DEPRECATED_OPENCL_1_1_APIS
But it eventually compiled and ran successfully using the CPU device (I don't have a dedicated GPU on my dev computer) to produce this lovely image:
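For anyone hitting the same deprecation errors: as far as I can tell, the define only takes effect if it appears before the OpenCL header is included, roughly like this (the header path may differ between SDKs):

```cpp
// Must come before the OpenCL header, or the deprecated
// 1.1 entry points used by clsurf remain hidden.
#define CL_USE_DEPRECATED_OPENCL_1_1_APIS
#include <CL/cl.h>
```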

Lena with SURF features marked with clsurf

Not bad for a midnight hack!

2012-07-08

Flexibility, performance and scalability. Yes please!

I am very excited to have discovered that the paths of three distinct fields of interest may intersect in an "almost too good to be true" combination of performance, scalability, flexibility, and developer-friendliness.

I am talking about the vision code for my robot combined with OpenCL and LLVM. It turns out that many common vision algorithms I will need, such as SURF, can use OpenCL to reach new levels of performance and scalability on modern massively parallel hardware such as GPUs and APUs. Since OpenCL is inherently quite involved, the idea of an LLVM backend that automatically translates C/C++ code directly into compact, optimized OpenCL kernels is really appealing.

And Simon Moll may just have made this possible through his thesis work, which he has released. Thanks, dude!

I hope to look into this as soon as I get more experience working with OpenCL in general.


2012-07-01

First glimpse

My robot has finally been given the gift of visual perception. In other words, my filtergraph code can now record video. Behold DEVOL's very first visual impression: