The OctoMY™ Blog: January 2025

2025-01-30

Construct using QRhi

Qt Rendering Hardware Interface is the underlying mechanism that Qt uses to do hardware accelerated rendering cross platform across many heterogenous platforms such as

OpenGL
OpenGL ES
Vulcan
DirectX (Microsoft)
Metal (Apple)

Previously Qt relied on OpenGL under the hood, but due to the evolution of graphics APIs the decision was made to tranistion to this abstraction layer.

The way QRhi works is by standardizing on a low level "common denominator" API similar to that of Vulcan and then adorning it with well integrated Qt spesific tools, helper classes and helper functions to make it as easy as possible to work with.

For example, Qt Shader Tools (qst) is a commandline tool that allows taking as input shaders in a single source language and spit out a compiled shader binary that is compatible across platforms.

Another example is how easy it is to use existing Qt resource management to work with texture data:

void ExampleRhiWidget::updateCubeTexture()
{
    QImage image(CUBE_TEX_SIZE, QImage::Format_RGBA8888);
    const QRect r(QPoint(0, 0), CUBE_TEX_SIZE);
    QPainter p(&image);
    p.fillRect(r, QGradient::DeepBlue);
    QFont font;
    font.setPointSize(24);
    p.setFont(font);
    p.drawText(r, itemData.cubeText);
    p.end();

    if (!scene.resourceUpdates)
        scene.resourceUpdates = m_rhi->nextResourceUpdateBatch();
    scene.resourceUpdates->uploadTexture(scene.cubeTex.get(), image);
}

We want to create a virtual environment in OctoMY™ called construct that allows us to edit robot layouts, preview 3D data and more. We want to use the modern approaches available in Qt to do this. Since the Qt3D project is deprecated, we will instead focus on using QRhi directly.

2025-01-07

OctoMY™ Anniversary 2025!

We are celebrating the 9th anniversary of OctoMY™ which is crazy!

Looking at the project history, you can see it has not been updated the last 2 years, so here is an attempt at explaining the status of the project.

After being dormant for a long time, 2024 marked a ramp up of development again. In summary it felt completely invigorating to finally focus on this passion again!
The project has gone through a bit of a leaning process with old archaic stuff being streamlined or removed entirely.
The user experience of basic functionality such as delivery, hardware configuration and pairing has been completely revamped and lots of bugs and ugly legacy was cleaned up.
Many components were re-arranged and re-structured to become more focused.
We are now 100% into Qbs as a build system and it is working great, much better than expected!
Side projects like website and a large chunk of python code relating to server side stuff has been somewhat aligned to the main project.
Website has gone through a few iterations of refinement. Since it is being co-developed with another project, some of the updates are still not visible on octomy.org but expect them early to mid 2025.
New feature development which is next in the pipeline is described below.

New development currently in focus early 2025:

During the clean-up effort mentioned, it became clear that the internal structure of the project was not going to work. The structure is very complex and so reasoning about it is hard, and making real commitment in the form of design and architectural documentation that can be followed during development was impossible.

Without going into too much technical detail, here are the points of contention:

There is no generalized way to represent the internal state of agents and other nodes that can be reliably transmitted to other nodes while giving the end user full understanding of the state as it is evolving. For example, if the agent has a headlamp then the chain of command for the headlamp is lamp <-power wire-> power-relay <-signal wire-> controller board <-usb wire-> agent phone <-unreliable network connection-> remote phone. In this chain, the app on the agent phone should show the actual status of the lamp (on/off) and ensure that the state is propagated to the remote app. Since the network connection is unreliable, we can never know for sure if the app is actually showing reality or not. So the UI needs to tell the user something about the certainty of each state it displays. Under the hood it also needs to prioritize transmitting confirmation of the most critical state. For example the headlamp is more important than a decoration lamp, but less important than the status of a running motor used for locomotion.
The ArduMY controller is amazing, however much of the code for it should really be generalized across all controllers. This is difficult be4acuse the code is complex and needs to be flawless for the system to work. The current version has a barrage of tests ensuring the quality which is awesome except now we are supposed to tear it apart and refactor it, which means we are in effect "destroying" a working system. The way forward is to create a new version and pick piece by piece from the old working version. This is painstaking work that needs to be completed before we can move on.
Finally bringing everything together is really hard because each part has their own complications. This is the network effect in place, or the Pareto principle if you will. The reason why most IT projects fail. Getting 80% of the way takes 20% of the effort. Given we spent 9 years getting to 80%, this does not bode well for our timeline going forward xD

To keep sane during all these refactoring efforts, we also focus on developing actual new features as a palette cleanser. A few very adjacent features have been included:

FACS support. One of the main pillars of OctoMY™ is to treat agents as real living creatures from the start. They deserve respect in their own right and they deserve their own way of expressing themselves. This means they will have a personality, a voice and a face! So far we have put some effort into showing the eyes of agent. The natural progression of this is to make an expressive face. The most prominent research on this topic has culminated into an industry standard of sorts called FACS. Unfortunately there seems to be no good existing system of supporting FACS. Everyone, including researchers have put all their works into walled gardens outside the reach of open source projects like OctoMY™. So this will be an avenue where we can actually contribute work into a sorely needed sub-project.
As a direct sub-requirement if supporting FACS fully, we need to recognize facial expression from images/video streams and since OctoMY™ does not include opencv support, we have to make our own. This has prompted a requirement for HAAR like feature extraction algorithm to be implemented. We will support opencv XML files for feature descriptions, making it trivial to inter-operate with existing expertise developed in opencv without depending on the code.
Planned: support for other feature extraction algorithms like YOLO.

I think this is all for this anniversary status update.