Deep Learning Terms For Toddlers

The best way to learn is to teach others. That is why I am making this list of terms related to deep learning in a way that is meant to be easy to understand for total beginners like myself.

This list tries to start with the basics, and tries to be readable from start to finish by only referencing items that are already covered earlier. It also intentionally spreads out the buzzwords so they can be grasped one by one.

It will try to stick to one term when several terms mean the same thing, and it will try to resolve ambiguities and overlaps when possible.

But first a fair warning/std disclaimer: I am doing this only in the hope that I personally will learn from it, and I share it with you for its entertainment value only.

  • Nerve Cell: See Neuron.
  • Neuron: The kind of cell that brains are made of. Works by sending a pulse to other neurons when it receives a pulse. It only sends out a pulse if it receives pulses fast enough or strong enough for its liking. Neurons are modeled in software as "Artificial Neurons" to form "Artificial Neural Networks".
  • Synapse: A connection between the output of one Neuron and the input of another. This is where pulses travel between neurons. In a computer model synapses may be greatly simplified.
  • Transfer Function: See Activation Function.
  • Activation Function: The function in a neuron that triggers an output based on the sum of its inputs. There are several types (see the section below), but the most relevant for deep learning is called ReLU ("Rectified Linear Unit").
  • Neural Network: A number of neurons connected into a network. The number of neurons, number of connections, and the way in which the connections are made (aka the "architecture") can vary greatly and are all deciding factors for how the neural network will work, and what it can be used for.
  • Layer: In a Neural Network neurons can be arranged in layers. In each layer the neurons typically have their outputs connected only to neurons in the next layer of the model. Several layers containing different numbers of neurons can be connected, and each layer will then serve a separate purpose in the network. Connections in a layer typically go from one neuron in the previous layer to several neurons in this layer. Layers have different names depending on their use:
    • Input: Special purpose layer where data is first entered into the network
    • Output: Special purpose layer where results exit from the network
    • Hidden: "Normal" layer that simply stores or processes data through its neurons.
    • Pool: See this article.
    • Soft-max: See this article.
  • Connection topology:
    • Fully-connected: Every neuron is connected to every other neuron (N(N-1) directed connections for N neurons).
    • Locally-Connected: Only neighboring neurons are connected.
  • Connection Weight: See Weight.
  • Weight: Neurons trigger when the value they receive from their connected inputs reaches a certain level. The value from each of a neuron's inputs is multiplied by that connection's weight before entering the activation function, so even if 10 neurons all send us a strong pulse, they may be weighted down to 0 and not cause activation.
  • Learning: What the network "learns" is simply the values that it stores in its weights. The exact meaning of what a network learns we cannot know because the process by which the learning takes place is indirect and very complex, just like in a real brain. However we can use the network and its knowledge simply by passing pulses through it and recording the output. There are several forms of learning:
    • Supervised: The program knows the ground truth for the training data, so it knows when the output from the network is good or bad and can correct the network weights accordingly.
    • Unsupervised: The program does not know what is good or bad, but judges its performance by trying to replicate the input after it has gone through a network whose architecture does not allow the input to simply be copied verbatim, thereby forcing the network to "learn something, and demonstrate it later".
    • Back-propagation: A method of supervised learning where the weights of the connections in a neural network are adjusted to minimize the error of the output generated by the network when compared to ground truth.
    • Reinforcement Learning: Training with sparse training data, learning by reinforcing good behavior without the need for extensive training data.
  • Network Types: As mentioned in the section about Neural Networks, they can be arranged in all sorts of ways. Here is a list of network types:
    • Perceptron: A single layer neural network that classifies input as either "in" or "out" (a.k.a. binary classification). For example, it can determine if the input is an image of a dog or not.
    • MLP ("Multi Layer Perceptron"): A network built from several layers of perceptron-like neurons. See "Perceptron".
    • CNN ("Convolutional Neural Network")
    • FNN ("Feedforward Neural Network"): An artificial neural network where the neurons do not form cycles (as opposed to recurrent neural networks).
    • RNN ("Recurrent Neural Network"): A network where neurons form "cycles" (as opposed to feedforward neural networks). This imposes fewer restrictions on which connections may be made between neurons in the network. Several related concepts:
      • Fully Recurrent Network:
      • Hopfield Network:
      • Elman Network:
      • Jordan Network:
      • ESN ("Echo State Network"):
      • LSTM ("Long Short Term Memory Network"):
      • BRNN ("Bi-Directional Recurrent Neural Network"):
      • CTRNN ("Continuous Time Recurrent Neural Network"):
      • Hierarchical:
      • Recurrent Multilayer Perceptron:
      • Second Order Recurrent Neural Network:
    • Cognitron: An early implementation of a self-contained multi-layered neural network that used unsupervised learning.
    • Neocognitron: Evolution of the cognitron that improved on some of its shortcomings.
    • Topographic map: See Self Organizing Feature Map
    • Kohonen map: See Self Organizing Feature Map
    • Kohonen network: See Self Organizing Feature Map
    • SOM ("Self Organizing Map"): See Self Organizing Feature Map
    • DBN ("Deep Belief Network")
    • SOFM ("Self Organizing Feature Map"): A network that applies competitive learning to create a "map" that organizes itself to map the input space.
    • GSOM ("Growing Self Organizing Map"): A variant of SOM where nodes are added to the map following some heuristic. It was invented because deciding on a map size that works well is difficult; in GSOM the map starts small and grows adaptively until it is "big enough".
    • Diabolo Network: See Autoencoder.
    • Autoassociator: See Autoencoder.
    • Autoencoder: A non-recurrent neural network with the same number of inputs as outputs and with at least one hidden layer, that when receiving as input the value X is trained not to generate some output Y but to reconstruct its input X. Autoencoders are inherently well suited to unsupervised learning.
      • Denoising autoencoder: An autoencoder that is trained on corrupted versions of the input in order to make it learn more robust features.
      • VAE ("Variational Autoencoder"): A generative model that is similar in architecture to a normal autoencoder, but has a completely different usage.
  • Ground truth: When training a neural network we supply an input and expect an output. The ideal expected output is called ground truth.
  • Deep Neural Network: The deepness of a neural network refers to the number of layers in the network, where typical deep networks have more than 2 hidden layers.
  • Oldschool methods: Not related to neural networks, but common hand-crafted methods for solving problems that have been surpassed by recent deep learning approaches.
    • SIFT.
    • SURF.
  • Boltzmann Machine: A wobbly and fun neural network that is run continuously to reach different states. Despite their intriguing nature, such networks are useless unless they are restricted in particular ways.
  • RBM ("Restricted Boltzmann Machine"): A neural network that can learn the probability distribution of its inputs.
  • Generative model: A model that generates random output conforming to a preset distribution.
  • Stochastic neural network: A neural network that has either stochastic activation functions or random weights assigned to it. Used in training to help avoid getting stuck in a local minimum.
  • Greedy Learning: train each component of a network on its own instead of trying to train the whole network at once.
  • Convnet: See Convolutional Neural Network.
  • Training protocol: How a network is trained
    • Purely supervised
    • Unsupervised layerwise, supervised classifier on top
    • Unsupervised layerwise, global supervised fine-tuning
  • Intermediate representations: What the network or architecture actually learned at each level.
  • Reward Function: See Objective Function
  • Profit Function: See Objective Function
  • Fitness Function: See Objective Function
  • Cost function: See Objective Function.
  • Loss function: See Objective Function.
  • Objective Function: A function that assigns a number to an event, expressing how costly or how valuable it is. If we wish to minimize that number, the function may be referred to as a loss function or cost function. If we wish to maximize it, we refer to it as a reward function.
  • Optimization: Finding the input that provides the best output from an objective function. In a loss function this means the lowest output, in a reward function it means the highest output.
  • Gradient: The direction and steepness of the slope at a point on a surface with dips and hills. It points in the direction of the steepest upward slope.
  • Gradient Descent: Finding the nearest local minimum by repeatedly taking a step in the direction of the steepest downward slope.
  • Incremental gradient descent: See Stochastic Gradient Descent.
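  • SGD ("Stochastic Gradient Descent"): A gradient descent method where each step is based on the slope estimated from a randomly picked training sample (or a small batch of samples) instead of the whole training set. The steps are noisier but much cheaper to compute.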
  • Conjugate gradient: Alternative to gradient descent that is used for solving linear equations.
  • FPROP ("Forward Propagation"): To feed data into a neural network to get the resulting value output from the network. May also be called "testing" the network. The output can be compared with the ground truth/real value and the weights of the network can be adjusted based on the deviation.
  • BPROP ("Back Propagation"): To minimize error in the network, you propagate backwards by finding the derivative of the error with respect to each weight and then subtracting this value (usually scaled down by a small factor called the learning rate) from the weight value.
  • Energy based unsupervised learning:
    • PSD ("Predictive Sparse Decomposition"): TODO.
    • FISTA ("Fast Iterative Shrinkage-Thresholding Algorithm"): TODO.
    • LISTA ("Learned Iterative Shrinkage-Thresholding Algorithm"): TODO.
    • LcoD ("Learning Coordinate Descent"): TODO.
    • DrSAE ("Discriminative Recurrent Sparse Auto-Encoder"): TODO.
  • Deep Learning: Broad term relating to working with "deep networks" and related technology to solve problems.
  • Manifold: A 2D surface wrapped in 3D space (think of a curled-up sheet of paper). Locally it can be treated as a flat Cartesian coordinate system, even though each point on it also has 3 coordinates in the surrounding space.
  • Entangled Data Manifold:
  • Linear separability: When a line (in 2D), plane (in 3D) or hyperplane (in n-D) can be found such that two distinct sets of points can be completely separated by it.
  • Hebb's rule: In a biological brain synapses that are used often are strengthened while synapses that are not will weaken.
  • Feature Extractor: Something that derives values (features) from input data. The features are intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps.
  • Invariant Feature Learning: TODO.
  • Trainable Feature: Something that can be understood by a neural network. In deep learning each layer typically learns features of a different level of abstraction. Low-level features are learned at layers close to the input layer, while for each progressive layer in the model more complex and abstract (high-level) features are learned.
    • Image Recognition
      • Pixel
      • Edge
      • Texton: Textons refer to fundamental micro-structures in generic natural images and the basic elements in early (pre-attentive) visual perception.
      • Motif
      • Part
      • Object
    • Text
      • Character
      • Word
      • Word group
      • Clause
      • Sentence
      • Story
    • Speech
      • Sample
      • Spectral Band
      • Sound
      • Phone
      • Phoneme
      • Word
  • Classification: Determine to which category an observation belongs.
  • SVM ("Support Vector Machine"):
  • CV ("Computer Vision"): The field of study relating to processing of visual data in computers such as images and videos.
  • Feature: An artifact such as a pattern in an image or a phone in an audio sample that can be reliably detected, tracked, parameterized and stored by feature recognizers for processing by feature classifiers.
    • Haar like features: See Haar Features.
    • Haar features:
  • Integral image: Image representation, or look-up-table where sums of intensities are stored per pixel instead of direct image intensities. This simple structure is used to optimize performance of some computationally expensive image processing algorithms that depend on these sums.
  • Feature Engineering: Hand-crafting of feature detectors.
  • Stride: How many units (such as pixels, neurons etc.) a sliding window travels between iterations while (image) data is fed into a network, either during training or testing.
  • Training: Changing a network to make it better at solving the problem for which it is being designed.
  • Testing: Using a network without changing it to either see how well the network is working during training or to actually use the output in production.
  • Ventral Pathway: A pathway in the mammal brain where visual input is recognized. It has several stages, each with its own intermediate representation, just like a deep neural network.
  • Sparse Modeling:
  • Neuron Cluster:
  • Shared Weights:
  • ICA ("Independent Component Analysis"):
  • Training set:
  • Receptive field: area in the visual cortex that detects concepts of different levels.
  • Datasets:
    • Barcelona
    • Imagenet
    • SIFT Flow
    • Stanford Background
    • NYU RGB-Depth Indoor Scenes
    • RGB-D People
    • MNIST http://yann.lecun.com/exdb/mnist/
    • INRIA http://pascal.inrialpes.fr/data/human/
    • GTSRB http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset
    • SVHN ("Street View House Numbers"): http://ufldl.stanford.edu/housenumbers/
    • Music:
      • GTZAN: http://marsyas.info/downloads/datasets.html
  • RGB-D ("Red Green Blue Depth"): Image format with 3 channels devoted to color and one last channel devoted to z buffer (depth).
  • Classification loss: TODO.
  • Reconstruction loss: TODO.
  • Sparsity Penalty: TODO.
  • Inhibition Matrix: TODO.
  • Invariant Features: TODO.
  • Lateral Inhibition: TODO.
  • Gibbs measure: See Gibbs distribution.
  • Gibbs distribution: TODO.
  • Semantic Segmentation: dividing image into regions that fit different labels.
  • Scene parsing: TODO.
  • Laplacian Pyramid: TODO.
  • Labeling: See Semantic Labeling.
  • Semantic Labeling:
    • Label every pixel of image with the object it belongs to
  • Overfitting: Training resulting in the network simply storing data instead of inferring actual knowledge from the input. 
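  • Regularization: Method to avoid overfitting by adding an extra term to the cost function that penalizes large weights. A parameter regulates the priority between finding small weights and minimizing the original cost.
  • Cross-entropy cost function: Cost function that avoids the learning slowdown seen with a quadratic cost when sigmoid neurons saturate, because the size of each weight update is driven by the error in the output rather than by the slope of the activation function.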
  • Dropout:
  • Epoch: One complete pass through the training set during training.
  • learning rate annealing:
  • Pre-training: Training a network using a different method before the actual training starts, to get a good set of initial weights suited to the particular training in question. Usually done to avoid common errors in training due to bad values for initial weights.
  • Deep Architecture: A way to organize a set of networks to conduct deep learning
    • Feed-Forward: multilayer neural nets, convolutional nets
    • Feed-Back: Stacked Sparse Coding, Deconvolutional Nets
    • Bi-Directional: Deep Boltzmann Machines, Stacked Auto-Encoders
  • Spatial Pooling:
    • Sum or max
    • Non-overlapping / overlapping regions
    • Role of pooling:
      • Invariance to small transformations
      • Larger receptive fields (see more of input)
  • Retinal mapping: See Retinotopy.
  • Retinotopy: The mapping of input from the retina to neurons.
  • Activation function types:
    • Sigmoidal function: Function with an S-shaped curve (for example a continuous transition from 0 to 1).
    • Sigmoid: 1/(1+exp(-x))
    • TanH
  • Winner-take-all: let neurons compete for activation.
  • Soft-max: TODO.
  • ReLU ("Rectified Linear Unit"):
    • Simplifies backpropagation
    • Makes learning faster
    • Avoids saturation issues
    • Preferred option
  • Simple Cell:  TODO.
  • Complex Cell: TODO.
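
To make the Neuron, Weight and Activation Function entries above a bit more concrete, here is a minimal sketch in Python. The function names (sigmoid, relu, neuron_output) are my own picks for illustration and do not come from any particular library. It only shows the idea of "multiply each input by its weight, sum, then run the sum through the activation function".

```python
import math


def sigmoid(x):
    """Sigmoid activation: 1/(1+exp(-x)), a smooth transition from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """TanH activation: like the sigmoid, but ranging from -1 to 1."""
    return math.tanh(x)


def relu(x):
    """ReLU activation: passes positive sums through, outputs 0 otherwise."""
    return max(0.0, x)


def neuron_output(inputs, weights, activation):
    """Weigh each input, sum the results, and apply the activation function."""
    total = sum(value * weight for value, weight in zip(inputs, weights))
    return activation(total)


# Ten neurons all send a strong pulse, but every weight is 0,
# so nothing reaches the activation function and ReLU stays silent.
strong_pulses = [1.0] * 10
print(neuron_output(strong_pulses, [0.0] * 10, relu))

# The same pulses with small positive weights do cause an output.
print(neuron_output(strong_pulses, [0.1] * 10, relu))
```

Swapping relu for sigmoid or tanh in the calls above changes only how the summed input is squashed, not how the weights are applied.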

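The entries on Ground truth, FPROP, BPROP and Gradient Descent can be tied together with a toy example. This is only a sketch under my own assumptions: a "network" with a single weight, a squared-error loss, made-up training data where the ground truth is always twice the input, and an arbitrarily chosen learning rate of 0.05. Real networks have many weights and layers, but the loop has the same shape.

```python
def forward(weight, x):
    """Forward propagation for a one-weight 'network': output = weight * x."""
    return weight * x


def train(samples, weight=0.0, learning_rate=0.05, epochs=50):
    """Adjust the weight by stepping against the derivative of the error."""
    for _ in range(epochs):  # one epoch = one pass over all samples
        for x, truth in samples:
            output = forward(weight, x)
            # Loss is (output - truth)^2, so its derivative with respect
            # to the weight is 2 * (output - truth) * x.
            error_gradient = 2.0 * (output - truth) * x
            weight -= learning_rate * error_gradient
    return weight


# Hypothetical training data: the ground truth is always 2 * input.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned_weight = train(samples)
print(learned_weight)  # ends up very close to 2.0
```

Updating after every single sample, as done here, is the "stochastic" flavor of gradient descent minus the random shuffling of samples.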

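The Integral image entry in the list above is easier to grasp with code. The sketch below is my own illustration (the names integral_image and region_sum are hypothetical). It builds the look-up table of sums once, and then gets the sum of any rectangle of pixels from just four look-ups.

```python
def integral_image(image):
    """Build a table where each cell holds the sum of all pixels above-left."""
    height = len(image)
    width = len(image[0]) if height else 0
    # One extra row and column of zeros keeps the look-ups free of edge cases.
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def region_sum(table, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = integral_image(image)
print(region_sum(table, 1, 1, 3, 3))  # 5 + 6 + 8 + 9 = 28
```
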
Existential woe; should robots be designed to be immortal or mortal?

While working on the pairing of nodes within the OctoMY™ ecosystem, I have hit upon an interesting, almost scary design choice: should the robots in OctoMY™ be immortal, or should their soul die with the hardware?

In reality, there are 3 valid choices:

  • All robots are immortal. When the hardware dies, simply copy the software to another robot to resume its existence.
  • All robots are mortal. When the hardware dies, copying the software to another robot will automatically generate a "new soul", and the old one will be lost forever.
  • Leave the choice up to the owner of the robot, so when the robot is "born" and the first configuration is completed, the robot will either be immortal or mortal depending on the user's preference.
This peculiar problem stems from the way nodes identify themselves when communicating. To inspire any form of trust when communicating with each other, the nodes must identify themselves with a signature string that is directly related to the public key used to securely transfer secrets among themselves.

If it is NOT related to the key then it would be easy to "spoof" the signature of another node and all sorts of security problems would arise.

Further, simply using the pub-key itself as a signature would be valid, but this would allow the pub-key to be copied should the old hardware die, hence the immortality.

Currently the robot signature is generated from some unique hardware parameters that differ between each node. Combining this unique "hardware fingerprint" into the signature of the node would mean that copying software from one node to another would yield a new and completely different signature, and so the "old signature would die with the old hardware", suggesting that the robot would be mortal.
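
The difference between the two kinds of signature can be sketched in a few lines of Python. This is not the actual OctoMY™ code. The function names, the choice of SHA-256 and the byte strings standing in for the public key and the hardware fingerprint are all assumptions made for illustration.

```python
import hashlib


def immortal_signature(public_key: bytes) -> str:
    """Signature derived from the key alone: it survives a move to new hardware."""
    return hashlib.sha256(public_key).hexdigest()


def mortal_signature(public_key: bytes, hardware_fingerprint: bytes) -> str:
    """Signature derived from key AND hardware: new hardware gives a new identity."""
    digest = hashlib.sha256()
    digest.update(public_key)
    digest.update(hardware_fingerprint)
    return digest.hexdigest()


# Placeholder values, not real keys or real hardware parameters.
public_key = b"example-public-key"
old_body = b"serial-0001"
new_body = b"serial-0002"

# Copying the key to a new body keeps the immortal signature intact...
print(immortal_signature(public_key) == immortal_signature(public_key))

# ...while the mortal signature changes, so the old identity is gone.
print(mortal_signature(public_key, old_body) == mortal_signature(public_key, new_body))
```

In both variants the signature still depends on the public key, so it cannot be claimed by a node that does not hold the matching key pair.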

I have been torn between the 3 alternatives above. On the one hand, I really think that both robots and human beings of the future will be immortal. But on the other hand, owning a robot today that "will die one day and there is nothing you can do to change that" gives it a human dimension.

I think the 3rd choice, letting the user decide, may be best. That way the user can sign the birth certificate of their robot themselves. After all, who am I to decide the mortality of your robots?


What is OctoMY™ ?

This post will be an introduction to concepts of OctoMY™.

I am putting all my spare time into this overly ambitious project, and there are some great things in store. However instead of waiting, I will let some details trickle out on the web early, just for good measure.

Ok so what is OctoMY™ ?

It is free and open-source software that you should be able to easily use in your hobby robot project to do a bunch of cool stuff.

Cool stuff like what? I can hear you say.

Well. This is where the "overly ambitious" part comes in. I know for this project to be successful I need to cram in some pretty cool stuff! At the same time, I know that there needs to be a lot of basic boring stuff in place too, because cool stuff usually relies on boring stuff to work. So from this notion the following plan has emerged.

OctoMY™ Agent
There are 4 "tiers" in the model:
  1. Agent: This is your robot
  2. Remote: This is your laptop or mobile device used to control said robot
  3. Hub: This is your server running in the cloud or in your basement (or even in your laptop) used to keep track of multiple robots and share and store data between them.
  4. Zoo: This is a central service run by the OctoMY™ project here. It is used to brag about your robot online and allow the public to see it. Basically it's a Facebook for your robots.
OctoMY™ Remote
Agent and Remote will be available as apps readily downloadable from Google Play, or as binaries for Ubuntu. Hub will most likely be available as a Docker image, or as Ubuntu/Debian binaries. Zoo is just there running from the cloud.

OctoMY™ Hub
OctoMY™ will try to handle all the boring communication stuff and security issues for you so that your robot will have privacy and stay safe.

That was the boring bit. Now for a list of cool stuff that this "boring" platform can enable:
  • Controlling swarms of quad-copters from your phone.
  • Having an army of hexapod robots roam an area and generate a common 3D map that can be used for accurate simultaneous localization and mapping (SLAM).
  • Having your robot "pick up strangers" online and exchange "love letters" with them.
  • Letting crowds see live streams from your robot.
  • Sending commands to your robots via Twitter.
  • Letting your robot collaborate on tasks with the robots of other OctoMY™ users online.
  • Letting others easily use the configurations and adaptations you make for your robot.
OctoMY™ Zoo
This list was just a quick one I made that was severely limited by my imagination really. The point is that with a common software stack like this and with many eager enthusiasts working on it, the possibilities are virtually endless.

Hopefully I will have some software that is worth downloading up soon.