2016-05-28

Deep Learning Terms For Toddlers

The best way to learn is to teach others. That is why I am making this list of terms related to deep learning in a way that is meant to be easy to understand for total beginners like myself.

This list starts with the basics and aims to be readable from start to finish, referencing only items that have already been covered earlier. It also intentionally spreads out the buzzwords so they can be grasped one by one.

It will try to stick to one term when several terms mean the same thing, and it will try to clear up ambiguities and overlaps where possible.

But first a fair warning/standard disclaimer: I am doing this only in the hope that I personally will learn from it, and I share it with you for its entertainment value only.

  • Nerve Cell: See Neuron.
  • Neuron: The kind of cell that brains are made of. Works by sending a pulse to other neurons when it receives a pulse. It only sends out a pulse if it receives pulses fast enough or strong enough for its liking. Neurons are modeled in software as "Artificial Neurons" to form "Artificial Neural Networks".
  • Synapse: A connection between the output of one Neuron and the input of another. This is where pulses travel between neurons. In a computer model synapses may be greatly simplified.
  • Transfer Function: See Activation Function.
  • Activation Function: The function in a neuron that triggers an output based on the sum of its inputs. There are several types (see the list of activation function types further down), but the most relevant for deep learning is called ReLU ("Rectified Linear Unit"). A small sketch follows below.
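    A tiny Python sketch (my own toy code, not from any library) of two common activation functions; each takes the neuron's summed input and decides how strongly the neuron "fires":

        import math

        def relu(x):
            # "Rectified Linear Unit": passes positive input through, outputs 0 otherwise
            return max(0.0, x)

        def sigmoid(x):
            # Squashes any input into the range (0, 1)
            return 1.0 / (1.0 + math.exp(-x))

        print(relu(-2.0), relu(1.5))        # 0.0 1.5
        print(round(sigmoid(0.0), 2))       # 0.5
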
  • Neural Network: A number of neurons connected into a network. The number of neurons, number of connections, and the way in which the connections are made (aka the "architecture") can vary greatly and are all deciding factors for how the neural network will work, and what it can be used for.
  • Layer: In a Neural Network neurons can be arranged in layers. In each layer the neurons typically have their outputs connected only to neurons in the next layer of the model. Several layers containing different numbers of neurons can be connected, and each layer will then serve a separate purpose in the network. Connections typically go from one neuron in the previous layer to several neurons in the current layer. Layers have different names depending on their use:
    • Input: Special purpose layer where data is first entered into the network
    • Output: Special purpose layer where results exit from the network
    • Hidden: "Normal" layer that simply stores or processes data through its neurons.
    • Pool: See Spatial Pooling further down.
    • Soft-max: See the Soft-max entry further down.
  • Connection topology:
    • Fully-connected: All neurons are connected to all others (on the order of N² connections).
    • Locally-connected: Only neighboring neurons are connected.
  • Connection Weight: See Weight.
  • Weight: Neurons trigger when the value they receive from their connected inputs reaches a certain level. The value from each input is multiplied by that connection's weight before entering the activation function, so even if 10 neurons all send a strong pulse, the pulses may be weighted down to 0 and not cause activation. A toy sketch follows below.
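    A toy sketch (my own illustration, made-up numbers) of how weights work in a single artificial neuron: each input is multiplied by its connection weight, the products are summed, and the sum goes through the activation function:

        inputs  = [1.0, 0.8, 0.9]          # strong pulses from three connected neurons
        weights = [0.0, 0.0, 0.0]          # every connection weighted down to zero

        weighted_sum = sum(i * w for i, w in zip(inputs, weights))
        output = max(0.0, weighted_sum)    # ReLU activation
        print(output)                      # 0.0 -- the neuron stays silent despite strong inputs
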
  • Learning: What the network "learns" is simply the values that it stores in its weights. We cannot know exactly what a network has learned, because the process by which the learning takes place is indirect and very complex, just like in a real brain. However, we can use the network and its knowledge simply by passing pulses through it and recording the output. There are several forms of learning:
    • Supervised: The program knows the ground truth for the training data, so it knows when the output from the network is good or bad and can correct the network weights accordingly.
    • Unsupervised: The program does not know what is good or bad, but judges its performance by trying to reconstruct the input after it has passed through a network whose architecture does not allow the input to simply be copied verbatim, thereby forcing the network to learn something about the input and demonstrate it later.
    • Back-propagation: A method of supervised learning where the weights of the connections in a neural network are adjusted to minimize the error of the output generated by the network when compared to ground truth.
    • Reinforcement Learning: Learning by reinforcing good behavior based on a reward signal, without the need for extensive labeled training data.
  • Network Types: As mentioned in the section about Neural Networks, they can be arranged in all sorts of ways. Here is a list of network types:
    • Perceptron: A single-layer neural network that classifies input as either "in" or "out" (a.k.a. binary classification). For example, it can determine whether the input is an image of a dog or not.
    • MLP ("Multi-Layer Perceptron"): A feedforward network built from several layers of perceptron-like neurons; see "Perceptron".
    • CNN ("Convolutional Neural Network"):
    • FNN ("Feedforward Neural Network"): An artificial neural network where the connections between neurons do not form cycles (as opposed to recurrent neural networks).
    • RNN ("Recurrent Neural Network"): A network where the connections between neurons form cycles (as opposed to feedforward neural networks). This imposes fewer restrictions on which connections may be made between neurons in the network. Several related concepts:
      • Fully Recurrent Network:
      • Hopfield Network:
      • Elman Network:
      • Jordan Network:
      • ESN ("Echo State Network"):
      • LSTM ("Long Short Term Memory Network"):
      • BRNN ("Bi-Directional Recurrent Neural Network"):
      • CTRNN ("Continuous Time Recurrent Neural Network"):
      • Hierarchical:
      • Recurrent Multilayer Perceptron:
      • Second Order Recurrent Neural Network:
    • Cognitron: An early implementation of a self-contained multi-layered neural network that used unsupervised learning.
    • Neocognitron: An evolution of the cognitron that addressed some of its shortcomings.
    • Topographic map: See Self Organizing Feature Map
    • Kohonen map: See Self Organizing Feature Map
    • Kohonen network: See Self Organizing Feature Map
    • SOM ("Self Organizing Map"): See Self Organizing Feature Map
    • DBN ("Deep Belief Network"):
    • SOFM ("Self Organizing Feature Map"): A network that uses competitive learning to build a "map" that organizes itself to represent the input space.
    • GSOM ("Growing Self Organizing Map"): A variant of the SOM where nodes are added to the map following some heuristic. It was invented because choosing a map size that works well is difficult: in a GSOM the map starts small and grows adaptively until it is "big enough".
    • Diabolo Network: See Autoencoder.
    • Autoassociator: See Autoencoder.
    • Autoencoder: A non-recurrent neural network with the same number of inputs as outputs and at least one hidden layer. When it receives an input X it is trained not to produce some separate output Y, but to reconstruct X itself. Autoencoders are inherently well suited to unsupervised learning (a toy sketch in code follows below).
      • Denoising autoencoder: An autoencoder that is trained on corrupted versions of the input in order to make it learn more robust features.
      • VAE ("Variational Autoencoder"): A generative model that is similar in architecture to a normal autoencoder, but has a completely different usage.
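    A toy (untrained) autoencoder forward pass in plain numpy, just my own sketch of the shape of the idea: 8 inputs are squeezed through a 3-unit bottleneck and expanded back to 8 outputs; training would adjust both weight matrices so the reconstruction matches the input:

        import numpy as np

        rng = np.random.RandomState(0)
        x = rng.rand(8)                         # input vector

        W_enc = rng.randn(3, 8) * 0.1           # encoder weights: 8 -> 3 (the bottleneck)
        W_dec = rng.randn(8, 3) * 0.1           # decoder weights: 3 -> 8

        code = np.maximum(0, np.dot(W_enc, x))  # hidden layer (the "code"), ReLU activation
        reconstruction = np.dot(W_dec, code)    # attempt to rebuild the input

        reconstruction_loss = np.mean((reconstruction - x) ** 2)
        print(reconstruction_loss)              # training would drive this toward 0
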
  • Ground truth: When training a neural network we supply an input and expect an output. The ideal expected output is called ground truth.
  • Deep Neural Network: The "depth" of a neural network refers to the number of layers in the network; typical deep networks have more than 2 hidden layers (see the sketch below).
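    A sketch of what "deep" means in code (my own toy numbers, untrained weights): the input simply passes through several hidden layers in a row, each one a weighted sum followed by an activation:

        import numpy as np

        def relu(v):
            return np.maximum(0, v)

        rng = np.random.RandomState(1)
        x = rng.rand(16)                  # input layer: 16 values

        W1 = rng.randn(12, 16) * 0.1      # hidden layer 1: 16 -> 12
        W2 = rng.randn(8, 12) * 0.1       # hidden layer 2: 12 -> 8
        W3 = rng.randn(4, 8) * 0.1        # hidden layer 3: 8 -> 4
        W_out = rng.randn(2, 4) * 0.1     # output layer: 4 -> 2

        h1 = relu(np.dot(W1, x))
        h2 = relu(np.dot(W2, h1))
        h3 = relu(np.dot(W3, h2))
        output = np.dot(W_out, h3)        # three hidden layers make this a "deep" network
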
  • Oldschool methods: Not related to neural networks; common hand-crafted methods for solving a problem that have since been surpassed by recent deep learning approaches, for example:
    • SIFT.
    • SURF.
  • Boltzmann Machine: A wobbly and fun neural network that is run continuously to reach different states. Despite their intriguing nature, such networks are of little practical use unless they are restricted in particular ways.
  • RBM ("Restricted Boltzmann Machine"): A neural network that can learn the probability distribution of its inputs.
  • Generative model: A model that can generate random output conforming to the distribution of the data it models.
  • Stochastic neural network: A neural network that has either stochastic activation functions or random weights assigned to it. Used in training to help avoid getting stuck in local minima.
  • Greedy Learning: Training each component of a network on its own instead of trying to train the whole network at once.
  • Convnet: See Convolutional Neural Network.
  • Training protocol: How a network is trained:
    • Purely supervised
    • Unsupervised layer-wise, supervised classifier on top
    • Unsupervised layer-wise, global supervised fine-tuning
  • Intermediate representations: What the network or architecture actually learned at each level.
  • Reward Function: See Objective Function
  • Profit Function: See Objective Function
  • Fitness Function: See Objective Function
  • Cost function: See Objective Function.
  • Loss function: See Objective Function.
  • Objective Function: A function that maps an outcome onto a number representing its cost or value. If we wish to minimize that number, the function is usually called a loss (or cost) function; if we wish to maximize it, it is called a reward function. A small example follows below.
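    For example, a very common loss function is the mean squared error between the network's outputs and the ground truth (my own toy sketch, not tied to any particular library):

        def mean_squared_error(outputs, targets):
            # Average of the squared differences: 0 when the outputs match the ground truth exactly
            return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)

        print(round(mean_squared_error([0.9, 0.2, 0.4], [1.0, 0.0, 0.0]), 3))  # 0.07
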
  • Optimization: Finding the input that produces the best output from an objective function. For a loss function this means the lowest output; for a reward function, the highest.
  • Gradient: The slope of an objective function at a given point. Picture the function as a smooth landscape with dips and hills; the gradient tells you which direction is steepest uphill and how steep it is.
  • Gradient Descent: Finding the nearest local minimum by repeatedly taking steps in the direction of the steepest downward slope, i.e. against the gradient. A minimal sketch follows below.
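    A minimal gradient descent sketch on a made-up loss L(w) = (w - 3)², whose slope (gradient) is 2·(w - 3) and whose minimum is at w = 3:

        w = 0.0                             # starting guess
        learning_rate = 0.1

        for step in range(100):
            gradient = 2 * (w - 3)          # slope of the loss at the current w
            w -= learning_rate * gradient   # take a step down the steepest slope

        print(w)                            # very close to 3.0, the minimum
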
  • Incremental gradient descent: See Stochastic Gradient Descent.
  • SGD ("Stochastic Gradient Descent"): A gradient descent method that, at each step, estimates the gradient from a single randomly chosen training example (or a small batch) instead of the whole training set, making each step cheap at the cost of some noise.
  • Conjugate gradient: Alternative to gradient descent that is used for solving linear equations.
  • FPROP ("Forward Propagation"): To feed data into a neural network and get the resulting output from the network. May also be called "testing" the network. The output can be compared with the ground truth/real value, and the weights of the network can be adjusted based on the deviation.
  • BPROP ("Back Propagation"): To minimize error in the network, you propagate backwards by finding the derivative of the error with respect to each weight and then subtracting this value (scaled by a learning rate) from the weight. A toy example of both passes follows below.
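    A toy forward and backward pass through a single sigmoid neuron (my own illustration with made-up numbers), trained toward a target of 1.0 with squared error:

        import numpy as np

        x = np.array([0.5, -0.2, 0.1])       # input
        w = np.array([0.4, 0.3, -0.5])       # connection weights
        target = 1.0
        learning_rate = 0.5

        # FPROP: weighted sum of the inputs, then the sigmoid activation
        z = np.dot(w, x)
        out = 1.0 / (1.0 + np.exp(-z))
        error = 0.5 * (out - target) ** 2

        # BPROP: derivative of the error with respect to each weight (chain rule),
        # then subtract it from the weights
        d_out = out - target                 # dE/d(out)
        d_z = d_out * out * (1 - out)        # through the sigmoid: d(out)/dz = out * (1 - out)
        d_w = d_z * x                        # dE/dw
        w -= learning_rate * d_w
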
  • Energy based unsupervised learning:
    • PSD ("Predictive Sparse Decomposition"): TODO.
    • FISTA ("Fast Iterative Shrinkage-Thresholding Algorithm"): TODO.
    • LISTA ("Learned Iterative Shrinkage-Thresholding Algorithm"): TODO.
    • LcoD ("Learning Coordinate Descent"): TODO.
    • DrSAE ("Discriminative Recurrent Sparse Auto-Encoder"): TODO.
  • Deep Learning: Broad term relating to working with "deep networks" and related technology to solve problems.
  • Manifold: A 2D surface curled up in 3D space (think of a rolled-up sheet of paper), forming a shape that can locally be treated as a flat Cartesian coordinate system even though each point on it also has 3 coordinates in the surrounding space. The idea generalizes to higher dimensions.
  • Entangled Data Manifold:
  • Linear separability: When a line (in 2D), plane (in 3D) or hyperplane (in n-D) can be found such that two distinct sets of points can be completely separated by it.
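    A toy check of linear separability in 2D (my own made-up points): the line x + y = 1, i.e. weights (1, 1) and bias -1, puts every point of one class on the positive side and every point of the other class on the negative side:

        w, b = (1.0, 1.0), -1.0                            # the separating line: x + y - 1 = 0

        class_a = [(1.0, 1.0), (2.0, 0.5), (0.8, 0.9)]     # points above the line
        class_b = [(0.0, 0.0), (0.3, 0.2), (-1.0, 0.5)]    # points below the line

        def side(p):
            return w[0] * p[0] + w[1] * p[1] + b           # which side of the line is p on?

        assert all(side(p) > 0 for p in class_a)
        assert all(side(p) < 0 for p in class_b)           # the two sets are linearly separable
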
  • Hebb's rule: In a biological brain synapses that are used often are strengthened while synapses that are not will weaken.
  • Feature Extractor: Something that derives values (features) from input data that are intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps.
  • Invariant Feature Learning: TODO.
  • Trainable Feature: Something a neural network can learn to detect. In deep learning each layer typically learns features at a different level of abstraction: low-level features are learned in layers close to the input, and each successive layer learns progressively more complex and abstract (high-level) features. Typical feature hierarchies:
    • Image Recognition
      • Pixel
      • Edge
      • Texton: Textons refer to fundamental micro-structures in generic natural images and the basic elements in early (pre-attentive) visual perception.
      • Motif
      • Part
      • Object
    • Text
      • Character
      • Word
      • Word group
      • Clause
      • Sentence
      • Story
    • Speech
      • Sample
      • Spectral Band
      • Sound
      • Phone
      • Phoneme
      • Word
  • Classification: Determine to which category an observation belongs.
  • ("Support Vector Machine"):
  • CV ("Computer Vision"): The field of study relating to processing of visual data in computers such as images and videos.
  • Feature: An artifact, such as a pattern in an image or a phone in an audio sample, that can be reliably detected, tracked, parameterized and stored by feature recognizers for later processing by feature classifiers.
    • Haar like features: See Haar Features.
    • Haar features:
  • Integral image: An image representation (a look-up table) where each entry stores the sum of all pixel intensities above and to the left of that pixel instead of the pixel intensity itself. This simple structure is used to speed up image processing algorithms that repeatedly need such region sums. A small sketch follows below.
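    A small numpy sketch (toy data) of an integral image, and how a region sum becomes just four table look-ups:

        import numpy as np

        img = np.arange(16, dtype=float).reshape(4, 4)     # toy 4x4 "image"

        # Integral image: entry (r, c) holds the sum of all pixels above and to the
        # left of (r, c). A zero row and column are prepended to avoid border cases.
        ii = np.zeros((5, 5))
        ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

        def region_sum(r0, c0, r1, c1):
            # Sum of img[r0:r1+1, c0:c1+1] using only four look-ups
            return ii[r1 + 1, c1 + 1] - ii[r0, c1 + 1] - ii[r1 + 1, c0] + ii[r0, c0]

        assert region_sum(1, 1, 3, 2) == img[1:4, 1:3].sum()
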
  • Feature Engineering: Hand-crafting of feature detectors.
  • Stride: How many units (pixels, neurons, etc.) a sliding window moves between steps when (image) data is fed into the network, during either training or testing. A small sketch follows below.
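    A sketch of stride on a made-up 1D input: a window of width 3 slides over 7 values; with stride 2 it visits only (7 - 3) // 2 + 1 = 3 positions:

        signal = [4, 8, 15, 16, 23, 42, 7]
        window, stride = 3, 2

        positions = range(0, len(signal) - window + 1, stride)
        patches = [signal[p:p + window] for p in positions]

        print(patches)        # [[4, 8, 15], [15, 16, 23], [23, 42, 7]]
        print(len(patches))   # 3 == (7 - 3) // 2 + 1
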
  • Training: Changing a network to make it better at solving the problem for which it is being designed.
  • Testing: Using a network without changing it to either see how well the network is working during training or to actually use the output in production.
  • Ventral Pathway: A pathway in the mammalian brain where visual input is recognized. It has several stages, each with its own intermediate representation, just like a deep neural network.
  • Sparse Modeling:
  • Neuron Cluster:
  • Shared Weights:
  • ICA ("Independent Component Analysis"):
  • Training set:
  • Receptive field: The region of the input (for example, the part of the visual field or of an input image) that a particular neuron responds to. Neurons further from the input have larger receptive fields and detect higher-level concepts.
  • Datasets:
    • Barcelona
    • Imagenet
    • SIFT Flow
    • Stanford Background
    • NYU RGB-Depth Indoor Scenes
    • RGB-D People
    • MNIST http://yann.lecun.com/exdb/mnist/
    • INRIA http://pascal.inrialpes.fr/data/human/
    • GTSRB http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset
    • SVHN ("Street View House Numbers"): http://ufldl.stanford.edu/housenumbers/
    • Music:
      • GTZAN: http://marsyas.info/downloads/datasets.html
  • RGB-D ("Red Green Blue Depth"): Image format with 3 channels devoted to color and a fourth channel devoted to depth (z-buffer).
  • Classification loss:  TODO.
  • Reconstruction loss:  TODO.
  • Sparsity Penalty:  TODO.
  • Inhibition Matrix:  TODO.
  • Invariant Features:  TODO.
  • Lateral Inhibition:  TODO.
  • Gibbs measure: See Gibbs distribution.
  • Gibbs distribution:  TODO.
  • Semantic Segmentation: Dividing an image into regions that correspond to different labels.
  • Scene parsing: TODO.
  • Laplacian Pyramid: TODO.
  • Labeling: See Semantic Labeling.
  • Semantic Labeling: Labeling every pixel of an image with the object it belongs to.
  • Overfitting: When training results in the network effectively memorizing the training data instead of learning patterns that generalize to new input.
  • Regularization: A method to avoid overfitting by adding a term to the cost function that rewards small weights, so training balances keeping the weights small against minimizing the original cost.
  • Cross-entropy cost function: A cost function for classification that compares the predicted probabilities with the ground truth; it reduces the learning slowdown that squared error suffers when sigmoid or soft-max outputs saturate. A small sketch follows below.
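    A sketch of the cross-entropy cost for one made-up example with three classes, where the ground truth says class 1 is correct: the cost is the negative log of the probability the network gave to the correct class, so confident wrong answers are punished heavily:

        import math

        predicted = [0.1, 0.7, 0.2]     # the network's output probabilities
        truth = [0, 1, 0]               # one-hot ground truth: class 1 is correct

        cross_entropy = -sum(t * math.log(p) for t, p in zip(truth, predicted))
        print(round(cross_entropy, 3))  # 0.357, i.e. -log(0.7)
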
  • Dropout:
  • Epoch: One full pass through the entire training set during training.
  • Learning rate annealing:
  • Pre-training: Training a network using a different method before the actual training starts in order to get a good set of initial weights suited to the particular training in question. Usually done to avoid common training problems caused by bad initial weights.
  • Deep Architecture: A way to organize a set of networks to conduct deep learning:
    • Feed-Forward: multilayer neural nets, convolutional nets
    • Feed-Back: Stacked Sparse Coding, Deconvolutional Nets
    • Bi-Directional: Deep Boltzmann Machines, Stacked Auto-Encoders
  • Spatial Pooling:
    • Sum or max
    • Non-overlapping / overlapping regions
    • Role of pooling:
      • Invariance to small transformations
      • Larger receptive fields (see more of input)
  • Retinal mapping: See Retinotopy.
  • Retinotopy: The mapping of input from the retina to neurons.
  • Activation function types:
    • Sigmoidal function: Any S-shaped ("sigma"-shaped) function with a smooth, continuous transition from 0 to 1.
    • Sigmoid: 1/(1+exp(-x))
    • TanH
  • Winner-take-all: Letting neurons compete for activation so that only the most strongly activated one(s) fire.
  • Soft-max: TODO.
  • ReLU ("Rectified Linear Unit"):
    • Simplifies backpropagation
    • Makes learning faster
    • Avoids saturation issues
    • Preferred option
  • Simple Cell:  TODO.
  • Complex Cell: TODO.
