Primer on Machine Learning
Author: Susanne Lomatch
Machine learning (ML) is the capture and transformation of information (from sensors, databases, etc.) into a usable form in order to improve performance. Pattern/object recognition, decision making/planning and communication are three key application areas for ML. Performance can be evaluated as the ability to make accurate predictions from known data (as in supervised learning or training), or as the ability to discover valuable new information (as in unsupervised learning).
There are many types of ML approaches and algorithms, and below I attempt to list as many as I could find, each organized under a short, concise descriptor. I have placed Wiki links on concepts that deserve more depth, so that readers can dig deeper to understand them.
One point to be made: optimization and filtering are not ML, though they may be used in ML (specifically in reinforcement learning in the case of optimization, and in unsupervised learning in the case of filtering). This is a common confusion when reading through what some call ML.
I also emphasize a second point, also made by others [1]: the term machine refers both to machines and living organisms (or alternately, systems or agents). The same mathematical theory of learning applies regardless of what we choose to call the learner, whether it is artificial or biological.
Types of ML approaches and algorithms:
- Supervised Learning
  - Learning an input-output relationship from examples
  - Given pairs of input and output patterns, learn the dependencies between input and output
  - Algorithm analyzes the training data and produces an inferred function (a classifier or regression function), used to predict an output from valid input
  - Problem/tasks: regression (continuous), classification (discrete), ranking
  - Underlying disciplines: statistical regression and classification analysis
  - Applications: skill estimation, behavioral cloning, recognition (pattern, object, optical character, handwriting, speech), information retrieval (rankers)
  - Advantages: minimizes expected error on as-yet-unseen data
  - Tradeoffs: approximation (bias), generalization (variance) and quality of training data
  - Specific algorithms: linear regression, logistic regression, naive Bayes classifier, linear discriminant analysis, decision trees, instance learning (k-nearest neighbor), inductive logic programming, artificial neural networks (backpropagation), support vector machines, boosting learners, maximum entropy Markov model (a minimal regression sketch follows this list)
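To make the supervised descriptor concrete, here is a minimal sketch (my own illustration, not from the primer) of the simplest listed algorithm, linear regression: ordinary least squares learns an input-output relationship from example pairs and then predicts an output for valid, as-yet-unseen input. The synthetic data and NumPy-only implementation are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic training pairs: inputs with a known linear dependency plus noise.
X = rng.uniform(-1.0, 1.0, (100, 3))
true_w, true_b = np.array([2.0, -1.0, 0.5]), 0.3
y = X @ true_w + true_b + rng.normal(0.0, 0.05, 100)

# "Analyze the training data and produce an inferred function":
# append a bias column and solve the least-squares problem.
A = np.hstack([X, np.ones((100, 1))])
w_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# Use the inferred regression function to predict an output for unseen input.
x_new = np.array([0.2, 0.1, -0.4])
y_pred = np.append(x_new, 1.0) @ w_hat
print(w_hat)   # close to [2.0, -1.0, 0.5, 0.3]
print(y_pred)

The fit minimizes squared error on the training pairs; whether that error transfers to unseen data is exactly the approximation/generalization tradeoff noted above.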
- Reinforcement Learning
  - Learning from state-action-reward sequences
  - Given observations and rewards from an environment, learn how to act in given situations
  - Algorithm discovers which actions are optimal based on past experience and reward feedback
  - Problem/tasks: control, value estimation, policy learning, optimal decision making
  - Underlying disciplines: control theory, game theory, decision theory, operations research, evolutionary computation
  - Applications: learning to walk, drive, fly an airplane or play a game; object recognition (control of search), robotic or autonomous control, critical path analysis, planning, scheduling, pricing, trading, natural language processing
  - Advantages: maximizes expected reward over time
  - Tradeoffs: exploration and exploitation
  - Specific algorithms: dynamic programming (optimization, shortest path), Markov decision processes, Monte Carlo methods, temporal difference learning, Q-learning, genetic programming, value-dependent learning (see the special notes below and [2])
  - Special notes:
    - Neuroscience researchers have found that the firing rate of dopamine neurons in the brain appears to mimic the error function of the temporal difference algorithm. The error function reports the difference between the estimated reward at a given state or time step and the actual reward received; the larger the error, the larger the difference between expected and actual reward. When this is paired with a stimulus that accurately reflects a future reward, the error can be used to associate the stimulus with the future reward (a minimal sketch of these updates follows this section).
    - Reinforcement learning is commonly used in robotics and autonomous system control (e.g. traffic, airplanes), and has been proposed for the training of brain-machine interfaces (BMIs).
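To illustrate the temporal difference error described in the special note above, here is a minimal tabular Q-learning sketch (my own, not from the primer). The toy chain environment, the constants, and the uniformly random behavior policy are all illustrative assumptions; a practical agent would trade exploration against exploitation (e.g. epsilon-greedy) rather than explore purely at random.

import numpy as np

rng = np.random.default_rng(0)

# Tiny deterministic chain: states 0..4; actions: 0 = left, 1 = right.
# Reaching state 4 yields reward 1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4
gamma, alpha = 0.9, 0.5

Q = np.zeros((N_STATES, N_ACTIONS))
for _ in range(300):
    s, done = 0, False
    while not done:
        # Behavior policy: uniformly random actions (pure exploration).
        # Q-learning is off-policy, so the greedy policy is learned anyway.
        a = int(rng.integers(N_ACTIONS))
        s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
        r = 1.0 if s2 == GOAL else 0.0
        done = s2 == GOAL
        # Temporal difference error: reward actually received, plus the
        # discounted estimate of what follows, minus the current estimate.
        delta = r + (0.0 if done else gamma * np.max(Q[s2])) - Q[s, a]
        Q[s, a] += alpha * delta
        s = s2

print(np.argmax(Q, axis=1))  # greedy policy: right (1) in states 0..3; state 4 is terminal

The variable delta is the error function discussed above: the gap between the expected reward at a state and the reward actually observed.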
- Unsupervised Learning
  - Learning the underlying structure from examples
  - Algorithm finds hidden structure in unlabeled information/data
  - Problem/tasks: cluster analysis, manifold learning, density estimation, blind signal separation (statistical FA, PCA, ICA, etc.), inference
  - Underlying disciplines: see tasks above
  - Applications: modeling motion capture data and user behavior, data mining, recognition (pattern, object, image, speech)
  - Advantages: information and knowledge discovery, reasoning under uncertainty
  - Disadvantages: no error or reward signal to evaluate a potential solution
  - Tradeoffs: computational tractability, parameter estimation
  - Specific algorithms: hierarchical clustering, k-means (centroid) clustering, distribution clustering (expectation-maximization), association rule learning (Apriori), artificial neural networks (self-organizing map, adaptive resonance), Bayesian networks (belief propagation), nonlinear dimensionality reduction (manifold learning or mapping), hidden Markov model (a minimal clustering sketch follows this list)
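As a minimal unsupervised example (my own sketch, not from the primer), here is k-means (centroid) clustering via Lloyd's algorithm: with no labels, no error signal and no reward, it discovers hidden cluster structure by alternating point assignment and centroid updates. The synthetic two-blob data is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, n_iter=50):
    # Lloyd's algorithm: alternate assignment and centroid update steps.
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (no labels needed).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Two synthetic blobs; the algorithm never sees which blob a point came from.
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(3.0, 0.5, (50, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)  # approximately [0, 0] and [3, 3]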
- Semi-supervised Learning
  - Supervised learning from a small amount of labeled information/data, combined with unsupervised learning from a large amount of unlabeled information/data
  - Applications: recognition (pattern, object, character, image, speech), data mining, information retrieval, question-answering
  - Advantages: improved learning accuracy in certain cases (co-training, relevant unlabeled data)
  - Disadvantages: worsened learning accuracy if unsupervised learning introduces excessive noise
  - Tradeoffs: cost of supervised training versus unsupervised noise
  - Specific algorithms: co-training, constrained clustering, transduction or transductive inference (a minimal co-training sketch follows this list)
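Here is a minimal sketch in the spirit of co-training (my own simplification, not from the primer): two classifiers, each trained on a different feature "view" of a small labeled pool, take turns pseudo-labeling their most confident unlabeled points. The nearest-centroid classifiers, the synthetic two-view data, and the shared (rather than exchanged) labeled pool are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def fit_centroids(X, y):
    # Nearest-centroid "classifier": one mean vector per class.
    return np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(cents, X):
    # Return predicted labels and a confidence (margin between class distances).
    d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
    return d.argmin(axis=1), np.abs(d[:, 0] - d[:, 1])

# Synthetic two-class data with two feature "views" (columns 0-1 and 2-3).
n = 200
y_true = np.arange(n) % 2
X = rng.normal(0.0, 1.0, (n, 4)) + 2.0 * y_true[:, None]
view1, view2 = X[:, :2], X[:, 2:]

labeled = list(range(10))        # small labeled pool
unlabeled = list(range(10, n))   # large unlabeled pool
y = np.full(n, -1)
y[:10] = y_true[:10]

# Each round, the classifier trained on one view pseudo-labels its most
# confident unlabeled point, growing the labeled pool for the other view.
for _ in range(40):
    for view in (view1, view2):
        if not unlabeled:
            break
        cents = fit_centroids(view[labeled], y[labeled])
        preds, conf = predict(cents, view[unlabeled])
        best = int(np.argmax(conf))
        idx = unlabeled.pop(best)
        y[idx] = preds[best]
        labeled.append(idx)

mask = y >= 0
print(mask.sum(), "labeled;", (y[mask] == y_true[mask]).mean(), "accuracy")

As the tradeoff bullet above suggests, accuracy improves only while the pseudo-labels stay clean; confident mistakes inject exactly the noise that can worsen semi-supervised learning.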
- Deep Learning and Cortical Learning
  - The modeling of learning processes performed in the mammalian or human brain; deep learning has roots of inspiration in learning processes in the mammalian/human visual system, while cortical learning is inspired by cortical-thalamic anatomy/function and the "Mountcastle principle" of a hierarchical cortical columnar organizational structure
  - Problem/tasks: unsupervised learning of representations (and features), inference; discriminative (supervised), reinforcement, semi-supervised and multi-task learning are also utilized
  - Underlying disciplines: computational neuroscience (much as I dislike the term – I prefer "theoretical neuroscience"), neuromorphic engineering
  - Applications: recognition (pattern, object, image, speech), natural language processing and understanding (communication and dialogue), machine vision, data mining, information retrieval, question-answering, decision making, planning, artificial or biomimetic imagination
  - Advantages: allows for a "deep learning" approach, incorporating a hierarchy of features to efficiently represent and learn the complex abstractions needed for AI and mammal intelligence (computational and statistical efficiency); particularly suited for multi-task learning, transfer learning, domain adaptation, self-taught learning, and semi-supervised learning with few labels; may also be used to solve NP-complete problems
  - Disadvantages: a common algorithm may not represent all regions of the neocortex, leading to model or algorithmic complexity
  - Tradeoffs: complexity, efficiency
  - Specific algorithms: hierarchical temporal memory, artificial neural networks (adaptive resonance theory, Boltzmann machines), energy-based learning, greedy layer-wise learning for deep belief networks (a minimal layer-wise sketch follows this list)
  - Deep learning: see [3], [4] and [5]
  - Cortical learning: see [6] and [7]
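To make the greedy layer-wise idea concrete, here is a minimal sketch (my own, not from the primer, and a simplified stand-in for the RBM-based deep belief network training of [4]): each layer is a tied-weight autoencoder trained to reconstruct the representation produced by the layer below, yielding a hierarchy of learned features. The architecture, learning rate and synthetic data are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.5, epochs=300):
    # One hidden layer, tied weights; batch gradient descent on the
    # squared reconstruction error of the layer's own input.
    n, d = X.shape
    W = rng.normal(0.0, 0.1, (d, n_hidden))
    b = np.zeros(n_hidden)  # encoder bias
    c = np.zeros(d)         # decoder bias
    for _ in range(epochs):
        H = sigmoid(X @ W + b)       # encode
        R = sigmoid(H @ W.T + c)     # decode with the transposed weights
        dR = (R - X) * R * (1 - R)   # gradient at the decoder pre-activation
        dH = (dR @ W) * H * (1 - H)  # backpropagated to the encoder pre-activation
        W -= lr * (X.T @ dH + dR.T @ H) / n  # tied weights: both paths contribute
        b -= lr * dH.sum(axis=0) / n
        c -= lr * dR.sum(axis=0) / n
    return W, b

def greedy_stack(X, layer_sizes):
    # Greedy layer-wise pretraining: train each layer on the codes
    # produced by the layer below it, one layer at a time.
    codes, params = X, []
    for h in layer_sizes:
        W, b = train_autoencoder(codes, h)
        params.append((W, b))
        codes = sigmoid(codes @ W + b)  # feed the learned representation upward
    return params, codes

X = rng.random((200, 16))
params, top_codes = greedy_stack(X, [8, 4])
print(top_codes.shape)  # (200, 4): a learned hierarchy of features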
An example list of open or commercial toolkits (by no means complete, and to be revised periodically):
ML software toolkits: Torch5, APML, Shogun, SIGMA-MSFT, Google Prediction API, MALLET, Spider, Deep Learning SW
(Disclaimer: This primer is meant to inform. I encourage readers who find factual errors or deficits to contact me (click on contact link below). I also welcome constructive and friendly comments, suggestions and dialogue.)
References and Endnotes:
[1] "Advanced Lectures on Machine Learning," ed. O. Bousquet and G. Rätsch, Springer-Verlag, 2004.
[2] "Value-Dependent Selection in the Brain: Simulation in a Synthetic Neural Model," K.J. Friston et al., Neuroscience, vol. 59, 1994.
[3] "Learning Deep Architectures for AI," Y. Bengio, Foundations and Trends in Machine Learning, vol. 2 (1), 2009.
[4] "A Fast Learning Algorithm for Deep Belief Nets," G.E. Hinton et al., Neural Computation, vol. 18, p. 1527, 2006.
[5] "A Tutorial on Energy-Based Learning," Y. LeCun et al., Predicting Structured Data, 2006.
[6] "Learning and Inference in the Brain," K. Friston, Neural Networks, vol. 16, 2003; "A Theory of Cortical Responses," K. Friston, Phil. Trans. R. Soc. B, vol. 360, 2005; "Hierarchical Models in the Brain," K. Friston, PLoS Computational Biology, vol. 4, 2008.
[7] "Towards a Mathematical Theory of Cortical Micro-circuits," D. George and J. Hawkins, PLoS Computational Biology, vol. 5, 2009.