CAP 6610, Machine Learning, Fall 2020

Place: WEB
Time: MWF 4 (10:40-11:30 a.m.)

Instructor:
Prof. Arunava Banerjee
Office: CSE E336.
E-mail: arunava@cise.ufl.edu.
Phone: 505-1556.
Office hours: Tuesday 2:00 p.m.-4:00 p.m.

TA:
Anik Chattopadhyay
Office: CSE E335.
E-mail: xxx@cise.ufl.edu.
Office hours: Wednesday 2:00 p.m.-3:00 p.m.

TA:
Jingzhou Hu
Office: CSE E335.
E-mail: xxx@cise.ufl.edu.
Office hours: Thursday 2:00 p.m.-3:00 p.m.

Pre-requisites:

Textbook: Machine Learning: A Probabilistic Perspective, Murphy, ISBN-10: 0262018020.

Reference: Pattern Recognition and Machine Learning, Bishop, ISBN 0-387-31073-8.

Reference: Pattern Classification, 2nd Edition, Duda, Hart and Stork, John Wiley, ISBN 0-471-05669-3.

Tentative list of Topics to be covered

This list of topics is tentative at this juncture; the set of topics we end up covering may change due to class interest and/or time constraints.

Please return to this page at least once a week to check for updates in the table below.

Evaluation:

The final grade will be assigned on a curve.

Course Policies:

Academic Dishonesty: See http://www.dso.ufl.edu/judicial/honestybrochure.htm for Academic Honesty Guidelines. All academic dishonesty cases will be handled through the University of Florida Honor Court procedures as documented by the office of Student Services, P202 Peabody Hall. You may contact them at 392-1261 for a "Student Judicial Process: Guide for Students" pamphlet.

Students with Disabilities: Students requesting classroom accommodation must first register with the Dean of Students Office. The Dean of Students Office will provide documentation to the student who must then provide this documentation to the Instructor when requesting accommodation.

Announcements

Homeworks

List of Topics covered (recorded classroom + Zoom lectures)
Lectures | Topics | Additional Reading
1, 2, 3
  • Putative framework via example: NEST thermostat
  • Supervised, Unsupervised Learning.
  • Independent variable, covariates, feature vector vs Class label, dependent variable
  • Continuous versus nominal features
4, 5, 6
  • Putative framework continued
  • Concept class/ Hypothesis space: What do we fit
  • Testing on unseen data
  • Generalization, over-fitting to training data
  • Noisy features
  • Core areas: Probability theory, Optimization, Complexity
  • Started Mathematical Probability theory: Sample space of outcomes, Sigma algebra of events, Countably additive probability function
7, 8, 9
  • Mathematical Probability theory continued:
  • supremum, limit supremum, infimum and limit infimum of a countable sequence of events.
  • Random variables, Probability distribution function
  • Rick Durrett's book Probability: Theory and examples, can be found here.
10, 11, 12
  • Indicator RV, Simple RV
  • Expected value
  • Conditional probability, Independence of RVs
  • Probability density function
  • A more intuitive intro to probability theory, with common thms here.
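As a quick illustration of the expectation of a simple RV and of an indicator RV (a minimal sketch, not course material; the die example is my own):

```python
from fractions import Fraction

# Sample space: one roll of a fair die; uniform probability on outcomes.
omega = [1, 2, 3, 4, 5, 6]
p = {w: Fraction(1, 6) for w in omega}

def expectation(X):
    """E[X] = sum over outcomes of X(w) * P({w}) for a simple RV."""
    return sum(X(w) * p[w] for w in omega)

# Indicator RV of the event A = {roll is even}: E[1_A] = P(A).
A = {2, 4, 6}
ind_A = lambda w: 1 if w in A else 0
assert expectation(ind_A) == Fraction(1, 2)

# Expected value of the identity RV X(w) = w.
print(expectation(lambda w: w))  # 7/2
```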
13, 14, 15
  • Standard density functions: univariate/multi-variate Gaussian, univariate/multi-variate uniform.
  • The Risk Functional Approach
  • Risk functional, Loss function, Expected Loss.
  • Demonstration of Risk Functionals for Classification, 0/1 loss.
  • Regression = conditional expectation of the dependent variable.
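A tiny numerical check of the fact above that, under squared-error loss, the risk E[(Y - c)^2] is minimized at c = E[Y] (an illustrative sketch with a made-up discrete Y, not course material):

```python
import numpy as np

# A discrete RV Y with values and probabilities.
y_vals = np.array([0.0, 1.0, 4.0])
p = np.array([0.5, 0.3, 0.2])

def risk(c):
    """Expected squared-error loss E[(Y - c)^2] for a constant predictor c."""
    return float(np.sum(p * (y_vals - c) ** 2))

e_y = float(np.sum(p * y_vals))  # E[Y] = 1.1

# Minimize the risk over a fine grid of candidate constants.
grid = np.linspace(-2, 6, 8001)
best = grid[np.argmin([risk(c) for c in grid])]
assert abs(best - e_y) < 1e-2  # the minimizer is (numerically) E[Y]
```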
16, 17, 18
  • The Risk Functional Approach continued.
  • Demonstration of Risk Functionals for Regression; mean squared error.
  • Demonstration of Risk Functionals for Density Estimation.
  • Jensen's inequality, Kullback-Leibler divergence.
  • Expected risk versus Empirical risk
  • Empirical Risk Minimization principle
  • Generalization, over-fitting to training data
  • Proof of convergence of the perceptron learning algorithm can be found here.
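A small numerical illustration of the Kullback-Leibler divergence and its nonnegativity, which follows from Jensen's inequality (a sketch with arbitrary example distributions, not course material):

```python
import math

# Two discrete distributions on the same 3-point support.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

def kl(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); >= 0 by Jensen's inequality."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

assert kl(p, q) >= 0.0        # Gibbs' inequality
assert abs(kl(p, p)) < 1e-12  # KL(p || p) = 0
```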
19, 20, 21
  • Intro to Computational Neuroscience and the real biological neuron.
  • McCulloch-Pitts neuron
  • Rosenblatt's Perceptron
  • Mistake bound theorem for the perceptron.
  • Energy function for perceptron learning and Stochastic Gradient Descent
  • All you need to know about optimization: video. Well, not quite, but Ben Recht's exposition is brilliant.
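The perceptron learning rule covered in these lectures can be sketched as follows (a minimal illustration on toy data of my own, not course material):

```python
import numpy as np

def perceptron(X, y, epochs=100):
    """Rosenblatt's perceptron on linearly separable data.
    X: (n, d) features, y: labels in {-1, +1}. Bias folded in by appending a 1."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:  # misclassified (or on the boundary)
                w += yi * xi        # the perceptron update rule
                mistakes += 1
        if mistakes == 0:           # converged: a full pass with no mistakes
            break
    return w

# Toy linearly separable data: label is the sign of x1 - x2.
X = np.array([[2.0, 0.0], [1.0, -1.0], [0.0, 2.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
assert all(yi * (w @ np.append(xi, 1.0)) > 0 for xi, yi in zip(X, y))
```

By the mistake bound theorem, the number of updates is at most (R/gamma)^2, so convergence here is fast.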
22, 23, 24
  • Multi-layer perceptrons and Error back-propagation
  • On-line learning (stochastic gradient descent), epochs, etc.
  • Deep learning: Convolutional networks, Recurrent neural networks
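Error back-propagation for a one-hidden-layer MLP can be sketched in a few lines (an illustrative toy on XOR with hyperparameters of my own choosing, not course material):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: not linearly separable, so a single perceptron fails but an MLP succeeds.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # hidden layer, 8 tanh units
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # sigmoid output unit
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(10000):
    h = np.tanh(X @ W1 + b1)            # forward pass
    out = sig(h @ W2 + b2)
    d_out = out - y                     # output error (sigmoid + cross-entropy)
    d_h = (d_out @ W2.T) * (1 - h**2)   # back-propagate through tanh
    W2 -= lr * h.T @ d_out / 4; b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / 4;  b1 -= lr * d_h.mean(axis=0)

assert np.all((out > 0.5).astype(float).ravel() == y.ravel())
```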
25, 26, 27
  • Convex functions, Convex sets
  • Thm: for a convex function, every local minimum is a global minimum
  • Convex optimization: Inequality and Equality constraints
  • Lagrange Multipliers
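A standard worked example of the Lagrange multiplier technique for an equality constraint (a textbook-style sketch, not taken from the lectures):

```latex
\min_{x,y}\; x^2 + y^2 \quad \text{s.t.}\quad x + y = 1,
\qquad
\mathcal{L}(x,y,\lambda) = x^2 + y^2 - \lambda\,(x + y - 1).
```

Setting the partial derivatives to zero gives 2x - lambda = 0 and 2y - lambda = 0, so x = y = lambda/2; the constraint then forces x = y = 1/2 and lambda = 1, with optimal value 1/2.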
28, 29, 30
  • Constrained optimization; objective, equality and inequality constraints
  • Lagrange multiplier technique for equality constraints.
  • Convex optimization problems, the Lagrangian, the Lagrange dual problem.
  • Primal form of the maximal margin classifier
  • Support Vector Machines: Margin maximization, the constrained optimization problem
  • Primal formulation of the SVM
  • Slack-variable version of the SVM for linearly non-separable data, hinge loss.
  • Here is a link to the book Convex Optimization by Boyd and Vandenberghe.
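The hinge-loss (slack-variable) view of the SVM can be illustrated by subgradient descent on the regularized hinge loss (a minimal sketch on toy data of my own, not the course's formulation or code):

```python
import numpy as np

def svm_sgd(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM by subgradient descent on the objective
    (lam/2)||w||^2 + (1/n) sum_i max(0, 1 - y_i (w.x_i + b))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (w @ xi + b)
            # Subgradient of the hinge term is -y_i x_i when the margin < 1.
            grad_w = lam * w - (yi * xi if margin < 1 else 0.0)
            grad_b = -yi if margin < 1 else 0.0
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy linearly separable data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = svm_sgd(X, y)
assert all(np.sign(w @ xi + b) == yi for xi, yi in zip(X, y))
```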
31, 32, 33
  • Convex optimization problems, the Lagrangian, the Lagrange dual problem.
  • Dual of a convex program
  • Slater's condition and the duality gap.
  • Weak and Strong duality
  • Dual formulation of the SVM
    34, 35, 36
    • Kernel Trick
    • Mercer's condition
    • markov, Chebychev, Chernoff, Hoeffding's inequality
    • Proof Sketch for VC bound on generalization error.
    • Shattering, VC-dimension, margin etc. Here is the paper that proves the VC-dimension for given margin/diameter.
    • VC bound on generalization error (statement w/o proof)
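The kernel trick can be verified numerically for the degree-2 polynomial kernel, whose explicit feature map is small enough to write out (a minimal sketch, not course material):

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D:
    k(x, z) = (x.z)^2 = <phi(x), phi(z)>."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

def k(x, z):
    """Kernel evaluation: no explicit feature map needed."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
# The kernel computes the inner product in feature space implicitly.
assert abs(k(x, z) - phi(x) @ phi(z)) < 1e-9
```

For high-degree kernels the explicit map grows combinatorially, which is exactly why the implicit kernel evaluation matters.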
37, 38, 39
  • Unsupervised learning; Roadmap for the rest of the semester
  • Maximum likelihood and Bayesian parameter estimation
  • Maximum likelihood principle (ML), Maximum a posteriori (MAP)
  • Gaussian distribution, 1-D case, Multi-D case, ML estimates for mean and variance
  • Bias of an estimator
  • Conjugate priors; Bernoulli/Binomial and its conjugate (Beta)
  • Conjugate prior for the Multinomial is the Dirichlet
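The bias of the ML variance estimator for a 1-D Gaussian can be seen by simulation: E[sigma2_hat] = ((n-1)/n) sigma^2, not sigma^2 (a sketch with arbitrary simulation parameters, not course material):

```python
import numpy as np

rng = np.random.default_rng(0)

def ml_gaussian(x):
    """ML estimates for a 1-D Gaussian: sample mean and the
    variance estimate that divides by n (not n-1)."""
    mu = x.mean()
    sigma2 = ((x - mu) ** 2).mean()
    return mu, sigma2

# Average the ML variance estimate over many small samples from N(0, 1).
n, trials = 5, 200_000
samples = rng.normal(0.0, 1.0, size=(trials, n))
sigma2_hats = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)
print(sigma2_hats.mean())  # close to (n-1)/n = 0.8, not 1.0
```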
40, 41, 42
  • Principal component analysis.
  • K-Means Clustering.
  • Mixture of Gaussians and Expectation Maximization.
  • Here are D'Souza's notes.
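K-means (Lloyd's algorithm) alternates an assignment step and a centroid-update step; a minimal sketch on synthetic blobs (my own toy data, not course material):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

# Two well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(5, 0.2, (20, 2))])
centers, labels = kmeans(X, k=2)
assert len(set(labels[:20])) == 1 and len(set(labels[20:])) == 1
assert labels[0] != labels[20]
```

EM for a mixture of Gaussians has the same alternating structure, with soft responsibilities in place of hard assignments.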