Statistical Learning and Data Mining

CFU: 6

Prerequisites

Basic knowledge of mathematics, linear algebra, and probability models.

Preliminary Courses

None.

Learning Goals 

The course aims to provide students with the logic of statistics and the methodological skills of the Statistical Learning paradigm (Data Mining, Inference, and Prediction) in the application domains of Engineering and Basic Sciences. Specifically, students are introduced to and trained in the fundamental methods of exploratory data analysis and statistical modeling for inference and prediction in classification and regression problems.

The learning pace is sustained by practical exercises carried out in open-source programming languages, so that these methods and skills are consolidated through the development of case studies based on real-world data.

Expected Learning Outcomes 

Knowledge and understanding

The course provides students with the statistical methodology for learning from data: how to translate real-world questions into statistical problems, how to explore data and extract important patterns, how to build models for decision-making and prediction, how to validate the results, and how to interpret and communicate the outcomes of the statistical data analysis.

Students must show that they have learned how to choose a suitable approach and method and how to implement the algorithm, meeting its requirements, in order to address real-world questions using statistical methodology.

Applying knowledge and understanding

Students must also demonstrate knowledge of the main phases of statistical data analysis through a case study project that uses real-world data sets or plans a simulation study. They show what they have learned by presenting the results as quantitative storytelling and by interpreting the outcomes correctly.

Course Content - Syllabus 

- Introduction to Statistics, Technè-Logia, Analysis of Data (0.50 CFU*)

  • Basics of Statistics
    • Variable Type and Terminology
    • Exploratory versus Confirmatory approach
    • Descriptive Statistics versus Inference
  • Technè and Logia
    • Rationale of the Learning Strategy: from theory to practice
    • Heuristic versus Algorithmic approach
  • Analysis of Data
    • Key methodological steps in Learning from Data
    • Introduction to Data Mining by D. Hand
    • Introduction to Vapnik’s Statistical Learning Theory

- Unsupervised Learning (1 CFU)

  • Clustering Methods
    • Hierarchical clustering
    • Non-hierarchical clustering (K-Means Clustering, K-Medoids Clustering)
    • Soft K-Means Clustering and Fuzzy Clustering
    • Internal and External Validation
  • Factorial Methods
    • Principal Component Analysis
    • Independent Component Analysis and Projection Pursuit

- Introduction to Supervised Learning (0.50 CFU)

  • Vapnik’s Statistical Learning Theory
    • Learning Machine, Loss function and Risk Functional
    • Regression/Classification/Density Estimation problem
    • Empirical Risk Minimization Principle and Structural Risk Minimization Principle
    • Vapnik-Chervonenkis (VC) dimension and the capacity of the learning machine
    • Accuracy-Model Complexity trade-off
    • Bias-Variance trade-off
  • Overview of Statistical Models, Supervised Learning and Function Approximation
    • Parametric Methods versus Non-Parametric Methods
    • Prediction Accuracy versus Model Interpretability
    • Model Assessment versus Model Selection

- Linear Methods (1 CFU)

  • Linear Regression and Regression Diagnostics
  • Linear Models for Time Series Analysis
  • Logistic Regression
  • Discriminant Analysis

- Linear Model Selection (0.25 CFU)

  • Subset Selection and Stepwise Regression
  • Dimension Reduction Methods
    • Principal Component Regression
    • Partial Least Squares Regression
  • Shrinkage methods
    • Ridge Regression
    • Lasso Regression
    • Elastic-Net Regression

- Resampling Methods (0.25 CFU)

  • Model assessment via Bootstrap
  • Model selection via Cross-validation

- Tree-based Methods (0.75 CFU)

  • Classification and Regression Trees
    • Recursive Partitioning Procedures
    • Pruning and Decision Tree Selection
  • Ensemble Methods
    • Bagging
    • Boosting
    • Random Forest

- Moving Beyond Linearity (1 CFU)

  • Basis Expansions and Regularization
    • Polynomial Regression and Step functions
    • Piecewise Polynomials
    • Smoothing Splines
  • Kernel Smoothing Methods
    • Kernel Smoother and Local Regression
    • Kernel Density Estimation and Classification
  • Generalized Additive Models
    • Backfitting Algorithm
    • Local Scoring Algorithm

- Machine Learning (0.75 CFU)

  • Support Vector Machines
  • Projection Pursuit Regression
  • Neural Networks and Deep Learning

*1 CFU = 8 Hours

Readings/Bibliography

 

The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani, Jerome Friedman. Springer (2009). (Available for free in PDF, URL: https://web.stanford.edu/~hastie/Papers/ESLII.pdf)

An Introduction to Statistical Learning, with Applications in R. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. Springer (2013). (Available for free in PDF, URL: http://www-bcf.usc.edu/~gareth/ISL/)

Slides and teaching material provided by the instructor.

Teaching Method

The teaching activities are organized as follows: a) lectures for about 70% of the total hours; b) practical exercises in the classroom for about 30% of the total hours.

Examination/Evaluation criteria

Exam type

Project discussion.
