CFU: 9
Prerequisites
Teoria dei segnali (Signal Theory).
Preliminary Courses
None.
Learning Goals
The course aims to give students a solid knowledge of the development and application of image processing techniques for typical computer vision tasks, ranging from traditional model-based signal processing approaches to modern data-driven solutions such as those based on convolutional neural networks. The specific computer vision tasks considered are detection, description and matching of local image features, geometric model fitting and alignment, image classification and segmentation (semantic or instance-based), object detection, localization and recognition, human-pose estimation, depth estimation, stereo matching, and structure from motion.
Expected Learning Outcomes
Knowledge and understanding
The student must know both classic filtering techniques and modern approaches based on convolutional neural networks for problems such as local feature detection, description and matching, model fitting and alignment, image classification, semantic or instance segmentation, object detection, localization and recognition, human-pose estimation, depth estimation, stereo matching, and structure from motion. In addition, the student must know the metrics and indices used to assess solutions to the problems listed above.
Applying knowledge and understanding
The student must demonstrate the ability to design, develop and test state-of-the-art image processing algorithms that address common computer vision tasks, including local feature detection, description and matching, model fitting and alignment, image classification, semantic or instance segmentation, object detection, localization and recognition, human-pose estimation, depth estimation, stereo matching, and structure from motion.
Course Content - Syllabus
Basic image filtering. Scale-space domain and pyramid representations. Brief overview of programming languages used for computer vision.
Image formation: Light and color. Pinhole camera model and world-to-image-plane projection: camera matrix and calibration. 2D projective transforms.
Early vision: Contour/edge detection; watershed segmentation; template matching; texture description; corner detection (Harris detector); line detection (Hough transform).
Keypoint detection and description: Keypoint definition and repeatability property. Invariance of a detector with respect to intensity changes, translation, rotation, scaling, affine transformations and homographies. Harris detector. Difference of Gaussians (DoG) detector. DoG pyramid. Keypoint orientation and scale. Feature description: discriminative properties; common descriptors (SIFT, SURF, MSER, …); shape and context descriptors.
Matching, fitting and alignment: Distance ratio criterion for feature matching. Model fitting and alignment: least squares and robust least squares; ICP algorithm; generalized Hough transform; RANSAC. Object detection, classification and recognition.
Convolutional neural network (CNN) based image processing: CNN architectures for image processing. Training of CNNs for image processing: backpropagation and algorithms based on stochastic gradient descent (SGD) and its variants. CNN layers: convolutional layers, pooling, unpooling, batch normalization, activation functions (ReLU and variants, tanh, sigmoid). Loss functions for image processing tasks. Dropout and data augmentation. CNN models for super-resolution, classification, segmentation, object detection and localization, depth estimation, human-pose estimation.
Multi-view: Stereo vision: disparity and depth from stereo. Epipolar constraints: essential and fundamental matrices. Dense correspondence problem. 3D reconstruction from multiple views: Structure from Motion (SfM).
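As an illustration of the "disparity and depth from stereo" topic, the following minimal Python sketch converts a disparity map into depth using the standard relation Z = f·B/d for a rectified stereo pair. It is not taken from the course material: the function name, the parameter names and the numeric values (focal length in pixels, baseline in metres) are assumptions chosen only for the example.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Depth (metres) from disparity (pixels) for a rectified stereo pair.

    Uses Z = f * B / d. Pixels with non-positive disparity have no valid
    match and are assigned infinite depth. All names are illustrative.
    """
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Hypothetical values: 800-pixel focal length, 10 cm baseline.
disparity = np.array([[64.0, 32.0],
                      [16.0, 0.0]])
depth = depth_from_disparity(disparity, focal_px=800.0, baseline_m=0.1)
print(depth)
```

Halving the disparity doubles the estimated depth, which is why depth resolution degrades for distant points.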
Readings/Bibliography
- R. Szeliski, “Computer vision: algorithms and applications”, Springer 2010.
- R. I. Hartley, A. Zisserman, “Multiple View Geometry in Computer Vision”, Cambridge University Press, 2nd Ed., 2004.
- I. Goodfellow, et al., “Deep Learning”, MIT Press, 2017.
- Lecturer’s notes.
Teaching Method
The course involves both lectures (about 60%) and lab sessions. In addition, a few introductory tutorials will be given on Python and related deep learning toolboxes, as well as on the use of cloud computing platforms for developing the course project. Part of the lab time will be reserved for developing the projects to be presented at the end of the course for the final exam.
Examination/Evaluation criteria
Exam type
The final exam consists of an individual or team project and a general colloquium on the course content. By default, the project is developed during the course (partly as homework, partly during the tutored lab sessions) and discussed in a dedicated workshop at the end of the course, whereas the colloquium can take place in any exam session of the current academic year, with no further time constraint.