M:ISR (Image and speech recognition)

Winter 2018/2019
The Faculty of Mathematics and Information Science



Meeting times and rooms

Lecture:
Thursday, 10:15-12:00, room 103.

Exercises:
Thursday (selected weeks), 12:15-14:00, room 219.

Project:
Thursday (selected weeks), 12:15-14:00, room 219.



Teaching staff and contact info

Prof. Włodzimierz KASPRZAK (lecture, exercises)
Office: room 565, E&IT Faculty, Institute of Control and Computation Eng.
Office hours: Tuesday, 12.15-14.00
Phone: +22 234 7866
W.Kasprzak at elka.pw.edu.pl

Maciej Stefańczyk, M.Sc. (exercises, project)
Office: room 564, E&IT Faculty, Institute of Control and Computation Eng.
Office hours:
Phone: +22 234 xxx
M.Stefanczyk at elka.pw.edu.pl



Short course description

Course objectives
The goal is to learn basic methods and algorithms of digital image and speech analysis. After completing this course, students will be able to design image and speech recognition programs that handle pattern (image or speech) processing, pattern segmentation, and object (or word) recognition.

Prerequisites
Students are expected to have the following background:

Course materials
Lecture notes will be posted periodically on the course website. Selected chapters from the books listed below are recommended as optional reading.



Marks and Grading

Assessment is marked out of 100 points. Marks map to ECTS grades as follows:

Mark          ECTS grade   Numeric grade
100-91        A            5
90-81         B            4.5
80-71         C            4
70-61         D            3.5
60-51         E            3
50 or less    F/FX         2
Students collect assessment points through continuous assessment during the semester. The assessment method of this course consists of:

The pass mark for this course is set at 25 points for the combined assessment of exercises and project, and 26 points for the total assessment of tests. In addition to these assessment requirements, every student must meet the attendance requirements: attendance at exercises and project meetings is obligatory, while lecture attendance is optional. Credits are awarded to students who pass this course.
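The mark-to-grade table and the pass thresholds above can be expressed as a simple lookup. This is a minimal sketch, assuming integer totals out of 100; the names `GRADE_BANDS`, `ects_grade` and `passes` are hypothetical, and the attendance requirement is not modeled.

```python
# Hypothetical helper illustrating the course's mark-to-grade table.
# Each band is (inclusive lower bound, ECTS grade, numeric grade), checked top-down.
GRADE_BANDS = [
    (91, "A", 5.0),
    (81, "B", 4.5),
    (71, "C", 4.0),
    (61, "D", 3.5),
    (51, "E", 3.0),
    (0, "F/FX", 2.0),
]


def ects_grade(mark: int) -> tuple[str, float]:
    """Return (ECTS grade, numeric grade) for a total mark out of 100."""
    if not 0 <= mark <= 100:
        raise ValueError(f"mark must be in 0..100, got {mark}")
    for lower, letter, numeric in GRADE_BANDS:
        if mark >= lower:
            return letter, numeric
    # Unreachable: the last band starts at 0.
    raise AssertionError("no grade band matched")


def passes(exercises_and_project: int, tests: int) -> bool:
    """Pass thresholds: 25 pts for exercises + project, 26 pts for tests.

    Attendance requirements are not checked here.
    """
    return exercises_and_project >= 25 and tests >= 26
```

For example, `ects_grade(85)` returns `("B", 4.5)`, and `passes(30, 20)` returns `False` because the tests threshold is not met.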



Lecture

Place and time: Thursday, 12.15-14.00, room 103.

Lecture schedule (tentative):

Tests:
  1. [29.11] Part 1, 12.15-14.00, room 103
  2. [17.01] Part 2, 12.15-14.00, room 103
  3. [24.01] Retake test (both parts), 12.15-14.00



Course notes and readings

Lecture and Exercise notes:
  1. W. Kasprzak: Image and speech recognition. Lecture notes, WUT, Warszawa, 2017, v.5, 12 chapters.

  2. W. Kasprzak: Image and speech recognition, Exercises. WUT, Warszawa, 2017, v.5.
Readings:
  1. W. Kasprzak: (in Polish) Rozpoznawanie obrazów i sygnałów mowy. Oficyna wydawnicza Politechniki Warszawskiej, Warszawa, 2009.
  2. R. Duda, P. Hart, D. Stork: Pattern Classification. 2nd edition, John Wiley & Sons, New York, 2001. (Chapters: 2,3,4,10)
  3. R. C. Gonzalez, R. E. Woods: Digital Image Processing. 3rd edition, Prentice Hall, 2008.
  4. I. Pitas: Digital Image Processing Algorithms and Applications. Prentice Hall, New York etc., 2000. (Chapters: 2,3,5,6,7)
  5. L. R. Rabiner, R. W. Schafer: Introduction to Digital Speech Processing. Foundations and Trends in Signal Processing, 2007. (Sections: 1-6, 9)
  6. J. Benesty, M.M. Sondhi, Y. Huang (eds): Handbook of Speech Processing. Springer, Berlin Heidelberg, 2008.
  7. W. Kasprzak: Adaptive computation methods in digital image sequence analysis. Prace Naukowe - Elektronika, Warsaw University of Technology Publishing House, Warszawa, No. 127 (2000), 172 pages. (Chapters: 3,4)
Other sources:
  1. The OpenCV Reference Manual. Release 2.4.9.0 (or higher). 2014 (or later), http://opencv.org/
  2. Kaldi speech recognition project. http://kaldi-asr.org/

Suggested Readings
For each lecture section, one or more suggested readings are given below.
(Week 1, 2) L1. Introduction to pattern recognition
    Lecture notes: IASR_B1
(Week 3) L5. Image processing
    Readings: [Kas09, ch.3], [Pitas, 2], [Gonzalez, 2.5, 4.2, 7.3]
    Lecture notes: IASR_B5
(Week 4, 5) L2. Pattern transformation
    Readings: [Kas09, 2.1], [Gonzalez, 3.6], [Duda, 4.10-4.11], [Kas00, 3, 4]
    Lecture notes: IASR_B2, IASR_B2A
(Week 6) L9. Speech signal. L10. Phonetics. (cancelled)
    Readings: [Kas09, ch. 7 and 8], [Kas09, ch. 10], [Rabiner, 2.1-2.4], [Rabiner, 3.1-3.2]
    Lecture notes: IASR_B9, IASR_B10
(Week 7) L11. Speech features
    Readings: [Kas09, 9.1-9.2], [Rabiner, 3.3, 4.1-4.6]
    Lecture notes: IASR_B11
(Week 8) L6. Image segmentation I
    Readings: [Kas09, 4.1-4.2], [Pitas, 3], [Gonzalez, 4.3, 4.4, 7.1]
    Lecture notes: IASR_B6
(Week 9) Test 1
(Week 9, 10) L7. Image segmentation II
    Readings: [Kas09, 4.3-4.7], [Pitas, 5-7], [Gonzalez, 3.7, 7.2, 7.4, 8.1-8.3]
    Lecture notes: IASR_B7
(Week 11) L3. Pattern classification
    Readings: [Kas09, 2.2-2.8, 9.3], [Duda, 2-3, 10]
    Lecture notes: IASR_B3
(Week 12) L4. Pattern sequences
    Readings: [Kas09, ch.5], [Rabiner, 4.7]
    Lecture notes: IASR_B4
(Week 13) L12. Speech recognition
    Readings: [Kas09, 11], [Rabiner, 6]
    Lecture notes: IASR_B12
(Week 14, 15) L8. Object recognition
    Lecture notes: IASR_B8
(Week 16) Test 2



Exercises:

Place and time: Thursday (selected weeks), 12.15-15.00, room 219.

Marks: Participants can earn up to 8 points. One point is deducted for each hour of absence.



Project work:

Goal: The goal of each project, carried out by 1-2 students, is to design a particular analysis system and to implement it as an application in a programming language (C++, Java, Matlab or C# preferred). The system performs an image or speech recognition task.

Place and time: Friday (selected weeks), 10.15-12.00, room 219.

Schedule:

  1. [18.10] - Project introduction and topics
  2. [ ... ] Project assignments
  3. [15.11] - Validation of assumptions
  4. [29.11] - I. Preliminary report deadline
  5. [13.12] - II. Prototype
  6. [17.01] - III. Completed work (final evaluation)

Marks: Participants can earn up to 32 points.

Suitable implementation tools (open-source libraries):

  1. OpenCV - Open Source Computer Vision library - offers diverse image processing and analysis algorithms in C++.
  2. DisCODe - Distributed Component Oriented Data Processing - a C++ framework facilitating the development of data (image, speech) processing algorithms (T. Kornuta and M. Stefańczyk at WUT).
  3. MARF - The Modular Audio Recognition Framework (written in Java).
  4. Sphinx-4 - a speech recognizer written in Java.
  5. Kaldi - a toolkit for speech recognition written in C++.

W. Kasprzak.
Last modification: 4.10.2018.