M:ISR (Image and speech recognition)

Winter 2018/2019
The Faculty of Mathematics and Information Science

Contents:

Meeting times and rooms
Teaching staff and contact info
Short course description
Marks, grading
Lecture
Textbooks and suggested readings
Exercises
Project work

Meeting times and rooms

Lecture:
Thursday, 10:15-12:00, room 103.

Exercises:
Thursday (selected weeks), 12:15-14:00, room 219.

Project:
Thursday (selected weeks), 12:15-14:00, room 219.

Teaching staff and contact info

Prof. Włodzimierz KASPRZAK (lecture, exercises)
Office: room 565, E&IT Faculty, Institute of Control and Computation Eng.
Office hours: Tuesday, 12.15-14.00
Phone: +22 234 7866
W.Kasprzak at elka.pw.edu.pl

Maciej Stefańczyk, M.Sc. (exercises, project)
Office: room 564, E&IT Faculty, Institute of Control and Computation Eng.
Office hours:
Phone: +22 234 xxx
M.Stefanczyk at elka.pw.edu.pl

Short course description

Course objectives
The goal is to learn the basic methods and algorithms of digital image and speech analysis. After completing this course, students will be able to design image and speech recognition programs that handle pattern (image or speech) processing, pattern segmentation, and object (or word) recognition.

Prerequisites
Students are expected to have the following background:

Knowledge of basic computer science principles and programming skills, at a level sufficient to write a non-trivial computer program, preferably in one of the languages: C/C++, Java, C# or Pascal/Delphi.
Familiarity with basic mathematical analysis, linear algebra and probability theory.

Course materials
Lecture notes will be posted periodically on the course web site. Selected chapters from the books below are recommended as optional reading.

lecture notes

Marks and Grading

Assessment is marked out of 100 points. The marks map to ECTS grades as follows:

ECTS grade	Numeric grade	Mark
A	5	91-100
B	4.5	81-90
C	4	71-80
D	3.5	61-70
E	3	51-60
F/FX	2	50 or less

Students collect assessment points through continuous assessment during the semester. The assessment consists of:

two written tests (midterm and final), 0-30 pts. each;
exercise assessment (from -8 to 8 pts.);
project work (0-32 pts.).

The pass mark for this course is set at 25 pts. for the combined assessment of exercises and project, and at 26 pts. for the total assessment of the tests. In addition to meeting these assessment requirements, every student must satisfy the attendance requirements: attendance at exercises and project meetings is obligatory, while attendance at the lecture is optional. Credits are awarded to students who pass the course.
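
The grading rules above can be sketched as a short calculation. This is only an illustration, not an official grading tool: the function name `final_grade` is hypothetical, and it assumes that both pass marks are inclusive minimums and that missing either one results in a failing grade regardless of the total.

```python
def final_grade(test1, test2, exercises, project):
    """Return (total, ECTS grade, numeric grade) for one student.

    Point ranges as stated in the syllabus: each test 0-30,
    exercises -8..8, project 0..32.
    """
    tests = test1 + test2              # total assessment of tests (max 60)
    practical = exercises + project    # exercises + project (max 40)
    total = tests + practical          # overall mark (max 100)

    # Assumption: both thresholds are inclusive, and failing either one
    # fails the course even when the total is above 50.
    if tests < 26 or practical < 25:
        return total, "F/FX", 2.0

    # Lower bound of each grade band from the marks table.
    bands = [
        (91, "A", 5.0),
        (81, "B", 4.5),
        (71, "C", 4.0),
        (61, "D", 3.5),
        (51, "E", 3.0),
    ]
    for lower, letter, numeric in bands:
        if total >= lower:
            return total, letter, numeric
    return total, "F/FX", 2.0


if __name__ == "__main__":
    print(final_grade(20, 20, 5, 28))
```

Under these assumptions the two pass marks add up to 51 pts. (26 + 25), which matches the lower bound of grade E in the marks table.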

Lecture

Place and time: Thursday, 12.15-14.00, room 103.

Introduction.
Part ONE: Pattern analysis (sec. 1-5)
Part TWO: Image analysis (sec. 6-9)
Part THREE: Speech analysis (sec. 10-12)

Lecture schedule (tentative):

[4.10] L1. Introduction to pattern recognition
[11.10] L6. Image processing
[18.10] L2. Pattern transformation I
[25.10] L3. Pattern transformation II
Lecture and exercise notes, part I: L1, L6, L2, L3
[8.11] L10. Speech signal and phonetics.
[15.11] L11. Speech features.
[22.11] L7. Image segmentation I
Lecture and exercise notes, part II: L7, L10, L11

[29.11] Test 1 (sections: 1, 2, 3, 6, 7, 10, 11)
[6.12] L8. Image segmentation II
[13.12] L4. Pattern classification
[20.12] L5. Pattern sequences
Lecture and exercise notes, part III: L8, L4, L5
[3.01.2019] L12. Speech recognition
[10.01] L9. Object recognition
Lecture and exercise notes, part IV: L12, L9
[17.01] Test 2 (sections 4, 5, 8, 9, 12)
[24.01] Retake test

Tests:

[29.11] Part 1, 12.15-14.00, room 103
[17.01] Part 2, 12.15-14.00, room 103
[24.01] Retake test (both parts), 12.15-14.00

Course notes and readings

Lecture and Exercise notes:

W. Kasprzak: Image and speech recognition. Lecture notes, WUT, Warszawa, 2017, v.5, 12 chapters.
W. Kasprzak: Image and speech recognition, Exercises. WUT, Warszawa, 2017, v.5.

Readings:

W. Kasprzak: Rozpoznawanie obrazów i sygnałów mowy. Oficyna Wydawnicza Politechniki Warszawskiej, Warszawa, 2009. (in Polish)
R. Duda, P. Hart, D. Stork: Pattern Classification. 2nd edition, John Wiley & Sons, New York, 2001. (Chapters: 2, 3, 4, 10)
R. C. Gonzalez, R. E. Woods: Digital Image Processing. 3rd edition, Prentice Hall, 2008.
I. Pitas: Digital Image Processing Algorithms and Applications. Prentice Hall, New York etc., 2000. (Chapters: 2, 3, 5, 6, 7)
L. R. Rabiner, R. W. Schafer: Introduction to Digital Speech Processing. Foundations and Trends in Signal Processing, 2007. (Sections: 1-6, 9)
J. Benesty, M. M. Sondhi, Y. Huang (eds): Handbook of Speech Processing. Springer, Berlin Heidelberg, 2008.
W. Kasprzak: Adaptive computation methods in digital image sequence analysis. Prace Naukowe - Elektronika, No. 127, Warsaw University of Technology Publishing House, Warszawa, 2000, 172 pages. (Chapters: 3, 4)

Other sources:

The OpenCV Reference Manual. Release 2.4.9.0 (or higher). 2014 (or later), http://opencv.org/
Kaldi speech recognition project. http://kaldi-asr.org/

Suggested Readings
For each lecture section, one or more suggested readings are given below.

Week	Topic	Readings	Lecture notes
(Week 1, 2)	L1. Introduction to pattern recognition		IASR_B1
(Week 3)	L5. Image processing	[Kas09, ch. 3], [Pitas, 2], [Gonzalez, 2.5, 4.2, 7.3]	IASR_B5
(Week 4, 5)	L2. Pattern transformation	[Kas09, 2.1], [Gonzalez, 3.6], [Duda, 4.10-4.11], [Kas00, 3, 4]	IASR_B2, IASR_B2A
(Week 6)	L9. Speech signal. L10. Phonetics. (cancelled)	[Kas09, ch. 7 and 8], [Kas09, ch. 10], [Rabiner, 2.1-2.4], [Rabiner, 3.1-3.2]	IASR_B9, IASR_B10
(Week 7)	L11. Speech features	[Kas09, 9.1-9.2], [Rabiner, 3.3, 4.1-4.6]	IASR_B11
(Week 8)	L6. Image segmentation I	[Kas09, 4.1-4.2], [Pitas, 3], [Gonzalez, 4.3, 4.4, 7.1]	IASR_B6
(Week 9)	Test 1
(Week 9, 10)	L7. Image segmentation II	[Kas09, 4.3-4.7], [Pitas, 5-7], [Gonzalez, 3.7, 7.2, 7.4, 8.1-8.3]	IASR_B7
(Week 11)	L3. Pattern classification	[Kas09, 2.2-2.8, 9.3], [Duda, 2-3, 10]	IASR_B3
(Week 12)	L4. Pattern sequences	[Kas09, ch. 5], [Rabiner, 4.7]	IASR_B4
(Week 13)	L12. Speech recognition	[Kas09, 11], [Rabiner, 6]	IASR_B12
(Week 14, 15)	L8. Object recognition		IASR_B8
(Week 16)	Test 2

Exercises:

Place and time: Thursday (selected weeks), 12.15-14.00, room 219.

11.10 - Exercises - sec. 1, 5. (IASR-E1s, EIASR-E5s)
25.10 - Exercises - sec. 2. (IASR-E2s)
8.11 - Exercises - sec. 9, 10
22.11 - Exercises - sec. 6, 11
6.12 - Exercises - sec. 7, 3
20.12 - Exercises - sec. 3, 4
3.01 - Exercises - sec. 4, 12
10.01 - Exercises - sec. 8

Marks: Participants can earn up to 8 points. Points are deducted for absence (1 pt. per missed hour).

Project work:

Goal: Each project, carried out by 1-2 students, consists of designing a particular analysis system and implementing it as a program in a programming language (C++, Java, Matlab or C# preferred). The analysis system performs an image or speech recognition task.

Place and time: Friday (selected weeks), 10.15-12.00, room 219.

Schedule:

[18.10] - Project introduction and topics
[ ... ] Project assignments
[15.11] - Validation of assumptions
[29.11] - I. Preliminary report deadline
[13.12] - II. Prototype
[17.01] - III. Completed work (final evaluation)
