Heart Sounds Catania 2011 (HSCT11)
==================================

This is the Heart Sounds Catania 2011 Database, a collection of heart sounds
to be used for research purposes in the field of heart-sounds biometry,
collected by the University of Catania, Italy.

If you use this database for your research, please cite the following paper in
your bibliography:

A. Spadaccini and F. Beritelli, "Performance Evaluation of Heart Sounds
Biometric Systems on an Open Dataset", in Proceedings of the 18th IEEE
International Conference on Digital Signal Processing, 1-3 July 2013,
Santorini, Greece.

Description of the database
---------------------------
The database contains heart sounds acquired from 206 people (157 male and 49
female).  The sensor used for the acquisition is a ThinkLabs Rhythm
Digital Electronic Stethoscope; the files were acquired using a sampling
frequency of 11025 Hz and 16 bits per sample, and are stored using the WAVE
format.
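
As a quick sanity check after downloading, the stated audio parameters can be
verified from the WAVE header. The following is a minimal sketch using only the
Python standard library; the function name `check_wav_format` and the constant
names are illustrative and not part of the database.

```python
import wave

EXPECTED_RATE_HZ = 11025          # sampling frequency stated in this README
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16 bits per sample


def check_wav_format(path):
    """Read the WAVE header and compare it with the documented parameters.

    Returns (ok, info), where info holds the values found in the file.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        info = {
            "rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }
    ok = (info["rate_hz"] == EXPECTED_RATE_HZ
          and info["sample_width_bytes"] == EXPECTED_SAMPLE_WIDTH_BYTES)
    return ok, info
```

The number of channels is reported but not checked, since this README does not
state it.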

During the acquisition phase each person was sitting in a resting state, and
the stethoscope was positioned near the pulmonary valve.

Each filename encodes the following metadata, in this order:
- the first character encodes the sex of the person (M or F);
- the next 4 characters are the numeric ID of the person;
- the next character encodes the heart valve used for the auscultation (M:
  mitral, P: pulmonary, A: aortic, T: tricuspid); this database contains 
  only sequences recorded near the pulmonary valve;
- the next character encodes whether the recording was done with the subject
  in resting condition (N) or after some light physical activity (C); so far the
  database contains only sequences recorded in resting condition;
- the next 3 characters encode the sequential number of the recording
  acquired from a given person; the first of these 3 characters is always the
  letter R;
- the next 7 characters encode the date of the acquisition; the first one is
  always a letter D, the others represent the date in the format DDMMYY;
- the next 7 characters encode the birth date of the subject; the first one
  is always a letter N, the others represent the date in the format DDMMYY.

The letters between fields could have been avoided since the fields have a
fixed length, but they have been inserted because they make it easier for
human eyes to scan the filename and extract the required information. An
example filename is: F7007NR01D290610N051077.wav.
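
The field layout listed above can be turned into a small parser. This is only
a sketch: the names `RecordingInfo` and `parse_filename` are illustrative, the
two characters after the letter R are assumed to be digits, and the two dates
are returned as raw 6-digit strings without being interpreted. The filename in
the usage note is hypothetical, built to follow the field list.

```python
import re
from typing import NamedTuple


class RecordingInfo(NamedTuple):
    sex: str                # "M" or "F"
    person_id: str          # 4-digit numeric ID, kept as a string
    valve: str              # "M", "P", "A" or "T"
    condition: str          # "N" (resting) or "C" (after light activity)
    recording_number: int   # sequential number following the letter R
    acquisition_date: str   # 6 digits following the letter D (raw)
    birth_date: str         # 6 digits following the letter N (raw)


# One named group per field, in the order given in the list above.
_FILENAME_RE = re.compile(
    r"(?P<sex>[MF])"
    r"(?P<person_id>\d{4})"
    r"(?P<valve>[MPAT])"
    r"(?P<condition>[NC])"
    r"R(?P<recording_number>\d{2})"   # assumption: two digits after R
    r"D(?P<acquisition_date>\d{6})"
    r"N(?P<birth_date>\d{6})"
    r"(?:\.wav)?"                     # the extension is optional
)


def parse_filename(name):
    """Split a filename into its fields; raise ValueError if it does not fit."""
    match = _FILENAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"filename does not follow the documented layout: {name!r}")
    fields = match.groupdict()
    fields["recording_number"] = int(fields["recording_number"])
    return RecordingInfo(**fields)
```

For instance, `parse_filename("M0001PNR01D010111N010180.wav")` returns a record
with `sex="M"`, `person_id="0001"`, `valve="P"`, `condition="N"` and
`recording_number=1`.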

Evaluation protocol
-------------------
The comparison should be done in the following way: for each person, one
sequence is used for the model training phase and one is used for the
computation of matching scores.
Let X be a given person, Xa their first recording and Xb their second
recording; let MX be the identity model trained on Xa. Also let D be the set
of all the people in the database, and let N = |D| = 206 be the number of
people in it. Let S be the matching function that, given an identity model
and a recording, gives a similarity score.  For each person X, the database
user should compute one genuine matching score, S(MX, Xb), and N - 1 impostor
matching scores S(MY, Xb), one for each Y in D \ {X}. This will yield N
genuine matching scores and N x (N - 1) impostor matching scores.
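
The protocol above can be sketched as a double loop over test recordings and
identity models. The function below is illustrative only: `compute_scores` and
its arguments are hypothetical names, `models` is assumed to map each person to
the model trained on their first recording, `probes` to map each person to
their second recording, and `score` to play the role of S.

```python
def compute_scores(models, probes, score):
    """Apply the evaluation protocol and return (genuine, impostor) score lists.

    models: dict mapping each person to the model trained on their first recording
    probes: dict mapping each person to their second recording
    score:  callable score(model, recording) -> similarity value
    """
    genuine, impostor = [], []
    for person, probe in probes.items():
        for model_owner, model in models.items():
            value = score(model, probe)
            if model_owner == person:
                genuine.append(value)    # model and recording from the same person
            else:
                impostor.append(value)   # recording scored against another person's model
    return genuine, impostor
```

With N people this produces N genuine scores and N x (N - 1) impostor scores,
matching the counts given above.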

The baseline EER (Equal Error Rate) value for this database is 13.66 %,
obtained using one of the systems described in the paper mentioned in the
introduction. The system uses the UBM/GMM method and is based on the
Alize/LIA_RAL toolkit.
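
The EER is the operating point at which the false acceptance rate (impostor
scores accepted) equals the false rejection rate (genuine scores rejected).
The sketch below estimates it with a plain threshold sweep over the observed
scores, assuming that a higher score means a better match. It is not
necessarily the procedure used to obtain the baseline figure, and the name
`equal_error_rate` is illustrative.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from lists of genuine and impostor similarity scores.

    A recording is accepted when its score is >= the threshold. The estimate
    is taken at the threshold where the two error rates are closest.
    """
    best_gap = None
    eer = None
    for threshold in sorted(set(genuine) | set(impostor)):
        # False rejection: genuine scores that fall below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        # False acceptance: impostor scores at or above the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer
```

Because only observed scores are tried as thresholds, the result is an
approximation; interpolating between neighbouring thresholds is a common
refinement.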

Contacts
--------
For enquiries related to the database or to research activities on
heart-sounds biometry, please contact:

    Prof. Francesco Beritelli <francesco.beritelli@dieei.unict.it>
    Ing. Andrea Spadaccini <andrea.spadaccini@dieei.unict.it>