International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.2, No.1, February 2012
CLASSIFICATION OF ECG ARRHYTHMIAS USING DISCRETE WAVELET TRANSFORM AND NEURAL NETWORKS
Maedeh Kiani Sarkaleh and Asadollah Shahbahrami
Department of Computer Engineering, Faculty of Engineering
P.O.Box: 3756-41635
University of Guilan, Rasht, Iran
[email protected]
ABSTRACT
Automatic recognition of cardiac arrhythmias is important for the diagnosis of cardiac abnormalities. Several
algorithms have been proposed to classify ECG arrhythmias; however, their performance is limited.
Therefore, in this paper, an expert system for ElectroCardioGram (ECG) arrhythmia classification is
proposed. The Discrete Wavelet Transform (DWT) is used to process the ECG recordings and extract a set of
features, and a Multi-Layer Perceptron (MLP) neural network performs the classification task. Two types
of arrhythmias can be detected by the proposed system. Recordings from the MIT-BIH arrhythmia
database have been used for training and testing our neural network based classifier. The simulation
results show that the classification accuracy of our algorithm is 96.5% using 10 files containing the normal
class and two arrhythmia types.
KEYWORDS
ECG, Arrhythmia, Daubechies Wavelets, Discrete Wavelet Transform (DWT), Neural Network (NN).
1. INTRODUCTION
The ElectroCardioGram (ECG) signal is an important signal among all bioelectrical signals.
Analysis of the ECG signal is widely used in the diagnosis of many cardiac disorders. The signal arises
from the propagation of the depolarization and repolarization waves through the heart.
The potential in the heart tissues is conducted to the body surface where it is measured using
electrodes.
Figure 1 illustrates two periods of the normal ECG signal. The P, Q, R, S and T waves are the
most important characteristic features of the ECG. The peaked area in the ECG beat, commonly
called the QRS complex, together with its neighboring P wave and T wave, is the portion of the
signal thought to contain most of the diagnostically important information. Other important
information includes the elevation of the ST segment and the heart rate, derived from the RR or PP intervals.
DOI : 10.5121/ijcsea.2012.2101
Figure 1. Two periods of the normal ECG signal [33].
The shape of the ECG conveys important information hidden in its structure. The amplitude and
duration of each wave in the ECG signal are often used for manual analysis. Because the volume of
data is enormous, manual analysis is a tedious and very time-consuming task, and the possibility of
the analyst missing vital information is high. Therefore, medical
diagnostics can be performed using computer-based analysis and classification techniques [1].
Several algorithms have been proposed to classify ECG heartbeat patterns based on the features
extracted from the ECG signals. Fourier transform analysis provides the signal spectrum or range
of frequency amplitudes within the signal; however, Fourier transform only provides the spectral
components, not their temporal relationships. Wavelets can provide a time versus frequency
representation of the signal and work well on non-stationary data [2-4]. Other algorithms use
morphological features [5], heartbeat temporal intervals [6], frequency domain features and
multifractal analysis [7]. Biomedical signal processing algorithms require appropriate classifiers
in order to best categorize different ECG signals. In 1976, Shortliffe presented an early computer-
aided diagnostic system for diagnosis and treatment of symptoms of bacterial infection [8].
Classification techniques for ECG patterns include linear discriminant analysis [2], support vector
machines [9], artificial neural networks [10-14], mixture-of-experts algorithms [12], and
statistical Markov models [15, 16]. In addition, unsupervised clustering of the ECG signal has
been performed using self-organizing maps in [17].
Besides the fact that the ECG record can be noisy, the main problem in computer-based classification
is the wide variety in the shape of beats belonging to the same class and beats of similar shape
belonging to different classes [18, 19]. Computer-based diagnosis algorithms generally have three
steps, namely: ECG beat detection, extraction of useful features from beats, and classification.
In this paper, we propose a Neural Network (NN) based algorithm for the classification of Paced Beat
(PB) and Atrial Premature Beat (APB) arrhythmias as well as the normal signal. Our algorithm uses
some features obtained by the Discrete Wavelet Transform (DWT) along with timing interval
features to train an MLP NN. We extract some important features from the wavelet coefficients to
achieve both an accurate and a robust NN based classifier by using a number of training patterns.
This paper is organized as follows. Section 2 describes the background information about DWT
and Artificial Neural Networks (ANNs). The proposed feature extraction technique as well as our
classification system are presented in Section 3. Section 4 presents the experimental results.
Conclusions are drawn in Section 5.
2. BACKGROUND
2.1. Discrete Wavelet Transform
The wavelet transform was presented at the beginning of the 1980s by Morlet, who used it to
evaluate seismic data [16]. Wavelets provide an alternative to classical Fourier algorithms for one
and multi-dimensional data analysis and synthesis, and have numerous applications such as in
mathematics, physics, and digital image processing. The wavelet transform can be applied to both
continuous-time and discrete-time signals. For example, the wavelet representation of a
discrete signal X consisting of N samples can be computed by convolving X with a Low-Pass
Filter (LPF) and a High-Pass Filter (HPF) and down-sampling each output by 2, so that the
two frequency bands each contain N/2 samples.
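As a concrete illustration of this filtering-and-downsampling view, the short Python sketch below computes one decomposition level with the db6 analysis filters. It is not the authors' implementation; it assumes the NumPy and PyWavelets packages are available, and its simple boundary handling differs from that of pywt.dwt.

```python
# Conceptual sketch: one DWT level computed by filtering and downsampling by 2.
import numpy as np
import pywt

def dwt_one_level(x, wavelet_name="db6"):
    """Split x into a low-pass (approximation) and a high-pass (detail) band."""
    w = pywt.Wavelet(wavelet_name)
    lo = np.asarray(w.dec_lo)            # analysis low-pass filter
    hi = np.asarray(w.dec_hi)            # analysis high-pass filter
    approx = np.convolve(x, lo)[1::2]    # LPF, then keep every second sample
    detail = np.convolve(x, hi)[1::2]    # HPF, then keep every second sample
    return approx, detail

# For an N-sample input, each band holds roughly N/2 samples; pywt.dwt(x, "db6")
# produces the same bands with proper boundary (signal-extension) handling.
```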
This technique is based on the use of wavelets as the basis functions for representing other
functions. These basis functions have a finite support in time and frequency domain. Multi-
resolution analysis is achieved by using the mother wavelet, and a family of wavelets generated
by translations and dilations of it [20, 21]. There are different approaches to implementing the
DWT, such as traditional convolution-based methods and the lifting scheme. The convolutional
algorithms apply filtering by multiplying the filter coefficients with the input samples and
accumulating the results; they are implemented using Finite Impulse Response (FIR) filter banks.
The lifting scheme has been proposed for an efficient implementation of the wavelet transform
and has three phases, namely: split, predict, and update [22, 23, 24]. In the one-dimensional DWT,
at each decomposition level, the HPF, associated with the wavelet function, produces detail
coefficients related to the high-frequency components, while the LPF, associated with the scaling
function, produces coarse approximations related to the low-frequency components of the signal.
The approximation part can be iteratively decomposed. This process for a two-level decomposition
is depicted in Figure 2. The signal is thereby broken down into many lower-resolution components;
this structure is called the wavelet decomposition tree [24].
Figure 2. Sub-band decomposition of discrete wavelet transform implementation.
The wavelet transform is reversible. The reconstruction is the reverse process of decomposition.
The approximation and detail wavelet coefficients at every level are up-sampled by two, passed
through the synthesis LPF and HPF, respectively, and then added. This process is continued through the same number of
levels as in the decomposition process to obtain the original signal. Figure 3 depicts this process.
Figure 3. Wavelet reconstruction process.
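The decomposition and reconstruction of Figures 2 and 3 can be reproduced with a few lines of PyWavelets. This is only an illustrative sketch using a random placeholder signal, not part of the original work.

```python
# Two-level decomposition and perfect reconstruction (cf. Figures 2 and 3).
import numpy as np
import pywt

x = np.random.randn(512)                  # placeholder signal standing in for an ECG segment
coeffs = pywt.wavedec(x, "db6", level=2)  # [A2, D2, D1]: coarse approximation + details
x_rec = pywt.waverec(coeffs, "db6")       # up-sample by 2, filter with LPF/HPF, and add
print(np.allclose(x, x_rec[:len(x)]))     # True: the transform is reversible
```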
Various wavelet families are defined in the literature. The Daubechies wavelets are among the most
popular and are used in many different applications. For a given application, the wavelet filter is
selected based on its shape and its ability to analyze the signal. Figure 4
shows nine members of the Daubechies family.
Figure 4. Nine members of the Daubechies family.
The ECG signals are considered as representative signals of cardiac physiology, useful in
diagnosing cardiac disorders. The most complete way to display this information is to perform
spectral analysis. The wavelet transform provides a very general technique that can be applied to
many signal processing applications. Different features can be computed and manipulated in the
compressed domain using wavelet coefficients. All this means that the wavelet transform can be
applied to the ECG signal to convert it into wavelet coefficients. The obtained coefficients
characterize the behavior of the ECG signal, and their number is smaller than the number of samples
in the original signal. This reduction of the feature space is particularly important for recognition
and diagnostic purposes [25].
2.2. Artificial Neural Networks
Artificial Neural Networks (ANNs) are tools that can be used to model human cognition or neural
biology using mathematical operations. An ANN is a network of processing elements and has certain
performance characteristics in common with biological neural networks. A neural
network is characterized by 1) its pattern of connections between the neurons (called its
architecture), 2) its algorithm of determining the weights on the connections (called its training,
or learning algorithm), and 3) its activation function [26]. The MultiLayer Perceptron (MLP) is
the most common neural network. This type of neural network is known as a supervised network
because it requires a desired output in order to learn. The purpose of the MLP is to develop a
model that correctly maps the input data to the output using historical data so that the model can
then be used to produce the output result when the desired output is unknown. A graphical
representation of an MLP is shown in Figure 5.
Figure 5. MLP architecture with two hidden layers [34].
In the first step, the MLP learns the behaviour of the input data using the back-propagation
algorithm; this step is called the training phase. In the second step, the trained MLP is tested on
unknown input data. During training, the back-propagation algorithm compares the output obtained
by the network with the expected output; this kind of classification is called supervised
classification. The MLP computes the error signal from the difference between the obtained and
desired outputs. The computed error signal is then fed back to the neural network and used to
adjust the weights such that with each iteration the error decreases and the neural model gets
closer and closer to producing the desired output. Figure 6 shows this process [24].
Figure 6. Neural network learning algorithm [34].
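A minimal NumPy sketch of this supervised error-feedback loop is given below: a single-hidden-layer MLP with linear outputs whose weights are adjusted by gradient descent on the MSE. The layer sizes, learning rate, and random data are placeholders rather than values taken from the paper, and biases are omitted for brevity.

```python
# Minimal back-propagation sketch: forward pass, error signal, weight update.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((90, 24))            # placeholder inputs (90 patterns, 24 features)
T = rng.choice([-1.0, 1.0], size=(90, 2))    # placeholder bipolar targets

W1 = 0.1 * rng.standard_normal((24, 13))     # input -> hidden weights
W2 = 0.1 * rng.standard_normal((13, 2))      # hidden -> output weights
lr = 0.01                                    # learning rate

for epoch in range(1000):
    H = np.tanh(X @ W1)                      # hidden-layer activations
    Y = H @ W2                               # linear output neurons
    E = Y - T                                # error signal (obtained minus desired output)
    mse = np.mean(E ** 2)                    # error measure that shrinks over iterations
    grad_W2 = H.T @ E / len(X)               # gradient of the MSE w.r.t. W2
    grad_W1 = X.T @ ((E @ W2.T) * (1.0 - H ** 2)) / len(X)  # error back-propagated through tanh
    W2 -= lr * grad_W2                       # adjust weights against the gradient
    W1 -= lr * grad_W1
    if epoch % 200 == 0:
        print(epoch, mse)
```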
There are many different training algorithms, and it is very difficult to know in advance which one
will be the fastest for a given problem. In order to determine the fastest training algorithm, many
parameters should be considered; for instance, the complexity of the problem, the number of data
points in the training set, the number of weights and biases in the network, and the error goal
should all be evaluated.
3. PROPOSED CLASSIFICATION SYSTEM
Figure 7 shows the block diagram of the proposed system. The system is based on wavelet
transform and neural networks. This system consists of two phases: the feature extraction phase
and the classification phase, which are discussed in the following sections.
Figure 7. Block diagram of the proposed classification system.
3.1. Feature Extraction Phase
The first phase consists of two stages: a preprocessing stage and a processing stage. The
preprocessing stage improves the classification accuracy of any algorithm because it provides
more accurate features.
The ECG obtained from body electrodes contains baseline noise. Baseline wander is a low-frequency
noise that may appear due to a number of biological or instrumental factors, such as electrode-skin
resistance, respiration, and amplifier thermal drift. In the preprocessing stage, the ECG signal is
filtered using a moving average filter to eliminate the baseline wander. This process is equivalent
to an LPF whose smoothing response is given by Eq. (1).
y(i) = (1/(2N+1)) * (x(i+N) + x(i+N-1) + ... + x(i-N))    (1)
where y(i) is the smoothed value for the ith data point, N is the number of neighboring data
points on either side of y(i), 2N+1 is the span, and x is the input vector [26].
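The sketch below implements the smoothing of Eq. (1) with NumPy. Subtracting the smoothed (low-frequency) curve from the signal to remove the wander, and the span of about one second at 360 Hz, are our assumptions for illustration rather than details taken from the paper.

```python
# Moving-average smoothing of Eq. (1) and a simple baseline-wander removal.
import numpy as np

def moving_average(x, N):
    """y(i) = (1/(2N+1)) * (x(i+N) + ... + x(i-N)), a centred span of 2N+1 samples."""
    window = np.ones(2 * N + 1) / (2 * N + 1)
    return np.convolve(x, window, mode="same")   # same length as the input

def remove_baseline(ecg, N=180):
    """Estimate the low-frequency baseline with Eq. (1) and subtract it (assumed approach)."""
    return ecg - moving_average(ecg, N)          # N=180 ~ 1 s span at 360 Hz (assumption)
```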
6
International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.2, No.1, February 2012
Figure 8 depicts an original ECG signal together with its baseline noise, which has an offset of 0.5.
Figure 9 depicts the ECG signal after baseline elimination, which has an offset of 0.
Figure 8. Original ECG signal (record e0103.dat; voltage in mV versus time in seconds) with baseline noise, which has an offset of 0.5.
Figure 9. Baseline-eliminated ECG signal (voltage in mV versus sample number), which has an offset of 0.
In the processing stage, the ECG features are extracted from 2-second segments of the ECG records
using the DWT. As already mentioned, there are many wavelet filters that can be applied to a signal.
We selected the Daubechies wavelet of order 6 (db6), because the Daubechies wavelet family is
similar in shape to QRS complexes and its energy spectrum is concentrated around low frequencies.
The number of decomposition levels was set to 8; in other words, the ECG signals were decomposed
into the details D1-D8. In order to reduce the dimensionality of the extracted feature vectors, 24
statistics over the wavelet coefficients from the first to the eighth level were used. The features,
computed for each level, are listed below; a small sketch of this step is given after the list:
• Maximum of the wavelet coefficients in each level.
• Minimum of the wavelet coefficients in each level.
• Variance of the wavelet coefficients in each level.
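A small Python sketch of this feature-extraction step follows. It uses PyWavelets rather than the authors' MATLAB tools, and the random placeholder segment is only for illustration; the choice of the detail sub-bands D1-D8 and the three statistics per level follows the description above.

```python
# 8-level db6 decomposition of an ECG segment and the 24 statistics (max, min, variance).
import numpy as np
import pywt

def extract_features(segment):
    # wavedec returns [A8, D8, D7, ..., D1]; keep the detail sub-bands D8..D1.
    coeffs = pywt.wavedec(segment, "db6", level=8)
    features = []
    for d in coeffs[1:]:
        features.extend([np.max(d), np.min(d), np.var(d)])   # 3 statistics per level
    return np.array(features)                                # 3 x 8 = 24 features

segment = np.random.randn(720)           # placeholder for a 2 s segment at 360 Hz
print(extract_features(segment).shape)   # (24,)  (pywt may warn that 8 levels is
                                         #  deep for such a short segment)
```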
3.2. Classification Phase
In the classification phase, we have used an MLP neural network. The best architecture of the
MLP NN is usually obtained using a trial-and-error process [27], [28]. Therefore, after running
many simulations, we chose an MLP NN with 24 input neurons, one hidden layer, and 2 linear
output neurons. Bipolar outputs using +1 and -1 values were used as the output targets for the
three selected classes. The performance of the proposed MLP NN was evaluated using the Mean-
Squared Error (MSE), which is computed from the differences between the desired outputs and the
outputs obtained by the trained NN.
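As a rough stand-in for the MATLAB network, the sketch below builds a comparable 24-13-2 MLP with scikit-learn's MLPRegressor (tanh hidden units and linear outputs). The 13 hidden neurons come from the best traingdx result reported in Section 4, the training data are random placeholders, and the bipolar class codes are those given in the next paragraph; this is an illustration, not the authors' code.

```python
# Illustrative 24-13-2 MLP with linear outputs and bipolar targets (not the authors' code).
import numpy as np
from sklearn.neural_network import MLPRegressor

mlp = MLPRegressor(hidden_layer_sizes=(13,),   # one hidden layer with 13 neurons
                   activation="tanh",          # suits the bipolar (-1/+1) target coding
                   solver="lbfgs",             # batch optimiser; stands in for the MATLAB trainers
                   max_iter=2000)

X_train = np.random.randn(90, 24)              # placeholder 24-element feature vectors
T_train = np.tile([[1.0, 1.0], [-1.0, -1.0], [-1.0, 1.0]], (30, 1))   # bipolar class codes
mlp.fit(X_train, T_train)
pred = np.sign(mlp.predict(X_train))           # map the linear outputs back to +/-1 codes
```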
The neural network model was implemented using MATLAB, which offers several training algorithms
with a variety of different computation and storage requirements; however, it is hard to find one
algorithm that is well suited to all applications. In our work, we trained our network with several
high-performance algorithms: Variable Learning Rate back-propagation ("traingdx"), Resilient
Backpropagation ("trainrp") and Reduced Memory Levenberg-Marquardt ("trainlm"). Two bipolar
outputs were used as the network targets. The targets for the three classified patterns are [1, 1]
for the normal signal, [-1, -1] for the PB arrhythmia, and [-1, 1] for the APB arrhythmia. The
traingdx and trainrp algorithms use heuristic techniques, which were developed from an analysis of
the performance of the standard steepest-descent algorithm, while the trainlm algorithm uses
standard numerical optimization techniques [29]. The training curve for the proposed MLP neural
network is depicted in Figure 10. As can be seen in the figure, the best and goal training curves
overlap.
Figure 10. Training curve for proposed MLP neural network.
4. EXPERIMENTS AND SIMULATION RESULTS
The MIT–BIH arrhythmia database consists of 48 ECG signal records. Each record comprises
several files: the signals, the annotations, and the specifications of the signal attributes. Each
record of the MIT–BIH database is a 30-minute excerpt selected from a 24-hour recording. The
sampling frequency of the ECG signals in this database is 360 Hz, and the records are annotated
throughout; that is, each beat is described by a label called an annotation. Typically, an
annotation file for an MIT–BIH record contains about 2000 beat annotations and smaller numbers of
rhythm and signal-quality annotations.
In this study, we used the 10 records from this database that are listed in Table 1, and we selected
500 samples from each record.
Table 1. Selected records from database.

Class  | Record numbers
Normal | 103, 115, 121, 123
PB     | 104, 107, 217
APB    | 209, 220, 223
The proposed MLP neural networks were trained using 90 training vectors from 10 files of the
MIT–BIH arrhythmia database. The trained MLP NNs were tested using 45 patterns (15 testing
patterns for each class) from the same 10 files, which include the normal class and the two
arrhythmias. In order to evaluate the performance of the trained MLP NNs, we used two criteria to
compare the trained networks: the number of recognized samples and the recognition rate. The
recognition rate is defined as follows:
A = 100 * (Nc / Nt)    (2)
where A is the recognition rate, Nc is the number of correctly classified patterns, and Nt
represents the total number of patterns.
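For completeness, Eq. (2) as a one-line helper; the numbers in the call are arbitrary examples, not results from the tables below.

```python
def recognition_rate(n_correct, n_total):
    """A = 100 * Nc / Nt, as in Eq. (2)."""
    return 100.0 * n_correct / n_total

print(recognition_rate(43, 45))   # e.g. 43 of 45 patterns correct -> prints 95.55..., about 95.6%
```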
The simulation results are presented in Tables 2, 3 and 4. The test results were obtained by training
the proposed MLP NN using different numbers of neurons in the hidden layer.
Table 2 shows the results obtained with the traingdx training algorithm using 12, 13, and 14
neurons in the hidden layer. As can be seen, the best performance is obtained with 13 neurons. The
proper setting of the learning rate for the traingdx algorithm is very important, because its
performance is very sensitive to this setting. For example, if the learning rate is set too high,
the algorithm can oscillate and become unstable, while if it is too small, the algorithm takes too
long to converge. It is hard to determine the optimal setting for this parameter before the
training phase, because the optimal learning rate changes during the training process.
Table 3 depicts the results calculated with the trainrp training algorithm using 10, 11, and 12
neurons in the hidden layer. The best performance was obtained with 11 neurons. When steepest
descent is used to train an ANN with sigmoid activation functions, the gradient can have a very
small magnitude. This leads to small changes in the biases and weights, even when they are far from
their optimal values. The goal of the trainrp algorithm is to remove these harmful effects of the
magnitudes of the partial derivatives.
The results obtained with the trainlm training algorithm are depicted in Table 4. Among 13, 14, and
15 neurons in the hidden layer, the best performance was obtained with 14 neurons. This algorithm
is the fastest, but its main drawback is that it requires the storage of some matrices that can be
quite large for certain problems. Overall, the best recognition rate of the MLP NN trained with the
three training algorithms is 96.5%.
In addition, Table 5 summarizes the results obtained by other algorithms proposed in the
literature. As can be seen in Table 5, our proposed algorithm performs well compared with other
ECG arrhythmia classification algorithms.
Table 2. Simulation result with traingdx algorithm.

Class  | Data  | # of samples | Recognized samples (12 / 13 / 14 hidden neurons) | Recognition rate % (12 / 13 / 14 hidden neurons)
Normal | Train | 30           | 30 / 30 / 30                                     | 100 / 100 / 100
Normal | Test  | 15           | 12 / 13 / 13                                     | 80 / 86 / 86
PB     | Train | 30           | 30 / 30 / 30                                     | 100 / 100 / 100
PB     | Test  | 15           | 15 / 15 / 15                                     | 100 / 100 / 100
APB    | Train | 30           | 30 / 30 / 30                                     | 100 / 100 / 100
APB    | Test  | 15           | 13 / 14 / 13                                     | 86 / 93 / 86
Total  |       | 135          | 130 / 132 / 131                                  | 94.3 / 96.5 / 95.3
Table 3. Simulation result with trainrp algorithm.

Class  | Data  | # of samples | Recognized samples (10 / 11 / 12 hidden neurons) | Recognition rate % (10 / 11 / 12 hidden neurons)
Normal | Train | 30           | 30 / 30 / 30                                     | 100 / 100 / 100
Normal | Test  | 15           | 12 / 13 / 12                                     | 80 / 86 / 80
PB     | Train | 30           | 30 / 30 / 30                                     | 100 / 100 / 100
PB     | Test  | 15           | 15 / 15 / 15                                     | 100 / 100 / 100
APB    | Train | 30           | 30 / 30 / 30                                     | 100 / 100 / 100
APB    | Test  | 15           | 13 / 14 / 14                                     | 86 / 93 / 93
Total  |       | 135          | 130 / 132 / 131                                  | 94.3 / 96.5 / 95.5
Table 4. Simulation result with trainlm algorithm.

Class  | Data  | # of samples | Recognized samples (13 / 14 / 15 hidden neurons) | Recognition rate % (13 / 14 / 15 hidden neurons)
Normal | Train | 30           | 30 / 30 / 30                                     | 100 / 100 / 100
Normal | Test  | 15           | 13 / 13 / 12                                     | 86 / 86 / 80