A pairwise output coding method for multi-class EEG classification of a self-induced BCI

Article history: Received: 20 July 2017 Accepted: 13 March 2018 Available Online: 14 July 2018
In brain computer interface (BCI) research, electroencephalography (EEG) is the most widely used method due to its noninvasiveness, high temporal resolution and portability. Most EEG-based BCI studies aim at developing methodologies for signal processing, feature extraction and classification. In this study, an experimental EEG study was carried out with six subjects performing imagined mental and motor tasks. We present a multi-class EEG decoding scheme with a novel pairwise output coding method to improve the performance of self-induced BCI systems. The method is an augmented one-versus-one multiclass classification that needs less time and fewer electrodes. Furthermore, a train repetition number is introduced in the training step to optimize the data selection. Differences between the right and left hemispheres, and between experienced and novice subjects, are also examined. The experimental results demonstrate that the proposed classification algorithm produces high classification accuracies (98%) with nine channels. A reduced set of four channels gives 100% accuracy for mental tasks and 87% accuracy for motor tasks with Support Vector Machines (SVM). The classification accuracies are high, and the proposed one-versus-one technique worked well compared to the classical method. The results are promising for a real-time study.


Introduction
A brain-computer interface (BCI) is a communication system that translates brain signals into commands for communication or for controlling external devices without requiring any peripheral muscular activity [1][2][3]. Electroencephalogram (EEG) is the most efficient and widely used recording modality in BCIs due to its non-invasive measurement procedure, portability and reasonable cost [2]. Because of the large number of methodologies developed for signal processing, feature extraction and classification of EEG data, there is no gold standard for data processing and machine learning algorithms [4][5][6]. The output of a BCI contains the decoding of the intended task, which is then transferred to the related device. The discrimination of the tasks is done with a classifier. A number of linear and nonlinear classifiers have been studied for classification of EEG signals under different conditions, such as Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Neural Networks (NN) and their special implementations, Bayes Quadratic, Common Spatial Patterns (CSP), Hidden Markov Models and hybrid classifiers [6][7][8][9][10]. The literature [2][3][4] concludes that EEG data have a non-Gaussian nature and differ from person to person, so the features and classifiers should adapt to that changing character. Therefore, it is difficult to fit a single feature extraction or classification algorithm to the EEG signal. Many BCI methodologies have been tried in the literature, and the studies that rely on mental task or motor imagery discrimination are regarded as flexible methods [5,6]. Mental task based BCI studies give independence to the user and to the system developer, without the need for an extra monitor or a gaze tracker. Decoding mental tasks can be done with event related potentials (ERP), visually evoked potentials (VEP) and with EEG. ERPs and VEPs require additional interfaces, such as screens of alphanumeric characters or gaze tracking devices.
The principal aim of this study is to introduce a multiclass EEG decoding scheme for BCIs with a novel pairwise output coding method. For building binary classifiers, one-versus-all (OVA) or one-versus-one (OVO) classification techniques are generally used. The choice between the two methods is based on computation time and data storage. OVO seems faster and more memory efficient, especially with LDA and SVM classifiers [11]. In the classical OVO scheme, a classifier is trained between each pair of classes and the final class of a test sample is predicted by the max-win voting strategy. However, ties can arise in the voting and can degrade the final prediction [11]. Unlike the literature, a modification is added to the validation step: a train repetition number is introduced when determining the training data of the classifiers. The training data are randomly selected from the data left outside the test set, and they are obtained over multiple training sessions. The final performance of the classifiers was compared with the classical OVO results and showed a higher classification ability.
Moreover, this study introduces a new data set, which was recorded during mental and motor task experiments at the Mechanical Engineering Department of Karadeniz Technical University. Most previous studies, in contrast, were conducted on two common data sources: Keirn and Aunon's five-class mental task data [12] and Schlögl and Pfurtscheller's four-class motor imagery data [7]. Huan and Palaniappan [13] designed a bi-state BCI for the five-class mental task data given in [12], using three different feature extraction methods with NN classifiers. Flores et al. [14] developed an architecture based on adaptive neuro-fuzzy inference systems through recurrent neural networks, also using the five-class mental task data in [12]. Solhjoo et al. [15] used the dataset in [7] to study the performance of Hidden Markov Models in classification. Tolić and Jović [16] classified wavelet-transformed EEG signals with a neural network, working with the mental task data from [12] and imagined motor tasks [17]. Unlike these studies, our experimental paradigm provides a user-centered, flexible environment for performing real-time BCI applications in the future. A further significant contribution of this study is channel reduction, which shortens the data processing time for online BCI applications. In response to demand from end users of BCI applications, the development of portable and easy-to-use devices is encouraged [4]. This paper is organized as follows. Section 2 gives a brief description of the recording procedure, the experimental design and the data used. Section 3 presents the details of the feature extraction approach and Section 4 contains the improved one-versus-one classification model for LDA and SVM classifiers. In Section 5, the multiclass classification performance of the classifiers for six subjects is reported along with a discussion of the results. Finally, the concluding remarks are given in Section 6.

2.1. Experimental setup and data description
A 64-channel Biosemi ActiveTwo EEG system was used to record the EEG data. All experiments were carried out at the Mechanical Engineering Department of Karadeniz Technical University [18]. Unlike most of the literature, all recordings were performed with eyes closed, which helps participants concentrate thoroughly. Furthermore, eye blinks and eye movements produce a high-amplitude signal called the electrooculogram (EOG), which can be many times greater than the EEG signal and is regarded as an artifact [19,20]. For accurate classification, the artifacts added to the EEG signal during the recording session must be removed from the signal. One way of solving this problem is to reject the segments containing eye blinks, or the whole trial. However, this can discard valuable parts of the data and is inefficient when data are limited [21].

1) Participants: The study was performed with 6 healthy participants who, except Subject 1, were initially naïve to the use of EEG and to the tasks. The 5 men and 1 woman, with a mean (standard deviation, SD) age of 30.5 (14.4), had no medical diseases and were all right-handed. Each volunteer participated in several sessions over a period of 2-3 weeks. All of the participants signed the informed consent form before the experiments.
2) Procedure: The subjects were seated on a comfortable chair in a dimly lit, silent room during the recordings. Before each trial, they were informed about the type of the task (resting state, multiplication, right hand, etc.) by auditory cues. The sequence of mental and motor tasks was as follows: resting state, mental arithmetic, imagination of right hand movement, imagination of left hand movement, and imagination of the letter 'A' (see Figure 1). Each trial lasted 10 seconds and the interval between consecutive tasks was about 3-4 seconds. The first 2 seconds of each trial were the task preparation time for the subject. The experiments comprised 5 experimental runs of 20 trials each (100 trials per task in total). The details of each task are provided below:
• Resting state (RS): The subjects were asked to sit and relax as much as possible without thinking of anything.
• Mental arithmetic task (MA): The subjects were given a two-digit multiplication problem to solve in their mind without vocalizing or moving (e.g. 24×76 = ?). The problems were not repeated. After the trial, the subject reported whether or not he had reached the solution.
• Right hand imagination task (RH): The subjects were told to imagine right hand movement.
• Left hand imagination task (LH): The subjects were told to imagine left hand movement.
• Letter 'A' imagination task (LA): The subjects were told to imagine the letter 'A' in their mind.
3) Recordings: EEG data were recorded from the subjects during the experiment using a 64-channel Biosemi ActiveTwo EEG system with Ag/AgCl electrodes. The international 10-20 electrode placement system was used. The grounding electrodes CMS and DRL were mounted on the back of the head. The EEG signals were sampled at 512 Hz.
4) Channel selection: We selected 9 channels from four different brain regions and both hemispheres, i.e. frontal (F3/4), central (C3/4), parietal (P3/4, Pz) and occipital (O1/2). Each region was represented by an electrode pair because different EEG rhythms can distinguish patterns of neuronal activity associated with specific motor and cognitive processing functions. Any change in brain patterns could result from different forms of processing or computation in the brain and represent different rhythmic states [7,9-11,13,16,21]. The alpha wave can be detected primarily from the occipital lobe (O1 and O2), but also from the parietal (P3 and P4) and frontal regions (F3 and F4) of the cerebral cortex. Motor imagery of the right or left hand is typically reflected in the beta rhythm of the EEG spectra obtained from C3 and C4, and mental arithmetic is mostly reflected in the frontal cortex at F3 and F4.
Feature extraction and classification were performed on each channel separately.

Data preprocessing
The raw data obtained from the Biosemi system were transferred to the Matlab environment. The raw data were re-referenced to Cz by subtracting the Cz channel's data from the remaining 63 channels. The data were visually inspected, and the first two seconds of each 10-second signal were excluded because of the abrupt changes at the beginning of the imagination task. The remaining 8-second signal was then divided into two 4-second segments, yielding 200 data samples for each task. In total, 6000 data samples (200 data samples x 5 tasks x 6 subjects) were collected for analysis. For online BCI studies, the user could be trained for some trials before the online application instead of excluding the data. The signal analyses were done separately on the data samples of 10 channels (F3, F4, C3, C4, Cz, P3, P4, Pz, O1 and O2), which are shown in Figure 2.
Before the feature extraction step, the EEG signal was filtered using a 10th-order 50 Hz low-pass digital Butterworth filter.
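The segmentation arithmetic described above can be checked with a minimal Python sketch. It is an illustration only, not the authors' MATLAB code: the function names `split_trial` and `rereference` are hypothetical, and the trial is a placeholder list of sample indices rather than real EEG.

```python
FS = 512          # sampling rate in Hz (from the Recordings paragraph)
TRIAL_SEC = 10    # trial length in seconds
PREP_SEC = 2      # task preparation time that is discarded
EPOCH_SEC = 4     # length of each analysis segment


def split_trial(samples, fs=FS):
    """Drop the first PREP_SEC seconds, then cut the rest into EPOCH_SEC-second segments."""
    start = PREP_SEC * fs
    step = EPOCH_SEC * fs
    usable = samples[start:]
    return [usable[i:i + step] for i in range(0, len(usable) - step + 1, step)]


def rereference(channel, reference):
    """Subtract the reference channel (Cz in the paper) sample by sample."""
    return [c - r for c, r in zip(channel, reference)]


# Placeholder for one 10 s single-channel trial (sample indices, not real EEG).
trial = list(range(TRIAL_SEC * FS))
epochs = split_trial(trial)

# segments per trial x trials per task x tasks x subjects
n_samples = len(epochs) * 100 * 5 * 6
```

Each 10-second trial gives two 2048-sample segments, so 100 trials per task give the 200 data samples per task and 6000 in total reported in the text.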

Feature extraction
Many feature extraction methods, from basic [22] to highly complex ones [23][24][25], have been proposed in the BCI literature. As accuracy, ease of use, efficiency and speed are important parameters to consider [26], the feature extraction approach proposed in [27] is used in this study. This method relies on the band powers of the EEG signal, a common and powerful technique to distinguish different frequencies [28][29][30]. In [27], a stable pattern in the power spectral density (PSD) was observed, with different amplitudes, for all subjects and for all tasks. This biological phenomenon allows a classification between different mental tasks. Based on it, we extracted three features from the alpha (8-13 Hz) and beta (13-30 Hz) bands of the PSD by searching for the local peak values in the two bands separately: the highest peak in the alpha band and the two highest peaks in the beta band (see Figure 3).
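A minimal Python sketch of the peak search is given below, assuming the PSD has already been estimated and is available as two equal-length lists. The function names, the strict local-maximum test and the assignment of 13 Hz to the beta band are assumptions made for illustration; the paper does not state these details.

```python
def local_peaks(freqs, psd, lo, hi):
    """Return (power, frequency) pairs for local maxima with lo <= f < hi, strongest first."""
    peaks = []
    for i in range(1, len(psd) - 1):
        in_band = lo <= freqs[i] < hi
        if in_band and psd[i] > psd[i - 1] and psd[i] >= psd[i + 1]:
            peaks.append((psd[i], freqs[i]))
    return sorted(peaks, reverse=True)


def band_peak_features(freqs, psd):
    """Three features: highest alpha peak and the two highest beta peaks."""
    alpha = local_peaks(freqs, psd, 8.0, 13.0)   # alpha band, 8-13 Hz
    beta = local_peaks(freqs, psd, 13.0, 30.0)   # beta band, 13-30 Hz
    if len(alpha) < 1 or len(beta) < 2:
        raise ValueError("not enough local peaks in the alpha/beta bands")
    return [alpha[0][0], beta[0][0], beta[1][0]]


# Toy spectrum with 1 Hz resolution (synthetic values, not measured data).
freqs = [float(f) for f in range(41)]
psd = [1.0] * 41
psd[10], psd[18], psd[22], psd[25] = 9.0, 5.0, 3.0, 7.0
features = band_peak_features(freqs, psd)
```

On the toy spectrum the alpha peak at 10 Hz and the beta peaks at 25 Hz and 18 Hz are selected, while the weaker beta peak at 22 Hz is ignored.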

Classification taxonomy
Classification is the act of assigning a predefined class to each instance. For this discrimination, we used the LDA and SVM classification methods because of their good performance [6,7]. Both methods were originally designed for binary classification, but in this study they are adapted to multi-class problems. The simplest approach is to build independent binary problems and to predict the class from the results of the binary classifiers. The binary classifiers are commonly structured with one-versus-all (OVA), one-versus-one (OVO) or error-correcting code techniques. The choice between OVO and OVA depends on the data type. OVO is reported to be faster and more memory efficient than OVA [31]. For this reason, an extended version of the OVO technique, tested with LDA and SVM, is developed in this study.

LDA classifier
LDA is an easily implemented, classical classification method. Hence, it is very popular and is often used as the baseline method for comparison with other classification methods. In this method, the data are projected onto a lower-dimensional vector space such that the ratio of the between-class distance to the within-class distance is maximized. Therefore, maximum discrimination is obtained. The optimal projection can be computed by applying the eigendecomposition to the scatter matrices. According to Fisher's two-class LDA, the multivariate observations x are transformed to univariate observations y such that the y's derived from the two classes are separated as much as possible [8]. First, the feature vector x is mapped by the linear transformation in (1):

$$y = V^{T} x \qquad (1)$$

where V represents the projection matrix. It is determined by maximizing the ratio of between-class variance to within-class variance. The within-class variance matrix $S_w$ and the between-class variance matrix $S_b$ are defined in equations (2) and (3):

$$S_w = \sum_{i=1}^{K} \sum_{x \in \text{class } i} (x - \mu_i)(x - \mu_i)^{T} \qquad (2)$$

$$S_b = \sum_{i=1}^{K} L_i\, (\mu_i - \mu)(\mu_i - \mu)^{T} \qquad (3)$$

where K is the number of classes, $\mu_i$ is the mean vector of class i, $L_i$ is the number of samples within class i and $\mu$ is the mean of the entire training sample set. The projection matrix V is calculated by $V = \arg\max_{V} \dfrac{|V^{T} S_b V|}{|V^{T} S_w V|}$. Once the transformation is done, the classification is performed in the transformed space based on some distance metric, such as the Euclidean distance given in (4):

$$d(zV, \bar{x}_k V) = \lVert zV - \bar{x}_k V \rVert_2 \qquad (4)$$

The final class is obtained as $\arg\min_{k} d(zV, \bar{x}_k V)$ upon the arrival of the new instance z (a row vector).
Here, $\bar{x}_k$ is the centroid of the k-th class.
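For the two-class case that each binary classifier solves, the Fisher criterion has a well-known closed form, which may help in reading the multi-class description above. The block below is a standard textbook result stated under the usual assumption that the within-class matrix $S_w$ is invertible; it is not taken from the paper. Here $v$ is a single projection vector, $\mu_1$ and $\mu_2$ are the two class means, and $\hat{k}(x)$ is the predicted class.

```latex
\begin{aligned}
J(v) &= \frac{v^{T} S_b\, v}{v^{T} S_w\, v},
\qquad
v^{*} \;\propto\; S_w^{-1}(\mu_1 - \mu_2), \\[4pt]
\hat{k}(x) &=
\begin{cases}
1, & v^{*T} x \;>\; \tfrac{1}{2}\, v^{*T}(\mu_1 + \mu_2), \\
2, & \text{otherwise.}
\end{cases}
\end{aligned}
```

Because $v^{*T}(\mu_1 - \mu_2) = (\mu_1 - \mu_2)^{T} S_w^{-1} (\mu_1 - \mu_2) > 0$, class 1 projects to the larger value, so thresholding at the midpoint of the projected means is the same as choosing the nearest projected centroid with the Euclidean distance.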
The extended OVO approach is built to apply this binary LDA method to multiclass classification. We implemented the LDA algorithm in MATLAB® for two-class and multi-class classification.
The multi-class case consists of several two-class runs.
In a classical OVO approach, K(K-1)/2 binary classifiers are built. A new example is tested according to the max-win voting strategy among the classifiers, and the class with the maximum number of votes is assigned. In some cases ties arise, and the computation for that run is discarded. This is disadvantageous when data are limited. Unlike the existing OVO approach, the approach proposed in this study uses K binary classifiers to classify K-class data. The data and the class labels are introduced as [...]. We used 50% of the data for testing and 50% for training, which means that the data of a subject are split equally into training and testing sets. This selection is made randomly by a two-fold cross validation in each run.
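The classical OVO bookkeeping can be sketched in a few lines of Python. This is an illustration of max-win voting and tie detection only; the function names are hypothetical, the pair winners are made-up values, and no classifier is trained here.

```python
from collections import Counter
from itertools import combinations


def ovo_pairs(classes):
    """All unordered class pairs; there are K(K-1)/2 of them."""
    return list(combinations(classes, 2))


def max_win_vote(pair_winners):
    """pair_winners maps each (a, b) pair to the class its binary classifier chose.

    Returns (predicted_class, tied); predicted_class is None when the vote is tied.
    """
    votes = Counter(pair_winners.values())
    best_count = max(votes.values())
    leaders = [c for c, n in votes.items() if n == best_count]
    if len(leaders) > 1:
        return None, True
    return leaders[0], False


tasks = ["RS", "MA", "RH", "LH", "LA"]
pairs = ovo_pairs(tasks)  # 10 pairs for K = 5

# Made-up outcome: every classifier involving RS votes for RS.
clear_case = {p: ("RS" if "RS" in p else p[0]) for p in pairs}
# Made-up three-class cycle: each class wins exactly once, so the vote is tied.
tie_case = {("A", "B"): "A", ("B", "C"): "B", ("A", "C"): "C"}
```

With five tasks the classical scheme needs 10 binary classifiers, compared with the K = 5 used by the proposed approach, and the cyclic example shows how a tie leaves the sample without a label.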
Step 1: For each binary classifier, select 50 training samples for every task pair. A task pair contains 100 samples. K=5 binary classifiers are built as a module, for instance k1-k1, k1-k2, k1-k3, k1-k4, k1-k5, as depicted in Figure 4. As the proposed method constructs K task pairs for K-class data, we use ½ of the training data of each class. In each classifier, half of the training data are labeled +1 and the other half -1. Consequently, in the same-class pair (k1-k1), 50 training samples of the class are labeled +1 and another 50 training samples of the same class are labeled -1. The remaining pairs (k1-k2, k1-k3, k1-k4, k1-k5) are constructed such that 50 training samples from one class are labeled +1 and 50 training samples from the other class are labeled -1.
Step 2: Take another 50 random test samples, entirely disjoint from the training data, from any of the classes. As the test class label is unknown, the test data are compared with all K classes. Considering the k1-k1 case, if the true class is k1, the greatest possible majority for a test sample will be 50. This value of 50 is not the final classification performance; it is only a mathematical device for predicting the class labels. Comparisons for the other pairs will differ from 50. This is the key point that enables us to predict the test class label. The program computes the percentages of all nine channels separately. The highest classification performance results are observed with all nine channels. Then, the predicted class labels are obtained by the max-win voting strategy among the nine channels. The predicted class labels are displayed as P1, P2, P3, P4 and P5 in Figure 4. Table 1 lists the K-class module results that lead to the decision on the final class label. As Table 1 shows, the task pairs Task1-Task1, Task2-Task2, Task3-Task3, Task4-Task4 and Task5-Task5 have values around 50. The task pair results are independent of each other.

Figure 4. Schematic explanation for extended OVO approach
Step 3: Finally, the overall program is repeated 100 times to obtain a mean value.
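One possible reading of Steps 1-3 is that, for each channel, the candidate class whose pair score lies closest to the chance value of 50 is taken as the prediction, and the channels then vote. The Python sketch below encodes only that reading. The decision rule, the function names and all scores are assumptions made for illustration, since the text does not give the rule in closed form.

```python
from collections import Counter

CHANCE = 50.0  # percent; expected score when both halves of a pair share one class


def predict_from_pair_scores(pair_scores):
    """pair_scores maps each candidate class to the percentage score of the
    binary classifier built for the (test block, candidate class) pair.

    Assumed rule: return the candidate whose score lies closest to chance.
    """
    return min(pair_scores, key=lambda k: abs(pair_scores[k] - CHANCE))


def vote_across_channels(channel_predictions):
    """Max-win vote over the labels predicted on the individual channels."""
    return Counter(channel_predictions).most_common(1)[0][0]


# Hypothetical scores for one channel; the test block truly belongs to RS.
scores = {"RS": 51.2, "MA": 93.0, "RH": 78.5, "LH": 81.0, "LA": 96.4}
channel_label = predict_from_pair_scores(scores)

# Hypothetical per-channel labels for the nine channels.
final_label = vote_across_channels(["RS"] * 6 + ["RH"] * 3)
```

Under this reading only K binary problems are evaluated per test block, which matches the K classifiers described in the text.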

SVM classifier
SVM is a powerful classifier which has demonstrated its excellent generalization properties in various BCI applications [7,8,10].The basic idea of SVM is finding the optimal separation hyperplane by maximizing the margin.
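The general SVM solution is obtained from the following optimization problem [32] given in (5):

$$\min_{w,\,b,\,\xi}\ \frac{1}{2} w^{T} w + c \sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N \qquad (5)$$

Here $\sigma$ is the user-defined parameter that sets the width of the kernel function. MATLAB's SVM package, which is originally designed for binary classification, is applied to multiclass classification according to the improved OVO approach that is used with LDA.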
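The role of the kernel width can be illustrated with a small Python sketch of a Gaussian RBF kernel and the kernel-expansion form of a trained binary SVM output. The parameterization exp(-||x - z||² / (2σ²)) is one common convention and is an assumption here, as are the function names and the toy support vectors; this is not MATLAB's SVM package.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel with width sigma (assumed form: exp(-||x-z||^2 / (2 sigma^2)))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_decision(x, support_vectors, labels, alphas, bias, sigma):
    """Sign of the kernel expansion sum_i alpha_i * y_i * k(x_i, x) + b."""
    score = bias
    for sv, y, a in zip(support_vectors, labels, alphas):
        score += a * y * rbf_kernel(sv, x, sigma)
    return 1 if score >= 0 else -1


# Toy model with two support vectors (hypothetical values, not fitted to data).
svs = [[0.0, 0.0], [2.0, 0.0]]
ys = [1, -1]
alphas = [1.0, 1.0]

near_first = svm_decision([0.1, 0.0], svs, ys, alphas, bias=0.0, sigma=1.0)
near_second = svm_decision([1.9, 0.0], svs, ys, alphas, bias=0.0, sigma=1.0)
```

A smaller sigma makes each support vector influence only its close neighbourhood, while a larger sigma gives a smoother decision boundary.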

Results and discussion
The classification performance of the proposed approach and of the regular OVO with LDA and SVM is given in Table 2 for all six subjects and nine channels. The classification performance per mental and motor task is defined as the number of correct predictions in a run over the total number of data points in that run, expressed in % and averaged over 100 repetitions. The tables use a two-group representation.
The first group represents the results of the extended OVO approach and the second group shows the results of the regular OVO approach. The leftmost column lists the subjects from one to six, the next column shows the classifiers, the following five columns give the classification accuracies of the five tasks and the last column gives the mean value over all tasks. All computations were done using the same number of training and test data. More trained subjects (Subject 1, familiar with EEG and BCI studies) produce a higher classification performance. To compare the proposed extended OVO approach with the regular OVO approach, the classification accuracies of the five tasks and the mean values over all tasks for the regular approach are also included in Table 2. With regular OVO, the maximum mean classification performance was achieved by Subject 2, at 69.36% with LDA and 61.95% with SVM, which is very poor compared with the extended OVO results (81.38% with LDA and 89.98% with SVM). The mean classification results over the six subjects and the five tasks show that the extended OVO approach performs better than the regular method. The reason for the lower performance of regular OVO may be the occurrence of many ties in the max-win voting strategy during the final class decision. Table 2 shows that the classification performance for Task 1 and Task 2 data samples is very high for all six subjects with both methods. An important finding from Table 2 is that Task 3 and Task 4 (the motor imagery tasks) have very low performance for both OVO approaches (54.83% mean with extended OVO and 58.37% mean with regular OVO). The computation times for both classifiers are given in Table 7. The proposed method has shorter computation times (0.9 seconds for LDA and 2.42 seconds for SVM) than the regular OVO approach (3.44 seconds with LDA and 4.21 seconds with SVM).
Another point examined for this subject is the reduction of the number of electrodes from nine to four according to their location on the cortex (Table 3). The four electrodes F3, F4, C3 and C4 are selected because they lie over the frontal regions and the sensorimotor area of the cortex and are more important for recognition of mental and motor tasks than the rest of the channels. The results of this study support this: for motor tasks, the classification accuracies with four channels (F3, F4, C3, C4) are higher, and increasing the number of channels to nine does not increase the classification accuracy. It can therefore be concluded that data from four channels (F3, F4, C3 and C4) are also sufficient for motor task classification.
Tables 4-5 give the classification results of each classifier for the left (F3, C3, P3, O1) and right (F4, C4, P4, O2) hemisphere electrodes together with the midline electrode (Pz). The classification results for (F3, C3, P3, O1, Pz) are as follows: for Task 1, 100 with LDA and 100 with SVM; for Task 2, 95.10 with LDA and 100 with SVM; for Task 3, 47.54 with LDA and 81.44 with SVM; for Task 4, 57.66 with LDA and 69.06 with SVM; and for Task 5, 90.64 with LDA and 99 with SVM. The classification results in Table 4 are very close to those in Table 5, which means that no conclusion about hemispheric differences can be drawn under these circumstances for this subject. However, during the imagery tasks, the results depend strongly on how well the imagination was performed. Both 5-electrode configurations gave better classification results for right than for left arm movement imagination (see Tables 4-5), instead of each giving better results for the corresponding arm movement. A possible explanation is that dominant hand characteristics may affect the classification results.
One important aim of this study is to obtain a suitable training data set without discarding noisy or bad data during the analysis. In an online BCI system, it would be difficult to discard data. For this reason, a parameter called the train repetition number is introduced to select a good training data set. The training data set is randomly selected n times (n=1,…,5). This selection is different from cross-validation: here, a good training data set is searched for. First, the data are split into training and test sets of equal size. Then, keeping the test set unchanged, the training set is formed after n repetitions (n=1,…,5). As n increases, the computation time increases, which is undesirable for online applications. The effect of the repetition number on the computation time for the LDA and SVM classifiers is given in Table 6. The table shows that LDA responds faster than SVM, and that reducing the number of electrodes from nine to four makes a difference of about 0.25 seconds with LDA and about 0.56 seconds with SVM. It was observed that the classification performance for motor tasks is low compared to the other tasks.
Using several train repetition numbers also affects the classification accuracy of the Right Hand Imagination task, as displayed in Figure 5 and Figure 6. Considering Figure 6 and Table 6, a train repetition number of three is suggested for optimum classification accuracy. As seen in Figures 5-6, increasing the train repetition number also increases the accuracy, by 15.2 with LDA and by 12.1 with SVM, for the right hand imagination task. The accuracy, sensitivity and specificity values of the classifiers are obtained for the two-class case, and the results of a binary SVM are given in Table 8. Sensitivity and specificity are used to measure the statistical performance of a binary classification test. Sensitivity is defined as the ratio of the number of true positives to the sum of the numbers of true positives and false negatives. Specificity is defined as the ratio of the number of true negatives to the sum of the numbers of true negatives and false positives. The definitions of sensitivity and specificity are given in Table 7.
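The three measures follow directly from the confusion counts, as in the minimal Python sketch below. The function name `binary_metrics`, the choice of +1 as the positive label and the example labels are illustrative assumptions, not results from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity in percent for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "accuracy": 100.0 * (tp + tn) / len(pairs),
        "sensitivity": 100.0 * tp / (tp + fn),   # TP / (TP + FN)
        "specificity": 100.0 * tn / (tn + fp),   # TN / (TN + FP)
    }


# Made-up labels: 4 positive and 6 negative samples.
y_true = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
y_pred = [1, 1, 1, -1, -1, -1, -1, -1, -1, 1]
metrics = binary_metrics(y_true, y_pred)
```

In this made-up example one positive is missed and one negative is misclassified, so sensitivity (3 of 4) and specificity (5 of 6) differ even though both classes have a single error.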

Conclusion
The main outcome of this study is an alternative solution step that yields an extended approach to one-versus-one classification of data. With this method, the computation time and the data storage are reduced. Another finding concerns which electrode channels are necessary for BCI systems. For mobile BCI systems, a reduced amount of technical equipment, including electrode channels, is preferred. In this study, a four-channel system provides results that are on par with those obtained with more channels. We obtained better classification performance with SVM and shorter computation time with LDA. Mental and motor tasks, and the right and left hemispheres, were compared thoroughly. For homogeneous separation of the training data, a repetition number is introduced. Moreover, the difference between experienced and novice subjects was examined, and it is concluded that a short training period for subjects before online applications will improve the overall performance.
It is also observed that selecting the proper electrode channels is an important task. In this study, it is concluded that frontal and central electrodes are enough to distinguish some basic tasks, especially mental and motor tasks, with the proposed features and classifiers.
The main contribution of this paper is its original extended OVO output coding methodology, which can be used instead of the regular OVO algorithm in multiclass classification scenarios. An additional contribution is the use of fewer channels, which reduces the processing time and produces a quick response.

Figure 1. Order of the experiments

Figure 2. International 10-20 electrode placement system with the selected 10 channels in yellow

Figure 3. Proposed feature extraction scheme. The curve indicates the PSD of EEG in alpha and beta bands. We select the highest PSD peak value in the alpha band and the first two highest PSD peak values in the beta band as the features.

In (5), the slack variables $\xi_i \ge 0$ and the constant $c > 0$ have to be introduced. The output of a binary SVM classifier can be computed by the following expression in (6):

$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i\, y_i\, k(x_i, x) + b \right) \qquad (6)$$

where $\alpha_i$ are the Lagrangian multipliers obtained by solving a quadratic optimization problem, and $k(x_i, x_j)$ is called the kernel function. The most commonly used kernel function is the Gaussian RBF function, which is also used in this study as in (7):

$$k(x_i, x_j) = \exp\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}} \right) \qquad (7)$$

Accuracy measures how well the results of a binary classifier match the true labels [33]. Hence, an accuracy of 100 means that the tested values are exactly the same as the true values. The results are obtained for each electrode channel separately. The overall accuracies for Task 1 are around 90. The overall sensitivity values are around 87 and the overall specificity values are around 92. These results also support the good classification ability of the proposed method and the classifiers.

Figure 5. Change in classification accuracy for right hand imagination with LDA

Table 1. Proposed final class label decision table

Table 2 shows that the data of Task 1 are well discriminated from the rest for all subjects. The highest classification rates are 100 with SVM and 99.96 with LDA for Subject 1. They are 99.90 for Subject 2 with SVM, 63.90 for Subject 3 with SVM, 75.92 for Subject 4 with SVM, 83.44 for Subject 5 with SVM and 79.39 for Subject 6 with LDA. Moreover, the classification performance for Task 2 and Task 5 is fairly good for Subject 1 and Subject 2, and moderate for the remaining four subjects. The classification results for the motor tasks, Task 3 and Task 4 (Imagination of Right Hand and Imagination of Left Hand), are lower than the mental task results. The highest rates are again observed for Subject 1, at 75.44 for Task 3 and 75.20 for Task 4, both obtained with SVM. The highest subject mean values over the different tasks are 79.34 with LDA and 81.64 with SVM. The overall performance across subjects shows that the results are subject dependent.

Table 2. Multi-class classification results with standard deviations for 9 electrodes with extended OVO and regular OVO approaches. S: Subjects, C: Classifiers, M: Mean

All calculations are performed by writing a MATLAB script.

Table 6. Effect of repetition number on computation time. C: Classifiers, NE: Number of electrodes

Table 7. Calculation of sensitivity and specificity

Table 8. Accuracy, Sensitivity and Specificity values for a two-class SVM

Finally, the study suggests that multiclass SVM using the extended technique, combined with the proposed feature extraction algorithm, can be used for classification of motor task EEG signals in various applications once it is verified with more subjects. As further work, the results obtained in this study should be supported with different BCI data sets.

[33] Anderson, C.W., Stolz, E.A. & Shamsunder, S.

Dr. Özmen received her Bachelor's degree from Balıkesir University, Department of Mechanical Engineering in 2001. She received her MSc and PhD degrees from Karadeniz Technical University, Department of Mechanical Engineering in 2005 and 2010 respectively. She studied at the University of Gent, Belgium in 2005-2006 during her PhD. She worked as a visiting professor at the Technology University of Delft, Netherlands in 2014-2015 with TUBITAK's 2219 scholarship. She worked on the "4D-EEG project", an Advanced Grant from the European Research Council funded under the Seventh Framework Programme (FP7), ERC n. 291339-4D-EEG. Her research topics include robotics, biomedical signal processing and control applications. She is fluent in spoken and written English.

Prof. Gümüşel is at the Department of Mechanical Engineering of Karadeniz Technical University. He received his MSc and PhD degrees from The Catholic University of America, Washington DC, USA in 1986 and 1990 respectively. His research interests include robotics, mechatronics, dynamics of machinery, brain computer interfaces and automatic control of systems. He has several papers on robotics and control systems. He works on different scientific projects. He is fluent in English.