abstract Background and objectives: The P300 is an event-related potential (ERP) widely used as a control signal in brain-computer interfaces (BCIs). Using the oddball paradigm, a P300 speller lets a user spell letters through the P300 events elicited in their brain. A common difficulty in detecting this event is that its structure can differ between subjects and, for a given subject, over time. The main purpose of this work is to deal with this inherent variability and to identify the main structure of the P300 using algorithmic clustering based on string compression.
Methods: We use the Normalized Compression Distance (NCD) to extract the main structure of the signal regardless of its inherent variability. To apply compression distances, we carry out a novel signal-to-ASCII process that transforms and merges different events into objects suitable for a compression algorithm. Once the ASCII objects are created, we use NCD-driven clustering to analyze whether our object creation method suitably represents the information contained in the signals, and to explore whether compression distances are a valid tool for identifying P300 structure. To make the study more general, we apply two different clustering methods: a hierarchical clustering algorithm based on the minimum quartet tree method, and a multidimensional projection method.
Results: Our experimental results show good clustering performance across different experiments, demonstrating the structure extraction capabilities of our procedure. Two datasets with recordings in different scenarios were used, one to analyze the problem and the other to validate our results. Notably, when clustering performance is analyzed for individual electrodes, higher P300 activity is found in regions similar to those reported in other articles using the same datasets. This suggests that our approach might be used as an electrode-selection criterion.
Conclusions: The proposed NCD-driven clustering methodology can be used to discover the structural characteristics of EEG and is therefore suitable as a complementary methodology for P300 analysis. © 2019 Elsevier B.V.
keywords brain computer interface; clustering by compression; data mining; dendrogram; Kolmogorov complexity; multidimensional projections; normalized compression distance; silhouette coefficient; similarity; computational complexity; electrodes; trees (mathematics); clustering algorithms; algorithm; article; brain function; chemical structure; data compression; electroencephalography; event related potential; phylogenetic tree; signal processing; steady state visual evoked potential; brain; cluster analysis; computer simulation; human; information processing; physiology; procedures; event-related potentials, P300
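The Normalized Compression Distance named in the abstract is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length of its argument. The following is a minimal sketch of that formula, not the authors' implementation: the choice of zlib as the compressor, the function names, and the toy byte strings are all assumptions made for illustration, and the record does not say which compressor or signal-to-ASCII encoding the paper uses.

```python
import hashlib
import zlib


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes, used as a stand-in for C(x).

    zlib is an assumption here; the record does not name a compressor.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # C(xy): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical ASCII objects (toy data, not taken from the paper).
regular = b"0123456789" * 50
# Deterministic but irregular hex text built from SHA-256 digests.
irregular = "".join(
    hashlib.sha256(str(i).encode()).hexdigest() for i in range(8)
).encode()

d_same = ncd(regular, regular)    # object compared with itself
d_diff = ncd(regular, irregular)  # structurally unrelated objects

print(f"NCD(regular, regular)   = {d_same:.3f}")
print(f"NCD(regular, irregular) = {d_diff:.3f}")
```

With a real compressor the distance of an object to itself is small but usually not exactly zero, and unrelated objects land near one. A pairwise matrix of such distances is the kind of input that hierarchical clustering or a multidimensional projection would then consume.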