Application of supervised machine learning algorithms for the evaluation of utricular function on patients with Meniere’s disease: utilizing subjective visual vertical and ocular-vestibular-evoked myogenic potentials

Abstract Background Research on the otolith organs remains inconclusive. Objectives This study seeks to further elucidate utricular function in patients with Meniere’s disease (MD) in three ways: (1) We aimed to disambiguate the role of the Subjective Visual Vertical (SVV) and Ocular Vestibular Evoked Myogenic Potential (o-VEMP) tests regarding which utricular subsystem each is measuring. (2) We sought to characterize the acute and chronic state of MD by identifying differences in the relationship of SVV and o-VEMP results across patients with acute and chronic MD. (3) We attempted to find a machine-learning algorithm that could predict acute versus chronic MD using SVV and o-VEMP. Methods A prospective study with ninety subjects. Results (1) SVV and o-VEMP tests were found to have a moderate linear relationship in patients with acute MD, suggesting each test measures a different utricular subsystem. (2) Regression analyses statistically differed across the two patient populations, suggesting that SVV results were normalized in chronic MD patients. (3) Logistic regression and Naïve Bayes algorithms were found to predict acute and chronic MD accurately. Significance A better understanding of what diagnostic tests measure will lead to a better classification system for MD and more targeted treatment options in the future.


Introduction
In 1861, Dr. Prosper Meniere proposed the existence of a disorder initially thought to originate from the central nervous system. Rather than a central neurological disorder, Dr. Meniere instead attributed the causes of symptoms such as vertigo, tinnitus, and hearing loss to an inner ear labyrinthine dysfunction [1]. These symptoms are now understood as the main constituents of a single syndrome called Meniere's disease (MD). The currently accepted American Academy of Otolaryngology-Head and Neck Surgery (AAO-HNS) criteria for definite MD (revised in 2020) include two or more spontaneous episodes of vertigo lasting 20 min to 12 h, audiometrically documented fluctuating hearing loss on at least one occasion, tinnitus or aural fullness in the affected ear, and other causes excluded by other tests [2,3]. While many previous studies have investigated the function of the semicircular canals in patients with MD, studies on the function of the otolithic organs are still inconclusive [4,5].
Originally thought to be uniform in its function, recent studies on the otolith organs have suggested two different systems within the utricle: sustained (static) and transient (dynamic). These two systems differ in the type of receptor their hair cells are anchored to and the type of stimulus they are most sensitive to. Hair cells corresponding to the sustained system are firmly anchored to the utricular macula via barrel-shaped type II receptors and respond best to long, static stimuli [6]. These two types of hair cells are found all over the utricular macula. By contrast, hair cells corresponding to the transient system are loosely anchored to the utricular macula via amphora-shaped type I receptors and respond best to short, rapid stimuli. Transient hair cells are primarily found near the striola and can rapidly adapt to maintained stimulation. Other physiological differences between the two utricular systems include afferent nerve axon diameter, conduction velocity, and the presence of phase locking [6].
The anatomical and physiological differences between the transient and sustained systems suggest these two systems could react differently to diagnostic tests that evaluate utricular function. Previously published data supports this claim. Immediately after the surgical unilateral vestibular loss, such as acoustic neuroma removal or labyrinthectomy, the results of a Subjective Visual Vertical (SVV) test initially appear asymmetrical. However, over time the effect of unilateral loss on SVV is not consistent such that the sustained function of the otolithic organ does not clearly detect a chronic unilateral vestibular loss [4]. It has been speculated that processes of vestibular compensation might contribute to reducing any asymmetry. The data from studies of SVV on patients after vestibular neuritis also show significant variability. Nevertheless, it is not known conclusively whether otolith function may have returned after the neuritis [7]. By contrast, the asymmetrical VEMPs are preserved for many years and probably permanently after the vestibular loss [8].
Machine learning is an increasingly important technology for dealing with the growing complexity of the digitized world. One of its key advantages is that it is an efficient tool that is not bound by linearity assumptions or constraints on variable characteristics. Machine learning therefore has the potential to change clinical diagnostics, outcome assessment, and cost analysis [9]. Although we live in a "big data world" where almost everything is digitally stored, as in bioinformatics, there are numerous areas of biomedical research, such as patient-outcome research, where researchers still face small data samples. Recent studies have shown that high-quality small samples can overcome the limitation of a small data set, even though the power of machine learning in recognizing patterns is usually proportional to the size of the data set [10]. Using an adequate unsupervised machine learning algorithm on high-quality small-sample data could yield better outcomes than using a low-quality large sample [10].
We had three aims in this study. The first aim was to analyze the association of SVV with o-VEMP data in a population of unilateral definite MD patients. We hypothesized that SVV and o-VEMP results most likely do not correlate in this patient population. Our reasoning behind this hypothesis is that the SVV and o-VEMP tests may exclusively measure different utricular systems. Specifically, we predicted that the SVV measures the sustained system and the o-VEMP measures the transient system. Therefore, we expected no relationship between the asymmetry ratios (ARs) of each test. The second aim was to compare the association of SVV with o-VEMP data between newly diagnosed acute definite MD and chronic MD. We hypothesized that SVV and o-VEMP are correlated with each other in acute-phase MD, but not necessarily in chronic-phase MD. A previous study with a small number of cases has already shown that SVV is most often abnormal in the acute phase of vestibular dysfunction and returns to near normal thereafter [11]. In our third aim, we wished to establish a better classification algorithm to predict the acute vs. chronic state of MD by utilizing SVV and VEMP ARs. We hypothesized that by choosing our dataset carefully, we could overcome the limitation of a small data set in machine learning algorithms, leading to better classification of our data. For this purpose, we utilized linear regression analysis and supervised machine learning algorithms to classify patients with unilateral MD using SVV and o-VEMP data to predict the nature of their MD (i.e. acute vs. chronic).

Subjects
This is a prospective cohort study during the period of May 2017 to September 2022. The study was conducted at the Department of Otolaryngology-Head and Neck Surgery, Northwestern Medicine (Chicago, Illinois, the United States of America). A total of 283 patients were screened for this study. The medical history of each subject was thoroughly investigated and neurotological evaluation was performed by two American board-certified neurotologists (AJM and AGM) for accurate diagnosis with MD. The two neurotologists confirmed the diagnosis of definite MD after diagnostic imaging (i.e. MRI of the internal auditory canal with and without contrast). We used the diagnostic criteria established by the American Academy of Otolaryngology-Head and Neck Surgery to define unilateral definite MD [2,3]. More specifically, MD subjects had two or more definitive spontaneous episodes of vertigo, each lasting 20 min to 12 h; audiometrically documented low to mid-frequency sensorineural hearing loss in one ear, defining the affected ear on at least one occasion before, during, or after one of the episodes of vertigo; fluctuating aural symptoms (hearing, tinnitus, or aural fullness) in the affected ear; and not better accounted for by another vestibular diagnosis. The affected ear was identified as the ear in which there was low-frequency hearing loss, and the symptoms of fullness were reported. We included 39 males and 51 females between the ages of 27-76 (mean = 49.7 ± 19.6) without any history of middle or inner ear diseases, neck injury, chronic neck pain, or acute or chronic arthritis. If the opposite ear of a unilateral definite MD patient had inner ear symptoms, such as tinnitus, aural fullness, or hearing loss (i.e. possible bilateral MD), they were excluded from this study.
After the screening, we divided our subjects into two groups: acute and chronic. The acute MD group had at least one acute MD attack within the last three months. An acute MD attack was defined as vertigo severe enough to cause nausea, vomiting, or impaired daily activities for the subject [12]. The chronic MD group had no recent acute MD attack (i.e. last attack was more than 3 months ago). To ensure each subject's MD was not in an active phase, subjects in the chronic MD group were tested with Video Frenzel goggles (Interacoustics, Middelfart, Denmark) to confirm they did not have any spontaneous nystagmus. None of the MD subjects reported taking any medications that may affect vestibular function, including benzodiazepines, within 24 h of the evaluation with SVV and VEMPs.

VEMP testing
All tests were performed in a sound-treated room. Before performing the o-VEMP measures, we first performed a c-VEMP as it is now known that c-VEMP mostly represents a function of the saccule [14]. In this study, c-VEMP was therefore used to obtain a negative control condition. In the supine position, participants were asked to lift and turn their heads to each side to identify the maximum contraction of the sternocleidomastoid (SCM) muscle. The thickest part of the SCM, which assures the area of maximum contraction, was identified by the researcher using their fingers along the neck of the subjects. Electrodes were placed on the upper third of the SCM. Subjects were scrubbed with skin prep gels (Nuprep; Weaver and Company, Aurora, Colorado, USA), and cleaned with a sanitizing wipe at the SCM muscle on both sides of the neck and forehead. The non-inverting, disposable electrodes (Safelead; Natus Neurology Incorporated, Middleton, Wisconsin, USA) were placed on the suprasternal notch of the neck, and the inverting electrodes were placed on the SCM on either side. The ground electrode was placed on the forehead. The c-VEMP was recorded from the SCM muscle ipsilateral to the stimulated ear in response to air-conducted tone burst stimuli (500 Hz) presented through a calibrated ER-3A insert earphone (Etymotic Research, Elk Grove Village, Illinois, USA) at 97 dB nHL using a VEMP evoked potential system (Eclipse EP25; Interacoustics AS, Middelfart, Denmark). Subjects were asked to lie supine and lift their head and turn in the contralateral direction of the stimulated ear to activate the ipsilateral SCM muscle. Electromyography (EMG) scaling was performed as described previously for the calculation of comparable asymmetry [15]. Positive peak (P13) and negative peak (N23) latencies as well as peak-to-peak amplitudes of P13-N23 were measured. The asymmetry ratio (AR) was then calculated.
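The VEMP AR was determined by the following equation [15]:

$$\mathrm{AR} = \frac{\text{Amplitude}_{\text{right VEMP}} - \text{Amplitude}_{\text{left VEMP}}}{\text{Amplitude}_{\text{right VEMP}} + \text{Amplitude}_{\text{left VEMP}}} \times 100$$

We also define the absolute value of the asymmetry ratio (abs|AR|) as follows:

$$\mathrm{abs}|\mathrm{AR}| = \left| \frac{\text{Amplitude}_{\text{right VEMP}} - \text{Amplitude}_{\text{left VEMP}}}{\text{Amplitude}_{\text{right VEMP}} + \text{Amplitude}_{\text{left VEMP}}} \times 100 \right|$$

We used the real (signed) value of the asymmetry ratio (AR) to calculate Pearson's linear coefficient of determination for comparing our SVV angle deviation with c-VEMP AR and with air conduction (AC)- and bone conduction (BC)-elicited o-VEMP AR. We used abs|AR| to compare the degree of asymmetry in VEMP data.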
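The asymmetry ratio calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the conventional form AR = (right − left) / (right + left) × 100; the function names and the amplitude values are hypothetical and are not taken from the study data.

```python
def asymmetry_ratio(amp_right: float, amp_left: float) -> float:
    """Signed VEMP asymmetry ratio in percent (right minus left).

    Assumes the conventional definition (R - L) / (R + L) * 100.
    """
    total = amp_right + amp_left
    if total == 0:
        raise ValueError("both amplitudes are zero; AR is undefined")
    return (amp_right - amp_left) / total * 100.0


def abs_asymmetry_ratio(amp_right: float, amp_left: float) -> float:
    """Magnitude of the asymmetry ratio, ignoring which side is weaker."""
    return abs(asymmetry_ratio(amp_right, amp_left))


# Hypothetical peak-to-peak amplitudes in microvolts (not study data).
right_uv, left_uv = 30.0, 50.0
print(asymmetry_ratio(right_uv, left_uv))      # negative: right side is weaker
print(abs_asymmetry_ratio(right_uv, left_uv))  # same magnitude, sign removed
```

The signed value preserves the side of the weaker response, which matters for correlation with a signed SVV deviation, whereas the absolute value only expresses how asymmetric the two sides are.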
Once c-VEMP measures were completed, the subject continued to the o-VEMP measurement. Subjects remained in the supine position, and the area directly below their eyes was scrubbed and prepped for electrodes. The ground electrode remained on the forehead, whereas the non-inverting electrode was moved to the chin, and the inverting electrodes were placed as close to the lower eyelashes as possible for optimal o-VEMP recording. O-VEMP responses were recorded from the extraocular muscles contralateral to the stimulated ear in response to two different modes of auditory stimuli: AC and BC. AC stimuli (500 Hz tone burst presented at 97 dB nHL) were generated by the Eclipse EP25 system and presented through a calibrated ER-3A insert earphone. BC stimuli were produced with a RadioEar B-81 high-output bone transducer (RADIOEAR, New Eagle, Pennsylvania, USA). The B-81 bone transducer was chosen because it generates substantially lower distortion by using the balanced electromagnetic separation transducer principle. This feature allows the B-81 to produce higher output levels with less harmonic distortion than the conventional B-71 bone transducer (RADIOEAR, New Eagle, PA, USA). The B-81 bone transducer was calibrated using an artificial mastoid (model #4930) (Brüel & Kjær, Naerum, Denmark). During the calibration, the B-81 transducer was coupled to the artificial mastoid with a static force of 5.4 ± 0.5 N. A Quest Electronics 1700 sound level meter (Quest Electronics Hardware, Colorado Springs, Colorado, USA) with a 3 M OB-100 octave filter set (Lipin/Dietz Associates, Inc., Guilford, Connecticut, USA) was used to read force level with sound pressure levels. A series of 500 Hz tone burst stimuli were presented at 65 dB HL at the mastoid position. The Eclipse EP25 system was once again used for this measurement. Subjects were asked to hold their gaze directly upward at 30° for the duration of the sound stimulus. N1-P1 amplitude, latency, and AR were measured and calculated.

Virtual SVV™ testing
The justification for the use of Virtual SVV is as follows. Our previous study demonstrated that Virtual SVV data obtained from our healthy subjects were consistent with previously published normative SVV data [16]. Therefore, Virtual SVV is an attractive substitute for traditional SVV in the clinical setting. Following VEMP testing, subjects rose from the supine position and sat up straight to perform the Virtual SVV measurements as instructed (Figure 1). Next, the participants wore the Virtual SVV goggles while holding a remote hand controller (Figure 2). Inside the goggles, subjects saw a luminous line in their field of vision in otherwise complete darkness. Then, the subjects were instructed to align the luminous line vertically using the arrow buttons on the hand controller and press the 'OK' button on the controller to save the data when the line reached a perceived vertical orientation. A deviation was recorded in degrees corresponding to the discrepancy between the subject's perceived alignment position of the luminous line with the vertical and the actual alignment position of the luminous line with the vertical. A positive deviation corresponds to a discrepancy leaning to the right, while a negative deviation corresponds to a discrepancy leaning to the left. For example, if a subject aligns a luminous line presented at 15° to the right of the vertical by shifting it 20° to the left, the line will end up being 5° to the left of the true vertical, resulting in a −5° deviation value. This procedure was repeated five times in each head position: first at 0°, then positions with the head tilted to 15°, 30°, and 45° from vertical on each side. SVV angles and associated head positions were recorded in the real-time interface on a laptop computer for analysis (Figure 3).
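The sign convention for the SVV deviation can be made concrete with a short Python sketch. It reproduces the worked example from the text (line presented 15° to the right, shifted 20° to the left). The function name is hypothetical, and averaging the five repetitions per head position is an assumption made here for illustration; the text does not state how the repetitions were combined.

```python
from statistics import mean


def svv_deviation(presented_deg: float, adjustment_deg: float) -> float:
    """Final line angle relative to true vertical, in degrees.

    Positive values lean to the right, negative values lean to the left.
    """
    return presented_deg + adjustment_deg


# Worked example from the text: presented at +15 (right), shifted by -20 (left).
deviation = svv_deviation(15.0, -20.0)
print(deviation)  # the line ends up left of true vertical, so the sign is negative

# Hypothetical deviations from five repetitions at one head position.
# Averaging them is an assumption for illustration, not the stated protocol.
trials_deg = [-5.0, -4.0, -6.0, -5.5, -4.5]
print(mean(trials_deg))
```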

Statistical analysis
Statistical analysis was performed using the independent two-group Student's t-test and one-way ANOVA (with Tukey-Kramer's post hoc test to identify significant differences between means while controlling the family-wise error rate). Equal population variance was not assumed in one-way ANOVA in order to perform a more stringent statistical analysis given the number of subjects. The coefficient of determination (r²) from Pearson's linear correlation was also calculated. The linear regression line was computed using the method of least squares. The Shapiro-Wilk test was performed to confirm that the data set followed a normal distribution; the null hypothesis was that the data do not differ significantly from a normal distribution. All these statistical analyses were performed using Python 3.10.5 (Python Software Foundation, Wilmington, Delaware, USA). The following libraries were used for the statistical analysis: Scipy, Numpy, Matplotlib, and Seaborn. Results are reported as means ± one standard deviation unless otherwise noted. A p-value below 0.05 was considered statistically significant (*p < 0.05, **p < 0.01).

Classification model with a small sample size
One of the challenges of machine learning data analysis is an insufficient quantity of training data. In our study, the sample size was less than 100, which is considered a small amount of training data in machine learning [10]. To circumvent this issue, we performed the following steps to prepare our data sets. Note that these steps were based on two previously published protocols [10,17]. We eliminated any data with noisy waveforms and ambiguous N10-P15 peaks from our training data set. We assumed a Gaussian distribution for our data, and the Anderson-Darling test was performed to confirm this assumption. We eliminated the data that failed the Anderson-Darling test. In some instances, we had only o-VEMP data or only SVV data for a subject. In these instances, we filled in the missing values with the median values of the respective attribute. To avoid overfitting the training data, we used the default regularization in scikit-learn (version 1.1., October 2022). Its logistic regression class implements regularized logistic regression using the 'liblinear' library and the 'newton-cg', 'sag', 'saga', and 'lbfgs' solvers, and applies regularization by default. We used 0.3 for the test size and 0.7 for the training size.
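The median fill and the 70/30 split described above can be sketched as follows. This is a standard-library illustration under stated assumptions: the records, labels, seed, and helper names are hypothetical, and the study itself reports using scikit-learn, where `SimpleImputer(strategy="median")` and `train_test_split(test_size=0.3)` would play the same roles.

```python
import random
from statistics import median


def impute_median(rows, n_features):
    """Replace None in each column with the median of that column's observed values."""
    medians = []
    for j in range(n_features):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if row[j] is None else row[j] for j in range(n_features)]
        for row in rows
    ]


def split_train_test(rows, labels, test_size=0.3, seed=0):
    """Shuffle indices with a fixed seed and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


# Hypothetical records: [SVV deviation in degrees, o-VEMP AR in percent].
data = [
    [2.1, 30.0], [None, 22.0], [-1.5, None], [0.4, 18.0], [3.0, 41.0],
    [1.2, 25.0], [-0.8, 12.0], [2.6, None], [0.0, 20.0], [1.9, 35.0],
]
labels = [1, 1, 0, 0, 1, 0, 0, 1, 0, 1]  # 1 = acute, 0 = chronic (hypothetical)

filled = impute_median(data, n_features=2)
X_train, X_test, y_train, y_test = split_train_test(filled, labels)
print(len(X_train), len(X_test))
```

In this sketch the medians are computed on all records before splitting, which keeps the example short. A stricter variant would compute them on the training portion only, so that no information from the held-out records influences the filled values.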

Supervised machine learning
To visualize the actual decision boundary between the classes that each of the four classification models generated, we computed a decision boundary using DecisionBoundaryDisplay in the sklearn.inspection module of scikit-learn. To determine which classification algorithm most accurately predicts the probability of an example belonging to each class label, we used the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. A ROC curve is a graphical plot generated by plotting the fraction of true positives out of the positives (TPR = true positive rate) vs. the fraction of false positives out of the negatives (FPR = false positive rate) at various threshold settings. A ROC curve can function as a summary measure of performance across potential thresholds for positivity, rather than performance at any specific threshold. TPR and FPR were calculated using the following formulas, where TP, FN, FP, and TN denote the numbers of true positives, false negatives, false positives, and true negatives:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$

To compute the AUC from TPR and FPR, we used sklearn.metrics.roc_auc_score in the sklearn.metrics module of scikit-learn. The micro-average and macro-average performance of each machine learning algorithm was computed using TPR and FPR. The micro-average is calculated from the individual classes' TPR and FPR of the model. The macro-average is calculated as the arithmetic mean of individual classes' precision and recall scores.
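How a ROC curve arises from TPR and FPR at different thresholds can be illustrated with a small Python sketch. It uses the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN); the labels, scores, and function name are hypothetical and serve only to show the mechanics for a single binary class.

```python
def tpr_fpr(y_true, scores, threshold):
    """True and false positive rates when score >= threshold counts as positive."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and truth == 1:
            tp += 1
        elif predicted_positive and truth == 0:
            fp += 1
        elif not predicted_positive and truth == 1:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical labels (1 = positive class) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

# Each threshold yields one (FPR, TPR) point; sweeping it traces the ROC curve.
for threshold in (0.8, 0.5, 0.2):
    tpr, fpr = tpr_fpr(y_true, scores, threshold)
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Lowering the threshold raises both rates, which is why the AUC summarizes performance across thresholds rather than at any single one.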
We also performed a comparison of machine learning algorithms. To ensure that each algorithm was evaluated in the same way on the same data, we evaluated all of them on a consistent test harness. For this purpose, a 10-fold cross-validation procedure was used to evaluate each algorithm. We configured each run with the same random seed to ensure that the same splits of the training data were performed and that each algorithm was evaluated in precisely the same way.
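A consistent test harness of this kind might look like the sketch below, which assumes scikit-learn and NumPy are installed. Everything specific in it is an assumption for illustration: the data are synthetic and binary (acute vs. chronic) rather than the study's four classes, the seed is arbitrary, the Gaussian variant of Naïve Bayes is assumed, and all hyperparameters are library defaults apart from a raised `max_iter`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

SEED = 7  # arbitrary; reused so every model sees identical folds

# Synthetic stand-in for (SVV deviation, o-VEMP AR) pairs; not study data.
rng = np.random.default_rng(SEED)
n_per_class = 45
X = np.vstack([
    rng.normal(loc=[2.0, 30.0], scale=[1.0, 8.0], size=(n_per_class, 2)),
    rng.normal(loc=[0.0, 20.0], scale=[1.0, 8.0], size=(n_per_class, 2)),
])
y = np.array([1] * n_per_class + [0] * n_per_class)  # 1 = acute, 0 = chronic

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "naive Bayes": GaussianNB(),
    "random forest": RandomForestClassifier(random_state=SEED),
    "support vector machine": SVC(),
}

results = {}
for name, model in models.items():
    # Re-creating the splitter with the same seed gives the same 10 folds each time.
    folds = KFold(n_splits=10, shuffle=True, random_state=SEED)
    scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.3f} (SD {scores.std():.3f})")
```

Because the fold assignment is fixed by the seed, differences in mean accuracy reflect the algorithms rather than the luck of a particular split.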

c-VEMPs
This study included data from 90 adults (39 males and 51 females) ages 27-76 (mean age: 49.7 ± 19.6). Table 1 shows the summary of the P13-N23 amplitudes and P13/N23 latencies of AC-c-VEMP measurements. Overall, the P13-N23 amplitudes of the affected sides were substantially decreased compared with the amplitudes of the healthy sides. The P13/N23 latencies of the affected sides were not statistically different from those of the healthy sides.

o-VEMPs
Table 2 shows the summary of P13-N23 amplitudes and P13/N23 latencies of AC- and BC-o-VEMP measurements. Consistent with the results of c-VEMP, AC- and BC-o-VEMP amplitudes of the affected side were significantly decreased compared with the amplitudes of the healthy sides; however, latencies of the affected sides were not significantly different from those of the healthy sides. The average ARs (affected side vs. unaffected side) for c-VEMP, AC-o-VEMP, and BC-o-VEMP were around 25% and not significantly different.

Linear regression analysis
Figure 4 shows the relationship between SVV angle deviation at 0° and AC-o-VEMP ARs (Figure 4(B(a))) and BC-o-VEMP ARs (Figure 4(B(b))) obtained from 90 subjects with acute MD (orange dots, n = 41) and chronic MD (blue dots, n = 49). Table 3 shows the Pearson coefficient of determination (r²), intercept, and slope of the linear regression lines on the scatter plots between the SVV angle deviation at 0° and the c-VEMP and o-VEMP ARs. All r² values were between 0.41 and 0.51, demonstrating only a moderate positive linear relationship and thus suggesting little consistency in what each test measures. One-way ANOVA with Tukey-Kramer's post hoc test demonstrated statistically different slopes in chronic MD data compared with acute MD data, suggesting that a different relationship exists between both BC- and AC-o-VEMP ARs and SVV angle deviations at 0° for each MD type (acute and chronic). As a negative control, Figure 5 shows the relationship between SVV angle deviation at 0° and c-VEMP AR. r² was −0.033, demonstrating a weak negative linear relationship.
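The slope, intercept, and coefficient of determination of a least-squares line can be computed as in the following sketch. The function name and the paired values are hypothetical and are not study data; the formulas are the ordinary least-squares estimates for a single predictor, with r² taken as the square of Pearson's r.

```python
def least_squares_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # square of Pearson's r
    return slope, intercept, r_squared


# Hypothetical paired measurements: SVV deviation (degrees) vs. o-VEMP AR (%).
svv_deg = [0.0, 1.0, 2.0, 3.0, 4.0]
ovemp_ar = [10.0, 14.0, 22.0, 24.0, 30.0]

slope, intercept, r_squared = least_squares_fit(svv_deg, ovemp_ar)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r^2={r_squared:.3f}")
```

Fitting the acute and chronic groups separately with such a function gives one slope per group, which is the quantity compared between the two populations.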

Supervised machine learning
To go beyond the numerical prediction of a linear regression model, we used supervised machine learning algorithms to visualize the decision boundary among the four classes of MD (i.e. left acute (LA) MD, left chronic (LC) MD, right acute (RA) MD, and right chronic (RC) MD). Figure 6 shows a visualization of the decision boundary for the four machine learning classification algorithms that we built for the data set. The resulting decision boundaries are represented by the background color. In general, the models were able to separate these four classes. For the logistic regression algorithm, the decision boundaries are linear and separate the four classes of MD shown in blue, green, red, and orange (Figure 6(A(a)) and (B(a))). Next, we used the Naïve Bayes (NB) algorithm (Figure 6(A(b)) and (B(b))). The decision boundaries of the NB algorithm shown in Figure 6 were smooth and nonlinear. By contrast, the random forest algorithm demonstrates the lowest sensitivity, with isolated points having much less extreme classification probabilities (Figure 6(A(c)) and (B(c))). Finally, the support vector machine algorithm is the least sensitive (i.e. small, scattered, and nonlinear decision boundary) and has a smooth decision boundary (Figure 6(A(d)) and (B(d))). These findings in the classification models were observed in both data sets (SVV AR at 0° vs. AC-o-VEMP AR and SVV AR at 0° vs. BC-o-VEMP AR). This trend was also seen in the correlation coefficient data shown in Figure 4. Figure 7 shows the AUC on ROC curves. Note that the micro-average ROC is the sum of the true positive rate divided by the sum of the false positive rate. In other words, each class is weighted.
Figure 8(A) shows a comparison of the four machine learning algorithms; the logistic regression, Naïve Bayes, and random forest algorithms present accuracies of over 70%, whereas the support vector machine algorithm presents accuracies below 50%. Figure 8(B) shows a heatmap of the true positive rate (Class 0-3) in the four machine learning algorithms, demonstrating that class 2 (right acute MD patients) had a true positive rate of less than 50% in both AC- and BC-o-VEMP AR vs. SVV AR.
Table note: abs|AR|: asymmetry ratio (absolute value). ***p < 0.005. Unpaired t-tests were performed on all conditions between the affected side and the unaffected side.

Discussion
In this study, our first aim was to analyze the association of SVV with o-VEMP data in patients with unilateral definite MD. We hypothesized that SVV and o-VEMP do not correlate in this patient population, because the SVV and o-VEMP tests may exclusively measure different utricular systems (i.e. the sustained system vs. the transient system) [14]. Our results indicate that all coefficients of determination (r²) were between 0.41 and 0.51, demonstrating only a moderate linear relationship, supporting our hypothesis. Additionally, the results for AC-o-VEMP and BC-o-VEMP did not differ significantly from each other. We also confirmed no linear relationship between SVV angle deviation at 0° and the c-VEMP AR (r² = −0.033). This result is consistent with this relationship's role as a negative control, as the c-VEMP is primarily understood to evaluate saccule function while the SVV measures utricle function [14].
Our second aim was to compare the association of SVV with o-VEMP data between newly diagnosed acute definite MD and chronic MD. We hypothesized that SVV and o-VEMP are correlated with each other in the acute phase of MD, but not necessarily in the chronic phase of MD. Notably, there was a statistically significant difference in the slopes of the linear regression equations between acute and chronic MD data (Table 3). This finding suggests that the SVV angle deviation at 0° tended to be normalized in chronic MD patients, whereas AC- and BC-o-VEMP ARs were not normalized and remained as in acute MD patients. This finding is consistent with the results of previously published studies [7,18,19]. As indicated in these articles, the incongruent SVV and o-VEMP ARs can be explained by the different functioning of the two cell types found in otolith organs, one effective for sustained stimulation and the other for dynamic stimulation [12]. Also, Lin and Young noted that the rate of abnormal o-VEMPs was 40% in MD and that a significant correlation existed between SVV and o-VEMP test results [11]. We speculate that had we been able to collect SVV and VEMPs from subjects whose attack was less than seven days earlier, this trend would have been more obvious. Also, comparing previous results might not be productive, as each study defined the acute and chronic phases of MD differently. We also speculate that the central compensation mechanism of the sustained response is more robust and quicker than that of the dynamic response. In contrast, dynamic compensation, at least as driven by the labyrinth, may never be perfect following hemi-labyrinthectomy [20].
In our third aim, we wished to establish a better classification to predict the acute vs. chronic state of MD by utilizing SVV AR and VEMP AR data. By choosing our dataset carefully, we could overcome the limitation of the small data set using a machine learning algorithm, leading to better classification of our data. Our results indicate that the decision boundary was clearly seen with the logistic regression and Naïve Bayes algorithms, was less clean with the random forest algorithm, and was not so evident with the support vector machine algorithm. These results suggest that both the logistic regression and Naïve Bayes algorithms may warrant further study for this application. AUC and ROC curve data further support this suggestion. The logistic regression algorithm works better on small sample data than the nonlinear algorithms. Since a Naïve Bayes algorithm assumes that all variables are independent, only a small amount of training data is necessary to estimate the parameters for classification. Previous studies have shown that both the NB and logistic regression algorithms can be generalized to all generative and discriminative classifiers, which is consistent with our data.
Due to the nature of our data, only a single point of measurement was possible. Due to the dynamic nature of SVV and o-VEMP, it would have been more advantageous to track each patient's data in a time sequence. Future studies should seek to expand the realm of the work further. Also, the ratio of males to females included in this study is another potential confounding factor for our data. This study included 39 males and 51 females. Future studies should look to even this ratio to see if it affects the relationship between SVV angle deviations and VEMP ARs.
Figure 5. A scatter plot demonstrating the relationship between the SVV angle deviation at 0° and c-VEMP asymmetry ratios (ARs) obtained from 90 subjects with diagnosed MD. For each subject, SVV angle deviation at 0° is plotted on the x-axis against its corresponding c-VEMP AR on the y-axis. The best-fit linear regression line corresponding to the SVV angle deviation at 0° and VEMP ARs is also presented in the plot along with the regression line's 95% confidence interval (shaded area) and correlation coefficient (r²).
A potential pitfall in interpreting our results was the possible asymmetric electromyographical activities from the sternocleidomastoid muscle (c-VEMP: control) and the ocular muscles (o-VEMP). In addition, the lack of a method for calibrating surface EMG activities could have affected the peak-to-peak amplitude of our o-VEMP results. In future studies, VEMP and SVV may need to be performed in the same physiological position, and a method of calibrating the surface EMG for o-VEMP should be included. These additional measures can provide us with more consistent results.
It should be noted that while we collected SVV at 0°, 15°, 30°, and 45° on all 90 subjects, we decided to focus on SVV at 0° in this study. There were two reasons for this decision. Firstly, the data structure was already complex because we computed both the SVV deviation and the o-VEMP deviation. Introducing an additional data dimension (i.e. SVV at 15°, 30°, and 45°) would increase the data set four-fold, and such a large and complex data set may not be suitable for a proof-of-concept study such as this one. However, we will certainly utilize these data in our subsequent study. Secondly, tilting the body trunk and neck to various degrees activates proprioceptive information from cervical muscle spindles and mechanoreceptors or cervical disc facet joints. These maneuvers could be strictly categorized as a subjective postural vertical, which is beyond the scope of this study.
With many breakthroughs in machine learning algorithms realized over the past decades, recent studies propose a shift toward 'small and smart data,' focusing on high-quality data and explainable examples. Our data set follows the same philosophy: its small size resulted from a careful selection of data. We implemented a few strategies to avoid overfitting the data to a model. First, we carefully removed outliers by utilizing the Shapiro-Wilk normality test. We also avoided multidimensional models. A shift toward small data would have a considerable impact on data science. It opens doors for many problems that do not have massive associated data sets. It allows the generation of high-quality artificial data sets, and it aligns with the Explainable AI movement that has been gaining traction. Indeed, it would be a paradigm shift in the field.
In a clinical setting, when a patient comes in with a balance or vestibular complaint, they tend to share a large amount of information about what they are going through. In addition, they go through a comprehensive battery of tests. These data often need to be better utilized, as many clinicians assume most of the results are irrelevant or bothersome to deal with. This study used only SVV and o-VEMP data to categorize MD patients. We can also utilize the significant variables to classify MD in the future.
[...] differs between acute and chronic MD patients, suggesting that SVV tended to be normalized in chronic MD patients, whereas AC- and BC-o-VEMP were not normalized.
3. Using machine learning algorithms, logistic regression and Naïve Bayes algorithms could classify acute and chronic MD using SVV and o-VEMP ARs; however, random forest and support vector machine algorithms worked in a suboptimal fashion.
4. Finally, this proof-of-concept study indicates that a supervised machine learning algorithm with a small data set can be applied to the larger set of data that most vestibular patients provide in our clinic to classify MD in the future.