DIMA-fr: a French adaptation and standardization of the Dutch Diagnostic Instrument for Mild Aphasia (DIMA-nl)

ABSTRACT The Dutch Diagnostic Instrument for Mild Aphasia (DIMA-nl) is a standardized battery recently created for evaluating the language performance of patients during the perioperative period of glioma surgery. Our aim was to establish normative data for the DIMA-fr, a French version of the DIMA-nl. The DIMA-nl was first adapted to French. The 14 subtasks of the DIMA-fr were then administered to 391 participants recruited from the general French population. The effects of sex, age and level of education were determined by analysis of variance (ANOVA). Normative data were computed as means, medians, standard deviations and percentiles. Our results showed that age and level of education, but not sex, had an effect on performance across the subtests. We thus stratified the norms into four different groups: (i) 18–69 years-old with Baccalauréat (Bac, the French High School Diploma) (n = 246); (ii) 18–69 years-old without Bac (n = 70); (iii) >70 years-old with Bac (n = 48); (iv) >70 years-old without Bac (n = 27). The DIMA-fr is thus the first standardized French battery of tests to specifically assess language during the perioperative period of awake glioma surgery. However, to be used in the clinic, the DIMA-fr must now be validated in patients. The DIMA, which is currently standardized in several languages, could become a reference tool for international studies.


Introduction
Gliomas are the most common primary brain tumours in adults and are often located in functional areas of the brain (sensorimotor, language, visuo-spatial). Without adequate management, median survival times range from less than 1 year to 10 years, depending on the type of glioma (Duffau, 2019). Diffuse low-grade gliomas (DLGG), which are the focus of this study, evolve by 4 mm in diameter per year for several years. This slow growth of the tumour allows

Study population
The standardization of the DIMA-fr was performed on 391 healthy participants from the general French population. Participants were recruited by random quota sampling based on three variables: sex (210 females and 191 males), age (18–89 years; mean (±SD) 46.5 ± 19.6; median 47.7; divided into four age groups: 18-29 years; 30-49 years; 50-69 years; ≥70 years) and level of education (three levels defined in reference to the Baccalauréat (Bac), the French High School Diploma: <Bac = no Bac; Bac-Bac+3 = between 1 and 3 years after Bac; >Bac+3 = more than 3 years after Bac, corresponding respectively to no High School Diploma; High School Diploma to Bachelor's degree; Master's degree and higher). The characteristics of the study population are shown in Table 1. Hand preference was not taken into account.
Each participant was informed of the objective of the study and gave their written informed consent that could be revoked at any time. The study was approved by the Commission Nationale de l'Informatique et des Libertés (anonymization of data).

Selection criteria
The study participants were healthy volunteers who were ≥18 years-old and whose native language was French or who had attended primary school in France. Participants were excluded if they had a history of cardiovascular, cognitive, neurological or psychiatric disease, language development disorders, or addiction to a drug or alcohol. Various tests were administered to exclude participants with cognitive deficits or uncorrected sensory disorders: score <26/30 on the Montreal Cognitive Assessment (MoCA), French version 8.1 (Nasreddine et al., 2005); score <P2 (normal vision) on Parinaud's near visual acuity scale (used in clinical practice by French ophthalmologists); score <100% on Fournier's 10 dissyllabic words (used in tonal audiometry) at conversational intensity without lip reading.

Development of the DIMA
As part of a project to develop an international language assessment protocol for awake surgery, the DIMA is currently being adapted and standardized in several languages (Satoer et al., 2021). It is derived from the Dutch Linguistic Intraoperative Protocol (DuLIP). DuLIP is a language assessment battery designed to facilitate intraoperative mapping during awake surgery. It also allows longitudinal follow-up of patients with low-grade gliomas (De Witte et al., 2015). Both DuLIP and DIMA are based on a "location-function-task" model in a connective view of brain function, particularly in DLGG. Anatomo-functional landmarks, which are necessary but of great inter-individual variability, must therefore be mapped at the level of the cortex and subcortical networks (Duffau, 2019) using specific tasks (De Witte et al., 2015), as proposed by DuLIP. During intraoperative mapping, responses should be given within a maximum of 4 sec. This delay corresponds to the maximal time of safe electrical stimulation of the brain. DuLIP suffers from two shortcomings: it is a long protocol (90 min) and not all tasks are sensitive enough (because they are specifically designed for intraoperative use). Therefore, the most complex DuLIP tasks were selected. Composed of 14 tasks (out of the 18 DuLIP tasks), the DIMA evaluates the comprehension and production of language in three domains: semantics, phonology and morphosyntax. The DIMA-nl was translated into French in 2019, controlling for sociocultural and psycholinguistic variables (Bonin et al., 2003; New et al., 2001). In the French version, response times (RT) were also measured, as the speed of information processing is often affected in patients with brain tumours (Duffau, 2019).

Repetition tasks (RepA, RepB, RepC, RepD): phonological level
Four repetition tasks evaluate the phonological input and output channel. For each item, word and sentence length, the presence of consonant clusters (underlined in the examples below) and phonological similarities (in bold in the examples below) are used to increase articulatory and phonological complexity (Gierut, 2007).

Odd-picture-out (Sem): semantic level
This task evaluates semantic cognition and naming from a visual input. Ten series of three images belonging to a supra-ordinate category (objects or animals) are presented on a computer screen. The participant is asked to point to and name, within 4 seconds, the picture that is semantically distant from the two others (e.g., snake, dog, cat; car, bicycle, bus).

Sentence judgment task (JuSem, JuPho, JuSyn): semantic, phonological, syntactic levels
This task assesses semantic, phonological and morphosyntactic awareness and phonological decoding in visual verbal modality. Thirty correct and incorrect sentences are presented on the screen in random order with Praat software (Boersma & Weenink, 2019). Five sentences contain semantic errors (JuSem) of the semantic selection violation type (e.g., Le chat a acheté un canapé [the cat bought a sofa]), five sentences contain phonological errors (JuPho) of the non-word type (e.g., Le steil fait fondre la ningle [The sut melts the sdow]) and another five sentences contain syntactic errors (JuSyn) of the grammatical word error type (e.g., Il aide sa petite soeur de manger [He helps his little sister in eat]) or verb tense error type (e.g., Léa chante une chanson hier [Lea sings a song yesterday]). Psycholinguistic variables and sentence length were checked for all items, in addition to phonological complexity (phonological judgment) and type of error (syntactic judgment).

Sentence completion task (Compl): syntactic and semantic levels
This task assesses the production of semantically and syntactically semi-spontaneous speech; the beginning of a sentence is presented orally and must be completed in restricted (e.g., I listen to . . .) or open (e.g., Every day . . .) contexts.

Fluency tasks (FluProfessions, FluAnimals, FluD, FluP): semantic and phonological levels
Four tasks evaluate the integrity of the lexico-semantic representations and the efficiency of the strategies implemented to access them: two semantic fluency tasks for professions and animals (respectively FluProfessions and FluAnimals) and two phonological fluency tasks for letters D & P (respectively FluD and FluP). These fluency tasks are absent from the DIMA-nl, but have been integrated into the DIMA-fr with an extended duration of 2 min, so as to establish up-to-date norms for the French population.

Articulation task (Dia): phonetic and phonological levels
This verbal diadochokinetic task (Dia) evaluates the planning, coordination and rapid and alternating sequencing of speech movements. It consists of four sets of three repeating or alternating consonant-vowel syllables (e.g., pa pa pa; pa po pu; da na la) and five sets of three repeating or alternating consonant-vowel-consonant syllables (e.g., paf paf paf; paf pof puf; paf pass pach). Each series must be repeated five times. This task, not kept in the DIMA-nl, was included in the French version because it is sometimes the only test that reveals a speech disorder in patients with a glioma located in the inferior frontal gyrus, precentral gyrus and anterior insula regions (De Witte et al., 2015).

Administration of the protocol
In July 2019, the first two authors (AC, AP) and the last two authors (MB, IP) of this article trained six first-year Master's students in Speech and Language Therapy on the DIMA-fr as part of their "Introduction to Research" internship. The first two authors and the six students recruited the participants and conducted the tests between September 2019 and February 2020. Meetings were organised to ensure that the testing and scoring procedures were homogeneous. Approximately 1 hour was required to complete the protocol: 25 min for the presentation of the research project, informed consent, the health questionnaire and exclusion tests, and 35 min for the DIMA-fr.

Procedure and scoring
The tests were always administered in the same order: repetition of words, compound words, pseudowords and sentences; semantic odd-picture-out; sentence completion; sentence judgment (semantic, phonological and syntactic); semantic and phonological fluency; verbal diadochokinesis.
Tests requiring visual support were presented using PowerPoint (semantic odd-picture out and verbal diadochokinesis) and Praat software (sentence judgment task) (Boersma & Weenink, 2019).
The participants' answers were recorded and transcribed in the International Phonetic Alphabet (IPA) for the repetition and articulation tests and in the Latin alphabet for sentence completion and fluency. One point was awarded for each correct answer given without any hesitation or self-correction. For the repetition, semantic odd-picture-out and sentence completion tasks, answers had to be given in less than 4 seconds. For verbal diadochokinesis, 0.2 points were allocated for each correct output. For the semantic odd-picture-out task, the score for each item was split into 0.5 points if the participant pointed to the correct item and 0.5 points if the participant was able to name it. In case of doubt about the scoring of an answer, the decision was made collectively.
The response times were recorded by the investigators, who were trained in how to start and stop the digital timer for each task. Of note, for the sentence judgment task, the true reaction times were automatically recorded in Praat. For the sake of simplicity, reaction times are also referred to as response times (RT) throughout the rest of the manuscript.
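The scoring rules described in this section can be summarised in a short sketch. The function and parameter names below are hypothetical and are not part of the DIMA-fr material; the sketch also assumes that the 4-second limit is handled separately for the odd-picture-out items, since the text does not specify how the limit interacts with the two half points.

```python
TIME_LIMIT_S = 4.0  # response window stated in the text (answers in less than 4 seconds)


def score_timed_item(correct: bool, response_time_s: float,
                     hesitation_or_self_correction: bool = False) -> float:
    """One point for a correct answer given in time, without hesitation or self-correction."""
    in_time = response_time_s < TIME_LIMIT_S
    if correct and in_time and not hesitation_or_self_correction:
        return 1.0
    return 0.0


def score_odd_picture_item(pointed_correctly: bool, named_correctly: bool) -> float:
    """0.5 points for pointing to the odd picture plus 0.5 points for naming it."""
    return 0.5 * int(pointed_correctly) + 0.5 * int(named_correctly)


def score_diadochokinesis_series(correct_outputs: int) -> float:
    """0.2 points per correct output; rounding avoids floating-point noise."""
    return round(0.2 * correct_outputs, 1)
```

For instance, `score_odd_picture_item(True, False)` gives 0.5 for a participant who points to the correct picture but cannot name it, and a series with five correct outputs gives 1.0.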

Statistical analysis
The normality of the distribution of each performance (14 scores and 10 RT) was checked visually for the entire sample (N = 391). The effects on performance of the independent variables sex, age and level of education, as well as their interactions, were assessed by analysis of variance (ANOVA).
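As a rough illustration of the quantity an ANOVA tests, the sketch below computes the F statistic for a single factor. This is a deliberate simplification: the study fitted sex, age and level of education together with their interactions in JMP, whereas this one-way version handles one factor only, and the function name and example data are hypothetical. Turning F into a p-value additionally requires the F distribution from a statistics package.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three groups (not data from the study)
f_stat, df_b, df_w = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```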
When a significant impact (p < .005) was found, comparisons of the means for all pairs (Tukey-Kramer HSD test) revealed subgroup(s) that differed significantly. The mean, standard deviation (SD), minimum-maximum, percentiles (pc 5, 10, 25, 50, 75, 90, 95) and distribution of performance (for score and time) for each test were then calculated for the entire sample and by significant group.
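The descriptive statistics listed above can be sketched with the Python standard library. The function name is hypothetical, and the percentile interpolation used here (the "inclusive" method of `statistics.quantiles`) is an assumption: the exact method applied by JMP is not stated in the text and may give slightly different values on small samples.

```python
from statistics import mean, quantiles, stdev

PERCENTILES = (5, 10, 25, 50, 75, 90, 95)


def normative_summary(values):
    """Mean, SD, min-max and selected percentiles for one performance measure."""
    # quantiles(..., n=100) returns 99 cut points; cuts[p - 1] is percentile p
    cuts = quantiles(values, n=100, method="inclusive")
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "min": min(values),
        "max": max(values),
        "percentiles": {p: cuts[p - 1] for p in PERCENTILES},
    }
```

The same function can be applied to the entire sample and then to each subgroup to obtain norms by group.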
All data were analysed using JMP Trial 15.1.0 software.

Effect of sex
A significant effect of the variable sex was observed only on the response time (RT) of verbal diadochokinesis (p = .0007). Females took longer on average than males to repeat the nine three-syllable series: 42.4 sec vs. 39.1 sec. No significant difference was found for the other performances (p > .005). The mean of each performance for each sex and the p-values are shown in Appendix A of supplementary file S1.

Effect of age
Age had a significant impact on 8 of the 14 scores (Figure 1) and on all 10 RT (Figure 2): score (p = .0013) and RT (p < .0001) of repetition of words; RT of repetition of compound words (p < .0001); score (p < .0001) and RT (p < .0001) of repetition of pseudowords; score (p = .0012) and RT (p < .0012) of repetition of sentences; score (p < .0001) and RT (p < .0001) of semantic odd-picture out; score (p = .0013) and RT (p < .0001) of sentence completion; RT of semantic, phonological and syntactic sentence judgment (p = .0002, p = .0002, p < .0001); score of professions fluency (p < .00001); score of animals fluency (p < .0001); score (p = .0047) and RT (p < .0001) of verbal diadochokinesis. Comparison of the mean values for all pairs showed that the group >70 years-old had the lowest scores and longest RT for all tests. The 50-69 year-olds were also significantly slower (p < .005) than the 18-29 and 30-49 year-olds for the repetition of pseudowords, sentence repetition, semantic odd-picture out and sentence completion tests.
There was no significant effect of age on the scores for the following six tests (p > .005): compound word repetition; semantic, phonological and syntactic sentence judgment; D fluency; P fluency. The mean values for each performance by age group and the corresponding p values are shown in Appendix A of supplementary file S1.

Effect of level of education
Level of education had a significant impact on nine of the 14 scores (Figure 3) and seven of the 10 RT (Figure 4): RT of repetition of pseudowords (p < .0001); score (p = .0022) and RT (p < .0001) of sentence repetition; score (p = .0001) and RT (p < .0001) of sentence completion; score (p = .0037) and RT (p < .0001) of semantic sentence judgment; RT of phonological sentence judgment (p < .0001); score (p = .0002) and RT (p < .0002) of syntactic sentence judgment; fluency for professions (p < .0001); fluency for animals (p < .0001); fluency for letter D (p < .0001); fluency for letter P (p = .0004); score (p < .0001) and RT (p < .0001) of diadochokinesis. Comparison of the means for all pairs showed that the group <Bac had the lowest scores and the longest RT for these tests. The Bac-Bac+3 group was also significantly (p < .005) slower than the >Bac+3 group for the semantic, phonological and syntactic sentence judgment tests.
There was no significant effect of education level on the following eight performances (p > .005): score and RT of repetition of words; score and RT of compound word repetition; score of repetition of pseudowords; score and RT of semantic odd-picture out; score of phonological sentence judgment. The mean values for each performance by education level and the corresponding p values are shown in Appendix A of supplementary file S1.

Age/level of education interaction
No significant interaction was observed between age and level of education (p > .005).

Age/sex interaction
An interaction between age and sex was found only for the professions fluency test (p = .0016). Males aged 50-69 years gave a mean of 21.1 profession names while females of the same age gave 19.9. However, females >70 years of age gave an average of 19.3 words while males of the same age gave 15.2 words.

Level of education/sex interaction
An interaction between level of education and sex was found only for RT of repetition of pseudowords (p = .0047). Among the most educated participants, only females were faster than the least educated; in fact, males in the >Bac+3 group were significantly slower (18.5 sec) than males in the Bac-Bac+3 group (17.9 sec).

Stratification of data
On the basis of the analysis by test and by effect of the variables sex, age and level of education, the data were stratified into four groups: (i) 18-69 years-old with Bac (n = 246); (ii) 18-69 years-old without Bac (n = 70); (iii) >70 years-old with Bac (n = 48); (iv) >70 years-old without Bac (n = 27). As sex affected only one performance, this variable was not used to stratify the norms.
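The assignment of a participant to one of the four normative groups can be sketched as follows. The function name and group labels are hypothetical. The age boundary is an assumption: the age groups in the Study population section are defined with ≥70 years, whereas the stratification is written as >70, and the sketch follows the former.

```python
EDUCATION_LEVELS = ("<Bac", "Bac-Bac+3", ">Bac+3")


def norm_group(age_years: int, education: str) -> str:
    """Return the normative group for a participant's age and education level."""
    if age_years < 18:
        raise ValueError("the norms cover adults aged 18 and over")
    if education not in EDUCATION_LEVELS:
        raise ValueError(f"unknown education level: {education!r}")
    # Assumption: participants aged exactly 70 fall in the older group
    age_band = "70 and over" if age_years >= 70 else "18-69"
    bac = "without Bac" if education == "<Bac" else "with Bac"
    return f"{age_band}, {bac}"
```

Sex does not appear among the arguments, in line with the decision not to stratify on this variable.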
The normative data for the entire sample and the four groups are shown in Appendix B of supplementary file S1. The mean (SD), minimum-maximum and distribution of performance by percentiles (5, 10, 25, 50, 75, 90, 95) are given by group for the score and RT of each test. The verbal fluency scores and the RT were normally distributed. For all other scores, the distribution departed from a Gaussian distribution. This was due to ceiling effects, as reflected in the values of the percentiles.

Discussion
In this study, we designed and standardized the French adaptation of the DIMA-nl, a test battery that was developed in Dutch to detect mild language impairment in patients with low-grade gliomas undergoing awake surgery (Satoer et al., 2021). We discuss our results in light of those of the DIMA-nl and of the literature.

Effects of sex, age, and education level
Sex had no significant impact, except on the RT of verbal diadochokinesis (p = .0007). No interaction with age or level of education was found that could explain this difference. As the DIMA-nl does not include this measure, we cannot compare the results. However, this sex effect was also reported for syllable repetition tasks in the recent data from the MonPaGe protocol, a computerised speech evaluation tool in French (Laganaro et al., 2021).
As expected, age had a significant impact on 8 scores out of 14. As in the Dutch study there was no effect of age on compound word repetition and sentence judgment scores. The phonological fluency tasks, reintegrated in the DIMA-fr, were not affected by age, whereas they were in the DuLIP (De Witte et al., 2015). This result is also not consistent with the recent study conducted in a French-Quebec control population where an influence of age was found on all phonological fluency tasks (St-Hilaire et al., 2016). However, since the demographic characteristics of the samples are different, this comparison should be taken with caution. Regarding RT, our study shows that all of them were affected; older participants responded less quickly than younger participants, including in the compound word repetition and sentence judgment tests. Furthermore, our segmentation into four age groups, compared to two for the Dutch study and the French pre-standardization (18-54 years and 55-85 years), made it possible to specify that the most significantly different age group is the over-70s.
As hypothesised, level of education had a significant impact on nine scores and seven RT; participants without Bac made more errors than participants with Bac and were slower overall. The only significant difference between the Bac-Bac+3 (Bachelor's degree) group and the >Bac+3 (Master's degree and higher) group was seen in RT for sentence judgment (the only task in written mode). All of these results are in line with the standardization of other recent francophone tests evaluating language such as the Batterie d'Évaluation Cognitive du Langage (BÉCLA).

Normative data of DIMA-fr
Following the analysis of the effects of the variables, four groups were formed to calibrate the norms: (i) 18-69 years-old with Bac (n = 246); (ii) 18-69 years-old without Bac (n = 70); (iii) >70 years-old with Bac (n = 48); (iv) >70 years-old without Bac (n = 27). However, major ceiling effects were observed in all groups for all performance scores (except for the verbal fluencies). Because the resulting distributions are not Gaussian, standard deviations cannot meaningfully be used to define individual z-scores. Rather, percentiles should be used in the following way: for a given test, if the value of pc5 (or pc10) is lower than the maximal score, this value should be considered the pathological threshold. If the value of pc5 (or pc10) is the maximal score, then any subject scoring lower than the maximal value should be considered pathological. For verbal fluencies and RT, values were normally distributed. In this case, z-scores can be computed from the mean and SD, and the usual threshold of −1.5 can be used to define disability.
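The decision rules above can be written as a minimal sketch. The function names and all numerical values in the examples are hypothetical. Two points are assumptions rather than statements from the text: a score equal to the percentile threshold is treated as pathological, and for RT the sign of the z-score is reversed so that slower responses count as the impaired direction.

```python
def is_pathological_score(score: float, pc_cutoff: float, max_score: float) -> bool:
    """Percentile rule for scores with ceiling effects (pc_cutoff is pc5 or pc10)."""
    if pc_cutoff < max_score:
        # Assumption: scoring at the threshold itself is already pathological
        return score <= pc_cutoff
    # The cutoff equals the maximal score: anything below the maximum is pathological
    return score < max_score


def is_pathological_z(value: float, norm_mean: float, norm_sd: float,
                      higher_is_better: bool = True, threshold: float = -1.5) -> bool:
    """z-score rule for normally distributed measures (verbal fluencies, RT)."""
    z = (value - norm_mean) / norm_sd
    if not higher_is_better:
        # Assumption: for RT, larger (slower) values are the impaired direction
        z = -z
    return z <= threshold
```

For example, with a hypothetical pc5 of 9 on a test with a maximum of 10, a score of 9 is flagged, whereas with a pc5 of 10 only scores below 10 are flagged.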

Limitations
Initially, we wanted to recruit the participants homogeneously into 24 cells (respecting the 24 possibilities imposed by the three variables: 4 [age groups] × 3 [education level] × 2 [sex] = 24). In the end, participants without Bac represented only 25% of the sample and >70 year-olds represented 19%, with only four males in the Bac-Bac+3 group. Even though each group contained a sufficient number of participants (>30), the results should be interpreted with caution due to the imbalance between some groups. Moreover, the participants were mainly recruited from the entourages of the female investigators, which could have hindered a truly objective evaluation or biased the performance.
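The 24 recruitment cells mentioned above can be enumerated explicitly, which also makes it easy to spot under-filled cells. The labels and the `cell_counts` helper are hypothetical and only illustrate the 4 × 3 × 2 design.

```python
from collections import Counter
from itertools import product

AGE_GROUPS = ("18-29", "30-49", "50-69", ">=70")
EDUCATION_LEVELS = ("<Bac", "Bac-Bac+3", ">Bac+3")
SEXES = ("female", "male")

# Every combination of age group, education level and sex: 4 x 3 x 2 = 24 cells
CELLS = list(product(AGE_GROUPS, EDUCATION_LEVELS, SEXES))


def cell_counts(participants):
    """Count participants per (age group, education, sex) cell, keeping empty cells at 0."""
    counts = Counter({cell: 0 for cell in CELLS})
    counts.update(participants)
    return counts
```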
Another source of variability comes from differences in the way investigators instructed the participants and/or measured the performances. For example, during the sentence repetition and completion tests, and despite collective training, the investigators' speech rate tended to vary. A high speech rate on the part of the investigator not only shortens the overall task time, but might also in itself increase the speech rate of the participant, by imitation. In addition, the delay in starting and stopping the digital timer likely caused the recorded times to vary, but such fluctuations are of concern only for very short events, such as in the diadochokinesis task. Moreover, the fact that our protocol relied on several investigators makes our norms more robust to this source of variability, thus enhancing their clinical applicability. Last but not least, a computerised version would make it possible to limit these variations in measurement.

Clinical implications and future research
In the DIMA-fr, the fluency and diadochokinesis tasks of the DuLIP have been reintegrated to complete the assessment. Indeed, diadochokinesis is sometimes the only sensitive test revealing a speech disorder in patients with a glioma located in the inferior frontal gyrus, precentral gyrus and anterior insular regions (De Witte et al., 2015). In the same vein, some tasks that are still under development need to be added to make the DIMA a complete language assessment battery. In particular, a picture naming task, which is a basic and universally used test for language evaluation, is missing.
Mild and moderate aphasia also have an impact on spontaneous language (Satoer et al., 2018), so it seems judicious to evaluate it perioperatively using a precise grid. Current sentence completion is not demanding enough, although it already reveals morphosyntactic difficulties in some participants. In addition to these objective tasks, the Dutch team is working on a questionnaire that would make it possible to define the language and communication problems of patients with mild aphasia.
The 4 sec time constraint (related to the intraoperative electrical stimulation time) proved to be a hindrance to accurate evaluation of performance for the tasks of repetition, semantic odd-picture out and sentence completion. This is because, according to the original DIMA-nl, after the 4 sec delay, a point was not awarded even if the answer was correct. However, if the DIMA is used perioperatively (or in other cases of mild to moderate aphasia), this constraint could be relaxed and the RT for each item would be reflected in the total time of the task.
The DIMA-nl appears to be more sensitive for detecting mild language impairment in patients with brain tumours than the tests generally used (Satoer et al., 2021). It should be checked whether this is also true for the French version. Satoer et al. (2021) confirmed the interest of the DIMA-nl for the longitudinal evaluation of oral language in patients with brain tumours, such as low-grade gliomas, undergoing awake surgery. Since the DIMA must allow for pre- and postoperative language evaluation, the creation of a second version based on the same language constraints would be necessary to avoid a test-retest effect. Currently, only the fluency test has been created in this sense (one semantic fluency and one phonological fluency task for the preoperative assessment and a second pair for the postoperative assessment).
The DIMA is currently undergoing digital development for tablet use. This computerisation will allow precise measurements of scores and RT for the various tests. Indeed, since the speed of information processing is one of the sensitive points in patients with low-grade gliomas (Duffau, 2019), an objective measurement is essential. In addition, the DIMA being currently adapted and standardised in different languages, automated data collection should facilitate international studies.

Conclusion
We described the standardization of the French adaptation of the DIMA-nl, designed to detect mild language impairments, such as those that may occur in patients undergoing brain surgery or in patients suffering other neurological diseases inducing mild language disorders. Sex had no effect on performance, while age and level of education had a significant impact. Four groups were established to calibrate the norms: (i) 18-69 years-old with Bac; (ii) 18-69 years-old without Bac; (iii) >70 years-old with Bac; (iv) >70 years-old without Bac. Various improvements to the DIMA are now being considered including a computerised version for better data collection, and, if necessary, a parallel battery for postoperative use in order to avoid the test-retest effect. Standardisation of the DIMA in different languages will make it a common battery for international centres that practice awake surgery. Validation of the DIMA-fr and crossover studies on cohorts of patients are now required.