Analyses on performance measures

Accuracy

A one-way ANOVA on target word accuracy showed a significant difference between target conditions, F(3, 92) = 22.43, p < .001. Post-hoc pairwise comparisons with Holm corrections showed that accuracy for the NonCore condition was significantly worse than all others (all ps < .001), but no significant differences were found between any of the core-word conditions.
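The Holm correction applied to the pairwise comparisons can be illustrated with a minimal sketch. The function name `holm_adjust` and the raw p-values are hypothetical and are not taken from the analysis reported here; the code only shows how the step-down adjustment works for six comparisons among four conditions.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values are not allowed to decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for the six pairwise comparisons (illustration only).
raw = [0.0001, 0.0002, 0.0003, 0.20, 0.45, 0.60]
adjusted = holm_adjust(raw)
for p, p_adj in zip(raw, adjusted):
    print(f"raw = {p:.4f}  Holm-adjusted = {p_adj:.4f}")
```

Each adjusted value is compared against the usual alpha level, so a comparison is significant after correction if its adjusted p-value falls below .05.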

The figure below shows the relationship between target word accuracy and target word predictability, stratified by target condition. The best-fitting regression line for each condition is shown. Model fitting results supported a model with both target condition and predictability but no interaction, suggesting that all conditions showed the same relationship between accuracy and predictability. The full results are reported in the main paper.

Number of guesses

Number of guesses can only be analysed for correct trials. Consequently, the analyses on number of guesses must be interpreted in light of the accuracy analyses above.

The mean number of guesses needed to get a target word was calculated by averaging over all of the trials on which the target word was successfully guessed. A lower mean number of guesses indicates that people got the target word more quickly on average, and therefore reflects better performance for that target word. A one-way ANOVA on target word number of guesses showed a significant difference between target conditions, F(3, 92) = 11.62, p < .001. Post-hoc pairwise comparisons with Holm corrections showed that all core-word conditions were guessed significantly more quickly than the NonCore condition (all ps < .002), and additionally, the WF condition was guessed significantly more quickly than the AoA condition, p = .002. No significant differences were found between any of the other conditions.
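As an illustration of how the per-target mean is conditional on success, the following sketch averages number of guesses over correct trials only. The trial records, the tuple layout, and the function name `mean_guesses_per_target` are hypothetical; the actual data format is not described here.

```python
from collections import defaultdict


def mean_guesses_per_target(trials):
    """Average number of guesses per target word, using correct trials only."""
    # target -> [sum of guesses on correct trials, number of correct trials]
    totals = defaultdict(lambda: [0, 0])
    for target, correct, n_guesses in trials:
        if correct:
            totals[target][0] += n_guesses
            totals[target][1] += 1
    # Targets that were never guessed correctly do not appear in the result.
    return {target: s / n for target, (s, n) in totals.items()}


# Hypothetical trials: (target word, guessed correctly?, number of guesses used).
trials = [
    ("gigantic", True, 4),
    ("gigantic", False, 10),
    ("gigantic", True, 6),
    ("floppy", True, 3),
    ("floppy", False, 10),
]
means = mean_guesses_per_target(trials)
print(means)
```

Because failed trials are excluded, a target word that is rarely guessed can still have a low mean number of guesses, which is why this measure has to be read together with accuracy.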

A linear mixed-effects model was fitted to predict number of guesses on each trial from condition, with target word and participant included as crossed random effects. Multiple versions of this model were compared using BIC, and the full model with condition and both random effects was favoured, albeit by a narrow margin over the random-effects-only model (BIC = 12164 vs. 12165), suggesting that number of guesses varies by condition as well as by participant and target word.

The role of condition: Linear mixed-effects model
Model     Description                                     BIC
M3null    numg ~ 1                                        12512
M3C       numg ~ condition                                12407
M3RE      numg ~ (1|target) + (1|subj)                    12165
M3full    numg ~ condition + (1|target) + (1|subj)        12164
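To make the model comparison easier to read, the BIC values in the table can be expressed as differences from the best (lowest-BIC) model. The sketch below uses only the values reported in the table; the variable names are illustrative.

```python
# BIC values copied from the table above.
bic = {"M3null": 12512, "M3C": 12407, "M3RE": 12165, "M3full": 12164}

# The preferred model is the one with the lowest BIC.
best = min(bic, key=bic.get)

# Difference in BIC between each model and the preferred model.
delta = {name: value - bic[best] for name, value in bic.items()}

for name in sorted(delta, key=delta.get):
    print(f"{name:8s} delta BIC = {delta[name]}")
```

Larger differences indicate stronger evidence against a model relative to the preferred one. A commonly used rule of thumb treats differences below about 2 as weak evidence, which is worth bearing in mind when comparing the two models that include random effects.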

The figure below shows the relationship between target word number of guesses and target word predictability, stratified by target condition. The best-fitting regression line for each condition is shown.

The role of predictability: Linear regression
Model     Description                                     BIC
M4null    numg ~ 1                                        191
M4C       numg ~ condition                                174
M4P       numg ~ predictability                           150
M4CP      numg ~ condition + predictability               155
M4CPI     numg ~ condition * predictability               162

A linear regression model was fitted at the target word level, predicting target word number of guesses from target condition and predictability. This time, model fitting results showed that the model with predictability only (BIC = 150) fitted better than the models that included both target condition and predictability (BIC = 155 without and 162 with the interaction), suggesting that target condition does not explain differences in number of guesses beyond what predictability alone explains. Since the number of guesses measure is conditional upon successfully getting the target word in the first place, this may suggest that the biggest effect of condition is on accuracy, and that once a target word is guessed correctly, there is little difference between the conditions.

The results of the linear regression predicting target word number of guesses from target word predictability show that number of guesses significantly decreases with target word predictability (i.e., more predictable target words are guessed more quickly), b = -2.67, p < .001.
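For readers less familiar with how a slope such as b is obtained, the following is a minimal least-squares sketch for a single predictor. The data points and the function name `fit_line` are hypothetical and chosen only to illustrate a negative slope; they are not the values analysed here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical target-level data: predictability and mean number of guesses.
predictability = [0.1, 0.2, 0.3, 0.4]
mean_guesses = [5.0, 4.5, 4.2, 3.5]

slope, intercept = fit_line(predictability, mean_guesses)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A negative slope means that the fitted number of guesses falls as predictability rises, which is the direction of the effect reported above.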

Analyses on incorrect responses

Analyses using 20% threshold

When looking at the incorrect responses that were given on at least 20% of target-word trials, there are a total of 95 target-response pairs.
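The thresholding step can be sketched as follows. This assumes, for illustration only, that each trial contributes a single recorded response and that the proportion is computed out of all trials for that target word; the function name `frequent_incorrect_responses` and the example responses are hypothetical.

```python
from collections import Counter


def frequent_incorrect_responses(trials, threshold=0.20):
    """Return sorted (target, response) pairs in which the incorrect response
    was given on at least `threshold` of that target word's trials."""
    # Number of trials per target word (correct and incorrect).
    trials_per_target = Counter(target for target, _ in trials)
    # Number of trials on which each incorrect response was given.
    incorrect = Counter((t, r) for t, r in trials if r != t)
    return sorted(
        pair
        for pair, count in incorrect.items()
        if count / trials_per_target[pair[0]] >= threshold
    )


# Hypothetical trials: (target word, response given).
trials = (
    [("gigantic", "gigantic")] * 5
    + [("gigantic", "big")] * 4
    + [("gigantic", "huge")] * 1
    + [("floppy", "floppy")] * 3
    + [("floppy", "hard")] * 2
)
pairs = frequent_incorrect_responses(trials)
print(pairs)
```

In this toy data set, "huge" is given on only 1 of 10 "gigantic" trials and is therefore dropped, while the other two incorrect responses pass the threshold.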

Responses still tend to be more core than the intended targets, with the same pattern of frequency of more/less core responses across target conditions.

One interesting difference when looking at these responses is the appearance of antonyms of the intended target (e.g., hard instead of floppy, small instead of gigantic), exclusively in the NonCore condition. These were all more core than their respective targets on all three coreness measures. These are theoretically interesting as they may indicate that people can grasp the idea of the sentence (e.g., that it’s about size) but guess the opposite of the intended meaning (e.g., small instead of big).

The same pattern of relation types across target conditions holds. Responses to NonCore targets tended to be basic synonyms, whereas responses to AoA targets tended to be taxonomic alternatives.