Reversal learning in those with early psychosis features contingency-dependent changes in loss response and learning

ABSTRACT Introduction People with psychotic disorders commonly show broad decision-making impairments that adversely affect their functional outcomes. Specific associative/reinforcement learning problems have been demonstrated in persistent psychosis. However, these phenotypes may differ in early psychosis, suggesting that aspects of cognition decline over time. Methods The present proof-of-concept study examined goal-directed action and reversal learning in controls and those with early psychosis. Results Equivalent performance was observed between groups during outcome-specific devaluation and during reversal learning at an 80:20 contingency (reward probability for high:low targets). However, when the low-target reward probability was increased (80:40), those with early psychosis altered their response to loss, whereas controls did not. Computational modelling confirmed that in early psychosis there was a change in punishment learning that increased the chance of staying with the same stimulus after a loss, multiple trials into the future. In early psychosis, the magnitude of this response was greatest in those with higher IQ and lower clinical severity scores. Conclusions We show preliminary evidence that those with early psychosis present with a phenotype that includes altered responding to loss and hyper-adaptability in response to outcome changes. This may reflect a compensatory response to overcome the milieu of corticostriatal changes associated with psychotic disorders.


Introduction
Psychosis is associated with a range of cognitive impairments that affect most domains (Marder, 2006). Decision-making impairments in particular are considered a core symptom that adversely affects functional outcomes for those with persistent psychosis (Hochberger et al., 2020). Although differences in learning and cognition in some individuals are evident years before psychosis onset (Reichenberg et al., 2010), many studies in those with early psychosis observe mild or negligible differences in decision-making compared with age-matched controls (Kesby et al., 2021). This supports a neurodegenerative model of psychosis in which impairment in decision-making processes occurs subsequent to psychosis onset (Stone et al., 2022). Understanding how and why this occurs is an area of particular interest in psychiatric research (Kesby et al., 2021; Nelson et al., 2017; Reichenberg et al., 2010).

Decision-making is heavily dependent on cortico-striatal circuitry, and this circuitry has been implicated in both the psychotic and cognitive symptoms associated with disorders such as schizophrenia (Conn et al., 2020). There is strong evidence for increased dopaminergic function in the associative striatum of those with schizophrenia (Ersche et al., 2011; McCutcheon et al., 2018), which is tightly linked with psychotic symptoms (Abi-Dargham et al., 1998). However, recent work suggests that increased striatal dopamine may also impact decision-making processes (Clatworthy et al., 2009; Young et al., 2022). We proposed that outcome-specific devaluation and reversal learning were two tasks that may be useful in schizophrenia research for identifying dysfunction in the associative striatum (Kesby et al., 2018). For outcome-specific devaluation, a participant learns two action-outcome associations. Then, after one outcome is devalued, a participant with intact goal-directed action will show a preference towards the valued action over the devalued one. This demonstrates the ability to adapt to newly acquired information. For reversal learning, a participant is presented with two stimuli and must adapt their responding when the outcomes are repeatedly reversed/switched. This is generally conducted in a probabilistic environment, where one stimulus has a high reward probability (80%) and the other a low reward probability (20%). We have shown recently that a large proportion of those with persistent psychosis show broad impairments across these two tasks, which is associated with problems in experience updating (Suetani et al., 2022). Moreover, imaging studies in those with persistent psychosis have observed altered associative striatal (caudate) activity alongside impairments in outcome-specific devaluation (i.e., goal-directed action; Morris et al., 2015). Alterations in associative striatal dopamine have also been associated with poorer reversal learning in healthy people (Clatworthy et al., 2009), and reversal learning impairments are commonly reported in schizophrenia (Ceaser et al., 2008; Pantelis et al., 1999; Reddy et al., 2016; Waltz & Gold, 2007; Weiler et al., 2009).
Deficits in reversal learning have also been observed in those with first-episode psychosis (Leeson et al., 2009; Montagnese et al., 2020; Murray et al., 2008), and these deficits may be stable over time (up to 6 years after onset; Leeson et al., 2009). However, other studies have shown that cognition more generally may improve at follow-up (Bora & Murray, 2014). Nevertheless, when reversal learning impairments are encountered in individuals with early psychosis, their phenotype differs in key ways from that observed in persistent psychosis. For example, persistent psychosis is commonly associated with alterations in the response to reward (Deserno et al., 2020; Reddy et al., 2016; Suetani et al., 2022; Waltz et al., 2013), whereas studies of reversal learning and reinforcement learning in those with early psychosis have observed decreased sensitivity to losses (Chang et al., 2016; Leeson et al., 2009; Montagnese et al., 2020). When considering reward systems across the illness stages of psychosis, we have suggested that subcortically driven deficits are more prominent in early psychosis and that, as illness progresses, cortically derived impairments become more prominent (Kesby et al., 2021).
Therefore, the aims of the present proof-of-concept study were to: (1) assess whether the broad deficits in outcome-specific devaluation and reversal learning seen in those with persistent psychosis (Suetani et al., 2022) were also present in early psychosis, and (2) examine reward and punishment learning during reversal learning to identify whether the phenotypes were similar or opposed to those commonly observed in persistent psychosis. As reversal learning protocols often rely on a single contingency, we opted to assess performance at multiple contingencies, as recent work in mice suggests that differing phenotypes may be evident when animals are challenged with more uncertain probabilistic outcomes (Young et al., 2022). Moreover, whether these phenotypes include differences in learning and/or evidence accumulation over time, akin to those in persistent psychosis (Suetani et al., 2022), was of particular interest to the current study.

Participants
A total of 30 participants, between 18 and 29 years of age, were classified into two groups based on psychiatric history. Healthy controls had no diagnosis of a psychotic disorder and had not experienced a psychotic episode (N = 15). Those with early psychosis had received a diagnosis of a psychotic disorder (determined to be their first episode by the treating clinician) and were within 6 months of receiving initial antipsychotic treatment (N = 15). All participants were recruited as part of a larger study (Suetani et al., 2022); the control group reflects an age-matched selection from this larger cohort, and the early psychosis participants were recruited in addition to those in the prior study. All participants underwent only the tasks specified here, and no other assessments were undertaken that could confound these data. Table 1 provides the demographics, IQ and substance use in these groups (see Table S1 for details) and Table 2 provides the general psychiatric characteristics for those with early psychosis (see Table S2 for Positive and Negative Syndrome Scale [PANSS] sub scores). Detailed inclusion criteria are described in the Supplementary Methods.

Procedures and experimental design
All procedures were approved by the Royal Brisbane and Women's Hospital and University of Queensland Human Research Ethics Committees (HREC/17/QRBW/168). Participants were remunerated $40 AUD (see Supplementary methods). Premorbid and current IQ were assessed using the Test of Premorbid Functioning (TOPF; Pearson Clinical, Sydney, Australia) and the Wechsler Abbreviated Scale of Intelligence, second edition (WASI-II; Pearson Clinical). Substance use was assessed using a Substance Misuse Scale (Duhig et al., 2015). The cognitive tasks were run using PsychoPy v3 (Peirce et al., 2019), with stimuli displayed on a computer monitor. Responses were recorded on a joystick box (Fighting Stick Mini 4; Hori Co. Ltd, Yokohama, Japan).

Instrumental training
Participants were told that three tokens (visual stimuli) were of equal value. Using a 7-point Likert scale, participants rated each token based on how valuable they considered it to be, and their motivation to earn tokens. Training involved liberating two of the tokens from a virtual vending machine. The joystick was moved left or right, with 5-10 consecutive responses (drawn randomly) in one direction required to earn the associated token (e.g., star or circle). After every three rounds, a question was posed to assess participants' understanding of the association between action and reward. Instrumental training ended after six consecutive questions were answered correctly.

Figure 1. After instrumental training, participants are explicitly informed that one outcome is now worth fewer credits (outcome devaluation) before completing the choice test to assess goal-directed action (the preference towards the now more valuable action). For serial reversal learning (B), participants are presented with two stimuli (which remain the same throughout the entire task). The serial reversal learning (SRL) 1 stages are probabilistically rewarded at an 80:20 contingency (e.g., 80% reward probability for the cloud-like stimulus and 20% reward probability for the spiral-like stimulus). After a selection, the credit reward is displayed on the screen (0 or 1 credit for SRL1). For the SRL2 stages, the probability of receiving a reward on the poorer stimulus is increased to 40% and participants can receive either 2 or 6 credits for a win (equal probability). See Suetani et al. (2022) for detailed methods.

Devaluation test
Participants were informed that one of the tokens had been counterfeited (counterbalanced) and was therefore less valuable. Participants were instructed to tilt the vending machine to earn the associated tokens, and that their actions in this stage would dictate their monetary compensation. The virtual machine was displayed for 10 blocks (12 s) and could be tilted at will during each block. Aside from visually tilting the vending machine, no outcome feedback was presented. Subsequently, participants were asked probe questions about which outcome was associated with which action, and they rerated the value of each token and their motivation to receive tokens. Behavioural outcomes included the preference for each token (= token responses / total responses), responses (raw number of responses per second), and rating change (= rating post-devaluation − rating pre-devaluation).

Serial reversal learning task
For the reversal learning task (Figure 1(B)), all stimulus pairs were binary images and all combinations were counterbalanced. For detailed methodology and training stages, see the Supplementary methods and Suetani et al. (2022).

Probabilistic reversal learning
Participants underwent a probabilistic reversal learning task consisting of 11 stages: initial discrimination (1 stage), initial reversal (1 stage), and serial reversal learning phase 1 (SRL1; 5 stages) and phase 2 (SRL2; 4 stages). Each stage featured the same pair of stimuli but varied in reward rate (probability) and outcome (credits). For the first seven stages, the probabilistic reward contingencies were set at 80:20, meaning that the target stimulus was rewarded 80% of the time and the non-target stimulus was rewarded 20% of the time. One credit was earned for a rewarded trial and 0 credits for a non-rewarded trial. For the SRL2 stages, the contingencies were set at 80:40, increasing the task difficulty by providing more misleading feedback. Two or six credits were earned for a rewarded trial (equal probability) and 0 credits for a non-rewarded trial. The criterion for progressing from each stage was 6 correct responses in a row.
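The stage logic described above can be summarised in a few lines. The following is a minimal illustrative simulation only, not the PsychoPy implementation used in the study; the agent interface (`choose`) and the function names are our own.

```python
import random

def run_stage(choose, p_reward=(0.8, 0.2), criterion=6, max_trials=200, rng=None):
    """Simulate one probabilistic stage. Stimulus 0 is the target.
    `choose` maps the trial history to a choice (0 or 1).
    Returns the number of trials to criterion, or None if never reached."""
    rng = rng or random.Random(0)
    streak = 0          # consecutive correct (target) responses
    history = []        # (choice, rewarded) tuples fed back to the agent
    for t in range(max_trials):
        choice = choose(history)
        rewarded = rng.random() < p_reward[choice]  # 80:20 in SRL1, 80:40 in SRL2
        history.append((choice, rewarded))
        streak = streak + 1 if choice == 0 else 0   # criterion: 6 correct in a row
        if streak >= criterion:
            return t + 1
    return None
```

An agent that always picks the target reaches criterion in exactly six trials regardless of the probabilistic feedback, because the criterion counts correct responses, not rewarded ones.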

Reversal learning performance measures and strategies
General performance measures included total trials to criterion, perseveration (number of errors in the first 6 trials after a reversal), and response rates. Whether a subject selected the same stimulus after attaining a reward (win-stay) or selected the alternative stimulus after a loss (lose-shift) was quantified as a proportion of the total applicable trials. To determine how long a reward or punishment altered future choices (regardless of interim feedback), the probability of selecting the same stimulus (after a reward or loss) was calculated for each of the subsequent 6 trials (excluding trials that crossed a reversal).
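As a rough sketch of these two measures (the function names and choice/reward encoding are our own; the published analysis may differ in detail, e.g. in how reversal boundaries are excluded):

```python
def win_stay_lose_shift(choices, rewards):
    """Win-stay and lose-shift proportions over consecutive trial pairs.
    choices: stimulus ids per trial; rewards: True/False per trial."""
    ws = [c2 == c1 for c1, c2, r in zip(choices, choices[1:], rewards) if r]
    ls = [c2 != c1 for c1, c2, r in zip(choices, choices[1:], rewards) if not r]
    mean = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return mean(ws), mean(ls)

def stay_prob_after(choices, rewards, outcome, lag):
    """P(selecting the same stimulus `lag` trials after a win/loss),
    regardless of the feedback received on the interim trials."""
    hits = [choices[i + lag] == c
            for i, (c, r) in enumerate(zip(choices, rewards))
            if r == outcome and i + lag < len(choices)]
    return sum(hits) / len(hits) if hits else float("nan")
```

Computing `stay_prob_after` for lags 1-6 yields the temporal decay curves analysed later (Figure 4): after a reward the stay probability starts high and decays towards chance, whereas after a loss it starts near chance and declines more slowly.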

Computational modelling and simulation
The underlying cognitive processes in reversal learning were estimated by modelling latent task variables using the hBayesDM package for R (version 3.6 [platform: x86_64-w64-mingw32/x64 (64-bit)] on Windows 10 v1809) developed by Ahn et al. (2017). A reward/punishment learning model (RP) that had previously shown a good fit to reversal learning behaviour was examined (den Ouden et al., 2013). These participant data were modelled alongside those from our prior study (Suetani et al., 2022) to obtain better population estimates and minimise issues due to the smaller sample size. Parameters included the reward learning rate, punishment learning rate, and inverse temperature (the deterministic or exploratory nature of the choices made). Higher learning rate values indicate that increased prediction error signalling is biasing learning towards more recent information (be that reward or punishment).
Lower inverse temperature values reflect less deterministic or more exploratory decision-making.
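The core of this model family is a delta-rule value update with separate learning rates for rewarded and unrewarded outcomes, combined with a softmax choice rule. The sketch below shows that standard formulation only; the outcome coding and function names are our own, and hBayesDM additionally estimates these parameters hierarchically across participants.

```python
import math

def rp_update(q, choice, outcome, lr_reward, lr_punish):
    """Delta-rule update with separate learning rates: rewarded outcomes
    (outcome > 0) use lr_reward, losses use lr_punish."""
    lr = lr_reward if outcome > 0 else lr_punish
    q = list(q)
    q[choice] += lr * (outcome - q[choice])  # learning rate x prediction error
    return q

def p_choose_first(q, beta):
    """Softmax probability of choosing stimulus 0; beta is the inverse
    temperature (lower beta -> choices nearer 50:50, i.e. more exploratory)."""
    weights = [math.exp(beta * v) for v in q]
    return weights[0] / sum(weights)
```

Under this formulation, a low punishment learning rate means a loss moves the chosen stimulus's value only slightly, so the model tends to keep selecting the same stimulus for several trials after a loss — the pattern reported for the early psychosis group at SRL2.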

Data analysis
Binary variables were examined using χ²-tests and continuous variables using analysis of variance (ANOVA) with Group as the independent variable (with repeated measures where necessary). The preference and response rates for outcome devaluation were also analysed using within-group paired t-tests to confirm significant goal-directed action. All statistical analyses were performed with IBM SPSS Statistics 26 (Armonk, NY, USA). When appropriate, post hoc comparisons were performed using Šídák corrections. Results are expressed as mean ± standard error of the mean (SEM). Differences were considered statistically significant at p < 0.05. Preference and response bias figures were made with code adapted from van Langen (2020).

Results
Goal-directed action is intact in those with early psychosis

Both control and early psychosis groups showed a significant preference towards the valued response (Figure 2(A); for all comparisons see Table S3). There were no significant differences between groups in the response rates (Figure 2(B)). The change in ratings for all three stimuli (valued, devalued and an irrelevant stimulus) and participants' motivation to earn tokens were not significantly different between groups (Figure 2(C)), but there was a significant main effect of Rating (F3,84 = 20.3, p < 0.001), with the rating change for the devalued stimulus significantly different from all the other ratings (p < 0.001). This confirms that the current protocol leads to outcome-specific devaluation rather than a general decrease in valuation and motivation. Importantly, a significant decrease in the rating for the devalued stimulus compared to the valued stimulus was observed for both controls (p < 0.01) and those with early psychosis (p < 0.01).

Reversal learning in those with early psychosis reveals potential changes in punishment learning
There were no significant differences observed between the control and early psychosis groups in the trials to criterion for any stage (Figure 3(A); for all comparisons see Table S4). There were also no differences in the number of perseverative errors (consecutive errors after a reversal) in either SRL stage (Figure 3(B)). The strategies used during SRL1 were similar between controls and those with early psychosis (Figure 3(C)), but early psychosis subjects shifted less after losses (i.e., decreased lose-shift use) during the SRL2 stage (Figure 3(D); F1,27 = 4.8, p < 0.05). This difference reflected an almost 20% lower chance of shifting after a loss for those with early psychosis compared with controls (controls = 0.53 ± 0.28, early psychosis = 0.35 ± 0.14 [mean ± standard deviation]). Controls maintained a similar level of lose-shift use between the SRL1 and SRL2 stages (t13 = 0.0, p = 0.99), whereas early psychosis subjects significantly decreased their lose-shift use in the SRL2 stage (t13 = 2.6, p < 0.05).
Although trial-by-trial analyses (i.e., win-stay and lose-shift) provide a measure of reward and loss learning/sensitivity, they do not take trial history into account. Computational approaches can leverage an individual's responses to outcomes over time to quantify biases between past and recent outcomes. This can help inform why changes in win-stay or lose-shift are evident in a certain population. The parameters fitted by the RP model indicated no differences in learning or inverse temperature between controls and those with early psychosis during the SRL1 stage (Figure 3(E)). However, during the SRL2 stage, punishment learning parameters for those with early psychosis were significantly lower than for controls (Figure 3(F); F1,27 = 7.5, p < 0.05), indicating a greater reliance on past losses over recent ones. The outcomes from this computational model suggest that the changes in lose-shift use observed in those with early psychosis are not solely limited to the following trial but reflect a slower or less sensitive response to loss. Taken together, these data indicate that people with early psychosis are capable of navigating reversal learning similarly to controls but adapt differently to changing contingencies. This adaptation appears to be selective for responses to, and learning from, losses rather than rewards.

Figure 3. Serial reversal learning in those with early psychosis. There were no differences in the average trials to criterion between controls and those with early psychosis for any stage (A). There were also no differences in the number of perseverative errors (consecutive errors after a reversal; B). Those with early psychosis used similar proportions of win-stay and lose-shift strategies as controls during SRL1 (C) but shifted less after losses in SRL2 (D; controls = 0.53 ± 0.28, early psychosis = 0.35 ± 0.14 [mean ± standard deviation]). No differences in computational modelling parameters were observed for SRL1 (E), but the punishment learning parameter was lower in those with early psychosis for SRL2 compared with controls (F). Lower punishment learning in this model indicates a greater reliance on past losses over recent ones, reflecting a relative impairment in adapting behaviour with novel information. Note: one data point is omitted for Discrimination data in controls (panel A) to improve visibility of the remaining data. Data are displayed as the mean ± standard error. *p < 0.05.
Contingency change leads to a persisting impact of a loss on subsequent choices

Alterations in lose-shift use reflect an acute response to a loss, whereas the decreased punishment learning in early psychosis indicated a greater reliance on past losses. This suggests that those with early psychosis are less sensitive to recent losses. Although the lose-shift and computational parameters provide evidence of a lower sensitivity to recent losses, they do not indicate how long this may persist. We were interested in the temporal impact of a reward or loss on subsequent choices (i.e., how long a recent loss, or win, influences future choice, and how many trials into the future those with early psychosis take to return to control levels). To quantify this, we calculated how likely a participant was to select the same stimulus after a win or loss (regardless of subsequent outcomes but excluding trials that crossed a reversal). For rewards during the SRL1 stage (Figure 4(A), left), there were significant main effects of Trial (F5,140 = 60.0, p < 0.001) and Group (F1,28 = 4.4, p < 0.05) on the probability of selecting the same stimulus after a reward. All participants were highly likely to select the same stimulus after a reward, with the probability only returning to chance (0.5) six trials later. Those with early psychosis were slightly less likely to select the same stimulus compared to controls, regardless of the trial. For losses during the SRL1 stage (Figure 4(A), right), there was also a significant main effect of Trial on the probability of selecting the same stimulus after a loss (F5,140 = 11.7, p < 0.001) but no differences between groups. Unlike after a reward, the probability of selecting the same stimulus after a loss began at chance (0.5) and decreased over each trial. However, the slope of this decrease was much shallower than that after a reward. Similar profiles were observed for the SRL2 stage (Figure 4(B)), with significant main effects of Trial observed after a reward (F5,135 = 61.0, p < 0.001) and loss (F5,135 = 9.5, p < 0.001). However, after a loss there was also a significant Trial x Group interaction (F5,135 = 2.4, p < 0.05): those with early psychosis were more likely to select the same stimulus after a loss than controls on the first (p < 0.05) and second (p < 0.05) trial after the loss. Furthermore, the probability of selecting the same stimulus in these first two trials was greater than chance in those with early psychosis, whereas controls biased their responding towards the alternative stimulus on all subsequent trials. These data provide empirical support for the premise that a loss alters subsequent choices in those with early psychosis for multiple trials.

Decreased response and learning to losses at more difficult contingencies is associated with higher IQ and lower clinical severity
To identify any relationships between SRL2 loss responding/learning and clinical symptoms, we ran an exploratory multivariate general linear model (GLM) including behavioural variables (SRL2 lose-shift values and punishment learning parameters) and symptom scores (total positive, negative and general psychopathology scores, and Clinical Global Impression (severity) score). Chlorpromazine equivalent dose and IQ were included as control variables. SRL2 lose-shift probability was negatively associated with IQ (Figure 5(A); F1,15 = 6.4, p < 0.05; partial η² = 0.45) and punishment learning was positively associated with the Clinical Global Impression (severity) score (Figure 5(B); F1,15 = 10.8, p < 0.05; partial η² = 0.58). In contrast, there was no significant relationship between IQ and SRL2 lose-shift probability in controls (F1,14 = 0.3, p > 0.58; partial η² = 0.03). These associations were surprising in the context of these specific behavioural outcomes. As a group, those with early psychosis significantly altered their responding to loss (decreased lose-shift use and decreased punishment learning parameters) in the SRL2 stage compared with the SRL1 stage, whereas controls did not. Our expectation was that the magnitude of this adaptation would be associated with worse clinical outcomes. However, greater decreases in the behavioural response to loss and loss learning were associated with increased IQ and decreased clinical impression severity, respectively. Therefore, these data indicate that a more complex relationship may exist, whether this be compensatory or otherwise.

Figure 4. There were no differences in the likelihood of selecting the same stimulus after a reward or loss between controls and those with early psychosis at the 80:20 contingency (A). For the 80:40 contingency (B), those with early psychosis were more likely to select the same stimulus after a loss for the following two trials compared with controls. This lag in shifting after a loss supports the punishment learning parameter changes in Figure 3 and indicates that loss learning is slower (or less sensitive) in those with early psychosis. Data are displayed as the mean ± standard error. *p < 0.05 between groups, #p < 0.05 between trials in those with early psychosis.

Discussion
The present proof-of-concept study demonstrates that those with early psychosis are largely comparable to controls with regard to goal-directed action and reversal learning. However, when the reversal learning probabilistic contingency changed and became more difficult (or uncertain), those with early psychosis showed a pattern of altered responding to loss. This phenotype reflects less sensitivity to recent losses and more weighting on past losses. Surprisingly, loss responding in those with early psychosis was associated with higher IQ and lower clinical severity scores. These outcomes highlight that decision-making processes may change over time in people with psychosis, and that differences in cognitive adaptability in those with early psychosis may be a compensatory response, given they are associated with better clinical indicators. Taken together, this supports other evidence of cognitive decline in the early stages of illness in some individuals with psychosis and provides potential avenues whereby this process could be arrested to decrease progressive disability.

Goal-directed action in early psychosis
Impairments in goal-directed behaviour have been observed in persistent psychosis using other behavioural approaches (Pantelis et al., 2004). Studies using outcome-specific devaluation have also shown that most people with persistent psychosis have impaired goal-directed action (Morris et al., 2015; Morris et al., 2018; Suetani et al., 2022). Our recent work found that over half of those with persistent psychosis had impairments in goal-directed action (Suetani et al., 2022). In contrast, the present study found that those with early psychosis had intact goal-directed action, with no differences in comparison with control participants. In a prior study, deficits observed in those with persistent psychosis were associated with alterations in caudate activity and alogia/avolition symptoms (Morris et al., 2015), whereas in our prior work we found that response bias was associated with grandiosity and difficulty in abstract thinking (Suetani et al., 2022). Nevertheless, there are multiple factors that could account for a decline in goal-directed action with prolonged illness. For example, medication dose in those with persistent psychosis is higher (Suetani et al., 2022), and striatal dopaminergic hyperfunction has been shown to increase with illness progression (Howes et al., 2011).
Probabilistic reversal learning: loss-specific response adaptation in those with early psychosis

Unlike other studies using deterministic reversal learning in attentional set-shifting paradigms (Leeson et al., 2009; Murray et al., 2008), in our study those with early psychosis performed reversal learning equivalently to age-matched controls, at least under standard contingencies (i.e., 80:20). However, when the contingency became more difficult and uncertain, those with early psychosis altered their response and learning to loss (i.e., decreased lose-shifting). This change in behavioural strategy and approach was not observed in controls. In contrast, similar decreases in win-stay use (6-7%) from SRL1 to SRL2 were observed in both groups, highlighting that early psychosis subjects adapt similarly to controls in their response to reward. Most reversal learning studies use only the 80:20 contingency, but this work highlights that specific phenotypes may only be revealed at more complex contingencies. For example, those with early psychosis were slower than healthy controls in resolving stimulus conflict in a multi-source interference task (Burgher et al., 2021). These phenotypic differences in people with early psychosis are supported by preclinical work showing that increases in the striatal dopaminergic system of mice lead to lose-shift alterations after acute manipulations (Young et al., 2022). However, other phenotypes become apparent after chronic exposures and at more difficult contingencies (Young et al., 2022). Other studies have also demonstrated that those with first-episode psychosis have a lower sensitivity to punishment than controls in reinforcement learning paradigms (Chang et al., 2016; Leeson et al., 2009; Montagnese et al., 2020). This fits with our findings, both from the RP model, which indicated that recent losses were less likely to guide learning, and from the temporal profile of how a loss alters future choices.
Alterations in reversal learning are associated with improved psychiatric characteristics

Impairments in reinforcement learning tasks in those with psychosis are commonly associated with increased negative symptoms and deficits in nucleus accumbens activity (Kesby et al., 2021). Using this same task in those with persistent psychosis, we found that poor rapport was associated with worse performance (trials to criterion and win-stay use) in the SRL1 stage (Suetani et al., 2022). The only symptom associated with lose-shift use was suspiciousness and persecution, with increased scores associated with decreased lose-shift use. The sample size in this study was too small to include the individual PANSS sub scores in a GLM (which also makes these analyses exploratory in nature, requiring confirmation in larger cohorts), but neither positive nor negative symptoms were associated with SRL2 lose-shift use or the punishment learning parameter. Rather, IQ and the Clinical Global Impression (severity) score were associated with lose-shift use and punishment learning, respectively. Lower IQ scores confer a greater risk of developing schizophrenia and an earlier age of onset (Aylward et al., 1984; Fusar-Poli et al., 2012; Khandaker et al., 2011). There remains the possibility that, rather than IQ deficits conferring risk, increased IQ may be protective. Therefore, this hyper-adaptability to changing contingencies in those with early psychosis may be a form of behavioural compensation. If this is the case, assessing how people with early psychosis respond to a wider range of contingencies may reveal situations where performance is improved relative to controls.

Evidence for differential phenotypes in early and persistent psychosis
Decision-making impairments in those with early psychosis are often mild or non-existent when compared with persistent psychosis (Kesby et al., 2021; Leeson et al., 2009; Montagnese et al., 2020; Murray et al., 2008). In contrast, increases in trials to criterion and decreased win-stay use are commonly observed in those with persistent psychosis (Deserno et al., 2020; Kesby et al., 2021; Reddy et al., 2016; Suetani et al., 2022; Waltz et al., 2013). We have previously identified a subgroup of those with persistent psychosis with broad impairments in goal-directed action and reversal learning, which featured decreased win-stay use and slower experience updating (Suetani et al., 2022). This phenotype is in stark contrast to the phenotypes observed in those with early psychosis in the present study, but there are certain outcomes that require further investigation. For example, although punishment learning did not underlie performance deficits in those with persistent psychosis, compared with controls they had increased punishment learning values during the SRL1 stage (Suetani et al., 2022). This may indicate that the punishment learning parameter increases with illness progression. Whether those subjects with early psychosis and high punishment learning values are more liable to progressive changes (increases) or more likely to stabilise at control levels remains to be seen. Punishment learning in rodents has been shown to map onto multiple cortical regions (Verharen et al., 2020), which may act with a degree of parallel redundancy. We have posited that cortical areas involved in reward may be spared early in psychotic illnesses and progressively decline (Kesby et al., 2021). Potentially, this hyper-adaptability towards loss in early psychosis reflects a form of cortical compensation, but as the illness progresses (or as medication doses increase), this compensatory response is attenuated.

Future directions and considerations
We recruited a modest number of participants with early psychosis (N = 15) with moderate to low symptom severity as assessed with the PANSS, making this a proof-of-concept study. Nevertheless, previous studies on outcome devaluation suggest this is sufficient to observe deficits in those with persistent psychosis (Morris et al., 2018). Although this increases the chance of a type II error, the distinct reversal learning phenotype observed in those with early psychosis compared with persistent psychosis (Suetani et al., 2022) suggests that differing cognitive processes are impaired in early psychosis. This is particularly interesting in light of recent imaging work in first-episode psychosis showing that structural striatal changes are dependent on antipsychotic treatment and associated with symptom reductions (Chopra et al., 2020). Another factor that may have influenced the results of this study is the switch from 80:20 to 80:40 contingencies. This was not counterbalanced because we wanted to ensure the participants could complete enough reversals for analysis, and therefore included the simpler contingency first. Whether similar outcomes in those with early psychosis (or controls) would be evident if they were exposed to the 80:40 contingency first is not known and warrants further investigation. Future longitudinal studies in larger cohorts are required to confirm these exploratory findings and to establish which behavioural phenotypes in early psychosis predict subsequent declines in decision-making capacity.

Conclusions
Our observations suggest that decision-making processes in early psychosis differ from those in persistent psychosis, indicating a potential decline in a proportion of those with psychosis over time. Under standard reversal learning outcome probabilities, the phenotype in the early stages of illness is equivalent to that of controls. However, our novel task, which seamlessly alters the outcome probability, reveals a hyper-adaptability in those with early psychosis. Surprisingly, this was associated with alterations in loss learning and response, and with better clinical indicators. Aside from being distinct from phenotypes observed in persistent psychosis, this may be a compensatory mechanism. Nevertheless, there may be a critical period after onset during which interventions could maintain decision-making processes in these individuals, or delay their decline.

Figure 1. Decision-making tasks. For outcome-specific devaluation (A), participants are trained to learn two action-outcome associations. Participants are then explicitly informed that one outcome is now worth fewer credits (outcome devaluation) before completing the choice test to assess goal-directed action (the preference towards the now more valuable action). For serial reversal learning (B), participants are presented with two stimuli (which remain the same throughout the entire task). The serial reversal learning (SRL) 1 stages are probabilistically rewarded at an 80:20 contingency (e.g., an 80% reward probability for the cloud-like stimulus and a 20% reward probability for the spiral-like stimulus). After a selection, the credit reward is displayed on the screen (0 or 1 credit for SRL1). For the SRL2 stages, the probability of receiving a reward on the poorer stimulus is increased to 40%, and participants can receive either 2 or 6 credits for a win (with equal probability). See Suetani et al. (2022) for detailed methods.
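The computational modelling referred to in the text estimates separate learning rates for rewarded and unrewarded outcomes. A minimal simulation of such a dual-learning-rate agent in a task with the 80:20 contingency described above might look as follows; the function name, parameter values, and fixed-block reversal rule are illustrative assumptions, not the fitted model from Suetani et al. (2022).

```python
import math
import random

def simulate_srl(n_trials=200, alpha_reward=0.3, alpha_punish=0.3,
                 beta=5.0, p_high=0.8, p_low=0.2, seed=0):
    """Simulate a two-stimulus probabilistic reversal task with a
    Q-learning agent using separate learning rates for rewarded
    and unrewarded trials (a dual-learning-rate model)."""
    rng = random.Random(seed)
    q = [0.0, 0.0]      # value estimates for the two stimuli
    better = 0          # index of the currently high-probability stimulus
    choices, outcomes = [], []
    for t in range(n_trials):
        # Softmax choice between the two stimuli
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p0 else 1
        # Probabilistic outcome under the current contingency
        p_win = p_high if choice == better else p_low
        reward = 1 if rng.random() < p_win else 0
        # Separate learning rates for wins and losses
        alpha = alpha_reward if reward else alpha_punish
        q[choice] += alpha * (reward - q[choice])
        choices.append(choice)
        outcomes.append(reward)
        # Reverse the contingency every 40 trials (placeholder criterion)
        if (t + 1) % 40 == 0:
            better = 1 - better
    return choices, outcomes
```

Raising `p_low` from 0.2 to 0.4 mirrors the shift from the SRL1 to the SRL2 contingency, and lowering `alpha_punish` makes the agent slower to abandon a stimulus after losses, which is the behavioural pattern the punishment learning parameter captures.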

Figure 2. Goal-directed action in those with early psychosis. Comparison of performance in the outcome devaluation task in healthy age-matched controls and those with early psychosis. Early psychosis subjects showed a significant increase in their preference (A) and responses (B) towards the valued stimulus after devaluation. Both controls and those with early psychosis showed a significant decrease in their valuation of the devalued stimulus compared with the valued stimulus (C). There were no changes in the valuation of an irrelevant stimulus or in the subjects' motivation to earn tokens. Data are displayed as the mean ± standard error. *p < 0.05, **p < 0.01, ***p < 0.001.

Figure 4. Persisting impact of loss on future choices in those with early psychosis. Quantifying the probability of selecting the same stimulus up to 6 trials into the future after a reward (left panels) or loss (right panels) during reversal learning at 80:20 (A) and 80:40 (B). There were no differences between controls and those with early psychosis in the likelihood of selecting the same stimulus after a reward or loss at the 80:20 contingency (A). For the 80:40 contingency (B), those with early psychosis were more likely than controls to select the same stimulus after a loss for the following two trials. This lag in shifting after a loss supports the punishment learning parameter changes in Figure 3 and indicates that loss learning is slower (or less sensitive) in those with early psychosis. Data are displayed as the mean ± standard error. *p < 0.05 between groups, #p < 0.05 between trials in those with early psychosis.
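The lagged stay-probability measure in Figure 4 (the probability of repeating a choice 1 to 6 trials after a win or a loss) can be computed directly from trial-by-trial choice and outcome sequences. A minimal sketch follows, assuming choices and outcomes are equal-length lists coded as 0/1; the function name is ours, not taken from the original analysis code.

```python
def lagged_stay_prob(choices, outcomes, max_lag=6):
    """For each lag 1..max_lag, compute the probability that the
    stimulus chosen on trial t is chosen again on trial t + lag,
    separately for trials that ended in a win (1) or a loss (0)."""
    probs = {1: {}, 0: {}}
    for lag in range(1, max_lag + 1):
        for outcome in (0, 1):
            # Pair each qualifying trial's choice with the choice lag trials later
            pairs = [(c, choices[i + lag])
                     for i, (c, o) in enumerate(zip(choices, outcomes))
                     if o == outcome and i + lag < len(choices)]
            stays = sum(1 for a, b in pairs if a == b)
            probs[outcome][lag] = stays / len(pairs) if pairs else float("nan")
    return probs
```

Group differences such as those in panel B would then be tested by comparing these per-participant stay probabilities (after losses, at each lag) between controls and those with early psychosis.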

Figure 5. Associations between behaviour and other scores. A multivariate general linear model (GLM) focussed on which specific symptoms, clinical symptom severity scores and control variables (i.e., IQ) were associated with lose-shift use and punishment learning in the SRL2 stage, and indicated two significant associations in those with early psychosis. Higher IQ was associated with decreased lose-shift use (A). In contrast, increasing Clinical Global Impression (severity) scores [CGI(s)] were associated with increases in the punishment learning parameter (B). These outcomes indicate that differences in SRL2 lose-shift use and punishment learning were driven by participants with early psychosis who had higher IQ and less severe Clinical Global Impression scores. ηp², partial eta squared. *p < 0.05.

Table 1. Demographics, IQ and substance use characteristics for age-matched control subjects and those with early psychosis.

Table 2. Psychiatric characteristics and symptom assessments.
Notes: AP, antipsychotic; PANSS, Positive and Negative Syndrome Scale. The data are expressed as mean (standard deviation) where applicable.