A history and evaluation of catch-only stock assessment models

Understanding the status of fish stocks is a critical step in ensuring the ecological and economic sustainability of marine ecosystems. However, at least half of global catch and a vast majority of global fisheries lack formal stock assessments, largely due to a lack of sufficient data. Catch data, loosely referring to any catch records be it inclusive of discards or not, are the only type of fishery data available across a wide range of fisheries at a global scale. This has given rise to a long list of so-called “catch-only” models, intended to estimate aspects of stock status based primarily on characteristics of a fishery's catch history. In this paper, we review the history, performance and potential of “catch-only” models to estimate stock biomass status. While individual catch-only models often report good performance, repeated efforts to examine the performance of these models have consistently found them to be imprecise and biased when applied to new data-limited fisheries. We demonstrate that a large reason for this is the simple lack of information on stock status contained in the shape of a catch history alone. Off-the-shelf use of catch-only models can lead to poor and biased estimates of stock status, potentially hindering efforts at effective management.


| INTRODUCTION
Understanding the status of fish stocks is a prerequisite for an effective management system. Given the expense and complexity of surveying the oceans, for many fisheries, commercial catch data are the only source of information available to the public. Catch is often defined as the sum of both fisheries landings and discards, but throughout this paper, we use it as a broad term referring to any catch data available to fisheries scientists and managers, be it inclusive of discards or not. The Food and Agriculture Organization (FAO) of the United Nations maintains a global database of fisheries landings that includes over 20,000 individual catch histories by FAO statistical region, country and taxon. In contrast, the RAM Legacy Stock Assessment Database (RLSADB, www.ramlegacy.org), which includes most of the publicly available stock assessments conducted around the world, contains information on just over 1,300 stocks.
There is a large gap then between what is being caught and what is assessed in a formal way, but there is an alluring intuitive connection between the history of catches and the state of a fishery. A fish stock can sustain much lower catches when it is severely depleted (e.g. when biomass B is much less than the biomass that could produce maximum sustainable yield, BMSY) than when it is more abundant (closer to B/BMSY of one). The expectation that trends in catches should provide information about stock status, combined with a lack of survey data typically needed for formal stock assessments, has given rise to numerous efforts to assess the status of a stock based largely or exclusively on catch histories, so-called "catch-only models." However, despite this long history of development and use, catch-only models have yet to provide a robust means of estimating either regional or stock-specific status, usually defined as biomass or fishing mortality rates relative to a reference point (Free et al., 2020; Ovando et al., 2021). In this paper, we explore the history of catch-only stock assessment methods, summarize evidence as to their performance and demonstrate why catch-only models have limited potential as a means of stock assessment.

| A HISTORY OF CATCH-ONLY MODELS
To sustainably manage fisheries to produce food, employment and profits, we need to understand the status of fish stocks, especially so we can identify when they have been depleted to a level where the potential yield is reduced, requiring management actions to rebuild them.
We also want to identify which stocks have been so reduced in abundance that they are of conservation concern and should be included in national or international endangered species listings such as CITES (Convention on International Trade in Endangered Species of Wild Fauna and Flora) or the IUCN (International Union for the Conservation of Nature) Red List. In the 20th century, many fisheries management agencies began tracking the changing abundance of many stocks through various methods including scientific surveys, tagging programmes and commercial trends in catch per unit effort (CPUE). These abundance trend data are now available for over 1,300 fish stocks constituting roughly 50% of global marine fish landings in the RAM Legacy Stock Assessment Database (Hilborn et al., 2020; Ricard et al., 2012).
Unfortunately, we do not have abundance trend data for thousands of other fish stocks that are vital to many people's food security and incomes. Instead, catch histories are the most commonly available data type for fisheries across the globe. Data on catch have been recorded by various agencies since the 19th century, and commercial records have been used to reconstruct the historical catch for some fisheries for up to five centuries (Cushing, 1982). For example, the near extinction of the great whale species has been reconstructed from logbooks of whaling vessels that document the rise and fall of catch, and especially the decline of catch per unit effort, over centuries of whaling (Bockstoce & Botkin, 1983). Similarly, Cushing (1982) used processor and sales records between Norway and Sweden to reconstruct the catch of herring and to document changes in both the abundance and distribution of the stock. Caddy and Gulland (1983) used catch histories to identify four types of fishery patterns: (i) steady state, (ii) cyclic fisheries, (iii) irregular stocks and (iv) spasmodic stocks, all of which were based on examining trends in landings. They recognized that differences in patterns of catch have two fundamental causes, changes in the environment or changes in fishing pressure, and that either could cause a stock to behave in a way other than "steady state."
By the mid-1990s, the Food and Agriculture Organization of the United Nations (FAO) had compiled data on global landings going back to 1950. These landings data are summarized primarily by major FAO region, country of landing and taxonomic group. The taxonomic group is often an individual species but is also frequently a larger taxonomic aggregate, with "marine fish not elsewhere included (nei)" representing the coarsest aggregate. We will use the term "stock" to refer to these categories (region, country and taxonomy), recognizing that, in many if not most cases, they are coarse aggregates of many biologically independent populations.
In 1996, Grainger and Garcia (1996) used the FAO landings data to estimate when fisheries catch for individual stocks peaked, which they interpreted as an indicator of full exploitation. They state: "Through a preliminary analysis of trends, globally and by oceans, we attempt to demonstrate that these extended time series can be very useful in interpreting developments in the world's fisheries and so help in assessing the present situation as well as for planning and policy-making for the future." Part of their goal was to provide estimates of global fisheries potential, but, more relevant to our purpose here, they used the ratio between the maximum historical catch and the most recent catch to indicate whether there was further potential for expansion of catch, or whether the stock was fully or even overexploited, and to provide an estimate of overall stock status for each FAO region (Table 6 in Grainger & Garcia, 1996).
However, Grainger and Garcia were circumspect regarding what the catch pattern shows. They wrote: "The difference between peak and current landings must be interpreted with caution. Peaks in a smoothed production probably give an indication of the average long-term yield that the species assemblage in a given area may be able to produce sustainably in the future, with proper management. However, in the case of demersal stocks sensitive to regime shifts on a decadal scale, peak harvests resulting from transient favourable environmental situations bear little relation to the average long-term yield." These concerns are much better understood today: catches are now known to decline for many reasons other than overfishing, perhaps the most common being shifts in productivity, which have been estimated to impact over 70% of fish stocks (Vert-pre et al., 2013). For example, changes in recruitment, and subsequently eventual fishable biomass, are often much better explained by factors other than spawning biomass (Cury et al., 2014; Szuwalski et al., 2015). This is closely followed by catches declining due to the implementation of fisheries regulations explicitly designed to reduce catch, which has now been observed in many coastal and high seas fisheries (Hilborn et al., 2020; Pons et al., 2018).
Following methods similar to those of Grainger and Garcia (1996), Froese and Kesner-Reyes (2002) attempted to assess the status of global fisheries using the ratio between the current catch and the maximum catch to determine stock status. Stocks were considered collapsed if the catch had peaked and recent catches were less than 10% of the peak, overfished if recent catches were between 10% and 50% of the peak, and fully exploited if recent catches were greater than 50% of the peak.
This broad method has been used extensively by the Sea Around Us Project to judge the status of global and regional fisheries (Pauly, 2007) (Figure 1).
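The catch-ratio heuristic described above is simple enough to sketch in a few lines. The following is a minimal illustration only, not the code used by any of the cited studies. It assumes that "recent catch" means the final year of the series, and the function name and the "developing" label for series that have not yet peaked are hypothetical choices made here.

```python
import numpy as np

def classify_catch_ratio(catches):
    """Classify status from the ratio of the most recent catch to the peak catch."""
    catches = np.asarray(catches, dtype=float)
    peak_year = int(np.argmax(catches))
    peak = catches[peak_year]
    if peak <= 0:
        raise ValueError("catch history must contain at least one positive catch")
    # Hypothetical label: the maximum occurs in the final year, so catch has not yet peaked.
    if peak_year == len(catches) - 1:
        return "developing"
    ratio = catches[-1] / peak  # assumption: "recent" means the final year
    if ratio < 0.10:
        return "collapsed"
    if ratio <= 0.50:
        return "overfished"
    return "fully exploited"
```

Note that the classification depends only on two numbers, the peak and the final catch, which is why a decline in catch caused by regulation or a productivity shift is labelled in the same way as a decline caused by overfishing.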
In 2006, Worm et al. (2006) used these catch-ratio heuristics to identify stocks classified on this basis as "collapsed," calculated the proportion of stocks collapsed for each year and projected a trend forward. Their projection of the proportion of fisheries categorized as collapsed reached 100% by 2048 (Worm et al., 2006), a claim that was intensively criticized by examination of the estimates of individual stocks where abundance was known (Hilborn, 2007; Murawski et al., 2007). As a result of the critique, a joint group of the original authors and critics evaluated trends in changes in abundance, rather than catch, and found only a small decline in the overall abundance of stocks with abundance estimates from data-rich stock assessment models (Worm et al., 2009). This group developed a global database of stock abundance called the RAM Legacy Stock Assessment Database (Ricard et al., 2012), which now contains estimates of the trends in abundance of stocks representing about 50% of global catch. The most recent global assessment of these data suggests that assessed fish stocks are, on average, increasing in abundance and above management targets (Hilborn et al., 2020).
Nevertheless, the remaining 50% of global fish landings reported to FAO come from stocks that are not formally assessed, and these stocks tend to come from countries where fisheries are particularly important to food security and employment, largely in the tropical world. Thus, there has been an ongoing interest in using the reported landings data to infer the status of the unassessed stocks.

| A TAXONOMY OF MODERN CATCH-ONLY MODELS
Early catch-only models were based on simple heuristics comparing current landings to historic maxima. The importance of understanding the status of unassessed fisheries has led to a rapid proliferation of the number and types of catch-only assessment models built on Garcia and Grainger's (2005) initial efforts. In general, modern catch-only models utilize either an empirical, mechanistic or ensemble approach to estimate stock status from catch histories (Free et al., 2020).
Empirical methods use statistical models trained on assessed stocks to derive associations between stock status and catch time series, often with auxiliary information about the biology of the exploited species or characteristics of the exploiting fishery. In contrast, mechanistic methods postulate an underlying population dynamics model, sometimes coupled with an underlying effort dynamics model, to explain changes in catch through a combination of changes in both stock abundance and fishing effort. Finally, ensemble methods use statistical models to combine and leverage the strengths of individual catch-only models, including predictions from both empirical and mechanistic methods. We provide a brief overview of the methods included within each approach and their relative strengths and weaknesses.

| Empirical approaches
[...] et al., 2012; Thorson et al., 2012); (ii) a boosted regression tree model that uses properties of the catch time series (Zhou et al., 2017); and (iii) a boosted regression tree model that uses characteristics of the fishery including attributes of the catch history (Free et al., 2017).
In general, the developers of these methods validated performance through cross-validation conducted during model fitting and/or through application to a withheld test dataset. Despite these efforts, challenges remain in truly validating the performance of empirical catch-only models. First, empirical methods are difficult to simulation test (the gold standard in model validation) due to their reliance on real-world relationships that cannot be reliably simulated. For example, surprising environmental responses, shifting management regulations, changing costs and prices and market disruptions from natural disasters, civil wars or global pandemics all impact abundance (Branch et al., 2011) but are challenging to simulate. Second, empirical methods are trained on data-rich stocks, whose dynamics may systematically differ from those of the data-poor stocks that these methods are intended to assess, making their performance on data-poor stocks challenging to evaluate (Free et al., 2017, 2020).

| Mechanistic approaches
Existing mechanistic catch-only models use various implementations of stochastic stock reduction analysis (SRA) (Kimura et al., 1984; Kimura & Tagart, 1982; Walters et al., 2006) to describe the distribution of population trajectories and characteristics that could have resulted in an observed catch time series. Generally speaking, SRAs reconstruct historical abundance and exploitation rates by simulating biomass trajectories that could produce an observed catch time series given priors on the levels of depletion in the initial and final years of the catch time series, and often on population dynamics parameters such as carrying capacity, K, or intrinsic growth rate, r, in simple biomass dynamics models (Pella & Tomlinson, 1969; Schaefer, 1954).
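For readers unfamiliar with these biomass dynamics models, one common discrete-time parameterization is given here as a reference sketch. It is not necessarily the exact form used inside any particular catch-only model; the annual time step and the shape parameter m are assumptions introduced for illustration.

```latex
\begin{align*}
B_{t+1} &= B_t + r B_t \left(1 - \frac{B_t}{K}\right) - C_t
  && \text{(Schaefer)} \\
B_{t+1} &= B_t + \frac{r}{m-1}\, B_t \left(1 - \left(\frac{B_t}{K}\right)^{m-1}\right) - C_t
  && \text{(Pella-Tomlinson)}
\end{align*}
```

Setting m = 2 in the Pella-Tomlinson form recovers the Schaefer model, for which surplus production is maximized at B = K/2 with MSY = rK/4. An SRA searches for combinations of r, K and initial biomass under which these recursions can absorb the observed catches C_t without biomass falling to zero.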
Existing mechanistic catch-only assessment models can be divided into those that estimate only a population dynamics model (Martell & Froese, 2013; Froese et al., 2017; Zhou et al., 2018; Ovando et al., 2021) and those that estimate a population dynamics model coupled with an effort dynamics model (Thorson et al., 2013; Vasconcellos & Cochrane, 2005). They also differ in the specific algorithms and assumptions used to perform the SRA.
Mechanistic approaches are attractive because (i) they can more easily be validated (or refuted) through simulation testing; (ii) they can estimate time series of biomass and fishing mortality and MSY-based reference points (unlike empirical approaches, which generally estimate only one outcome at a time, such as B/BMSY); and (iii) they can estimate demographic parameters (i.e. r and K) that can be used to drive operating models for projecting fisheries outcomes under alternative management scenarios (e.g. Costello et al. (2016) and related works). However, the utility of these methods can be limited: they are highly sensitive to the choice of priors for key parameters (Bouch et al., 2021), the default priors of these methods can lead to inaccurate and biased estimates of stock status (Free et al., 2020; Ovando et al., 2021), and, as we will show here, the catch time series does little to update status predictions beyond the priors. Despite these serious challenges, SRA methods, especially the CMSY family of methods (a shorthand for "Catch" and "Maximum Sustainable Yield," Froese et al., 2017; Martell & Froese, 2013), have been used to provide tactical advice for large commercial fisheries (e.g. Barman et al. (2020)) and strategic advice for regional (Froese et al., 2018; Smith et al., 2021) and global (Palomares et al., 2020) fisheries management.

| Ensemble approaches
Ensemble models, which attempt to combine the strengths of in- [...] Furthermore, existing ensemble models only predict biomass stock status in the final year of the catch time series (i.e. they do not predict time series, reference points or effort status) and cannot be used to drive population dynamics in fisheries operating models.

| EVALUATIONS OF CATCH-ONLY MODEL PERFORMANCE
The use of catch-only models in data-poor stock assessments and fisheries management depends on quantitative evidence demonstrating their ability to infer and predict aspects of stock status, and their subsequent utility to management. By inference, we refer to the ability of catch-only models to estimate often unobservable parameters of a stock, such as MSY, based on data, in the manner of any statistical stock assessment model. By predictive ability, we refer to the ability of a model to make accurate predictions of observations outside of those on which it was trained (e.g. in the manner of Costello et al., 2012).
A thorough evaluation of catch-only model performance must therefore demonstrate that a method can provide accurate estimates and predictions of stock status and useful management advice under a range of potential conditions (e.g. species type, fisheries management context, and length and uncertainty of the catch time series). These evaluations can be achieved through testing on assessed fisheries with data-rich estimates of stock status or testing on simulated fisheries with known stock status.
Simulation testing can offer true "known" values to compare model predictions against, greater sample sizes, the ability to compare performance across more conditions and an opportunity to evaluate the consequences of using advice from catch-only models in management decisions. However, the validity of simulation testing also depends on the degree to which the dynamics of the simulation model match the "true" dynamics of the systems in which catch-only models are intended to be applied. Empirical validation can allow comparison to a wider range of potentially more realistic models, but has the shortcoming of comparison against model estimates, not true known values.
Rigorous performance evaluations should also reflect the conditions under which a method is likely to be applied. For example, if the performance of a catch-only model depends on the priors used to parameterize the method, then testing should reflect an expert's ability to accurately set priors for a new fishery under the conditions they are likely to face in the real world. It has been rare for either internal evaluations (testing conducted by method developers) or external evaluations (testing conducted by peers) to include all of these idealized components (i.e. simulation testing, empirical testing, robust cross-validation across a range of applicable scenarios and management strategy evaluation). Thus, we synthesize the performance of catch-only models across a combination of several studies, acknowledging that a full review of the testing methods used to evaluate every catch-only model is beyond the scope of this paper.
We begin with an examination of the methods used to internally validate CMSY (Froese et al., 2017), arguably the most widely used model for assessing the status of fisheries using only catch data.
When testing the performance of CMSY on data-rich stocks, the developers manually updated the default priors for 54 (34%) of the test stocks to ensure that they contained the true parameter values.
They argued that this represents a scenario in which expert opinion has "not made gross errors in setting broad prior biomass ranges." However, this likely represents an optimistic view of the ability of experts to set priors in a new fishery and does not reflect the common use of these models.For example, default priors, not expert opinion, were used to assess 1,320 global stocks in Palomares et al. (2020), with the authors justifying the use of CMSY based on its performance under the idealized conditions it was tested on.The developers also assessed the performance of CMSY on a set of 48 simulated stocks.
However, the operating model used to generate the simulated test dataset was identical in structure to the underlying SRA model. A more rigorous testing approach would test the simplified population dynamics algorithms underlying CMSY and other catch-only models against more realistic age-structured models, which now have widespread software support (e.g. FLR; Kell et al., 2007).
As another example, Zhou et al. (2018) tested OCOM, another SRA approach, on assessed stocks that contributed to the parameterization of their prior on final year depletion. Best practices in predictive modelling would have the model tested on a set of data fundamentally isolated from its development (Kuhn & Johnson, 2013). The testing regime in Zhou et al. (2018) also employed a common "leave-one-out" strategy, in which the model is fit to all but one stock, and then used to predict the status of the omitted stock, with this process repeated across each of the stocks. [...] tested the performance of both Catch-MSY (Martell & Froese, 2013) and CMSY (Froese et al., 2017) on a new set of 2,700 simulated stocks and still found both to be imprecise and biased, especially when the stocks were lightly exploited. Bouch et al. (2021) applied CMSY (Froese et al., 2017) to 17 ICES stocks and found it to overestimate relative fishing mortality and underestimate relative stock status, especially for stocks showing signs of recent recovery. Sharma et al. (2021) applied CMSY and sraplus (Ovando et al., 2021) to 48 ICES stocks and found both to generate biased predictions of stock status when using default priors. Ovando et al. (2021) similarly found that catch-only models frequently produced imprecise and biased estimates of stock status, misclassifying the overall state of a fishery 57% of the time.
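The leave-one-out strategy mentioned above can be sketched generically as follows. This is a hedged illustration and not the procedure of Zhou et al. (2018): the predictive model here is an ordinary least-squares line chosen purely for brevity, the function name is hypothetical, and the data are synthetic placeholders rather than real stocks.

```python
import numpy as np

def leave_one_out_errors(features, status):
    """Refit a linear model n times, each time predicting the one omitted stock."""
    features = np.asarray(features, dtype=float)
    status = np.asarray(status, dtype=float)
    n = len(status)
    errors = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # every stock except stock i
        design = np.column_stack([np.ones(train.sum()), features[train]])
        coef, *_ = np.linalg.lstsq(design, status[train], rcond=None)
        prediction = coef[0] + coef[1] * features[i]
        errors[i] = prediction - status[i]
    return errors

# Synthetic placeholder data: a catch-based feature and a "known" status per stock.
rng = np.random.default_rng(1)
catch_feature = rng.uniform(0.05, 1.0, size=40)
known_status = 0.3 + 0.8 * catch_feature + rng.normal(0.0, 0.2, size=40)
loo_errors = leave_one_out_errors(catch_feature, known_status)
```

Leave-one-out errors of this kind only measure out-of-sample skill if the omitted stock played no part in model development. When the same stocks also informed the priors, as in the case described above, the held-out stock is not fully isolated and the resulting error estimates are likely to be optimistic.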
Developing a robust set of tests for a predictive model, particularly for a problem as complex as predicting the stock status of a fishery, is a difficult task. It is understandable that different groups will develop different testing regimes based on their anticipated use of the model, and the suite of models and data at their disposal.
However, the potentially optimistic testing regimes used in the initial justifications for many existing catch-only models may explain the sharp drop in performance exhibited by these models when confronting external evaluations.
Estimating stock status is only the first step in the fisheries management process, and fully judging the utility of catch-only models in data-poor fisheries management requires tracking the impact of their status determinations on harvest control rules, biomass status and fisheries outcomes such as catch, profits and employment. A well-designed harvest control rule can help overcome known biases in an assessment method. Such testing can be achieved through management strategy evaluation (MSE), which simulates the resource, fishing fleet and management decisions in a closed-loop system (Punt et al., 2016) and has been underutilized in either internal or external evaluations of contemporary catch-only models (but see Harford & Carruthers, 2017).

| WHAT CAN CATCHES TELL US?
While the extensions to the basic catch heuristics first employed by Froese, Garcia and Grainger (Froese & Kesner-Reyes, 2002; Garcia & Grainger, 2005) described above may, under the right circumstances, provide performance gains, the general consensus of numerous external evaluations is that catch-only models provide imprecise and biased estimates of stock status when applied to new fisheries. In this section, we consider why this might be, by breaking down the ability of catch-only models to infer information on stock status and the ability of catch-only models to predict stock status, two related but critically different tasks.
Drawing from Donoho (2017), this distinction relates to the "two cultures" outlined by Breiman (2001), prediction and inference.
Inference is concerned with estimating the parameters of a hypothesized data-generating process. This is the realm where most fisheries, and indeed statistical, models operate. A classic example of an inferential task would be to estimate the growth rate and carrying capacity, and by extension stock status, of a fishery based on an assumption of logistic growth and given an index of abundance and a time series of catches. Inferential models are often judged by the strength of the evidence in the data for a particular model structure, for example the posterior distribution of a parameter of interest (Tredennick et al., 2021). Critically, inference allows us to estimate the value of unobserved parameters (e.g. stock status relative to reference points) based on assumptions of a data-generating process linked to observed data (e.g. time series of catch and an abundance index).
Purely predictive models can be agnostic as to the underlying data-generating model. They are concerned only with the accuracy of predictions made by the model and not with the strength of evidence for particular parameters of the model. While more interpretable models such as linear regressions can be used for prediction, increasingly "black-box" models such as random forests, boosted regression trees and other machine learning methods are employed for predictive modelling. In these cases, the model seeks to leverage correlations between an outcome of interest and candidate variables to make accurate predictions. Predictive models of this type cannot estimate the value of unobserved parameters on their own; they require training on a dataset with "known" values. In the fisheries context, a predictive model might seek to predict stock status as a function of a catch history, based on relationships observed between catch histories and stock status values for a subset of fisheries with "known" stock status.

| Inference from Catch Histories
The broad intuition behind catch-only models is relatively simple: collapsed fisheries produce zero or little catch, and fisheries collapse after overexploitation. Therefore, a sharp rise in catches followed by a sudden and sustained crash in catches likely tells us that the stock was overexploited and collapsed. While this intuition seems sound [...]

$$C_t = q_t E_t B_t \tag{1}$$

where t is time, C is catch, q is the catchability coefficient for fishing effort E, and B is fishable biomass. If all we observe is C, we cannot clearly estimate B using conventional statistical methods without making numerous other assumptions: that is, we have one equation and three unknowns (though see discussions of Takens' theorem, as in Thorson et al. (2013), for considerations of ways in which information on biomass may be embedded in catches). Given only a catch history, to equate changes in catch with changes in biomass, we must assume that both catchability q and effort E are constant over the same time period. While this may occur in some fisheries, it is an unlikely scenario. This is why formal fisheries assessments are instead based on catch-per-unit-effort data and focus largely on separating catchability from biomass (Hilborn & Walters, 1992).
It is clear then that, in the absence of other data, through most approaches, we cannot infer anything about trends in biomass from catch alone without invoking strong assumptions about parameters such as catchability and effort over time. What though about the shape of the catch history? Surely, conditional on a model, certain catch histories imply different fishery states? This is the core idea on which most catch-only models are based. To test this idea, we ran a series of stock reduction analysis (SRA) models to evaluate the impacts of the shape of the catch history on the stock status estimated by each algorithm.
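The identifiability problem in Equation 1 can be made concrete with a small numerical sketch. All values below (the catchability, the starting biomass and effort, and the 10% annual rate of change) are hypothetical and chosen only for illustration.

```python
import numpy as np

years = np.arange(20)
q = 0.001  # hypothetical constant catchability

# Scenario A: biomass declines by about 10% per year while effort is constant.
biomass_a = 10000.0 * np.exp(-0.1 * years)
effort_a = np.full(years.shape, 100.0)

# Scenario B: biomass is constant while effort declines by about 10% per year.
biomass_b = np.full(years.shape, 10000.0)
effort_b = 100.0 * np.exp(-0.1 * years)

# Equation 1 applied to each scenario.
catch_a = q * effort_a * biomass_a
catch_b = q * effort_b * biomass_b

# The two catch histories are numerically indistinguishable.
same_catch = np.allclose(catch_a, catch_b)
```

In scenario A the stock ends the series at roughly 15% of its starting biomass, while in scenario B it is unchanged, yet both produce the same declining catch history. Without information on effort or catchability, the catch series alone cannot distinguish between them.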
The stock reduction algorithm used is sraplus, described in Ovando et al. (2021). The model is supplied with a prior on the growth rate r (log-normal with mean = 0.35 and CV = 20%) and the carrying capacity K (log-normal mean = 5 times max catch, CV = 100%, capturing a range similar to the version of CMSY used in Anderson et al. (2017)) for a Pella-Tomlinson (Pella & Tomlinson, 1969) surplus production model, as well as priors on B relative to K in the initial (log-normal mean = 1, CV = 25%) and final years of the catch history (log-normal mean = 0.2, CV = 25%). We chose these relatively informative priors to reflect ranges used by default for analogous parameters in many common catch-only models (Table S1). We then generated a series of catch histories that have the same average lifetime catch but vary starkly in their dynamics: a constant increase, a constant decrease, a peak then a decline, a decline then a recovery, a random walk and random values (Figure 2a). We then used sraplus to perform a standard stochastic stock reduction analysis on these artificial catch histories, using the same priors on growth rate, carrying capacity, and initial and final B/K for each simulated catch history. In each iteration of the stock reduction analysis, the algorithm randomly selects a value for the growth rate, carrying capacity and initial and final B/K from the supplied prior distributions. The Pella-Tomlinson model is then run with the selected prior values on growth rate and carrying capacity, initialized at the sampled initial B/K, using the artificial catch history in question. Any combination of prior draws that results in biomass values less than or equal to zero is immediately rejected. The remaining "viable" draws from the stock reduction analysis are then sampled in proportion to the likelihood of the projected final B/K relative to the prior distribution of the final B/K. These general methods match the steps laid out in Walters et al. (2006), barring the difference in the operating model used, and reflect the general steps in CMSY (omitting a prior on intermediate B/K and strict bounds on B/K).
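The steps described above can be sketched in a few dozen lines. This is a simplified illustration written for this text, not the sraplus implementation: it omits process error, uses the prior on final B/K only as a resampling weight, assumes the quoted "mean" of each log-normal prior is its arithmetic mean, and fixes the Pella-Tomlinson shape parameter at m = 2 by default. The function names are hypothetical.

```python
import numpy as np

def lognormal_params(mean, cv):
    # Assumption: "mean" is the arithmetic mean; convert to log-scale mu and sigma.
    sigma = np.sqrt(np.log(1.0 + cv ** 2))
    return np.log(mean) - 0.5 * sigma ** 2, sigma

def lognormal_pdf(x, mu, sigma):
    z = (np.log(x) - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (x * sigma * np.sqrt(2.0 * np.pi))

def stochastic_sra(catches, n_draws=20000, n_keep=2000, m=2.0, seed=0):
    """Return resampled final-year B/K values from a simple stochastic SRA."""
    rng = np.random.default_rng(seed)
    catches = np.asarray(catches, dtype=float)

    # Step 1: draw r, K and initial B/K from the priors quoted in the text.
    r = rng.lognormal(*lognormal_params(0.35, 0.20), size=n_draws)
    k = rng.lognormal(*lognormal_params(5.0 * catches.max(), 1.00), size=n_draws)
    init_bk = rng.lognormal(*lognormal_params(1.0, 0.25), size=n_draws)

    # Step 2: project the surplus production model through the catch history,
    # rejecting any draw whose biomass falls to zero or below in any year.
    b = init_bk * k
    viable = np.ones(n_draws, dtype=bool)
    for c in catches:
        surplus = r / (m - 1.0) * b * (1.0 - (b / k) ** (m - 1.0))
        b = b + surplus - c
        viable &= b > 0
        b = np.where(viable, b, k)  # park rejected draws at K so arithmetic stays finite
    final_bk = b[viable] / k[viable]
    if final_bk.size == 0:
        raise ValueError("no viable draws: priors are incompatible with the catches")

    # Step 3: resample viable draws in proportion to the prior density of final B/K.
    weights = lognormal_pdf(final_bk, *lognormal_params(0.2, 0.25))
    weights /= weights.sum()
    keep = rng.choice(final_bk.size, size=n_keep, replace=True, p=weights)
    return final_bk[keep]

# Two catch histories with the same total catch but opposite shapes.
rising = np.linspace(100.0, 1000.0, 30)
falling = rising[::-1]
post_rising = stochastic_sra(rising)
post_falling = stochastic_sra(falling)
```

Comparing summaries of `post_rising` and `post_falling` against draws from the final B/K prior gives a rough, do-it-yourself version of the prior-versus-posterior diagnostic discussed in this section.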
We then examined the distribution of B/K predicted by the catch-only model relative to the explicit priors supplied on stock status. Note that the supplied B/K prior in sraplus is log-normally distributed, so the model has support for a continuous range of B/K values greater than zero. This means that, in theory, the post-model-pre-data B/K values generated by sraplus are allowed to vary greatly from their inputs. Any stock reduction analysis done in this manner will produce a positive bias in final B/K estimates, as there are always more ways for a stock to be less depleted than more depleted.
Consider a prior on final B/K with nearly all its probability density concentrated near zero. A stock reduction analysis model will assign a low probability to an individual draw from the stock reduction algorithm that produced a final B/K near one, at the tail end of the prior distribution. However, suppose that our prior on carrying capacity was much greater than the average catch volume. Nearly all draws from the stock reduction algorithm will then have a final B/K near one, since catches will be much lower than carrying capacity. The final distribution of B/K produced by sraplus will then be pulled close [...] By default, sraplus employs a prior-predictive tuning procedure to correct this behaviour (see Ovando et al. (2021)). However, we have turned that feature off for this analysis to isolate the impact of the catch history shape itself on estimates of stock status from catch-only models. A common Bayesian model diagnostic is to compare the prior and posterior distributions, to see how much any priors in the model were updated by the data the model was confronted with.

FIGURE 2 Simulated case studies demonstrating informational content of catch histories alone. Key parameters are growth rate and carrying capacity (K). The prior for the growth rate is log-normally distributed with a mean obtained from FishLife and a coefficient of variation (CV) of 20%. The prior for K is log-normally distributed with a mean equal to five times the maximum catch and CV of 100%. The prior on stock status (B/K) in the final year is log-normally distributed with mean of 0.2 and CV of 25%, mean of 1 and CV of 20%.
The premise behind catch-only models is that the catch history on its own can tell us something new about stock status that was not encoded in the priors themselves. However, comparing the prior and posterior of the most recent B/K estimate for these simulated catch histories, we see that the prior and posterior are essentially identical despite the priors being confronted with starkly different catch history shapes. Conditional on our life history and stock status priors, and the model choice, in most cases, the shape of the catch history alone does not contain meaningful information about stock status in the final year of the fishery that was not already encoded in our priors, although there may be edge cases where this will not be true.
For example, if we know a given fish species exists, but catches for that species have been zero in every time step, then, barring massive process error, any stock reduction analysis will not be able to find many parameter values that result in final B/K values much less than one. But for more typical catch history forms, like those used in our example, conditional on the priors, the shape of the catch history itself does not meaningfully update our prior beliefs about final stock status (Figure 2b).
This result may appear somewhat confusing to readers who have previously encountered catch-only models that produced posterior distributions of final B/K values that appeared very different from their supplied priors. In our above examples, we use sufficiently diffuse priors on key life history parameters (growth rate and carrying capacity) such that they do not constrain the model. However, this is not always the case in catch-only models. Consider a fishery with an average annual lifetime catch of 10,000 MT (metric tons). Suppose then that we supply a prior on carrying capacity K that is uniform [0, 11,000], and a prior on final B/K that is uniform [0, 1]. Clearly, given average annual catches of 10,000 MT and a prior on K that at most barely exceeds that level, the only possible solution for a standard model is that the stock is at very low levels. Therefore, despite having a very diffuse explicit prior on final B/K, the posterior will show a very tight distribution at a very low stock status. See Supporting Information for additional analyses demonstrating these phenomena across a range of more diffuse and more informative priors for both high and low stock status. These scenarios collectively demonstrate that stock status in the most recent time step is positively correlated with the diffuseness of the priors in stock reduction algorithms of this form.
The problem here is that when we provide a model with priors on, in the case of a simple Schaefer model (Schaefer, 1954), growth rate r, carrying capacity K and initial depletion, then conditional on the model structure and the catch history, we have provided an implicit prior on stock status. When we then place a second explicit prior on final stock status, we create a problem known as Borel's Paradox (Poole & Raftery, 2000). Borel's Paradox suggests that when we supply an implicit and explicit prior on final stock status, the posterior will reflect the joint combinations of all these layered priors, creating the appearance of learning. This means that certainly, if we provide sufficiently informative and accurate priors, a catch-only model may provide good results. But the performance is entirely dependent on the priors, the validity of the priors and the chosen model, not on information gleaned from the shape of the catch history alone. This means that the choice of priors on life history has a substantial influence on the outcomes of catch-only models.
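The implicit prior can be written out explicitly. The block below is a minimal sketch that assumes the standard discrete-time form of the Schaefer model; the symbols d_0 (initial depletion), T (final year), g and w are our notation for illustration and are not taken from any particular catch-only implementation.

```latex
\begin{aligned}
B_{t+1} &= B_t + r B_t \left(1 - \frac{B_t}{K}\right) - C_t, \qquad B_1 = d_0 K, \\
\frac{B_T}{K} &= g\left(\theta \mid C_{1:T-1}\right), \qquad \theta = (r, K, d_0), \\
p_{\text{implicit}}(b) &= \int \delta\left(b - g\left(\theta \mid C_{1:T-1}\right)\right) p(\theta)\, d\theta, \\
w(\theta) &\propto p(\theta)\; p_{\text{explicit}}\left(g\left(\theta \mid C_{1:T-1}\right)\right).
\end{aligned}
```

Because final B/K is a deterministic function g of the life history parameters and the catch series, the priors on r, K and initial depletion already induce a distribution on final B/K. Under these assumptions, an explicit prior on final B/K only reweights the draws of the parameters, so the resulting distribution is a product of the two layers of priors with no separate likelihood term for the catch history.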
In summary then, catch histories alone do not allow us to infer stock status. The "information" about stock status comes almost exclusively from specific assumptions such as relatively constant effort, or priors on r and K and final stock status. Given specific priors, different catch histories can imply different states of a fishery, but this updating is dependent on the supplied r and K priors, not intrinsic information derived from the catch history that is preserved as you modify the r and K priors.
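The dependence on the r and K priors can be illustrated with a small simulation. The following is a minimal sketch of a stock reduction step, not the sraplus implementation: it assumes discrete Schaefer dynamics, fixes initial depletion at one, uses a hypothetical mean growth rate of 0.4 in place of a FishLife value, and uses an invented catch history. It draws r and K from log-normal priors, discards draws in which the stock collapses, and reports the final B/K implied before any explicit prior on stock status is applied.

```python
import math
import random
import statistics


def lognormal_draw(rng, mean, cv):
    """Draw from a log-normal distribution with the given arithmetic mean and CV."""
    sigma2 = math.log(1.0 + cv ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return rng.lognormvariate(mu, math.sqrt(sigma2))


def final_depletion(r, k, catches, initial_depletion=1.0):
    """Project Schaefer dynamics; return final B/K, or None if the stock collapses."""
    b = initial_depletion * k
    for c in catches:
        b = b + r * b * (1.0 - b / k) - c
        if b <= 0.0:
            return None
    return b / k


def implied_final_depletion(catches, k_multiplier, n_draws=5000, seed=1):
    """Final B/K implied by the r and K priors and the catch history alone."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        # Hypothetical mean growth rate; the CVs follow the priors described in the text.
        r = lognormal_draw(rng, mean=0.4, cv=0.2)
        k = lognormal_draw(rng, mean=k_multiplier * max(catches), cv=1.0)
        depletion = final_depletion(r, k, catches)
        if depletion is not None:  # keep only draws that can sustain the catches
            kept.append(depletion)
    return kept


# Hypothetical catch history: a 20-year ramp-up followed by a 20-year plateau.
catches = [10.0 * t for t in range(1, 21)] + [200.0] * 20

narrow = implied_final_depletion(catches, k_multiplier=5)   # K prior mean of 5 x max catch
wide = implied_final_depletion(catches, k_multiplier=50)    # much larger K prior (assumption)
print(round(statistics.median(narrow), 2), round(statistics.median(wide), 2))
```

With the same catch history in both runs, only the K prior changes, so any difference in the implied final B/K comes from the prior and not from the shape of the catches.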

| Prediction from Catch Histories
This simple simulation exercise illustrates that the ability of catch-only models to infer recent stock status based on the shape of a catch history is limited. However, on its own, this does not necessarily mean that, as a measure of stock status, catch-only models would perform badly as a predictive model in the real world. Suppose that every stock in the world followed Schaefer dynamics (Schaefer, 1954) and had the same r, K and initial depletion. If we knew the value of these parameters, or could provide reasonable priors on their distribution, applying that knowledge to the catch history of each stock through a catch-only model would provide the correct stock status for every fishery. The question then is how robust any selected priors are, which, as we show above, is where the actual information about stock status in catch-only models is coming from.
Many catch-only models have built-in models or heuristics for generating priors. For example, CMSY (Froese et al., 2017) has a series of internal heuristics that create priors on initial, intermediate and final B/K based on the characteristics of the catch history. Costello et al. (2012) also produced estimates of stock status based on empirically derived relationships between catch histories and observed stock status. Both of these methods share a common trait of having observed, either informally or statistically, relationships between catch histories and stock status. The question then is to what extent such prior-generating mechanisms produce valid predictions when applied to new fisheries. To that end, we consider here whether catch-only models show evidence of being effective predictors of stock status.
When we compare published predictions from catch-only models for regions of the world where we either have good information on stock status or general recognition of concern about stock status, the catch-only models detect only minor differences between regions that have very different statuses (Table 1). A very high proportion of stocks are assessed in the NE Pacific and NE Atlantic and are well above target reference points, while numerous assessments from the Mediterranean show that stocks there are in poor shape. For the Western and Central Pacific (mostly China and Indonesia) and the Eastern Indian Ocean, we have expert opinion that stock status is poor (Melnychuk et al., 2017). All three catch-only models considered here struggle to reproduce this understanding of either the absolute or relative differences in stock status between these regions (Table 1).
Diving into these discrepancies, we examined the performance of the default priors used by Catch-MSY (Martell & Froese, 2013) and CMSY (Froese et al., 2017), arguably the two most widely used catch-only stock assessment methods, when applied to stocks with data-rich assessments in the RAM Legacy Stock Assessment Database (Ricard et al., 2012). We found that the default priors used by Catch-MSY and CMSY for final B/K, which are set based on the ratio of final to maximum catch (Table S1), correctly captured only 52% and 33% of the data-rich estimates, respectively (Figure 3a).
Furthermore, the default priors were highly biased for stocks with low catch ratios (final/maximum catch < 0.5), which represent over three-quarters of the evaluated RAM stocks. The default priors assume that all of these stocks are below B_MSY when 38% of these stocks are actually above B_MSY (Figure 3a). The default priors for initial B/K, which are set based on the first year in the catch time series, are also poor predictors of initial B/K. These priors correctly captured only 13% of the data-rich estimates (Figure 3b) and were especially biased for catch time series beginning after 1960. The priors assume that stocks with catch time series beginning after 1960 begin over- to fully exploited (B/K = 0.2-0.6; Table S2) when, in fact, 82% of these stocks were lightly exploited in the first year of the catch time series (B/K > 0.6; Figure 3b).
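The notion of a prior "correctly capturing" a data-rich estimate can be made concrete as a coverage calculation. The sketch below is illustrative only: the prior bounds are assumed values chosen to mimic a rule keyed on the final-to-maximum catch ratio (they are not copied from Table S1), and the stock records are invented.

```python
def default_final_prior(catch_ratio):
    """Return (lower, upper) bounds on final B/K given final/maximum catch.

    The numeric bounds are illustrative assumptions, not the values in Table S1.
    """
    if catch_ratio < 0.5:
        return (0.01, 0.4)
    return (0.3, 0.7)


def prior_coverage(stocks):
    """Share of stocks whose data-rich final B/K falls inside the default prior."""
    hits = 0
    for catch_ratio, assessed_b_over_k in stocks:
        lower, upper = default_final_prior(catch_ratio)
        if lower <= assessed_b_over_k <= upper:
            hits += 1
    return hits / len(stocks)


# Hypothetical (final/maximum catch, assessed final B/K) pairs.
stocks = [(0.2, 0.15), (0.3, 0.65), (0.8, 0.5), (0.9, 0.9), (0.4, 0.35)]
coverage = prior_coverage(stocks)
print(coverage)
```

A stock with a low catch ratio but a healthy assessed biomass, such as the second record, falls outside the prior range, which is the kind of miss described above for stocks with low catch ratios.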
The default priors for intrinsic growth rate r, which are based on resilience (Table S3), are more reasonably distributed than the B/K priors but include high values not relevant to most stocks. This is problematic given that the CMSY algorithm favours high growth rates and low carrying capacities in the "tip of the triangle" of viable r-K pairs (Froese et al., 2017). This drives the CMSY algorithm to favour high, and unlikely, growth rates, which could lead to biased predictions of stock productivity and target fishing mortality rates (F_MSY). We surmise that this could lead to high levels of overfishing if default values of CMSY are used to assess stocks and to guide tactical management. Finally, the default priors used by Catch-MSY for carrying capacity are reasonable but are very broad (Figure 3d). There is little empirical support (Figure 3d) for the default priors for carrying capacity used by CMSY, which are based on a complex combination of resilience, initial B/K and maximum catch (Table S4). Furthermore, the "tip of the triangle" assumption employed by CMSY favours low carrying capacities, which contributes to CMSY's pessimistic bias toward low stock biomass status (the majority of this bias likely comes from the pessimistic final B/K priors).
To the extent that one could a priori reliably predict life history parameters r and K for a fishery, a catch-only model would produce reliable predictions to the degree that the population can be well represented by the selected population dynamics model. However, the default prior-generating algorithms for life history parameters utilized in one of the most commonly employed catch-only models, CMSY, have clear biases that will in turn bias estimates of stock status (Figure 3). An alternative strategy then is to identify empirical predictive relationships between catch history attributes and stock status. This is the underlying philosophy of efforts such as Costello et al. (2012), Thorson et al. (2012) and Zhou et al. (2017). We revisit these approaches here to illustrate both the potential and limitations of predictive catch-only models.
We trained a machine-learning model (a boosted regression tree) to predict stock status as a function of catch history. The model treats values of B/B_MSY reported in the RLSADB as "known" and then fits a model using candidate characteristics of the catch history, along with life history parameters, to predict the values reported in the RLSADB. Performance of predictive models is assessed based on the accuracy of predictions made by the model on observations held out from the training process. The testing regime must then be carefully designed to reflect the intended use of the model in question.
In this case, the goal was to be able to predict the status of fisheries with unknown stock status, often in completely new geographic regions, based on observed relationships in fisheries with "known" stock status. We filtered the RLSADB down to 293 stocks with values of B/B_MSY and at least 25 years of catch data. We then included only the values after the first 20 years of the fishery, to prevent the model from spending too much effort trying to estimate stock status in the early years of the fishery (assuming, in this case, that the purpose of the model was to provide estimates of current stock status).
In order to assess model performance when faced with stocks in an entirely new region, we then split these stocks into training and testing datasets, where the training dataset contained all stocks that fit our criteria except those from New Zealand, Australia and South Africa (N = 293; 88% of stocks), and the testing dataset contained all stocks that fit our criteria from New Zealand, Australia and South Africa (N = 40; 12% of stocks). The data were split using these three countries to retain a testing dataset that had a reasonable sample size and was geographically isolated from the training dataset. All model tuning (Figure S2) and fitting were performed exclusively on the training dataset.
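A geographic hold-out of this kind amounts to partitioning stocks by country rather than at random. The sketch below is a minimal illustration; the record layout (a dictionary with "stock" and "country" keys) and the example records are hypothetical and do not reflect the RLSADB schema.

```python
HELD_OUT_COUNTRIES = {"New Zealand", "Australia", "South Africa"}


def split_by_country(stocks, held_out=HELD_OUT_COUNTRIES):
    """Split stock records into training and testing sets by country."""
    training = [s for s in stocks if s["country"] not in held_out]
    testing = [s for s in stocks if s["country"] in held_out]
    return training, testing


# Hypothetical records for illustration only.
stocks = [
    {"stock": "stock_1", "country": "USA"},
    {"stock": "stock_2", "country": "New Zealand"},
    {"stock": "stock_3", "country": "Norway"},
    {"stock": "stock_4", "country": "South Africa"},
]
training, testing = split_by_country(stocks)
print(len(training), len(testing))
```

Splitting on country keeps every stock from the held-out regions out of tuning and fitting, so performance on the testing set approximates applying the model in a region it has never seen.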
The candidate covariates for the model included various attributes of the catch history and life history traits of the species and are similar to the covariates used in other empirical catch-only models (Anderson et al., 2017; Costello et al., 2012; Thorson et al., 2012; Zhou et al., 2017). Life history traits include steepness, asymptotic size and the ratio of natural mortality to growth rate and were drawn from FishLife (Thorson, 2020). Attributes of the catch history included factors such as catch divided by the mean or maximum catch in each time step. We also used an unsupervised spectral clustering algorithm to classify each of the stocks in the RLSADB into one of four catch history shapes, with the exact shape of the catch histories assigned to each cluster determined by the spectral clustering algorithm (Figure S1). This catch history classification was also included as a predictor of stock status in the model. See Table S5 for a complete list of candidate variables. These candidate predictor variables are intended to reflect the kinds of catch history traits and life history variables commonly included when constructing priors for catch-only models. In theory, if a strong predictive relationship between these types of covariates and B/B_MSY exists, a flexible model such as a boosted regression tree should find it.
Boosted regression trees have a number of tuning parameters that must be set outside of the model fitting process itself. We performed a grouped v-fold cross-validation routine on the training dataset to tune these parameters, splitting the training dataset into five analysis and assessment splits, each of which assigns non-overlapping sets of stocks to the analysis and assessment splits. We then fit a grid of candidate tuning parameters to the analysis splits and used them to predict the assessment splits. The set of tuning parameters with the lowest root mean squared error (RMSE) on the assessment splits was selected for use for model fitting.
Catch-only models depend on the assumption that there is a connection between stock status and catch histories. Active management of catches clearly distorts this relationship. To account for this, we ran a secondary test, repeating the above process, but restricting the model to data before 1990. As strict fisheries management was rarer in these years, this provides a test of whether a stronger predictive relationship exists between catch and stock status in years with fewer catch restrictions.
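The grouped cross-validation used for tuning can be sketched as follows. This is a minimal illustration rather than the routine used in the analysis: the function names, the fold assignment by shuffled round-robin and the example RMSE values are all assumptions. The key property is that all rows belonging to one stock land in the same fold, so no stock contributes to both the analysis and assessment sets of a split.

```python
import random


def grouped_folds(stock_ids, v=5, seed=0):
    """Return v (analysis, assessment) index splits with whole stocks kept together."""
    unique_stocks = sorted(set(stock_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_stocks)
    fold_of = {stock: i % v for i, stock in enumerate(unique_stocks)}
    splits = []
    for fold in range(v):
        assessment = [i for i, s in enumerate(stock_ids) if fold_of[s] == fold]
        analysis = [i for i, s in enumerate(stock_ids) if fold_of[s] != fold]
        splits.append((analysis, assessment))
    return splits


def select_tuning(fold_rmse_by_setting):
    """Pick the tuning setting with the lowest mean RMSE across assessment splits."""
    return min(
        fold_rmse_by_setting,
        key=lambda setting: sum(fold_rmse_by_setting[setting]) / len(fold_rmse_by_setting[setting]),
    )


# Hypothetical stock-year rows: repeated ids are multiple years of the same stock.
stock_ids = ["a", "a", "b", "b", "c", "d", "e", "f", "f", "g"]
splits = grouped_folds(stock_ids, v=5)

# Hypothetical per-fold RMSE values for two candidate tuning settings.
best = select_tuning({"shallow": [0.8, 0.7], "deep": [0.6, 0.75]})
print(best)
```

Grouping by stock matters here because successive years of one stock are strongly correlated; an ungrouped split would let the model see the held-out stock during fitting and overstate predictive skill.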
The R² of the model on the training dataset was 0.7, with an RMSE of 0.5. However, unlike random forests, boosted regression trees do not automatically produce out-of-bag predictions for the training data. Therefore, this relatively good fit is not a true measure of the predictive power of the model as the stocks being predicted were also included in some of the model fits. The leave-one-out analysis provides an honest assessment of the predictive power of the model on the stocks in the training dataset, producing an R² of 0.38 and an RMSE of 0.72. Fits for the testing dataset were slightly poorer (R² of 0.28 and an RMSE of 0.7), but not by much, suggesting that our tuning process did a reasonable job of selecting parameter values that prevented overfitting (Figure 4). Results for fits to the pre-1990 data were similar (Figure S3).
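The gap between in-sample and leave-one-out performance comes from how the predictions are generated. The sketch below shows a leave-one-stock-out loop with RMSE and R² computed on the held-out predictions. It is illustrative only: the learner is a placeholder that predicts the training mean (standing in for the boosted regression tree), the data are invented, and R² is computed as one minus the ratio of residual to total sum of squares, which is an assumption because the definition used in the analysis is not stated.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SSE/SST (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def leave_one_stock_out(rows, fit, predict):
    """rows: list of (stock_id, features, status). Returns (observed, predicted)."""
    observed, predicted = [], []
    for held_out in sorted({stock for stock, _, _ in rows}):
        training = [row for row in rows if row[0] != held_out]
        model = fit(training)  # refit without the held-out stock
        for stock, features, status in rows:
            if stock == held_out:
                observed.append(status)
                predicted.append(predict(model, features))
    return observed, predicted


def fit_mean(training):
    """Placeholder learner: remembers the mean status of the training rows."""
    return sum(status for _, _, status in training) / len(training)


def predict_mean(model, features):
    """Placeholder prediction: returns the stored training mean."""
    return model


# Hypothetical rows: (stock id, features, B/B_MSY).
rows = [("a", None, 1.0), ("b", None, 2.0), ("c", None, 3.0)]
obs, pred = leave_one_stock_out(rows, fit_mean, predict_mean)
print(rmse(obs, pred), r_squared(obs, pred))
```

Any model with `fit` and `predict` functions of this shape could be substituted for the placeholder, with tuning parameters held fixed across refits as described for the secondary test.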
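In our analysis, a model using only life history and characteristics of the catch history, in the manner of Costello et al. (2012), was able to explain roughly 30% of the variance, a value higher than reported in Costello et al. (2012) (reported fits in that paper are on log-scale, and so will be worse than the reported values when converted to natural scale as evaluated here), and in line with the values in Zhou et al. (2017). In both the leave-one-out and testing fits, the model underestimates stock status for stocks with B/B_MSY greater than one and overestimates it for stocks with B/B_MSY less than one. The RMSE of this model is high enough to frequently assign a stock to the wrong general status bin (e.g. classify as underfished when it is overfished and vice versa). Clearly, further work could be done to refine performance.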
However, in our experience, these refinements (e.g. alternative model types and feature engineering) can produce some improvements, but rarely produce transformative increases in model performance. We suggest then that the levels of predictive power seen here and in Zhou et al. (2017)  simply not there to be found. This analysis demonstrates that, while in theory predictive empirical relationships could be found between catch and life history and stock status, we see no evidence that improved performance beyond the levels observed here is likely to exist given currently available data. It should also be noted that what evidence is present will degrade as soon as management measures have been put in place in response to a catch-only model, limiting the ability of predictive catch-only models to serve as a basis for continued assessment and responsive management.

| CONCLUSIONS
Catch histories are the most comprehensive source of global data on fisheries production, spanning both industrial and artisanal fisheries across a wide range of economic and ecological conditions (FAO, 2020), although it is widely recognized that small-scale fisheries are underrepresented in the data reported to FAO (Kelleher et al., 2012), and the quality and taxonomic resolution of the data are highly variable. These data are an important tool for helping us understand the role of global fisheries in human wellbeing and anthropogenic impacts. Catches are a function of the size of fish populations, and so contain information on the minimum size of stocks. However, based on first principles and numerous efforts at replication, it is clear that catch-only models are not a consistently reliable means of either inferring or predicting the state of fished populations.
The basic catch equation (Equation 1) shows that one cannot solve for biomass without making strong assumptions about factors such as catchability and effort. There certainly are fisheries in which both effort and catchability have been sufficiently constant or predictable as to facilitate inference of stock status based on catches alone. However, in most fisheries, catches change for a wide range of reasons, including environmental forcing, economic incentives, technological advances and fisheries regulations (Hilborn & Branch, 2013). Each of these factors can break the relationship between stock status and catch. Our results show that the shape of the catch history itself is uninformative on the state of a fishery, conditional on life history priors and the mean volume of the catch (Figure 2).
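The identifiability problem can be stated in one line. The block below is a sketch that assumes Equation 1 takes the common form in which catch is the product of catchability q, effort E and biomass B; the time subscripts and the constant a are our notation.

```latex
\begin{aligned}
C_t &= q_t E_t B_t \quad\Longrightarrow\quad B_t = \frac{C_t}{q_t E_t}, \\
C_t &= \left(a\, q_t E_t\right) \left(\frac{B_t}{a}\right) \quad \text{for any } a > 0 .
\end{aligned}
```

Under this assumed form, a given catch is equally consistent with a large stock fished lightly and a small stock fished hard, so observing catch alone cannot separate biomass from the product of catchability and effort.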
Efforts at predicting stock status based on catch histories reveal a consistent pattern of some but limited predictive power, often with out-of-sample R² less than 0.35, and substantial potential for bias (Anderson et al., 2017; Bouch et al., 2021; Free et al., 2020; Ovando et al., 2021; Pons et al., 2020). These values may be higher for cases where the fishery to be estimated closely resembles those included in the training dataset, but predictions are likely to be particularly poor for fisheries that look little like the highly managed stocks on which predictive models can be trained. This is certainly the case for the other half of the world's fisheries where catch-only models are most likely to be needed or applied.
This is not to say that catch-only models are without value. The original formulation of CMSY (Martell & Froese, 2013) was designed not to estimate stock status but to estimate MSY itself. There is ample evidence and logic supporting the idea that for fisheries with catches that have ever come close to or surpassed MSY, the catch history can provide a plausible guide to the magnitude of this important reference point. In terms of stock status, to the extent that there is good reason to think that a particular stock can be represented with a surplus production model, if users are able to set reasonably accurate priors on population parameters such as r and K, then filtering these prior distributions through the lens of the population model combined with the catch history will indeed provide an accurate estimate of stock status. However, the focus then must be on understanding the critical importance of setting these life history priors accurately, and not on assuming that somehow the combination of the priors along with the shape of the catch history can provide information not encoded in the priors themselves.
Catch-only models are also likely to work better in some systems than others. However, evaluating catch-only models does present a bit of a paradox. The only fisheries for which we are likely to have robust data-rich empirical estimates of stock status will tend to be highly managed fisheries, exactly the cases where we would expect the weakest relationships between catch and biomass due to active management and strong market forcing. This means that empirical evaluation of catch-only models may underestimate their potential performance in less managed fisheries with perhaps a clearer link between catch and biomass. However, our analysis evaluating the predictive power of an empirical catch-only model using only pre-1990 data, when fisheries management was less intense in assessed fisheries, suggests that this gap may be small (Figure S3). The only alternative to empirical testing of catch-only models is simulation testing. While simulation testing can be designed to reflect a wider range of fishery dynamics than those represented in databases such as the RLSADB, simulated fishery dynamics can also be overly simplistic, and simulation tests often conform more to the assumed dynamics of catch-only models than reality will, for the sake of computational efficiency (e.g. use of surplus production dynamics for both the operating and estimation model).
The critical need then is more robust estimates of the expected performance of catch-only models under a range of plausible circumstances, based both on confrontation with empirical observations and evaluation through simulation testing. Users can then make informed judgements as to whether the expected performance of catch-only models for their particular application is likely to be sufficient for their needs, or whether the expected imprecision and bias are simply too great for model outputs to be useful as an index of stock status. Where catch-only models are still selected for use as a source of stock status, it is critical that users are empowered with an understanding that estimates of stock status based on SRA-style catch-only models are purely a function of the accuracy of priors on stock status combined with the accuracy of required life history priors: there is no way for the shape of a catch history to "overcome" poor priors and infer the true state of a stock except by chance alone. For more empirical predictive models, the best available evidence suggests that while there is some predictive power in catch histories, it is limited and highly sensitive to how closely the fisheries the models are applied to resemble the fisheries the models were trained on.
Our concern is that catch-only models can promise an off-the-shelf method to assess fish stocks that can be, and has been, used without clear communication or understanding of their limitations. Despite increasingly being used as a tool informing direct management decisions, it is clear from both empirical and simulation testing that off-the-shelf use of catch-only models is not a reliable means of estimating stock status, and that a key to better global fisheries assessment and management is not wider application of catch-only models or model refinement, but rather expanded collection and curation of diverse data sources that contain meaningful information on the state of fisheries.
individual catch-only models to generate better predictions of stock status, have emerged as the most accurate and least biased catch-only predictors of biomass stock status (Anderson et al., 2017; Free et al., 2020). Existing ensemble approaches have experimented with various formulations ranging from weighted averages to statistical models based on traditional regression techniques to machine learning models that are themselves ensemble models (leading to the term super ensembles) (Anderson et al., 2017). Ensemble approaches have been shown to outperform individual assessments when used appropriately. However, ensembles require a method for aggregating results, whether simply taking means of predictions, or weighting individual models based on their perceived performance for the task at hand. The latter can perform better but requires some measure of the performance of individual models, often relative to a historical empirical baseline or based on simulation (Anderson et al., 2017).
and would certainly apply to some overexploited open-access fisheries, what can we actually infer from simply observing the catch history of a stock? To answer this, consider a simple and common equation for catch in a fishery:
FIGURE 2 (a) Shows simulated catch histories (solid line) and the resulting estimates of B/K (biomass relative to carrying capacity, dashed line) given each simulated catch history. (b) Shows the prior (dark grey) and posterior (light grey) distributions of B/K in the final year. Priors on initial and final B/K and life history are identical across all simulations.
TABLE 1 Status of fish stocks (measured as B/B_MSY) from different regions from known assessments or expert knowledge, and from three published catch-only models
Melnychuk et al., 2017). b From RAM Legacy Stock Assessment Database.
FIGURE 3 Default priors for (a) final B/K, (b) initial B/K, (c) intrinsic growth rate, r, and (d) carrying capacity, K, from Catch-MSY (Martell & Froese, 2013) and CMSY (Froese et al., 2017) compared with values derived for stocks with data-rich stock assessments. In all panels, the coloured shading indicates default priors and the black points and/or boxplots indicate values derived from data-rich stocks. In (a) and (b), black and grey points indicate stocks with B/K estimates occurring inside and outside the default priors used in CMSY, respectively. In these panels, the dashed lines indicate pre-exploitation biomass (B/K = 1), which can be exceeded due to age-structured population dynamics, and the dotted lines indicate B_MSY (B/K = 0.5). Catch-MSY sets default initial B/K priors based on the catch ratio in the final year of the catch time series and is not shown. CMSY sets default carrying capacity priors based on resilience, initial B/K and maximum catch and is not shown.
We then fit the model to the complete training set using the selected tuning parameters and then used the fitted model to predict the B/B_MSY values in the testing set (containing only stocks from New Zealand, Australia and South Africa). As a secondary test, we also conducted a leave-one-out cross-validation test on only the training data (holding the tuning parameters at their selected values), sequentially removing one stock from the training set, fitting the model on the remaining training stocks, predicting the omitted stock, then moving on to the next stock.

FIGURE 4 RAM Legacy Stock Assessment Database (RLSADB) reported (x-axis) and machine learning predicted (y-axis) B/B_MSY values. Colour represents density of individual points across these coordinates. The training dataset (N = 293) includes RLSADB stocks from everywhere except Australia, New Zealand and South Africa. The testing dataset includes only stocks from Australia, New Zealand and South Africa (N = 40). Leave-one-out analysis are the out-of-bag predicted values for the stocks in the training dataset. Test indicates R² and root mean squared error (RMSE) for each individual panel. The black dashed line is the 1:1 fit, and the red solid line is a linear fit between the reported and predicted values.
An example of estimation of stock status from catch-only data. From the Sea Around Us Project (SAUP), http://www.seaaroundus.org/data/#/global/stock-status
methods employ a range of regression techniques and predictor variables including (i) panel regression models that use properties of the catch time series and characteristics of the stock (Costello
Walsh et al. (2018) used an MSE model to evaluate whether an ensemble model, the most accurate catch-only approach (Anderson et al., 2017; Free et al., 2020), can reliably inform fisheries harvest control rules. They found that the large inaccuracies in even this most reliable of catch-only models require cautious harvest control rules to prevent overfishing and improve B/B_MSY status. While successful, cautious harvest policies result in considerable foregone sustainable yields (Walsh et al., 2018). Such trade-offs are likely larger with even less accurate status determinations.