AI for imaging plant stress in invasive species (dataset from the article https://doi.org/10.1093/aob/mcaf043)
This dataset contains the data used in the article "Machine Learning and digital Imaging for Spatiotemporal Monitoring of Stress Dynamics in the clonal plant Carpobrotus edulis: Uncovering a Functional Mosaic", which includes the complete set of collected leaf images, image features (predictors) and response variables used to train machine learning regression algorithms.
Briefly, the work can be summarized as follows. Rapid, large-scale monitoring is critical to understanding spatiotemporal plant stress dynamics, but current physiological stress markers are costly, destructive and time-consuming. This study aimed to evaluate, for the first time, the potential of machine learning to non-destructively predict leaf betalains (yellow to reddish pigments unique to Caryophyllales species), and to explore the intra-individual variation of betalains in a clonal species and their role in responding to stressful periods. We characterized, for the first time, the betalain profile of the invasive clonal plant Carpobrotus edulis (L.) N.E.Br. (the cape fig) by HPLC. Over a year, at a site where the species is spreading, we measured multiple stress markers, including betalain content quantified with our optimized method. Additionally, 3,735 digital images were taken at the leaf level. Machine learning regression algorithms were trained to predict betalain accumulation from digital images, outperforming classic spectroradiometer measurements. Betalain content increased sharply in non-reproductive ramets during extreme abiotic conditions in summer and in reproductive ramets during senescence. The stress markers revealed a strong intra-individual functional mosaic, underscoring the importance of spatiotemporal dimensions in stress tolerance. We developed a scalable, non-destructive tool for betalain research that integrates digital imaging with machine learning. This approach opens new possibilities for understanding spatiotemporal stress responses using artificial intelligence, particularly in clonal plant systems.
By publishing this dataset, we encourage other researchers to train machine learning algorithms that use images to understand plant physiology.
Data uploaded by Dr Erola Fenollosa (erola.fenollosa@gmail.com)
1 April 2025.
A. Data collection methods
(For the figures referenced below, see the manuscript: DOI: 10.1093/aob/mcaf043)
Leaf sampling
Twenty separate patches of the invasive hybrid complex Carpobrotus sp. (Novoa et al., 2023) were selected in Punta del Podaire (NE Spain, 42°21′13.9″N, 3°10′36.3″E). We aimed to capture plant responses to the harsh Mediterranean summer and to the progression of reproductive senescence in the species, so samplings were made every 15 d on average from June to September 2019. An extra sampling was added in winter 2020 to contrast with the lowest annual temperatures (Fig. 1; Supplementary Data S1). Samplings were performed at solar midday on clear, sunny days. To test the differential involvement of betalains in stress and senescence responses, two types of leaves were taken from each individual (Fig. 1B): leaves from ramets with apical fruits (reproductive, senescing ramets) and leaves from ramets without apical fruits (non-reproductive ramets), as in Fenollosa and Munné-Bosch (2020). Furthermore, to explore intra-ramet betalain variability and test the role of physiological integration under natural conditions, we collected leaves from two positions within the ramet (Positions 1 and 2) (Fig. 1C). Position 1 corresponded to the youngest fully developed leaf, at the ramet tip, and Position 2 to the last leaf on the ramet. From each ramet type and individual, the two fully developed leaves at each position (the two opposite leaves of the verticil) were collected. One leaf was frozen in liquid nitrogen and stored at −80 °C for laboratory pigment analysis, and the other was used to measure in situ ecophysiological stress markers characterizing photo-oxidative stress levels.
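The nested sampling design above (individual → ramet type → leaf position → two opposite leaves) can be enumerated as a leaf-level table. The sketch below is illustrative only. The factor codes and the placeholder sampling label are hypothetical and may not match the dataset's column names (see readMe.txt). The number of sampling campaigns is passed in rather than assumed.

```python
from itertools import product

# Hypothetical factor codes; the column names used in the dataset may differ (see readMe.txt).
N_INDIVIDUALS = 20
RAMET_TYPES = ("reproductive", "non_reproductive")  # ramets with / without apical fruits
POSITIONS = (1, 2)  # 1 = youngest fully developed leaf at the tip, 2 = last leaf on the ramet
LEAF_USES = ("frozen_for_pigments", "in_situ_markers")  # the two opposite leaves of the verticil


def sampling_plan(sampling_dates):
    """Yield one record per collected leaf for every sampling date."""
    for date, individual, ramet, position, use in product(
        sampling_dates, range(1, N_INDIVIDUALS + 1), RAMET_TYPES, POSITIONS, LEAF_USES
    ):
        yield {
            "date": date,
            "individual": individual,
            "ramet_type": ramet,
            "position": position,
            "leaf_use": use,
        }


# One hypothetical sampling label: 20 individuals x 2 ramet types x 2 positions x 2 leaves
leaves_per_date = len(list(sampling_plan(["sampling_1"])))
print(leaves_per_date)  # 160
```

Each campaign therefore yields 160 leaves under this design, half of which are frozen for pigment analysis.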
Photosynthetic efficiency, leaf hydration and photosynthetic pigments
To detect photoinhibition and assess the efficiency with which absorbed light is converted into photochemistry, the maximum photochemical efficiency of photosystem II (PSII, Fv/Fm) was measured in fresh samples with a fluorometer (Imaging-PAM, Walz, Effeltrich, Germany). Mean Fv/Fm was measured over a 4 cm² area on the more sun-exposed side of the leaf. To measure the impact of low water availability during summer, leaf hydration was calculated as H = (FW − DW)/DW, where FW is the leaf fresh weight and DW the leaf dry weight, measured after drying the leaf for 20 d at 60 °C. Leaf area was measured for each leaf with ImageJ software (NIH, Bethesda, MD, USA) to calculate specific leaf area (leaf area per dry weight).
Chlorophyll and carotenoid contents were measured by spectrophotometry. Approximately 150 mg of frozen powdered sample was extracted with 100 % methanol (MeOH) as follows: 20 min of vortexing, 30 s in an ultrasound bath, 20 min of vortexing and 10 s of centrifugation at 13 000 rpm, all at 4 °C and protected from direct light. The supernatant was collected, and the pellet was re-extracted for two further full cycles, giving a final extract of ~1.5 mL in 100 % MeOH. The absorbance of the final extract was read at 470, 652.4, 665.2 and 750 nm (Lichtenthaler and Buschmann, 2001).
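The calculations in this paragraph can be sketched as follows. The pigment equations are the Lichtenthaler and Buschmann (2001) coefficients for 100 % methanol extracts, which give µg per mL of extract. Two points are assumptions, not stated in the text: subtracting A750 from each reading as a turbidity correction, and the default extract volume and sample mass used for the per-gram conversion. The function names are hypothetical.

```python
def leaf_hydration(fw_g: float, dw_g: float) -> float:
    """Leaf hydration H = (FW - DW) / DW, in g water per g dry weight."""
    return (fw_g - dw_g) / dw_g


def specific_leaf_area(area_cm2: float, dw_g: float) -> float:
    """Specific leaf area: leaf area per unit dry weight (cm^2 g^-1)."""
    return area_cm2 / dw_g


def pigments_methanol(a470: float, a652_4: float, a665_2: float, a750: float = 0.0):
    """Chlorophyll a, chlorophyll b and total carotenoids (µg per mL of extract)
    for 100 % methanol extracts (Lichtenthaler and Buschmann, 2001).

    Assumption: A750 is subtracted from every reading as a turbidity correction;
    pass a750=0 to use the raw absorbances."""
    a470, a652_4, a665_2 = (a - a750 for a in (a470, a652_4, a665_2))
    chl_a = 16.72 * a665_2 - 9.16 * a652_4
    chl_b = 34.09 * a652_4 - 15.28 * a665_2
    carotenoids = (1000.0 * a470 - 1.63 * chl_a - 104.96 * chl_b) / 221.0
    return chl_a, chl_b, carotenoids


def per_gram_fw(conc_ug_per_ml: float, extract_ml: float = 1.5, sample_g: float = 0.150) -> float:
    """Convert an extract concentration to µg per g fresh weight.
    Assumed defaults: ~1.5 mL of extract from ~150 mg of sample, as described above."""
    return conc_ug_per_ml * extract_ml / sample_g


chl_a, chl_b, car = pigments_methanol(a470=0.5, a652_4=0.3, a665_2=0.6)
print(per_gram_fw(chl_a), per_gram_fw(chl_b), per_gram_fw(car))
```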
Betalain quantification by HPLC
As no leaf betalain extraction protocol had previously been reported for Carpobrotus spp., different protocols were tested and contrasted with standards for full optimization. We started from protocols for the species most closely related to the cape fig, Disphyma australe and Mesembryanthemum crystallinum (Jain and Gould, 2015b; Vogt et al., 1999). We then tested modifications of the water:methanol ratio and the number of re-extractions, guided by reviews of betalain extraction methods (Das et al., 2022; López‑Cruz et al., 2023) (Supplementary Data S2). In the final extraction procedure proposed for the cape fig, 300 mg of frozen leaf sample was ground with a mixer mill, homogenized with 0.5 mL of 50 % MeOH (aq.) and vortexed for 20 min. The homogenate was placed in an ultrasound bath for 30 min, vortexed for another 20 min and centrifuged at 13 000 rpm for 10 s, all at 4 °C in the dark. The supernatant was recovered, the pellet was resuspended in 1 mL of 100 % MeOH and the extraction procedure was repeated. The resulting supernatant was discarded, and the pellet was resuspended in 0.5 mL of Milli-Q H2O at pH 5. After extraction of this pellet, its supernatant was combined with the first supernatant, giving a final extract of ~1 mL in 25 % MeOH.
Betalain detection was tested on a structured subset of 545 samples. Detection was performed by HPLC coupled with a UV-Vis spectrophotometer. The chromatography protocol was based on previous studies, especially Jain and Gould (2015b). HPLC analysis was carried out on an Agilent 1100 series HPLC with a reversed-phase C18 5-μm column (Nucleosil 250 × 4 mm, 100 Å). The elution solvents were 1 % formic acid in water (v/v, eluent A) and 80 % acetonitrile in water (v/v, eluent B). The column was kept at room temperature with a flow rate of 1 mL min−1. The injection volume was increased to 80 μL.
All samples were filtered through 0.22 μm hydrophobic filters before HPLC analysis. After several tests to optimize the separation of the distinctive cape fig betalains while reducing run time, the gradient was set as follows. The initial composition was 10 % B, increasing linearly to 29 % B over 15 min and then to 33 % B over 5 min. Eluent B was then raised to 100 % over 2 min and held constant for 5 min. Finally, the column was washed with 2 % B for 5 min before the next sample was injected. Absorbance was measured at 538 nm. Betalains were identified by comparing peak retention times and absorbance spectra with those of standards. Additionally, extract absorbance at 538 nm was read with an Aquarius double-beam UV-Vis spectrophotometer (Cecil) to calculate total betalain content using the molar extinction coefficient of betanin (60 000 L mol−1 cm−1).
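The total betalain calculation from A538 follows the Beer–Lambert law, with betalains expressed as betanin equivalents. The sketch below makes several assumptions not given in the text. It uses a betanin molar mass of 550.47 g mol−1 (C24H26N2O13) and a 1 cm cuvette. It takes the ~1 mL extract and 300 mg sample as defaults and includes an optional dilution factor. The function name is hypothetical.

```python
BETANIN_EPSILON = 60_000.0  # L mol^-1 cm^-1, molar extinction coefficient of betanin (from the text)
BETANIN_MW = 550.47  # g mol^-1, molar mass of betanin (C24H26N2O13); assumed for betanin equivalents


def total_betalains_ug_per_g_fw(a538: float, extract_ml: float = 1.0, sample_g: float = 0.300,
                                dilution: float = 1.0, path_cm: float = 1.0) -> float:
    """Total betalains as betanin equivalents (µg per g fresh weight) via Beer-Lambert.

    Assumed defaults: ~1 mL final extract from 300 mg of leaf, undiluted, 1 cm path."""
    molar = a538 * dilution / (BETANIN_EPSILON * path_cm)  # c = A / (epsilon * l), in mol L^-1
    ug_per_ml = molar * BETANIN_MW * 1000.0  # mol/L x g/mol = g/L, and 1 g/L = 1000 µg/mL
    return ug_per_ml * extract_ml / sample_g  # µg/mL x mL / g = µg per g FW


print(round(total_betalains_ug_per_g_fw(0.6), 2))
```

Keeping the dilution factor and path length as explicit parameters makes the unit chain (absorbance → mol L−1 → µg mL−1 → µg g−1 FW) easy to audit when extracts are diluted to stay within the linear range.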
Betalain estimation by spectroradiometry
Besides estimating betalains from images, we tested whether a spectroradiometer could be used to estimate cape fig leaf betalain content. For this, we collected 21 independent leaves at solar midday during the August sampling and measured their betalain content. Three measurements were taken per leaf, one on each leaf face, at basal, middle and apical positions. UV-Vis spectral reflectance was measured with a portable spectrometer (Ocean Optics Inc., Dunedin, FL, USA) equipped with a deuterium–tungsten halogen light source (200–2000 nm). Reflectance relative to a white standard (WS-1-SL) was analysed with SpectraSuite v.10.7.1 software (Ocean Optics) from 300 to 900 nm at 0.22-nm intervals.
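Relative reflectance is conventionally computed from raw counts with a dark-corrected ratio against the white standard. The sketch below shows that ratio and a simple per-leaf average of the three face measurements. It assumes that a dark spectrum was recorded and that averaging the faces is appropriate; the text states neither. The toy counts are illustrative numbers, not study data.

```python
import numpy as np


def relative_reflectance(sample, dark, white):
    """Percent reflectance relative to a white standard, after subtracting
    the dark spectrum from both the sample and the reference."""
    sample, dark, white = (np.asarray(x, dtype=float) for x in (sample, dark, white))
    return 100.0 * (sample - dark) / (white - dark)


def leaf_mean_spectrum(spectra):
    """Average the per-face reflectance spectra of one leaf (one row per measurement)."""
    return np.vstack(spectra).mean(axis=0)


# Toy counts on a 3-point wavelength grid (illustrative only)
dark = [10.0, 10.0, 10.0]
white = [110.0, 210.0, 310.0]
face_counts = ([60.0, 110.0, 160.0], [30.0, 50.0, 70.0], [90.0, 170.0, 250.0])
faces = [relative_reflectance(s, dark, white) for s in face_counts]
leaf = leaf_mean_spectrum(faces)
print(leaf)
```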
Training machine learning for betalain estimation from digital images
All sampled leaves were photographed immediately after detachment, using a static framework platform to position the leaves (Supplementary Data S3). Images were captured in complete shade to avoid reflections. Each image included a measuring ruler, three white-balance reference cards (black 100 %, grey 80 % and white 20 %, for white-balance correction), a dark frame around the leaf and a leaf label. Images were taken with a standard Canon EOS 750D digital camera fitted with a Sigma 17–50 mm 1:2.8 EX DC OS HSM lens, with the focal length always set to 17 mm. Photos were taken in manual mode to control all camera parameters (ISO 100, aperture f/5.6 and shutter speed 1/30–1/60 s) and saved in RAW format. Because the cape fig leaf has a triangular cross-section, three photos were taken of each leaf, one per leaf side, giving a total of 3735 images. After file encoding, images were white-balance corrected, precisely rescaled, cropped and rotated with ImageJ-Fiji software (Schindelin et al., 2012) to obtain a standardized, clean leaf image. Image feature extraction was designed around multiple colour indices of interest from different colour spaces (RGB, HSL, CIELAB). Five types of features were considered: (1) percentage of pixels in the different colour categories defined by hue angle, (2) average value of each channel, (3) average and median channel value in each colour-category region, (4) threshold parameters based on descriptors of the reddish region and (5) total number of leaf pixels (Supplementary Data S4). These features were used to predict leaf betalain content (µg per FW) with multiple machine learning regression algorithms (linear regression, ridge regression, gradient boosting, decision tree, random forest and support vector machine) from the Scikit-learn 1.2.1 library in Python (v.3.10.1). The hyperparameters used are listed in Supplementary Data S5.
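Feature types (1), (2), (3) and (5) can be sketched on a background-free leaf image with NumPy alone. In this sketch, the hue-angle category limits are hypothetical (the study's definitions are in Supplementary Data S4). Feature type (3) is shown for the red channel only, and type (2) for RGB only. CIELAB channels and the reddish-region threshold descriptors of type (4) are omitted. Achromatic pixels (R = G = B) have no hue and are left out of every category.

```python
import numpy as np

# Hypothetical hue-angle limits (degrees); the study's category definitions are in Supplementary Data S4.
HUE_CATEGORIES = {
    "red": [(0.0, 20.0), (330.0, 360.0)],
    "orange_yellow": [(20.0, 70.0)],
    "green": [(70.0, 170.0)],
    "other": [(170.0, 330.0)],
}


def hue_degrees(rgb):
    """Hue angle (0-360 degrees) for an (N, 3) float array in [0, 1].
    HSV and HSL share this hue; achromatic pixels (R = G = B) get NaN."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    cmax = rgb.max(axis=1)
    delta = cmax - rgb.min(axis=1)
    hue = np.full(cmax.shape, np.nan)
    chromatic = delta > 0
    r_max = chromatic & (cmax == r)
    g_max = chromatic & (cmax == g) & ~r_max
    b_max = chromatic & ~r_max & ~g_max
    hue[r_max] = ((g[r_max] - b[r_max]) / delta[r_max]) % 6
    hue[g_max] = (b[g_max] - r[g_max]) / delta[g_max] + 2
    hue[b_max] = (r[b_max] - g[b_max]) / delta[b_max] + 4
    return hue * 60.0


def colour_features(image, mask):
    """image: (H, W, 3) uint8 RGB leaf image; mask: (H, W) bool, True on leaf pixels."""
    pixels = image[mask].astype(float) / 255.0
    hue = hue_degrees(pixels)
    n = pixels.shape[0]
    features = {"n_leaf_pixels": int(n)}  # feature type (5)
    for channel, name in enumerate("RGB"):  # feature type (2), RGB channels only
        features[f"mean_{name}"] = float(pixels[:, channel].mean() * 255.0)
    for category, ranges in HUE_CATEGORIES.items():
        in_cat = np.zeros(n, dtype=bool)
        for lo, hi in ranges:
            in_cat |= (hue >= lo) & (hue < hi)  # NaN hues compare False
        features[f"pct_{category}"] = 100.0 * in_cat.sum() / n  # feature type (1)
        # Feature type (3), red channel only as an example
        features[f"median_R_{category}"] = (
            float(np.median(pixels[in_cat, 0]) * 255.0) if in_cat.any() else float("nan")
        )
    return features


# Toy 2 x 2 image: one red, two green leaf pixels and one black background pixel
image = np.array([[[255, 0, 0], [0, 255, 0]], [[0, 200, 0], [0, 0, 0]]], dtype=np.uint8)
mask = image.sum(axis=2) > 0  # toy rule: non-black pixels belong to the leaf
feats = colour_features(image, mask)
print(feats["pct_red"], feats["pct_green"])
```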
The modelling dataset consisted of the colour features of 545 samples labelled with their measured betalain content. These samples were a structured random selection of all collected samples. Of the labelled observations, 75 % were assigned to the training set and 25 % to the test set. Model performance was evaluated with the root mean squared error (RMSE, the square root of the mean squared difference between predicted and measured values in the test set, expressed in the units of the response) and R², calculated with r2_score() from Scikit-learn metrics. A model was considered successful if it was unbiased (predicted vs. measured values distributed along the 1:1 line) and had the smallest RMSE. Variable contribution was analysed with permutation_importance(). In addition to leaf betalains, we also tested the potential of the image features to predict chlorophyll and carotenoid content.
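The training and evaluation steps can be sketched with scikit-learn using one of the listed algorithms (random forest). This sketch makes several simplifications. The data are synthetic stand-ins, not the study's measurements. The hyperparameters are hypothetical, since the real ones are in Supplementary Data S5. A plain random 75/25 split replaces the structured selection. The slope and intercept of predicted vs. measured values are just one possible way to check the 1:1 criterion. RMSE is taken as the square root of `mean_squared_error` to avoid version-specific arguments.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def fit_and_evaluate(X, y, random_state=0):
    """75/25 split, random forest fit, RMSE/R2 on the test set and permutation importances."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state
    )
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)  # hypothetical settings
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
    r2 = float(r2_score(y_test, y_pred))
    # Bias check (assumption): a fit of predicted on measured close to slope 1, intercept 0
    slope, intercept = np.polyfit(y_test, y_pred, 1)
    importances = permutation_importance(
        model, X_test, y_test, n_repeats=10, random_state=random_state
    )
    return {
        "rmse": rmse,
        "r2": r2,
        "slope": float(slope),
        "intercept": float(intercept),
        "importances": importances.importances_mean,
    }


# Synthetic stand-in data (NOT the study's data): 545 samples x 10 colour features
rng = np.random.default_rng(0)
X = rng.normal(size=(545, 10))
y = 3.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=545)
results = fit_and_evaluate(X, y)
print(results["rmse"], results["r2"])
```

In practice the same function could be looped over the other listed regressors, keeping the model with the smallest test RMSE among those that pass the bias check.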
Data analysis
To determine whether ramet type, leaf position and sampling time affect betalain accumulation, a linear mixed model was fitted by restricted maximum likelihood (REML). A hierarchical nested random structure reflected the sampling design: each individual was sampled repeatedly over time, two ramet types were selected per individual, and two leaf positions were taken per ramet. For post-hoc contrasts between sampling times, P-values were adjusted with Tukey's method, and the Satterthwaite method was used for these tests, as suggested by the emmeans package. Data were checked for normality and homoscedasticity with Shapiro–Wilk and Levene tests, respectively, and transformed when necessary. All analyses were performed in R (v.4.3.1) using the lmer function (lme4 package) and the emmeans and multcomp packages.
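One plausible way to write such a model is shown below, as a sketch only. The full factorial fixed-effect structure and the two random intercepts are assumptions, not necessarily the specification fitted in the study. Here y is the (possibly transformed) betalain content of individual i at sampling time t, ramet type r and leaf position p. The symbols τ, ρ and π are the fixed effects of time, ramet type and position. u_i is the random intercept of the individual, and v_ir is that of ramet type within the individual.

```latex
% Assumed specification; not necessarily the exact model fitted in the study.
y_{itrp} = \mu + \tau_t + \rho_r + \pi_p
         + (\tau\rho)_{tr} + (\tau\pi)_{tp} + (\rho\pi)_{rp} + (\tau\rho\pi)_{trp}
         + u_i + v_{ir} + \varepsilon_{itrp},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
v_{ir} \sim \mathcal{N}(0, \sigma_v^2), \quad
\varepsilon_{itrp} \sim \mathcal{N}(0, \sigma^2)
```

Under this form, Tukey-adjusted contrasts between sampling times compare the estimated marginal means of the τ_t terms, averaged over ramet types and positions.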
B. Uploaded data contents
The dataset contains these elements:
- The full set of collected high-quality leaf images, white-balance corrected, scaled and with the background removed, used to extract colour features. If required, we can also provide the raw leaf images and their ROIs.
- The dataframe of colour features extracted from all leaf images, together with the laboratory variables (ecophysiological predictors and variables to be predicted).
- The set of scripts used for image pre-processing, feature extraction, data analysis, visualization and machine learning algorithm training, using ImageJ, R and Python.
See readMe.txt for detailed information on each file.
Funding
Spanish Foundation for Science and Technology Margarita Salas Postdoctoral Fellowship
Institució Catalana de Recerca i Estudis Avançats ICREA Award
Categories
- Ecological physiology
- Terrestrial ecology
- Bioinformatic methods development
- Plant biochemistry
- Plant physiology
- Biosecurity science and invasive species ecology
- Environmental assessment and monitoring
- Artificial intelligence not elsewhere classified
- Modelling and simulation
- Machine learning not elsewhere classified