Validation of the SmartPlate for detecting food weight and type

Abstract This study determined accuracy (comparing to a criterion), inter-plate reliability (comparing measures between two plates), and intra-plate reliability (comparing successive measures on one plate) of the SmartPlate for food weight and type. Food weight validation included comparing SmartPlate weights to criterion [reference] scale weights (1,980 measures) and weights of 188 foods (2,256 measures). Food type validation included assessing SmartPlate accuracy for 188 foods. For weight, mean absolute percent errors for accuracy, inter-plate reliability, and intra-plate reliability were 6.2, 7.4, and 4.9%, respectively. For food type, foods were correctly identified/listed or searchable 67.0 or 98.9% of the time, respectively, with 76.0% inter-plate reliability and 86.3% intra-plate reliability. The SmartPlate had acceptable accuracy and reliability for assessing food weight and type and may be appealing for dietary surveillance or intervention monitoring. Due to high intra-plate reliability, the SmartPlate may be especially useful for one-on-one interventions and assessing change over time.


Introduction
Recent estimates of overweight and obesity prevalence are sobering, with 73.6% of US adults and 35.4% of US children and adolescents being overweight or obese (Centers for Disease Control and Prevention 2021a, 2021b). The COVID-19 pandemic seems to have worsened the trend, with estimates early in the pandemic suggesting body weight gains of ~0.6 kg just in initial lockdown phases in adults (Bhutani et al. 2021) and later data suggesting that 48% of adults reported weight gain (with ~23% of these individuals reporting gains of at least 4.5 kg) during the pandemic (Khubchandani et al. 2022). Given the host of diseases and conditions linked to obesity (Apovian 2016), effective weight loss or weight maintenance strategies are needed. Additionally, as obesity is fundamentally a long-term mismatch between energy intake and energy expenditure, it is important to be able to measure these constructs accurately.
Several methods exist to quantify dietary (food and beverage) intake. For assessment of specific nutrients (e.g. Vitamin D), biochemical blood or urine markers can be used to assess total nutrient intake (Shim et al. 2014). However, biochemical markers can be used only for a subset of essential nutrients in the diet, cannot be used to assess overall energy intake, and may be affected by homeostatic processes or certain disease states (Clarke et al. 2020; Kaaks et al. 2002; Shim et al. 2014), rendering them of limited value in understanding overall dietary patterns. More common methods employed to understand dietary patterns are subjective recall methods such as food frequency questionnaires, food records/logs, and dietary recalls, which can be paper- or online-based and are completed after the fact or in real time. Many such tools have been developed for different populations, age groups, durations of the recall period, and focus areas of the diet. These methods are widely utilised in clinical and epidemiologic research to understand dietary intake but suffer from shortcomings including biased or inaccurate reporting and low compliance. Even the best recall methods have modest accuracy (especially at the individual level) and vary in accuracy according to factors such as religious and cultural background, familiarity with foods, how hungry a person is, and meal type (Amoutzopoulos et al. 2020; Foster and Bradley 2018; Illner et al. 2012; Shim et al. 2014).
Recent technological innovations have opened new opportunities for dietary assessment. In the last 10 years, many recall methods and food logs have been moved to electronic platforms such as mobile phone applications. Literature on the validity of application-based dietary tracking reveals mixed results, although both accuracy/reliability and compliance with application-based tracking appear to be somewhat improved due to the immediate and convenient nature of recording intake (Cade 2017; Ferrara et al. 2019). Some mobile applications also allow for pictures of meals to be taken and logged, which can then be evaluated for food type and portion size later by the user or by researchers (Hales et al. 2016). While this likely improves accuracy in assessing types of foods consumed, it is still difficult to estimate portion sizes using recall methods (Hochsmann and Martin 2020). Use of food pictures (in applications or on paper), food atlases, reference objects, and prompting questions within an application may improve accuracy and reliability of portion estimation (Amoutzopoulos et al. 2020; Boushey et al. 2017), but portion sizes are most accurately assessed by weighing foods. By measuring food weight before a meal and then the remaining food after a meal, a net weight and therefore the exact amount consumed can be recorded and converted to a precise portion estimate. However, such a process adds significant burden and expense. Additionally, food photos, atlases, diagrams, and models can be used to increase accuracy in portion size estimations, but such tools add extra burden (and therefore varying degrees of compliance) and are limited to the types of foods captured by these methods (Amoutzopoulos et al. 2020; Ngo et al. 2009). They also are of variable utility, as food portion sizes are easier to recall for some types/shapes of foods (e.g. solid, symmetric foods) than others (e.g. liquid foods or those with undefined or irregular shape) (Amoutzopoulos et al. 2020).
Most recently, machine learning and image recognition algorithms have been developed to automatically recognise and record food types, and sometimes food portions, in a picture of a food and can therefore reduce the burden on users and researchers. Accuracy of these algorithms is variable depending on the number of food choices available in the recognition database, lighting, number of pictures used, type of food (and whether it contains multiple ingredients), and reliance on user input to define regions around distinct food types (Hochsmann and Martin 2020; Lo et al. 2020; Tahir and Loo 2021). Additionally, a number of methods have been developed to determine portion size based on volume estimates from food pictures. One review highlights mixed accuracy of such methods, with errors ranging from 1 to 57% depending on lighting, number of pictures, and assumptions about food shape (Lo et al. 2020). Therefore, portion size estimation from food pictures may pose a particular challenge in using such technology to accurately assess dietary intake.
The limitations of machine learning food recognition systems could be partially alleviated when combined with a food (kitchen) scale. Taking a picture (or scanning a food) for food type while simultaneously weighing the food could help determine precise portion sizes and nutrient intakes. One device, the SmartPlate (Fitly Inc.; Austin, TX, USA) and its associated mobile application (SmartPlate in the App Store; free version used for this study), aims to combine food image recognition with a specialised three-tray plate that is set on a base with integrated weight scales to assess food type and weight accurately and in a user-friendly format. By cross-referencing the scanned foods and weights (for serving size determination) against a database, the application records daily and meal-specific energy, macronutrient, and micronutrient content. The SmartPlate mobile application also allows users to enter demographic, exercise, and goal (e.g. weight loss) information and establishes macronutrient targets to allow users to compare actual vs. desired nutrient intake and to assist users (if necessary) in modification of dietary patterns to meet established goals. The SmartPlate became available for consumer purchase in 2021, but to our knowledge, the accuracy and reliability of image recognition for food type and the scales for determining food weight have not been independently evaluated. Accordingly, the purpose of this study was to evaluate the accuracy, inter-plate reliability, and intra-plate reliability of the SmartPlate for both food weight and food type recognition in a laboratory setting.

Procedure
This study did not involve human or animal subjects, so it was exempt from regulatory ethics approval. The SmartPlate website (https://www.getsmartplate.com/) recommends placing up to one food type on each of the three trays while the plate is sitting on its base and then taking a scan/picture using a smartphone from a top-down position in the SmartPlate mobile application. The food type and the weight of each of the three trays are then automatically displayed in the mobile application. Testing of the SmartPlate included evaluation of the weight scales under each of the three sections/trays of the plate as well as detection of food types. More specifically, accuracy (comparison of SmartPlate to criterion), inter-plate reliability (comparison of the same weight/food type from two SmartPlates to each other), and intra-plate/test-retest reliability (testing of the SmartPlate twice consecutively with the same weight/food type and comparing assessments) were evaluated. These assessments were conducted for each of the three trays on the plate, as well as for three different locations on each tray (top, middle, bottom). A picture of the SmartPlate with trays and sections of each tray can be seen in Figure 1. The SmartPlate scales on the base under each tray are slightly smaller than the size of the trays themselves, so we felt it important in our analysis to determine if location of the food on the tray affected accuracy of weight measures. More specific descriptions follow.

Weight validation
For the weight validation, an industrial food scale (AvaWeigh PCOS10NSF, Lancaster, PA, USA) was used as the criterion (reference) measure of food weight. Before testing, the scale was checked for weight accuracy and reliability using two sets of standard metal weights (Acogedor, West-Flanders, Belgium) starting at 0.5 g and progressing in varying increments to 1,000 g. In all cases, the scale's accuracy and reliability were within 1.0 g of the standard weights, confirming the scale's suitability as a criterion measure.
We conducted the weight validation in two phases, the first (hereafter referred to as "incremental weight validation") using weights starting at 28.3 g (1.0 oz) and increasing in 2.8 g increments (31.2, 34.0, 36.9, etc.) until 226.8 g (8 oz), and then increasing in 14.2 g (0.5 oz) increments (226.8, 241.0, 255.1, etc.) up to 567.0 g (20.0 oz). For the incremental weight validation, known quantities (weighed on the criterion scale) of sunflower seeds and small metal weights were added to each tray. Sunflower seeds were used in addition to metal weights because the SmartPlate application must recognise a food on a tray in order to provide a weight measure. For weights up to 283 g (10 oz), sunflower seeds were placed into a paper cup on the criterion scale until the desired weight was achieved and then transferred in the cup to the SmartPlate tray for testing. Above 283 g, the cup was full of seeds, and small metal weights (Acogedor) were added until the desired weight was achieved on the criterion scale. Once the desired weights of sunflower seeds/metal weights were in the correct position on each tray, research staff used a smartphone (iPhone X or XR, Apple Inc., Cupertino, CA, USA) equipped with the SmartPlate mobile application to scan the contents of the tray, taking the picture from directly above (30-50 cm) the tray so that the reference area was filled while using the scan function in the app. The contents were automatically classified in the application for food type and weight on each tray, and the weights assessed for each tray of the plate were recorded for comparison to the criterion.
For the weight recordings from 28.3 to 154.1 g (1.0 to 5.4 oz), the measures were taken at nine locations on each plate (each of 3 trays [1, 2, 3] and each of 3 locations on tray [top, middle, bottom]; Figure 1). Then from 155.9 to 567.0 g (5.5-20.0 oz), three locations were assessed for each plate (in the middle location of 3 trays). Two scans were completed for each weight and averaged for comparison to the criterion (validity), and the two weights were then compared to each other (intra-plate reliability).
Testing was completed for two distinct SmartPlates, which were then compared to each other at each weight (inter-plate reliability). This process resulted in 1,980 measures used for validation of weight readings (for 28.3-154.1 g, it was 45 weights × 3 trays × 3 positions on tray [top, middle, bottom] × 2 trials per weight × 2 plates = 1,620 measures; for 155.9-567.0 g, it was 30 weights × 3 trays × 2 trials per weight × 2 plates = 360 measures). Figure 2 provides a schematic view of the incremental weight validation.
A second weight validation (hereafter referred to as the "food weight validation") was conducted by taking two weights in the middle of each tray on each plate for each of the 188 foods described below in the food type validation. This resulted in an additional 2,256 weight measures (188 foods × 3 trays × 2 trials per tray × 2 plates) with which accuracy, inter-plate reliability, and intra-plate reliability were assessed.
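As a sanity check on the measure counts described above, the factorial combinations can be tallied directly (a minimal sketch; the tuple structure is purely illustrative and not part of the SmartPlate protocol):

```python
from itertools import product

# Incremental weight validation, 28.3-154.1 g:
# 45 weights x 3 trays x 3 tray positions x 2 trials x 2 plates
low_range = list(product(range(45), range(3), ["top", "middle", "bottom"],
                         range(2), range(2)))

# Incremental weight validation, 155.9-567.0 g:
# 30 weights x 3 trays (middle position only) x 2 trials x 2 plates
high_range = list(product(range(30), range(3), ["middle"],
                          range(2), range(2)))

# Food weight validation: 188 foods x 3 trays x 2 trials x 2 plates
food_weights = list(product(range(188), range(3), range(2), range(2)))

print(len(low_range))                    # 1,620 measures
print(len(high_range))                   # 360 measures
print(len(low_range) + len(high_range))  # 1,980 incremental measures
print(len(food_weights))                 # 2,256 food weight measures
```

The tallies reproduce the 1,980 incremental measures and 2,256 food weight measures reported above.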
Occasionally, apparently errant values were obtained from a scan. In cases where the weight recorded by the SmartPlate differed from the criterion by more than 42.5 g (1.5 oz), whether using sunflower seeds/metal weights or the foods used in the type validation, the SmartPlate was reset and a second scan was completed and used for comparison.

Food type validation
Following weight validation, accuracy, inter-plate reliability, and intra-plate reliability assessments were conducted for food type recognition. A total of 188 different foods/preparations were tested on the plates. Foods were scanned in the middle of each tray, twice in succession, and then the trays were transferred to the other plate for testing. A table that categorises the foods by category (fruit, vegetable, dairy, meat, grain) and provides a more specific description of each food can be found in the Supplemental Table. For foods with multiple ingredients across multiple food categories (e.g. pizza with cheese and sausage, spaghetti with marinara sauce), each was classified according to its base ingredient (e.g. pizza and spaghetti both classified as grains). The foods used to test the SmartPlate were selected from lists such as the food frequency questionnaire used by the National Health and Nutrition Examination Survey (NHANES; https://epi.grants.cancer.gov/diet/usualintakes/ffq.html) alongside availability of foods at the local grocery store. All scans were completed in a laboratory with abundant natural light as well as all interior lights turned on for optimal lighting.
Food preparation was unique for each food, with some foods not requiring any preparation and some requiring preparation before being tested on the SmartPlate. Fruits and vegetables were first tested as whole pieces, then often either sliced or diced. Some vegetables were also tested on the SmartPlate when cooked, which consisted of sautéing the vegetables in a frying pan before testing the food on the plate. Protein food items (e.g. meat products) were cooked before testing as well, with many of them being grilled. Dairy products were tested on their own; the only exception was cream cheese, which was tested on a bagel. Foods with heavy liquid content, such as yoghurt or cereal with milk, were purchased in a single-serve bowl, which was then set on the SmartPlate trays for assessment.
When scanning foods, if the app had high confidence in a food type, it reported only one food on the app. Alternately, if the app had some degree of uncertainty as to the food type, it reported a list of 4-5 potential foods, and (if necessary) a second list of at least 4 foods from which the user would be allowed to select the food that was on the plate. If the app incorrectly identified a food or if the food was not available in the lists of choices, the food was searched for by name on the app. While packaged foods can be searched by scanning the barcode, we did not utilise this feature for the present study.

Statistical analysis
For both the weight and food type validations, analyses were conducted in Microsoft Excel 2016 (Microsoft Corp., Redmond, WA, USA) and SPSS version 24.0 (IBM Corp., Armonk, NY, USA).

Weight validation
When assessing accuracy for weight detection, the average of the two trials in each plate position was taken and compared to the criterion. This was done for each tray position (for the weights of 28.3-154.1 g), on each tray, and for each plate. Intraclass correlations (calculated in SPSS using the formulas of Shrout and Fleiss (1979)), mean absolute errors, and mean absolute percent errors were also calculated.
For inter-plate reliability, the average of the two trials in each plate position was taken and compared to that of the other plate. This was done for each tray position (for the weights of 28.3-154.1 g) and on each tray. Intraclass correlations, mean absolute errors, and mean absolute percent errors were also calculated for comparing plates.
For intra-plate reliability, the first and second trials were compared to each other for each tray position (for the weights of 28.3-154.1 g), on each tray, separately for each plate. Intraclass correlations, mean absolute errors, and mean absolute percent errors were also calculated.
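For concreteness, the error and reliability metrics used above can be sketched as follows (an assumed implementation; the ICC shown is the two-way random-effects, absolute-agreement, single-measure form ICC(2,1) from Shrout and Fleiss, which may not be the exact model the authors specified in SPSS):

```python
import numpy as np

def mean_absolute_error(measured, criterion):
    """Mean absolute error (g) between device and reference weights."""
    m, c = np.asarray(measured, float), np.asarray(criterion, float)
    return np.mean(np.abs(m - c))

def mean_absolute_percent_error(measured, criterion):
    """Mean absolute error expressed as a percentage of the criterion."""
    m, c = np.asarray(measured, float), np.asarray(criterion, float)
    return np.mean(np.abs(m - c) / c) * 100.0

def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.
    `data` is an (n targets x k trials/raters) array."""
    x = np.asarray(data, float)
    n, k = x.shape
    grand = x.mean()
    # Between-targets and between-trials mean squares
    ms_rows = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((x.mean(axis=0) - grand) ** 2) / (k - 1)
    # Residual mean square
    sse = np.sum((x - x.mean(axis=1, keepdims=True)
                    - x.mean(axis=0, keepdims=True) + grand) ** 2)
    ms_err = sse / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

For example, a plate reading 110 g against a 100 g criterion, alongside one perfect reading, yields a mean absolute percent error of 5%.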
When the actual foods were being measured for food type validation, their weights were also recorded, and the same accuracy, inter-plate reliability, and intra-plate reliability assessments were conducted to confirm weight validation with foods other than the sunflower seeds/metal weights used for the incremental weight validation.

Food type validation
When assessing accuracy for food type, there were five possible outcomes (Table 1). In some cases, the app identified a single food, and in such cases it was either the correct food (#1 in Table 1) or not. When it was not correct, we then searched for the food in the database to determine if it was available on the app (#4) or not (#5). When the app did not identify a single food but rather identified a list of potential foods, it would be classified as #2 if it was in the first short listing of foods, #3 if it was in the second longer listing of foods, #4 if it was not in either listing of foods but was searchable by name, and #5 if it was not in either listing and not searchable. The percentage of time each of the following outcomes occurred was calculated over two trials for each tray position, tray, and plate. This was done overall by combining the two trials for each tray and plate. Additionally, a subanalysis by food category was conducted.
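The five-outcome scoring above can be expressed as a small decision function (a hypothetical encoding for illustration only; outcome numbering follows Table 1):

```python
def classify_scan(correct_food, single_guess=None,
                  first_list=(), second_list=(), searchable=True):
    """Score one SmartPlate food type scan against the five outcomes:
    1 = single food identified correctly; 2 = correct food in first (short)
    list; 3 = correct food in second (longer) list; 4 = not identified or
    listed but searchable by name in the app; 5 = not searchable (worst case).
    """
    if single_guess == correct_food:
        return 1
    if correct_food in first_list:
        return 2
    if correct_food in second_list:
        return 3
    return 4 if searchable else 5
```

For instance, a scan that returns the short list ("apple", "pear") for an apple would score outcome 2, while a food absent from both lists and from the database would score outcome 5.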
When assessing inter-plate and intra-plate reliability, the same five outcomes as listed above were possible. Therefore, for inter-plate reliability, the percentage of time that the two plates (for a given tray and position) returned the same food, same first list, etc. was calculated for each trial at each position. Similarly, for intra-plate reliability, the percentage of time that the two trials within a given tray and plate returned the same food, same first list, etc. was calculated.

Results

Incremental weight validation
Table 2 displays accuracy and reliability statistics for the incremental weight validation. For accuracy, mean absolute error overall was 5.3 g, ranging from 3.4 to 11.3 g for individual trays and tray positions. This corresponded to an overall mean absolute percent error of 5.9% (range 3.3-10.2%), with 87.2% of measures having a percent error of <10%. Point estimates of error tended to be lowest for the middle position and highest for the bottom position on a given tray but were consistent across trays and plates. Reliability analyses provided similar findings, with overall mean absolute errors of 7.3 g (mean absolute percent error 6.4%) and 5.3 g (mean absolute percent error 5.8%) for inter-plate and intra-plate reliability, respectively (range 2.6-10.4 g, 2.2-12.1%), again with lowest errors for the middle position and highest errors for the bottom position of a tray. Additionally, 83.2 and 84.6% of inter-plate and intra-plate comparisons, respectively, were within 10% of each other. Intraclass correlations were excellent in all validity and reliability analyses except for the intra-plate reliability comparison of tray 2 on the bottom section, which had a good correlation (0.873).

Food weight validation
When actual foods were used for weight assessment (food weight validation), accuracy statistics (Table 3) were similar to those in the incremental weight validation, with mean absolute error of 6.3 g (range 4.5-7.8 g) and mean absolute percent error of 7.3% (range 5.7-9.0%), with 72.7% of measures having a percent error <10%. Reliability statistics were also similar to those of the incremental weight validation, although the food weight validation revealed higher errors for inter-plate reliability (mean absolute error 8.7 g, mean absolute percent error 10.3%, 61.1% of measures within 10%) than for intra-plate reliability (mean absolute error 1.7 g, mean absolute percent error 2.2%, 95.2% of measures within 10%). All intraclass correlations were ≥0.990 and were considered excellent.
Food type validation
Accuracy in recognising food types can be seen in Table 4. Overall, the correct food was identified outright 42.3% of the time across plates and trays (range 33.7-46.4%); the correct food appeared in the first list of possible foods 17.1% of the time (range 13.3-23.0%) and in the second list 7.7% of the time (range 5.0-8.9%), was searchable 31.9% of the time (range 30.6-33.2%), and was not searchable 1.1% of the time (two foods: Little Caesars breadsticks and Club crackers). Table 5 shows some variation in accuracy by food category, ranging from 5.7% for dairy products to 60.5% for nuts/seeds/legumes. Reliability statistics can be found in Table 6. Inter-plate reliability was 76.0% overall (range 73.3-79.8%, and 62.5-87.5% for individual food categories), which was lower than the 86.2% found for intra-plate reliability (range 83.9-87.6% overall, and 66.7-93.8% for individual food categories).

Discussion
Technological improvements and high levels of smartphone ownership have encouraged innovative ways to track dietary intake and, therefore, better understand how diet affects health. Indeed, automation of sleep and physical activity measurement using wearable devices has improved their measurement accuracy and has coincided with stronger relationships detected between these variables and health metrics (Ferrari et al. 2020; Shadyab et al. 2017), and it is likely to be similar when assessing dietary behaviours. As strategies for dietary monitoring continue to improve (using technologies such as the SmartPlate), it will become possible to better understand how dietary patterns affect health, identify specific nutrients associated with health or disease, and target and intervene in those with dietary patterns putting them at elevated health risk. However, as new products and approaches to recording dietary intake become available, it is critically important to perform validation studies to understand how well these products/approaches work. Due to the recent release of the SmartPlate as well as the appeal of automatic logging of both food weights and types, our study sought to determine the accuracy and reliability of the SmartPlate food plate and associated smartphone application in both weighing foods placed on the plate as well as recognising the food type based on a phone-taken photo of the food on the plate. Overall, the SmartPlate performed well for food weight measures, with mean absolute errors ≤11.3 g for all accuracy and reliability measures and across over 4,000 weight measures.
Table 3. Accuracy, inter-plate reliability, and intra-plate reliability for weight assessments with actual foods (food weight validation).

          Plate 1                            Plate 2
          Tray 1     Tray 2      Tray 3      Tray 1     Tray 2     Tray 3
MAE       4.5 (6.0)  7.8 (8.1)   7.1 (7.2)   6.7 (7.1)  5.3 (6.4)  6.6 (7.2)
MAPE      5.7 (7.5)  9.0 (10.0)  8.1 (8.4)   7.5 (7.3)  6.0 (7.7)  7.6 (8.…

Using 10% mean absolute percent error as a threshold, the vast majority of measures were acceptably accurate/reliable for the weight measures. Weight measures were most accurate and reliable when the foods/weights were placed on the centres of the trays and least accurate at the bottom of the trays, especially for the smaller trays (Trays 2 and 3). It is unclear why position on the plate affected accuracy, other than the likely possibility that the scale is built to be under the centre of the tray, and foods placed closer to the centre are therefore more accurately measured. Additionally, it may be that the smaller trays are designed for smaller volumes of food and therefore heavier items are not weighed as accurately or consistently. Thus, from a practical perspective it would be prudent to recommend users place food in the centres or towards the tops of the trays and to place the largest/heaviest food items on the largest trays for optimal function. Because the SmartPlate has weight sensors under each tray, the high accuracy and reliability of each of the trays for weight measures should not be surprising given that electronic scale technology has been around for decades and has been shown to have errors <1% in other contexts (Frija-Masson et al. 2021). Of note, because the SmartPlate is a specific plate and scale that must be purchased and used together, it is less convenient to use than image-capture technologies that assess food weight based solely on pictures and image recognition methods such as pixel counting or modelling of food shapes (Miyazaki et al. 2011; Shen et al. 2020) and therefore do not require a special plate. Additionally, the smartphone application has a free version and a more detailed subscription version, so more detailed dietary analysis may come with an ongoing cost to users. However, the use of image-only estimations of portion sizes or energy intake assessments likely comes at the expense of measurement accuracy, with recent reviews finding that image-only systems have errors as high as 57% (Lo et al. 2020; Tahir and Loo 2021). One recent study was able to obtain low errors (~1-5%) for food volume estimation (Makhsous et al.
2019) without a weight scale, but it was for only six foods and required use of a separate laser lighting system in tandem with a smartphone scan to obtain accurate measures, thereby making it similarly burdensome to the SmartPlate. Other recent studies finding errors of <10% for weight/size assessment have been conducted with small numbers of foods tested (i.e. 5-45), and all require multiple pictures from different angles (Dehais et al. 2017;Fang et al. 2015;Pouladzadeh et al. 2014;Rahman et al. 2012). Additionally, processing times for the image recognition can be long, especially when multiple pictures are taken, limiting feasibility of such models.
Alternatively, estimating portion size through use of food pictures, atlases, and reference objects may improve food portion estimation without use of advanced equipment (Amoutzopoulos et al. 2020). However, visual perception of food portions and amount eaten can be affected by the starting portion of a food, with larger starting portions often resulting in greater quantity of a food consumed (Almiron-Roig et al. 2018). Additionally, it may be difficult to notice subtle changes in food portions (especially with larger portions) through visual inspection since, according to the Weber-Fechner law, there would be a larger minimum change necessary in order to be detectable (Petzschner et al. 2015). Conversely, with smaller portions, even small absolute estimation errors may correspond to a large percentage error in recall, affecting accuracy in reporting of portion size.
While difficult to assess, it is critically important to understand portion sizes in order to be able to accurately determine energy and nutrient intake. Accordingly, food weights/sizes are very important in portion size determination in order to understand nutrient intakes, and past work has shown that use of food scales improves portion size estimation during dietary recall (Kirkpatrick et al. 2016). Recall is still variable in terms of portion estimation, and while technological and modelling advances will likely allow for improved accuracy of food weight/volume measures from a camera image or scan, at present it seems that having a weight scale or other additional equipment is needed for optimal food size accuracy. With this in mind, our study provides evidence that the SmartPlate is suitably accurate and reliable for food weight assessments to be useful in clinical or fieldbased settings and is likely to improve assessments of portion size and energy intake if used properly.
In addition to accuracy, our study also evaluated SmartPlate inter-plate and intra-plate reliability. Errors for inter-plate reliability were very similar to those of the accuracy analysis, suggesting that the SmartPlate has acceptably low error for measuring the weights of foods across different trays and plates. Intra-plate analyses revealed even better reliability (lower error), with mean absolute errors ≤5.1 g and mean absolute percent errors ≤6.2% from the top and middle of plate measurements during the initial weight validation and 1.8 g and 2.3%, respectively, for actual food measures. These findings indicate that while the SmartPlate has suitable inter- and intra-plate reliability, the SmartPlate is more reliable for repeated measures using the same plate than it is for measures compared across plates. Therefore, the SmartPlate will have highest reliability for individual tracking of dietary intake over time and slightly lower reliability if comparing across multiple users using different plates.
The second aim of our study was to evaluate the SmartPlate for food type identification. The SmartPlate performed more modestly in this area, correctly identifying foods only ~42% of the time, with ~11-67% accuracy within any food category. However, a user should be able to determine when the image scan is incorrect and either browse a list for the correct food or enter the food name to search it in the app database. Roughly 67% of food scans were either correct or found in the two lists provided by the application, and the ability to search for foods resulted in ~99% (186 out of 188) of foods being identifiable by the application. However, we acknowledge that the need to search for foods in the phone application adds time and burden to diet recording and may therefore lower compliance with using such technology to record dietary intakes.
A recent review (Tahir and Loo 2021) reported that validation studies found food recognition accuracies of 40-99%, with most studies having accuracies between 70 and 80%. Comparison of image recognition accuracy with previous research is complicated due to different numbers and types of potential foods available in the testing databases, food preparation, etc., but this review suggests that the SmartPlate accuracy falls close to that of most image-recognition modelling approaches when the SmartPlate lists are used to identify foods. Notably, image recognition accuracy is heavily dependent on the types of foods being tested; our study corroborates this, with whole foods (e.g. fruits, vegetables, nuts) classified with higher accuracy than foods with multiple ingredients or hidden ingredients (e.g. salad, sandwich, pasta, rice). Therefore, accuracy of this technology is likely to be dependent on the dietary patterns of the user, and this should be considered when evaluating viability of these technologies for surveillance of dietary behaviours. However, it is likely that multi-ingredient foods would also be reported less accurately on a self-report measure, so capturing consumption of such foods remains difficult. That said, if users are willing to browse lists for foods and occasionally search the SmartPlate food database to identify correct foods, our study provides evidence that the plate can be used effectively to recognise the types and weights of foods placed on the plate.
Our study also evaluated reliability of the SmartPlate in order to understand whether data could be compared between users with different plates (inter-plate reliability) and within a user over time (intra-plate reliability). For traditional 24-hour recalls, food frequency questionnaires, and food logs, past research has found limitations in both the accuracy and reliability of dietary reporting (Biro et al. 2002). Encouragingly, the move of such tools to online and mobile platforms appears to have enhanced reliability and accuracy in reporting (Long et al. 2010). Despite past work evaluating the accuracy of automatic image-recognition technologies for assessing food type, we are unaware of any past research examining the reliability of such technologies. Our study of the SmartPlate provides reason for optimism, showing good inter-plate and intra-plate reliability both overall and for each of the food categories evaluated. As with the weight validation, intra-plate reliability was higher than inter-plate reliability for food type recognition, suggesting that the SmartPlate is best suited for tracking individual dietary behaviours over time but is still acceptable for comparing data among multiple people using different SmartPlates.
Our study has several notable strengths. First, the number of measures completed for both weight and food type classification exceeds most past research, providing confidence in the generalisability of our findings at least to the types of foods tested. Second, our use of multiple plates allowed for full assessment of accuracy and reliability, providing a complete picture of expected performance of the SmartPlate for food weight and type assessments. Third, by testing different sections of the trays, we were able to better simulate a real-world setting where foods may not be placed perfectly in the centre of each tray.
There are also several limitations to our study, and to the SmartPlate technology, which deserve mention. First, our study took place entirely in a laboratory setting with optimal lighting conditions. It is possible that accuracy and reliability of measures will be lower in field-based settings with poorer lighting, less attention paid to placement of food on the tray, and lower image quality (which could be due to factors such as camera quality and/or skill of the person taking the picture). Second, while the SmartPlate works well for weighing and classifying foods before they are eaten, we did not evaluate its ability to assess what remains of foods at the end of a meal. If foods are not completely consumed, it would be important to subtract the amount remaining from the initial amount for an optimal assessment of energy intake and nutritional content. Third, the SmartPlate cannot be used for measuring drink consumption, so any beverages consumed would have to be manually added for full diet tracking. Fourth, the foods in our study were reflective of a Western diet but did not incorporate ethnically or culturally diverse food types that may be characteristic of certain populations, so the SmartPlate should be validated for such foods before use in populations consuming culturally diverse foods. Relatedly, our study focussed mostly on single food types rather than rice/pasta/salad dishes that would contain multiple ingredients and may be classified with lower accuracy by the SmartPlate. A user might be able to search for such foods in the SmartPlate food database or scan a food barcode (if the food is packaged) to improve accuracy in classifying these foods. Another potential limitation of the SmartPlate is its practicality, as it would need to be carried to school, work, and other settings in which a person eats, and its size could make it cumbersome to transport.
It also requires access to and knowledge of how to use a smartphone, precluding its use in some populations. Finally, as with any consumer technology, software in the SmartPlate is likely to get updated (and hopefully improved) over time, so it will be important for users to test the SmartPlate for their intended use to make sure it works well for their needs.
In conclusion, the SmartPlate had acceptable accuracy and both inter-plate and intra-plate reliability for the assessment of food weight and food type in a preliminary validation in a laboratory setting. Intra-plate reliability was especially strong, suggesting that the SmartPlate may be best suited to tracking individual behaviours over time. Further work should assess accuracy and reliability of the SmartPlate for use in specific populations with special dietary intakes. Additionally, research should evaluate feasibility considerations to understand what types of individuals/populations might realistically be able to use the SmartPlate for dietary assessment. Finally, past research has shown that using smaller utensils and bowls can assist in controlling portion sizes and aid in weight loss (Vargas-Alvarez et al. 2021). Since the SmartPlate has limited-size trays, future studies should evaluate whether the tray sizes, coupled with instant dietary insights from the mobile application, promote portion control and help contribute to weight loss.