Video parameters for action observation training in stroke rehabilitation: a scoping review

Abstract

Purpose: Action observation training (AOT) is a therapeutic approach used in stroke rehabilitation. Videos form the core of AOT, and knowledge of constituent parameters is essential to make the intervention robust and generalizable. Currently, there is a dearth of available information on video parameters to be used for AOT. Our purpose was to identify and describe the parameters that constitute AOT videos for stroke rehabilitation.

Method: Electronic databases (PubMed, CINAHL, Scopus, Web of Science, ProQuest, and Ovid SP) were searched from inception to date according to PRISMA-ScR guidelines. Title, abstract, and full-text screening were done independently by two authors, with a third author for conflict resolution. Data on video parameters like length, quality, perspective, speed, screen size and distance, sound, and control videos were extracted.

Results: Seventy studies were included in this review. The most-reported parameters were video length (85.71%) and perspective of view (62.85%). Movement speed (7.14%) and sound (8.57%) were the least reported. Static landscapes or geometrical patterns were found suitable as control videos.

Conclusion: Most video parameters except for length and perspective of view remain underreported in AOT protocols. Future studies with better descriptions of video parameters are required for comprehensive AOT interventions and result generalisation.

Implications for rehabilitation: Videos shorter than 5 min may be preferred during action observation training (AOT) interventions post-stroke. Egocentric view may be better for upper limb dexterity function and allocentric view for gross actions like walking. Choice of video disseminating device depends on its dimension as well as observer distance. Movement speed, video sound, and quality must be considered to obtain more comprehensive AOT videos.


Introduction
Stroke causes direct damage to the cerebral cortex and disrupts corticospinal connections [1]. This results in reduced corticospinal excitability in the affected cerebral hemisphere [1]. Post-stroke brain repair is seen to occur through neuroplastic processes like axonal sprouting and re-mapping of cognitive functions [2]. Neurorehabilitation relies on the capacity of the individual to "re-learn" for functional recovery to occur [2]. Advances in stroke rehabilitation towards inducing these positive neuroplastic changes led to the use of priming techniques like action observation training (AOT) [3].
Research on neurophysiology suggests that observation of a novel activity and execution of the action engage the same functional brain networks [4]. AOT is a form of motor imagery that entails observation of goal-oriented activities (for example, forward stepping) and subsequent imitation of the same action to learn activity execution [4,5]. Improvements in motor function like gross mobility, dexterity, balance, and walking capacity have been observed with post-stroke AOT [6]. Additional advantages of AOT include tailoring of intervention to specific patient needs, the possibility of use for persons with severe physical limitations, and the ability of independent use by recipients [4,7].
Currently, AOT is administered as a video-based rehabilitation technique. These training videos, therefore, form the core of AOT intervention [4,6]. An in-depth knowledge of constituent parameters like length and quality of videos, dimensions of disseminating devices, the form of control used, perspective of view, movement speed, and audio assistance is essential for acquiring comprehensive and reproducible videos that may benefit clinical practice as well as research. Early research has briefly touched upon intervention duration and the need for use of different movement speeds and perspectives of movement visualisation [4,7]. A possible reason for the disparity in the videos used in existing AOT protocols could be the use of varied combinations of the aforementioned parameters [6]. A recent review by Ryan et al. emphasised this existing heterogeneity of interventions while simultaneously pointing out that a consensus on optimal parameters for AOT implementation is yet to be achieved [7].
This heterogeneity of intervention videos makes it difficult to compare the research evidence, in addition to creating barriers for use of AOT in clinical practice. Consequently, it has become crucial to collate information on relevant AOT video parameters for stroke rehabilitation. Thus, the purpose of this review was to identify the parameters that are currently used to create AOT videos, the extent to which they have been reported, and to detect the gaps in information regarding the same in current protocols.

Methods
The authors used the scoping review methodology to determine the extent of information available on the video components constituting AOT videos for stroke rehabilitation. The Arksey and O'Malley framework, expanded upon by Levac et al., was used as a guide [8,9], and the review process followed the PRISMA Extension for Scoping Reviews (PRISMA-ScR) guidelines [10].

Research question
The overarching question of this review was as follows: What parameters constitute the core of AOT videos used in stroke rehabilitation? This broad research question was further divided into (a) Which video parameters are commonly reported in AOT protocols? (b) How well are video parameters described in AOT interventions? (c) What form of control group videos are reported for AOT studies in the stroke population?

Identification of studies
A comprehensive search strategy was created to search the available literature. Electronic databases (PubMed/MEDLINE, CINAHL, Scopus, Web of Science, Ovid SP, and ProQuest) were searched with various combinations of keywords and subject headings for stroke, action observation, and video parameters using Boolean operators AND, OR, and NOT as and when applicable (see Supplementary Data).
The initial search was run in January 2020 and updated on 7 October 2022. All searches were limited to human studies on the adult stroke population published in the English language. Articles in other languages were excluded as the authors' proficiency was limited to English. The identified records were extracted to the reference management software Zotero for the removal of duplicate records from multiple databases.

Selection of studies
Titles and abstracts of all unique records were scrutinised to select relevant articles for full-text screening. Randomised controlled trials (RCTs), non-RCTs (NRCTs), pre-post, single subject, cross-sectional studies, case reports, and study protocols on stroke rehabilitation were included. The criteria for inclusion were any original research focused on action observation as the sole intervention compared against a control or as a co-intervention modality. Secondary research such as narrative reviews, systematic reviews, and meta-analyses was excluded.
Two authors (AB and PR) individually scrutinised the titles and abstracts based on the inclusion criteria. A subsequent review of all relevant full-text records was done individually by the same two authors (AB and PR) and any conflict arising was resolved by a third reviewer (NM). Rayyan Systems Inc. software was used to screen and select articles [11]. Figure 1 illustrates the steps involved in the study selection process.

Data extraction
A data charting form was created to extract information on exercises used during therapy, mode of video dissemination, video parameters considered by authors of the primary studies, and the gaps existing in current research in that domain. The data charting form was updated throughout the review process to include relevant information, as is the norm with scoping review methodology. Authors AB and PR independently undertook data extraction in duplicate, while the accuracy of the extraction process and consistency of data was cross checked by NM (Supplementary Data).

Synthesis of results
The results were reported as a descriptive numerical summary that highlighted the number and type of studies included, and the frequency of video parameters reported. The video parameters to be studied were identified by the authors a priori and the description of each parameter is provided in Table 1.

Results
A total of 37,793 records were obtained from the databases. After de-duplication, 24,447 unique records were identified and their titles and abstracts were screened. Subsequently, 134 full-text articles were reviewed for eligibility and 70 studies were included in this review.
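The attrition between screening stages follows directly from the counts above. The snippet below is a minimal sketch of that arithmetic; the stage labels and the `attrition` helper are illustrative names, not part of the review protocol, and the sketch assumes no records entered the flow from sources other than the databases.

```python
# Record counts as reported in the Results; stage labels are illustrative.
flow = [
    ("records identified", 37_793),
    ("unique records after de-duplication", 24_447),
    ("full-text articles assessed", 134),
    ("studies included", 70),
]


def attrition(stages):
    """Return (from_stage, to_stage, records_dropped) for each consecutive pair."""
    return [
        (a_name, b_name, a_n - b_n)
        for (a_name, a_n), (b_name, b_n) in zip(stages, stages[1:])
    ]


steps = attrition(flow)
for src, dst, dropped in steps:
    print(f"{src} -> {dst}: {dropped:,} dropped")
```

Under these assumptions, the three differences correspond to duplicates removed, records excluded at title and abstract screening, and full-text articles excluded.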

Video parameters considered in AOT interventions
Most studies (92.8%) included in this review reported video parameters as a part of their protocol or intervention. Five studies mention action observation, but do not describe any video constituents used [24,52,53,62,78]. The frequency of various video parameters reported in the included studies can be inferred from Figure 2.
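As a rough check on the reporting frequencies, the percentages can be recomputed from study counts over the 70 included studies. This is a sketch only: the counts for perspective, quality, sound, and speed are those stated in the Results, while the video length count of 60 is an assumption back-calculated from the reported 85.71%. The `percent` helper and dictionary keys are illustrative names.

```python
TOTAL_STUDIES = 70

# Number of studies reporting each parameter.
reported = {
    "any video parameter": 65,  # 70 minus the five studies describing none
    "video length": 60,         # ASSUMPTION: back-calculated from 85.71%
    "perspective of view": 44,
    "video quality": 7,
    "sound": 6,
    "speed of movement": 5,
}


def percent(count, total=TOTAL_STUDIES):
    """Share of included studies, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


frequencies = {name: percent(n) for name, n in reported.items()}
for name, value in frequencies.items():
    print(f"{name}: {value}%")
```

Rounding to two decimals gives 62.86% for perspective of view and 92.86% for any parameter, so the 62.85% and 92.8% quoted in the text appear to be truncated rather than rounded values.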
used 105 eight-second-long clips, bringing their total video length to 14 min [76]; while Franceschini et al. used 40 four-second videos repeated over three sets for a total of 8 min [75].
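The total durations quoted above follow from clip count, clip length, and number of sets. A minimal sketch of that conversion is given here; `total_minutes` is a hypothetical helper and assumes clips play back to back with no pauses between them.

```python
def total_minutes(n_clips, clip_seconds, sets=1):
    """Total viewing time in minutes, assuming no gaps between clips or sets."""
    return n_clips * clip_seconds * sets / 60


# 105 eight-second clips shown once [76]
first_protocol = total_minutes(105, 8)

# 40 four-second clips repeated over three sets [75]
second_protocol = total_minutes(40, 4, sets=3)

print(first_protocol, second_protocol)
```

Both values agree with the 14 min and 8 min totals reported for the two protocols.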

Perspective of view
AOT videos display movements using either first- or third-person perspective or a combination of the two. First-person or egocentric perspective is the view wherein the recipient's limbs are anatomically matched with the model's limbs in the video (for example, a person with right hemiparesis watches videos of models moving their right arm/leg, giving them a sense of performing the movement themselves). Third-person or allocentric perspective is where the observer watches a mirror image of themselves in the videos (for example, a person with right hemiparesis watches videos showing models moving their left arm/leg, as if they are looking at a mirror). Perspective of view, mentioned as either camera positioning with respect to models seen in the videos or direct terms like egocentric and allocentric, was noted in 44 included studies [ [54,66], and two NRCTs [49,63].

Video quality
Seven studies described the quality of videos used in their interventions [40,45,46,57,70,74,75]. Two studies used a video resolution of 1080 pixels [45,75]. Oh et al. mention their videos being run at 30 frames per second (fps) [40]. Choi et al. reported that their videos comprised 920 × 1080 pixels with a vertical refresh rate of 60 Hz [46] while Marangon et al. used a superior resolution of 1280 × 1024 pixels with a refresh frequency of 75 Hz [57]. The latter gave a more detailed account of their video quality in terms of animation effects ("series of single frames, each lasting 30 s"). These frames had a resolution of 720 × 576 pixels, a color depth of 24 bits, and a frame rate of 30 fps. Lastly, Hsieh et al. used a webcam having a resolution of 1280 × 720 with a 60° field of view [74], while Huang et al. provided AOT in combination with VR via HMD having a resolution of 1440 × 1600 per eye [70].

Sound
Auditory assistance during video observation was mentioned in two RCTs [16,35], three pre-post studies [50,63,70], and a single-subject experimental study [67]. The type of audio used included: video sound to supplement the intervention [67], "synchronized voice prompt" along with the AOT actions [50], voice instructions and movement narration, and a metronome guide via smartphone applications for initiating stepping [67,70]. Finally, Motaqhey et al. specifically mention using "silent clips" in their intervention, implying the lack of auditory assistance, but the authors give no further explanation [63].

Speed of movement
Five studies described the speed of movements shown in AOT videos [14,22,35,36,47]. While the majority of videos were displayed using a combination of normal and half of normal speed of motion [14,35,36,47], Kim and Kim mention the use of fast motion, though the details regarding the degree of fast motion are not known [22]. It is interesting to note that only Park HJ et al. and Park HR et al. have explained that the "speed" mentioned refers to the filming speeds and not the speed at which the movements were attempted [35,36].
On the other hand, three RCTs [17,19,79] displayed sequences of geometric patterns, alphabets, and numerals as control videos while two pre-post studies on hemiparetic upper extremity function presented images of a still hand [55,56]. An EEG study looking at MNS activation used non-biological movement in the form of a rolling ball as the control condition [72]. Lastly, Franceschini et al. used static pictures of nonbiological objects [18], Kim JS et al. a relaxation program consisting of a 10-min stretching routine [22], and Kang et al. a "comfort video" devoid of human movements [50].

Discussion
This review sought to synthesise the evidence on parameters of AOT videos used in post-stroke rehabilitation. As many as eight parameters were identified in this review. While a parameter such as video length was reported in about 86% of the included evidence, information on others like speed of movement, video quality, and auditory assistance was scarce.
One reason for maximal reporting of video length used in AOT interventions could be that it aided in the quantification of the total intervention duration. Brennan et al. proposed that videos as short as 2.5 min were effective in the provision and retention of knowledge in older adults having a mean age of 63 years [83]. This may be the reason for use of short individual clips during therapy. Among the earliest work on AOT in stroke rehabilitation, Buccino et al. have previously endorsed the use of 3-minute video clips for the observation phase of AOT [4,47]. Mentioning individual clip duration makes it easier to quantify the number of repetitions of each activity observed during the intervention. However, inconsistency persists regarding the optimal video length required for AOT.
Coming to the perspective of view, the use of first-person-view or egocentric perspective during AOT is preferred because it ensures higher corticomotor excitability [15,21,57]. Additionally, this view provides the impression of observers performing the movement themselves, as the orientation of the model and movement direction are the same [20,21,58]. This coherence facilitates movement understanding, especially for learning complex tasks requiring dexterity. Conversely, lower limb AOT research has used an allocentric perspective of view, perhaps owing to the gross nature of lower limb movements and low skill requirement. An allocentric view may thus be prudent in these cases, for ease of video recording and better visualisation of joint movements during gait and balance activities.
Parameters pertaining to the video viewing device, like its dimensions and distance from the recipients, are important considerations for AOT. Aging influences visual acuity, accommodation, and capacity for complex visual tasks [84]. Hashimoto et al. claim that larger screen sizes ensure maintenance of older adults' attention towards the video content [84]. Although there exists a trend favouring shorter screen-to-eye distance for smaller-sized screens, the optimum distance was found to be around 165-200 cm irrespective of the screen size used [85]. However, this research was done on healthy individuals and the inference may not be extrapolated to stroke patients. AOT studies on the stroke population have used a shorter distance (average of 100 cm), which may be to emphasise the primary object in the videos more clearly and maintain participants' attention towards the desired activity. This has been corroborated in previous research where closer shots were required for smaller objects like humans in the frame [86]. It may be prudent to use bigger screens in case of stroke rehabilitation to ensure better attention to training and clarity of visualisation.
The opinion on the usefulness of sound in AOT is divided in the existing literature. Motaqhey et al. emphasised the use of silent videos to focus on stimulation of the visual sensation alone [63], while others believe that auditory cueing, either therapist-guided or cueing within the videos, resulted in better movement understanding and execution [16,35,50]. Evidence suggests that the use of "musical rhythm" results in meaningful motor improvements, owing to the incorporation of multiple senses and establishing a tempo that is crucial for motor execution [16,87]. However, many studies mention auditory cueing as a form of instructions given to the participants during AOT, apart from Huang et al. and Jung et al. who fail to report on the same [67,70].
Speed of movement is another parameter considered for showing activities during AOT. The rationale behind playing movements at half-speed might be to improve attention and executive processing in individuals with brain damage and to aid in the registration of the correct motor patterns [14,35]. Research on healthy individuals has elucidated that although cortical excitability increases on observation of fast movements, there essentially is no difference in results when mere playback speed is altered [88]. This could be a factor for use of normal speed of movement in most AOT studies. We believe that unless the activities are performed at different speeds while recording, it may not be necessary to process the videos to play at different speeds to yield better results.
Video quality, although an important consideration for all video-based therapies, remains an underreported parameter in AOT. Studies that reported video quality mention using 30 frames per second, which is as per television streaming standards [89]. The resolution of videos was kept high since the studies projected their videos on liquid crystal display monitors. Visual perception of a video depends on factors like output resolution and screen size, in addition to other technical factors [90]. Since a majority of researchers have not focused on explaining video resolution, it is difficult to comment upon the clarity of their intervention videos and the effect on recipients' visual perception.
Finally, as all AOT RCTs require a suitable placebo, the type of videos to be observed by the control group must be curated carefully. The basic objective of using videos for control groups is to maintain the homogeneity of intervention [91]. AOT aims at regaining motor function through mirror neuron system (MNS) activation and any kind of MNS activity while observing control videos is undesirable. Since biological movements could have an unknown effect on the MNS and hinder the results of therapy, studies have ensured the absence of biological (human or animal) components in those videos [18,20,35,36,76]. Evidence suggests that even untrained activity or motion can activate the MNS, which justifies the use of geometric patterns, letters, or symbols that do not alter MNS activity [17,79].
Quality assessment of the included studies was not undertaken, as is the norm with scoping reviews. To the best of our knowledge, this is the first review of its kind to delve into the need for acquiring extensive information on parameters constituting the core of AOT for stroke rehabilitation. We believe that further clarity on these parameters will lead to crisper and more concise interventions that will benefit therapists and recipients alike. Moreover, standardisation of AOT video structure may render the results of future trials more robust and generalisable. Certain parameters like the use of transitive or intransitive actions, testing of attention, and segmentation of videos according to task complexity were not considered in our review and could be considered for future research.

Conclusion
Video lengths ranging from 1 to 30 min and egocentric views were the commonly reported parameters in AOT studies, while most of the other parameters remain underreported. Future studies with comprehensive descriptions of video parameters are required for making AOT intervention robust and generalisable. Furthermore, AOT trials could explore the effects of different video parameters on patient-related outcomes.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Table 1. Definition of video parameters considered during AOT video construction.

Video parameter | Description
Video length | Time duration of the videos that the participants observed in one session. This could be individual video clips in seconds, as used in single-session experimental studies, or total AOT video duration in intervention programs.
Perspective of view | The angle from which the movement was recorded/shown in the videos, either mentioned indirectly as the camera angle (i.e., front, back, sideways) or directly as allocentric (third-person perspective) or egocentric (first-person perspective).
Screen size | Dimensions of the device that displayed the videos during the intervention. Any description of the size of the screen or device is accounted for (e.g., computer screen, tablet, mobile phone, etc.).
Screen distance | Distance of the video displaying device from the eye level of the participant.
Control videos | Any description of the content used as control videos during the intervention.
Sound | Any form of audio aid used in addition to the video clips in the intervention videos.
Speed of movement | The speed at which the movements in the videos are played. It could be the speed at which the movements are performed by the models in the video or the speed at which the videos are processed for the participants to watch during the intervention.
Video quality | The quality of videos in the form of pixels, refresh frequency, animation effects, or any other descriptor of the same.

Figure 2. Video parameters reported in action observation training studies.

Table 2. Video parameters described in AOT studies.