Iterative Learning-Based Robotic Controller With Prescribed Human–Robot Interaction Force

Abstract—In this article, an iterative learning-based robotic controller is developed, which aims at providing a prescribed assistance or resistance force to the human user. In the proposed controller, the characteristic parameter of the human upper limb movement is first learned by the robot using the measurable interaction force, a recursive least squares (RLS)-based estimator, and the Adam optimization method. Then, the desired trajectory of the robot can be obtained, by tracking which the robot can supply the human's upper limb with a prescribed interaction force. Using this controller, the robot automatically adjusts its reference trajectory to embrace the differences between human users with diverse upper limb movement characteristics. By designing a performance index in the form of an interaction force integral, potential adverse effects caused by time-related uncertainty during the learning process can be addressed. The experimental results demonstrate the effectiveness of the proposed method in supplying the prescribed interaction force to the human user.

Note to Practitioners—This article concentrates on developing a novel control technique to make the robot supply a prescribed interaction force to the human user in the presence of time-related uncertainties. The proposed control method is applicable to various scenarios of human-robot interaction, e.g., it can be used for rehabilitation robots to provide assistive or resistive force to stroke patients or for exoskeleton robots to provide assistive force to human users for completing heavy-load tasks. Moreover, the desired interaction force can be tailored for different human users according to their needs and different task objectives. Consequently, the proposed controller can serve diverse users and has a promising perspective in automation.

Index Terms—Robotic controller, human-robot interaction, force tracking, iterative learning.

I. INTRODUCTION
In recent decades, considering human participation has become prevailing in robotics research due to the complementary strengths of the human's intelligence and flexibility and the robot's high efficiency and repeatability [1]-[3]. Common scenarios of human-robot interaction include rehabilitation [4], small-batch manufacturing [5], training [6], surgery [7], etc. Within this context, it is vital to develop efficient robotic controllers to enable robots to cater to various human needs [8]-[10].

This work was supported by the UK EPSRC grant EP/T006951/1. (Corresponding author: Yanan Li.) Xueyan Xing, Kamran Maqsood, and Yanan Li are with the Department of Engineering and Design, University of Sussex, Brighton BN1 9RH, UK (emails: xx92@sussex.ac.uk; k.maqsood@sussex.ac.uk; yl557@sussex.ac.uk). Deqing Huang is with the School of Electrical Engineering, Southwest Jiaotong University, Chengdu, P. R. China (email: elehd@home.swjtu.edu.cn). Chenguang Yang is with Bristol Robotics Laboratory, University of the West of England, Bristol BS16 1QY, UK (email: cyang@ieee.org).
The robotic control objectives for human-robot interaction can be generally classified into two types: trajectory tracking and force tracking. In some scenarios, such as the early stage of rehabilitation training, the robotic controller is required to guide the human arm to move along a desired trajectory. In [11], regarding the force applied by the patient as a disturbance, a nonlinear sliding mode controller is designed to make the upper limb exoskeleton robot track a given trajectory. Based on [11], a hybrid control strategy combining PID control and sliding mode control is developed in [12] for the upper limb exoskeleton robot to help patients with improved trajectory tracking performance in the presence of environmental disturbances. In [13], using physically interactive trajectory deformation and traditional impedance control, both the desired and actual trajectories of the robot are modulated to comply with the human intention without constant human guidance.
Besides trajectory tracking, force tracking also plays a significant role in human-robot interaction in many medical and industrial applications. For instance, in the latter stage of rehabilitation training, the patient is encouraged to be actively involved and the rehabilitation robot is expected to provide an assistive or resistive force for the impaired limb. Besides, the upper limb exoskeleton robot, which is used for human assistance in some heavy-load industrial tasks, needs to adapt its movement to provide the human limb with a prescribed interaction torque [14]. In [15], an admittance controller is proposed based on an established Denavit-Hartenberg coordinate frame to provide proper assistance for the limb according to the need of the patient. Considering the non-parametric uncertainty and the position constraint, an iterative controller with varying parameters is developed in [16] to supply a prescribed force to the impaired upper limb. In these works, an assistive/resistive force is provided by the robot to the human with a predefined reference trajectory. However, a fixed reference trajectory may conflict with different patients' movement characteristics, which reduces the initiative of human participation and thus decreases the anticipated interaction effect. Therefore, it is essential to develop a robotic controller that ensures the desired interaction performance (usually defined by a desired interaction force) without a predefined reference trajectory.
Iterative learning control (ILC) is an effective technique used for achieving desired tracking performance of a control system with uncertain dynamics by repetitive operation [17]-[19]. It can be considered as a multipass control process where the transient performance is repetitively improved over fixed iteration durations. To overcome the limitation of fixed learning periods, the ILC approach has been further improved for various control systems with time-varying trial lengths [20]-[22]. Since ILC is independent of the dynamic model of the control system, it has been widely used for human-robot interaction control design [23]-[25]. Based on ILC, the robot can adapt its trajectory in [26] by minimizing the interaction force error so that it can correctly infer the human intention and efficiently cooperate with the human user. In [27], an iterative learning scheme is developed, with which the robot can learn the desired trajectory of the human user in the presence of uncertain iteration periods. The methods in [26], [27] update the robot's parameters point to point and are thus inherently limited in their capability to deal with uncertainties.
In view of the need for achieving a given interaction force, a novel robotic controller is developed in this paper to provide the human user with a prescribed interaction force, without the need to predefine a fixed reference trajectory. Using the measured interaction force, a recursive least squares (RLS)-based estimator, and the Adam optimization method, the robot adjusts its controller iteratively according to the human user's dynamic characteristic parameter, so that the prescribed desired force between the robot and the human user can be achieved. As the proposed controller does not presume a fixed reference trajectory, it allows the human user to be actively involved in the interaction task with his/her own movement pattern. Before estimating the desired trajectory of the robot, the dynamic characteristic parameter of the human user is first estimated, so the proposed iterative learning-based robotic controller enables the robot to provide prescribed interaction forces to different human users without knowledge of their dynamics.
Besides, due to the human-in-the-loop nature of human-robot interaction, uncertain human movements have to be taken into account when designing the robotic controller. The traditional ILC applied to human-robot interaction generally updates the reference trajectory of the robot in a point-to-point manner [28]-[30], so that the learning speed of the robot, i.e. the iteration period, needs to be fixed and the signal used for updating, i.e. the interaction force, needs to be free from uncertainty to ensure good learning performance. However, these two conditions are hard to meet in practice. The human user is subject to hand tremor or environmental disturbance, so the interaction force signal is typically noisy. Moreover, the learning speed is also hard to keep constant due to the varying speed of the human user. Motivated by these considerations, a performance index function in an integral form is used for iterative learning, and our proposed iterative learning-based robotic controller updates the reference trajectory of the robot in a cycle-to-cycle manner instead of the traditional point-to-point manner in [28], [29]. With our proposed controller, potential adverse effects caused by time-related uncertainty during the movement process (e.g. due to hand tremor and varying learning speed) are avoided. As a result, the proposed control method has better control performance and robustness compared with the traditional methods in [28], [29], which is verified by experiments.
The rest of this paper is organized as follows. Section II gives a dynamic model of human-robot interaction and formulates the control problem studied in this paper. In Section III, an iterative learning-based robotic controller is developed to provide the human user with a desired force. Simulation results are presented in Section IV, and experimental results and analysis are given in Section V. Brief concluding remarks and future work are listed in Section VI.

II. PROBLEM FORMULATION
In this section, the dynamic model of the human-robot interaction system is given and the objective of interaction force control is briefly introduced, to prepare for the proposed learning controller in Section III.

A. Dynamic Model
In rehabilitation of human patients with motor dysfunction due to, e.g., stroke or cerebral palsy, the patients cannot complete a (repetitive) training task on their own. In a repetitive industrial process, e.g., loading and off-loading, human operators can save effort and improve working efficiency with the help of the robot. In tasks like these, it is essential for the robot to provide proper assistive/resistive force/torque to the human.
As shown in Fig. 1, we consider a simplified human-robot interaction scenario, where a human holds the handle of a planar robot and moves his/her upper limb to complete a certain task. The task can be defined as following a trajectory or reaching a target position. Without loss of generality, the human user is allowed to follow his/her own motion planning and move freely in an n-dimensional space. The robot's actual trajectory is x(t) ∈ R^n, its reference trajectory is x_r(t) ∈ R^n, the human's desired trajectory is x_h(t) ∈ R^n, and x_d(t) ∈ R^n is the robot's desired trajectory that will be defined later. An n-degree-of-freedom (n-DOF) robot interacting with a human's upper limb is modeled as

M(q(t)) q̈(t) + C(q(t), q̇(t)) q̇(t) + G(q(t)) + D(q̇(t)) = F(t) + J^T(t) f_h(t)    (1)

where q(t) ∈ R^n denotes the robot's joint position vector; M(q(t)) ∈ R^{n×n} is the symmetric positive definite inertia matrix; C(q(t), q̇(t)) ∈ R^{n×n} is the centrifugal force matrix; G(q(t)) ∈ R^n represents the force vector caused by gravity; D(q̇(t)) ∈ R^n is the friction term; F(t) ∈ R^n is the robot's control input vector; J(t) ∈ R^{n×n} is the n-dimensional Jacobian matrix; and f_h(t) ∈ R^n is the interaction force between the robot and the human.
To facilitate the following analysis, the dynamics of the robot in Eq. (1) is rewritten in Cartesian space as

m(t) ẍ(t) + c(t) ẋ(t) + g(t) + d(t) = F(t) + f_h(t)    (2)

where x(t) denotes the robot trajectory in Cartesian space and m(t), c(t), g(t), and d(t) are the Cartesian-space counterparts of the terms in Eq. (1), obtained via the Jacobian J(t). The interaction force, which can be measured using a force sensor, can be modeled as [31]

f_h(t) = k_h (x_h(t) − x(t))    (8)

where k_h ∈ R^{n×n} is the non-singular stiffness matrix of the human arm. Note that both k_h and x_h(t) are unknown to the robot and they are not used in the robot's controller. The model in Eq. (8) will be used for analysis purposes only.
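As a quick numerical illustration of the spring model in Eq. (8), the sketch below computes the interaction force for hypothetical values of k_h, x_h(t), and x(t) (all numbers are illustrative, not taken from the paper):

```python
import numpy as np

def interaction_force(k_h, x_h, x):
    """Spring model of Eq. (8): f_h = k_h (x_h - x)."""
    return k_h @ (x_h - x)

# Hypothetical 1-D values: stiffness 120 N/m, human hand 5 cm ahead of robot
k_h = np.array([[120.0]])
x_h = np.array([0.30])
x = np.array([0.25])
print(interaction_force(k_h, x_h, x))   # about 6 N pulling the robot forward
```

Note that f_h vanishes exactly when the robot's position coincides with the human's desired trajectory, which is what the learning scheme in Section III exploits.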

B. Problem Statement
Different humans, or the same human in different tasks, may move their limbs in different ways even if they are asked to follow the same trajectory or reach the same target position. In this regard, a predefined reference trajectory for the robot cannot ensure desired interaction. In the following, we formulate this problem by defining the prescribed interaction with a desired interaction force f_d(t) ∈ R^n, which can be set according to different individual needs. Then, we explain how the robot should automatically update its reference x_r(t) to achieve f_d(t).
The desired interaction force set by the designer (and thus known to the robot) can be modeled as

f_d(t) = k_h (x_h(t) − x_d(t))    (9)

where x_d(t) is the desired trajectory corresponding to f_d(t).
It is noted that Eq. (9) is similar to Eq. (8), with x(t) replaced by x_d(t). In other words, Eq. (8) is a general model of the interaction force f_h(t), while Eq. (9) is the special case when the robot's trajectory is x_d(t), yielding the desired force f_d(t). Observing Eq. (9), x_d(t) has to vary with x_h(t) to attain the desired force f_d(t). If the desired trajectory x_d(t) can be obtained by the robot, it can provide the desired interaction force to the human by letting x_r(t) = x_d(t). However, x_d(t) cannot be directly calculated based on Eq. (9) since the human's parameters k_h and x_h(t) are unknown. Therefore, in this paper we propose an iterative learning method to make the robot learn the desired trajectory x_d(t) and then set the robot's reference trajectory x_r(t) as x_d(t), so that the prescribed desired force f_d(t) can be obtained.
The proposed learning controller is divided into two parts: first, the characteristic parameter of the human limb is obtained by using an RLS-based estimator and Adam optimization method; second, in the same way, the robot's desired trajectory x d (t) is estimated, with which the robot provides a desired interaction force.
Remark 1: The aforementioned characteristic parameter denotes the parameter that determines the human's trajectory. It is unique for each human and varies with different tasks. For instance, for the minimum jerk model given in Eq. (39) in Section IV, the movement duration is the characteristic parameter of the human's trajectory.

III. ITERATIVE LEARNING-BASED ROBOTIC CONTROLLER
In this section, an iterative learning-based robotic controller is developed to provide the human with a certain desired interaction force known by the robot.

A. Robot Controller Design
A robot controller combining feedforward and feedback control is given as

F(t) = f_m(t) + f_n(t)    (10)

where f_m(t) is the feedforward component and f_n(t) is the feedback component, which are respectively proposed in Eqs. (11) and (12), with the tracking error defined as

e(t) = x(t) − x_r(t)    (13)

and positive control parameters c_d and k_d, which can also be regarded as the equivalent damping and stiffness of the impedance control model [28]. Multiplying Eq. (2) by m^{−1}(t) yields Eq. (14); then, substituting Eqs. (10)-(12) into Eq. (14), the impedance control model in Eq. (15) can be obtained, where f_e(t) is the force tracking error defined as

f_e(t) = f_h(t) − f_d(t).    (16)
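The exact feedforward and feedback laws in Eqs. (11) and (12) are not reproduced here; as a hedged sketch, a plausible PD-type feedback on the tracking error e(t), with c_d and k_d acting as the equivalent damping and stiffness of the impedance model, could look like:

```python
def feedback_force(x, x_dot, x_r, x_r_dot, c_d, k_d):
    """Plausible PD form of the feedback component f_n (an assumption:
    the paper's exact expression for Eq. (12) is not reproduced here),
    with e = x - x_r and e_dot its derivative."""
    e = x - x_r
    e_dot = x_dot - x_r_dot
    return -c_d * e_dot - k_d * e

# 1-DOF example with illustrative gains: robot is 1 cm ahead of x_r at rest
print(feedback_force(0.26, 0.0, 0.25, 0.0, c_d=5.0, k_d=100.0))
```

With such a law, a positive position error produces a restoring force toward the reference, which is the behavior the impedance model in Eq. (15) formalizes.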

B. Iterative Learning
In this section, an iterative learning method is proposed to adapt the robot controller to the needs of different humans.
Using Eq. (9), x_d(t) can be expressed as

x_d(t) = x_h(t) − k_h^{−1} f_d(t).    (17)

If k_h and x_h(t) in Eq. (17) were available, x_d(t) could be easily obtained, following which f_d(t) could be achieved by the robot. However, due to the limitation of ILC that will be mentioned in Assumption 1 later, k_h and x_h(t) cannot be simultaneously estimated and they need to be estimated separately. In view of this, Section III-B-1) mainly focuses on estimating the characteristic parameter of the limb defined in x_h(t), and the estimation of x_d(t) will be given in Section III-B-2).

1) Estimation of Characteristic Parameter of Upper Limb:
The first step of iterative learning is to estimate the characteristic parameter of the upper limb using an RLS-based estimator and the Adam optimization method. Denote the human's desired trajectory as x_h(t, η_h), where η_h ∈ R^n is a vector representing the characteristic parameter of the upper limb.
Setting f_d(t) = 0 in Eq. (17) yields x_d(t) = x_h(t). Define η_d ∈ R^n as the target parameter that needs to be learned by the robot; then x_d(t) can be parameterized as x_d(t, η_d). For η_d-learning, let the reference trajectory of the robot x_r(t, η) have the same form as x_h(t, η_h); the robot's reference trajectory used in the jth iteration is then denoted by x_r,j(t, η_j). Substituting Eqs. (8) and (9) into Eq. (16) yields

f_e(t) = k_h (x_d(t) − x(t)).    (18)

From Eq. (18), it can be observed that when the tracking error e(t) is made small enough using the controller proposed in Eq. (10) with properly selected control parameters c_d and k_d, f_e(t) can be regarded as an η-related function denoted by f_e(t, η). Similarly, f_h(t) can also be denoted as f_h(t, η) since it is an x(t)-related function determined by x_r(t, η). If the force error f_e(t, η) decreases to zero in the case of f_d(t) = 0, x_h(t, η_h) = x_r(t, η) approximately holds due to Eqs. (17) and (18), and the corresponding value of η can be regarded as equivalent to η_d. As a result, driving f_e(t, η) to zero under f_d(t) = 0 is the objective for η_d-learning.
To reduce the undesirable effect caused by time-related uncertainty, such as the human's hand tremor, a parameterized performance index function H(η) ∈ R^+ is defined in an integral form as

H(η) = ∫_0^T f_e^T(t, η) f_e(t, η) dt    (19)

where T is the iteration period, and the learning task in this paper can then be mathematically generalized as

η_d = arg min_η H(η).    (20)

Remark 2: In this part, to estimate the characteristic parameter of the upper limb, the prescribed interaction force is set as zero, i.e. f_d(t) = 0, such that f_e(t, η) = f_h(t, η) holds. If H(η) is minimized by making η equal to η_d, then f_e(t, η), which is also f_h(t, η), equals zero. As a result, when the limb moves in this force-free condition, η_d is learned by the robot.
Remark 3: Since H(η) takes an integral form in this paper, the adverse effect caused by time-related uncertainty, such as hand tremor occurring in all iteration periods with similar amplitude and frequency, can be attenuated, which endows the proposed controller with better robustness and stability.
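To see numerically why the integral form helps, the sketch below approximates H(η) by a Riemann sum over one iteration period (the quadratic integrand is an assumption consistent with the form of Eq. (19)); a zero-mean tremor contributes a near-constant offset to H in every iteration, so it barely shifts the minimizing η, whereas point-to-point updates would react to each noisy sample:

```python
import numpy as np

def performance_index(f_e, dt):
    """Riemann-sum approximation of H = integral of f_e^T f_e dt
    over one iteration period (scalar force, 1-DOF case)."""
    return float(np.sum(f_e ** 2) * dt)

t = np.linspace(0.0, 1.0, 1001)
dt = t[1] - t[0]
f_e = 0.5 * np.sin(2 * np.pi * 8 * t)   # 8 Hz tremor-like force error
print(performance_index(f_e, dt))
```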
Based on Eq. (19), the performance index function used for the jth iteration is given as

H(η_j) = ∫_0^T f_e,j^T(t, η_j) f_e,j(t, η_j) dt    (21)

where f_e,j(t, η_j) = f_h,j(t, η_j) holds in this part and f_h,j(t, η_j) is obtained by making the robot track its reference trajectory x_r,j(t, η_j).
After designing the performance index function, the RLS-based estimator is designed. Minimizing the performance index function described in Eq. (20) can be viewed as an optimization problem that has already been well covered in the literature. However, most optimization methods require the gradient of the index function [32]-[34] to iteratively update η_j. This part introduces an RLS-based estimator that has the same structure as the one designed in [35] to estimate the gradient of H(η_j), and the Adam optimization method is applied to iteratively find the optimal solution of Eq. (20).
Before introducing the estimator, the following two assumptions are needed to guarantee its performance.
Assumption 1: The initial trajectory tracking error e(0) is irrelevant to iterations and the performance index can be expressed by

H(η) = χ^T(η) σ′    (22)

where χ(η) ∈ R^m is the regressor vector and σ′ ∈ R^m is a coefficient vector.

Assumption 2: The designed performance index H(η) in Eq. (19) is differentiable and convex, and there exists a positive constant ς making its gradient Lipschitz continuous, i.e.

‖∇H(η_1) − ∇H(η_2)‖ ≤ ς ‖η_1 − η_2‖, ∀ η_1, η_2.    (23)

With Assumptions 1 and 2, an RLS-based estimator is proposed to estimate H(η_j), which is re-expressed as an explicit function of η_j so that its gradient can be directly calculated.
The estimator used in the jth iteration is given as

Ĥ(η_j) = χ^T(η_j) σ_j    (24)

where χ(η_j) ∈ R^m is the regressor vector for the jth iteration and σ_j ∈ R^m is obtained using RLS with a constant forgetting factor ε as

σ_j = σ_{j−1} + κ_j χ(η_j) (H(η_j) − χ^T(η_j) σ_{j−1})    (25)

κ_j = (1/ε) [κ_{j−1} − (κ_{j−1} χ(η_j) χ^T(η_j) κ_{j−1}) / (ε + χ^T(η_j) κ_{j−1} χ(η_j))]    (26)

where 0 < ε ≤ 1 is a constant and κ_j ∈ R^{m×m} is the inverse of the weighted sample covariance matrix, whose initial value is positive definite.
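Eqs. (24)-(26) can be sketched numerically as follows. The update below uses the standard RLS-with-forgetting form (an assumption about the exact gain expression in the paper), and fits a quadratic toy index to show how σ_j converges to the true coefficients:

```python
import numpy as np

def rls_update(sigma, kappa, chi, H_meas, eps=0.99):
    """One step of an RLS estimator with forgetting factor eps for the
    model H_hat(eta) = chi(eta)^T sigma; variable names follow the paper,
    the gain expression is the standard RLS form (assumed)."""
    chi = chi.reshape(-1, 1)
    denom = eps + (chi.T @ kappa @ chi).item()
    gain = (kappa @ chi) / denom
    err = H_meas - (chi.T @ sigma).item()
    sigma = sigma + gain.flatten() * err        # coefficient update
    kappa = (kappa - gain @ chi.T @ kappa) / eps  # inverse-covariance update
    return sigma, kappa

# Toy index H(eta) = 2 + 3*eta + eta^2 with regressor chi = [1, eta, eta^2]
rng = np.random.default_rng(0)
sigma, kappa = np.zeros(3), 100.0 * np.eye(3)
for _ in range(200):
    eta = rng.uniform(-2.0, 2.0)
    chi = np.array([1.0, eta, eta ** 2])
    sigma, kappa = rls_update(sigma, kappa, chi, 2.0 + 3.0 * eta + eta ** 2)
print(np.round(sigma, 3))   # converges to the true coefficients [2, 3, 1]
```

Once σ_j has converged, Ĥ(η) = χ^T(η) σ_j is an explicit function of η whose gradient is available in closed form, which is exactly what the Adam step needs.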
Then the following theorem holds.

Theorem 1: If Assumptions 1-2 are satisfied, the RLS-based estimator in Eqs. (24)-(26) can estimate H(η_j) and its gradient when the number of iterations is large enough, i.e., lim_{j→∞} (Ĥ(η_j) − H(η_j)) = 0 and lim_{j→∞} (∇Ĥ(η_j) − ∇H(η_j)) = 0 hold.

The proof of Theorem 1 can be found in the Appendix. In the RLS-based estimator in Eqs. (24)-(26), χ(η_j) should be properly selected according to the form of the designed performance index function.
The gradient of Ĥ(η_j) can be calculated as

∇Ĥ(η_j) = (∂χ(η_j)/∂η_j)^T σ_j.    (27)

Using the estimated gradient in Eq. (27), the Adam optimization method [36] is used as follows to find the optimal η by iteratively minimizing H(η):

ω_j = p_ω ω_{j−1} + (1 − p_ω) ∇Ĥ(η_j)    (28)

ϱ_j = p_ϱ ϱ_{j−1} + (1 − p_ϱ) ‖∇Ĥ(η_j)‖²    (29)

ω̂_j = ω_j / (1 − p_ω^j)    (30)

ϱ̂_j = ϱ_j / (1 − p_ϱ^j)    (31)

η_{j+1} = η_j − υ ω̂_j / (√ϱ̂_j + γ)    (32)

where ω_j ∈ R^n and ϱ_j ∈ R are the moment vector and constant, respectively, updated by positive constant decay rates p_ω and p_ϱ; ω̂_j and ϱ̂_j are bias-corrected estimations of ω_j and ϱ_j; γ is a positive constant to avoid singularity; and υ is a positive constant representing the learning rate.
Compared with the conventional gradient descent method, the Adam optimization method has faster convergence and better learning performance [36], [37]. Moreover, it can effectively avoid the undesirable problems existing in the conventional gradient descent method, such as decay of the learning rate. With the Adam method in Eqs. (28)-(32), whose convergence analysis is given in detail in [37], H(η) can be iteratively minimized to zero and lim_{j→∞} η_j can be regarded as the optimal solution of Eq. (20). Then we have

η̂_d = η̂_h = lim_{j→∞} η_j    (33)

where η̂_h = η̂_d is the estimate of the learning parameter η_d, i.e. of the limb's characteristic parameter η_h. With η̂_h, H(η̂_h) is reduced to a small neighborhood of zero such that η̂_h ≈ η_d = η_h and x_h(t, η_h) ≈ x_h(t, η̂_h) approximately hold, which means that x_h(t, η̂_h) can fully reflect the limb's movement characteristics.
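A minimal sketch of the Adam iteration in Eqs. (28)-(32), using the paper's parameter names (decay rates p_ω, p_ϱ, learning rate υ, singularity-avoidance constant γ); the quadratic toy objective and its analytic gradient stand in for the RLS-estimated gradient ∇Ĥ(η_j), and the standard Adam update form is assumed:

```python
import numpy as np

def adam_step(eta, grad, omega, rho, j, p_omega=0.3, p_rho=0.3,
              upsilon=0.2, gamma=0.1):
    """One Adam iteration in the spirit of Eqs. (28)-(32)."""
    omega = p_omega * omega + (1 - p_omega) * grad        # 1st moment, Eq. (28)
    rho = p_rho * rho + (1 - p_rho) * float(grad @ grad)  # 2nd moment, Eq. (29)
    omega_hat = omega / (1 - p_omega ** j)                # bias corrections,
    rho_hat = rho / (1 - p_rho ** j)                      # Eqs. (30)-(31)
    eta = eta - upsilon * omega_hat / (np.sqrt(rho_hat) + gamma)  # Eq. (32)
    return eta, omega, rho

# Toy objective H(eta) = (eta - 1)^2, analytic gradient 2*(eta - 1)
eta, omega, rho = np.array([1.5]), np.zeros(1), 0.0
for j in range(1, 101):
    eta, omega, rho = adam_step(eta, 2 * (eta - 1.0), omega, rho, j)
print(np.round(eta, 3))
```

Note that γ bounds the effective step size even when the second-moment estimate is tiny, which is the singularity-avoidance role described above.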
2) Estimation of Desired Trajectory of Robot: With the estimation result η̂_h in Eq. (33) and the estimation method in Section III-B-1), the desired trajectory x_d(t) of the robot is then estimated to make the robot reach the prescribed interaction force.
According to Eq. (17), the desired trajectory is rewritten as

x_d(t, µ_h) = x_h(t, η̂_h) − µ_h f_d(t)    (34)

where µ_h = k_h^{−1} is unknown to the robot and so needs to be estimated, and x_h(t, η̂_h) is defined using η̂_h.
To learn the parameter µ_h of x_d(t, µ_h) in Eq. (34), the reference trajectory of the robot is redefined in Eq. (35) to have the same form as x_d(t, µ_h), and is denoted by x_r,j(t, η_j) in Eq. (36) in the jth iteration. This part intends to estimate µ_h in Eq. (34), with which the desired trajectory x_d(t, µ_h) becomes available. In particular, the prescribed interaction force is allowed to be time-varying, which is in line with various needs in different tasks.
To estimate µ_h, the learning parameter is redesigned as η = [µ_h^1, ..., µ_h^n]^T, where µ_h^i is the ith diagonal element of µ_h, whose off-diagonal elements can be regarded as zero, and the vector of the iteration variable η_j in x_r,j(t, η_j) is updated to approach η_d in each iteration. Observing Eq. (18), as f_h(t) → f_d(t) and f_e(t) → 0 are realized by iterative learning with a small tracking error e(t), x_r(t, η) → x_d(t, η_d) and η → η_d are satisfied, which implies that the desired trajectory will be learned by minimizing H(η_j) in Eq. (21), as described by Eq. (20). As a result, the design of the performance index function used in each iteration, the RLS-based estimator, and the Adam optimization method in this part are the same as the ones in Eq. (21), Eqs. (24)-(26), and Eqs. (28)-(32) in Section III-B-1).
Remark 4: The difference between estimating η_d = η_h in Sections III-B-1) and 2) is that the definition of x_r,j(t, η_j) is different. To estimate η_h, which represents the characteristics of the upper limb, x_r,j(t, η_j) is defined to have a similar form as x_h(t, η_h) in Section III-B-1). To estimate µ_h, aiming at obtaining the desired trajectory, x_r,j(t, η_j) defined in Eq. (36) has the same form as x_d(t, µ_h) in Eq. (34). By tracking different x_r,j(t, η_j), different f_h,j(t, η_j) is achieved to calculate the performance index H(η_j), which can correctly reflect the estimation effect to find the optimal estimation values.

Denote µ̂_h = lim_{j→∞} diag(η_j) in this part; then x_d(t, µ_h) can be accurately estimated by

x̂_d(t, µ̂_h) = lim_{j→∞} x_r,j(t, η_j)    (38)

where x_r,j(t, η_j) is defined using x^i_r,j(t, η_j) in Eq. (36). Since x_d(t, µ_h) is effectively estimated in Eq. (38), the robot can supply the human with a prescribed interaction force by tracking x̂_d(t, µ̂_h).

Fig. 2 reflects the realization of the iterative learning process, which can be regarded as the realization of the high-level controller of the human-robot interaction system. The iterative learning process can be summarized in two main steps:

Step 1 (learning the characteristic parameter of the upper limb): Set f_d(t) = 0 and iteratively estimate η_h using the performance index function H(η_j) in Eq. (21), which is obtained by making the robot track x_r,j(t, η_j), the RLS-based estimator in Eqs. (24)-(26), and the Adam optimization method in Eqs. (28)-(32). As j → ∞, the value of η_j in Eq. (32) is the estimate of η_h, denoted by η̂_h and obtained by Eq. (33).
Step 2 (learning the trajectory to provide the human with the prescribed interaction force): Set f_d(t) as the prescribed interaction force and iteratively estimate µ_h, which is regarded as a diagonal matrix, using η̂_h estimated in Step 1, the performance index function in Eq. (21) obtained by making the robot track x_r,j(t, η_j) defined in Eq. (36), the RLS-based estimator in Eqs. (24)-(26), and the Adam optimization method in Eqs. (28)-(32). As j → ∞, the elements of η_j in Eq. (32) can be considered as the estimates of the diagonal elements of µ_h, denoted by µ̂_h. Then the estimated desired trajectory x̂_d(t, µ̂_h) = lim_{j→∞} x_r,j(t, η_j) is achieved using Eq. (38).
Let the robot iteratively follow x_r,j(t, η_j); eventually, the robot reaches the prescribed interaction force by iteratively interacting with the human.
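The procedure above can be simulated end-to-end in a toy 1-DOF setting. The sketch below reproduces Step 1 only: each "trial" assumes perfect low-level tracking and the spring model of Eq. (8), and a finite-difference gradient step stands in for the RLS-plus-Adam update (all constants are illustrative, not from the paper):

```python
import numpy as np

def trial_index(eta, eta_h=1.0, k_h=100.0, dt=0.01):
    """One simulated iteration of Step 1: the robot tracks x_r(t, eta)
    perfectly, the human follows a minimum jerk x_h(t, eta_h), and
    H = integral of f_h^2 dt is returned (f_d = 0). Toy values only."""
    t = np.arange(0.0, 1.5, dt)
    mj = lambda tau: 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    x_h = 0.2 * mj(np.clip(t / eta_h, 0.0, 1.0))
    x_r = 0.2 * mj(np.clip(t / eta, 0.0, 1.0))
    f_h = k_h * (x_h - x_r)        # spring model, Eq. (8)
    return float(np.sum(f_h ** 2) * dt)

# Minimize H(eta) over the movement duration eta; a finite-difference
# gradient step stands in for the RLS estimator plus Adam.
eta = 1.5                          # initial guess for the duration
for _ in range(300):
    g = (trial_index(eta + 1e-3) - trial_index(eta - 1e-3)) / 2e-3
    eta -= 1e-4 * g
print(round(eta, 3))               # approaches the true duration eta_h = 1.0
```

Once η converges, Step 2 proceeds identically but with a nonzero f_d(t) and the reference reparameterized as in Eq. (36).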
Fig. 3 illustrates the proposed iterative learning-based robotic controller, which includes both the high-level controller that generates the reference trajectory of the robot and the low-level controller that makes the robot follow the reference trajectory.
Remark 5: For each human, x_d(t, µ_h) only needs to be estimated once in each session of the movement process. Once the robot learns the desired trajectory, x̂_d(t, µ̂_h) will be recorded and the robot just needs to track x̂_d(t, µ̂_h) until a new prescribed interaction force is set.
Remark 6: Since the learning method in this paper can be regarded as an optimization problem, even if η̂_h significantly deviates from the actual value, we can still find corresponding µ̂_h and x̂_d(t, µ̂_h) to let the robot provide the desired interaction force with a deviation. In practice, due to physical limits such as the limb length and movement speed of the limb, the learning error is generally limited.
In the proposed strategy, the desired force f_d(t) in Eq. (21) should be set according to the specific need of the human user and is usually different for different individuals. Since only the interaction force f_h(t) and the current position x(t) are needed by the robotic controller, which can be respectively measured using a force sensor and a position sensor, the developed iterative learning-based robotic controller is easy to implement and is robust against the human's uncertainty, as discussed in Remark 3.

IV. SIMULATIONS
To validate the proposed iterative learning-based robotic controller, a scenario of latter-stage rehabilitation using a 1-DOF rehabilitation robot is simulated in this section, where n = 1 and the robot is expected to offer a resistive interaction force to the human. The human's trajectory x_h(t, η_h) is defined using the following minimum jerk model [38], [39]:

x_h(t, η_h) = x_0 + (x_f − x_0)(10 τ_h^3(t) − 15 τ_h^4(t) + 6 τ_h^5(t))    (39)

where x_0 and x_f are the start and terminal positions of the training, which are known to the robot, τ_h(t) = t/η_h, and η_h denotes the movement duration.
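The minimum jerk model of Eq. (39) can be written directly as a function of the movement duration η_h:

```python
import numpy as np

def minimum_jerk(t, x0, xf, eta_h):
    """Minimum jerk trajectory of Eq. (39) with tau_h = t / eta_h;
    eta_h (the movement duration) is the characteristic parameter.
    Clipping tau_h to [0, 1] holds the limb at xf after the movement."""
    tau = np.clip(t / eta_h, 0.0, 1.0)
    return x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)

t = np.linspace(0.0, 1.0, 5)
print(minimum_jerk(t, 0.0, 0.2, 1.0))   # rises smoothly from 0.0 to 0.2
```

The polynomial starts and ends with zero velocity and acceleration, which is why this profile is a common model of point-to-point human reaching movements.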
For the 1-DOF robot, we let m = 3 and design χ(η_j) in Eq. (24) as a polynomial regressor χ(η_j) = [1, η_j, η_j^2]^T, and σ_j in Eq. (24) is denoted as σ_j = [σ_j^1, σ_j^2, σ_j^3]^T, so that the gradient of Ĥ(η_j) can be calculated as ∇Ĥ(η_j) = σ_j^2 + 2 σ_j^3 η_j. The parameters of the RLS-based estimator in Eqs. (24)-(26) and the Adam optimization method in Eqs. (28)-(32) are designed as ε = 0.1, p_ω = 0.3, p_ϱ = 0.3, γ = 0.1, υ = 0.2. During the learning process, each iteration period is set as 1 s, and the uncertainty of the human's hand tremor is considered and simulated by adding Gaussian noise with mean value 0 and variance 0.25 N to the interaction force f_h(t).

A. Estimating Characteristic Parameter of the Upper Limb
As stated in Section III-B-1), to estimate the characteristic parameter of the upper limb, the prescribed interaction force is set as zero, i.e. f d (t) = 0 N.
For η_h-learning, let the reference trajectory of the robot x_r(t, η) have the same form as x_h(t, η_h) in Eq. (39), i.e.,

x_r(t, η) = x_0 + (x_f − x_0)(10 τ^3(t) − 15 τ^4(t) + 6 τ^5(t))    (40)

where τ(t) = t/η. The robot's reference trajectory used in the jth iteration is then denoted by x_r,j(t, η_j), with τ_j(t) = t/η_j and the initial value η_0 = 1.5 s. Then, using the RLS-based estimator designed in Eqs. (24)-(26) and the iterative Adam optimization method in Eqs. (28)-(32), the simulation results are shown in Figs. 4 and 5, where Fig. 4 represents the reference trajectory of the robot in three iterations and the interaction force in the 8th iteration, and Fig. 5 illustrates the performance index function and the estimated characteristic parameter of the upper limb during iterations. In Fig. 4(a), the robot iteratively adjusts its reference trajectory x_r,j(t, η_j) to estimate the desired training trajectory x_d(t, η_d), which equals x_h(t, η_h) in the case of f_d(t) = 0 N as indicated by Eq. (17). After 8 iterations, it can be observed that the reference trajectory of the robot coincides with the desired trajectory of the human, i.e., x_r,j(t, η_j) = x_h(t, η_h) approximately holds for j ≥ 8, which is in line with the theoretical result. The zero interaction force is achieved subject to vibrations caused by the hand tremor, as shown in Fig. 4(b). With the proposed learning method, the performance index function H(η_j) decays with iterations in Fig. 5(a) and the estimate of the characteristic parameter converges to 0.9991 s with the estimation error ∆η_h = η_h − η̂_h = 9 × 10^{−4} s. From Fig. 5(b), it is observed that when the interaction force f_h(t) of the limb reaches zero after 8 iterations, η_h is accurately estimated by the proposed method.

B. Estimation of the Desired Training Trajectory of the Robot
With the estimate η̂_h = 0.9991 s obtained in Section IV-A, the desired trajectory of the robot is estimated so that the human achieves the prescribed interaction force. Two cases of f_d(t) are considered: case 1, f_d(t) = 5 sin 0.5t N; case 2, f_d(t) = 5 sin (π/2)t N. The simulation results of cases 1 and 2 with η_0 = 0 m/N are illustrated by Figs. 6-7 and Figs. 8-9, respectively. Figs. 7 and 9 show that the proposed robotic controller makes the estimate η_j converge after 7 iterations in case 1 and 10 iterations in case 2, with a decreasing performance index H(η_j). By tracking the ultimately convergent reference trajectory, denoted by the blue lines in Figs. 6(a) and 8(a), i.e., x_r,7(t, η_7) in case 1 and x_r,9(t, η_9) in case 2, the robot can always provide the prescribed interaction force to the human, as indicated by Figs. 6(b) and 8(b), through the trajectory difference between the human and the robot. Inspecting Figs. 4-9 confirms that, with the proposed iterative learning-based robotic controller, the robot can supply the prescribed interaction force to the human. Moreover, since the prescribed interaction force is achieved in the presence of the human's hand tremor, the proposed robotic controller preserves robustness against the uncertainties that are inevitable in human-robot interaction.
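The overall learning loop of Section IV can be mimicked with a toy model: the human follows a minimum-jerk path of duration η_h, the interaction force is approximated as the stiffness k_h times the trajectory mismatch, and the duration estimate is refined by gradient descent on the integral performance index. The force model and the plain gradient step below are simplifications standing in for Eqs. (24)-(32), not the paper's actual update law.

```python
import numpy as np

def min_jerk(t, eta, xf=0.3):
    # Standard minimum-jerk profile (assumed form of Eq. (39)).
    tau = np.clip(t / eta, 0.0, 1.0)
    return xf * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)

def perf_index(eta_j, eta_h=1.0, k_h=100.0):
    """Integral performance index H(eta_j) for f_d(t) = 0 N: the squared
    interaction force (modelled as stiffness times trajectory mismatch)
    integrated over the iteration period."""
    t = np.linspace(0.0, 2.0, 400)
    f_h = k_h * (min_jerk(t, eta_h) - min_jerk(t, eta_j))
    return float(np.sum(f_h**2) * (t[1] - t[0]))

eta_j, step = 1.5, 5e-4
for _ in range(300):                         # one gradient step per iteration
    grad = (perf_index(eta_j + 1e-4) - perf_index(eta_j - 1e-4)) / 2e-4
    eta_j -= step * grad
```

Under this toy model the estimate settles near η_h = 1 s, mirroring the convergence of η_j in Fig. 5(b).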

V. EXPERIMENTS
In this section, experiments are used to evaluate the performance of the proposed iterative learning-based robotic controller. The experimental setup is first introduced in Section V-A, with a complete description and the detailed parameters of the experimental platform. Section V-B describes the robot's learning results in estimating the characteristic parameter of the upper limb. The estimation of the desired trajectory of the robot is implemented in Section V-C, where the desired interaction force is achieved and a comparison with two existing control methods is made to illustrate the advantage of the proposed control method.

A. Experimental Setup
The experimental setup shown in Fig. 10 comprises an H-Man robot, a force transducer, a handle-mounted force sensor, and a graphical user interface (GUI). The H-Man robot is a 2-DOF planar robot, and its flexible workspace makes it suitable for human-robot interaction experiments. The platform has a movement range of 342 mm × 330 mm, with real-time and command resolutions of 0.1 mm and 1 mm, respectively. A maximum static force of 20 N can be applied to the platform through the handle, which is mounted on a flat surface and moves on an H-shaped linear slider driven by rotary motors and a cable mechanism. Additionally, a 6-DOF ATI force/torque sensor is mounted at the bottom of the handle to measure the user's forces in two directions. A C#-based GUI is connected to the H-Man clients and the force transducer through an RJ45 connection; the initial reference trajectory, the impedance parameters, and the coordinate calibration can be set through the GUI. Using this experimental setup, the proposed approach unfolds in two parts: the estimation of the human's upper limb characteristics and the estimation of the desired trajectory of the robot.
The human user was asked to repetitively move forward along the X-axis in a comfortable manner, with random hand tremor and at normal speed, to test the effectiveness of the proposed robotic controller. The stiffness and damping of the robot are set as k_d = 1000 N/m and c_d = 400 Ns/m, respectively. The parameters of the RLS-based estimator in Eqs. (24)-(26) and the Adam optimization method in Eqs. (28)-(32) are set as ε = 0.03, p_ω = 0.2, p_ϱ = 0.2, γ = 0.01, υ = 0.75. χ(η_j) in Eq. (24) is designed as in Eq. (40) for this experiment, so that the calculated ∇Ĥ(η_j) has the same form as Eq. (42).

Fig. 10. Experimental setup for the human-robot interaction. A human user holds the handle (including a force sensor) mounted on an H-shaped planar robotic platform. The movement of the handle is driven by the force from the human's upper limb and the force generated by the embedded motors. A desired interaction force is produced by changing the reference trajectory of the robot.
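The low-level controller of Eq. (10) is not reproduced in this excerpt; the following spring-damper law with the stated k_d and c_d is a common impedance-control form and is assumed here purely for illustration.

```python
def impedance_force(x, x_dot, x_r, x_r_dot, k_d=1000.0, c_d=400.0):
    """Assumed spring-damper impedance law: the robot renders a force
    that pulls the handle toward the reference trajectory x_r(t).
    k_d = 1000 N/m and c_d = 400 Ns/m are the experiment's settings."""
    return k_d * (x_r - x) + c_d * (x_r_dot - x_dot)
```

With these gains, a 0.1 m position error alone already produces a 100 N restoring force, so in practice the reference trajectory stays close to the handle position.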

B. Estimation of Characteristic Parameter of the Upper Limb
At this stage, the dynamic behavior of the involved upper limb is captured by estimating the characteristic parameter η_h. The human's desired trajectory is described by the general minimum-jerk model [38], [39] in Eq. (39), with the movement duration η_h as the characteristic parameter of the upper limb. The robot's reference trajectory in each iteration is represented by Eq. (44), with η_j iteratively updated from η_0 = 0 s. The start and end positions x_0 and x_f in the minimum-jerk model in Eq. (39) are set as x_0 = 0 m and x_f = 0.2 m.
The performance of the estimation is reflected by Figs. 11 and 12: the iterative desired trajectories of the robot are shown in Fig. 11, and the interaction forces over the iterations are given in Fig. 12. To verify whether the proposed robotic controller is applicable to a human user with unstable periodic characteristics, a disturbance is exerted on the interaction force by the hand tremor of the human user; this kind of disturbance is common in rehabilitation scenarios. Moreover, 0.3-0.7 N of noise is normally present in the force measurement. Therefore, when the human-robot interaction force in Fig. 12 stays within a threshold of 0.5 N, denoted by the red dotted line, zero interaction force is considered to be achieved. Figs. 11 and 12 show that the reference trajectory of the robot converges in the 7th iteration, and the robot enables the human user to move freely along the X-axis with a small interaction force. Consequently, as explained in Section III-1), the robot successfully learns the characteristic parameter of the human's upper limb in the 7th iteration.
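The 0.5 N zero-force criterion can be expressed as a simple check; applying the threshold to the maximum absolute force over an iteration is an assumption about how the criterion is evaluated, not a detail given in the paper.

```python
def zero_force_reached(f_samples, threshold=0.5):
    """Declare zero interaction force once every force sample of an
    iteration lies inside the +/-0.5 N band attributed to the
    0.3-0.7 N measurement noise (red dotted line in Fig. 12)."""
    return max(abs(f) for f in f_samples) <= threshold
```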
The estimated characteristic parameter of the limb and the performance index H(η_j) are illustrated by Figs. 13(a) and 13(b), respectively. According to Fig. 13, H(η_j) iteratively decreases and converges to a range close to zero, and the estimated value is η̂_h = 3.511 s.
In the 7th iteration, the human realizes free movement with approximately zero interaction force, where the vibration is intentionally introduced by the hand tremor of the human subject.

C. Estimation of the Desired Trajectory of the Robot
The estimated characteristic parameter η̂_h = 3.5111 s obtained in the previous section is used in the update law in Eq. (44) with η_0 = 0 m/N to estimate the desired trajectory of the robot. In Fig. 11, the reference trajectory x_r,7 of the robot in the 7th iteration, represented by the dotted dark brown line, can be regarded as x̂_h(t, η̂_h) according to Eqs. (17) and (18) and the approximately zero interaction force with f_d(t) = 0 in Section V-B. In this subsection, we compare three control methods for generating the desired interaction force: the impedance control in [28], the adaptive force control in [29], and the control strategy proposed in this paper.
The existing impedance control in [28] is designed as Eq. (10) without updating the reference trajectory of the robot. The adaptive force control in [29] is also used for comparison: its low-level control is the same as the controller given in Eq. (10), and its high-level control, i.e., the iterative learning law of x_r,j(t), updates the reference trajectory with a constant coefficient γ representing the learning speed, which is set as γ = 0.5.
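The high-level learning law of [29] updates the reference trajectory sample-by-sample from the force error. A commonly used form is sketched below, with γ = 0.5 as stated; the exact law in [29] may differ, so this is an assumed form for illustration. Because each sample of the noisy force error enters the next reference trajectory directly, measurement noise accumulates over iterations, which is the behavior criticized later in this section.

```python
import numpy as np

def ilc_update(x_r, f_err, gamma=0.5):
    """Point-to-point iterative learning update (assumed form of the
    law in [29]): each time sample of the reference trajectory is
    shifted in proportion to the force error at that same instant."""
    return x_r + gamma * f_err
```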
Two cases are considered in this section. The first case is to achieve an assistive force, which is useful when the robot assists the human user in completing a repetitive task, e.g., sawing. The second case is to achieve a resistive force, which is useful in robot-based rehabilitation, where the robot should resist the human motion so that the human user has to actively contribute to completing the task.
1) Assistive mode: A desired interaction force is prescribed for the assistive mode. The estimation results of the desired robot reference trajectory and the human-robot interaction force using the above-mentioned impedance control, adaptive force control, and proposed control method are shown in Figs. 14 and 15, respectively.
In Fig. 14, the estimated x_h obtained in the previous subsection serves as the initial trajectory to be followed by the robot. With the proposed control and the existing adaptive force control, this trajectory is iteratively updated, since both controllers are iterative; as a result, the finally obtained desired trajectory x̂_d differs from x_h under these two controls. In contrast, with the traditional impedance control used for comparison, only the impedance of the robot is updated and the reference trajectory is maintained. Consequently, with the existing impedance control, the estimated x_h is also the final desired trajectory x̂_d, and the two overlap.
From Fig. 15, the impedance control in [28] has the worst performance: it lacks the capability of adjusting the robot's reference trajectory, and the desired interaction force cannot be obtained no matter which impedance parameters are chosen. Compared with the pink line representing x̂_h(t, η̂_h) estimated in Section V-B, there is an obvious upward shift of the estimated desired trajectory x̂_d(t, η̂_d) with the proposed control method in Fig. 14, which is in line with Eq. (34). With this desired trajectory difference between the robot and the human, the prescribed interaction force is generated in the assistive mode. Compared with the adaptive force control method in [29], the estimated desired trajectory obtained by our method is smoother and free of vibration, since x̂_d(t, η̂_d) is iteratively updated according to the estimated parameter η_j and the integral performance index function H(η_j); consequently, the adverse periodic effect of the interaction force f_h,j(t) is canceled. In contrast, the adaptive force control method in [29] updates the robot's reference trajectory x_r,j(t) in a point-to-point manner according to the noisy interaction force, so the obtained x̂_d(t) contains undesirable noise that accumulates over the iterations and ultimately leads to a large interaction force error, as shown in Fig. 15. From Figs. 14 and 15, it is clearly seen that, despite the intentionally added hand tremor, the interaction force still reaches the desired level with our proposed control strategy, which indicates its robustness.
Fig. 16 shows the performance index function H(η_j) in the assistive mode, which reflects the level of the interaction force tracking error. The decrease of H(η_j) over the iterations means that the designed controller successfully adjusts the robot's reference trajectory in the direction that decreases the force error and effectively achieves the desired interaction force. As observed, H(η_j) converges after 8 iterations.
2) Resistive mode: In this case, the resistive mode is considered to test the proposed control method: the robot provides resistance during the motion, so that the human user is actively involved in the task, and the desired interaction force is prescribed accordingly.
Similar to the assistive mode, the robot's estimated desired trajectory is shown in Fig. 17, and the interaction force in the resistive mode is illustrated in Fig. 18. Compared with the estimated x̂_h(t, η̂_h), there is a downward deviation of the estimated desired trajectory x̂_d(t, η̂_d) with the proposed control in Fig. 17, which confirms Eq. (34). As Figs. 17 and 18 show, the estimated desired trajectory obtained by our proposed control provides the desired resistive interaction force. The performance of the existing impedance control in [28] and the adaptive force control in [29] is also revealed by Figs. 17 and 18: the traditional impedance control produces an unsatisfactory interaction force far from the desired one, while the adaptive force control shows vibration in the estimated desired trajectory x̂_d(t) and a non-negligible interaction force tracking error caused by the vibrating interaction force exerted by the trembling human hand. It can be seen from Fig. 18 that the interaction force tracking performance is clearly better with the proposed control approach than with the two existing control methods.
Besides, the performance index function H(η_j) is stabilized at a value near zero in Fig. 19 and converges after 7 iterations in the resistive mode.

Fig. 14. Estimated desired trajectories of the robot using the impedance control in [28], the adaptive force control in [29], and the proposed control in the assistive mode. Due to its lack of capability to adjust the desired trajectory of the robot, x_d(t) of the impedance control is designed to be the same as x̂_h(t) obtained in Section V-B in both the assistive and resistive modes.

Fig. 15. Interaction force f_h in the last iteration using the impedance control in [28], the adaptive force control in [29], and the proposed control in the assistive mode.

D. Repetitive Experiments
To further test the performance of the proposed controller, the whole experimental process, including the estimation of η_h and x_d(t, η_d), is repeated five times under the same conditions.
Errors between the actual and the desired interaction forces are computed for the five trials, for both the estimation of the characteristic parameter of the upper limb in Section V-B and the estimation of the desired trajectory of the robot in Section V-C, including the assistive and resistive modes. To clearly display the performance of the proposed control, a universal filter [40] is used to process the data and remove the vibration of the interaction force tracking error caused by the hand tremor. For the estimation of η_h, the final interaction force tracking errors in the five trials are shown in Fig. 20, where different trials are plotted in different colors. From Fig. 20, it is clear that the force errors remain around zero, which illustrates that η_h can be effectively estimated in all five trials. Similarly, the interaction force tracking errors in the five trials for the assistive and resistive modes are presented in Figs. 21(a) and 21(b), from which it is clearly seen that the force errors finally converge to small ranges around zero, indicating satisfactory interaction force tracking performance. From Fig. 22, the estimated characteristic parameter η̂_h is consistent at about 3.5 s across the five repetitive trials. Average values of the final interaction force tracking errors in the five trials for both the assistive and resistive modes are shown in Fig. 23, which shows that the average value of f_e(t) remains below 0.8 N even in the presence of the human user's hand tremor in both the assistive and resistive cases. The residual interaction force tracking error is mainly caused by the force measurement noise of around 0.3-0.7 N.

Fig. 17. Estimated desired trajectories of the robot using the impedance control in [28], the adaptive force control in [29], and the proposed control in the resistive mode.

Fig. 18. Interaction force f_h in the last iteration using the impedance control in [28], the adaptive force control in [29], and the proposed control in the resistive mode.
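The universal filter of [40] is not specified in this excerpt; as a stand-in, a simple moving average of the force-error trace illustrates the kind of post-processing used to suppress tremor-induced vibration before plotting. The window length is an arbitrary choice for this sketch.

```python
import numpy as np

def smooth(f_err, window=25):
    """Moving-average stand-in (hypothetical) for the filter of [40],
    used only to suppress tremor-induced vibration in the force-error
    traces before plotting; edge samples see a truncated window."""
    kernel = np.ones(window) / window
    return np.convolve(f_err, kernel, mode="same")
```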

VI. CONCLUSIONS
In this paper, an iterative learning-based robotic controller is developed to provide a prescribed interaction force to the human upper limb. The implementation of the proposed controller includes two steps: first, the characteristic parameter of the upper limb is estimated using the designed RLS-based estimator and the Adam optimization method; then, with the estimated characteristic parameter, the robot's reference trajectory is estimated and updated to yield the prescribed interaction force in line with the specific requirements of different tasks. Since the proposed iterative controller updates the trajectory with an integral performance index function, it preserves robustness against time-related uncertainty, as illustrated by the comparative experiments. Although the proposed method offers better robustness and better performance, it needs two steps to achieve the interaction force, which makes it somewhat complicated to implement. In future work, we will focus on developing a learning method that enables the robot to simultaneously estimate the characteristic parameter and the desired trajectory, so that the learning efficiency is further improved. Besides, the input saturation of the robotic actuators used for human-robot interaction [41], [42] will also be considered in future work.

APPENDIX

Using Assumption 1, Eq. (26) can be equivalently stated as

σ_j − σ_{j−1} = [κ_j χ(η_j) χ^T(η_j) / (ε + χ^T(η_j) κ_j χ(η_j))] (σ′_j − σ_{j−1}),

where I_m ∈ R^{m×m} is an identity matrix. Since the following equations hold,

‖κ_j χ(η_j) χ^T(η_j)‖ = ‖χ^T(η_j) κ_j χ(η_j)‖ = χ^T(η_j) κ_j χ(η_j),  (52)

where the Euclidean 2-norm is used, it follows that

‖I_m − κ_j χ(η_j) χ^T(η_j) / (ε + χ^T(η_j) κ_j χ(η_j))‖ < 1.
Consequently, Theorem 1 has been completely proved.
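The key scalar fact behind the appendix can be checked numerically: for any positive-definite gain κ_j and nonzero χ(η_j), the ratio λ = χ^T κ_j χ / (ε + χ^T κ_j χ) lies strictly in (0, 1), so the estimation-error component along χ(η_j) is contracted by the factor 1 − λ at each update. The sketch below verifies this for random positive-definite κ; it is a numerical illustration of the bound, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1                                    # epsilon of the RLS-based estimator
m = 3
for _ in range(100):
    a = rng.standard_normal((m, m))
    kappa = a @ a.T + 1e-9 * np.eye(m)       # random positive-definite gain
    chi = rng.standard_normal(m)
    quad = float(chi @ kappa @ chi)          # chi^T kappa chi > 0
    lam = quad / (eps + quad)
    assert 0.0 < lam < 1.0                   # hence |1 - lam| < 1: contraction
```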

Fig. 1. A simplified scenario of human-robot interaction, where the human user holds the handle (end-effector) of a planar robot.

Fig. 3. Block diagram of the proposed iterative learning-based robotic controller.

Parameters of the robot are selected as m = 5 kg, c = 0.5 Ns/m, g = 50 N, and J = 1 kg·m². The actual training duration and the stiffness of the human are set as η_h = 1 s and k_h = 100 N/m, respectively. The start and terminal positions of the training defined in Eq. (39) are set as x_0 = 0 m and x_f = 0.3 m. The control parameters are selected as c_d = 2000 Ns/m and k_d = 3000 N/m.

Fig. 4. Reference trajectory of the robot x_r,j(t, η_j) in (a) and interaction force f_h,j(t, η_j) in (b) in Section IV-A.

Fig. 5. Performance index function H(η_j) in (a) and estimated characteristic parameter of the upper limb η_j in (b) in Section IV-A.

Fig. 6. Reference trajectory of the robot x_r,j(t, η_j) in (a) and interaction force f_h,j(t, η_j) in (b) in case 1 of Section IV-B.

Fig. 9. Performance index function H(η_j) in (a) and estimated parameter η_j in (b) in case 2 of Section IV-B.

Fig. 13. Performance index function H(η_j) in (a) and estimated characteristic parameter of the upper limb η_j in (b).

Fig. 20. Interaction force tracking errors in five repeated experimental trials under the proposed control scheme in the estimation of the characteristic parameter of the upper limb in Section V-B.

Fig. 21. Interaction force errors in five repeated experimental trials under the proposed control scheme in the estimation of the desired trajectory of the robot, including both the assistive and resistive modes, in Section V-C.