Composite Index Construction with Expert Opinion

Abstract
A composite index is a powerful and popular tool for providing an overall measure of a subject by summarizing a group of measurements (component indices) of different aspects of the subject. It is widely used in economics, finance, policy evaluation, performance ranking, and many other fields. Effective construction of a composite index has been studied extensively. The most widely used approach is a linear combination of the component indices, where the combination weights are determined by optimizing an objective function. To maximize the overall variation of the resulting composite index, the combination weights can be obtained through principal component analysis. In this article, we propose to incorporate expert opinions into the construction of the composite index. Expert opinion often provides useful information in assessing which of the component indices are more important for the overall measure of the subject. We consider the case in which a group of experts has been consulted, each providing a set of importance scores for the component indices, along with a set of confidence scores that reflect the expert's own confidence in his/her assessment. In addition, the constructor of the composite index can also provide an assessment of the expertise level of each expert. We use linear combinations to construct the composite index, where the combination weights are determined by maximizing the sum of the resulting composite index variation and the negative weighted sum of squared deviations between the combination weights used and the experts' scores. A data-driven approach is used to find the optimal balance between the two sources of information. Theoretical properties of the procedure are investigated. Simulation examples and an economic application on constructing a science and technology development index are carried out to illustrate the proposed method.


Introduction
A composite index provides a summary measurement of a complex subject with many different features. The features are measured separately as component indices and then numerically combined into a single value, the composite index, which is often used for comparison and ranking. Composite indices are widely used in economics, finance, policy evaluation, performance ranking, and many other fields. For example, the Leading Economic Index (LEI) published by The Conference Board is composed of 10 economic component indices whose changes tend to precede changes in the overall economy (Stock and Watson 1989). Market indices such as the S&P 500 Index and the Dow Jones Index are composite indices that combine the prices of a group of stocks (Cross 1973; Kawaller, Koch, and Koch 1987). The volatility index is a composite index constructed using information in option prices of different strikes and expiration dates (Whaley 2008). The Regulatory Indicators Assessment index, the stakeholder engagement index, and the ex post evaluation index constructed by the OECD are all composite indices for evaluating regulatory policy and governance (Kaur and Lodhia 2014). College rankings use a composite index combining various aspects of the university, ranging from student life, graduation rate, and funding levels to faculty research ability (Karabel and Astin 1975).
The research on how to construct a composite index is vast. A common approach uses a linear combination of the individual observed component indices to construct the composite index. With such an approach, the main task is to determine the combination weights. Two methods, the objective weighting method and the subjective weighting method, are typically used, depending on the information being used. The objective weighting method relies only on the measured data and is widely studied, including the principal component analysis (PCA) approach (Alzate and Suykens 2010; Tavoli et al. 2013), the entropy weighting method (Hoskisson et al. 1993; Jing, Ng, and Huang 2007; Chen and Li 2010; Shemshadi et al. 2011), the clustering approach (Milligan 1989; Eisen et al. 1998; Yu, Yang, and Lee 2011), and others. The subjective weighting methods include the ranking weighting method (Roszkowska 2013), the analytic hierarchy process (AHP) (Al-Harbi 2001), and others. These subjective weighting methods depend heavily on the experts' professionalism. On the other hand, the objective weighting method, which determines the combination weights solely from the observed component indices together with certain mathematical models and assumptions, neglects the subjective judgment of the decision makers and may produce misleading and counterintuitive results (Aalianvari, Katibeh, and Sharifzadeh 2012). There is also recent research on multi-criteria decision making (MCDM) methods (Zardari et al. 2015), such as the Best-Worst Method (Rezaei 2015), the Analytic Network Process (Saaty 2008; Meade and Presley 2002), and context tree weighting (Willems, Shtarkov, and Tjalkens 1995; Garivier 2006). Those MCDM methods are data-driven and are used in many fields, including systems engineering studies (Kujawski 2003).
In this article, we develop a novel method that combines the objective information (data) with subjective information (expert opinion). By effectively combining both sources of information when available, it makes the construction more accurate and reduces biases introduced by either source of information. Specifically, we adopt a factor model setting to use the observed component indices, and use a least-square penalty term to incorporate the expert's opinion regarding the combination weight (importance) of each component index, along with a self-assessment of confidence from the experts and an expertise score from the composite index constructor. Such a comprehensive collection of subjective information allows for diverse opinions and different levels of expertise on different subjects. We use a penalty parameter to balance the influence of the objective information and subjective information. It can be viewed as a ratio of the noise levels in the data and in expert opinions. The optimal penalty parameter can be obtained by maximizing the overall accuracy, through a cross-validation approach.
The rest of the article is organized as follows. In Section 2, we introduce the factor model assumed for the observed component indices, and the data structure of the experts' opinion along with their confidence scores. We then introduce the objective function that combines both sets of information. The composite index is constructed by finding a set of combination weights that optimizes the objective function. In Section 3, we investigate the theoretical properties of the construction. Finite sample properties of the developed procedure are investigated in Section 4 through a simulation study. An economic application on constructing a composite index on science and technology development is shown in Section 5. Section 6 concludes.

Data and Model Setting
Suppose we have K candidate component indices $\{x_{ki}\}$, $k = 1, \ldots, K$, with N observations $i = 1, \ldots, N$, to be included in the construction of the composite index. We use a linear combination of the component indices to construct the composite index. Specifically, with $x_i = (x_{1i}, \ldots, x_{Ki})^\top$, the composite index is of the form

$$w^\top x_i = \sum_{k=1}^{K} w_k x_{ki}, \qquad (1)$$

where the combination weight $w = (w_1, \ldots, w_K)^\top$, normalized so that $\|w\|_2^2 = \sum_{k=1}^{K} w_k^2 = 1$, needs to be determined. Traditional composite index construction using the PCA approach (Li et al. 2012; Nardo et al. 2008) finds the combination weight w so that the resulting composite index has the largest variance among all such linear combinations, that is, the first principal component. Specifically, let $\hat{w}$ be the normalized eigenvector corresponding to the largest eigenvalue of the sample covariance matrix $\hat{\Sigma}_N$ of the $x_i$, so that

$$\hat{w} = \arg\max_{\|w\|_2 = 1} w^\top \hat{\Sigma}_N w. \qquad (2)$$

The composite index is then constructed as $\hat{f}_i = \hat{w}^\top x_i$. The variance of $\hat{f}_i$ is the largest among all such combinations by the construction of $\hat{w}$ in Equation (2). PCA estimation is usually done using singular value decomposition, though the original optimization formulation can be useful when additional information is available, as it allows modifications of the objective function.
We note that the solution of the PCA approach above is the same as fitting a single-factor model

$$x_i = w f_i + \epsilon_i, \qquad (3)$$

where $w = (w_1, \ldots, w_K)^\top$ is the loading vector and $f_i$ is the latent common factor. The noise $\epsilon_i = (\epsilon_{1i}, \ldots, \epsilon_{Ki})^\top$ is assumed to be iid with zero mean and covariance matrix $\Sigma_\epsilon$. The estimator of the latent factor $f_i$ is $\hat{w}^\top x_i$, under a general condition on $\mathrm{Var}(f_i)$ and $\Sigma_\epsilon$.
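The link between the PCA weights and the single-factor model can be illustrated with a minimal Python sketch. It simulates $x_i = w_0 f_i + \epsilon_i$ and recovers the loading as the leading eigenvector of the sample covariance matrix. The function name `pca_weights`, the two-dimensional loading `w0`, the noise level, the explicit centering step, and the sign convention are illustrative assumptions and are not taken from the article.

```python
import numpy as np

def pca_weights(X):
    """Unit-norm leading eigenvector of the sample covariance of X (N x K)."""
    Xc = X - X.mean(axis=0)                   # center each component index
    cov = Xc.T @ Xc / X.shape[0]              # K x K sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    w = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue
    return w if w.sum() >= 0 else -w          # resolve the sign ambiguity (assumed convention)

rng = np.random.default_rng(0)
w0 = np.array([0.6, 0.8])                     # hypothetical true loading with ||w0|| = 1
N, sigma = 5000, 0.2                          # hypothetical sample size and noise level
f = rng.standard_normal(N)                    # latent factor f_i ~ N(0, 1)
eps = sigma * rng.standard_normal((N, 2))     # noise with covariance sigma^2 I
X = np.outer(f, w0) + eps                     # x_i = w0 f_i + eps_i

w_hat = pca_weights(X)                        # PCA estimate of the combination weights
composite = X @ w_hat                         # f_hat_i = w_hat' x_i
```

With a clear gap between the first and second eigenvalues, as in this toy setting, `w_hat` should lie close to `w0`.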
In addition to observing the K component indices, we also assume that we have surveyed a total of J experts who have provided their direct assessments of the combination weight w for the construction of the composite index, along with a confidence score on each of the combination weights provided. Let $(s_{kj}, \gamma_{kj})$, $k = 1, \ldots, K$, $j = 1, \ldots, J$, be the importance score and its corresponding confidence score provided by the jth expert. The importance score for each component index reflects the expert's opinion on how much combination weight should be assigned to that candidate component index in the construction. The score $s_{kj}$ is normalized so that $\sum_{k=1}^{K} s_{kj}^2 = 1$. We assume that the experts' scores are proportionally unbiased, with $E[s_{kj}] = \delta w_k$, where $w_k$ are given in Equation (3) and $\delta > 0$ is a scalar. Proportional unbiasedness is assumed instead of simple unbiasedness because the two conditions $\sum_{k=1}^{K} s_{kj}^2 = 1$ and $\sum_{k=1}^{K} w_k^2 = 1$ make the simple unbiasedness assumption impossible.
The confidence score $\gamma_{kj}$ reflects the expert's assessment of his/her own expertise level on the subject, possibly with different levels of expertise among the K component indices. The larger $\gamma_{kj}$ is, the higher the expert's confidence is in the kth component. If the expert knows one component index very well, then he/she will assign a large confidence score. Otherwise, a small score will be assigned. Jiang, Liu, and Zhu (1996) considered the situation in which an expert provides a range (interval) for each of the combination weights. In our setting, the center point of their interval would be the importance score and the inverse of the interval width would be the confidence score. In this article, the confidence score $\gamma_{kj}$ is restricted to [0, 1].
Furthermore, the constructor of the composite index may assign an "expertise" score c j to the jth expert. This provides a ranking among the experts in terms of their relative expertise on the construction of the index of interest. We restrict the value of c j to (0, 1].
It is the aim of this article to construct the composite index by combining both the observed component indices and the expert opinion. Statistically speaking, our construction is based on a model with two parts: a factor model on the observed indices, and iid scores from the experts that are "proportionally unbiased." To use both sources of information, we use a combined least-square objective function $g^*_{N,J}(w)$, given in Equation (4), which is optimized subject to $0 \le w_k \le 1$ and $\|w\|_2^2 = 1$, where $\delta > 0$ is a scalar and $\Gamma_j = \mathrm{diag}(c_j \gamma_{1j}, \ldots, c_j \gamma_{Kj})$.
The first term in Equation (4) is the original criterion of estimating the optimal combination weight using PCA without expert input. It is a classical quadratic maximization problem. The second term is the weighted least-square term for fitting the expert opinions, adjusted by their confidence scores and expertise scores.
The constant Q is a penalty parameter that balances the variance of the linear combination in the first term against the error variance in fitting the expert opinions in the second term. It is an important parameter. When Q is large, the objective function $g^*_{N,J}(w)$ in Equation (4) puts more weight on the second term, which is related to the expert opinion. Hence, the solution $\hat{w}_{N,J}$ would be closer to the optimizer of the second term. Similarly, when Q is small, the solution would be closer to the PCA solution that maximizes only the first term, without the expert opinions. In fact, the optimal Q should reflect the comparison between the noise level in the observed data and the noise level in the expert opinion. When the noise level in the observed data is larger than that in the expert opinion, we would trust the experts more and hence use a larger Q.
Remark 1. If we assume normality on x i and s j , then it is also possible to estimate w using the maximum likelihood method. In this article, we choose to use the weighted least-square criterion so that we do not need to specify and estimate the error covariance matrix.
Remark 2. If we treat the expert information as prior information, then a Bayesian approach can be used as well. However, it would require specifying the expert score distribution as well as the noise distribution in the factor model. We do not use the Bayesian approach in this article.
Optimizing $g^*_{N,J}(w)$ in Equation (4) is equivalent to optimizing

$$g_{N,J}(w, \delta) = a_{N,J}\, w^\top \hat{\Sigma}_N w - b_{N,J} Q \delta^2\, w^\top \bar{\Gamma}_J w + 2 b_{N,J} Q \delta\, \bar{s}_J^\top w. \qquad (5)$$

Remark 3. Note that if we do not have the observed data, then $s^* = \bar{\Gamma}_J^{-1} \bar{s}_J$ would be the solution of the second term in Equation (4) without the $w^\top w = 1$ constraint. It provides a summary of the expert scores, adjusted by the confidence scores and expertise scores. In particular, if all $\gamma_{kj}$ are the same, then $\bar{\Gamma}_J$ is a scalar matrix. Then the solution to the second term in Equation (4) with the $w^\top w = 1$ constraint would be $\bar{s}_J / \sqrt{\bar{s}_J^\top \bar{s}_J}$, a normalized average of the expert scores.
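The expert summary in Remark 3 can be sketched in a few lines of Python. The sketch assumes that the aggregated quantities are averages over experts, that is, $\bar{\Gamma}_J = J^{-1}\sum_j \Gamma_j$ and $\bar{s}_J = J^{-1}\sum_j \Gamma_j s_j$; these definitions, the function name `aggregate_experts`, and the toy scores are assumptions made for illustration.

```python
import numpy as np

def aggregate_experts(S, gamma, c):
    """Summarize expert input.

    S, gamma : K x J arrays of importance and confidence scores.
    c        : length-J array of expertise scores.
    Returns (s_bar, gamma_bar, s_star), where gamma_bar holds the diagonal
    of the averaged weight matrix (assumed definition, see lead-in).
    """
    weights = gamma * c[np.newaxis, :]     # entry (k, j) is c_j * gamma_kj, the kth diagonal element of Gamma_j
    gamma_bar = weights.mean(axis=1)       # diagonal of the averaged Gamma_j
    s_bar = (weights * S).mean(axis=1)     # average of Gamma_j s_j over experts
    s_star = s_bar / gamma_bar             # unconstrained summary Gamma_bar^{-1} s_bar
    return s_bar, gamma_bar, s_star

# Toy example: K = 3 indices, J = 2 experts, each column has unit norm.
S = np.array([[0.6, 0.8],
              [0.8, 0.6],
              [0.0, 0.0]])
gamma = np.ones((3, 2))                    # equal confidence everywhere
c = np.ones(2)                             # equal expertise

s_bar, gamma_bar, s_star = aggregate_experts(S, gamma, c)
w_expert = s_bar / np.linalg.norm(s_bar)   # normalized average of the expert scores
```

With equal confidence and expertise scores, `s_star` reduces to the plain average of the expert scores and `w_expert` is that average rescaled to unit length.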
Solution to Equation (5) can be obtained through quadratic programming, under quadratic equality constraints. We also note that the objective function is a difference of two convex functions in a constrained space. Optimization of such a function is easy and fast, with good properties (Markowitz 1956).
Once we obtain $\hat{w}_{N,J}$ by optimizing (5), the composite index can be constructed as $C_i = \hat{w}_{N,J}^\top x_i$, and the fitted value of $x_i$ can be obtained as $\hat{x}_i = \hat{w}_{N,J} \hat{w}_{N,J}^\top x_i$.

Geometry Interpretation
The quadratic term involving w in Equation (5) can be written as $w^\top \Sigma^* w$, where $\Sigma^* = a_{N,J} \hat{\Sigma}_N - b_{N,J} Q \delta^2 \bar{\Gamma}_J$. Therefore, unlike the PCA approach, in which the covariance matrix is always positive semidefinite, the combined objective function $g_{N,J}(w, \delta)$ is a quadratic form with the "covariance" matrix $\Sigma^*$, which can be positive definite, negative definite, or indefinite, depending on the penalty parameter Q. When Q is small, the matrix $\Sigma^*$ is more likely to be positive definite; when Q is very large, the matrix would be negative definite.
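How the definiteness of this matrix changes with Q can be checked numerically. The following sketch builds $a\,\hat{\Sigma} - b\,Q\,\delta^2\,\bar{\Gamma}$ for a toy 2 × 2 covariance matrix and classifies it by its eigenvalues. The numerical values, including $a = b = 0.5$ and $\delta = 1$, and the function names are hypothetical choices for illustration only.

```python
import numpy as np

def sigma_star(cov, gamma_bar, Q, delta, a, b):
    """Return a * cov - b * Q * delta^2 * diag(gamma_bar)."""
    return a * cov - b * Q * delta**2 * np.diag(gamma_bar)

def definiteness(M, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eig = np.linalg.eigvalsh(M)            # real eigenvalues, ascending
    if eig.min() > tol:
        return "positive definite"
    if eig.max() < -tol:
        return "negative definite"
    return "indefinite or semidefinite"

cov = np.array([[1.0, 0.5],
                [0.5, 1.0]])               # toy covariance, eigenvalues 0.5 and 1.5
gamma_bar = np.array([1.0, 1.0])           # toy averaged confidence weights

# Classify the matrix for a small, a moderate, and a large penalty.
labels = {Q: definiteness(sigma_star(cov, gamma_bar, Q, delta=1.0, a=0.5, b=0.5))
          for Q in (0.1, 1.0, 10.0)}
```

In this toy case the matrix moves from positive definite to indefinite to negative definite as Q grows, matching the qualitative description above.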
To illustrate, suppose K = 2. The surface in Figure 1 shows the quadratic function $g_{N,J}(w, \delta)$ for a fixed $\delta$. The unit circle constraint on w is represented by the cylinder. The constraints restrict the quadratic maximization problem to a lower-dimensional constrained space, which is compact. Note that there are no "corners" in the lower-dimensional space and the function $g_{N,J}(w, \delta)$ is continuous on it. Hence an optimal solution always exists, regardless of whether the matrix $\Sigma^*$ is positive definite, negative definite, or indefinite.
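For K = 2 the constrained problem is one-dimensional, so the maximizer can be located by a simple grid search over the angle of w. The sketch below uses the objective $a\,w^\top\hat{\Sigma}w - b\,Q\,\delta^2\,w^\top\bar{\Gamma}w + 2\,b\,Q\,\delta\,\bar{s}^\top w$, which is the form that appears in the Lagrangian in Appendix A.2, with $\delta$ held fixed. The grid search itself, the function names, and all numerical inputs (including $a = b = 0.5$) are assumptions for illustration and are not the article's optimization routine.

```python
import numpy as np

def objective(w, cov, gamma_bar, s_bar, Q, delta, a, b):
    """a w'cov w - b Q delta^2 w'diag(gamma_bar) w + 2 b Q delta s_bar'w."""
    data_term = a * (w @ cov @ w)
    expert_quad = b * Q * delta**2 * (w @ (gamma_bar * w))
    expert_lin = 2.0 * b * Q * delta * (s_bar @ w)
    return data_term - expert_quad + expert_lin

def maximize_on_arc(cov, gamma_bar, s_bar, Q, delta=1.0, a=0.5, b=0.5, n_grid=2001):
    """Grid search over w = (cos t, sin t), t in [0, pi/2], so that
    ||w|| = 1 and 0 <= w_k <= 1 hold by construction."""
    thetas = np.linspace(0.0, np.pi / 2.0, n_grid)
    W = np.column_stack([np.cos(thetas), np.sin(thetas)])
    values = np.array([objective(w, cov, gamma_bar, s_bar, Q, delta, a, b) for w in W])
    best = int(np.argmax(values))
    return thetas[best], W[best]

cov = np.array([[1.0, 0.3],
                [0.3, 0.5]])                        # toy sample covariance
gamma_bar = np.array([1.0, 1.0])                    # toy averaged confidence weights
s_bar = np.array([np.sqrt(0.5), np.sqrt(0.5)])      # experts point at 45 degrees

theta_pca, w_pca = maximize_on_arc(cov, gamma_bar, s_bar, Q=0.0)     # data only
theta_mid, w_mid = maximize_on_arc(cov, gamma_bar, s_bar, Q=1.0)     # balanced
theta_exp, w_exp = maximize_on_arc(cov, gamma_bar, s_bar, Q=1000.0)  # expert dominated
```

At Q = 0 the search returns the PCA direction, and as Q grows the solution rotates toward the direction favored by the experts.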

Determination of the Penalty Parameter
The penalty parameter Q is an important tuning parameter in practice. It has a significant impact on the constructed composite index, as discussed in Section 2.1. Here, we propose to use a combined M-fold ($M = M_1 M_2$) cross-validation for its determination. Specifically, the original observed sample is randomly partitioned into $M_1$ equal-size subsets $D_1, \ldots, D_{M_1}$. The experts' scores are divided into $M_2$ equal-size subsets $D^*_1, \ldots, D^*_{M_2}$. We use $M_1 - 1$ observed data subsets and $M_2 - 1$ expert score subsets as the training data for estimating w. Then $\hat{w}_{N,J}$ is used to predict the validation subset under the factor model setting, and the prediction sum of squared errors is obtained. Specifically, the cross-validation error CV(Q) is defined in Equation (7), where the weight vector for the fold pair $(m_1, m_2)$ is the optimal w estimated using Q and without using the data in $D_{m_1}$ and $D^*_{m_2}$. The optimal Q is the one with the smallest cross-validation error CV(Q). Here C is a tuning parameter that balances the two sources of errors. For small samples, leave-one-out cross-validation is used.

Large-Sample Properties
In this section, we investigate the large-sample properties of the estimator proposed in the preceding sections. We consider the rate of convergence when the number of involved component indices K is fixed or grows with the sample sizes N and J.
In addition, we also establish the central limit theorem for $\hat{w}$ when K is fixed. In the high-dimensional setting, we let K and J be quantities depending implicitly on N, and consider the asymptotics as $N \to \infty$.
We first list the assumptions for the fixed-dimensional case. For the rest of this article, we use $\|\cdot\|$ to denote the spectral norm of a matrix and the Euclidean norm of a vector. The symbol ⇒ denotes convergence in distribution.
Assumption 1. Assume K > 0 is fixed, and $x_i$ are iid with mean zero and covariance matrix $\Sigma_0$. Let $\lambda_1$ be the largest eigenvalue of $\Sigma_0$, and $w_0$ the corresponding eigenvector. Assume that the second largest eigenvalue $\lambda_2$ of $\Sigma_0$ is strictly smaller than $\lambda_1$. We also assume that $\mathrm{var}(w^\top x_i x_i^\top w)$ is bounded for all w with $\|w\|_2^2 = 1$.
where $\Gamma_0$ is a constant diagonal matrix with positive diagonal elements, and $\tilde{\Sigma}_s$ is a constant positive-definite matrix.
For the high-dimensional case, we replace Assumptions 1 and 4 with the following.
Assumption 1(*). We assume $K \to \infty$ as $N, J \to \infty$. Assume that $x_i$ are iid with mean zero and covariance matrix $\Sigma_{0N}$. Let $\lambda_{1N}$ be the largest eigenvalue of $\Sigma_{0N}$, and $w_{0N}$ the corresponding eigenvector. Let $\lambda_{2N}$ be the second largest eigenvalue of $\Sigma_{0N}$, and assume $\liminf_{N \to \infty} (\lambda_{1N} - \lambda_{2N}) > 0$. We consider the optimization problem (5) with a covariance matrix estimator $\check{\Sigma}_N$.
Assumption 4(*). Assume that the smallest diagonal element of $\bar{\Gamma}_J$ is positive and bounded away from zero.
The following remarks provide some comments on the assumptions.

Remark 4. Assumption 1 is typical for principal component analysis. The iid assumption on $\{x_i\}$ can be relaxed to allow dependent observations that satisfy certain mixing conditions. A more widely used but more restrictive assumption (typical for a factor model), namely $\Sigma_0 = (\lambda_1 - \lambda_2) w_0 w_0^\top + \lambda_2 I$, can be used here as well, as it also guarantees that $w_0$ maximizes $w^\top \Sigma_0 w$. Here we choose to allow the noise term $x_i - w_0 w_0^\top x_i$ to have nonzero correlation.
Remark 5. Assumption 2 assumes that all experts make their assessments independently. Assumption 3 assumes that the experts do not make their assessments based on the observed data.
Remark 6. Assumption 4 is needed to derive the central limit theorem for fixed K. We do not make assumptions on the confidence scores and expertise scores. We only need to assume that as J goes to infinity, $\bar{\Gamma}_J$ converges to a finite limit. If J is much larger than N (when $N/(N + J) \to 0$), we assume the smallest diagonal element of $\Gamma_0$ is nonzero so that all component indices receive sufficient input from the experts.
Remark 7. Assumption 1(*) is needed to handle the high-dimensional setting with $K \to \infty$; it concerns the convergence rate of the covariance matrix estimator. Since covariance matrix estimation is not the focus of this article, we list it as a high-level condition. Such convergence rates often require structural assumptions on $\Sigma_0$ and have been extensively studied in the literature; see, for example, Fan, Liao, and Liu (2016) and Vershynin (2018) and references therein, among many others. We also note that when K is fixed, Assumption 1 guarantees that Assumption 1(*) is fulfilled with $\check{\Sigma}_N = \hat{\Sigma}_N$ and a rate of $N^{-1/2}$.
We first establish the convergence rate of $\hat{w}$. For this result, we allow K to grow with N and J. Based on Remark 7, a fixed K is a special case of this scenario.
Remark 8. As discussed earlier, the parameter Q balances the two sources of information. Theorem 1 requires Q = Q N = νN 2 1N to allowŵ to follow the faster convergence rate. For an arbitrary Q N , let R N = √ (JQ N )/N, the convergence rate can be shown as This is a slightly stronger result, but requires a more tedious proof.
When K is fixed, we can further establish the central limit theorem for $\hat{w} := \hat{w}_{N,J}$.
Theorem 2. Under Assumptions 1 to 5, it holds that The proof is shown in the appendix.
Remark 9. In the unbalanced cases (i.e., N/(N + J) goes to 0 or 1), the combined estimator has the same asymptotic variance as that using the dominant source of information only. In the balanced case, the estimator is more efficient than using only one source.
Remark 10. The penalty coefficient Q should be chosen to minimize the trace of the asymptotic variance matrix. However, it is quite involved as Q appears in both 0 and the middle term (a 1 + (1 − a)Q 2 2 ) in the asymptotic variance. In practice, we use cross validation procedure to choose the optimal Q.

Simulation Studies
In this section, we present some empirical studies to illustrate the performance of the proposed estimator $\hat{w}_{N,J}$ under different N and J combinations. The impact of the penalty parameter Q and the performance of the cross-validation method are investigated as well.
For each simulation, we assume $x_i \sim N(0, \Sigma_0)$ where $\Sigma_0 = w_0 w_0^\top + \sigma^2 I$; hence it can be written as a factor model $x_i = w_0 f_i + \epsilon_i$ with $f_i \sim N(0, 1)$ and $\epsilon_i \sim N(0, \sigma^2 I)$. All $f_i$ and $\epsilon_i$ are iid and independent of each other. The expert scores $s_j$ are generated according to the distribution of s described as follows. We first generate $\tilde{s}$ through the spherical representation $\tilde{s}_1 = \cos(e_1)$, $\tilde{s}_2 = \sin(e_1)\cos(e_2)$, ..., $\tilde{s}_K = \sin(e_1) \cdots \sin(e_{K-2}) \sin(e_{K-1})$, where the spherical coordinates $e_k$ are iid $N(0, \sigma_s^2)$. Then, we choose a K × K orthogonal matrix U whose first column is $w_0$, and generate s as $s = U\tilde{s}$. It holds that $\|s\| = 1$ and $E(s) = \delta_0 w_0$, where $\delta_0 = E[\cos(e_1)] = \exp(-\sigma_s^2/2)$.
Exercise (i). In this experiment, we investigate the impact of the penalty parameter Q. Specifically, we use K = 4, N = 400, J = 40, $\gamma_{kj} = 1$ and $c_j = 1$ for $k = 1, \ldots, K$, $j = 1, \ldots, J$. We set $\theta_0 = (\pi/6, \pi/4, \pi/3)$ with corresponding $w_0 = (0.866025, 0.353553, 0.176777, 0.306186)$. The expert scores $s_j$ are generated as described above. Here, we use $\sigma = 0.2$ and $\sigma_s = 0.2$. We vary Q from 0 to 100 in the estimation. Figure 2 shows the solution paths of the estimate $\hat{w}$ as the penalty parameter Q changes. It is seen that the solution paths are continuous. The three horizontal lines mark the value of the PCA solution (using only the observed component indices), the estimates based on the experts' scores only, and the true value $w_0$ in the factor model. The relationship between the estimated optimal combination weights and the penalty parameter is clearly seen. When Q = 0, the estimated combination weights are equal to the PCA estimates, as the solution of the first term of the objective function (6). When Q increases, the estimated combination weights become closer to those of the experts. It is noted that the four solution paths cross the true combination weight line at different Q values, and in the case of $w_1$ the path does not cross at all. Table 1 lists some of the values.
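The expert-score generator described above can be sketched as follows. The spherical map and the relation $s = U\tilde{s}$ follow the text; building U from a QR decomposition of a random matrix whose first column is $w_0$ is one possible choice and is an assumption here, as are the function names and the random seed.

```python
import numpy as np

def spherical_to_unit(angles):
    """Map K-1 angles to a unit vector: (cos e1, sin e1 cos e2, ..., sin e1 ... sin e_{K-1})."""
    v = np.ones(len(angles) + 1)
    for k, e in enumerate(angles):
        v[k] *= np.cos(e)          # coordinate k picks up cos(e_k)
        v[k + 1:] *= np.sin(e)     # all later coordinates pick up sin(e_k)
    return v

def orthogonal_with_first_column(w0, rng):
    """Orthogonal matrix whose first column equals the unit vector w0 (QR-based, assumed construction)."""
    K = len(w0)
    A = np.column_stack([w0, rng.standard_normal((K, K - 1))])
    U, _ = np.linalg.qr(A)         # first column of U is +/- w0
    if U[:, 0] @ w0 < 0:
        U[:, 0] *= -1.0            # flipping a column keeps U orthogonal
    return U

def simulate_expert_scores(w0, J, sigma_s, rng):
    """Return a J x K array whose rows are expert scores s_j = U s_tilde_j."""
    U = orthogonal_with_first_column(w0, rng)
    E = sigma_s * rng.standard_normal((J, len(w0) - 1))    # angles e_k ~ N(0, sigma_s^2)
    S_tilde = np.array([spherical_to_unit(e) for e in E])
    return S_tilde @ U.T

rng = np.random.default_rng(1)
theta0 = np.array([np.pi / 6, np.pi / 4, np.pi / 3])
w0 = spherical_to_unit(theta0)                             # true weights of Exercise (i)
S = simulate_expert_scores(w0, J=40, sigma_s=0.2, rng=rng)
delta0 = np.exp(-0.2**2 / 2)                               # E[cos(e_1)] for sigma_s = 0.2
```

Each row of `S` has unit norm, and averaging many simulated rows should approach `delta0 * w0`, in line with the proportional unbiasedness assumption.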
Four performance criteria are used: mean squared error (MSE), root mean squared error (RMSE), relative MSE (MSEr), and mean angle error (MAE). In these criteria, L denotes the number of replications. Here, RMSE computes the root mean squared error of each component $w_k$ first, then averages over k. Figure 3 shows the four performance measures as functions of Q in the case of (N, J) = (400, 40). All of them are "U"-shaped, with minimum values corresponding to an optimal Q under the different criteria. They demonstrate that by selecting an optimal Q, one can effectively combine both the observed data and the expert opinion for the construction of the composite index. Table 2 shows the optimal Q and its corresponding performance under each criterion for the four sample size combinations. It is seen that the optimal Q is larger when the number of experts J is larger, since the combined experts' scores then provide a more accurate estimate of the combination weights. When the sample size N is larger, the data provide more information, hence Q will be smaller.
Exercise (iii). In this experiment, we investigate the relationship between the optimal penalty parameter Q and the noise levels of the observations and expert opinions. Using J = 10, K = 2, and N = 100, we set $f_i \sim N(0, 1)$ and $\epsilon_i \sim N(0, \sigma^2 I)$, and all $\gamma$'s and c's are set to 1. Let $\theta_0 = \pi/4$, or $w_0 = (\sqrt{2}/2, \sqrt{2}/2)$, and we use different levels of $\sigma_s$ to simulate the expert scores $s_j$.
The simulation is repeated 100 times to obtain the MSE for each Q value and the optimal Q ($0 \le Q \le 100$) under each $\sigma$ and $\sigma_s$ combination. Table 3 reports the optimal Q, where $\sigma$ and $\sigma_s$ are chosen from the arithmetic sequences from 0.1 to 1 and from 0.05 to 0.50, respectively.
There are some interesting observations. First, for a fixed $\sigma_s$, the optimal Q increases as $\sigma$ increases. This confirms our conclusion that the optimal estimator ($\hat{w}_{N,J}$) should depend more on the expert opinions if the noise in the observed component indices is large. Second, for a fixed $\sigma$, the optimal Q decreases as $\sigma_s$ increases. This means that if the expert opinion is less reliable, the estimate ($\hat{w}_{N,J}$) relies more on the observed data. Hence, the optimal Q reflects a balance between the noise level of the observed data and the noise level of the expert opinion. If $\sigma$ and $\sigma_s$ are provided, then a theoretically optimal Q can be chosen.
Exercise (iv). In this exercise, we investigate the performance of the proposed cross-validation procedure for the determination of the optimal Q and its corresponding performance in the construction of the composite index. In this experiment, we assume K = 4, N = 400, J = 40, $f_i \sim N(0, 1)$, and $\epsilon_i \sim N(0, 0.2^2 I)$. We set $\theta_0 = (\pi/6, \pi/4, \pi/3)$ as in Exercise (i). The $s_j$ are generated as in Exercise (i) with $\sigma_s = 0.2$, $\gamma_{kj} = 1$, and $c_j = 1$. We restrict Q to [0, 10], and we use 10-fold cross-validation for N and 4-fold for J. Then we obtain CV(Q) in Equation (7), with C = 1. Table 4 shows the performance of the estimator using the estimated optimal Q under cross-validation. The performance measures are obtained using 100 simulated datasets under each sample size setting. It is clearly seen that the combined construction with the optimal Q obtained from cross-validation outperforms that using data alone or using expert opinion alone.
Exercise (v). In this exercise, we investigate the convergence of $\hat{w}_{N,J}$ under the high-dimensional setup. We fix J/N = 0.2, let $K_N = 0.5N$, and generate $f_i \sim N(0, 1)$ and $\epsilon_i \sim N(0, \sigma^2 I)$.
We choose $\sigma^2 = \sigma^2_{\epsilon,N} = 5/K_N$ so that the relative rank of $\Sigma_{0N}$, which is $\mathrm{tr}(\Sigma_{0N})/\|\Sigma_{0N}\| = 6/(1 + 5/K_N)$, stays roughly constant, and $\|\hat{\Sigma}_N - \Sigma_{0N}\| = O_P(N^{-1/2})$ according to Koltchinskii and Lounici (2017); see also Chapter 9 of Vershynin (2018). The expert scores $s_j$ are generated as in Exercise (i) with all $\gamma_{kj}$'s and $c_j$'s set to 1, $\theta_0 = \pi * (1, 2, \ldots, K_N - 1)$, and $\sigma_s = 0.2$. We plot the MSE of $\hat{w}_{N,J}$ against N in Figure 4, for different choices of Q. The convergence of $\hat{w}_{N,J}$ to $w_0$ is clearly seen from the plot.

An Application
In this section, as an empirical application, we use the proposed method to construct a composite index for scientific and technological activity output of provinces and province-level municipalities in China. It is important for the policy makers to be able to evaluate the output of scientific and technological activities which in turn provide guidance for generating scientific and technological investment policies. In this example, we use five component indices (indicators) from National Scientific and Technological Progress Statistical Monitoring Database maintained by the Ministry of Science and Technology of China. The indicators include Number of Scientific Papers per capita (in 10,000 people), Number of National Scientific and Technological Achievements Awards per capita (in 10,000 people), Number of Invention Patents per capita (in 10,000 people), Technical Transfer Amounts (in 10,000 CNY) per capita (in 10,000 people), and International Technology Revenue (in USD) per Gross Domestic Product (in 10,000 CNY). We use data of year 2017. There are 31 provinces and province-level municipalities, excluding Taiwan, Hong Kong and Macao. Source of data is the 2017 National Scientific and Technological Progress Statistical Monitoring Report from Ministry of Science and Technology. Table 5 shows the detailed variable description and some descriptive statistics.
We standardize each observed component index $x_{k\cdot}$ to mean zero and standard deviation 1. Table 6 shows the correlation matrix of the five component indices. The correlations among the component indices are very close to one, except for $x_{5\cdot}$. Figure 5 shows the distribution of the five indices. There are some obvious outliers. In this small sample, Shanghai and Beijing are two very large outliers. They hold the two largest values in each of the five indicators. This is because Shanghai and Beijing are the political, cultural, science and technology, and business centers of China. These two outliers are also the source of the extremely high sample correlations among the component indices. We exclude these two sets of observations from the estimation of the combination weights for the composite index construction. Otherwise, their features would dominate the entire composition. For comparison purposes, we obtain the value of their composite index at the end, based on the combination weights estimated using the dataset without these two cities.
We surveyed 13 researchers who are experts on issues related to science and technology development. Each expert gave an importance score and a corresponding confidence score for each of the five component indices. We also assessed their expertise levels, which range from 0.15 to 1. Figure 6 shows the boxplot of the expert scores and their confidence levels. The scores are standardized. Descriptive statistics of the expert information are given in Table 7.
Since the sample size in this application is small, we use leave-one-out cross-validation instead of M-fold cross-validation in determining the optimal penalty parameter Q. Figure 7 shows CV(Q) defined in (7). The estimated optimal Q is 2.5, at which CV(Q) is the smallest. The corresponding estimated optimal $\delta$ is 0.8812.
Using the optimal Q, we estimate the combination weights, shown in Table 8. Due to the small sample size, we obtain bootstrap standard errors (Efron and Tibshirani 1985) of the estimated combination weights. For the combined estimation, we bootstrap the component indices and the expert scores separately, in order to maintain the relative sample proportion. The combined estimates have relatively smaller bootstrap standard errors than the estimates using PCA or using expert opinion alone. Table 8 also shows the estimated combination weights using the observed data alone (PCA) and using the expert scores alone. Because of the lower correlations between $x_{5\cdot}$ and the other indices, PCA gives a small $\hat{w}_5$ (0.3592), the combination weight for the international technology transfer, while the experts give a much larger value (0.4227). Combining both sources of information, we assign 0.3707 to the indicator. Figure 8 shows the solution path of the estimated combination weights as a function of Q. The vertical line indicates the optimal Q used, and the horizontal lines correspond to the estimated combination weights.
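The separate resampling of observations and experts can be sketched as below. The function `bootstrap_se` resamples the rows of the data matrix and the rows of the expert score matrix independently, each at its original size, and reports the standard deviation of the re-estimated weights. The estimator passed in, `toy_estimator`, is only a placeholder that mixes the PCA direction with the mean expert score; it is not the article's estimator, and the synthetic data and all names are hypothetical.

```python
import numpy as np

def bootstrap_se(X, S, estimator, n_boot=200, seed=0):
    """Bootstrap standard errors of estimator(X, S).

    X : N x K observed component indices; S : J x K expert scores.
    Observations and experts are resampled separately, so each bootstrap
    sample keeps the original sizes N and J.
    """
    rng = np.random.default_rng(seed)
    N, J = X.shape[0], S.shape[0]
    draws = []
    for _ in range(n_boot):
        Xb = X[rng.integers(0, N, size=N)]   # resample observations with replacement
        Sb = S[rng.integers(0, J, size=J)]   # resample experts with replacement
        draws.append(estimator(Xb, Sb))
    return np.std(np.array(draws), axis=0, ddof=1)

def toy_estimator(X, S):
    """Placeholder estimator (an assumption): normalized sum of the PCA
    direction and the mean expert score."""
    Xc = X - X.mean(axis=0)
    _, vecs = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
    w_pca = vecs[:, -1]
    w_pca = w_pca if w_pca.sum() >= 0 else -w_pca
    w = w_pca + S.mean(axis=0)
    return w / np.linalg.norm(w)

# Synthetic stand-in data with the application's dimensions (29 regions, 13 experts, 5 indices).
rng = np.random.default_rng(4)
w_true = np.ones(5) / np.sqrt(5.0)
X = np.outer(rng.standard_normal(29), w_true) + 0.5 * rng.standard_normal((29, 5))
S = np.abs(w_true + 0.1 * rng.standard_normal((13, 5)))
S = S / np.linalg.norm(S, axis=1, keepdims=True)   # each expert's scores have unit norm

se = bootstrap_se(X, S, toy_estimator, n_boot=200)
```

Any estimator with the same `(X, S)` signature can be substituted for the placeholder to obtain standard errors for a different weighting scheme.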
The estimated combination weights are used to construct the composite index on science and technology output, including for Beijing and Shanghai. We also use the national average of each component indicator to obtain the national index as a benchmark. Beijing has the highest science and technology output, since it has a large number of top universities and a large number of research institutes under the Chinese Academy of Sciences. Shanghai ranks second, due to its high concentration of major corporations and their R&D centers, as well as several major universities and research institutes. There are 7 provinces or province-level municipalities above the national benchmark: Beijing, Shanghai, Tianjin, Jiangsu, Shaanxi, Guangdong, and Zhejiang. Except for Shaanxi, these are the most industrial and developed regions of China. Figure 9 shows the constructed composite index for each province with its geographical location. It is seen that some western provinces such as Shaanxi, Chongqing, Sichuan, Gansu, and Qinghai score high, although traditionally their economic development has been slower than that of provinces on the east coast. This is partially due to the recent strategic Western Development policy of the central government and the Belt and Road Initiative (Liu and Dunford 2016; Démurger et al. 2002).

Conclusion
This article proposes a penalized optimization approach to incorporate expert opinion information with the principal component analysis of observed component indicators in a factor model framework for the construction of composite index using linear combination of the component indicators. The combination weights are determined by objective data and subjective expert opinion. The approach involves a penalty parameter Q that balances two sources of noises, one from the observations and the other from the expert opinions. It can be chosen through a data-driven cross-validation approach.
The proposed approach can be extended naturally, and with little technical difficulty, to construct multiple indices, similar to finding multiple factors or principal components. However, index construction often has a specific target and interpretation, which is the reason that a group of experts would generally agree on the importance of each series. A second (and maybe orthogonal) index would be very difficult to interpret. In addition, it would be almost impossible to ask the experts to provide their weights on a second index that may or may not be orthogonal to the first one. Despite the difficulty in defining and interpreting multiple indices, the extension is worth further exploration for practical uses.
Proof: Since $\|s_j\| = 1$ and all diagonal elements of $\Gamma_j$ are less than or equal to 1, we have $\|\Gamma_j s_j\| \le 1$ for every j. Therefore, $E\|\bar{s}_J - \delta_0 \bar{\Gamma}_J w_0\|^2 \le 1/J$, and the conclusion follows.

A.2. Proof of Theorem 2
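Proof of Theorem 2: The Lagrangian form of optimizing $g_{N,J}(w, \delta)$ with the $w^\top w = 1$ constraint is

$$a_{N,J}\, w^\top \hat{\Sigma}_N w - b_{N,J} Q \delta^2\, w^\top \bar{\Gamma}_J w + 2 b_{N,J} Q \delta\, \bar{s}_J^\top w - a_{N,J} \lambda (\|w\|_2^2 - 1).$$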

Funding
This work was supported in part by China Natural Science Foundation Grants 71773078 and 71803134, and by the Innovative Research Team of Econometrics in Shanghai Academy of Social Sciences.