Multiview Subspace Clustering via Low-Rank Symmetric Affinity Graph

Wei Lan, Tianchuan Yang, Qingfeng Chen, Shichao Zhang, Senior Member, IEEE, Yi Dong, Huiyu Zhou, and Yi Pan, Senior Member, IEEE

Multiview subspace clustering (MVSC) has been used to explore the internal structure of multiview datasets by revealing the unique information of different views. Most existing methods ignore the consistent information and angular information of different views. In this article, we propose a novel MVSC method via a low-rank symmetric affinity graph (LSGMC) to tackle these problems. Specifically, considering the consistent information, we pursue a consistent low-rank structure across views by decomposing the coefficient matrix into three factors. Then, a symmetry constraint is utilized to guarantee weight consistency for each pair of data samples. In addition, considering the angular information, we utilize a fusion mechanism to capture the inherent structure of the data. Furthermore, to alleviate the effects of noise and highly redundant data, the Schatten p-norm is employed to obtain a low-rank coefficient matrix. Finally, an adaptive information reduction strategy is designed to generate a high-quality similarity matrix for spectral clustering. Experimental results on 11 datasets demonstrate the superiority of LSGMC in clustering performance compared with ten state-of-the-art multiview clustering methods.



NOMENCLATURE
X_v ∈ R^{d_v×n}, E_v, Z_v: Data matrix, error term, and coefficient matrix of the vth view, where d_v is the feature dimension of X_v and n is the number of samples.
L_v, R_v, C: Three matrices obtained by the factorization of Z_v, where C is the consensus shared by all Z_v.
W ∈ R^{n×n}: Affinity matrix.
v, V: Index of a view and the number of views.
λ: Balance parameter.
η: Ratio of information preservation.
∥·∥_{2,1}, ∥·∥_F: ℓ_{2,1}-norm and Frobenius norm.
∥·∥_*, ∥·∥_{S_p}: Nuclear norm and Schatten p-norm.
For an arbitrary matrix A, A_{ij} denotes its (i, j)th entry, Tr(A) its trace, and A^T its transpose.

I. INTRODUCTION
With the rapid development of information technology, multiview data are becoming increasingly common in many realistic scenarios with multiple collection sources. For example, images can be described by different features, such as color, texture, and shape. The features of different views have their own specific attributes and provide complementary information to each other. Considering these views individually is often insufficient or incomplete. Therefore, it is critical to effectively integrate the unique information of each view. In the clustering problem, multiview clustering can explore the potential consistent information between different views and make full use of the complementary information of each view. Thus, multiview clustering methods have an advantage over single-view methods.
Multiview clustering is often combined with subspace clustering. In particular, subspace clustering based on "self-expressiveness" can be used to capture the complementary information and diversity in each view. The principle of self-expressiveness is that, given a set of data points drawn from a specific subspace, each data point can be represented as a linear combination of the other data points. One of the most representative methods is the low-rank representation (LRR) [1], [2]. The goal of LRR is to obtain the lowest-rank representation of the coefficient matrix by imposing low-rank constraints on the data points. Subspace clustering based on LRR can obtain a low-rank coefficient matrix that reveals the intrinsic subspace structure of the data, and this coefficient matrix can be efficiently transformed into a similarity matrix for spectral clustering. In this way, all data points can be split into their respective subspaces, i.e., different subspaces represent different clusters.
Due to the effectiveness of subspace clustering for heterogeneous and complementary information integration, multiview subspace clustering (MVSC) [3], [4], [5], [6], [7], [8], [9], [10] has been proposed by extending subspace clustering to multiview learning. Many MVSC methods are based on complementary information extraction. Cao et al. [11] proposed a diversity-induced MVSC (DiMSC) method that uses the Hilbert-Schmidt independence criterion (HSIC) to measure the diversity of different views. Ding and Fu [12] presented a multiview low-rank common subspace clustering method, which imposes a nuclear norm constraint on the projection matrix and learns a low-rank common subspace from the multiple view-specific projections. Yin et al. [13] proposed to model different views as different relations in a knowledge graph for view-specific embedding learning. Luo et al. [14] proposed a consistency-specificity multiview clustering method, which divides the coefficient matrix of each view into consistency and specificity matrices: the consistency matrix has a low-rank structure shared by all views, and the specificity matrix preserves the content unique to each view. Besides complementary information, consistent information across views can also improve the performance of MVSC. Tang et al. [15], [16] constructed a predefined matrix for each view and then fused these matrices into a consensus affinity graph by collaborative training, so that the clustering results of different views are consistent with each other. Wang et al. [17] imposed a consensus loss term to minimize the divergence among all latent data-cluster matrices to achieve consistency. Nie et al. [18] proposed a Laplacian rank-constrained graph as the centroid of each view with different confidences. Wang et al. [19] addressed multiview clustering by maximizing the alignment between the consensus clustering matrix and weighted base partitions. Many impressive methods obtain the final clustering results by spectral clustering, for which constructing a high-quality graph is crucial. Sun et al. [20] proposed projective multiple kernel subspace clustering (PMKSC) to obtain multiple high-quality similarity graphs and describe the individual underlying clustering structures. Wu and Bajwa [21] presented the latent MVSC (LMSC) method, which learns the affinity matrix from a latent subspace to improve robustness. Chen et al. [22] proposed a novel multiview clustering approach (MCLES) that jointly learns a latent embedding space, a robust similarity matrix, and an accurate clustering indicator matrix in a unified optimization framework.
Although these MVSC methods have achieved great success, they still have the following limitations.
Finally, standard spectral clustering [23] is used to obtain the segmentation results. However, these methods may ignore the inherent symmetry of similarity matrices and cannot effectively depict the intrinsic relationships of data points. 4) The nuclear norm minimization (NNM) is the tightest convex relaxation of the original rank minimization problem and is often adopted in the optimization of many MVSC methods because of its convexity and simplicity. However, this convex relaxation may result in inferior performance under noise, and its solution may seriously deviate from that of the original rank minimization problem.
In this article, to address the above problems, we propose a new MVSC method based on low-rank consistency and a symmetric affinity graph, termed low-rank symmetric affinity graph MVSC (LSGMC). Fig. 1 shows the proposed method. In the self-expressiveness framework, a three-factor decomposition of the coefficient matrix with orthogonal constraints is utilized to ensure the consistency of the clustering structure. Then, the Schatten p-norm regularization and symmetry constraints are employed to learn a symmetric, low-rank coefficient matrix, which can maintain the inherent subspace structure of noisy high-dimensional data. In addition, the coefficient matrices are fused to obtain a high-quality similarity matrix. Furthermore, an adaptive information reduction strategy on the coefficient matrix is used to retain valuable content and remove irrelevant information. Finally, the clustering results are obtained by spectral clustering. Extensive experiments on 11 benchmark datasets demonstrate the effectiveness of LSGMC.
The main contributions of our work are summarized as follows.
1) The self-expressiveness and matrix three-factor decomposition are integrated into a unified framework to explore complementary information and maintain the consistency of the clustering structure among different views.
2) An adaptive information reduction strategy is designed to reduce the impact of redundant information and noise. The fusion mechanism is further used to capture the inherent structure (i.e., the angular information of the principal directions) of data points in the original matrix. Together, these two strategies yield a high-quality similarity matrix with a clear block-diagonal structure.
3) The symmetry constraint is imposed on the coefficient matrix learning process to preserve its inherent symmetry and avoid symmetric postprocessing. In addition, the Schatten p-norm is used to replace the nuclear norm to more reasonably approximate the rank function. In this way, the learned symmetric low-rank coefficient matrix can accurately represent the clustering structure and encode the discriminant information, yielding a similarity matrix that is easily separated by spectral clustering.

The remainder of this article is organized as follows.
Section II outlines related work on MVSC. Section III introduces the proposed method and its optimization algorithm. Section IV presents extensive experimental results. Finally, Section V summarizes this article.

II. PRELIMINARIES
In this section, we provide a brief review of multiview clustering, LRR-based MVSC, and the Schatten p-norm. Moreover, most of the notations used in this article are described here.

A. Notation
For convenience, most notations used in this article are listed in the Nomenclature.

B. Multiview Clustering
Numerous multiview clustering methods have been proposed in recent years [24], [25], [26]. Some representative methods are reviewed herein. Several methods are devoted to reducing the extensive computation generated by large-scale datasets. Li et al. [27] presented a scalable and parameter-free graph fusion framework for multiview clustering, which seeks a joint graph compatible across multiple views in a self-supervised weighting manner. Wang et al. [28] proposed a new parameter-free MVSC method that jointly conducts anchor selection and subspace graph construction with linear time complexity. Some methods utilize tensors to capture the high-order consistent information among multiple views. Wu et al. [29] proposed a novel method to jointly learn optimal affinity matrices in the projected subspace and its intrinsic low-rank tensor for multiview clustering. Wang et al. [30] proposed a new tensor-based multiview clustering method that preserves the local affinities of all views and penalizes the Laplacian rank on a learned common subspace. Li et al. [31] introduced a weighted tensor nuclear norm to adaptively assign different weights to the singular values of the tensor; this strategy improves the flexibility of the tensor nuclear norm in tensor low-rank approximation. Recently, owing to their powerful ability to handle high-dimensional data and abstract features, deep neural networks have made inroads into multiview clustering. Du et al. [32] presented a new deep MVSC method based on multilevel representation information in multiview data. It uses multiple deep autoencoders to model the nonlinear structure information of multiview data and designs a universal discriminator based on adversarial training to enforce the output of each encoder. Wang et al. [33] proposed a structured multipathway convolutional neural network method for MVSC. It explicitly learns the subspace representations of each view in a layerwise way and explores the consensus information among views through a common connection matrix. However, designing an effective view-relation exploration strategy for deep multiview learning that differs from traditional methods remains challenging. In addition, existing deep multiview learning methods fail to explain the decision-making of the model.

C. Multiview Subspace Clustering Based on LRR
High-dimensional data, such as images and videos, can often be described by lower dimensional representations; that is, a few parameters can represent the complex structure. Conventional methods, such as principal component analysis (PCA), assume that high-dimensional data lie in a single low-dimensional space. In reality, however, high-dimensional data are often better represented by multiple low-dimensional subspaces. Samples in the same low-dimensional structure often have high similarity. Therefore, clustering results can be obtained by partitioning the subspaces.
Given a data matrix X, suppose that the sample points are drawn from multiple subspaces. Subspace clustering aims to assign these sample points to their respective subspaces correctly. Subspace clustering learns the representation matrix Z (also called the coefficient matrix) via "self-expressiveness," which means that each data point can be represented by a linear combination of the other data points in the same subspace. It can be denoted as

X = DZ + E    (1)

where D is a dictionary used to represent the data linearly, which is usually replaced by X itself, and E is the error term representing the noise in the original data X. A general framework of subspace clustering based on self-expressiveness can be summarized as

min_{Z,E} Ω(Z) + λΨ(E)   s.t.   X = XZ + E    (2)

where Ω(·) and Ψ(·) indicate specific regularization strategies and λ > 0 is a parameter that balances the two regularizers.
For a multiview dataset consisting of V different views, X_v ∈ R^{d_v×n} represents the data of the vth view, where n is the number of samples and d_v is the feature dimension of X_v. Equation (2) can be naturally extended to MVSC, and its objective function can be defined as

min_{Z_v,E_v} Σ_{v=1}^{V} Ω(Z_v) + λΨ(E_v)   s.t.   X_v = X_v Z_v + E_v,  v = 1, ..., V    (3)

where E_v and Z_v represent the error term and the coefficient matrix of the vth view, respectively. Different kinds of MVSC methods often adopt different regularization strategies [14].
LRR is one of the well-known strategies; it can recover the subspace structure from data containing errors (e.g., noise and outliers) [1]. This strategy uses the rank function in the regularization term Ω(·). The low-rank minimization can then recover the underlying row space to reveal the proper segmentation of data [34]. However, the rank function is hard to optimize due to its discreteness. In this case, the nuclear norm can be used as its convex relaxation. Furthermore, to reduce noise interference, the ℓ_{2,1} norm can be used in the regularization term Ψ(·) to enhance column sparsity. Based on these two strategies, optimization problem (3) can be defined as

min_{Z_v,E_v} Σ_{v=1}^{V} ∥Z_v∥_* + λ∥E_v∥_{2,1}   s.t.   X_v = X_v Z_v + E_v,  v = 1, ..., V    (4)

After obtaining the optimized result Z_v of each view, the similarity matrix S ∈ R^{n×n} is typically constructed by

S = (1/V) Σ_{v=1}^{V} (|Z_v| + |Z_v^T|) / 2    (5)

In the end, S is used for spectral clustering [23] to get the final clustering result.
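As a hedged sketch of steps (4) and (5): assuming the per-view coefficient matrices Z_v have already been optimized (solving (4) itself needs an ALM routine like the one in Section III-D), the similarity construction and the hand-off to spectral clustering look as follows (scikit-learn is used here for convenience; the paper's implementation is in MATLAB):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def similarity_from_coefficients(Z_list):
    """Build S as in Eq. (5): average the symmetrized absolute
    coefficient matrices of all views."""
    S = np.zeros_like(Z_list[0], dtype=float)
    for Z in Z_list:
        S += (np.abs(Z) + np.abs(Z).T) / 2.0
    return S / len(Z_list)

# Z_list would hold the optimized coefficient matrices Z_v of each view:
# S = similarity_from_coefficients(Z_list)
# labels = SpectralClustering(n_clusters=k,
#                             affinity="precomputed").fit_predict(S)
```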

D. Schatten p-Norm
The nuclear norm is one of the most frequently used regularizers to replace the rank function; it induces low rank by encouraging the sparsity of the singular values.
The nuclear norm is actually the ℓ1 norm of the singular value vector, while the rank function is the ℓ0 norm of the singular value vector. The ℓ1 norm is a loose approximation of the ℓ0 norm that overpenalizes large values. In other words, compared with the rank function, the nuclear norm overpenalizes large singular values [34].
The Schatten p-norm (0 < p < 1) is one of the techniques that replaces the nuclear norm to better approximate the rank function. The Schatten p-norm of a matrix M ∈ R^{m×n} is defined as

∥M∥_{S_p} = (Σ_{i=1}^{min(m,n)} δ_i^p)^{1/p}    (6)

where δ_i is the ith largest singular value of M. The Schatten p-norm of M to the power p is

∥M∥_{S_p}^p = Σ_{i=1}^{min(m,n)} δ_i^p    (7)

When p = 1, the Schatten 1-norm is equal to the nuclear norm. If p = 0 and we define 0^0 = 0, then (7) becomes the rank function of M. Therefore, when p → 0, (6) is an approximation of the rank of M.
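Definitions (6) and (7) translate directly into a few lines of NumPy; the small check below confirms the p = 1 and p → 0 limiting cases:

```python
import numpy as np

def schatten_p(M, p):
    """Schatten p-norm of M: the l_p norm of its singular values, Eq. (6)."""
    s = np.linalg.svd(M, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)

def schatten_p_power(M, p):
    """The p-th power of the Schatten p-norm, Eq. (7); as p -> 0 this
    counts the nonzero singular values, i.e., approaches rank(M)."""
    s = np.linalg.svd(M, compute_uv=False)
    return np.sum(s ** p)

M = np.diag([5.0, 1.0, 0.0])
print(schatten_p(M, 1.0))          # 6.0: nuclear norm when p = 1
print(schatten_p_power(M, 0.01))   # ~2.02: close to rank(M) = 2 as p -> 0
```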

III. PROPOSED METHOD
This section introduces the LSGMC algorithm and its optimization. First, inspired by the successful application of nonconvex regularization, we introduce the Schatten p-norm to replace the nuclear norm in reducing the rank of the coefficient matrix. Then, we perform a three-factor factorization of the coefficient matrices to pursue a consistent low-rank structure among different views. In addition, we enforce a symmetry constraint on the coefficient matrix to preserve its inherent properties during learning. Finally, we design an adaptive information reduction strategy and use the fusion mechanism to reduce noise. In this way, a compact representation of multiple views is obtained, and a high-quality similarity matrix is further constructed from the perspective of spectral clustering.

A. Low-Rank Consistency
The Schatten p-norm can reasonably approximate the rank function [35]. Theoretically, the Schatten p-norm can guarantee the accurate recovery of information. Empirical results also show that the Schatten p-norm outperforms standard NNM [36] and always generates a lower rank solution. Therefore, we replace the nuclear norm with the pth power of the Schatten p-norm, and the LRR-based MVSC framework (4) is redefined as

min_{Z_v,E_v} Σ_{v=1}^{V} ∥Z_v∥_{S_p}^p + λ∥E_v∥_{2,1}   s.t.   X_v = X_v Z_v + E_v    (8)

where X_v, Z_v, and E_v are the data matrix, the coefficient matrix, and the error matrix of the vth view, respectively.
The complementarity and consistency among different views are important for improving the performance of MVSC [37]. Complementarity means that different views provide rich complementary information from different perspectives. In (8), self-expressiveness is utilized to explore the complementary information. Consistency means the consensus among multiple views, i.e., the shared representation structure of different views. Considering that a low-rank structure can recover the underlying data clustering structure to reveal the proper segmentation of data [34], the coefficient matrices Z_v of all views can share the same low-rank structure. Inspired by RC-MSC [38], Z_v is decomposed into the product of three matrices to maintain consistency:

Z_v = L_v C′ R_v^T    (9)

where C′ ∈ R^{k×k} can be regarded as the shared representation structure of the coefficient matrices Z_v of all views. In this way, C′ can preserve the primary information and effectively promote structural consistency among different views. The purpose of the parameter k is to provide a close rather than arbitrary upper bound on the true rank of the coefficient matrix, achieving more accurate representation. The Schatten p-norm can also achieve this effect, especially when the rank is large. In addition, the parameter k is usually adjusted manually over an extensive range without appropriate empirical criteria, which is clumsy and time-consuming [38]. In general, a larger k should be chosen, at least larger than the true rank [39], [40], [41]. In this case, we can directly use the number of samples n instead of the parameter k to avoid tuning. Therefore, we apply the modified decomposition (9) in (8) and convert it to

min_{L_v,C,R_v,E_v} Σ_{v=1}^{V} ∥Z_v∥_{S_p}^p + λ∥E_v∥_{2,1}
s.t.   X_v = X_v Z_v + E_v,  Z_v = L_v C R_v^T,  L_v^T L_v = I,  R_v^T R_v = I    (10)

where L_v, R_v, and C ∈ R^{n×n}. The orthogonal constraints on L_v and R_v are intended to prevent trivial solutions. In the first term of (10), for optimization, we use ∥C∥_{S_p}^p to replace ∥Z_v∥_{S_p}^p equivalently, which is proved in Theorem 1.

Theorem 1: Let L, R, and C be matrices with compatible dimensions, where L and R satisfy the orthogonal constraints L^T L = I and R^T R = I. Then

∥LCR^T∥_{S_p}^p = ∥C∥_{S_p}^p    (11)

Proof: Let the SVD of C be L̄Σ̄R̄^T; then LCR^T = L(L̄Σ̄R̄^T)R^T = (LL̄)Σ̄(RR̄)^T, which is exactly the SVD of LCR^T. Hence, the singular values δ of C and LCR^T are the same, and so are their pth powers. By the definition of the pth power of the Schatten p-norm, ∥C∥_{S_p}^p = ∥LCR^T∥_{S_p}^p.
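Theorem 1 is easy to verify numerically; the sketch below builds column-orthogonal factors and checks that the nonzero singular values of LCR^T coincide with those of C:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 4

# Column-orthogonal factors: L^T L = R^T R = I.
L, _ = np.linalg.qr(rng.standard_normal((n, k)))
R, _ = np.linalg.qr(rng.standard_normal((n, k)))
C = rng.standard_normal((k, k))

sv_C = np.linalg.svd(C, compute_uv=False)
sv_LCRt = np.linalg.svd(L @ C @ R.T, compute_uv=False)

# Equal nonzero spectra imply ||L C R^T||_Sp^p = ||C||_Sp^p for any p.
print(np.allclose(sv_C, sv_LCRt[:k]))  # True
```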

B. Symmetry Constraint
In MVSC, Z^(v)_ij and Z^(v)_ji represent the similarity between samples x^(v)_i and x^(v)_j in the vth view. From the perspective of graph theory, Z^(v)_ij and Z^(v)_ji should be consistent. For this reason, many MVSC methods construct the similarity matrix for the spectral clustering step by symmetric postprocessing.
However, in practice, the contribution of x^(v)_i to representing x^(v)_j is not necessarily equal to that of x^(v)_j to representing x^(v)_i, and simple symmetric postprocessing may lose some inherent characteristics [2]. To overcome this problem, we enforce a symmetry constraint on the low-rank coefficient matrix Z_v, i.e., Z_v = Z_v^T. In this way, symmetry is maintained throughout the learning process, and a symmetric matrix is obtained explicitly [42]. Moreover, symmetry is beneficial to the consistency of the angular information between the row principal directions and the column principal directions [2]. The learned symmetric LRR benefits subspace segmentation and improves the clustering performance. The final objective function is defined as follows:

min_{L_v,C,R_v,E_v} Σ_{v=1}^{V} ∥C∥_{S_p}^p + λ∥E_v∥_{2,1}
s.t.   X_v = X_v Z_v + E_v,  Z_v = L_v C R_v^T,  Z_v = Z_v^T,  L_v^T L_v = I,  R_v^T R_v = I    (12)

where λ > 0 is used to balance the effects of the low-rank term and the noise term. The low-rank constraint effectively captures the global structure of X and ensures that the coefficients of samples from the same subspace are highly correlated. Decomposing each Z_v through the shared factor C maintains the consistency among views, and the symmetry constraint on Z_v guarantees the consistency of the coefficients for each pair of data samples. After obtaining the optimized solution Z*_v of (12), we use it to construct a similarity matrix and then perform spectral clustering to obtain the final clustering result.

C. Information Fusion and Reduction
The key step of spectral clustering is to construct a high-quality similarity matrix. Constructing it directly from the symmetric coefficient matrix Z*_v is often unsatisfactory because the angular information of its principal directions is not utilized. Since the coefficient matrix is low rank, its rows and columns contain few errors, so the angular information remains largely unaffected and can be used to construct the similarity matrix. To this end, we adopt the fusion mechanism of the coefficient matrix [2]. Specifically, we calculate the skinny SVD Z* = U*Σ*(V*)^T, let M = U*(Σ*)^{1/2}, and then use the angular information of all row vectors of M to define the similarity matrix W as

W_ij = (m_i m_j^T / (∥m_i∥_2 ∥m_j∥_2))^2    (13)

where m_i and m_j denote the ith and jth rows of M, respectively. The square ensures that every entry of W is positive for spectral clustering [43]. However, this fusion mechanism performs the subsequent processing directly after accumulating the coefficient matrices (i.e., Z* = Σ_{v=1}^{V} Z*_v), which may overlook the redundant information introduced by the accumulation and result in poor clustering performance. In practice, a certain proportion of the information in the coefficient matrix usually characterizes the most significant structural information embedded in the matrix. Therefore, we design an adaptive information reduction strategy on Z* to reduce redundant information and retain the meaningful structural information. In general, the more the samples, the more the redundant information. An intuitive choice is to make the information retention proportion inversely related to n; accordingly, we define the retained ratio η as a decreasing function of the number of samples n with constants α_1, α_2, and α_3, which are set to the same values for all datasets. Specifically, we accumulate the entries in each column of Z* in descending order of value until they reach η% of the sum of the current column. These entries are retained, and the remaining entries are set to 0. This strategy prunes weak connections and increases sparseness while maintaining connectivity. At the same time, the quality of the affinity graph is less affected by redundant information and inaccurate similarity measures.
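A compact sketch of the reduction-then-fusion pipeline follows; the column pruning tracks the description above (ranking ties and the exact η% boundary are handled loosely), and M = U*(Σ*)^{1/2} is taken from the skinny SVD of the reduced Z*:

```python
import numpy as np

def fuse_and_reduce(Z_star, eta):
    """Sketch of the adaptive reduction followed by the fusion step:
    prune each column of Z* to its dominant entries, then build W from
    the angular information of M = U* (Sigma*)^(1/2) as in Eq. (13)."""
    Z = np.asarray(Z_star, dtype=float).copy()
    for j in range(Z.shape[1]):
        col = Z[:, j]
        order = np.argsort(col)[::-1]                 # descending by value
        kept = np.cumsum(col[order]) <= eta * col.sum()
        kept[0] = True                                # always keep the top entry
        col[order[~kept]] = 0.0                       # zero the weak connections
    U, s, _ = np.linalg.svd(Z, full_matrices=False)   # skinny SVD of reduced Z*
    r = int(np.sum(s > 1e-10))
    M = U[:, :r] * np.sqrt(s[:r])
    Mn = M / np.maximum(np.linalg.norm(M, axis=1, keepdims=True), 1e-12)
    return (Mn @ Mn.T) ** 2                           # squared cosine, Eq. (13)
```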

D. Optimization
For convenience, we introduce an auxiliary variable J_v and convert the objective function in (12) into an equivalent separable form, denoted as problem (15). Inspired by the inexact augmented Lagrange multiplier (ALM) method [44], we rewrite (15) as the minimization of an augmented Lagrangian function (16), where Q_1^v, Q_2^v, and Q_3^v (v = 1, 2, ..., V) are the Lagrange multipliers and μ > 0 is the penalty parameter. We first update C by fixing the other variables; the optimization problem related to C can be written as problem (17), in which the terms collected in B are entirely independent of C. Eliminating B, problem (17) is equivalent to a Schatten p-norm proximal problem (18). To analyze its solution, we first introduce the following theorem.
Theorem 2: The optimal solution C* to problem (18) is UΔV^T with Δ = diag(δ_1, ..., δ_r), where U and V are the left and right singular vector matrices of C, respectively, and each δ_i is the optimal solution of the scalar problem (19) [45], of the form min_{δ_i ≥ 0} (1/2)(δ_i − σ_i)^2 + w δ_i^p, where σ_i is the ith singular value of C and w is the corresponding shrinkage weight. Problem (19) can be effectively solved by the generalized soft-thresholding (GST) algorithm [46]. The procedure for updating C is outlined in Algorithm 1.
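A minimal sketch of the GST solver for the scalar problem behind (19), following Zuo et al. [46]; the mapping of the shrinkage weight to the ALM quantities (e.g., a λ/μ-dependent weight) is our assumption, not a detail given in this section:

```python
import numpy as np

def gst(y, lam, p, n_iter=10):
    """Generalized soft-thresholding (GST) [46]: solves, elementwise,
    min_x 0.5*(x - y)^2 + lam*|x|^p for 0 < p < 1."""
    y = np.asarray(y, dtype=float)
    # Threshold below which the minimizer is exactly zero.
    tau = (2.0 * lam * (1.0 - p)) ** (1.0 / (2.0 - p)) \
        + lam * p * (2.0 * lam * (1.0 - p)) ** ((p - 1.0) / (2.0 - p))
    x = np.zeros_like(y)
    big = np.abs(y) > tau
    t = np.abs(y[big])                    # fixed-point iteration, t_0 = |y|
    for _ in range(n_iter):
        t = np.abs(y[big]) - lam * p * t ** (p - 1.0)
    x[big] = np.sign(y[big]) * t
    return x

# Singular-value shrinkage as in Theorem 2: shrink the singular values of
# the target matrix with GST and rebuild C* = U diag(delta) V^T.
# U, s, Vt = np.linalg.svd(C_hat)
# C_star = U @ np.diag(gst(s, weight, p)) @ Vt   # `weight` is an assumption
```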
Note that the other variables in (16) are independent across views. Therefore, we can solve them separately, and (16) is redefined for each view as problem (20).
To solve problem (20), we separate it into five subproblems that are alternately optimized by fixing the other variables.
1) Subproblem Z_v: By taking the derivative of (20) with respect to Z_v and setting it to zero, we obtain a closed-form update for Z_v, denoted as (21), where I denotes the identity matrix. 2) Subproblem J_v: The optimization of (20) with respect to J_v is given by problem (22), in which the terms collected in B are entirely independent of J_v. Taking the derivative with respect to J_v and setting it to zero, we obtain the update rule (23). It is easy to see that J_v tends to be more symmetric under such an update rule.
3) Subproblem E_v: The optimization of (20) with respect to E_v can be reformulated as problem (24), which can be easily solved column by column via the ℓ_{2,1}-norm minimization operator in [47]. 4) Subproblems L_v and R_v: The optimization of (20) with respect to L_v and R_v is given by problem (26). We first consider L_v individually. Similar to the treatment of (17), problem (26) can be rewritten in the form of problem (27). This is the classical orthogonal Procrustes problem, and its solution exists in closed form [48] as L_v = M_v N_v^T, where M_v and N_v are the left and right singular vector matrices obtained by the corresponding SVD, respectively. The optimization of R_v is similar to that of L_v. Thus, (26) with respect to R_v can be rewritten as problem (29), whose closed-form solution is R_v = M′_v N′_v^T, where M′_v and N′_v are the left and right singular vector matrices obtained by the corresponding SVD, respectively.
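The Procrustes step admits a three-line solution; in the sketch below, A stands for the view-specific matrix assembled in (27) or (29), whose exact form is omitted above:

```python
import numpy as np

def procrustes(A):
    """Closed-form orthogonal Procrustes solution [48]:
    argmax_L Tr(L^T A) s.t. L^T L = I is L* = M N^T,
    where A = M S N^T is the (economy) SVD of A."""
    M, _, Nt = np.linalg.svd(A, full_matrices=False)
    return M @ Nt

A = np.random.default_rng(2).standard_normal((8, 3))
L = procrustes(A)
print(np.allclose(L.T @ L, np.eye(3)))  # True: columns are orthonormal
```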
5) Subproblem Multipliers: The three Lagrange multipliers Q_1^v, Q_2^v, and Q_3^v and the penalty parameter μ are updated by the standard inexact-ALM rules in (31): each multiplier is moved along the residual of its constraint, and μ is increased by μ ← min(ρμ, μ_max), where ρ controls the rate of convergence (Algorithm 2 uses μ_max = 10^8 and ε = 0.01). The stopping criterion is ∥Z_v − C∥_∞ < ε.
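As a generic illustration (not the paper's exact update (31)), one inexact-ALM dual step might be sketched as follows; the residual passed in depends on which constraint of (15) the multiplier enforces:

```python
def alm_dual_update(Q, residual, mu, rho=1.5, mu_max=1e8):
    """One inexact-ALM dual step [44]: gradient-ascent update of a
    Lagrange multiplier along its constraint residual, followed by the
    usual geometric growth of the penalty mu (capped at mu_max)."""
    return Q + mu * residual, min(rho * mu, mu_max)

# e.g., for the self-expressiveness constraint X_v = X_v Z_v + E_v:
# Q1[v], mu = alm_dual_update(Q1[v], X[v] - X[v] @ Z[v] - E[v], mu)
```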
The optimization procedure of the proposed LSGMC method is summarized in Algorithm 2.

E. Complexity and Convergence Analysis
There are several computation-intensive steps in Algorithm 2. Specifically, the primary cost of updating C is the SVD of an n × n matrix, whose computational complexity is O(n^3). In addition, the update formulas for L and R are similar, so each requires O(Vn^3) over all views. For datasets with d_v > n, the Woodbury formula [49] is unnecessary; the complexity of updating Z for all views is O(Vn^3) due to the matrix inversion. The update of E has a closed-form solution for all views. The last steps include the SVD of Z* and spectral clustering on W, whose complexity is O(n^3 + n^2). Note that V ≪ min(n, d_max), so the total complexity of our algorithm is dominated by the O(Vn^3) terms. Since (15) is not a jointly convex problem in all variables, convergence is not easy to guarantee in general. We solve problem (15) with Algorithm 2, whose convergence analysis is provided in the Supplementary Material. In addition, we empirically verify the convergence of Algorithm 2 in Section IV.

IV. EXPERIMENTS
In this section, we perform clustering experiments on 11 benchmark datasets and compare LSGMC with ten state-of-the-art methods. We first introduce the datasets, comparison methods, and parameter settings. Then, we evaluate the clustering performance, computational efficiency, and parameter sensitivity of LSGMC. The source code can be downloaded at https://github.com/lanbiolab/LSGMC.

A. Datasets
We select 11 widely used public datasets for these experiments. The statistics of these datasets are summarized in Table I, and a brief description of each dataset is provided in the following.
1) 3Sources: This dataset contains 169 articles that were manually sorted into six categories. Each article is collected from three online news sources (BBC, Reuters, and Guardian), and each news source is treated as a single view of an article.
6) Flowers [53]: This dataset contains 1360 samples from 17 flower categories. Each category has 80 images, and seven views are provided.
7) ProteinFold: This dataset contains 694 protein domains with 27 classes and 12 views.
8) Extended YaleB: This is a three-view dataset of face images. It has ten categories with 64 near-frontal images per category taken under different lighting conditions.
9) Caltech101-all [50]: It contains images of objects from 101 categories.
10) SUNRGBD [54]: It contains 10 335 indoor scene images in 45 classes.
11) YoutubeFace: It is produced by YouTube, and we randomly select 10 000 instances with 31 categories.

B. Competitors
To evaluate the performance of the proposed method, we compare it with the following ten state-of-the-art multiview clustering methods.
1) LMSC [55]: This method clusters data points with latent representation and simultaneously explores underlying complementary information from multiple views.
2) CSMSC [14]: A novel MVSC method in which consistency and specificity are jointly exploited for subspace representation learning.
3) LT-MSC [3]: A low-rank tensor-constrained MVSC model, which introduces a low-rank tensor constraint to explore the complementary information from multiple views.
4) SMVSC [56]: A scalable MVSC method that combines anchor learning and graph construction in a unified optimization framework.
5) LMVSC [57]: A large-scale MVSC algorithm with linear-order complexity.

C. Parameter Setting and Evaluation Metrics
Two parameters need to be tuned in the proposed algorithm, i.e., λ and p. The Schatten p-norm parameter p is chosen from {0.01, 0.1, 0.2, ..., 0.9}, and the tradeoff parameter λ is selected from {10^-5, 10^-4, ..., 10^2}. We use grid search to select the optimal combination for each dataset. In addition, we set the constants of the information reduction strategy to α_1 = 0.032, α_2 = 0.018, and α_3 = −1.42 for all datasets. Experiments are implemented in MATLAB on a PC with a 3.6-GHz CPU and 32-GB RAM.
In the experiments, all MATLAB implementations of the comparison algorithms are downloaded from the authors' websites, and the hyperparameters are set according to the recommendations of the corresponding papers. To compare the performance of different methods, the clustering results are measured by five commonly used evaluation metrics: accuracy (ACC), normalized mutual information (NMI), precision, F-score, and adjusted Rand index (ARI). For all of these metrics, higher values indicate better clustering performance. The best results are highlighted in bold, and the second-best results are underlined.
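For reference, three of these metrics can be computed as follows; this is a common implementation (ACC via Hungarian matching), not the authors' evaluation script:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one matching between predicted and true labels,
    found with the Hungarian algorithm."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((k, k), dtype=int)
    for t, q in zip(y_true, y_pred):
        cost[t, q] += 1
    rows, cols = linear_sum_assignment(-cost)   # maximize matched pairs
    return cost[rows, cols].sum() / len(y_true)

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]                     # a relabeling of y_true
print(clustering_accuracy(y_true, y_pred))              # 1.0
print(normalized_mutual_info_score(y_true, y_pred))     # 1.0
print(adjusted_rand_score(y_true, y_pred))              # 1.0
```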

D. Performance Comparison
The clustering results of LSGMC and all comparison methods on eight datasets are reported in Table II. We run each algorithm ten times with the optimal parameters and report the means and standard deviations of the evaluation metrics. In particular, for all algorithms using k-means, we repeat the k-means process 200 times under random initialization to be fair and reduce randomness. From the experimental results in Table II, we can draw the following noteworthy conclusions. In all cases, LSGMC achieves the best clustering performance compared with the other state-of-the-art multiview clustering methods. For example, ACC is improved by at least 4.32%, 4.20%, 4.99%, 7.62%, 9.37%, 5.58%, 3.16%, and 2.31% over the second-best methods on the 3Sources, Caltech101-7, Handwritten, MSRC-v1, COIL-20, Flowers, ProteinFold, and Extended YaleB datasets, respectively. The NMI improvements of LSGMC over the second-best method are 3.28%, 5.64%, 2.99%, 10.85%, 4.97%, 7.08%, 0.18%, and 2.42%, respectively. It is worth noting that LSGMC achieves 100% on all metrics for the COIL-20 dataset. Compared with all methods other than the second-best one, LSGMC shows marked improvement on the YaleB dataset.
In addition, we evaluate the performance of LSGMC on three large-scale datasets. Owing to the limitation of computational resources, the top-five best methods from the previous experiment are selected for comparison, and each method is run once. The experimental results are shown in Table III. It can be observed that LSGMC still maintains excellent clustering performance on these large datasets; the ACC improvements of LSGMC over the second-best method are 5.36%, 3.81%, and 3.58%, respectively. This demonstrates that LSGMC is more robust to outliers and can accurately capture the structure of highly redundant data. These results also show that LSGMC can construct a high-quality affinity graph with a clear discriminant structure by maximizing the consistency of the underlying clustering structure and fully utilizing the various information embedded in each view.
To study the possible reasons for the better performance of LSGMC, we compare its similarities to and differences from the comparison methods as follows. MvCSD, RC-MSC, CSMSC, and our method are based on subspace clustering, and all reveal the underlying consensus structure between views through a shared consistent matrix. However, they use the nuclear norm, whereas LSGMC uses the Schatten p-norm, which approximates the rank function more accurately. In addition, MvCSD uses a parameter to control sparsity,

E. Results Visualization
The ideal similarity matrix usually has a clear block-diagonal structure, which means that samples within a subspace are closely connected in the affinity graph, whereas samples from different subspaces are not connected to each other. We examine the difference between the matrix Z* and the similarity matrix W obtained with information fusion and reduction. The visualization is shown in Fig. 2. From the top of Fig. 2, it can be seen that Z* (without information fusion and reduction) has a roughly block-diagonal structure contaminated by noise. From the bottom of Fig. 2, it can be found that W (with information fusion and reduction) has a clear block-diagonal structure with little noise. This verifies the effectiveness of information fusion and reduction. Moreover, the result demonstrates that the symmetry constraint and the low-rank constraint are crucial for clearly revealing the underlying clustering structure and obtaining a high-quality similarity matrix.
To describe the clustering performance more intuitively, we use t-SNE [60] to visualize the 3Sources and Caltech101-7 datasets, as shown in Fig. 3. The results show that LSGMC performs better than MLRR and RC-MSC on these two datasets.

F. Time Cost and Convergence
To evaluate the efficiency of the LSGMC algorithm, we run each method ten times and record the average computation time in Table IV. As shown in the table, NESC is the fastest algorithm because it uses an efficient optimization to reduce computational costs and has a time complexity near O(n). The time cost of LSGMC is nevertheless comparable in most cases and lower than that of many methods.
LSGMC also converges quickly in practical applications. As shown in Fig. 4, the right y-axis, the left y-axis, and the x-axis denote the reconstruction error, ACC, and the iteration number, respectively. The results show that LSGMC can usually converge within 15 iterations, and the number of optimization iterations generally lies between 1 and 30. Some datasets (Handwritten, COIL-20, and YaleB) yield the best performance in the first iteration, which also demonstrates the good empirical convergence of the proposed LSGMC.

G. Parameter Sensitivity
In LSGMC, there are two tuning parameters: λ balances the influence of noise, and p is the parameter of the Schatten p-norm. We use grid search for λ and p, and Fig. 5 shows the variation of ACC with these parameters on four datasets. Fig. 5(a) and (b) shows that, in general, λ can be neither too large nor too small. In addition, we find that the proposed method is not sensitive to the selection of p, and ACC is stable over a range of p. Generally, the Schatten p-norm gets closer to the rank function as p decreases. Although the parameters λ and p have an important effect on performance, Fig. 5(c) and (d) shows that LSGMC still outperforms the compared methods on some datasets, which also demonstrates the strong stability of the proposed model. Through intensive parameter tuning, we empirically find that suitable values of λ and p lie within the ranges {10^-5, 10^-4, ..., 10^2} and (0, 1], respectively.

H. Ablation Analysis
Several important strategies affect the performance of our model. Here, we experimentally evaluate the impact of four factors; the clustering results in terms of ACC on eight datasets are shown in Fig. 6: 1) LSGMC-N indicates that the nuclear norm is used as the convex relaxation of the rank function instead of the Schatten p-norm; 2) LSGMC-A represents LSGMC without the adaptive information reduction strategy; 3) LSGMC-B represents LSGMC without the coefficient matrix fusion mechanism, using instead the general similarity matrix construction method in (5); and 4) LSGMC-C represents using C instead of Z* for the subsequent processing in LSGMC. For all variants, the two parameters λ and p are searched in the same range as for LSGMC. The results show that these strategies are indispensable to the effectiveness of the proposed method. The Schatten p-norm approximates the low-rank structure more accurately, the adaptive information reduction strategy removes redundant information and inaccurate similarity measures, and the coefficient fusion mechanism obtains a compact representation shared by multiple views that comprehensively describes the membership of multiview data.

V. CONCLUSION
In this article, we propose a new LSGMC model for MVSC by learning a low-rank symmetric coefficient matrix. The method fully exploits the complementarity and consistency among views and maintains the angular information and symmetry during optimization. Then, more comprehensive sample affinity information is extracted by using the fusion mechanism and the information reduction strategy. We conduct performance comparison, parameter sensitivity, ablation, complexity, and convergence analyses on benchmark datasets, and the results highlight the superior performance of the proposed method: it runs fast and consistently outperforms its competitors on multiple clustering metrics. In future work, we plan to consider using tensors to maintain high-order information. In addition, we would like to focus on integrating the clustering process into a unified framework and investigating adaptive weights for promising performance.

Fig. 1. Framework of the proposed LSGMC method. (a) Coefficient matrices {Z_v}_{v=1}^V are generated from multiview data by self-representation. (b) In the optimization step, the low-rank symmetric coefficient matrix Z*_v is learned.


Fig. 6. Ablation analysis of LSGMC in terms of ACC on eight datasets, comparing the four variants LSGMC-N, LSGMC-A, LSGMC-B, and LSGMC-C described in Section IV-H.

TABLE II: CLUSTERING PERFORMANCE (%) OF DIFFERENT CLUSTERING ALGORITHMS ON EIGHT DATASETS (MEAN ± STANDARD DEVIATION)

TABLE IV: COMPUTATIONAL TIME (IN SECONDS) OF EACH METHOD ON DIFFERENT DATASETS