Factorized graph matching

Graph matching plays a central role in solving correspondence problems in computer vision. Graph matching problems that incorporate pair-wise constraints can be cast as a quadratic assignment problem (QAP). Unfortunately, QAP is NP-hard, and many algorithms have been proposed to solve different relaxations. This paper presents factorized graph matching (FGM), a novel framework for interpreting and optimizing graph matching problems. In this work we show that the affinity matrix can be factorized as a Kronecker product of smaller matrices. There are three main benefits of using this factorization in graph matching: (1) there is no need to compute the costly (in space and time) pair-wise affinity matrix; (2) the factorization provides a taxonomy for graph matching and reveals the connection among several methods; (3) using the factorization, we derive a new approximation of the original problem that improves state-of-the-art algorithms in graph matching. Experimental results on synthetic and real databases illustrate the benefits of FGM. The code is available at http://humansensing.cs.cmu.edu/fgm.


Introduction
Graph matching plays a central role in solving many correspondence problems in computer vision such as shape matching [4], object categorization [15], feature tracking [23,33], symmetry analysis [10,22] and action recognition [6,20]. Mathematically, pair-wise graph matching is formulated as the quadratic assignment problem (QAP) [27]. Unlike the linear assignment problem, which can be efficiently solved with the Hungarian algorithm [7], QAP is known to be NP-hard [19] and an exact optimal algorithm can only work for very small graphs. Therefore, the main body of research in QAP has focused on devising more accurate and faster algorithms to solve it approximately.
Although extensive research has been done for decades, graph matching remains a challenging problem, mainly for two reasons: (1) in general, the objective function is non-convex and prone to local minima; (2) the constraints that the solution has to satisfy are combinatorial.
Figure 1. Matching two coffee mugs with 5 and 6 features respectively. The original pair-wise affinity matrix is of size 30 × 30. Our algorithm exploits the particular structure of the affinity matrix and is able to factorize it as a Kronecker product of four smaller matrices. The top two matrices of size 5 × 7 and 6 × 8 represent the structure of the graphs in each image. The lower two matrices encode the affinities for nodes (5 × 6) and edges (7 × 8).
In this paper, we show that for most pair-wise graph matching problems the affinity matrix can be factorized as a Kronecker product of smaller matrices. Based on this fact, we propose factorized graph matching (FGM), a novel framework for interpreting and optimizing graph matching problems. The benefits of our approach are threefold: (1) it avoids the computation of the cumbersome affinity matrix and hence potentially allows for a more efficient implementation, especially for large graphs; (2) many graph matching methods can be understood as instances of this factorization, which clarifies the commonalities and differences among many pair-wise graph matching problems; (3) the factorization leads to a new approximation of the graph matching problem that improves state-of-the-art approaches. Fig. 1 illustrates an example of matching two coffee mugs using FGM. Note that FGM factorizes the large 30 × 30 affinity matrix into four smaller ones.

Previous work
This section reviews the problem formulation of graph matching and discusses recent advances in solving the QAP in graph matching.

Problem formulation of graph matching
We denote (see notation¹) a graph by G = {P, Q, G}, where P = [p_1, ..., p_n] ∈ R^{d_p×n} and Q = [q_1, ..., q_m] ∈ R^{d_q×m} are the feature matrices computed for the nodes and edges², respectively. The topology of G is specified by a node-edge incidence matrix G ∈ {0,1}^{n×m}, where g_{ic} = g_{jc} = 1 if the i-th and j-th nodes are connected by the c-th edge, and zero otherwise. For instance, Fig. 2a shows a pair of synthetic graphs and Figs. 2c-d illustrate their incidence matrices.
Suppose that we are given a pair of graphs, G^1 and G^2, with n_1 and n_2 nodes and m_1 and m_2 edges, respectively. We compute two affinity matrices, K^p ∈ R^{n_1×n_2} and K^q ∈ R^{m_1×m_2}, measuring the similarity of each node pair and edge pair, respectively. More specifically, κ^p_{i_1 i_2} = φ_p(p^1_{i_1}, p^2_{i_2}) measures the similarity between the i_1-th node of G^1 and the i_2-th node of G^2, and κ^q_{c_1 c_2} = φ_q(q^1_{c_1}, q^2_{c_2}) measures the similarity between the c_1-th edge of G^1 and the c_2-th edge of G^2. The problem of graph matching consists in finding a correspondence between the nodes of G^1 and G^2 that maximizes the following score of global consistency:

J_gm(X) = Σ_{i_1,i_2} κ^p_{i_1 i_2} x_{i_1 i_2} + Σ_{c_1,c_2} κ^q_{c_1 c_2} x_{i_1 i_2} x_{j_1 j_2},

where the second sum runs over pairs of edges, c_1 connecting nodes i_1, j_1 in G^1 and c_2 connecting nodes i_2, j_2 in G^2, and X ∈ {0,1}^{n_1×n_2} denotes the node correspondence, i.e., x_{i_1 i_2} = 1 if the i_1-th node of G^1 corresponds to the i_2-th node of G^2. In most cases, X is constrained to be a one-to-one matching, i.e., X 1_{n_2} ≤ 1_{n_1} and X^T 1_{n_1} ≤ 1_{n_2}. It is more convenient to write J_gm(X) in a quadratic form, x^T K x, where x = vec(X) ∈ {0,1}^{n_1 n_2} is an indicator vector and K ∈ R^{n_1 n_2 × n_1 n_2} is computed as follows: the diagonal entry k_{i_1 i_2, i_1 i_2} encodes the node affinity κ^p_{i_1 i_2}, and the off-diagonal entry k_{i_1 i_2, j_1 j_2} encodes the edge affinity κ^q_{c_1 c_2} if the node pair (i_1, j_1) is connected by the c_1-th edge of G^1 and the node pair (i_2, j_2) is connected by the c_2-th edge of G^2, and is zero otherwise. For instance, Figs. 2e-g illustrate the composition of the affinity matrix. With this notation, the goal of graph matching is to optimize the following QAP:

max_x  x^T K x,   s.t.  X 1_{n_2} ≤ 1_{n_1},  X^T 1_{n_1} ≤ 1_{n_2},  x ∈ {0,1}^{n_1 n_2}.   (1)

¹ Bold capital letters denote matrices (e.g., X) and bold lower-case letters denote column vectors (e.g., x); x_{ij} denotes the scalar in the i-th row and j-th column of the matrix X. All non-bold letters represent scalars. 1_{m×n}, 0_{m×n} ∈ R^{m×n} are matrices of ones and zeros. I_n ∈ R^{n×n} is an identity matrix. ||x||_p = (Σ_i |x_i|^p)^{1/p} denotes the p-norm. ||X||_F^2 = tr(X^T X) designates the Frobenius norm. vec(X) denotes the vectorization of matrix X. diag(x) is a diagonal matrix whose diagonal elements are x. X • Y and X ⊗ Y are the Hadamard and Kronecker products of matrices. {i : j} lists the integers {i, i+1, ..., j−1, j}. eig(X) computes the leading eigenvector of X.
² In general, the edge feature can be asymmetrical, i.e., the feature computed for the edge from the i-th node to the j-th node may differ from the one computed in the opposite direction. However, the symmetrical edge feature can express a wide range of graph matching problems. For instance, the pairwise distance and the absolute angle with respect to the horizontal line both belong to this class of edge features.
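To make the construction of K concrete, the following numpy sketch assembles the dense affinity matrix from K^p, K^q and the edge lists, and evaluates x^T K x for a candidate matching (graph sizes and affinity values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 3, 3
E1 = [(0, 1), (1, 2)]                # edges of the first graph
E2 = [(0, 1), (1, 2), (0, 2)]        # edges of the second graph
Kp = rng.random((n1, n2))            # node affinities
Kq = rng.random((len(E1), len(E2)))  # edge affinities

# Assemble K: node affinities on the diagonal, edge affinities at the
# positions of compatible edge pairs (both matching orientations).
K = np.diag(Kp.flatten(order='F'))   # vec(X) is column-major
for c1, (i1, j1) in enumerate(E1):
    for c2, (i2, j2) in enumerate(E2):
        for (a, b), (p, q) in [((i1, i2), (j1, j2)), ((i1, j2), (j1, i2))]:
            r, s = a + n1 * b, p + n1 * q
            K[r, s] = K[s, r] = Kq[c1, c2]

X = np.eye(n1)                       # candidate one-to-one matching
x = X.flatten(order='F')
score = x @ K @ x                    # J_gm for this matching
print(K.shape, score > 0)
```

Even on this toy pair, K is already 9 × 9; for graphs with n nodes each it grows as n² × n², which is the space cost the factorization in Section 3 avoids.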

Advances in graph matching
Over the past three decades, a myriad of approximations for solving the QAP in graph matching have been proposed in computer vision and machine learning (see [12,30] for a survey). These methods can be broadly categorized into two types based on the objective to be maximized: tr(A_1 X A_2 X^T) and x^T K x.
The first case corresponds to maximizing a trace-form objective function, tr(A_1 X A_2 X^T), where A_1, A_2 ∈ R^{n×n} are the weighted adjacency matrices of the graphs and X ∈ {0,1}^{n×n} is a permutation matrix. In the operations research literature [27], this is known as Koopmans-Beckmann's QAP, a particular case of Lawler's QAP (maximizing x^T K x) obtained when K = A_2 ⊗ A_1. In the past two decades, various continuous relaxations have been proposed to solve this type of problem. Umeyama [36] proposed the first spectral algorithm, based on the eigenvectors of the adjacency matrices. Almohamad and Duffuaa [3] proposed to optimize an l_1-norm objective function by linear programming. The work most related to ours is that of Zaslavskiy et al. [39], in which a convex-concave approach was proposed to estimate the correspondence in an iterative manner. Despite its success at matching characters and other visual objects with relatively simple structure, the graph model used by these methods lacks the flexibility to match the complex structures encountered in realistic computer vision problems.
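The equivalence between the two objectives in the Koopmans-Beckmann case follows from the vectorization identity vec(X)^T (A_2 ⊗ A_1) vec(X) = tr(A_1 X A_2 X^T) for symmetric A_2; a minimal numpy check (random symmetric adjacency matrices, made-up sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A1 = rng.random((n, n)); A1 = (A1 + A1.T) / 2   # symmetric weighted adjacency
A2 = rng.random((n, n)); A2 = (A2 + A2.T) / 2
X = np.eye(n)[rng.permutation(n)]               # a permutation matrix

# Koopmans-Beckmann's trace objective ...
kb = np.trace(A1 @ X @ A2 @ X.T)
# ... equals Lawler's objective x^T K x with K = A2 ⊗ A1 (column-major vec)
x = X.flatten(order='F')
lawler = x @ np.kron(A2, A1) @ x
print(np.isclose(kb, lawler))
```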
In the more general case, the problem is formulated as the maximization of a quadratic cost x^T K x, where K ∈ R^{n_1 n_2 × n_1 n_2} encodes the pair-wise similarity between nodes and edges. In the past decade, much effort has been devoted to the development of approximate methods for this more general QAP. Gold and Rangarajan [21] proposed the graduated assignment algorithm, which iteratively solves a series of linear approximations of the cost function obtained via Taylor expansions. Leordeanu and Hebert [24] proposed an efficient approximation using a spectral relaxation. Cour et al. [13] presented a more general scheme that incorporates affine constraints in the spectral relaxation, thereby obtaining a better approximation of the original problem. Van Wyk and van Wyk [37] proposed to iteratively project the approximate correspondence matrix onto the convex domain of the desired integer constraints. Torresani et al. [35] designed a complex objective function that can be efficiently optimized by dual decomposition. As a general tool for approximating combinatorial problems, semi-definite programming [34,31] has also been used to approximate graph matching. Recently, Leordeanu et al. [25] proposed an integer projection algorithm that optimizes the objective function in an integer domain. In addition to optimization-based work, probabilistic frameworks [11,40] have been shown to be useful for interpreting and solving graph matching problems. In our work, we concentrate on solving the most general type of graph matching problem using optimization techniques.

Factorized graph matching (FGM)
It is well known that the QAP (Eq. 1) is one of the most difficult combinatorial optimization problems: in general, instances of size n > 20 cannot be solved exactly in practical time. Many methods have been proposed to compute an approximate solution. In particular, most efforts focus on maximizing J_gm(X) by relaxing the binary constraints. For instance, a popular relaxation is to constrain X to be a doubly stochastic matrix [11,21,37,40], i.e., a point in the convex hull of the permutation matrices. Although the constraint set can thus be made convex, we still have to tackle a hard non-convex quadratic program, since K is not necessarily negative semi-definite.
To derive a better optimization scheme for addressing the non-convexity, this section exploits the underlying structure of K. In particular, K can be factorized into smaller matrices. With this new factorization of K, many graph matching methods can be re-interpreted in a coherent manner.

Consider the synthetic graphs shown in Fig. 2. Our main intuition relies on two observations. First, the large affinity matrix K ∈ R^{n_1 n_2 × n_1 n_2} can be divided into an n_2 × n_2 array of smaller blocks K_{ij} ∈ R^{n_1 × n_1}. Some of the K_{ij}'s contain only zero-valued elements: the off-diagonal block K_{ij} is non-zero only if the i-th and j-th nodes of G^2 are connected, i.e., there exists an edge c ∈ {1 : m_2} with g^2_{ic} = g^2_{jc} = 1. Second, each non-zero off-diagonal block K_{ij} inherits the sparsity pattern of G^1: its non-zero entries lie at the positions of the edges of G^1, and their values are drawn from the corresponding column of K^q. Based on these two observations, and after some linear algebra, it can be shown that K can be factorized as:

K = (H_2 ⊗ H_1) diag(vec(L)) (H_2 ⊗ H_1)^T,   (2)

where H_1 = [G_1, I_{n_1}] ∈ {0,1}^{n_1 × (m_1+n_1)} and H_2 = [G_2, I_{n_2}] ∈ {0,1}^{n_2 × (m_2+n_2)} augment the incidence matrices with identities, and L ∈ R^{(m_1+n_1) × (m_2+n_2)} is a 2-by-2 block matrix composed of K^p, K^q, G_1 and G_2, whose top-left block is K^q:

L = [ K^q,  −K^q G_2^T ;  −G_1 K^q,  K^p + G_1 K^q G_2^T ].

Observe that this factorization decouples the graph structure (H_2 ⊗ H_1) from the pairwise features (L). To the best of our knowledge, Eq. 2 is the first time that K is factorized as a product of G_1, G_2, K^p and K^q. As we will see in the rest of the paper, this has important implications for our graph matching algorithm. This closed form paves the way to approaching the graph matching problem by manipulating the smaller and denser L instead of the very large and sparse K. Plugging the factorization of K into J_gm(X) leads to an equivalent trace-form objective function:

J_gm(X) = tr(L^T ((H_1^T X H_2) • (H_1^T X H_2))).   (3)

Observe that L can always be factorized (e.g., by SVD) as L = U V^T = Σ_{i=1}^c u_i v_i^T, where c = rank(L). Substituting it into Eq. 3 yields an equivalent trace form of J_gm(X):

J_gm(X) = Σ_{i=1}^c tr(A_i^1 X A_i^2 X^T),   (4)

where A_i^1 = H_1 diag(u_i) H_1^T and A_i^2 = H_2 diag(v_i) H_2^T. At this point, it is important to notice that Eq. 3 and Eq. 4 can represent many graph matching methods in a unified manner.
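The factorization of Eq. 2 can be verified numerically on a toy pair of graphs. The numpy sketch below assumes H_i = [G_i, I] and one concrete choice of the four blocks of L, namely L = [K^q, −K^q G_2^T; −G_1 K^q, K^p + G_1 K^q G_2^T]; this block form is our reconstruction, consistent with the block structure described in the text, and the sizes and affinities are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def incidence(n, edges):
    """Node-edge incidence matrix G in {0,1}^{n x m}."""
    G = np.zeros((n, len(edges)))
    for c, (i, j) in enumerate(edges):
        G[i, c] = G[j, c] = 1
    return G

n1, n2 = 4, 5
E1 = [(0, 1), (1, 2), (2, 3), (0, 2)]
E2 = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 3)]
G1, G2 = incidence(n1, E1), incidence(n2, E2)
Kp = rng.random((n1, n2))            # node affinities
Kq = rng.random((len(E1), len(E2)))  # edge affinities

# Dense pair-wise affinity matrix K, column-major (vec) indexing.
K = np.diag(Kp.flatten(order='F'))
for c1, (i1, j1) in enumerate(E1):
    for c2, (i2, j2) in enumerate(E2):
        for (a, b), (p, q) in [((i1, i2), (j1, j2)), ((i1, j2), (j1, i2))]:
            r, s = a + n1 * b, p + n1 * q
            K[r, s] = K[s, r] = Kq[c1, c2]

# Factorized form: K = (H2 kron H1) diag(vec(L)) (H2 kron H1)^T
H1 = np.hstack([G1, np.eye(n1)])
H2 = np.hstack([G2, np.eye(n2)])
L = np.block([[Kq,       -Kq @ G2.T],
              [-G1 @ Kq, Kp + G1 @ Kq @ G2.T]])
A = np.kron(H2, H1)
K_fact = A @ np.diag(L.flatten(order='F')) @ A.T

print(np.allclose(K, K_fact))  # the two constructions agree
```

Note that the factors G_1, G_2, K^p and K^q together occupy far less memory than the n_1 n_2 × n_1 n_2 matrix K, which is the first benefit claimed in the introduction.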
Spectral relaxation: Suppose L has a rank-1 structure, i.e., c = 1 and L = uv T . Then the kernel matrix K can be factorized as K = A 2 ⊗ A 1 . Therefore, the solution of spectral matching algorithm using eigen-decomposition [24] can be efficiently computed as eig(K) = eig(A 2 ) ⊗ eig(A 1 ). In addition, we can use Umeyama's spectral algorithm [36] to find the approximate solution by maximizing tr(A 1 XA 2 X T ) subject to XX T = I.
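The rank-1 case can be checked numerically: for non-negative symmetric matrices, the leading eigenvector of A_2 ⊗ A_1 is the Kronecker product of the leading eigenvectors, with eigenvalue equal to the product of the leading eigenvalues. A numpy sketch with made-up adjacency matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A1 = rng.random((4, 4)); A1 = (A1 + A1.T) / 2   # non-negative symmetric matrix
A2 = rng.random((5, 5)); A2 = (A2 + A2.T) / 2

def lead(M):
    """Leading eigen-pair (eigh sorts eigenvalues in ascending order)."""
    w, V = np.linalg.eigh(M)
    return w[-1], V[:, -1]

(l1, v1), (l2, v2) = lead(A1), lead(A2)
M = np.kron(A2, A1)
v = np.kron(v2, v1)   # candidate leading eigenvector of A2 kron A1

print(np.allclose(M @ v, l1 * l2 * v))                    # eigenvector check
print(np.isclose(np.max(np.linalg.eigvalsh(M)), l1 * l2))  # it is the leading one
```

This is why spectral matching on a rank-1 L never needs to form the large matrix K = A_2 ⊗ A_1 explicitly.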
Edge matching: Observe that both H_1^T X H_2 and L have a 2-by-2 block structure, and their top-left blocks are G_1^T X G_2 and K^q, respectively. Recall that G_1^T X G_2 ∈ {0,1}^{m_1×m_2} encodes the correspondence between edges. Intuitively, the goal of maximizing Eq. 3 is to seek the edge-edge correspondence matrix G_1^T X G_2 such that (G_1^T X G_2) • (G_1^T X G_2) is as correlated as possible with K^q. The idea of matching edges has also been used in the probabilistic matching algorithm [40], where Zass and Shashua proposed to maximize the correlation between X and G_1 K^q G_2^T.

Unified view: Eq. 4 reveals the connection between the two types of graph matching problems: the less general one [3,36,39] that maximizes tr(A_1 X A_2 X^T), versus the more general one [11,13,21,24,25,37,40] that maximizes x^T K x. In particular, the maximization of x^T K x can be equivalently cast as the maximization of a sum of c traces tr(A_i^1 X A_i^2 X^T), where A_i^1 and A_i^2 can be interpreted as adjacency matrices. In the special case where c = 1, the two types of problems are equivalent.

Optimization for factorized graph matching
Due to its combinatorial nature, Eq. 1 is usually approached by a two-step scheme: (1) solving a continuously relaxed problem and (2) rounding the approximate solution to a binary one. Conventional methods perform these two steps independently. As mentioned in [25,39], however, this separate treatment inevitably causes a loss of accuracy, especially in the rounding step, which is independent of the cost function (Eq. 1). Inspired by [29,39], we address these two issues in a coherent manner by iteratively optimizing an interpolation of two relaxations. This new scheme has three theoretical advantages: (1) the optimization is insensitive to initialization; (2) the final solution is guaranteed to converge to an integer one, so no rounding step is needed; (3) the iterative updating procedure resembles numerical continuation methods [2], which have been successfully used for solving nonlinear systems of equations for decades.

A convex relaxation
In this section, we introduce a convex relaxation for Eq. 1 assuming X is orthogonal and using the properties of the new factorization.
Strictly speaking, the X satisfying the constraints in Eq. 1 is not a permutation matrix when n_1 ≠ n_2. However, we can always slightly change the problem setting by introducing n_2 − n_1 dummy nodes in G^1 (without loss of generality, we assume n_1 ≤ n_2). As a strict permutation matrix, X must also be an orthogonal matrix, i.e., X^T X = X X^T = I_{n_2}. This fact motivates the following relaxation:

J_vex(X) = −(1/2) Σ_{i=1}^c ||A_i^1 X − X A_i^2||_F^2 = J_gm(X) − (1/2) C(X),  where  C(X) = Σ_{i=1}^c (||A_i^1 X||_F^2 + ||X A_i^2||_F^2).

Observe that, due to the orthogonal constraints, C(X) can be considered constant. In addition, maximizing J_vex(X) is a convex problem because its Hessian with respect to vec(X), −2 Σ_{i=1}^c (I_{n_2} ⊗ A_i^1 − A_i^2 ⊗ I_{n_1})^T (I_{n_2} ⊗ A_i^1 − A_i^2 ⊗ I_{n_1}), is always negative semi-definite.

A concave relaxation
In this section, we introduce a concave relaxation for Eq. 1 assuming X satisfies the integer constraint.
From Eq. 2, we know that L is composed of four blocks, L = [K^q, −K^q G_2^T; −G_1 K^q, K^p + G_1 K^q G_2^T]. Therefore, J_gm(X) can be expanded in the following way:

J_gm(X) = tr((K^q)^T ((G_1^T X G_2) • (G_1^T X G_2))) − tr((K^q G_2^T)^T ((G_1^T X) • (G_1^T X))) − tr((G_1 K^q)^T ((X G_2) • (X G_2))) + tr((K^p + G_1 K^q G_2^T)^T (X • X)).

The integer constraint in Eq. 1 implies that X, G_1^T X and X G_2 are all binary matrices, from which we know that it is equivalent [28,39] to replace the quadratic terms X • X, (G_1^T X) • (G_1^T X) and (X G_2) • (X G_2) by the linear ones X, G_1^T X and X G_2, respectively. This fact leads to the following relaxation:

J_cav(X) = tr((K^q)^T ((G_1^T X G_2) • (G_1^T X G_2))) − tr((K^q G_2^T)^T (G_1^T X)) − tr((G_1 K^q)^T (X G_2)) + tr((K^p + G_1 K^q G_2^T)^T X).

Maximizing J_cav(X) is a concave problem (i.e., equivalent to minimizing a concave function) because its Hessian, 2 (G_2 ⊗ G_1) diag(vec(K^q)) (G_2 ⊗ G_1)^T, is positive semi-definite if the edge affinity is positive (i.e., K^q ≥ 0).
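The binary equivalence used above is easy to sanity-check: for a one-to-one binary X, the Hadamard squares reduce to the linear terms. A small numpy sketch with a made-up incidence matrix and a random permutation:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
X = np.eye(n)[rng.permutation(n)]      # binary one-to-one matching
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
G1 = np.zeros((n, len(edges)))
for c, (i, j) in enumerate(edges):
    G1[i, c] = G1[j, c] = 1            # node-edge incidence

# For binary one-to-one X the entries of X and G1^T X are in {0, 1},
# so the Hadamard squares equal the matrices themselves.
Y = G1.T @ X
print(np.array_equal(X * X, X), np.array_equal(Y * Y, Y))
```

The same argument applies to X G_2; it fails for fractional (relaxed) X, which is exactly why J_cav only agrees with J_gm at integer points.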

A path-following strategy
In this section, we describe a path-following strategy for optimizing Eq. 1. Inspired by [39], we approach the non-convex QP by iteratively optimizing a series of sub-problems:

max_X  J_α(X) = (1 − α) J_vex(X) + α J_cav(X),   s.t.  X 1_{n_2} ≤ 1_{n_1},  X^T 1_{n_1} ≤ 1_{n_2},  X ≥ 0,   (5)

where α ∈ [0,1] trades off the convex relaxation J_vex(X) against the concave one J_cav(X). When α = 0, Eq. 5 is a convex optimization problem that has a globally optimal solution regardless of the initialization. When α = 1, Eq. 5 always leads to an integer solution [5,28]. The process starts with α = 0 and successively increases α until it reaches 1. Fig. 3 illustrates the procedure of optimizing a graph matching problem using this strategy. In Fig. 3a, we plot the objective functions J_α and J_gm as α changes. Note that there is a turning point around α = 0.12 in the curve of J_α: at this point the two relaxations achieve the same value, i.e., J_vex = J_cav. As α → 1, the values of J_gm, J_α and J_cav approach each other and, meanwhile, X turns into a binary matrix (Fig. 3c). For a specific α, we optimize J_α(X) with the Frank-Wolfe algorithm (FW) [17,25,39], a simple yet powerful method for nonlinear programming. Given an initial X_0, FW successively updates the solution as X* = X_0 + λ(Y − X_0). At each step, it needs to compute two components: (1) the optimal direction Y ∈ R^{n_1×n_2} and (2) the optimal step size λ ∈ [0,1]. To compute Y, we solve the following linear program using the Hungarian algorithm:

max_Y  tr(∇J_α(X_0)^T Y),   s.t.  Y 1_{n_2} ≤ 1_{n_1},  Y^T 1_{n_1} ≤ 1_{n_2},  Y ≥ 0,

where the gradients can be efficiently computed using matrix operations, e.g.,

∇J_gm(X) = 2 H_1 (L • (H_1^T X H_2)) H_2^T,
∇J_cav(X) = 2 G_1 (K^q • (G_1^T X G_2)) G_2^T + K^p − G_1 K^q G_2^T.

Finally, the optimal λ can be found in closed form: J_α(X_0 + λ(Y − X_0)) is a quadratic function of λ, so its maximizer over λ ∈ [0,1] is either a boundary point or the vertex of the parabola.
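For a fixed α, the Frank-Wolfe loop reduces to: gradient, linear assignment, closed-form line search. The following numpy sketch runs FW on a generic relaxed objective x^T K x over doubly stochastic matrices; the affinity matrix is made up, and for brevity the linear assignment is solved by brute force over permutations rather than the Hungarian algorithm:

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(3)
n = 4
K = rng.random((n * n, n * n))
K = (K + K.T) / 2                     # symmetric affinity matrix (made up)

def lap_direction(grad):
    """argmax_Y tr(grad^T Y) over permutation matrices; brute force
    stands in for the Hungarian algorithm at this toy size."""
    G = grad.reshape(n, n, order='F')
    best, best_P = -np.inf, None
    for p in permutations(range(n)):
        P = np.eye(n)[list(p)]        # P[i, p[i]] = 1
        s = np.sum(G * P)
        if s > best:
            best, best_P = s, P
    return best_P

x = np.full((n, n), 1.0 / n).flatten(order='F')  # doubly stochastic start
scores = [x @ K @ x]
for _ in range(30):                   # Frank-Wolfe iterations
    grad = 2 * K @ x                  # gradient of x^T K x
    y = lap_direction(grad).flatten(order='F')
    d = y - x
    a, b = d @ K @ d, float(grad @ d)  # J(x + t d) = a t^2 + b t + const
    cands = [0.0, 1.0]
    if a < 0 and 0.0 < -b / (2 * a) < 1.0:
        cands.append(-b / (2 * a))    # interior maximizer of the parabola
    t = max(cands, key=lambda s: a * s * s + b * s)
    x = x + t * d
    scores.append(x @ K @ x)

print(scores[-1] >= scores[0])        # monotone ascent
```

Because the step size is chosen by exact line search (including t = 0 as a candidate), the objective is non-decreasing; each iterate stays doubly stochastic as a convex combination of doubly stochastic matrices.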

Other implementation details
A similar path-following strategy was proposed in [39], where its performance over state-of-the-art methods was demonstrated for a less general graph matching problem (i.e., maximizing tr(A_1 X A_2 X^T)). We performed an extensive study of this strategy for solving the most general graph matching problem (i.e., maximizing x^T K x) and empirically found that it can be improved with the following steps. Convergence: Although the FW algorithm is easy to implement, it converges sub-linearly. To obtain faster convergence while keeping its efficiency and low memory cost, we adopt a modified Frank-Wolfe method (MFW) [18] that finds a better search direction Y as a convex combination of previously obtained solutions. As shown in Fig. 3b, MFW converges much faster than FW.
The concave-convex structure: J_α(X) naturally splits into a concave part and a convex one. To take advantage of this structure, we adopt the concave-convex procedure (CCCP) [38], which approximates a non-convex objective function by a series of linearizations of the concave part around the current solution. In practice, we found that CCCP outperformed plain FW when J_α(X) is close to convex, i.e., when α is small. However, the performance of CCCP degrades as α grows, due to the increasing loss incurred by approximating the concave part. For instance, Fig. 3b compares CCCP with FW and MFW for optimizing J_α(X): CCCP outperforms MFW when α = 0.08, whereas MFW converges fastest for α = 0.30. Therefore, we adopt CCCP only in the early steps, when α is smaller than a manually defined threshold η.
Local vs. global: Although the path-following strategy returns an integer solution by smoothly tracking local optima in a convex space, it is not guaranteed to reach the global optimum of the non-convex objective function. An important reason is that, at each step, it locally optimizes J_α(X) instead of the global objective J_gm(X), so it is possible for J_α(X) to improve while J_gm(X) gets worse. To counter this, we keep increasing the global score J_gm(X) during the optimization by discarding any temporary solution that worsens J_gm(X) and computing an alternative one by applying one step of FW to J_gm(X) directly. This refinement is analogous to the usage of FW in [25]. As shown in Fig. 3a, the performance of the path-following algorithm can be greatly improved by optimizing over J_gm only three times.
Algorithm 1: Factorized graph matching
input: K^p, K^q, G_1, G_2, δ, η
output: X
1  Initialize X to be a doubly stochastic matrix;
2  Factorize L = U V^T with SVD;
3  for α = 0 : δ : 1 do   // path-following
4      if α ≤ η then
5          Optimize Eq. 5 via CCCP to obtain X*;
6      else
7          Optimize Eq. 5 via MFW to obtain X*;
8      if J_gm(X*) < J_gm(X) then
9          Optimize Eq. 1 via one step of FW to obtain X*;
10     Update X ← X*;

Algorithm 1 summarizes the workflow of our algorithm. The initial X can be an arbitrary doubly stochastic matrix. The complexity of our algorithm is roughly O(T (τ_hun + τ_∇ + τ_λ) + τ_L), where T is the number of iterations of FW and MFW, and τ_L = (n_1 + m_1)(n_2 + m_2)^2 is the cost of computing the SVD of L. The Hungarian algorithm finishes in τ_hun = max(n_1^3, n_2^3). The gradient ∇J_α and the line search for λ incur the same computational cost, τ_∇ = τ_λ = (n_1 + m_1)(n_2 + m_2).

Experiments
This section reports experimental results on three datasets (one synthetic and two real) and compares our method against seven state-of-the-art algorithms. Graduated assignment (GA): GA [21] performs gradient ascent on a relaxed version of Eq. 1 driven by an annealing schedule. At each step, it maximizes a Taylor expansion of the non-convex QP around the previous approximate solution. The accuracy of the approximation is controlled by a continuation parameter, β_{t+1} ← αβ_t, up to β_max. In all experiments, we set α = 1.075, β_0 = 0.5 and β_max = 200.
Spectral matching (SM): SM [24] optimizes a relaxed version of Eq. 1 that drops the affine constraints and imposes a unit-length constraint on x, that is: max_x x^T K x, s.t. ||x||_2 = 1. The globally optimal solution of the relaxed problem is the leading eigenvector of K.
Spectral matching with affine constraints (SMAC): SMAC [13] adds affine constraints encoding the one-to-one matching to the SM relaxation while keeping the unit-length constraint on x. The solution is again obtained from an eigenvalue problem.
Integer projected fixed point method (IPFP): IPFP [25] is based on FW. It can take any continuous or discrete solution as inputs and iteratively improve the solution. In our experiments, we implemented two versions: (1) IPFP-U, that starts from the same initial X as our method; (2) IPFP-S, that is initialized by SM.
Probabilistic graph matching (PM): PM [40] designs a convex objective function that can be globally optimized by Sinkhorn's algorithm [32]: it minimizes the relative entropy D(Y ‖ X) between the correspondence X and a matrix Y ∈ R^{n_1×n_2} calculated by marginalizing K. It is worth pointing out that, with our notation, Y can be computed in matrix form as Y = G_1 K^q G_2^T.
Re-weighted random walk matching (RRWM): RRWM [11] introduces a random-walk view of the problem and obtains the solution by simulating random walks with re-weighting jumps that enforce the matching constraints on the association graph. We fixed its parameters to α = 0.2 and β = 30 in all experiments.
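Sinkhorn's algorithm, used by PM above (and a common way to produce the doubly stochastic matrices on which several of these relaxations operate), simply alternates row and column normalization of a positive matrix; a minimal sketch:

```python
import numpy as np

def sinkhorn(M, n_iter=500):
    """Alternately normalize rows and columns of a positive matrix
    until it is (approximately) doubly stochastic."""
    X = M.copy()
    for _ in range(n_iter):
        X /= X.sum(axis=1, keepdims=True)   # rows sum to 1
        X /= X.sum(axis=0, keepdims=True)   # columns sum to 1
    return X

X = sinkhorn(np.random.default_rng(5).random((6, 6)) + 0.1)
print(X.sum(axis=0), X.sum(axis=1))
```

For strictly positive square matrices the iteration is guaranteed to converge to a doubly stochastic matrix (Sinkhorn's theorem).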
We used existing code from the authors' websites for all methods. Notice that all methods need a post-processing step to discretize X. To make a fair comparison, we applied the Hungarian algorithm for this discretization in all methods. The parameters of our method were fixed to δ = 0.01 and η = 0.1 in all experiments. The code was implemented in Matlab on a laptop with a 2.4 GHz Intel Core 2 Duo and 4 GB of memory. FGM was able to obtain the solution within a minute for graphs with 50 nodes.
We evaluated both the matching accuracy and the objective score. The matching accuracy, tr(X_alg^T X_tru) / tr(1_{n_2×n_1} X_tru), counts the matches consistent between the correspondence matrix X_alg returned by an algorithm and the ground truth X_tru. The objective score, J_gm(X_alg) / J_gm(X_ours), is computed as the ratio between the objective value obtained by each algorithm and the one obtained by our method.
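The accuracy metric amounts to the fraction of ground-truth correspondences recovered; a short sketch (the function name is ours):

```python
import numpy as np

def matching_accuracy(X_alg, X_tru):
    """tr(X_alg^T X_tru) / tr(1 X_tru): fraction of ground-truth
    matches recovered by the algorithm."""
    return np.trace(X_alg.T @ X_tru) / X_tru.sum()

X_tru = np.eye(4)
X_alg = np.eye(4)[[0, 1, 3, 2]]           # last two correspondences swapped
print(matching_accuracy(X_alg, X_tru))    # 0.5
```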

Synthetic dataset
This experiment performed a comparative evaluation of the seven algorithms on randomly synthesized graphs, following the experimental protocol of [11,13,21]. For each trial, we constructed two identical graphs, G^1 and G^2, each consisting of 20 inlier nodes, to which we later added n_out outlier nodes (in both graphs). For each pair of nodes, an edge is randomly generated according to the edge-density parameter ρ ∈ [0,1]. Each edge in the first graph was assigned a random score distributed uniformly as q^1_c ∼ U(0,1), and the corresponding edge score in the second graph was perturbed by Gaussian noise, q^2_c = q^1_c + ε with ε ∼ N(0, σ²). The edge-affinity matrix K^q was computed by exponentiating the negative squared difference of the edge scores, and the node-affinity K^p was set to zero.
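A sketch of this synthetic protocol in numpy follows; the Gaussian-kernel bandwidth used to turn score differences into affinities is a made-up placeholder (the paper's exact value is not reproduced here), and outlier nodes are treated like inliers for brevity:

```python
import numpy as np

rng = np.random.default_rng(6)
n_in, n_out, rho, sigma = 20, 5, 1.0, 0.1
n = n_in + n_out

# Identical topology in both graphs; edges kept with probability rho.
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if rng.random() < rho]
q1 = rng.uniform(0, 1, len(edges))           # edge scores, first graph
q2 = q1 + rng.normal(0, sigma, len(edges))   # perturbed scores, second graph

# Edge-affinity matrix; the bandwidth 0.15 is an assumed placeholder.
Kq = np.exp(-(q1[:, None] - q2[None, :]) ** 2 / 0.15)
Kp = np.zeros((n, n))                        # node affinities set to zero
print(Kq.shape)
```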
The experiment tested the performance of the graph matching methods under three parameter settings. For each setting, we generated 100 different pairs of graphs and evaluated the average accuracy and objective score. In the first setting (Fig. 4a), we increased the number of outliers from 0 to 20 while fixing the noise to σ = 0 and considering only fully connected graphs (i.e., ρ = 1). In the second setting (Fig. 4b), we perturbed the edge weights by varying the noise parameter σ from 0 to 0.2 while fixing the other two parameters, n_out = 0 and ρ = 1. In the last setting (Fig. 4c), we verified the performance on sparse graphs by varying ρ from 1 to 0.3. It can be observed that in most cases our method achieves the best performance among all algorithms in terms of both accuracy and objective ratio. RRWM is comparable to our method; in particular, it slightly outperforms ours when the graph edges contain large deformations (Fig. 4b). This is because the stochastic scheme adopted by RRWM can update the correspondence matrix more robustly than other optimization-based methods.

CMU house dataset
The CMU house image sequence [1] is commonly used to test the performance of graph matching algorithms [8,11,14,35]. This dataset consists of 111 frames of a house, each of which has been manually labeled with 30 landmarks. We used Delaunay triangulation to connect the landmarks. The edge weight q_c is computed as the pairwise distance between the connected nodes. Given an image pair, the edge-affinity matrix K^q was computed by exponentiating the negative squared difference of the corresponding edge weights, and the node-affinity K^p was set to zero. We tested the performance of all methods as a function of the separation between frames: we matched all possible image pairs spaced by 0 : 10 : 90 frames and computed the average matching accuracy and objective ratio per sequence gap. Fig. 5a shows an example pair of frames.
We tested the performance of the graph matching methods under two scenarios. In the first case (Fig. 5b) we used all 30 nodes (i.e., landmarks), and in the second one (Fig. 5c) we matched sub-graphs by randomly picking 25 landmarks. It can be observed that in the first case (Fig. 5b), RRWM, IPFP-S and our method obtained an almost perfect matching of the original graphs. As some nodes became invisible and the graphs got corrupted (Fig. 5c), the performance of all methods degraded. However, our method consistently achieved the best performance.

Pascal image dataset
The third experiment used the dataset from [26]. This dataset consists of 30 pairs of car images and 20 pairs of motorbike images selected from Pascal 2007 [16]. Each pair contains 30 ∼ 60 ground-truth correspondences. For each node we computed the feature p_i as the orientation of the normal vector to the contour at the point where the feature was sampled. We adopted Delaunay triangulation to build the graphs, and each edge was represented by a couple of values, q_c = [d_c, θ_c]^T, where d_c is the pairwise distance between the connected nodes and θ_c is the absolute angle between the edge and the horizontal line. Thus, for each pair of images, we computed the node affinity as κ^p_{ij} = exp(−|p_i − p_j|), and the edge affinity by exponentiating the negative differences in edge length and angle. Fig. 6a and Fig. 6b show example pairs of car and motorbike images, respectively.
To test the performance against noise, we randomly selected 0 ∼ 20 outlier nodes from the background. In the case when no outliers exist, our method achieves above an 80% matching rate on both datasets (Fig. 6bc), higher than the 75% reported in [26]. From Fig. 6c, it is interesting to see that RRWM performs better in terms of accuracy for particular levels of outliers, whereas our method obtains a higher objective score. This is because the ground-truth correspondence may not always be the optimal solution to the problem.

Conclusions
This paper presented FGM, a new graph matching algorithm that exploits the properties of the factorized affinity matrix. Three main benefits follow from factorizing the affinity matrix. First, there is no need to explicitly compute the affinity matrix. Second, it provides a unified framework in which to interpret several graph matching algorithms. Third, using the factorization, a new optimization scheme based on FW and CCCP is proposed. Experimental results on synthetic and real datasets illustrate the performance of the new method.
In this paper we have illustrated the advantages of factorizing the pair-wise affinity matrix of typical graph matching problems. The most computationally demanding part of the algorithm is the large number of iterations needed for the FW method to converge when J_α is close to a convex function. Therefore, more advanced techniques (e.g., conjugate gradient) could be used to speed up FW. In addition, we are currently exploring the extension of this factorization method to higher-order graph matching problems [9,14,40], as well as learning the parameters for graph matching [8,26].