T-HyperGNNs: Hypergraph Neural Networks Via Tensor Representations



I. INTRODUCTION
Machine learning on graphs has drawn much attention in the last few years as graphs can represent non-Euclidean relations in data. Graph neural networks (GNNs), in particular, have shown promise in various domains, such as social networks [1], [2], computer vision [3], [4], knowledge graphs [5], [6], and anomaly detection [7]. The graph structures modeled by GNNs, however, are assumed to encode pairwise relationships: each relational edge connects exactly two entities, as shown in Fig. 1(a). In real-world applications where polyadic relationships among multiple objects are important, regular GNNs become insufficient to capture all useful features [8]. For example, biomedical reactions often contain more than two substances [9], a co-authorship network can involve more than two authors per paper [10], and traffic flows may be determined by more than two locations [11]. This brings up the concept of a hypergraph, a more general data abstraction in which each hyperedge binds a group of nodes simultaneously (see Fig. 1(b, c)). One convenient way to study hypergraphs is to map them into regular graphs and adopt simple graph convolution to approximate high-order relationships. This approach of reducing hypergraphs is called hypergraph expansion, which includes clique expansion [12] and star expansion [13], among several others [14]. Since the graph convolution operation is originally derived in the spectral domain [12], we call these methods spectral HyperGNNs. Despite their simplicity, these methods can cause topological distortion and difficulty in downstream tasks because the mapping from a hypergraph to its corresponding simple graph is not one-to-one [15], [16]. For example, under the clique expansion that connects any two nodes in a hyperedge, it is easy to verify that hypergraphs G_1 in Fig. 1(b) and G_2 in Fig. 1(c) have the same pairwise connections, namely the simple graph in Fig. 1(a). Other types of HyperGNNs, such as HNHN [13] and HyperSAGE [17], are defined by a two-stage spatial message-passing rule that gathers information from the neighboring nodes of each central node; they utilize more advanced deep learning architectures but are still limited to matrix-based hypergraph representations. In addition, neither the spectral nor the spatial HyperGNNs use higher-order interactions among nodes, as they decompose hypergraph information mainly through matrix representations.
Recently, approaches that do not require the use of hypergraph expansions have been proposed to fully exploit polyadic relationships. In particular, tensor-tensor multiplications (t-products) [18] were introduced to better understand hypergraph operations such as signal shifting and spectral filtering, thus offering powerful tools to formulate spectral convolutions [19], [20]. Given these tensor representations and operations, several intriguing questions naturally arise: (1) Can we efficiently describe hypergraph structures in a high-dimensional space without information loss? (2) Can we model node interactions to represent their joint effects within a hyperedge? (3) Is it possible to generalize common graph neural network architectures, such as spectral convolution, spatial convolution, and message passing, under the tensorial setting? To address these questions, instead of collapsing hypergraphs to simple graphs and representing the reduced graphs in matrix forms, we study hypergraph representation learning through a tensor-based framework. For simplicity, we call this new framework T-HyperGNNs. The contributions of this paper include the following main aspects:
• We design HyperGNNs via tensor representations to make use of higher-dimensional data in hypergraph representation learning. We encode hypergraph structures in adjacency tensors and model mutual interaction among nodes via cross-node interaction tensors, which allow a HyperGNN to learn higher-order functions beyond node-wise summation.
• From hypergraph tensor representations and cross-node interaction tensors, we formulate T-spectral convolutions under the t-product scheme that connects to loss-free hypergraph Fourier transforms [20]. Since the T-spectral convolution is a global operation, we localize the T-spectral convolution to form the T-spatial convolution.
• To address the scalability and inductivity of tensor-based convolutions, we further propose tensor message-passing hypergraph neural networks (T-MPHNs) by storing and computing the adjacency tensor in a compressed manner (referred to as the compressed adjacency tensor).
• The proposed T-HyperGNNs show promising performance in comparison to state-of-the-art benchmarks over a wide range of datasets. The T-MPHN, in particular, is capable of processing large hypergraphs with space and computational complexity comparable to matrix-representation-based HyperGNNs.
The rest of this paper is organized as follows. We introduce the necessary background and related work in Section II. We then define the cross-node interaction tensor and the T-spectral convolution in Section III. To tackle the complexity of the T-spectral convolution, in Section IV we first localize it to form the T-spatial convolution and then propose an inductive and scalable tensor message-passing neural network (T-MPHN). Connections between our three methods and other existing HyperGNNs are illustrated in Section V. The numerical experiments are summarized in Section VI, and a brief conclusion is given in Section VII.

II. BACKGROUND AND RELATED WORK

A. Hypergraph and Algebraic Descriptors
A hypergraph G is defined as a pair of two sets G = (V, E), where V = {v_1, v_2, ..., v_N} denotes the set of N nodes (or vertices) and E = {e_1, e_2, ..., e_K} is the set of K hyperedges whose elements e_k (k = 1, 2, ..., K) are nonempty subsets of V. The maximum cardinality of edges, m.c.e(G), is denoted by M and defines the order of the hypergraph. Apart from the hypergraph structure, there is also a feature vector x_v ∈ R^D associated with each node v ∈ V; these vectors are used as rows to construct the feature matrix X ∈ R^{N×D} of a hypergraph.
A hypergraph structure G can be encoded in either a matrix or a tensor form. We refer to these algebraic descriptors as S. In matrix representation, a hypergraph is described by a vertex-to-hyperedge incidence matrix H ∈ R^{N×K}. As shown in Fig. 2(c), the entries of the incidence matrix are h_nk = 1 if node v_n lies in hyperedge e_k, and h_nk = 0 otherwise. While the incidence matrix representation is straightforward, it is a rectangular operator without a dimension-preserving property. Another matrix descriptor, the adjacency matrix of a hypergraph, is defined as A = H H^T, which projects out the hyperedge dimension but leads to the clique expansion (see Fig. 2(b)) that distorts hypergraph structures [15], [16].

In tensor representation, given a hypergraph G = (V, E) with N nodes and order M (that is, m.c.e(G) = M), its adjacency tensor is defined as an M-th order N-dimensional tensor A ∈ R^{N^M}. Specifically, for any hyperedge e_k = {v_{k_1}, v_{k_2}, ..., v_{k_c}} ∈ E with c = |e_k| ≤ M, the corresponding entries are

    a_{p_1 p_2 ... p_M} = c / α,    (1)

with

    α = Σ_{r_1 + r_2 + ··· + r_c = M, r_1, ..., r_c ≥ 1} M! / (r_1! r_2! ··· r_c!),    (2)

where the indices p_1, p_2, ..., p_M for adjacency entries are chosen from all possible permutations of {k_1, k_2, ..., k_c} with at least one appearance of each element of the hyperedge, and α is the sum of multinomial coefficients under the constraint r_1, r_2, ..., r_c ≠ 0. All other entries, not associated with any hyperedge, are zeros. The example below demonstrates this definition; we will revisit and explain it in Section IV when we define the compressed adjacency tensor.

Example 2.1: Given the hypergraph in Fig. 2(a), its three hyperedges e_1, e_2, e_3 are represented by the adjacency cube in Fig. 2(d) with nonzero entries specified on the right-hand side. For e_1, which contains nodes v_1 and v_2, the adjacency indices are assigned according to their length-3 permutations {(121), (112), (211), (122), (212), (221)}, in which 1 and 2 each appear at least once. The values of these adjacency entries are 2/6, where the numerator c = 2 is the cardinality of e_1 and the denominator α = 6 is the number of index permutations. For the other two hyperedges e_2, e_3 with c = |e_2| = |e_3| = M = 3, the indices of their corresponding adjacency entries are direct permutations of their node indices (e.g., {(345), (354), (435), (453), (534), (543)} for edge e_3), and the value is the quotient of the hyperedge cardinality and the number of index permutations, giving 3/6.
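To make the entry values concrete, the short Python sketch below enumerates the multinomial-coefficient sum of Eq. (2) and reproduces the values of Example 2.1; the function names are ours, and the brute-force enumeration is for illustration rather than efficiency.

    from itertools import product
    from math import factorial

    def alpha(c, M):
        """Sum of multinomial coefficients M!/(r_1!...r_c!) over all
        positive compositions r_1 + ... + r_c = M (Eq. (2))."""
        total = 0
        for rs in product(range(1, M - c + 2), repeat=c):
            if sum(rs) == M:
                coef = factorial(M)
                for r in rs:
                    coef //= factorial(r)
                total += coef
        return total

    def adjacency_value(c, M):
        """Adjacency entry a_{p1...pM} = c / alpha for a hyperedge of cardinality c."""
        return c / alpha(c, M)

    # Example 2.1: hypergraph of order M = 3
    print(adjacency_value(2, 3))   # e1 = {v1, v2}: 2/6 ≈ 0.333
    print(adjacency_value(3, 3))   # e2, e3 with three nodes: 3/6 = 0.5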

B. Problem Definition
Formally, given a descriptor S of a hypergraph and the associated node features X, the goal of HyperGNNs is to learn a representation map that produces the target representation t from the features X, the descriptor S, and a set of weight parameters {W} learned by the model, so that the hypergraph structure is incorporated. To learn this representation map, we consider a cost function J(·) evaluated over a training set; the cost function can be chosen based on the downstream task (e.g., node classification [21]).

C. Related Work
The research on HyperGNNs can be briefly categorized into two main approaches: 1) spectral methods that define convolution in the spectral space and 2) spatial methods that aggregate neighboring messages and combine them with the self-embedding of each node.
Matrix-based Spectral HyperGNNs. The earliest attempts to build HyperGNNs include HGNN [12] and HCHA [22], which can be considered spectral HyperGNNs built on the adjacency matrix A of a hypergraph. From the adjacency matrix, the hypergraph Laplacian is defined, and the hypergraph spectral space is constructed from the eigendecomposition of the Laplacian. After applying spectral filters, the spectral convolution is formulated as Z = A_norm X W, where A_norm ∈ R^{N×N} is a normalized adjacency matrix, X ∈ R^{N×D} is the feature matrix, and W ∈ R^{D×D'} is a learnable filter weight matrix. Although A is a square matrix, it is geometrically equivalent to the clique expansion, in which a hypergraph is reduced to a simple graph by connecting any two nodes that share a hyperedge. For instance, the simple graph in Fig. 2(b) is the clique expansion of the hypergraph in Fig. 2(a). With such a reduction, the small edge e_1 contained in e_2 is ignored. Thus the hypergraph expansion is not a one-to-one mapping, which can cause node-level and edge-level ambiguities [16]. Other methods such as HyperGCN [10] and LEGCN [23] follow similar ideas with different variants of matrix descriptors.
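For reference, a minimal sketch of such a matrix-based spectral layer is shown below. The symmetric degree normalization here is a simple stand-in; HGNN additionally weights by hyperedge degrees, so this is an illustration rather than a reimplementation of any particular benchmark.

    import numpy as np

    def spectral_layer(H, X, W):
        """One matrix-based spectral HyperGNN layer, Z = A_norm X W,
        with a simple symmetric degree normalization (a sketch)."""
        A = H @ H.T                                   # clique-expansion adjacency
        d = np.maximum(A.sum(axis=1), 1e-12)          # node degrees
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        A_norm = D_inv_sqrt @ A @ D_inv_sqrt
        return A_norm @ X @ W

    rng = np.random.default_rng(0)
    H = (rng.random((5, 3)) > 0.5).astype(float)      # incidence matrix: 5 nodes, 3 edges
    Z = spectral_layer(H, rng.standard_normal((5, 4)), rng.standard_normal((4, 2)))
    print(Z.shape)                                     # (5, 2)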
Matrix-based Spatial HyperGNNs. In contrast to spectral HyperGNNs, spatial HyperGNNs focus on the local connectivity of each node without going to the spectral domain. Defining the incident-edge set of node v as E_v = {e ∈ E | v ∈ e}, UniGNN [24] proposes a spatial message-passing process with two steps: (1) x_e = φ_1({x_u : u ∈ e}) for each incident edge e ∈ E_v, and (2) z_v = φ_2(x_v, {x_e : e ∈ E_v}), where φ_1 and φ_2 are two permutation-invariant functions for node-to-edge and edge-to-node aggregations, respectively. The first step aggregates information from all nodes in each incident edge, forming a node-to-edge propagation. The edge embedding x_e is then combined with the target node embedding x_v and passed through φ_2 to produce a new node embedding z_v. Such a node-edge-node embedding scheme remains matrix-based since it is a generalization of Z = H(H^T X), where H is the incidence matrix. In addition to UniGNN, current methods including HNHN [13], HyperSAGE [17], and AllSet [15] all follow this node-edge-node propagation paradigm, but with more advanced architectures such as attention mechanisms. Compared to spectral HyperGNNs, spatial message passing does not require the construction of a hypergraph algebraic descriptor and can be applied to previously unseen nodes during testing. However, it remains unclear whether an appropriate higher-order descriptor (i.e., a tensor) can be employed to accommodate hypergraph structures.
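A minimal sketch of this two-step node-edge-node propagation is given below; the choices of φ_1 (mean) and φ_2 (sum) are illustrative assumptions, since each method instantiates them differently.

    import numpy as np

    def node_edge_node_layer(edges, X, W):
        """Sketch of the two-step propagation: phi_1 (here: mean) aggregates node
        features into an edge embedding, phi_2 (here: sum) combines the incident
        edge embeddings with the node's own embedding."""
        edge_emb = {k: X[list(e)].mean(axis=0) for k, e in enumerate(edges)}  # step 1
        Z = X.copy()
        for k, e in enumerate(edges):                                          # step 2
            for v in e:
                Z[v] = Z[v] + edge_emb[k]
        return Z @ W

    edges = [{0, 1}, {0, 1, 2}, {2, 3, 4}]       # hyperedges as sets of node indices
    X = np.random.default_rng(1).standard_normal((5, 4))
    print(node_edge_node_layer(edges, X, np.eye(4)).shape)    # (5, 4)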
In summary, spectral HyperGNNs require a dimension-preserving hypergraph descriptor in order to define the spectral space, and current methods mostly rely on matrix representations that correspond to hypergraph reductions. On the other hand, spatial HyperGNNs with the node-edge-node aggregation process stem from spectral convolution but are implemented in a two-step manner to adopt deep learning techniques. The following issues remain unsolved for these existing HyperGNNs. First, they are based on matrix descriptors with possible information loss; for example, the adjacency matrix A corresponds to a clique-expanded simple graph, which cannot encode all intrinsic higher-order structures. Second, they do not take into account possible high-order feature interactions among multiple nodes; indeed, the salient characteristic of hypergraphs compared to simple graphs is that hyperedges depict the joint effects of a group of nodes. Lastly, spectral and spatial HyperGNNs are studied separately in the literature, while a more unified treatment connecting both approaches would be desirable.
To overcome the aforementioned issues, we propose a tensorial descriptor of the hypergraph structure and further construct the hypergraph signal tensor by modeling cross-node interactions. Using these two tensors, we design the hypergraph spectral convolution under the t-algebra framework and then localize it to form a spatial convolution that only propagates to the neighbors of each node. A spatial message-passing HyperGNN (T-MPHN) is then built upon the compressed adjacency tensor to address the tensor complexity and obtain computationally efficient algorithms.

D. Tensor Notations and Operations
For ease of presentation, we first describe the notation for 3rd-order tensors, which are the base case of higher-order tensors. For a 3rd-order tensor, indices i ∈ {1, 2, ..., N_1}, j ∈ {1, 2, ..., N_2}, and k ∈ {1, 2, ..., N_3} specify the height, width, and depth directions of the cube in Fig. 3(a). Breaking a 3rd-order tensor down along the third mode, we obtain the frontal slices in Fig. 3(b); the k-th frontal slice is A^(k) = A(:, :, k) ∈ R^{N_1×N_2×1}. For an M-th order tensor A ∈ R^{N^M}, we can view the last (M−2) modes as flattened frontal-slice indices along the third mode, so that A can be handled as a 3rd-order tensor in R^{N×N×N^{(M−2)}} whose frontal slices are indexed by the flattened trailing modes.
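The flattening of the last (M−2) modes can be illustrated with a simple reshape, as in the sketch below; the row-major layout convention is an assumption made for illustration.

    import numpy as np

    # View the last (M-2) modes of an M-th order tensor as a single flattened
    # third-mode index: a 4th-order N x N x N x N tensor is handled as N x N x N^2.
    N, M = 4, 4
    A = np.arange(N ** M, dtype=float).reshape((N,) * M)
    A_flat = A.reshape(N, N, N ** (M - 2))               # frontal slices A_flat[:, :, k]
    print(A_flat.shape)                                    # (4, 4, 16)
    print(np.allclose(A_flat[:, :, 0], A[:, :, 0, 0]))     # the first frontal slice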

III. T-SPECTRAL CONVOLUTION ON HYPERGRAPHS
In this section, we introduce the hypergraph interaction tensor by modeling cross-node interactions. Then, based on the hypergraph adjacency tensor and the hypergraph interaction tensor, we propose the hypergraph T-spectral convolution using t-products.

A. Modeling Cross-node Interactions
To begin with, we present the cross-node interaction (CNI) tensor to model higher-order interactions among nodes. The CNI is designed as the (M−1)-fold outer product of the node features along each feature dimension. Given the feature (or signal) matrix X ∈ R^{N×D} as the input, with N being the number of nodes in a hypergraph and D the dimension of the features of each node, the d-th dimensional interaction among all nodes is

    CNI([x]_d) = [x]_d ∘ [x]_d ∘ ··· ∘ [x]_d    ((M−1) factors),

where ∘ denotes the outer product (also known as the elementary tensor product) and [x]_d ∈ R^N is the d-th dimensional feature vector of all N nodes. We unsqueeze the outer-product tensor to generate an additional second mode for the index of the feature dimension. Then, by computing CNI([x]_d) for all D features and stacking them together along this second mode, we obtain an M-th order interaction tensor X ∈ R^{N×D×N^{(M−2)}}. The resulting interaction tensor can be viewed as a collection of D tensors, each depicting node interactions at one feature dimension. The formulation of the cross-node interaction tensor has the following unique properties: 1) interactions capture features that cannot be decomposed into sums of subfunctions of node features; 2) interactions are applied across different linked nodes, as opposed to across different features; 3) the order of interactions grows naturally with the order of the hypergraph.
Although widely used in many applications such as recommendation systems (e.g., Deep & Cross Network (DCN) [25], eXtreme Deep Factorization Machine (xDeepFM) [26]), interactions are mostly defined to be cross-channel or cross-attribute for a node. Cross-channel interactions are also well known in high-dimensional regression [27]. Here we design the interactions to be cross-node (as opposed to cross-channel), based on the intrinsic node interactions depicted in hyperedges, which can contain additional information across linked nodes beyond linear summations of individual nodes.
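A small sketch of the CNI construction is given below; the mode ordering and the reshape of the trailing modes into a single flattened index are our assumptions for illustration.

    import numpy as np

    def cni_tensor(X, M):
        """Cross-node interaction tensor: for each feature dimension d, take the
        (M-1)-fold outer product of the length-N vector [x]_d, flatten its trailing
        modes, and stack the D results along a second mode (shape N x D x N^(M-2))."""
        N, D = X.shape
        slabs = []
        for d in range(D):
            t = X[:, d]
            for _ in range(M - 2):                 # (M-1) factors in total
                t = np.multiply.outer(t, X[:, d])
            slabs.append(t.reshape(N, N ** (M - 2)))
        return np.stack(slabs, axis=1)

    X = np.random.default_rng(2).standard_normal((5, 3))
    print(cni_tensor(X, M=3).shape)                # (5, 3, 5)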

B. Hypergraph T-spectral Convolution
The hypergraph T-spectral convolution is inspired by the tensor operations in the hypergraph signal processing framework known as t-HGSP [20], where the hypergraph spectrum is defined via t-product decompositions [19]. With the adjacency tensor A and the cross-node interaction tensor X, we first enlarge them to their symmetric versions A_s and X_s, where N_s = 2N + 1, according to the symmetrization operation in Appendix B. The motivation for symmetrizing the tensors is to obtain a symmetric block circulant matrix bcirc(A_s) (to be introduced in Eq. (9)) and thus allow proper alignment of the adjacency tensor and the CNI feature tensor. After the symmetrization, to further ensure a bounded spectrum of the adjacency tensor, we normalize the entries of A_s by scaling them with the degrees of the relevant nodes to obtain A_s^norm. Since the symmetrization and the normalization are both known operations for tensors, for brevity, we leave their detailed descriptions to Appendix B and Appendix C, respectively. The T-spectral convolution is then formulated as the t-product

    A_s^norm ⋆ X_s ⋆ W_s,    (6)

where W_s is a learnable weight tensor with D·D' weights parameterized in the first frontal slice and all remaining frontal slices set to zero, and ⋆ denotes the tensor t-product [20]. Specifically, for the 3rd-order case (M = 3), given A_s ∈ R^{N×N×N_s} and X_s ∈ R^{N×D×N_s}, we have

    A_s ⋆ X_s = fold(bcirc(A_s) · unfold(X_s)),    (9)
where the operator bcirc(A_s) converts the set of N_s frontal slice matrices (in R^{N×N}) of the tensor A_s into a block circulant matrix. Specifically, the first row of bcirc(A_s) consists of the frontal slices of A_s, i.e., [0, A^(1), A^(2), ..., A^(N), A^(N), ..., A^(2), A^(1)], and each subsequent row is a one-step cyclic shift of the previous row.
The operation unfold(X_s) stacks the set of N_s frontal slice matrices (in R^{N×D}) of X_s vertically into an N_s N × D matrix. The operator fold(·) reverses the unfold(·) process, so that fold(unfold(A_s)) = A_s. The t-product of higher-order tensors involves recursive computation with 3rd-order base cases; to keep the presentation brief, the details of this known t-product procedure are relegated to Appendix A for technical completeness.
The reason for constructing the T-spectral convolution as defined in Eq. (6) is partly the connection between the t-product and the Fourier transform [28]. Since circulant matrices are diagonalized by the discrete Fourier transform, as shown in Algorithm 1, the t-product above can be computed efficiently by recursively applying the fast Fourier transform to both tensors, performing regular matrix products between the flattened tensors, and finally applying the inverse fast Fourier transform. The derivation of the T-spectral convolution is drawn from the t-eigendecomposition of the Laplacian tensor, and we include the technical details in Appendix H. After constructing the spectral space from the t-eigendecomposition of the Laplacian tensor, a filtering function is applied to the frequencies of the hypergraph, i.e., the eigen-tuples of the Laplacian tensor. When the filtering function is the commonly used first-order Chebyshev polynomial [29], [30], the T-spectral convolution in Eq. (6) is obtained.
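For the 3rd-order base case, the Fourier-domain computation can be sketched as follows; this illustrates the standard t-product evaluation (FFT along the third mode, slice-wise matrix products, inverse FFT) rather than the authors' implementation.

    import numpy as np

    def t_product3(A, B):
        """3rd-order t-product A * B: FFT along the third mode, frontal-slice matrix
        products in the Fourier domain, inverse FFT (the base case of Algorithm 1)."""
        assert A.shape[2] == B.shape[2] and A.shape[1] == B.shape[0]
        A_hat = np.fft.fft(A, axis=2)
        B_hat = np.fft.fft(B, axis=2)
        C_hat = np.einsum('ijk,jlk->ilk', A_hat, B_hat)   # slice-wise products
        return np.real(np.fft.ifft(C_hat, axis=2))

    A = np.random.default_rng(3).standard_normal((4, 4, 9))
    X = np.random.default_rng(4).standard_normal((4, 2, 9))
    print(t_product3(A, X).shape)                          # (4, 2, 9)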

C. Complexity Analysis
As the order and the number of nodes of a hypergraph increase, the time and space complexity of the T-spectral convolution becomes a major concern. Indeed, the computation in a one-step t-convolution A_s ⋆ X_s ⋆ W_s in Eq. (6) can be shown to be O(D N^{2M}), which is practically difficult for any moderate M. Even though the computation of the t-product can be reduced to O(D N^M) using Algorithm 1, it is still not sufficiently fast for large hypergraph learning. In addition, considering the space complexity of an M-th order hypergraph, the memory allocated for the adjacency tensor is O(N^M). Since tensor-based convolutions require the full hypergraph adjacency tensor to be known during model training, a direct implementation is usually not feasible for large hypergraphs. We address these limitations in the next section.

IV. T-SPATIAL HYPERGRAPH NEURAL NETWORKS
To scale up the T-spectral convolution, two improvements are proposed in this section. First, we localize the T-spectral convolution to form a T-spatial convolution that only propagates to the connected neighbors of each node. Second, to alleviate the space complexity of tensors, we introduce the compressed adjacency tensor, which requires little memory. Based on the compressed adjacency tensor, a two-step message-passing framework is proposed, within which the T-spatial convolution is subsumed.

A. T-spatial Convolution
In the vertex domain, convolution is viewed as a weighted sum of neighboring information. As a result, the main idea of developing the spatial convolution is to localize the spectral convolution, that is, to propagate only through connected nodes during a shifting operation.
To this end, recall that A^norm ∈ R^{N^M} is the (normalized) adjacency tensor defined from Eq. (1), and X ∈ R^{N×D×N^{(M−2)}} is the CNI signal tensor defined in Eq. (5) from the feature matrix X. Here, we view the last (M−2) modes of these tensors as indices along the third mode, so that the frontal slices of A^norm and X are matrices in R^{N×N} and R^{N×D}, respectively. After applying the 3rd-order symmetrization to A^norm and X according to Eq. (32) in Appendix B, the T-spatial convolution is defined as

    (A_s^norm ⋆ X_s)^(1) W,    (11)

where A_s^norm and X_s are the corresponding symmetrized tensors, (A_s^norm ⋆ X_s)^(1) is the first frontal slice of the shifted hypergraph signal A_s^norm ⋆ X_s, and W ∈ R^{D×D'} is a learnable weight matrix. In this sense, the T-spatial convolution can be viewed as a localized version of the T-spectral convolution, as it keeps only the first frontal slice of the form of Eq. (6).
Also, note that (A_s^norm ⋆ X_s)^(1) can be computed as the sum of the corresponding frontal-slice products between A^norm and X, that is, as Σ_k (A^norm)^(k) X^(k). Equivalently, for an individual node v_i with 1 ≤ i ≤ N and feature dimension d,

    [Y]_{id} = Σ_{j, i_3, ..., i_M = 1}^{N} a_{i j i_3 ··· i_M} x_{jd} x_{i_3 d} ··· x_{i_M d},    (13)

where a_{i j i_3 ··· i_M} are the entries of the adjacency tensor and x_{jd} is the d-th feature of node v_j. For example, in the 3rd-order case with M = 3, the first frontal slice of the shifted signal is Σ_{k=1}^{N} (A^norm)^(k) X^(k), which is equivalent to computing Σ_{j=1}^{N} Σ_{k=1}^{N} a_{ijk} x_{jd} x_{kd} for each node v_i (i = 1, ..., N). If three different nodes v_i, v_j, v_k are in the same hyperedge, then a_{ijk} ≠ 0 and the interaction of the neighboring nodes v_j and v_k contributes to the shifted signal of v_i; otherwise, a_{ijk} = 0 and the respective interaction term makes no contribution to the shifted signal.
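A small numerical sketch of the 3rd-order shifted signal is given below; the adjacency tensor is filled directly from the permutation rule of Eq. (1) for a single hyperedge, and the node and feature sizes are arbitrary.

    import numpy as np
    from itertools import permutations

    def t_spatial_shift(A, X_feat):
        """Shifted signal of the T-spatial convolution for a 3rd-order hypergraph:
        y_{id} = sum_{j,k} a_{ijk} x_{jd} x_{kd} (a sketch of Eq. (13) for M = 3)."""
        return np.einsum('ijk,jd,kd->id', A, X_feat, X_feat)

    # toy order-3 adjacency tensor with a single hyperedge {v1, v2, v3} (0-indexed)
    N = 5
    A = np.zeros((N, N, N))
    for p in permutations((0, 1, 2)):
        A[p] = 3 / 6                      # adjacency value for |e| = 3, M = 3
    X_feat = np.random.default_rng(5).standard_normal((N, 2))
    Y = t_spatial_shift(A, X_feat)
    print(Y.shape)                                        # (5, 2)
    print(np.allclose(Y[3], 0), np.allclose(Y[4], 0))     # v4, v5 have no neighbors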
In general, by the adjacency tensor definition and its sparse nature, the entries a_{i j i_3 ··· i_M} are indicators of whether a corresponding set of nodes is connected to the target node v_i through a hyperedge. Therefore, Eq. (13) implies that only the features/signals from neighboring nodes contribute to the shifted signal of the target node under the T-spatial convolution, which leads to the efficient computing algorithms introduced in the next subsections.
In addition, the outcome of the T-spatial convolution does not depend on the node ordering used for adjacency tensor generation. On the other hand, the other frontal slices of A_s^norm ⋆ X_s (except the first one) would involve more than the neighbors of a target node and may not be computed without prior node ordering information. Therefore, these frontal slices, apart from the first one, are not included in the T-spatial convolution.
These two desirable properties of the T-spatial convolution discussed above are summarized in the following propositions.

Proposition 4.1: The T-spatial convolution is localized: it propagates only through the neighbors of each target node.
Proposition 4.2: The T-spatial convolution is permutation invariant with respect to the ordering of the nodes.
Proof. See Appendix D for the proofs of Propositions 4.1 and 4.2. ∎

B. Compressed Adjacency Tensor Representation
If the use of the T-spatial convolution had to require complete construction and loading of large tensors, the time and space complexity would remain too large for most applications. In the following, to avoid direct tensor constructions, we decompose the adjacency tensor into two tables: the adjacency value table and the neighborhood table. These two tables play an important role in formulating the T-spatial message-passing algorithmic framework called T-MPHN in Section IV-C. Since the T-MPHN algorithm is designed from the T-spatial convolution based on the adjacency tensor and the cross-node interaction tensor, it remains a tensor-based approach. Nonetheless, T-MPHN can be shown to require much less computing time than the T-spatial convolution by taking advantage of the sparsity of the adjacency tensor to store adjacency values and node connectivity in a compressed manner.
Returning to the hypergraph adjacency tensor introduced in Section II-A, we can see from Example 2.1 that the construction of the adjacency tensor can be divided into two sequential steps: 1) spanning every edge into M-th order hyperedges; 2) permuting the indices of each spanned M-th order hyperedge.
Step 1. Spanning every edge e ∈ E into M-th order hyperedges: Since hyperedges with |e| = M are already of M-th order, only hyperedges with |e| < M need to be spanned.
Definition 1 (M-th Order Hyperedge): Given a hypergraph G = (V, E) of order M, for any hyperedge e ∈ E, its M-th order hyperedge set e^M is given by

    e^M = e if |e| = M,  and  e^M = span_M(e) if |e| < M.    (14)

Here span_M(e) is the set of M-th order sub-hyperedges spanned from e with |e| < M:

    span_M(e) = {e' : unique(e') = e, |e'| = M},    (15)

where unique(e') = e means that the distinct elements of e' are exactly those of e, and |e'| is the number of (possibly non-unique) elements in e'. It is not hard to see that the size of the sub-hyperedge set |span_M(e)| is exactly the number of combinations for choosing (M − |e|) elements with replacement from the set e:

    |span_M(e)| = C(|e| + M − |e| − 1, M − |e|) = C(M − 1, M − |e|).    (16)

For example, given the hypergraph of Fig. 2(c), the edge e_1 with |e_1| = 2 < M = 3 can be spanned to two 3rd-order sub-hyperedges e'_{11} = (v_1, v_2, v_1) and e'_{12} = (v_1, v_2, v_2), as shown in Fig. 4.

Step 2. Permuting M-th order hyperedges: After obtaining the M-th order hyperedges e^M for every e ∈ E, we permute the elements contained in e^M (denoted by a sequence permutation function π(·)), which in turn specifies the set of permuted index sequences corresponding to the adjacency entries associated with hyperedge e. Specifically, for any index sequence (p_1, p_2, ..., p_M) ∈ π(e^M), the entry value in Eq. (1) can be equivalently written as

    a_{p_1 p_2 ... p_M} = |e| / |π(e^M)|,    (17)

where the cardinality of the permuted M-th order hyperedge set, |π(e^M)| = α, is given in Eq. (2). As we can see from Eq. (17), two types of information are associated with the nonzero adjacency entries: the adjacency value corresponding to the hyperedge e, and the indices capturing node connectivity. We therefore introduce two lookup tables to encode the information of the adjacency tensor: the adjacency value table and the node neighborhood table. Together, these tables represent the compressed adjacency tensor; an illustrative example is shown in Fig. 5.
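The spanning and permutation steps can be sketched as follows; span_M and permuted_indices are our helper names, and the enumeration reproduces the index sequences of Example 2.1 for e_1.

    from itertools import combinations_with_replacement, permutations

    def span_M(e, M):
        """M-th order sub-hyperedges spanned from hyperedge e (|e| <= M): multisets
        of size M whose distinct elements are exactly those of e (Eq. (15))."""
        e = tuple(sorted(e))
        if len(e) == M:
            return {e}
        return {tuple(sorted(e + extra))
                for extra in combinations_with_replacement(e, M - len(e))}

    def permuted_indices(e, M):
        """Index sequences of the nonzero adjacency entries associated with e: all
        distinct permutations of every spanned M-th order sub-hyperedge."""
        idx = set()
        for sub in span_M(e, M):
            idx.update(permutations(sub))
        return idx

    # hyperedge e1 = {v1, v2} (0-indexed as {0, 1}) in an order-3 hypergraph
    print(sorted(span_M({0, 1}, 3)))           # [(0, 0, 1), (0, 1, 1)]
    print(len(permuted_indices({0, 1}, 3)))    # 6 index sequences, so alpha = 6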
For the adjacency value table, we first observe that the adjacency values can be computed efficiently as a function of the edge cardinality |e| and the order M of the hypergraph.

Theorem 4.1: Given the adjacency tensor of a hypergraph, the adjacency value a_e associated with a hyperedge e is a function of (|e|, M):

    a_e = |e| / α,    (18)

where α is the multinomial-coefficient sum given in Eq. (2).
Proof. The proof is given in Appendix E. ∎

Given Theorem 4.1, the adjacency value table is easily constructed: the first column lists the cardinalities of hyperedges, ranging from 2 (the minimum) to M (the maximum), and the second column gives the corresponding adjacency values a_e computed from Eq. (18). Note that the computation of the adjacency values does not rely on specific hyperedges, and hyperedges sharing the same cardinality have the same adjacency value. The adjacency table, as shown in Fig. 5(c), is therefore typically very short.
Next, for the node neighborhood table, we introduce the concept of the M-th order neighborhood of a node.
Definition 2 (M-th Order Neighborhood of a Node): Given a hypergraph G = (V, E) with order M, for any node v ∈ V, its M-th order incident edge set is

    E_M(v) = {e^M : e ∈ E, v ∈ e},    (19)

where e^M is the M-th order hyperedge set defined in Eq. (14).
Then we can define the M-th order neighborhood of v, which excludes one occurrence of the target node v from each hyperedge in E_M(v):

    N_M(v) = ∪_{e^M ∈ E_M(v)} π(e^M(\v)),    (20)

where e^M(\v) deletes exactly one copy of v from each M-th order hyperedge in e^M, and π(·) denotes permutation of the remaining nodes.
Consider node v_1 in Fig. 2(c) as an example. Its 3rd-order incident edge set is E_3(v_1) = {e_1^3, e_2^3}, where e_1^3 = {(v_1, v_2, v_1), (v_1, v_2, v_2)} is spanned from e_1 and e_2^3 = e_2 is already of 3rd order. Correspondingly, the M-th order neighborhood is N_3(v_1) = {(v_1, v_2), (v_2, v_1), (v_2, v_2), (v_2, v_3), (v_3, v_2)}. Note that the element e_1^3 of E_3(v_1) contains repeated v_1's since it results from the edge spanning, and the subsequent node deletion for generating N_M(v_1) removes only one copy of v_1.
From the M-th order neighborhood definition, the neighborhood table (see, e.g., Fig. 5(c)) is constructed with every node in the first column and its M-th order neighborhood N_M(v) in the second column, so that it represents the hyperedge connectivity information carrying the indices of the nonzero adjacency entries. By specifying any target node v_i in the first column of the neighborhood table, we can quickly search for the nonzero adjacency entries required to compute the shifted signal of v_i in Eq. (13). For example, as shown in Fig. 5, the nonzero adjacency entries for v_1, with its index fixed at the first mode, are a_{1::} = {a_{121}, a_{112}, a_{122}, a_{123}, a_{132}}, which is consistent with the permutations in N_3(v_1) from the neighborhood table. The neighborhood table together with the adjacency value table therefore forms the compressed sparse adjacency tensor, providing an efficient representation of a higher-order hypergraph.
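A sketch of constructing one row of the neighborhood table is given below; it reuses span_M from the previous sketch, and the node indexing is 0-based.

    from itertools import permutations

    def neighborhood_M(v, edges, M):
        """M-th order neighborhood N_M(v): span each incident hyperedge to order M,
        delete one copy of v, and keep all permutations of the remaining (M-1)
        nodes (a sketch of Eqs. (19)-(20); span_M is from the previous sketch)."""
        nbrs = set()
        for e in edges:
            if v not in e:
                continue
            for sub in span_M(e, M):
                rest = list(sub)
                rest.remove(v)                 # delete exactly one copy of v
                nbrs.update(permutations(rest))
        return nbrs

    edges = [{0, 1}, {0, 1, 2}, {2, 3, 4}]     # the hypergraph of Fig. 2, 0-indexed
    print(sorted(neighborhood_M(0, edges, 3)))
    # [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1)], matching a_112, a_121, a_122, a_123, a_132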

C. Inductive Learning with T-MPHN
With the compressed adjacency tensor representation, we propose the tensor message-passing hypergraph neural network (T-MPHN) in this subsection. Given any node v ∈ V, let x_v ∈ R^D be the input feature associated with v. Given any ordered sequence of nodes (u_1, u_2, ..., u_{M−1}), we define their cross-node interaction

    x_{u_1} ⊙ x_{u_2} ⊙ ··· ⊙ x_{u_{M−1}}    (21)

to be the Hadamard (element-wise) product of their node features along each feature dimension d (1 ≤ d ≤ D).
From the node-wise perspective, we then use the M-th order neighborhood N_M(v) defined from Eq. (19) and Eq. (20) to compute the neighborhood embedding

    m_{N_M(v)} = AGGREGATE({a_e · m_{e^M(v)} : e^M ∈ E_M(v)}),    (22)

where AGGREGATE denotes a permutation-invariant aggregation function such as summation or average, a_e is the adjacency value computed by Theorem 4.1, and

    m_{e^M(v)} = AGGREGATE({x_{u_1} ⊙ ··· ⊙ x_{u_{M−1}} : (u_1, ..., u_{M−1}) ∈ π(e^M(\v))})    (23)

is the edge embedding for the M-th order hyperedge set e^M, obtained by aggregating the cross-node interactions of each permuted sequence of neighborhood nodes from π(e^M(\v)). As the edge embeddings from Eq. (23) are aggregated over all hyperedges of a node and weighted by the adjacency values a_e, the two-step process in Eq. (22) essentially aggregates all the cross-node interactions generated by the ordered node sequences from N_M(v) to infer the neighborhood embedding.
Using the hypergraph example of Fig. 2 for illustration, consider the case of D = 1, so that each node carries a scalar feature x_i. If we set node v_1 in Fig. 2(c) as the target node and let AGGREGATE be summation, then by looking up the neighborhood table in Fig. 5(c) we obtain m_{e_1^3}(v_1) = (x_1 x_2 + x_2 x_1 + x_2 x_2) and m_{e_2^3}(v_1) = (x_2 x_3 + x_3 x_2). Since the coefficients a_{e_1} = 1/3 and a_{e_2} = 1/2 can be directly retrieved from the adjacency value table in Fig. 5(b), the neighborhood embedding of v_1 in this example is m_{N_3(v_1)} = (1/3)(x_1 x_2 + x_2 x_1 + x_2 x_2) + (1/2)(x_2 x_3 + x_3 x_2). As we see from the above example, the neighboring aggregation first finds the nodes connected to a target node and then sums their cross-node interactions within the corresponding hyperedges, which follows the same procedure as the shifting operation in the T-spatial convolution. We therefore summarize the connection between the T-spatial convolution and the tensor message passing in the following theorem.
Theorem 4.2: Given any node v_i ∈ V, its shifted signal [Y]_i in Eq. (13) is equivalent to the neighborhood embedding m_{N_M(v_i)} computed by Eq. (22), up to a tensor normalization factor.
Proof. See Appendix F for the proof. ∎
In particular, if the adjacency tensor is normalized by scaling the entries with the degrees of the relevant nodes (see Eq. (34) in Appendix C) and the AGGREGATE operations in Eq. (22) and Eq. (23) are defined to be average and summation, respectively, then the neighborhood embedding becomes exactly the same as the shifted signal, that is, m_{N_M(v_i)} = [Y]_i.
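The two-step aggregation of Eqs. (22)-(23) can be sketched as follows, reusing span_M and adjacency_value from the earlier sketches; with D = 1 and features x = (1, 2, 3, 4, 5), it reproduces the worked example above.

    import numpy as np
    from itertools import permutations

    def edge_embedding(v, e, M, X):
        """m_{e^M(v)}: sum of the Hadamard products of the features of each permuted
        neighbor sequence (Eq. (23) with summation); span_M is from an earlier sketch."""
        m = np.zeros(X.shape[1])
        for sub in span_M(e, M):
            rest = list(sub)
            rest.remove(v)
            for seq in set(permutations(rest)):
                prod = np.ones(X.shape[1])
                for u in seq:
                    prod = prod * X[u]
                m += prod
        return m

    def neighborhood_embedding(v, edges, M, X):
        """m_{N_M(v)}: adjacency-value-weighted sum of the incident edge embeddings
        (Eq. (22) with summation); adjacency_value is from an earlier sketch."""
        return sum(adjacency_value(len(e), M) * edge_embedding(v, e, M, X)
                   for e in edges if v in e)

    edges = [{0, 1}, {0, 1, 2}, {2, 3, 4}]
    X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])      # D = 1 toy features
    print(neighborhood_embedding(0, edges, 3, X))
    # (1/3)*(x1*x2 + x2*x1 + x2*x2) + (1/2)*(x2*x3 + x3*x2) = 8/3 + 6 ≈ 8.67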

Algorithm 2 T-MPHN Forward Propagation
Input: Hypergraph G(V, E); node features {x_v | v ∈ V}; number of layers L; hypergraph order M; the adjacency value table; the neighborhood table; linear layers MLP^(l), l = 1, 2, ..., L; aggregation function AGGREGATE; combine operation COMBINE; nonlinear activation σ.
Output: Node embeddings z_v, ∀v ∈ V.

    project x_v with an initial linear layer to obtain x_v^(0), for all v ∈ V
    for l = 1, ..., L do
      for v ∈ V do
        compute the edge embeddings m^(l)_{e^M(v)} by Eq. (23) using the neighborhood table
        m^(l)_{N_M(v)} ← AGGREGATE over incident hyperedges weighted by a_e (Eq. (22))
        x_v^(l) ← σ(MLP^(l)(COMBINE(x_v^(l−1), m^(l)_{N_M(v)}))), followed by normalization
      end for
    end for
    z_v ← x_v^(L) for all v ∈ V

Based on the computing scheme proposed above for the shifted signals, we next describe the T-MPHN algorithm, which is summarized in Algorithm 2. Let {x_v | v ∈ V} be the input node features. To begin with, we initialize these node features with one linear layer of a regular multilayer perceptron (MLP) and project them into a latent space to obtain the initial hidden embeddings {x_v^(0) | v ∈ V}. This step is particularly helpful when the input features have very high dimensions (e.g., one-hot encoded features), to avoid potential gradient vanishing issues.
The T-MPHN algorithm then performs multi-layer operations as follows. At the current layer l (l = 1, ..., L), let {x_v^(l−1) | v ∈ V} be the hidden embeddings from the previous layer and let X^(l−1) = (x_{v_1}^(l−1), ..., x_{v_N}^(l−1))^T be the corresponding design matrix. For a given target node v ∈ V, we first perform the efficient two-step aggregation to generate the shifted hidden features m^(l)_{N_M(v)} as shown in Eq. (24), where the m^(l)_{e^M(v)} are computed by the aggregation scheme of Eq. (23) using the previous embeddings in X^(l−1).
Subsequently, the step in Eq. (25) integrates the proposed T-spatial convolution by concatenating x_v^(l−1) with m^(l)_{N_M(v)} to obtain an augmented node-specific vector, followed by one regular linear layer (MLP^(l)). Indeed, by Theorem 4.2, this step can be equivalently viewed as a weighted linear combination between the hidden features X^(l−1) transformed by a simple linear layer and the T-spatial convolution of the CNI signals of X^(l−1) by the convolution operation of Eq. (11).
Lastly, the resulting hidden features of node v are fed into a nonlinear (e.g., ReLU) activation followed by a normalization step in Eq. (26) to generate x_v^(l), which is used as the node's new hidden embedding for the next layer l + 1. The process described above is repeated for L layers and finally yields the output node embeddings z_v for all v ∈ V from the T-MPHN algorithm.
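A compact sketch of the forward pass is given below. It reuses neighborhood_embedding from the earlier sketch, stands in plain weight matrices for the MLPs, and fixes COMBINE to concatenation and the normalization to an L2 row normalization; these are our simplifying assumptions rather than the exact configuration of Algorithm 2.

    import numpy as np

    def t_mphn_forward(edges, X, mlps, Ms, act=lambda t: np.maximum(t, 0.0)):
        """Sketch of the T-MPHN forward pass: initial projection, then per layer the
        two-step aggregation, concatenation with the node's own embedding, a linear
        map, a ReLU, and L2 row normalization. Ms holds the hypergraph order used at
        each layer; neighborhood_embedding is from the earlier sketch."""
        H = X @ mlps[0]                                     # initial projection
        for l in range(1, len(mlps)):
            nbr = np.stack([neighborhood_embedding(v, edges, Ms[l - 1], H)
                            for v in range(H.shape[0])])    # m_{N_M(v)} for all nodes
            H = act(np.concatenate([H, nbr], axis=1) @ mlps[l])   # COMBINE = concat
            H = H / np.maximum(np.linalg.norm(H, axis=1, keepdims=True), 1e-12)
        return H                                            # final embeddings z_v

    edges = [{0, 1}, {0, 1, 2}, {2, 3, 4}]
    rng = np.random.default_rng(6)
    mlps = [rng.standard_normal((8, 4)),     # initial layer
            rng.standard_normal((8, 4)),     # layer 1: input is concat(4 + 4)
            rng.standard_normal((8, 2))]     # layer 2
    print(t_mphn_forward(edges, rng.standard_normal((5, 8)), mlps, Ms=[3, 3]).shape)  # (5, 2)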

D. Design Variations of T-MPHN
Under the T-MPHN framework proposed above, it is conceivable that several variations may be formulated for practical use. We next illustrate some examples of such variations; a comprehensive investigation of other possible variations is left to future work.
In the aggregation of Eq. (22), one can set the hypergraph order M to a fixed value so that any hyperedge with more than M nodes is uniformly down-sampled to degree M. This down-sampling strategy is especially useful for datasets with only a few extremely large edges but many small edges. Furthermore, the order M of the hypergraph can be set differently at different layers. This variation is motivated by noting that the l-th layer of a HyperGNN aggregates information from the l-th hop neighbors; as the aggregation propagates to neighbors that are multiple hops away from the central target node, fewer neighboring nodes may need to be considered. By decreasing the order M as the layer l goes deeper, the model performance can often be improved; we provide further discussion in Section VI.
In addition to M, one may also change the aggregation function. If a dataset contains "hub" nodes that lie in many hyperedges, a normalization strategy is to set the edge AGGREGATE function in Eq. (22) to the mean function, that is,

    m_{N_M(v)} = (1/d_v) Σ_{e^M ∈ E_M(v)} a_e · m_{e^M(v)},

where d_v is the degree of node v, i.e., the number of edges that v lies in. Other AGGREGATE functions, such as max pooling and LSTM [31], may also be considered in accordance with a study's learning task.
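As an illustration, the mean-aggregation variant can be obtained from the earlier neighborhood_embedding sketch with a one-line change (again assuming the helper functions defined above).

    import numpy as np

    def neighborhood_embedding_mean(v, edges, M, X):
        """Mean variant: divide the weighted sum of incident-edge embeddings by the
        node degree d_v, to tame "hub" nodes (edge_embedding and adjacency_value are
        from the earlier sketches)."""
        incident = [e for e in edges if v in e]
        if not incident:
            return np.zeros(X.shape[1])
        total = sum(adjacency_value(len(e), M) * edge_embedding(v, e, M, X)
                    for e in incident)
        return total / len(incident)        # d_v = number of incident hyperedges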

E. Complexity Analysis
Unlike the T-spectral convolution, which requires the entire sparse adjacency tensor, the T-MPHN algorithm employs the compressed adjacency tensor to design an efficient aggregation scheme for hypergraphs and avoid excessive space and time complexity. Let d_m = max_{v∈V} d_v be the maximum degree over all nodes and let D^(l−1) be the dimension of the embedding features generated by the previous layer l−1. Suppose M is fixed (and typically much smaller than N). Then, since the adjacency value table and the neighborhood table are both stored in dictionary format, the space complexity of T-MPHN is O(N d_m) and the time complexity for each layer l is O(N d_m D^(l−1)). Therefore, rather than scaling as N^M as for the T-spectral convolution in Section III-C, both the space and time complexities of T-MPHN grow only linearly in N, which is practically comparable to state-of-the-art HyperGNNs such as UniGCN [24] and HNHN [13].

V. CONNECTION TO RELATED WORK
Here we first point out certain connections between the three proposed HyperGNNs: the T-spectral convolutional HyperGNN, the T-spatial convolutional HyperGNN, and the T-MPHN. Then we show the relationship between our work and other closely related work under some special cases.
Connection between the T-spectral and T-spatial convolutions. As shown in Section IV, the T-spatial convolution is obtained by localizing (taking the first frontal slice of) the T-spectral convolution. Alternatively, the connection can be seen from Eq. (10) in Algorithm 1: when the hypergraph order is M = 2, the pre-Fourier transform and the post-inverse Fourier transform in Algorithm 1 can be omitted, since they are applied only to modes higher than 2, and the computation of the T-spectral convolution then reduces to the T-spatial convolution of Eq. (11). Therefore, if a hypergraph reduces to a simple graph (M = 2), the T-spectral convolution is the same as the T-spatial convolution.
Connection between the T-spatial convolution and T-MPHN. In Theorem 4.2, we showed that the neighborhood embedding m_{N_M(v_i)} is equivalent to the shifted signal [Y]_i in the T-spatial convolution. Aside from this algorithmic perspective, the difference between the T-spatial convolution and the T-MPHN also lies in how the neighborhood embedding m_{N_M(v_i)} and the central node embedding x_{v_i} are combined. In the former, if a self-loop-added adjacency tensor is used, the combining operation is restricted to summation; in the T-MPHN, the combining operation is more flexible, and we choose concatenation in the experiments.

Connection between T-MPHN and other related work.
As a tensor is a generalization of a matrix, certain matrix-based HyperGNNs built on hypergraph expansions are naturally subsumed in our work. For example, after applying clique expansion to a hypergraph G, we obtain a uniform order-2 hypergraph, and from the definition of the adjacency tensor with M = 2, the adjacency coefficients are a_{ij} = 1 for each edge e = (i, j), which reduces the adjacency tensor to the adjacency matrix. The hypergraph signal, defined as the (M−1)-fold outer product of the original signal X ∈ R^{N×D}, automatically becomes the original signal when M = 2. Furthermore, using our definition of the neighborhood with M = 2, the adjacency-matrix-based neighboring aggregation rule can be written as m_{N_2(v_i)} = Σ_{e^2 ∈ E_2(v_i)} m_{e^2}(v_i) with m_{e^2}(v_i) = Σ_{u ∈ e^2(\v_i)} x_u, which are simplified from the two aggregation steps in Eq. (22) and Eq. (23).

VI. EXPERIMENTS

The proposed T-HyperGNNs, including the T-spectral convolution (T-spectral), the T-spatial convolution (T-spatial), and the T-message-passing network (T-MPHN), are evaluated in this section. In the first experiment, we consider transductive learning, in which all nodes are involved in modeling during the training process (except for the true labels of the testing sets). An ablation study is conducted to show the effectiveness of using the adjacency tensor and the cross-node interaction tensor. To demonstrate the scalability and inductivity of the T-MPHN, an inductive setting is applied to a 3D object recognition problem, in which newly added unseen nodes are evaluated during the testing process. We use the accuracy rate as the metric. For each reported accuracy rate, 10 random data splits and 5 different parameter initializations (a total of 50 repetitions) are performed to compute the mean and the standard deviation of the accuracy rates. We use the Adam optimizer with the learning rate and weight decay chosen from {0.01, 0.001} and {0.005, 0.0005}, respectively, and tune the hidden dimension over {64, 128, 256, 512} for all methods.

A. Transductive Node Classification
The task of transductive node classification is to predict the label associated with each node by taking the hypergraph structure and node features as input. In this experiment, we consider a transductive setting [31], in which the hypergraph structure is assumed to be the same during the training and testing processes. That is, we assume the testing node connections are known during model training.
Datasets. We use five standard hypergraph datasets from academic networks, which include two co-citation datasets (Cora and DBLP) and three co-authorship datasets (Cora, CiteSeer, and PubMed). The hypergraph structure is obtained by viewing each paper as a node and each co-citation or co-authorship relationship as a hyperedge. The node features associated with each paper are bag-of-words representations summarized from the abstract of the paper, and the node labels are classes of papers (e.g., algorithm, computing, etc.). The raw datasets [10] are further downsampled to smaller hypergraphs so that the T-spectral and T-spatial convolution HyperGNNs can be applied and compared with the proposed T-MPHN. The descriptive statistics of these five hypergraphs are summarized in Table I.

Setup and Benchmarks. To classify the labels of the testing nodes, we feed the whole hypergraph structure and the node features to the model. The training, validation, and testing data are set to 50%, 25%, and 25% of each complete dataset, respectively. Following the convention of HyperGNNs, we set the number of layers for all HyperGNNs to 2 to avoid over-smoothing, except for the T-spectral HyperGNN. For the T-spectral HyperGNN, we use only one layer because it is a global approach that propagates to all nodes within a single T-spectral convolution. In this experiment, we choose a regular multi-layer perceptron (MLP), HGNN [12], HyperGCN [10], and HNHN [13] as our benchmarks, since these methods are originally designed for transductive settings. Here HGNN and HyperGCN utilize hypergraph reduction approaches to define the hypergraph adjacency and Laplacian matrices on which spectral convolutions are built, whereas HNHN formulates a two-stage spatial propagation rule using the incidence matrix.
Results and Discussion. The testing results on the five academic networks are summarized in Table II. Overall, the tensor-based approaches achieve satisfactory performance compared to all the benchmarks, indicating the importance of effectively utilizing high-order tensor representations for learning on hypergraphs. In particular, the T-spectral HyperGNN constructed with the t-product shows the best results on all datasets except PubMed. This observation coincides with our theoretical anticipation that the T-spectral model is the most robust approach, as it contains the richest high-order information. Built on the localized T-spectral convolution, the T-spatial approach with only the first frontal slice of the t-product unsurprisingly shows somewhat reduced accuracy compared to the T-spectral approach, but still achieves results competitive with the benchmarks. The T-MPHN, on the other hand, maintains very competitive results across all the datasets compared to the T-spectral approach (e.g., on the PubMed dataset, its average accuracy rate is even 7.68% higher than that of the T-spectral approach). Comparing these two proposed approaches, we tend to view the T-MPHN as the more capable one for modeling various datasets and tasks; such capability is partially attributable to the concatenation of the neighborhood embedding and the central node embedding, which forms a "skip connection" between the input and the output of an aggregation step (see, e.g., GraphSAGE [31]).
In addition, it is worthwhile to note that the three proposed HyperGNNs themselves already constitute an ablation study among the full t-product, the simplified t-product (with only the first frontal slice), and node-wise message passing with concatenation. From the comparison between the T-spectral and T-spatial approaches, we can see that the full t-product captures more information than its first frontal slice alone; from the T-spatial approach to the T-MPHN, we can further see that such information loss can be partially compensated by the concatenation of the neighborhood embedding and the central node embedding. To gain additional insight into the model architecture of the T-MPHN, we conduct an ablation study in the next subsection to examine the adjacency value computation and the cross-node interaction.

B. Ablation Study for T-MPHN
On the same academic networks, an ablation study is designed by "turning off" the adjacency values in Eq. (18) and the cross-node interactions in Eq. (21) separately and testing the corresponding performance. We consider three modeling scenarios: 1) the full T-MPHN model; 2) the T-MPHN model with the cross-node interactions but not the adjacency values; 3) the T-MPHN model with the adjacency values but not the cross-node interactions. In the second scenario, when the adjacency values are "turned off", we fill in all ones instead. In the third scenario, when the cross-node interactions are "turned off", we replace the Hadamard product of node features with their summation. The results of the ablation study are shown in Fig. 6. We can observe that the performance of the two corrupted T-MPHNs is worse than that of the full T-MPHN, confirming the need for both the adjacency values and the cross-node interaction operation.

C. Inductive 3D Object Recognition
In this experiment, we apply the T-MPHN to one of the important tasks in computer vision: 3D object recognition. The goal of 3D object recognition is to classify 3D objects into different categories. To better reflect practical circumstances, we assume the 3D object datasets are evolving, with unseen objects added during testing. This setting is called inductive learning [31], as opposed to the transductive setting in the previous experiments. To create the inductive setting from our static data, we follow HyperSAGE [17] and randomly reserve 40% of the nodes as unseen nodes for testing, while 20% and 40% of the nodes are used for regular training and validation, respectively.

Datasets. We employ two public datasets, the Princeton ModelNet40 dataset [32] and the National Taiwan University (NTU) 3D model dataset [33]. In these two datasets, each 3D object is viewed as a node, and the features associated with each node are extracted using the Group-View Convolutional Neural Network (GVCNN) [34] and the Multi-View Convolutional Neural Network (MVCNN) [35], following the experimental setting of [12]. The resulting feature dimensions from the MVCNN and the GVCNN are 4096 and 2048, respectively, and we concatenate them to form the input features for our study. To form the hypergraph structures for these two datasets, we also follow the setup of [12] by using the K-nearest-neighbor algorithm with K = 5, so that all hyperedges of the constructed hypergraph have size 5. We summarize the data preprocessing steps in Fig. 7. The goal of the experiment is to predict the label associated with each node (e.g., window, aircraft, shelf, etc.). The statistics of the two 3D object recognition datasets are summarized in Table III.

Setup and Benchmarks. Since the T-spectral and T-spatial HyperGNNs are not applicable to inductive settings, we only implement the T-MPHN and compare its performance with the benchmark inductive methods: MLP, HyperSAGE [17], and UniSAGE [24]. HyperSAGE defines the intra-edge and inter-edge aggregations through a generalized mean function parameterized by p, and we adopt p = 1 for its best results. UniSAGE proposes a node-edge-node propagation rule using mean and summation as the aggregation functions at the first and second layers, respectively. For all models, we construct 2-layer neural networks.
Results. The average accuracy rates along with standard deviations are reported in Table IV. It is apparent from the table that the T-MPHN achieves consistently better results than the other benchmark methods for both seen and unseen nodes. A closer comparison between seen and unseen samples shows that generalizing a trained model to unseen nodes is not an easy task, as all models show reduced accuracy rates; for example, HyperSAGE gives even lower accuracy than the MLP on unseen nodes. By comparing the accuracy reduction from seen to unseen nodes, we also observe the desirable result that the T-MPHN shows the smallest reduction among the four methods.
Effects of hyperparameters. While various hyperparameters are tuned in the training process, the hypergraph orders at each layer of the T-MPHN can be flexibly treated as hyperparameters. We find that decreasing the hypergraph order across layers (e.g., 5 → 3) is generally desirable in practice; this can be viewed as a regularization against the over-smoothing problem. The first layer, spreading over the first-hop neighbors of target nodes, is naturally the most important one and requires a higher order, while the second layer, aggregating the second-hop neighbors, can use a lower order.

VII. CONCLUSION AND FUTURE WORK
In this paper, we introduce tensor representations of hypergraphs and derive the hypergraph T-spectral convolution via the tensor t-product. While hypergraph neural networks can be built on the T-spectral convolution, its time and space complexities are too large for some real-world applications. To alleviate the time complexity, we localize the T-spectral convolution to the T-spatial convolution by taking only the first frontal slice of the T-spectral convolution. Furthermore, we propose the tensor message-passing hypergraph neural network (T-MPHN), built on the compressed adjacency tensor, which scales to large hypergraphs and supports inductive learning.

APPENDIX A
THE TENSOR T-PRODUCT

Definition 3 (T-product [18]): The T-product of two 3rd-order tensors X ∈ R^{N_1×N_2×N_3} and Y ∈ R^{N_2×N_4×N_3} is the N_1 × N_4 × N_3 tensor

    X ⋆ Y = fold(bcirc(X) · unfold(Y)),
where the operator bcirc(X) converts the set of frontal slices of the tensor X into a block circulant matrix and unfold(Y) stacks the frontal slices of Y vertically into an N_2 N_3 × N_4 matrix. The operator fold(·) reverses this process, fold(unfold(X)) = X. Since circulant matrices are diagonalized by the discrete Fourier transform, the T-product can be computed efficiently in the Fourier domain, as explained in detail in [18]. Using MATLAB notation, let X̂ := fft(X, [], 3) denote the tensor obtained by applying the fast Fourier transform (FFT) along each tubal scalar of X. The T-product of X ∈ R^{N_1×N_2×N_3} and Y ∈ R^{N_2×N_4×N_3} can then alternatively be computed as Ẑ^(k) = X̂^(k) Ŷ^(k) for each frontal slice, followed by Z = ifft(Ẑ, [], 3), where ifft is the inverse FFT. The T-product can be easily extended to higher-order tensors in a recursive manner [36]. For M-th order tensors, the T-product takes the same form,

    X ⋆ Y = fold(bcirc(X) · unfold(Y)),

with bcirc(X) built blockwise from the frontal slices X^(k) and unfold(Y) stacking the frontal slices Y^(k),
where X^(k) and Y^(k) are the (M−1)-order tensors formed by flattening along the M-th mode; the construction proceeds recursively to the (M−1)-th, (M−2)-th orders, and so on. Each successive flattening thus involves a T-product of one order less, until the base case of 3rd-order tensors.

APPENDIX B SYMMETRIZATION OF TENSORS.
In order to obtain symmetric block circulant matrices in the t-product, we need to symmetrize the adjacency tensor and the hypergraph signal tensor. We therefore define a symmetrization operator sym(A) that generates a symmetric version A_s ∈ R^{N×N×(2N+1)} of A ∈ R^{N×N×N} by adding a matrix of zeros 0_{N×N} as the first frontal slice, dividing by 2, and reflecting the frontal slices of A along the third dimension:

    A_s = sym(A), whose frontal slices are (0_{N×N}, A^(1)/2, ..., A^(N)/2, A^(N)/2, ..., A^(1)/2).    (32)
Now, letting N_s = 2N + 1, for a higher-order tensor A ∈ R^{N^M}, its symmetric version is an M-th order tensor A_s ∈ R^{N×N×N_s×···×N_s} obtained by recursively appending an (M−1)-th order tensor of zeros O ∈ R^{N^{(M−1)}} at the front, dividing by 2, and reflecting the (M−1)-th order sub-tensors A^(l) along the last mode. When applied to the hypergraph signal tensor and the weight tensor, this operation yields X_s and W_s, respectively. Notice that the operation is reversible.
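A sketch of the 3rd-order symmetrization is given below; the slice ordering follows the block circulant row described in Section III-B and is otherwise our assumption.

    import numpy as np

    def sym3(A):
        """Symmetrize a 3rd-order N x N x N tensor into N x N x (2N+1): a zero
        first frontal slice, the frontal slices of A divided by 2, and the same
        slices reflected along the third mode (a sketch of the sym(.) operator)."""
        zero = np.zeros(A.shape[:2] + (1,))
        half = A / 2.0
        return np.concatenate([zero, half, half[:, :, ::-1]], axis=2)

    A = np.random.default_rng(7).standard_normal((4, 4, 4))
    A_s = sym3(A)
    print(A_s.shape)                                  # (4, 4, 9)
    print(np.allclose(A_s[:, :, 1], A_s[:, :, -1]))   # reflected slices agree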

APPENDIX C DEFINITION OF NORMALIZED ADJACENCY TENSOR
The normalized adjacency matrix of a graph can generally be defined in a non-symmetric or a symmetric manner. Similar to the matrix setting, we define the normalized adjacency tensor in two ways.
Definition 4 (Normalized adjacency tensor [37]): For a hypergraph G without any isolated vertex, the non-symmetric normalized adjacency tensor is defined by scaling the entries of A with the degrees of the relevant nodes; the symmetric version is defined analogously. As a result, A = A^T when A is a symmetrized tensor.

APPENDIX H DERIVATION OF HYPERGRAPH SPECTRAL CONVOLUTION
In the main body, we introduced the T-spectral convolution directly and did not elaborate on the relationship between convolutions and the hypergraph Fourier space, in which the convolution theorem [39] is originally defined. Designing various spectral filters, such as polynomial [29], [30], [40] and ARMA [41], [42] filters, leads to different approaches and widens the design space of convolutional hypergraph neural networks. In this appendix, we show how the T-spectral convolution is derived from the spectral space.

A. Construct Spectral Space
The eigendecomposition of the hypergraph Laplacian serves as the basis of the hypergraph spectral space. We define the hypergraph Laplacian as a difference operator.
Definition 5 (Laplacian Tensor): Given a hypergraph G with N nodes and m.c.e(G) = M, the Laplacian tensor is defined as

    L = D − A,

where A is the adjacency tensor and D is a super-diagonal degree tensor whose diagonal entries d_{i···i} carry the degree of node v_i; in particular, d_{i···i} = Σ_{j_1, j_2, ..., j_{M−1} = 1}^{N} a_{i j_1 j_2 ... j_{M−1}}. To ensure a bounded spectrum of the Laplacian tensor, the normalized hypergraph Laplacian tensor is further defined as

    L^norm = I − A^norm,

where I is a super-diagonal identity tensor with all diagonal entries equal to 1 and A^norm is the normalized adjacency tensor.
Moving on to the decomposition of the Laplacian tensor, we choose the T-eigendecomposition, which offers insights perfectly analogous to the eigendecomposition in the traditional graph spectral convolution theorem [30]. Given the normalized Laplacian tensor L^norm ∈ R^{N^M} of an M-th order hypergraph, we first modify it to its symmetric version L_s^norm ∈ R^{N×N×(2N+1)^{(M−2)}} according to the symmetrization operation in Appendix B. For notational simplicity, let N_s = 2N + 1. The T-eigendecomposition of L_s^norm is then expressed as

    L_s^norm = U_s ⋆ Λ ⋆ U_s^T,

where U_s is an orthogonal tensor [36] and Λ ∈ R^{N×N×N_s^{(M−2)}} is an f-diagonal tensor whose frontal slices are diagonal matrices [36]. A visualization of the T-eigendecomposition of a 3rd-order Laplacian tensor is shown in Fig. 8. The diagonal components of Λ are in decreasing order [36], which generalizes the concept of graph frequency to hypergraphs.

B. Hypergraph Spectral Convolution
Based on the convolution theorem [39], the hypergraph spectral convolution between two hypergraph signals is defined as the element-wise product of their Fourier transforms.
Definition 6 (Hypergraph convolution): The hypergraph spectral convolution ⋆_G between a filter H_s ∈ R^{N×1×N_s^{(M−2)}} and a hypergraph signal X_s ∈ R^{N×1×N_s^{(M−2)}} is defined as

    H_s ⋆_G X_s = U_s ⋆ ((U_s^T ⋆ H_s) ⊙ (U_s^T ⋆ X_s)),

where ⋆ is the T-product defined in Appendix A, ⊙ is the element-wise Hadamard product, and U_s is the orthogonal tensor decomposed from L_s^norm. Let Ĥ_s = U_s^T ⋆ H_s ∈ R^{N×1×N_s^{(M−2)}} be the Fourier transform of the filter H_s; then the hypergraph convolution is equivalent to

    H_s ⋆_G X_s = U_s ⋆ diag(Ĥ_s) ⋆ U_s^T ⋆ X_s,

where diag(Ĥ_s) is the f-diagonal tensor whose diagonal tubes are the tuples of Ĥ_s. The Fourier-transformed filter Ĥ_s itself can be viewed as a non-parametric spectral filter. However, a filter created in this non-parametric manner has little to no dependence on the hypergraph structure and may not satisfy many of the convolution's desired properties; such filters, for instance, may propagate to any node arbitrarily. A general and natural practice is to apply filtering on the frequency components of the Laplacian, leading to the hypergraph spectral convolution.
Definition 7 (Hypergraph Spectral Convolution): Given the frequency representation Λ of a hypergraph, parameterize the filter h : R → R as h(Λ); the hypergraph spectral convolution is then defined as

    h ⋆_G X_s = U_s ⋆ h(Λ) ⋆ U_s^T ⋆ X_s,    (45)

where the frequency response h(Λ) applies h to the eigen-tuples of Λ. By constructing the hypergraph spectral convolution via a frequency filter, the resulting convolution commutes with the Laplacian tensor and is therefore localized in space [40]. In this way, different filters h(Λ) can be designed according to specific tasks.

C. Connection to Spectral T-HGCN
The formulation of our T-spectral convolution is derived from a recursive polynomial parametrization of the frequencies, in particular the Chebyshev polynomial. The reasons for using a recursive polynomial are twofold: 1) the recursive formulation is computationally efficient; 2) the recursion can be naturally modeled by cascading layers of neural networks, which is especially appropriate for developing hypergraph convolutional neural networks. Recall that the recursive Chebyshev polynomial of order k is T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x), with T_0 = 1 and T_1 = x. A spectral filter can thus be designed as

    h(Λ) = Σ_{k=0}^{K} θ_k T_k(Λ̃),

where θ_k is the filter weight for the k-th order of the Chebyshev polynomial and T_k(Λ̃) ∈ R^{N×N×N_s^{(M−2)}} is the Chebyshev polynomial of order k evaluated at Λ̃ = (2/λ_max) Λ − I_{N_s}. Here λ_max is the maximum value of the eigen-tuples of L^norm, and I_{N_s} ∈ R^{N×N×N_s^{(M−2)}} is the symmetric identity tensor. It was proved in [37] that the normalized Laplacian tensor L^norm has the largest eigenvalue λ_max = 2, so that Λ̃ = Λ − I_{N_s} has eigenvalues within the range [−1, 1]. Applying the truncated order-K expansion of the Chebyshev polynomial to the spectral convolution in Eq. (45), we obtain

    h ⋆_G X_s ≈ Σ_{k=0}^{K} θ_k T_k(L̃_s^norm) ⋆ X_s,

where L̃_s^norm = L_s^norm − I_{N_s}. Since orders higher than 1 can be realized by stacking neural network layers, we follow GCN [29] and restrict the order of the Chebyshev polynomial to K = 1, leading to the T-spectral convolution in Eq. (6).

Fig. 1. Robot collaboration network represented by (a) a simple graph, (b) a hypergraph G_1, and (c) another hypergraph G_2. In (a), each cooperation relationship is denoted by a line connecting exactly two entities, whereas in (b) and (c) each hyperedge, denoted by a colored ellipse, represents multi-robot cooperation.

Fig. 2. (a) A hypergraph, (b) its clique expansion used in spectral HyperGNNs, (c) its incidence matrix used in spatial HyperGNNs, and (d) its adjacency tensor, where the nonzero entries of the adjacency tensor are specified on the right-hand side.



Fig. 5. (a) A hypergraph; (b) the nonzero adjacency tensor entries for the hypergraph in (a); (c) the adjacency value table; (d) the neighborhood table. The parentheses in the neighborhood table represent the nodes forming a hyperedge with the target node in the first column.

Fig. 6. Averaged accuracy of T-MPHN and its corrupted variations on the five academic networks in the ablation study.

TABLE II
AVERAGED TESTING ACCURACY (%, ± STANDARD DEVIATION) ON FIVE ACADEMIC NETWORKS FOR TRANSDUCTIVE NODE CLASSIFICATION. THE TOP THREE RESULTS ARE HIGHLIGHTED FOR EACH DATASET.

TABLE IV
ACCURACY (%, ± STANDARD DEVIATION) ON TWO 3D OBJECT RECOGNITION DATASETS. THE REDUCED PERCENT MEANS THE ACCURACY PERCENTAGE REDUCTION FROM SEEN NODES TO UNSEEN NODES. THE BEST RESULTS ARE HIGHLIGHTED FOR EACH DATASET.