Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for Polynomial Transforms Based on Induction

A polynomial transform is the multiplication of an input vector $x\in\C^n$ by a matrix $\PT_{b,\alpha}\in\C^{n\times n},$ whose $(k,\ell)$-th element is defined as $p_\ell(\alpha_k)$ for polynomials $p_\ell(x)\in\C[x]$ from a list $b=\{p_0(x),\dots,p_{n-1}(x)\}$ and sample points $\alpha_k\in\C$ from a list $\alpha=\{\alpha_0,\dots,\alpha_{n-1}\}$. Such transforms find applications in the areas of signal processing, data compression, and function interpolation. Important examples include the discrete Fourier and cosine transforms. In this paper we introduce a novel technique to derive fast algorithms for polynomial transforms. The technique uses the relationship between polynomial transforms and the representation theory of polynomial algebras. Specifically, we derive algorithms by decomposing the regular modules of these algebras as a stepwise induction. As an application, we derive novel $O(n\log{n})$ general-radix algorithms for the discrete Fourier transform and the discrete cosine transform of type 4.

Polynomial transforms have a number of important applications. For example, they are used for interpolation and approximation [17], solving differential equations [6], and data compression and image processing [21,22,23,24], and the DFT specifically is widely used for spectral analysis and fast computation of correlation and convolution.
The origin and main motivation for our work lies in the algebraic signal processing theory [29,31,33]. This theory identifies polynomial transforms as Fourier transforms for suitably chosen signal models built from polynomial algebras. For example, the classical Cooley-Tukey FFT can be derived from the structure of the cyclic group algebra $\C[\mathbb{Z}_n]$. The group point of view was then generalized to derive fast Fourier transforms for group algebras $\C[G]$ for noncyclic finite groups $G$ [5,9,10,13,35]. Some of them were based on the induction for group algebras, a construction that is algebraically analogous to the method used in this paper.
The polynomial algebra point of view was extended to derive and study larger classes of FFTs [2,18,19,27,43].
The extension to the algorithm derivation of the full class of trigonometric transforms and a large class of algorithms was then accomplished in [32,41] based on early ideas from [37,38]. Since all these algorithms are based on two theorems that generalize and account for the original Cooley-Tukey FFT, all the algorithms were called "Cooley-Tukey type." The close relation between transforms and algebra was fully developed and explained in the algebraic signal processing theory [31,33].
In this paper we generalize the main theorem from [32] and hence the class of Cooley-Tukey type algorithms. Specifically, following the discussion in [32], we rigorously demonstrate in Section 5 that these algorithms can be viewed as based on a special case of algebraic induction. Then we generalize the construction method to its most general form and show that it produces novel algorithms. As examples, we derive new general-radix algorithms for the DFT and the DCT of type 4.

2. Polynomial algebras and transforms.
In this section we discuss polynomial algebras and demonstrate that their decomposition matrices are exactly polynomial transforms. We assume that the reader is familiar with the basic theory of algebras, modules, and matrix representations, even though we strive for a self-contained presentation in this paper. A good introduction to these topics can be found in [12,16,17]. Below, we briefly review definitions and important properties.
A vector space that is also a ring is called an algebra. In this paper, we work with polynomial algebras of the form $A = \C[x]/p(x)$. Elements of $A$ are polynomials in $x$ that are added and multiplied modulo $p(x)$. We assume that $p(x) = \prod_{k=0}^{n-1}(x-\alpha_k)$ is a polynomial of degree $n$ and separable, i.e., $\alpha_k \neq \alpha_m$ for $k \neq m$. $A$ is a commutative algebra of dimension $n$ with a multiplicative identity.
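To make the arithmetic in $A = \C[x]/p(x)$ concrete, the following minimal sketch multiplies two elements and reduces modulo $p(x)$; the helper name `poly_mulmod` and the low-to-high coefficient convention are our own choices, not from the paper.

```python
import numpy as np

def poly_mulmod(a, b, p):
    """Multiply a(x)*b(x) and reduce modulo p(x); coefficients low-to-high."""
    prod = np.polymul(a[::-1], b[::-1])        # numpy's poly routines use high-to-low order
    _, rem = np.polydiv(prod, p[::-1])         # remainder of division by p(x)
    rem = rem[::-1]
    out = np.zeros(len(p) - 1, dtype=complex)  # dim A = deg p = n
    out[:len(rem)] = rem
    return out

# Example: in A = C[x]/(x^4 - 1) we have x^3 * x^2 = x^5 = x mod (x^4 - 1).
p  = np.array([-1, 0, 0, 0, 1], dtype=complex)  # x^4 - 1, low-to-high
x3 = np.array([0, 0, 0, 1], dtype=complex)
x2 = np.array([0, 0, 1], dtype=complex)
print(poly_mulmod(x3, x2, p))                   # coefficient vector of x
```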
A vector space $M$ that permits a multiplication by elements of $A$, such that $am \in M$ for any $a \in A$, $m \in M$, is called an $A$-module. The special case $M = A$ is called a regular module. A subvector space $N \le M$ that is also closed under the multiplication by elements of $A$ is called an $A$-submodule of $M$. If $M$ has only trivial submodules (i.e., $\{0\}$ and itself), it is called irreducible. It follows from the Wedderburn theorem that a regular module $M = A$ can be decomposed into a direct sum of irreducible $A$-modules [12,16]. This decomposition is accomplished by the Chinese Remainder Theorem:

$$\Delta:\ \C[x]/p(x) \to \bigoplus_{k=0}^{n-1} \C[x]/(x-\alpha_k), \quad s(x) \mapsto (s(\alpha_0), \dots, s(\alpha_{n-1})). \quad (2.1)$$

Suppose the basis of $M$ is a list of polynomials $b = (p_0(x), \dots, p_{n-1}(x))$, and in each $\C[x]/(x-\alpha_k)$ we choose the basis consisting of $1$. Then the matrix that describes the isomorphism (2.1) is precisely the polynomial transform shown in (1.1):

$$\PT_{b,\alpha} = [p_\ell(\alpha_k)]_{0\le k,\ell<n}. \quad (2.2)$$

Namely, $s(x) = \sum_{\ell=0}^{n-1} s_\ell p_\ell(x) \in M$ becomes, in coordinate form, the column vector $s = (s_0, \dots, s_{n-1})^T$, and $\Delta(s(x))$ in (2.1) can be computed as the matrix-vector product

$$\PT_{b,\alpha}\, s. \quad (2.3)$$

Example 2.1. If $b = (1, x, \dots, x^{n-1})$ is the standard basis, then the polynomial transform (2.2) is the Vandermonde matrix

$$\PT_{b,\alpha} = [\alpha_k^\ell]_{0\le k,\ell<n}. \quad (2.4)$$

If, in addition, $p(x) = x^n - 1$, then $\alpha_k = \omega_n^k$, where $\omega_n = e^{-i 2\pi/n}$ with $i = \sqrt{-1}$, and the polynomial transform is precisely the discrete Fourier transform

$$\mathrm{DFT}_n = [\omega_n^{k\ell}]_{0\le k,\ell<n}. \quad (2.5)$$

Example 2.2. If $b = (T_0(x), \dots, T_{n-1}(x))$ is the basis consisting of the Chebyshev polynomials of the first kind, then the polynomial transform has the form $[T_\ell(\alpha_k)]_{0\le k,\ell<n}$. If, in addition, $p(x) = T_n(x)$, then $\alpha_k = \cos\frac{(2k+1)\pi}{2n}$ (see Table 2.1), and the polynomial transform is the discrete cosine transform of type 3 [34]:

$$\mathrm{DCT\text{-}3}_n = \left[\cos\frac{(2k+1)\ell\pi}{2n}\right]_{0\le k,\ell<n}.$$

Scaled polynomial transforms. The notion of a polynomial transform can be generalized by allowing a different choice of a basis in the $\C[x]/(x-\alpha_k)$ in (2.1). Namely, if we choose the basis $c_k$, $c_k \in \C$, in each $\C[x]/(x-\alpha_k)$, then (2.2) becomes the scaled polynomial transform

$$\mathrm{diag}_{0\le k<n}(1/c_k)\cdot \PT_{b,\alpha},$$

with $\PT_{b,\alpha}$ as defined in (2.2).

Example 2.3. Let $p(x) = T_n(x)$, and choose the basis $b = (V_0(x), \dots, V_{n-1}(x))$ in $M$, where $V_\ell(x)$ is the $\ell$-th Chebyshev polynomial of the third kind. If we choose $c_k = 1/\cos\frac{(k+1/2)\pi}{2n}$, then the associated scaled polynomial transform is the discrete cosine transform of type 4:

$$\mathrm{diag}_{0\le k<n}\Bigl(\cos\tfrac{(k+1/2)\pi}{2n}\Bigr) \cdot \left[\frac{\cos\frac{(k+1/2)(\ell+1/2)\pi}{n}}{\cos\frac{(k+1/2)\pi}{2n}}\right]_{0\le k,\ell<n} = \left[\cos\frac{(k+1/2)(\ell+1/2)\pi}{n}\right]_{0\le k,\ell<n} = \mathrm{DCT\text{-}4}_n.$$
Note that all 16 types of discrete sine and cosine transforms are scaled or unscaled polynomial transforms with bases consisting of Chebyshev polynomials [31].
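The polynomial transforms of Examples 2.1 and 2.2 can be verified numerically. A minimal sketch (the helper name `poly_transform` is our own, not from the paper):

```python
import numpy as np

def poly_transform(basis, alpha):
    """P[k, l] = p_l(alpha_k) for callables p_l and sample points alpha_k."""
    return np.array([[p(a) for p in basis] for a in alpha])

n = 8
# Example 2.1: b = (1, x, ..., x^{n-1}), p(x) = x^n - 1, alpha_k = w_n^k -> DFT_n.
w = np.exp(-2j * np.pi / n)
alpha = w ** np.arange(n)
basis = [lambda x, l=l: x ** l for l in range(n)]
P = poly_transform(basis, alpha)
assert np.allclose(P, np.fft.fft(np.eye(n)))     # matches the DFT matrix

# Example 2.2: Chebyshev T_l, p(x) = T_n(x), alpha_k = cos((2k+1)pi/(2n)) -> DCT-3_n.
alpha_c = np.cos((np.arange(n) + 0.5) * np.pi / n)
cheb = [lambda x, l=l: np.cos(l * np.arccos(np.clip(x, -1, 1))) for l in range(n)]
C = poly_transform(cheb, alpha_c)
K, L = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
assert np.allclose(C, np.cos((K + 0.5) * L * np.pi / n))  # DCT-3 entries
```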
3. Subalgebra and its structure. In this section we discuss the structure of subalgebras of A = C[x]/p(x).
3.1. Definition. Choose a polynomial $r(x) \in A$, and consider the space of polynomials in $r(x)$, with addition and multiplication performed modulo $p(x)$: $B = \{\sum_\ell a_\ell\, r(x)^\ell \mid a_\ell \in \C\}$, where all sums are finite. We call $B$ the subalgebra of $A$ generated by $r(x)$ and write this as $B = \langle r(x)\rangle \le A$.
3.2. Structure. Given $r(x) \in A$, we first determine the dimension of $B = \langle r(x)\rangle$. Then we identify $B$ with a polynomial algebra of the form $\C[y]/q(y)$ with a suitably chosen polynomial $q(y)$.
Theorem 3.1. Let $\beta = \{r(\alpha_0), \dots, r(\alpha_{n-1})\} = \{\beta_0, \dots, \beta_{m-1}\}$ denote the set of distinct values taken by $r(x)$ on the zeros of $p(x)$, and let $d = \dim B$. Then $d = m$.

Proof. The powers $1, r(x), r(x)^2, \dots$ span $B$; applying $\Delta$ in (2.1), their coordinate vectors form the columns of the matrix $[r(\alpha_k)^\ell]_{0\le k<n,\,\ell\ge 0}$, whose rank is $d$. Since $r(\alpha_k) \in \beta$ and $|\beta| = m$, the above matrix has only $m$ different rows; hence, $d \le m$. On the other hand, it contains the full-rank $m \times m$ Vandermonde matrix $[\beta_j^\ell]_{0\le j,\ell<m}$ as a submatrix; hence, $d \ge m$. Thus, we conclude that $d = \dim B = m$.
Next, we identify $B$ with a polynomial algebra.

Theorem 3.2. The subalgebra $B = \langle r(x)\rangle$ can be identified with the polynomial algebra $\C[y]/q(y)$, where $q(y) = \prod_{j=0}^{m-1}(y - \beta_j)$, via the canonical isomorphism of algebras $\kappa:\ B \to \C[y]/q(y)$, $r(x) \mapsto y$.
Proof. Observe that $B$ and $\C[y]/q(y)$ have the same dimension $m$, and $\kappa$ maps the generator $r(x)$ of $B$ to the generator $y$ of $\C[y]/q(y)$. Hence, it suffices to show that $q(r(x)) \equiv 0 \bmod p(x)$ in $B$. From (2.1) we obtain $\Delta(q(r(x))) = (q(r(\alpha_0)), \dots, q(r(\alpha_{n-1}))) = (0, \dots, 0)$, since every $r(\alpha_k) \in \beta$ is a zero of $q(y)$. This implies that $q(r(x)) \equiv 0 \bmod p(x)$ in $A$, and hence in $B$.
Let $c = (q_0(y), \dots, q_{m-1}(y))$ be a basis of $\C[y]/q(y)$. The polynomial transform (2.2) that decomposes the regular module $\C[y]/q(y)$ (and hence the regular $B$-module $B$) is given by (2.1) as $\PT_{c,\beta} = [q_\ell(\beta_j)]_{0\le j,\ell<m}$.
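The statements of Theorems 3.1 and 3.2 can be illustrated numerically. The following sketch, with our own illustrative choices of $p$ and $r$, computes $\beta$, $m = \dim B$, and checks that $q(r(x)) \equiv 0 \bmod p(x)$ by evaluating at the zeros of $p$:

```python
import numpy as np

# Illustration for A = C[x]/(x^4 - 1) and r(x) = x^2 (our choice of example).
alpha = np.exp(-2j * np.pi * np.arange(4) / 4)   # zeros of p(x) = x^4 - 1
r_of_alpha = alpha ** 2                          # r evaluated at the zeros

beta = []                                        # distinct values -> beta
for v in r_of_alpha:
    if not any(np.isclose(v, b) for b in beta):
        beta.append(v)

m = len(beta)                                    # dim B = |beta| (Theorem 3.1)
q = np.poly(beta)                                # q(y) = prod_j (y - beta_j), high-to-low

# q(r(alpha_k)) = 0 for all k, hence q(r(x)) = 0 mod p(x) (Theorem 3.2).
assert m == 2
assert np.allclose(np.polyval(q, r_of_alpha), 0)
```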

4. Module induction.
In this section we introduce the concept of module induction, which constructs an A-module M from a B-module N for a subalgebra B ≤ A. We will show that every regular A-module is an induction, which is the basis of our technique for polynomial transform decomposition.

4.1. Induction. Similar to the coset decomposition in group theory [12,16], we can decompose a polynomial algebra $A = \C[x]/p(x)$ using a subalgebra $B$ and an associated transversal $T = (t_0(x), \dots, t_{L-1}(x))$ of polynomials in $A$:

$$A = t_0(x)B \oplus \dots \oplus t_{L-1}(x)B, \quad (4.1)$$

where the direct sum is of vector spaces. Later, in Theorem 4.6, we establish necessary and sufficient conditions for a list of polynomials to be a transversal of $B$ in $A$. In particular, for any $B \le A$ there always exists a transversal.
Given a transversal of B in A, we define the module induction, which is analogous to the induction for group algebras in [12].
Definition 4.2 (Induction). Let $B \le A$ be a subalgebra of $A$ with a transversal $T$ as in (4.1), and let $N$ be a $B$-module. Then the following construction is an $A$-module:

$$M = t_0(x)N \oplus \dots \oplus t_{L-1}(x)N, \quad (4.2)$$

where the direct sum is again of vector spaces. It is called the induction of the $B$-module $N$ with the transversal $T$ to an $A$-module. We write this as $M = N \uparrow_T A$.
In this paper, we are primarily interested in regular modules. These are always inductions, as follows directly from (4.1) and (4.2):

Lemma 4.3. Let $B \le A$ with a transversal $T$. Then the regular module $A$ is an induction of the regular module $B$:

$$A = B \uparrow_T A. \quad (4.3)$$

4.2. Structure of cosets. We have established in Theorem 3.2 that the subalgebra $B \le A$, generated by $r(x) \in A$, can be identified with a polynomial algebra $\C[y]/q(y)$. Next, we investigate the structure of each $B$-module $t_\ell(x)B$ in the induction (4.3).
Consider a polynomial $t(x) \in A$. As in Theorem 3.2, let $r(x)$ map $\alpha$ to $\beta$, and let $q(y) = \prod_{j=0}^{m-1}(y - \beta_j)$. Further, let $\beta' \subseteq \beta$ be the list of those $\beta_j$ for which there exists $\alpha_k$ with $t(\alpha_k) \ne 0$ and $r(\alpha_k) = \beta_j$, and set $m' = |\beta'|$.

Theorem 4.4. $\dim t(x)B = m'$.

Proof. The products $t(x)r(x)^\ell$ span $t(x)B$; under $\Delta$ in (2.1), their coordinate vectors form the columns of the matrix

$$[t(\alpha_k)\,r(\alpha_k)^\ell]_{0\le k<n,\,\ell\ge 0}. \quad (4.4)$$

The proof of Theorem 3.1 shows that $[r(\alpha_k)^\ell]$ has exactly $m = |\beta|$ linearly independent rows, one of the form $(1, \beta_j, \beta_j^2, \dots)$ for each $\beta_j$. For each $\beta_j$, the corresponding row contributes exactly $1$ to the rank of the matrix (4.4) if and only if there exists $\alpha_k$ such that $t(\alpha_k) \ne 0$ and $r(\alpha_k) = \beta_j$. Since there are exactly $|\beta'| = m'$ such values of $\beta_j$, we conclude that $\dim t(x)B = m'$.
Next, we identify the $B$-module $t(x)B$ with a $\C[y]/q(y)$-module.

Theorem 4.5. Let $q'(y) = \prod_{\beta_j \in \beta'}(y - \beta_j)$. Then the mapping

$$\eta:\ t(x)B \to \C[y]/q'(y), \quad t(x)r(x)^\ell \mapsto y^\ell, \quad (4.5)$$

is an isomorphism of $B$-modules. By a slight abuse of notation, we write $t(x)B \cong \C[y]/q'(y)$. This is an isomorphism of modules and should not be confused with the isomorphism of algebras in Theorem 3.2.
Proof. It follows from Theorem 4.4 that $t(x), t(x)r(x), \dots, t(x)r(x)^{m'-1}$ is a basis of $t(x)B$, viewed as a vector space. On the other hand, $1, y, \dots, y^{m'-1}$ is obviously a basis of $\C[y]/q'(y)$, also viewed as a vector space. Hence, $\eta$ in (4.5) is a bijective linear mapping between $t(x)B$ and $\C[y]/q'(y)$.
In order for $\eta$ to be an isomorphism of modules, it must also be a module homomorphism: it must preserve the addition and the multiplication by $B$ in $t(x)B$ and $\C[y]/q'(y)$. Namely, for $h(x) \in B$ and $u(x), v(x) \in t(x)B$, the following conditions must hold:

$$\eta(u(x) + v(x)) = \eta(u(x)) + \eta(v(x)), \qquad \eta(h(x)u(x)) = \kappa(h(x))\,\eta(u(x)).$$

The first condition is trivial. To show that the second condition holds, let $h(x) = \sum_i h_i r(x)^i$ and $u(x) = t(x)\sum_\ell u_\ell r(x)^\ell$. Then $h(x)u(x) = t(x)\sum_{i,\ell} h_i u_\ell\, r(x)^{i+\ell}$, and hence $\eta(h(x)u(x)) = \sum_{i,\ell} h_i u_\ell\, y^{i+\ell} \bmod q'(y) = \kappa(h(x))\,\eta(u(x))$. Hence, $\eta$ is a module isomorphism.
Note that, depending on t(x), the dimension of t(x)B may be smaller than the dimension of B: m ′ ≤ m. This effect is called annihilation.
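Annihilation can be observed numerically. In the following sketch (our own illustrative choice of $A$, $r$, and $t$), the coset representative $t(x) = x^2 - 1$ vanishes on both preimages of $\beta_j = 1$, so $\dim t(x)B = 1 < \dim B = 2$:

```python
import numpy as np

# A = C[x]/(x^4 - 1), B = <x^2>, t(x) = x^2 - 1 (our example).
alpha = np.array([1, -1j, -1, 1j])               # zeros of x^4 - 1
r = lambda x: x ** 2                             # generator of B
t = lambda x: x ** 2 - 1                         # coset representative

beta = {1.0, -1.0}                               # distinct values r(alpha_k)
# beta' keeps beta_j only if some alpha_k has t(alpha_k) != 0 and r(alpha_k) = beta_j.
beta_prime = {b for b in beta
              if any(not np.isclose(t(a), 0) and np.isclose(r(a), b) for a in alpha)}

# t kills both preimages (1 and -1) of beta_j = 1, so only beta_j = -1 survives.
print(sorted(beta_prime))                        # [-1.0]: dim t(x)B = 1 < dim B = 2
```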
Also, the definition of $\eta$ in (4.5) assumes the standard basis $1, y, \dots, y^{m'-1}$ in $\C[y]/q'(y)$; choosing a different basis composes $\eta$ with the corresponding base change. As a consequence of Theorem 4.5 and the above discussion, decomposing the $B$-module $t(x)B$ into a direct sum of irreducible $B$-modules is equivalent to decomposing the regular module $\C[y]/q'(y)$. The decomposition matrix is the same as for the regular module $\C[y]/q'(y)$ with the same basis, namely the polynomial transform $\PT_{c',\beta'}$, where $c'$ denotes the chosen basis.

4.3. Transversals. By definition, $T = (t_0(x), \dots, t_{L-1}(x))$ is a transversal of $B$ in $A$ if and only if the concatenation

$$b' = (t_0(x)r(x)^j)_{0\le j<m_0} \cup \dots \cup (t_{L-1}(x)r(x)^j)_{0\le j<m_{L-1}}, \quad (4.7)$$

where $m_\ell = \dim t_\ell(x)B$, is a basis in $A$. The following theorem states this condition in a matrix form.

Theorem 4.6. Using previous notation, $T$ is a transversal if and only if the following is a full-rank $n \times n$ matrix:

$$M' = [D_0 B_0 \mid D_1 B_1 \mid \dots \mid D_{L-1} B_{L-1}], \quad (4.8)$$

where $D_\ell = \mathrm{diag}(t_\ell(\alpha_k))_{0\le k<n}$, and $B_\ell = [r^j(\alpha_k)]_{0\le k<n,\,0\le j<m_\ell}$.
Proof. The proof is similar to the proofs of Theorems 3.1 and 4.4. Observe that the k-th element of b ′ in (4.7) is mapped to the k-th column of M ′ in (4.8) by the isomorphism ∆ in (2.1). Hence, b ′ is a basis in A if and only if M ′ has exactly n columns and rank M ′ = n.
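The rank condition of Theorem 4.6 is easy to test numerically. The following sketch checks that $T = (1, x)$ is a transversal of $B = \langle x^2\rangle$ in $\C[x]/(x^4-1)$; the setup is our own illustrative example:

```python
import numpy as np

# Theorem 4.6 check: M' = [D_0 B_0 | D_1 B_1] must have full rank n.
alpha = np.exp(-2j * np.pi * np.arange(4) / 4)   # zeros of p(x) = x^4 - 1
r_vals = alpha ** 2                              # r(x) = x^2 at the zeros
m = 2                                            # dim B = |{1, -1}|

B_cols = np.column_stack([r_vals ** j for j in range(m)])   # B_l = [r(alpha_k)^j]
blocks = []
for t in (lambda x: np.ones_like(x), lambda x: x):          # transversal T = (1, x)
    D = np.diag(t(alpha))                                   # D_l = diag(t_l(alpha_k))
    blocks.append(D @ B_cols)
M_prime = np.hstack(blocks)                                 # n x n matrix (4.8)

print(np.linalg.matrix_rank(M_prime))            # 4 = n, so T is a transversal
```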
5. Decomposition of polynomial transforms. We now state and prove the main theorem of this paper.

Theorem 5.1. Given the induction (4.3), the polynomial transform $\PT_{b,\alpha}$ can be decomposed as

$$\PT_{b,\alpha} = M \Bigl(\bigoplus_{0\le\ell<L} \PT_{b^{(\ell)},\beta^{(\ell)}}\Bigr) B. \quad (5.1)$$

Here, $B$ is the base change matrix from the basis $b$ to the concatenation of the bases $b^{(\ell)}$ of the modules $t_\ell(x)B$; and $M = [D_0 M_0 \mid \dots \mid D_{L-1} M_{L-1}]$, where $D_\ell = \mathrm{diag}(t_\ell(\alpha_k))_{0\le k<n}$ and $M_\ell$ is the $n \times m_\ell$ matrix whose $(k, j)$-th element is $1$ if $r(\alpha_k)$ is equal to the $j$-th element of $\beta^{(\ell)}$, and $0$ otherwise. $\oplus$ denotes the direct sum of matrices:

$$A_0 \oplus A_1 = \begin{bmatrix} A_0 & \\ & A_1 \end{bmatrix}.$$

Proof. We prove the theorem for $L = 2$; that is, for $A = t_0(x)B \oplus t_1(x)B$. The proof for arbitrary $L$ is analogous.
Next, observe that the evaluation (2.1) of an element of $t_\ell(x)B$, given in coordinates with respect to the decomposed module $\bigoplus_{\beta_j \in \beta^{(\ell)}} \C[y]/(y - \beta_j)$, is computed by the matrix $D_\ell M_\ell$, where $M_\ell$ is an $n \times m_\ell$ matrix whose $(k, j)$-th element is $1$ if $r(\alpha_k)$ equals the $j$-th element of $\beta^{(\ell)}$, and $0$ otherwise; and $D_\ell = \mathrm{diag}(t_\ell(\alpha_k))_{0\le k<n}$. Combining the above, we obtain the desired decomposition

$$\PT_{b,\alpha} = [D_0 M_0 \mid D_1 M_1]\Bigl(\PT_{b^{(0)},\beta^{(0)}} \oplus \PT_{b^{(1)},\beta^{(1)}}\Bigr) B.$$

The factorization (5.1) expresses the decomposition of the regular module $A = \C[x]/p(x)$ in three steps:

Step 1. $A$ is represented as an induction (4.3) by changing the basis in $A$ to the concatenation of the bases $b^{(\ell)}$ of $t_\ell(x)B$, using the base change matrix $B$.
Step 2. Each $t_\ell(x)B$ is decomposed into a direct sum of irreducible $B$-submodules, using the corresponding polynomial transform $\PT_{b^{(\ell)},\beta^{(\ell)}}$.
Step 3. The resulting direct sum of irreducible B-modules is decomposed into a direct sum of irreducible A-modules, using the matrix M.
The factorization (5.1) is a fast algorithm for $\PT_{b,\alpha}$ if the matrices $B$ and $M$ have sufficiently low arithmetic cost; the recursive nature of the second step allows for repeated application of Theorem 5.1. In Section 6 we illustrate this with two examples of novel algorithms derived using this theorem.
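To make Theorem 5.1 concrete, the following sketch verifies the factorization numerically for $\PT_{b,\alpha} = \mathrm{DFT}_4$ with $B = \langle x^2\rangle$ and transversal $T = (1, x)$; all matrix constructions below are our own illustrative choices for this small example:

```python
import numpy as np

n = 4
alpha = np.exp(-2j * np.pi * np.arange(n) / n)   # zeros of x^4 - 1
DFT4 = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n)

# P_{c,beta} for beta = (1, -1) with basis (1, y) is DFT_2.
F2 = np.array([[1, 1], [1, -1]], dtype=complex)
middle = np.block([[F2, np.zeros((2, 2))], [np.zeros((2, 2)), F2]])  # direct sum

# Base change b = (1, x, x^2, x^3) -> (1, x^2 | x, x^3): a stride permutation.
Bmat = np.zeros((n, n))
Bmat[0, 0] = Bmat[1, 2] = Bmat[2, 1] = Bmat[3, 3] = 1

# M_l: (k, j) entry 1 iff r(alpha_k) = beta_j, with r(alpha) = (1, -1, 1, -1).
M0 = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=complex)
M = np.hstack([M0, np.diag(alpha) @ M0])         # [D_0 M_0 | D_1 M_1], D_0 = I

assert np.allclose(M @ middle @ Bmat, DFT4)      # P_{b,alpha} = M (sum of P's) B
```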
Special case: factorization of p(x). A special case of Theorem 5.1 has been derived in [30,32]. Namely, assume that $A = \C[x]/p(x)$ and that we can decompose $p(x) = q(r(x))$, where $\deg r = k$, $\deg q = m$, and $n = km$. Then $B = \langle r(x)\rangle \cong \C[y]/q(y)$, and any basis $t = (1, t_1(x), \dots, t_{k-1}(x))$ of $\C[x]/r(x)$ is a transversal of $B$ in $A$. This leads to the following result.

Corollary 5.3. For $0 \le j < m$, let $\gamma_j$ denote the list of zeros of $r(x) - \beta_j$; the concatenation of the lists $\gamma_0, \dots, \gamma_{m-1}$ is simply a permutation of $\alpha_0, \dots, \alpha_{n-1}$, and denote the corresponding permutation matrix as $P$. Then, the polynomial transform decomposition (5.1) has the form

$$\PT_{b,\alpha} = P \Bigl(\bigoplus_{0\le j<m} \PT_{t,\gamma_j}\Bigr)\bigl(\PT_{c,\beta} \otimes I_k\bigr) B,$$

where $B$ is the corresponding base change matrix. Here, $\otimes$ denotes the tensor product of matrices. Corollary 5.3 has been used to derive a large class of fast algorithms for real and complex DFTs, and DCTs and DSTs [30,32,41]. Theorem 5.1 further generalizes this approach and, as we show in the following example and in Section 6, also yields fast algorithms not based on Corollary 5.3.
6. Fast Signal Transforms. In this section we apply the module induction to the construction of novel fast algorithms for trigonometric transforms, which are the most important polynomial transforms used in signal processing. The efficient computation of these transforms is of crucial importance in most applications, since straightforward computation by matrix-vector multiplication requires $O(n^2)$ operations, which is usually prohibitive. As mentioned in the introduction, many $O(n\log n)$ algorithms have been derived for these transforms (e.g., [3,7,15,34,36,42]), and the origin of these algorithms was revealed by the algebraic approach in [30,32,41], which also produced new algorithms.
In this paper, we extend this work through Theorem 5.1 and its application. Specifically, we will derive two novel $O(n\log n)$ general-radix algorithms that could not be obtained with the prior algebraic theory.
We will first briefly touch on the algebraic signal processing theory to explain why these transforms are associated with polynomial algebras. Then we derive the Cooley-Tukey FFT as a special case of Theorem 5.1, which motivates why we call all such algorithms "Cooley-Tukey type." Finally, we derive the novel algorithms, both of which generalize existing algorithms that previously had no satisfying algebraic explanation.
6.1. Algebraic Signal Processing. In [30,31,32,33], the authors introduced an axiomatic approach to signal processing called the algebraic signal processing theory. They observed that the basic assumptions used in signal processing are equivalent to viewing filters as elements of an algebra $A$, and signals as elements of an associated $A$-module $M$. In particular, in shift-invariant signal processing of finite, discrete, one-dimensional filters and signals, $A = M = \C[x]/p(x)$ is necessarily a polynomial algebra. The choice of $A = M$, together with a bijective mapping $\Phi$ that maps samples from $\C^n$ to signals in $M$, defines a signal model $(A, M, \Phi)$.
The fundamental tool in signal processing is the Fourier transform, which computes the frequency content of a signal. From the algebraic point of view, the Fourier transform for a signal model (A, M, Φ) is precisely the decomposition (2.1). It can be computed as a matrix-vector product (2.3) with the appropriate polynomial transform (2.2).

Notation.
Hereafter, we use the following special matrices: $I_n$ is the identity matrix of size $n$. $J_n$ is the complementary identity matrix of size $n$: its $(k, n-1-k)$-th element is $1$ for $0 \le k < n$, and $0$ otherwise. $1_n = (1, 1, \dots, 1)^T$ is a column vector of $n$ ones.
$Z_n$ is the $n \times n$ circular shift matrix: its $(i, j)$-th element is $1$ if $i \equiv j + 1 \pmod n$, and $0$ otherwise. $L^n_k$, where $k$ divides $n$ and $n = km$, is an $n \times n$ permutation matrix that selects the elements of $(0, 1, \dots, n-1)$ at the stride $k$; the corresponding permutation is $ik + j \mapsto jm + i$, where $0 \le i < m$ and $0 \le j < k$. Equivalently, the $(i, j)$-th element of $L^n_k$ is $1$ if $j = \lfloor ik(n+1)/n \rfloor \bmod n$, and $0$ otherwise.
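The two descriptions of the stride permutation $L^n_k$ (the index map $ik + j \mapsto jm + i$ and the closed-form entry condition) can be cross-checked numerically; the helper name `stride_perm` is our own:

```python
import numpy as np

def stride_perm(n, k):
    """L^n_k: the permutation ik + j -> jm + i, 0 <= i < m, 0 <= j < k, n = km."""
    m = n // k
    L = np.zeros((n, n), dtype=int)
    for i in range(m):
        for j in range(k):
            L[j * m + i, i * k + j] = 1
    return L

L = stride_perm(6, 2)
print(L @ np.arange(6))            # reads the input at stride 2: [0 2 4 1 3 5]

# Cross-check the closed form: (i, j) entry is 1 iff j = floor(i*k*(n+1)/n) mod n.
n, k = 6, 2
alt = np.zeros((n, n), dtype=int)
for i in range(n):
    alt[i, (i * k * (n + 1) // n) % n] = 1
assert np.array_equal(L, alt)
```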
where $k$ divides $n$, is another permutation matrix. $T^n_k = \mathrm{diag}(\omega_n^{ij} \mid 0 \le i < k,\ 0 \le j < m)$, where the index $i$ runs faster and $n = km$, is a twiddle factor matrix used in the Cooley-Tukey FFT.
Hence, we can rewrite the decomposition to obtain the well-known general-radix Cooley-Tukey FFT algorithm [11,32]:

$$\mathrm{DFT}_{km} = (\mathrm{DFT}_k \otimes I_m)\, T^{km}_m\, (I_k \otimes \mathrm{DFT}_m)\, L^{km}_k.$$

6.4. New Fast Algorithms. In this section, we derive novel fast general-radix algorithms for the DFT and the DCT-4. Each of them requires $O(n\log n)$ operations. To the best of our knowledge, these algorithms have not been reported in the literature.
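Before turning to the new algorithms, the general-radix Cooley-Tukey FFT just cited can be checked by a direct recursive implementation of its factorization. A minimal, unoptimized sketch (helper names are our own; prime or small sizes fall back to a direct DFT):

```python
import numpy as np

def dft_direct(x):
    """Direct O(n^2) DFT, used as the recursion base case."""
    n = len(x)
    w = np.exp(-2j * np.pi / n)
    return (w ** np.outer(np.arange(n), np.arange(n))) @ x

def ct_fft(x, k=2):
    """Recursive general-radix Cooley-Tukey FFT, radix k."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n <= k or n % k:                          # small or indivisible size: direct
        return dft_direct(x)
    m = n // k
    a = x.reshape(m, k)                          # a[t1, t2] = x[k*t1 + t2] (stride perm L)
    inner = np.stack([ct_fft(a[:, t2], k) for t2 in range(k)])  # k DFT_m's (I_k x DFT_m)
    w = np.exp(-2j * np.pi / n)
    inner *= w ** np.outer(np.arange(k), np.arange(m))          # twiddle factors T
    out = np.stack([ct_fft(inner[:, f1], k) for f1 in range(m)],
                   axis=1)                       # m DFT_k's (DFT_k x I_m)
    return out.reshape(-1)                       # output index f = m*f2 + f1

x = np.random.default_rng(0).standard_normal(12)
assert np.allclose(ct_fft(x, k=3), np.fft.fft(x))
assert np.allclose(ct_fft(x, k=2), np.fft.fft(x))
```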
General-radix Britanak-Rao FFT. In [8], Britanak and Rao derived a fast algorithm for $\mathrm{DFT}_{2m}$ that can be written as a factorization involving the matrices $D^{2m}_m$, $B^{2m}_m$, and $X^{2m}_m$ specified in (A.4)-(A.6) by setting $k = 1$. In Appendix A, we derive the general-radix version of this algorithm, stated as Theorem 6.1.
Here, $D^{2km}_m$ is a diagonal matrix, and $B^{2km}_m$ and $X^{2km}_m$ are 2-sparse matrices (that is, each row contains only two non-zero entries) specified in (A.4)-(A.6). This factorization is obtained by inducing a suitable subalgebra $B$, as detailed in Appendix A.

General-radix Wang algorithm for DCT-4. In [42], Wang derived a fast algorithm for $\mathrm{DCT\text{-}4}_{2m}$ that can be written as a factorization. In Appendix B, we derive the general-radix version of this algorithm, stated as Theorem 6.2. Here, $Y^{2km}_m$ is a 2-sparse matrix specified in (B.4). This factorization is obtained by inducing the subalgebra $B = \langle T_{2k}(x)\rangle$ of the algebra $A = \C[x]/T_n(x)$, $n = 2km$. $\mathrm{DCT\text{-}4}_k$ requires $O(k\log k)$ operations and $\mathrm{DCT\text{-}3}_m$ requires $O(m\log m)$ operations [30,32]; $Y^{2km}_m$ requires $3n$ operations, where $n = 2km$. Hence, the algorithm for $\mathrm{DCT\text{-}4}_n$ in Theorem 6.2 requires $O(n\log n)$ operations.

7. Conclusion.
We have introduced a new approach to the factorization of polynomial transforms P b,α based on the decomposition of the underlying regular module A = M = C[x]/p(x) into an induction. This approach is in its most general form since the underlying Theorem 5.1 allows for arbitrary subalgebras. Not every factorization based on this theorem yields a fast algorithm: it depends on the computational costs of matrices B and M that occur in its recursive application.
However, we have shown that the theorem produces at least two novel general-radix algorithms, for the DFT and a DCT. Neither algorithm can be obtained using the prior Corollary 5.3. In addition, both generalize algorithms from the literature, which now become special cases of radix 2.
Equally important, we make another step towards a complete algebraic theory of fast algorithms for polynomial transforms.
Future work. In addition to the DFT, DCT, and DST, other polynomial transforms have been studied. In particular, polynomial transforms based on orthogonal polynomials have found applications in areas such as function interpolation, data compression, and image processing [22,23,24]. For practical applications, fast algorithms for this class of polynomial transforms are needed. With the exception of the DCT and DST, the fastest algorithms reported in the literature to date require $O(n\log^2 n)$ operations (in particular, more than $43 n \log_2^2 n$ for $n = 2^k$) [14,28]. The question is whether our approach can improve this bound for some or all of these transforms.