GSVD of the patient-matched LGA tumor and normal DNA copy-number profiles.
The LGA discovery tumor and normal datasets D_i are structured as two matrices with 59 matched columns (i.e., patients) and 933,827 rows (i.e., tumor and normal genomic regions, or Affymetrix probes), where the rows are not necessarily matched or equal in number between the two datasets. The GSVD of Eq (1) simultaneously separates the datasets into a single set of normalized, not necessarily orthogonal probelets V^T (i.e., patterns of variation across the patients), which are identical for both datasets, and, for each dataset, a separate set of generalized singular values Σ_i (i.e., weights, or superposition coefficients) and orthonormal arraylets U_i (i.e., patterns of variation across the genome). The GSVD is depicted in a raster display, with relative DNA copy-number gain (red), no change (black), and loss (green). The display explicitly shows only the first through the 10th and the 50th through the 59th probelets, with the corresponding tumor and normal arraylets and generalized singular values. The angular distances of Eq (4) define the significance of each probelet in the tumor dataset relative to its significance in the normal dataset in terms of the ratio of the corresponding tumor to normal generalized singular values [17]. The inset bar chart shows that the angular distances largest in magnitude correspond to the first and second probelets and are > 2π/15, whereas the magnitude of the angular distance that corresponds to the 53rd probelet is < π/16.
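As a minimal sketch of the decomposition described in this caption, the NumPy code below computes a GSVD of two small random matrices with matched columns, using a QR factorization of the stacked matrices followed by an SVD, and then evaluates the angular distances. It assumes that Eq (1) has the form D_i = U_i Σ_i V^T and that Eq (4) is θ_n = arctan(σ_1,n / σ_2,n) − π/4. Neither equation is reproduced in this caption, so both forms are assumptions here. The function names `gsvd` and `angular_distances` and the toy data are hypothetical illustrations, not the authors' implementation or data.

```python
import numpy as np


def gsvd(d1, d2):
    """Sketch of a GSVD of two matrices with the same number of columns.

    Returns u1, s1, u2, s2, vt such that d_i = u_i @ diag(s_i) @ vt,
    where u_i has orthonormal columns and vt has unit-norm rows that
    are not necessarily orthogonal. Assumes each matrix has at least
    as many rows as columns and that the stacked matrix has full
    column rank.
    """
    n_rows_1 = d1.shape[0]
    # Reduced QR of the stacked datasets: [d1; d2] = [q1; q2] @ r
    q, r = np.linalg.qr(np.vstack([d1, d2]))
    q1, q2 = q[:n_rows_1], q[n_rows_1:]
    # SVD of the top block: q1 = u1 @ diag(c) @ wt
    u1, c, wt = np.linalg.svd(q1, full_matrices=False)
    # Columns of q2 @ wt.T are mutually orthogonal; their norms are the sines
    u2_scaled = q2 @ wt.T
    s = np.linalg.norm(u2_scaled, axis=0)
    u2 = u2_scaled / s
    # Shared right basis; normalize its rows and absorb the norms into s1, s2
    vt = wt @ r
    row_norms = np.linalg.norm(vt, axis=1)
    vt = vt / row_norms[:, None]
    return u1, c * row_norms, u2, s * row_norms, vt


def angular_distances(s1, s2):
    """Assumed form of Eq (4): arctan(s1 / s2) - pi / 4, in [-pi/4, pi/4]."""
    return np.arctan(s1 / s2) - np.pi / 4


# Toy data (hypothetical): 5 matched columns, unequal numbers of rows
rng = np.random.default_rng(0)
tumor = rng.normal(size=(200, 5))
normal = rng.normal(size=(150, 5))

u1, s1, u2, s2, vt = gsvd(tumor, normal)
theta = angular_distances(s1, s2)
```

Because the SVD returns the cosines in decreasing order, the angular distances in `theta` are also in decreasing order, so the first entries correspond to patterns that are more significant in the first dataset than in the second, and the last entries to the reverse. For matrices at the scale described here, a dedicated GSVD routine would likely be preferable to this dense sketch.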