data_file_1.txt (0.45 kB)
pan-gene matrix in R
This code takes a binary gene matrix (1s and 0s) produced from the output of QuartetS, which predicts orthologs from amino acid FASTA files. It adds the columns one at a time and, after each addition, counts the rows that contain at least one 1 in the columns added so far. The process is repeated 1,000 times, with the column order randomly permuted each time. The output is an R matrix in which each row holds the cumulative counts from one permutation. A pan-gene curve can then be plotted from the median of each column, with the variation across permutations shown by including all values in the plot.
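
The procedure described above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the contents of the data file: the toy matrix, the function name `pan_gene_counts`, and the variable names are hypothetical, and the axis labels assume that columns represent genomes and rows represent ortholog groups.

```r
# Hypothetical toy input: 5 rows (assumed ortholog groups) x 3 columns (assumed genomes).
gene_matrix <- matrix(c(1, 0, 1, 0, 0,
                        1, 1, 0, 0, 0,
                        0, 1, 1, 1, 0),
                      nrow = 5)

# For each permutation of the column order, add columns one at a time and
# count the rows that have at least one 1 among the columns added so far.
pan_gene_counts <- function(m, n_perm = 1000) {
  n_col <- ncol(m)
  out <- matrix(0L, nrow = n_perm, ncol = n_col)
  for (i in seq_len(n_perm)) {
    ord <- sample.int(n_col)          # random column order for this permutation
    seen <- rep(FALSE, nrow(m))       # rows observed with a 1 so far
    for (j in seq_len(n_col)) {
      seen <- seen | (m[, ord[j]] > 0)
      out[i, j] <- sum(seen)
    }
  }
  out                                 # one row per permutation
}

counts <- pan_gene_counts(gene_matrix, n_perm = 1000)

# Median of each column gives the curve; all values are drawn to show the variation.
medians <- apply(counts, 2, median)
matplot(t(counts), type = "p", pch = 1, col = "grey",
        xlab = "Number of columns added", ylab = "Rows with at least one 1")
lines(seq_along(medians), medians, lwd = 2)
```

The nested loop keeps the logic explicit and works for any number of columns; for large matrices a vectorised approach may be faster, but that is a design choice outside this sketch.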