Fungal genome sequencing: basic biology to biotechnology.

Abstract Genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeasts. The genome sequence of the budding yeast Saccharomyces cerevisiae, an organism with a small genome, unicellular growth, and a rich history of genetic and molecular analysis, was a milestone of early genomics in the 1990s. The subsequent completion of the genomes of the fission yeast Schizosaccharomyces pombe and the genetic model Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. Since then, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. Ambitious genome-sequencing programs provide a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research in medical science, agricultural science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics has great potential to positively affect human health, environmental health, and the planet’s stored energy. With the significant increase in sequenced fungal genomes, the known diversity of genes and pathways for the production of organic acids, antibiotics, and enzymes has increased greatly. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review addresses the significance of fungal genome-sequencing programs and provides a road map for basic and applied research.


Introduction
The completion of the whole genome sequence of Saccharomyces cerevisiae strain S288C was a landmark in basic biology and fungal genomics (Goffeau et al., 1996). The genome of the fission yeast Schizosaccharomyces pombe (Wood et al., 2002) followed those of S. cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, and Homo sapiens. A comparative genomic approach has helped to identify and characterize the function of novel ORFs in S. cerevisiae and has indicated their probable role in the evolution of yeast species (Cai et al., 2008). To better understand the evolution and natural history of the fission yeasts, Rhind et al. (2011) compared the genomes and transcriptomes of all known species, i.e. S. pombe, Schizosaccharomyces japonicus, Schizosaccharomyces octosporus, and Schizosaccharomyces cryophilus. Interestingly, comparative genome analysis of strains of S. cerevisiae and of isolates of Cryptococcus neoformans showed variations in the structure and function of several proteins (Engel & Cherry, 2013; Ormerod et al., 2013).
Despite the need, progress in sequencing fungal genomes was slow in the initial stages. During the last decade, however, sequencing gathered pace through the involvement of a consortium of mycologists in collaboration with scientists from the Whitehead Institute/MIT Center for Genome Research (now the Broad Institute). They launched the Fungal Genome Initiative (FGI; http://www.broad.mit.edu/annotation/fgi/) with the goal of sequencing the genomes of fungi from throughout the kingdom. Since then, a number of specialized databases have been maintained and regularly updated, providing great assistance to different areas of fungal biology. The Genome Online Database (GOLD) is a site dedicated entirely to whole genome-sequencing projects (http://www.genomeonline.org) and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece (http://gold.imbb.forth.gr/) (Liolios et al., 2007). Fungal genomes provide a deeper understanding of the fundamental cellular processes common to all eukaryotes and have helped to address some of the most urgent challenges in health, agriculture, enzyme biotechnology, bioenergy, and ecological diversity (Figure 1).
The fungal phylum Basidiomycota encompasses diverse organisms, from the single-celled C. neoformans to the very large Armillaria spp. Its members can be plant or animal pathogens; some of them produce valuable enzymes (laccases and peroxidases) and metabolites, including pigments and toxins (Sanodiya et al., 2009; Sharma & Kuhad, 2008). Despite their importance, basidiomycete biology is poorly understood in comparison with ascomycete biology.
Further, genomic information has provided a basis for understanding the cellular instructions needed to build, maintain, control, and operate eukaryotic systems, as well as a tool for making designer organisms, which have the potential to revolutionize the pharmaceutical and fermentation industries (Annaluru et al., 2014; Frazier et al., 2003). This review highlights the importance of fungal genome-sequencing programs in basic and applied research, a subject that was neglected for a long time.

Fungal genome databases
Recognizing the importance of fungal genomics, a consortium of mycologists, in collaboration with scientists from the Broad Institute, launched the Fungal Genome Initiative in 2000 (FGI; http://www.broad.mit.edu/annotation/fgi/) (Table 1). The Saccharomyces Genome Database (SGD) is the community resource for the whole genome of S. cerevisiae strain S288C; it provides functional annotations, mapping and sequence information, protein domains and structures, expression data, mutant phenotypes, physical and genetic interactions, and the primary literature from which these data are derived (http://www.yeastgenome.org/) (Goffeau et al., 1996). The Textpresso search tool (http://textpresso.yeastgenome.org) searches the approximately 50 000 papers collected by the SGD project (Muller et al., 2004). The Pathway Tools interface provides a complete description of each pathway, with molecular structures, Enzyme Commission numbers, and a full reference listing (http://pathway.yeastgenome.org). SGD uses GBrowse, a genome browser, to display diverse types of genomic information (http://browse.yeastgenome.org) (Stein et al., 2002). All the curated interaction data in SGD are loaded in bulk from BioGRID (Biological General Repository for Interaction Datasets; http://www.thebiogrid.org/), a database of physical and genetic interactions (Breitkreutz et al., 2008). Fungal genomic data are also accessible through multiple online resources (Table 1).
In 2012, the US Department of Energy Joint Genome Institute (JGI) completed 2635 projects, a three-fold increase over the previous year, and generated 456 trillion nucleotides of genome-sequence data from microbial communities, fungi, algae, and plants (Nordberg et al., 2014). In the past year alone, JGI has added 650 genomes to the public databases, of which 63 belong to fungi (Figure 2). The 1000 Fungal Genomes Project is one of the latest JGI large-scale genomic initiatives; it focuses on divergent fungal species to provide comprehensive data for studying complex metagenomes.

Table 1 (fragment). Schizosaccharomyces pombe: PomBase (http://www.pombase.org/) is a comprehensive database for the fission yeast Schizosaccharomyces pombe, providing structural and functional annotation, literature curation, and access to large-scale data sets (Rhind et al., 2011). A further entry cites Grigoriev et al. (2014) and Nordberg et al. (2014).

[...] potato, tomato, and soybean. The Phytophthora Functional Genomics Database (PFGD; http://www.pfgd.org) has been integrated with the Solanaceae Genomics Database (SolGD; http://www.solgd.org) to provide insight into the mechanisms of infection and resistance, specifically as they relate to Phytophthora pathogens and their plant hosts (Gajendran et al., 2006). The Candida Genome Database (CGD, http://www.candidagenome.org/) is another online resource of gene and protein information, in this case for the opportunistic fungal pathogen Candida albicans (Jones et al., 2004). The comparative genomic analysis of eight closely related Candida species has brought new features to the existing CGD, i.e. Biochemical Pathways and Textpresso (Butler et al., 2009; Skrzypek et al., 2010). The merger of the Aspergillus website (http://www.aspergillus.org.uk) and the Central Aspergillus Data Repository, CADRE (http://www.cadre-genomes.org.uk), has benefited from extensive cross-linking with medical information and has created a unique resource spanning genomic, industrial, and clinical aspects of the genus (Gilsenan et al., 2012).

Need to sequence fungal genome
Completion of the S. cerevisiae genome sequence by Goffeau et al. (1996), together with several prokaryotic genomes, provided a strong incentive to sequence other fungal genomes. In one and a half decades, there has been an impressive rise in fungal genomics that has greatly expanded our understanding of the genetic, physiological, and ecological diversity of these organisms. Many of the sequenced species form clusters of related organisms chosen to enable comparative studies. The genome data offer an unparalleled opportunity to study the genetic composition of medically, industrially, and environmentally important fungi and to compare their genomes to unravel their evolutionary relatedness. Comparative studies of filamentous fungal genomes with essential and highly conserved components of normal development in animals have suggested that the fungal and animal lineages may have diverged from their common ancestor well before the emergence of any multicellular arrangement (Moore & Meškauskas, 2006; Wainright et al., 1993). The filamentous fungi and yeasts have enormous potential as sources of useful metabolites such as organic acids, enzymes, and a variety of pharmaceuticals. Citric acid fermentation by A. niger has been studied for nearly 100 years, but only with the availability of genome data has its central carbon metabolism been modeled (Pel et al., 2007). Genomic data have produced evidence of genetic redundancy in economically important fungi, which may eventually help to explain differential enzyme production, secretion pathways, and other key factors for pathogenicity. The genome sequence of the model fungus Neurospora crassa was a starting point for the study of animal and plant pathogenesis, biotechnology, molecular genetics, biochemistry, physiology, molecular cell biology, photobiology, circadian rhythms, gene silencing, ecology, and evolution (Galagan et al., 2003). A high-quality draft sequence of the N. crassa genome contains approximately 10 000 protein-coding genes, more than twice as many as in the fission yeast S. pombe and only about 25% fewer than in the fruit fly Drosophila melanogaster (Galagan et al., 2003). Many predicted Neurospora proteins have no homologues in the yeasts S. cerevisiae and S. pombe but are similar to proteins in animals, plants, and other filamentous fungi. These genomic features make Neurospora an excellent eukaryotic model system for the study of numerous aspects of biology.

Fungal genome and human health
Fungi are associated with a number of human diseases, some acute and others chronic. The initial priority of genome sequencing was therefore given to fungi that present a serious threat to human health or serve as important models for biomedical research (Table 2). Candida albicans, one of the first eukaryotic pathogens selected for genome sequencing, is the most commonly encountered human fungal pathogen, causing skin and mucosal infections in generally healthy individuals and life-threatening infections in persons with severely compromised immune function. The significance of the extensive allelic differences in C. albicans is unknown, but they may increase genetic diversity and contribute to the evolution of drug resistance (Cowen et al., 2002). The Candida genome sequence (14 Mb) reveals several genes for environmental sensing (a calmodulin signaling pathway, including a protein kinase) and adaptation (pH regulatory genes), and encodes a small family of chloride channels whose members resemble those found in a variety of mammalian tissues (Jones et al., 2004). Comparative genomic analyses could provide important clues about the evolution of the pathogen and its mechanisms of pathogenesis, which are needed for disease management (Jones et al., 2004).
Coccidioides immitis and Coccidioides posadasii are primary pathogens of immunocompetent mammals, including humans; they infect at least 150 000 people annually, 40% of whom develop a pulmonary infection (Hector & Laniado-Laborin, 2005; Sharpton et al., 2009). Their genome sequences suggest that pathogenic Coccidioides species are not simply soil saprophytes: adaptive changes to existing genes and the acquisition of a small number of new genes appear to have shifted their nutritional basis from plant to animal material (Sharpton et al., 2009). Another ascomycete fungus, Aspergillus fumigatus, is a prototypical airborne opportunistic pathogen that affects a wide range of susceptible patient groups, particularly those with neutropenia, those receiving corticosteroids, and those with T cell defects (Ronning et al., 2005). Much of the basic biology of this organism was poorly understood, but the completion of its genome sequence and comparative genomic studies of two clinical isolates of A. fumigatus revealed species-specific chromosomal islands, which contribute to its rapid adaptation to heterogeneous environments (Fedorova et al., 2008; Ronning et al., 2005).
The basidiomycetous yeast Cryptococcus neoformans causes life-threatening infections of the lungs and central nervous system and is regarded as one of the most important human fungal pathogens. Loftus et al. (2005) sequenced the 20 Mb genome of C. neoformans, a basidiomycetous model yeast for fungal pathogenesis; it contains 6500 intron-rich gene structures and encodes a transcriptome abundant in alternatively spliced and antisense messages. Interestingly, the transposon richness of the genome may be responsible for karyotype instability and phenotypic variation, which eventually lead to changes in key virulence factors, nutrient acquisition, and metabolic profiles (Loftus et al., 2005; Ormerod et al., 2013). The genomes of 118 isolates of C. gattii have recently been sequenced, including the North American Pacific Northwest (PNW) subtypes and the global diversity of molecular type VGII, to better ascertain the natural source and the genomic adaptations that led to the emergence of infection in the PNW (Engelthaler et al., 2014). Such studies also illustrate the importance of population diversification through micro-evolution in pathogenic organisms and of clonal expansion under differential selection pressure (Ormerod et al., 2013).
The genomes of two basidiomycete fungi, Malassezia globosa and Malassezia restricta, which are responsible for dandruff and seborrheic dermatitis and for systemic infections in patients with a suppressed immune system, have recently been sequenced (Martinez et al., 2012). The Malassezia genome lacks a fatty acid synthase gene and encodes several secreted lipases and hydrolases, which help the fungus adapt to human skin. Phylogenetic analysis using available whole genome sequences revealed that M. globosa, a human fungal pathogen, is closely related to the maize pathogen Ustilago maydis, implying an ancestral shift from plant to animal host preference. It is remarkable that clusters of genes for secreted proteins from M. globosa are also found in the U. maydis genome, although these proteins appear to function in plant-specific interactions. The genome of Trichophyton rubrum has also been sequenced and compared with those of related species to identify candidate genes involved in human skin and nail infections (Martinez et al., 2012). Further, the genome sequences of the dermatophytes and related species, i.e. Trichophyton tonsurans, Trichophyton equinum, Microsporum canis, and Microsporum gypseum, have revealed that the entire dermatophyte genome was duplicated in the course of evolution and is enriched in genes for proteases, kinases, secondary metabolites, and lysine motif (LysM) domains that may contribute to the ability of these fungi to cause disease (Martinez et al., 2012).

Table 2 (fragment). Highly inflammatory, involving the scalp, beard or exposed areas of the body, i.e. nails and skin (Burmester et al., 2011).

Fungal genome and plant disease
Overall, crop losses due to fungal infection exceed $200 billion annually. Consequently, the control of plant diseases is crucial for the sustainable production of food and for reducing agricultural use of land, water, fertilizers, and fuel. However, disease control has been hampered by a limited understanding of the genetic and biochemical basis of pathogenicity, including the mechanisms of infection and of host resistance. The top 10 plant-pathogenic fungi on the basis of economic importance are: [...] (viii) Colletotrichum spp., (ix) Ustilago maydis, and (x) Melampsora lini (Dean et al., 2012) (Table 3). It is estimated that each year rice blast disease, caused by the fungus M. oryzae, destroys enough rice to feed 60 million people (Dean et al., 2012). Analysis of the gene set and active transposable elements in the genome sequence of M. oryzae (38.8 Mb) provides insight into how the fungus invades rice and has coexisted with widespread paddy cultivation to cause disease (Dean et al., 2005). A recent, robust gene expression dataset of M. oryzae challenged with the putative biocontrol bacterium Lysobacter enzymogenes strain C3 and mutant DCA has generated numerous hypotheses on the fungal defense mechanism and bacterial biocontrol potential (Malthioni et al., 2013). Unlike the genome of M. oryzae, the U. maydis genome (19.8 Mb) is reduced and devoid of transposable elements. The most interesting feature of this genome is the clues it offers to how the fungus survives as a successful parasite without inflicting too much damage on its host (Hertz-Fowler & Pain, 2007).
The rust fungi are highly destructive plant pathogens and are obligate parasites of their plant hosts. The genome sequences of four rust fungi (two Melampsoraceae and two Pucciniaceae) have been analyzed so far. Genome-wide analyses of these species, together with transcriptomics performed on a broader range of rust fungi, revealed hundreds of small secreted proteins that are considered rust candidate secreted effector proteins (CSEPs). Puccinia is the largest genus of rust fungi and currently contains approximately 4000 species. Puccinia graminis, P. triticina, and P. striiformis represent distinct lineages within the cereal and grass rusts. The genomes of multiple isolates of P. striiformis f. sp. tritici have been sequenced to identify CSEPs and relate them to the distinct virulence profiles of the isolates (Cantu et al., 2013). The integration of genomics, transcriptomics, and effector-directed annotation of these isolates has provided a framework for mining effector proteins in closely related isolates. A recent analysis of the genome sizes of 30 rust species (representing eight families) revealed very large genomes, including the two largest fungal genomes reported to date, Gymnosporangium confusum (893.2 Mbp) and Puccinia chrysanthemi (806.5 Mbp) (Tavares et al., 2014). Another plant-pathogenic fungus, Fusarium graminearum, has adapted to different kinds of hosts and environments, resulting in major economic losses in grain production worldwide. Analysis of its 36 Mb genome provides deep insight into how this fungus adapts to and interacts with its host. Comparison of the genomes of three phenotypically diverse species, i.e. F. graminearum, Fusarium verticillioides, and F. oxysporum f. sp. lycopersici, revealed lineage-specific and nutrition-specific genomic regions, which may be responsible for the conversion of nonpathogenic strains into potential pathogens (Ma et al., 2010; Pain & Hertz-Fowler, 2008).
The plant-pathogenic fungus Mycosphaerella graminicola (asexual stage: Septoria tritici) causes septoria tritici blotch, a disease that reduces the yield and quality of wheat globally. The 39.7 Mb genome of M. graminicola, comprising 21 chromosomes, has revealed an apparently novel origin for dispensable chromosomes by horizontal transfer followed by extensive recombination, a possible mechanism of pathogenicity that escapes detection by the host, and exciting new aspects of genome structure. Structural rearrangements were found to have strongly affected the eight small dispensable chromosomes, and positive selection was detected in a small number of genes, whereas the essential chromosomes were syntenic (Stukenbrock et al., 2010). A surprising feature of the M. graminicola genome is its low number of genes for enzymes that break down plant cell walls; this may represent an evolutionary adaptation to evade detection by plant defense mechanisms.

Fungal genome in bioactive compound production
The fungal kingdom includes many species with unique and unusual biochemical pathways for economically important, low molecular-weight bioactive compounds, classified together as secondary metabolites (Keller et al., 2005). The main groups of fungal secondary metabolites are (1) peptides, (2) alkaloids, (3) terpenes, and (4) polyketides (Keller et al., 2005). Large-scale genome-sequencing projects have helped to anticipate the capacity of basidiomycetes to synthesize polyketides (Lackner et al., 2012). The polyketide pathway is known for the production of biotechnologically important antibiotics (e.g. tetracycline and erythromycin), immunosuppressants, antitumor agents (e.g. epothilone), and lipid-lowering drugs (e.g. lovastatin). In addition, various pigments and polyphenols, as well as a plethora of mycotoxins such as the aflatoxins and fumonisins, are produced by this pathway (Crawford & Townsend, 2010; Lackner et al., 2012).
Genome-mining efforts indicate that the capability of fungi to produce secondary metabolites has been substantially underestimated because many of their gene clusters are silent under standard cultivation conditions (Brakhage, 2013). Bioinformatic tools such as SMURF (Khaldi, 2010), antiSMASH (Medema et al., 2011), and FungiFun (Priebe et al., 2011) allow the identification of secondary metabolism gene clusters. Recently, Chen et al. (2012) reported the complete genome sequence of the medicinal mushroom Ganoderma lucidum strain 260125-1 (43.3 Mb) and identified a large set of genes and potential gene clusters involved in putative secondary metabolism and wood degradation (Sanodiya et al., 2009). The organism is well known for its lignin-degrading enzymes, i.e. laccases, lignin peroxidase, and manganese peroxidase (Sharma et al., 2013). Another study reported the 28.15 Mb genome sequence of Omphalotus olearius, revealing a diverse network of sesquiterpene synthases and two metabolic gene clusters associated with the biosynthesis of anticancer illudin sesquiterpenoids (Wawrzyn et al., 2012). By combining genomic and biochemical information, Wawrzyn et al. (2012) developed a predictive framework that will allow researchers to tap into the vast terpenomes of the Basidiomycota and to target specific biosynthetic genes for heterologous pathway engineering in order to produce natural and novel compounds with potential new bioactivities (Ward et al., 1992b).

Fungal genome in food and feed
The two industrial Aspergillus species, Aspergillus niger and Aspergillus oryzae, account for the highest percentage of the extracellular enzymes used in the food and feed industry, i.e. cellulases, hemicellulases, pectinases, amylases, inulinases, lipases, and proteases (Pel et al., 2007). Its ability to secrete large amounts of protein, the development of a transformation system, and its inability to produce aflatoxin have qualified A. oryzae for use in modern biotechnology and in the production of traditional fermented foods and beverages (Ward et al., 1992b). Aspergillus oryzae, unlike A. flavus, does not produce aflatoxin, and its long history of use in the food industry has proved its importance (Ward et al., 1992b). The genome (37 Mb) of A. oryzae contains 12 074 genes and has an additional 7-9 Mb of sequence in comparison with the genomes of A. nidulans and A. fumigatus (Machida et al., 2005). The genome is enriched in genes involved in the synthesis of secondary metabolites and in genes for amino acid and sugar uptake transporters, which help to make A. oryzae an ideal micro-organism for fermentation (http://www.aspergillusgenomes.org.uk) (Machida et al., 2005). The genomes of A. oryzae, A. fumigatus, and A. nidulans revealed 135, 99, and 90 secreted proteinase genes, respectively, which constitute roughly 1% of the total genes in each genome. Moreover, A. oryzae possesses more secretory proteinase genes that function at acidic pH, including those for aspartic proteinase, pepstatin-insensitive proteinase, serine-type carboxypeptidase, and aorsin. This may reflect the adaptation of A. oryzae to acidic pH during the course of its domestication (Machida et al., 2005). In view of its industrial importance, the 33.9 Mb genome of A. niger was sequenced, and the annotated genome data were subsequently transformed into a model of central carbon metabolism, i.e. citrate synthesis (Pel et al., 2007).
The glycosyl hydrolase, lyase, and esterase families involved in polysaccharide degradation in the sequenced Aspergilli were identified using the carbohydrate-active enzymes (CAZy) classification (http://www.cazy.org/). The A. niger genome revealed an abundance of proteins involved in proteolytic degradation, including a variety of secreted aspartyl endoproteases, serine carboxypeptidases, and di- and tripeptidyl aminopeptidases, as well as several secondary metabolite clusters. Another filamentous fungus, Ashbya gossypii, is currently used as an attractive model to study filamentous growth and, in the food industry, for the production of vitamin B2. Dietrich et al. (2004) sequenced and annotated the genome of A. gossypii. With a size of only 9.2 Mb and 4718 protein-coding genes, it is the smallest genome of a free-living eukaryote characterized to date. More than 90% of A. gossypii genes show both homology and a particular pattern of synteny with S. cerevisiae, which gives the organism an additional role in basic research (Dietrich et al., 2004).
The hypogeous, ectomycorrhizal truffle Tuber melanosporum Vittad. and the true morel Morchella conica are in worldwide demand as delicacies. The identification of processes and conditions that trigger fruit body formation can be facilitated by a thorough analysis of truffle genomic sequences. To obtain a better understanding of the biology and evolution of the ectomycorrhizal symbiosis, the 125 Mb haploid genome of T. melanosporum has been sequenced (Martin et al., 2010). The genome contains 7500 protein-coding genes, with very few multi-gene families. The upregulation of genes encoding lipases and multi-copper oxidases suggests that T. melanosporum degrades its host cell walls during colonization (Martin et al., 2010). Many Morchella species are also highly valued as edible fungi, but their cultivation remains a challenge. The recent complete genome sequencing of M. conica CCBAS932 (http://1000.fungalgenomes.org/home/) may provide clues for solving cultivation problems.
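The genome sizes and gene counts quoted in this section can be put on a common scale as gene density. The following is a minimal Python sketch that uses only figures given above (A. oryzae: 37 Mb, 12 074 genes, 135 secreted proteinase genes; A. gossypii: 9.2 Mb, 4718 genes; T. melanosporum: 125 Mb, 7500 genes). The variable and function names are illustrative assumptions, and the densities are rough averages that ignore gene length and repeat content.

```python
# Figures as quoted in the text: (genome size in Mb, protein-coding genes).
GENOMES = {
    "A. oryzae": (37.0, 12074),
    "A. gossypii": (9.2, 4718),
    "T. melanosporum": (125.0, 7500),
}


def genes_per_mb(size_mb, gene_count):
    """Average number of genes per megabase of genome."""
    return gene_count / size_mb


density = {
    name: genes_per_mb(size, genes) for name, (size, genes) in GENOMES.items()
}

# Share of secreted proteinase genes in A. oryzae (135 of 12 074 genes).
oryzae_proteinase_pct = 100 * 135 / GENOMES["A. oryzae"][1]

# Print the genomes from most to least gene-dense.
for name, value in sorted(density.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.0f} genes per Mb")
print(f"A. oryzae secreted proteinases: {oryzae_proteinase_pct:.2f}% of genes")
```

On these figures, the compact A. gossypii genome is the most gene-dense and the T. melanosporum genome the least, and the A. oryzae proteinase share comes out close to the "roughly 1%" stated in the text.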

Fungal genome and lignin degradation
Lignin is a heterogeneous phenolic polymer that provides strength and rigidity to wood and protects cellulose and hemicellulose from microbial attack. The wood-rotting basidiomycete Phanerochaete chrysosporium has a plethora of genes that enable wood degradation. The P. chrysosporium genome was the first to be sequenced and revealed several new isozymes of many previously identified lignocellulolytic enzymes, such as manganese peroxidase (MnP) and copper radical oxidases (Kersten & Cullen, 2007; Martinez et al., 2004). It also revealed new putative flavin adenine dinucleotide (FAD)-dependent oxidases, which are predicted to take part in lignocellulose degradation (Martinez et al., 2004). The majority of genome-sequencing projects after P. chrysosporium (Martinez et al., 2004) have focused on lignocellulose degraders, such as G. lucidum, and on brown-rot fungi, such as Postia placenta (Martinez et al., 2009) and Serpula lacrymans (Eastwood et al., 2011). Another basidiomycete, Schizophyllum commune, has been reported as a pathogen of humans and trees, but it mainly adopts a saprobic lifestyle, causing white rot. The 38.5 Mb genome assembly of S. commune strain H4-8 has a repeat content of 11.2%. Compared with the genomes of other basidiomycetes, S. commune has the highest number of FOLyme genes, glucose oxidase genes, lignocellulolytic genes, glycoside hydrolases, and polysaccharide lyases, which may explain its abundance (Ohm et al., 2010).
To date, more than 1000 complete fungal genomes have been publicly released, of which approximately 40 belong to the Basidiomycota. The phylum Basidiomycota contains roughly 30 000 described species, accounting for 37% of the true fungi (Kirk et al., 2001).
As part of the JGI Fungal Genomics Program, 12 species of wood-decaying Agaricomycotina are now being analyzed, including seven white-rot species and five brown-rot species (http://genome.jgi.doe.gov/programs/fungi/). Currently, the complete genome sequences of lignin-degrading basidiomycete species come from the orders Agaricales, Boletales, or Polyporales, e.g. Pleurotus ostreatus, Laccaria bicolor, Schizophyllum commune, Postia placenta, Coprinopsis cinerea, Serpula lacrymans, Ceriporiopsis subvermispora, G. lucidum, O. olearius, and Agaricus bisporus (Eastwood et al., 2011; Fernández-Fueyo et al., 2012; Kersten & Cullen, 2007; Lorenz et al., 2014; Martin et al., 2008; Martinez et al., 2009; Morin et al., 2012; Stajich et al., 2010; Wawrzyn et al., 2012). Comparative analyses of ligninolytic fungal genomes suggest that lignin-degrading peroxidases expanded in the lineage leading to the ancestor of the Agaricomycetes, which is reconstructed as a white-rot species (Floudas et al., 2012). Moreover, comparative analyses of the lignin-degrading enzymes of A. bisporus with those of the white-rot fungus P. chrysosporium, the brown-rot fungus P. placenta, the coprophilic litter fungus C. cinerea, and the ectomycorrhizal fungus L. bicolor revealed enzyme diversity consistent with adaptation to substrates rich in humic substances and lignocellulose (Doddapaneni et al., 2013). Annotation of the genome of another ligninolytic fungus, G. lucidum, revealed a set of 36 ligninolytic oxidoreductases. Interestingly, G. lucidum possesses a large set of ligninolytic peroxidases, along with laccases and a cellobiose dehydrogenase, compared with the well-known lignin degraders P. chrysosporium and S. commune. The presence of these enzymes suggests that G. lucidum may exploit different strategies for the breakdown of lignin, including oxidation by hydrogen peroxide in a reaction catalyzed by class-II peroxidases. In contrast, S. commune, a white-rot species, lacks class-II peroxidases and in this regard is more similar to the brown-rot P. placenta and the ectomycorrhizal L. bicolor (Grigoriev et al., 2011; Martin et al., 2008). These findings from just 10 species in three orders suggest that there has been extensive diversification of decay mechanisms in the Basidiomycota and argue for a broad sampling of the phylum for whole genome sequencing.

Role of fungal genome in ecology and environment
Several fungal genome-sequencing projects have provided evidence and explanations for the role of fungi in essential ecosystem functions, such as the decomposition of organic matter, nutrient cycling and, in the case of mycorrhizal species, nutrient transfer and land colonization by plants. In forest ecosystems, fungi are largely responsible for the breakdown of large biopolymers, i.e. cellulose, hemicellulose, lignin, and chitin (Figure 1) (Dix & Webster, 1995; Hättenschwiler et al., 2005; Kellner & Vandenbol, 2010; Martin et al., 2008; Steffen et al., 2002). The ectomycorrhizal soil fungus L. bicolor forms a truly symbiotic association with trees and thus has a beneficial impact on plant growth in natural and agroforestry ecosystems (Martin et al., 2008). With its 65 Mb genome, Laccaria bicolor is equipped to process the diverse nitrogen sources of both plant and animal origin that are found in decaying organic matter. Another model fungus, A. bisporus, a rich source of protein in the human diet, has successfully adapted to humic-rich environments. Comparative transcriptomics of mycelium grown on defined medium, casing soil, and compost revealed that genes encoding enzymes involved in xylan, cellulose, pectin, and protein degradation are more highly expressed in compost. The gene repertoire and the expression of hydrolytic enzymes in humic-rich substrates reveal that A. bisporus differs substantially from the taxonomically related ectomycorrhizal symbiont L. bicolor (Morin et al., 2012).
Cell survival depends on an organism's ability to sense and respond to environmental stresses. In saline environments, organisms respond to osmolarity changes through multiple signaling pathways (Hohmann, 2009). The relatively recent discovery of fungi in hypersaline environments has enabled the study of salt tolerance (Gunde-Cimerman et al., 2009;Hohmann, 2009). The Dead Sea, one of the most hypersaline habitats, harbors Eurotium rubrum, which has a genome size of 26.2 Mb. Transcriptome analyses under different salt growth conditions revealed differentially expressed genes encoding ion and metabolite transporters (Kis-Papo et al., 2014). The genome composition is characteristic of halophilic prokaryotes, supporting the theory of convergent evolution under extreme hypersaline stress.
The basidiomycetous fungus Wallemia sebi is a common foodborne contaminant that has been isolated from environments with different levels of water activity (a_w) (Padamsee et al., 2012;Pitt & Hocking, 2009). The compact genome of W. sebi CBS 633.66 (9.8 Mb) has the largest fraction of genes with functional domains among the Basidiomycota compared (Padamsee et al., 2012). Despite the seemingly reduced genome, several osmotic stress proteins and a high number of transporters were found, which provide clues to the ability of W. sebi to colonize harsh environments (Padamsee et al., 2012).

Fungal genome in enzyme biotechnology and bioenergy
The world market for enzymes has grown from $5.1 billion to $7 billion in 2013 (6.3% annually). Continued strong demand for specialty enzymes, as well as above-average growth in the animal feed, pulp and paper, and ethanol production markets, will drive the enzyme market further (http://www.feedindustrynetwork.com/enzymes.aspx). Fungal enzymes are versatile and are already used in the textile, detergent, food, and feed industries (Supplementary Table 1). The application of enzymes permits industrial processes to be performed under milder conditions, using less energy and producing fewer toxic byproducts.
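The market figures above can be checked for internal consistency with a short calculation. The sketch below assumes that the 6.3% figure is a compound annual growth rate, which the source does not state explicitly; the function name years_to_grow is hypothetical and introduced only for illustration.

```python
import math


def years_to_grow(start_value, end_value, annual_rate):
    """Years needed to grow from start_value to end_value.

    Assumes compound growth at a constant annual_rate (an interpretation
    of the quoted 6.3% figure, not something the source confirms).
    """
    return math.log(end_value / start_value) / math.log(1.0 + annual_rate)


# Market values in billions of US dollars, as quoted in the text.
years = years_to_grow(5.1, 7.0, 0.063)
print(f"Implied growth period: {years:.1f} years")
```

Under this assumption, the quoted values imply a growth period of roughly five years ending in 2013.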
Given their relevance to medicine and industry, and the desire to better understand this genus, the genomes of 10 Aspergilli have recently been sequenced, seven of which have been annotated through a collaborative effort (Gilsenan et al., 2012). The filamentous fungus Aspergillus niger can produce large amounts of useful chemicals and enzymes. This fungus is the major source of citric acid for food, beverages, and pharmaceuticals and of several important commercial enzymes, including glucoamylase, which is widely used for the conversion of starch to food syrups and to fermentative feedstocks for ethanol production (Cullen, 2007). The availability of whole genomes allows the search for enzymes with enhanced properties and provides an invaluable aid toward improving the production of chemicals and enzymes in this organism (Table 4).
Fungi can produce all kinds of carbohydrate-active enzymes (CAZymes) (Cantarel et al., 2009). Among them, plant cell wall-degrading enzymes have received special attention because of their importance for host penetration by fungal pathogens and their application in the hydrolysis of woody material for bioethanol production. A number of studies have revealed that the hydrolytic enzymes of different fungi show preferences for different types of plant biomass and adaptation to the fungal lifestyle (Battaglia et al., 2011;King et al., 2011;Zhao et al., 2013). Recently, a complete and systematic comparative analysis of CAZymes across the fungal kingdom has been reported (Zhao et al., 2013). More than 300 CAZymes have been reported from the proteomes predicted from 187 CAZyme families. The distribution of some CAZyme families was found to be phylum specific. For example, 28 families were found in the Ascomycetes, whereas 15 families appeared to be Basidiomycota specific (Zhao et al., 2013). The ascomycete Trichoderma reesei (teleomorph Hypocrea jecorina) is widely used in industry as a source of cellulases and hemicellulases for the hydrolysis of plant cell wall polysaccharides. In contrast, Podospora anserina has the highest number of carbohydrate-binding proteins and of lignin- and cellulose-degrading enzymes found in any fungal genome sequenced to date (Espagne et al., 2008).
Several bioprocesses, which currently produce biofuels and renewable products, are fungal based. The US Department of Energy (DOE) Joint Genome Institute (JGI) has launched a Fungal Genomics Program (FGP) aiming to scale up sequencing and analysis of fungal genomes to explore their diversity and applications for bioenergy (Table 4) (Grigoriev et al., 2011).
Fungal enzymes play an important role in second- and third-generation biofuel production processes by deconstructing lignocellulosic plant stocks. They are involved at several stages, from maintaining feedstock status and plant biomass saccharification to enzyme production and bioprocess development for producing ethanol or higher alcohols. Genomic analyses have shown that white-rot species possess multiple lignin-degrading peroxidases (PODs) and expanded suites of enzymes attacking crystalline cellulose. To test the adequacy of the white-rot and brown-rot categories, Grigoriev and colleagues analyzed 33 fungal genomes (Riley et al., 2014).
Bioethanol is produced through microbial fermentation of carbohydrates derived from agricultural feedstocks, mainly starch and sucrose. Yeasts conventionally used for industrial ethanol production, such as S. cerevisiae, Saccharomyces bayanus, and various hybrids such as Saccharomyces carlsbergensis, produce ethanol rapidly from glucose, maltose, mannose, or sucrose, but they cannot ferment xylose, arabinose, or cellobiose. Because of these deficiencies, a great deal of research has been devoted to metabolic engineering of S. cerevisiae for improved xylose and cellobiose metabolism (Galazka et al., 2010;Hahn-Hägerdal et al., 2006;Jeffries, 2006). In approaching this problem from a fungal genomics perspective, researchers have examined numerous unconventional xylose-fermenting species (Stephanopoulos, 2007), including Pachysolen tannophilus (Schneider et al., 1981), Candida shehatae (Dupree & Vanderwalt, 1983), and Scheffersomyces (Pichia) stipitis (Jeffries, 2006;Kurtzman & Robnett, 2010;Smith et al., 2008). One interesting discovery to emerge from sequencing the complete genome of P. stipitis is that this yeast possesses numerous genes for the rapid metabolism of cellobiose. To better understand xylose utilization for subsequent microbial engineering, DOE-JGI sequenced the genomes of two xylose-fermenting, beetle-associated fungi, Spathaspora passalidarum and Candida tenuis. A comparative genomic approach was applied to identify genes involved in xylose metabolism. Genomic expression profiling across five Hemiascomycete species with different xylose-consumption phenotypes implicated many genes and processes in xylose assimilation (Wohlbach et al., 2011). Comparison of a large number of S. cerevisiae strains also enabled the characterization of a cluster of five ORFs that have integrated into the genomes of wine and bioethanol strains at diverse genomic locations (Borneman et al., 2011).
Pseudozyma aphidis DSM 70725, a basidiomycetous fungus, is an efficient and novel producer of mannosylerythritol lipids (MELs); it also secretes the biosurfactant cellobiose lipid during nitrogen limitation. Considering its biotechnological importance, the P. aphidis DSM 70725 genome (17.92 Mb) has recently been sequenced, and its nucleotide contigs have been compared against the genome of the closely related species P. antarctica T34 (Lorenz et al., 2014). Strikingly, the complete set of MEL biosynthesis genes was found to be significantly conserved between P. antarctica T34 and P. aphidis (Lorenz et al., 2014). The potential biofuel fungi Umbelopsis isabellina NBRC 7884 (genome size, 37 Mb) and Mortierella alpina isolate CDC-B6842 (genome size, 39.53 Mb) belong to the subdivision Mucoromycotina, many members of which have been shown to accumulate high levels of intracellular triacylglycerides, to be versatile in nutrient utilization, and to grow rapidly (Etienne et al., 2014;Takeda et al., 2014). Further comparative genome studies can be used to determine the genetic factors related to oleaginous characteristics.
Many fungi can increase the productivity of bioenergy crops, such as Miscanthus, switchgrass, and Populus, through mutually beneficial relationships called mycorrhizae (van der Heijden et al., 1998). Research advances on mycorrhizal and biocontrol fungi, accelerated by genome-sequencing projects, could make biomass energy crops more economically viable by lowering our dependence on pesticides and chemical fertilizers. Further, epidemics of poplar leaf rust, caused by Melampsora spp., are a major constraint on the development of bioenergy programs based on domesticated poplars because of the lack of durable host resistance. Rust fungi are obligate biotrophic parasites with a complex life cycle that often includes two phylogenetically unrelated hosts. Recently, DOE-JGI has sequenced and compared the 101 Mb genome of Melampsora larici-populina, the causal agent of poplar leaf rust, and the 89 Mb genome of Puccinia graminis f. sp. tritici, the causal agent of wheat and barley stem rust (Duplessis et al., 2011). Comparative genomic studies of M. larici-populina and P. graminis f. sp. tritici against other saprotrophic, pathogenic, and symbiotic basidiomycetes revealed that the evolution of the rust fungal lineage did not involve major changes in the ancestral repertoire of known conserved proteins (Duplessis et al., 2011).
Penicillium species are ubiquitous filamentous ascomycetes important to the biotechnology, biomedical, and food industries. They commonly occur as food spoilage agents and opportunistic pathogens and are widely used as versatile cell factories. Berg et al. (2008) sequenced the 32.19 Mb genome of P. chrysogenum Wisconsin 54-1255 and identified numerous genes responsible for key steps in penicillin production. The recent sequencing of the two leading filamentous fungi used in cheese manufacture, Penicillium roqueforti and Penicillium camemberti, and comparison with the penicillin producer Penicillium rubens (previously known as Penicillium chrysogenum) revealed a 575 kb genomic island in P. roqueforti (Cheeseman et al., 2014). This genomic island accommodates about 250 genes, some of which are probably involved in competition with other micro-organisms.

Future prospects of fungal genome sequencing
The science of genomics has largely been driven by the desire to understand the organization and function of fungal genomes. Later, the development of novel high-throughput DNA sequencing methods provided a new method for both mapping and quantifying transcriptomes in yeast (Nagalakshmi et al., 2008). Over the past 10 years, large-scale sequencing has been revolutionized by the development of several next-generation sequencing (NGS) technologies. NGS can provide unprecedented data on the composition of mixed microbial communities in clinically important infections, although fungal analyses are complicated by low absolute abundance in the samples and high levels of fungal DNA from contaminating sources. It can be used not only for determining the DNA sequence of a microbial genome, or of rRNA gene segments from fungi and bacteria in DNA extracted from bronchoalveolar lavage samples and oropharyngeal wash, but also for resolving higher-order structures within the eukaryotic nucleus (Bittinger et al., 2014). The use of NGS to obtain transcriptomics data is known as RNA deep sequencing or RNA-seq. The first novel transcribed regions were reported in the yeasts S. cerevisiae and Schizosaccharomyces pombe in 2008, and since then RNA-seq has been extended to a number of other organisms (Wilhelm et al., 2008). Recently, comparative global transcriptional studies of the entomopathogenic fungi Metarhizium anisopliae and Metarhizium acridum provided a broad-based analysis of gene expression during early colonization processes, particularly in terms of the genes involved in host recognition, metabolic pathways, and pathogen differentiation (Gao et al., 2011). Fungal transcriptome studies began in 2007 with expressed sequence tags (ESTs), followed by gene expression profiling. To date, 198 fungal transcriptomes have been sequenced by JGI alone in 3 years, with marked exponential growth (Figure 3).
Although RNA-Seq is still a technology under active development, it offers several key advantages over existing technologies. First, unlike hybridization-based approaches, RNA-Seq is not limited to detecting transcripts that correspond to an existing genomic sequence. This makes RNA-Seq particularly attractive for non-model organisms with genomic sequences that are yet to be determined. RNA-Seq can also reveal the precise location of transcription boundaries and give information about how two exons are connected (Wang et al., 2009).
The legacy of more than 150 years of fungal research, coupled with the availability of molecular and genetic tools and the advancement in sequencing technologies, offers enormous potential for continued discovery. Fungal genome sequences from several ongoing (http://www.tigr.org/tdb/mdb/mdbinprogress.html) and planned (http://www.genome.wi.mit.edu/seq/fgi) projects will provide extraordinary opportunities for comparative analyses. This new era in fungal biology promises to yield insights into this important group of organisms, as well as to provide a deeper understanding of the fundamental cellular processes common to all eukaryotes. The construction of a synthetic genome (Gibson et al., 2008) and of a designer chromosome III of S. cerevisiae (Annaluru et al., 2014) demonstrates that custom-made genomes could be developed for all economically important fungi in the near future with the help of publicly available genome sequence databases. Moreover, the ability to implement many simultaneous and directed changes to natural DNA sequences, and to build and test synthetic systems, will present researchers with a powerful new tool for creating de novo life forms (Endy, 2008).

Conclusion
Studies on fungal biology have been supported by the rapid enrichment of bioinformatics tools and genome sequence data. Novel approaches for extracting information on the functioning of metabolic pathways, gene expression, protein levels, sub-cellular localization and functionality, and disease forecasting provide a "genomic insight" into how an organism grows, reproduces, and responds to its surroundings. Genome sequences, along with transcriptome sequence analysis of infected humans and plants, lead to a more comprehensive understanding of the pathogenesis system, which eventually helps in developing more effective surveillance and disease management strategies for the most devastating pathogens. Comparative genomic data increase our understanding of eukaryotic genome evolution processes, using fungi as models. However, more data will probably be needed to understand the complexity of the genetic factors underlying the various molecular and ecological adaptations of fungi. Next-generation sequencing technology may help to initiate comparative studies of multiple related genomes to understand the evolution of genes for metabolic pathways and economically important enzymes, and the evolutionary relationships related to protein function. Considering their basic, economic, and biotechnological importance, fungal genome studies need to be more extensive and highly focused.