5 files:
ref_aln.tar.gz (2.09 MB)
README.txt (9.79 kB)
restez_sql_db.tar.gz (1.41 GB)
taxdmp.zip (62.82 MB)
README.genbank (34.1 kB)

Fern Tree of Life (FTOL) input data

Version 9 2024-10-30, 08:34
Version 8 2024-03-14, 02:46
Version 7 2023-12-26, 09:22
Version 6 2023-12-20, 04:52
Version 5 2023-02-16, 04:55
Version 4 2023-01-18, 05:05
Version 3 2022-11-10, 11:13
Version 2 2022-06-23, 07:24
Version 1 2022-03-31, 06:43
Dataset posted on 2024-10-30, 08:34, authored by Joel Nitta, Santiago Ramírez-Barahona, Eric Schuettpelz, Wataru Iwasaki

The data included here are used in a pipeline that (mostly) automatically generates a maximally sampled fern phylogenetic tree based on plastid sequences in GenBank (https://github.com/fernphy/ftol).

The first step is to download the latest release of GenBank data from the NCBI GenBank FTP site (https://ftp.ncbi.nlm.nih.gov/genbank/) and use it to create a local database of fern sequences. This is done with custom R scripts contained in https://github.com/fernphy/ftol, in particular setup_gb.R (https://github.com/fernphy/ftol/blob/main/R/setup_gb.R).
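As an illustration of what restricting GenBank data to ferns involves, the sketch below splits GenBank flat-file text into records and keeps those whose taxonomic lineage contains a given clade name. It is not the pipeline's implementation (that is the R code in setup_gb.R); the function names are hypothetical, and the use of "Polypodiopsida" as the clade name marking ferns in the lineage is an assumption.

```python
from typing import Iterator, List


def split_genbank_records(text: str) -> Iterator[str]:
    """Yield individual records from GenBank flat-file text (each ends with '//')."""
    record: List[str] = []
    for line in text.splitlines():
        record.append(line)
        if line.strip() == "//":
            yield "\n".join(record)
            record = []


def lineage_of(record: str) -> str:
    """Return the indented lines that follow ORGANISM, joined into one string."""
    lineage: List[str] = []
    in_organism = False
    for line in record.splitlines():
        if line.startswith("  ORGANISM"):
            in_organism = True
            continue
        if in_organism:
            # Lineage lines are 12-space continuation lines; anything else ends them.
            if line.startswith(" " * 12):
                lineage.append(line.strip())
            else:
                break
    return " ".join(lineage)


def keep_clade(text: str, clade: str = "Polypodiopsida") -> List[str]:
    """Keep records whose lineage mentions `clade` (assumed name for ferns)."""
    return [r for r in split_genbank_records(text) if clade in lineage_of(r)]
```

Matching on the lineage string keeps the example short; a more robust approach would resolve each record's taxon against the NCBI taxonomy dump (taxdmp.zip).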

Next, a set of reference FASTA files for 79 target loci (one per locus; ref_aln.tar.gz) is generated. These include 77 protein-coding genes, drawn from a list of 83 genes (Wei et al. 2017) filtered to retain only those that show no evidence of duplication, plus two spacer regions (trnL-trnF and rps4-trnS). Each FASTA file in ref_aln.tar.gz includes one representative (longest) sequence per available fern genus. This is done with prep_ref_seqs_plan.R (https://github.com/fernphy/ftol/blob/main/prep_ref_seqs_plan.R).
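The "one representative (longest) sequence per genus" rule can be sketched as follows. This is a minimal illustration, not the code in prep_ref_seqs_plan.R; the function names and the FASTA header layout ("accession Genus species") are assumptions made for the example.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_per_genus(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each genus to the (header, sequence) pair with the longest sequence."""
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in parse_fasta(text):
        # Assumed header layout: "<accession> <Genus> <species>"
        fields = header.split()
        if len(fields) < 2:
            continue  # skip headers that do not follow the assumed layout
        genus = fields[1]
        if genus not in best or len(seq) > len(best[genus][1]):
            best[genus] = (header, seq)
    return best
```

When two sequences from the same genus have equal length, this sketch keeps the first one encountered; the source does not say how ties are handled.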

Sequences matching the target loci are then extracted from each accession in the local database using the FASTA files contained in ref_aln.tar.gz as references with the “Reference_Blast_Extract.py” script of superCRUNCH (Portik and Wiens 2020).

The extracted sequences are aligned with MAFFT (Katoh et al. 2002), phylogenetic analysis is done with IQ-TREE (Nguyen et al. 2015), and divergence times are estimated with treePL (Smith and O’Meara 2012).
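For orientation, the three tools could be invoked along the following lines. This is only a sketch: the executable names, file paths, and options (MAFFT's --auto mode, IQ-TREE's -m MFP model selection) are generic assumptions, not the settings used by the FTOL pipeline, and the helper functions are hypothetical.

```python
import subprocess
from typing import List


def mafft_cmd(unaligned: str) -> List[str]:
    """MAFFT writes the alignment to stdout; --auto lets it pick a strategy."""
    return ["mafft", "--auto", unaligned]


def iqtree_cmd(alignment: str) -> List[str]:
    """IQ-TREE tree search on an alignment, with automatic model selection (MFP)."""
    return ["iqtree", "-s", alignment, "-m", "MFP"]


def treepl_cmd(config: str) -> List[str]:
    """treePL reads its tree, calibrations, and settings from a config file."""
    return ["treePL", config]


def run_alignment(unaligned: str, aligned: str) -> None:
    """Run MAFFT and capture its stdout into the output alignment file."""
    with open(aligned, "w") as out:
        subprocess.run(mafft_cmd(unaligned), stdout=out, check=True)
```

Building each command as a list and passing it to subprocess.run avoids shell quoting problems with file names; the actual workflow and parameters are defined in https://github.com/fernphy/ftol.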

For additional methodological details, see:

Nitta JH, Schuettpelz E, Ramírez-Barahona S, Iwasaki W. 2022. An open and continuously updated fern tree of life. Frontiers in Plant Science 13. https://doi.org/10.3389/fpls.2022.909768

Funding

Japan Society for the Promotion of Science (Kakenhi) Grant Number 16H06279

Smithsonian National Museum of Natural History Peter Buck Fellowship

Japan Society for the Promotion of Science (Kakenhi) Grant Number 22H04925

Japan Society for the Promotion of Science (Kakenhi) Grant Number 22K15171
