<p dir="ltr">Abstract: <br>Modern gene-synthesis platforms let us probe protein function and genome biology at unprecedented scale. Yet in large, diverse gene libraries, the proportion of error-free constructs decreases with construct length as oligo synthesis errors propagate. To rescue these rare, error-free molecules, we developed BAR-CAT (Barcode-Assisted Retrieval CRISPR-Activated Targeting), an in vitro enrichment method that couples a unique PAM-adjacent 20-nt barcode to each library member and uses multiplexed dCas9-sgRNA complexes to fish out the barcodes corresponding to perfect assemblies. After a single 15-min reaction and an optimized wash regimen (BAR-CAT v1.0), three low-abundance targets in a 300,000-member test library were enriched 600-fold, greatly reducing downstream screening requirements. When applied to 384- and 1,536-member DropSynth gene libraries, BAR-CAT achieved up to 122-fold enrichment for 12 targets and revealed practical limits imposed by sgRNA competition and library complexity, which now guide ongoing protocol scaling. By eliminating laborious clone-by-clone validation and working directly on plasmid libraries, BAR-CAT provides a versatile platform for recovering perfect synthetic genes, subsetting large libraries, and ultimately lowering the cost of functional genomics at scale.<br><br>This dataset contains processed enrichment data for all spacers, with the following columns:<br><b><i>sequence</i></b> - spacer sequence<br><b><i>status</i></b> - whether the sequence was an enrichment target<br><b><i>log2enrich</i></b> - log2 fold enrichment score<br><b><i>reads.ini</i></b> - read count before enrichment<br><b><i>reads.postenrich</i></b> - read count after enrichment<br><br>The raw Illumina MiSeq reads of the target gene libraries and the nanopore sequencing reads for the enriched libraries are available on the NCBI Sequence Read Archive under BioProject accession PRJNA1273454 (https://www.ncbi.nlm.nih.gov/bioproject/1273454).</p><p><br></p>
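<p dir="ltr">As an illustration of how the columns above relate to one another, the sketch below recomputes a log2 fold enrichment from the two read-count columns. It is a minimal example under stated assumptions, not the authors' documented pipeline: the normalization by total library reads, the pseudocount, the <i>status</i> labels, the toy read counts, and the output key <i>log2enrich_sketch</i> are all hypothetical and chosen only for illustration.</p>

```python
import math


def log2_enrichment(reads_ini, reads_post, total_ini, total_post, pseudocount=0.5):
    """Log2 ratio of a spacer's read fraction after vs. before enrichment.

    Assumption: counts are normalized by the total reads of each library and
    a small pseudocount avoids division by zero. The dataset's own
    `log2enrich` column may have been computed differently.
    """
    frac_ini = (reads_ini + pseudocount) / total_ini
    frac_post = (reads_post + pseudocount) / total_post
    return math.log2(frac_post / frac_ini)


# Toy rows with made-up numbers, keyed by the dataset's column names.
# The "target" / "non-target" labels are placeholders for the `status` values.
rows = [
    {"sequence": "ACGTACGTACGTACGTACGT", "status": "target",
     "reads.ini": 10, "reads.postenrich": 640},
    {"sequence": "TTGCATGCATGCATGCATGC", "status": "non-target",
     "reads.ini": 990, "reads.postenrich": 360},
]

# Library totals before and after enrichment.
total_ini = sum(r["reads.ini"] for r in rows)
total_post = sum(r["reads.postenrich"] for r in rows)

# Attach the recomputed score to each row and print a short summary.
for r in rows:
    r["log2enrich_sketch"] = log2_enrichment(
        r["reads.ini"], r["reads.postenrich"], total_ini, total_post
    )
    print(r["sequence"], r["status"], round(r["log2enrich_sketch"], 2))
```

<p dir="ltr">Under these assumptions, a positive score means the spacer makes up a larger share of reads after enrichment than before, and a negative score means it was depleted.</p>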
Funding
National Science Foundation MCB-2032259
National Institute of General Medical Sciences T32GM149387