figshare
Browse

Codes for Epigenetics analysis including CHIPSEQ, Cut&Run

Version 12 2024-10-17, 18:14
Version 11 2024-07-09, 19:59
Version 10 2024-07-09, 19:58
Version 9 2023-10-05, 23:09
Version 8 2021-03-30, 16:02
Version 7 2021-03-30, 16:00
Version 6 2020-05-04, 16:39
Version 5 2019-05-24, 19:50
Version 4 2018-12-03, 22:33
Version 3 2018-12-03, 22:30
Version 2 2018-12-03, 22:28
Version 1 2018-12-03, 19:53
software
posted on 2024-10-17, 18:14 authored by Beisi XuBeisi Xu

Click for all other analysis(ATAC-seq, RNA-seq, SLAM-seq, HiC, HiChIP) used.

chip_getfrag.sh:

Bash script to quality control CHIPSEQ from bam file.

install samtools, bedtools, UCSC tools, biobambam first,

Output bed file and cross-correlation plot for review to estimate the fragment size.

chip_getbigwig.sh:

A second script will generate bigwig tracks based on estimate fragment size normalized to 15M reads.

cutrun_getbigwig.sh:

It start from a bam file mapped by bwa after trim_galore and marked duplicated reads by biobambam.

will generate bigwig file normalized to 10M fragments size < 2kb. i.e. a sample with 5M fragments(<2kb) will double the height.

genome_sizes.tgz:

genome sizes for hg38/hg19/mm10 et.al needed for these scripts.

Peak calling and get reproducible peaks:

For peak calling of sharp peaks, we usually call peak twice by MACS2 using "-q 0.05" and "-q 0.5":

  • macs2 callpeak -t ChIP.bam -g hs --nomodel --extsize fragmentsize(detected by chip_getfrag.sh) -n ChIP -q 0.05
  • macs2 callpeak -t ChIP.bam -g hs --nomodel --extsize fragmentsize(detected by chip_getfrag.sh) -n ChIP.FDR50 -q 0.5

We also routinely remove artifact peaks overlap with "ENCODE blacklist".

and get reproducible peaks from replicates such as:

  • bedtools intersect -a WT.rep1_macs2_peaks.narrowPeak -b WT.rep2_macs2_FDR50_peaks.narrowPeak -u > WT.rep1kept.tmp.bed
  • bedtools intersect -a WT.rep1kept.tmp.bed -b WT.rep3_macs2_FDR50_peaks.narrowPeak -u > WT.rep1kept.bed
  • bedtools intersect -a WT.rep2_macs2_peaks.narrowPeak -b WT.rep1_macs2_FDR50_peaks.narrowPeak -u > WT.rep2kept.tmp.bed
  • bedtools intersect -a WT.rep2kept.tmp.bed -b WT.rep3_macs2_FDR50_peaks.narrowPeak -u > WT.rep2kept.bed
  • bedtools intersect -a WT.rep3_macs2_peaks.narrowPeak -b WT.rep1_macs2_FDR50_peaks.narrowPeak -u > WT.rep3kept.tmp.bed
  • bedtools intersect -a WT.rep3kept.tmp.bed -b WT.rep2_macs2_FDR50_peaks.narrowPeak -u > WT.rep3kept.bed
  • cat WT.rep1kept.bed WT.rep2kept.bed WT.rep3kept.bed | sortBed -i | mergeBed -i stdin > WT.Repro.bed
  • rm WT.rep*kept.bed WT.rep*kept.tmp.bed

If prefer highest confidence peaks, for example, design CRIPSR screen gRNA, you could choose to use all high confidence peaks.

  • bedtools intersect -a WT.rep1_macs2_peaks.narrowPeak -b WT.rep2_macs2_peaks.narrowPeak -u > WT.rep1kept.bed
  • bedtools intersect -a WT.rep1_macs2_peaks.narrowPeak -b WT.rep3_macs2_peaks.narrowPeak -u >> WT.rep1kept.bed
  • bedtools intersect -a WT.rep2_macs2_peaks.narrowPeak -b WT.rep1_macs2_peaks.narrowPeak -u > WT.rep2kept.bed
  • bedtools intersect -a WT.rep2_macs2_peaks.narrowPeak -b WT.rep3_macs2_peaks.narrowPeak -u >> WT.rep2kept.bed
  • bedtools intersect -a WT.rep3_macs2_peaks.narrowPeak -b WT.rep1_macs2_peaks.narrowPeak -u > WT.rep3kept.bed
  • bedtools intersect -a WT.rep3_macs2_peaks.narrowPeak -b WT.rep2_macs2_peaks.narrowPeak -u >> WT.rep3kept.bed
  • cat WT.rep1kept.bed WT.rep2kept.bed WT.rep3kept.bed | sortBed -i | mergeBed -i stdin > WT.Repro.bed
  • rm WT.rep*kept.bed

Homer motif analyses:

  • conda install -c bioconda homer==4.10
  • perl ~/condaenv/chip/share/homer-4.10-0/configureHomer.pl -install hg19
  • findMotifsGenome.pl WT.Repro.bed hg19 homerout_WT.Repro -bits -local 2 -size given -mcheck ~/condaenv/chip/share/homer-4.10-0/data/knownTFs/all/known.motifs -mknown ~/condaenv/chip/share/homer-4.10-0/data/knownTFs/all/known.motifs

History