Codes for Epigenetics analysis including CHIPSEQ, Cut&Run
Click for all other analysis(ATAC-seq, RNA-seq, SLAM-seq, HiC, HiChIP) used.
chip_getfrag.sh:
Bash script to quality control CHIPSEQ from bam file.
install samtools, bedtools, UCSC tools, biobambam first,
Output bed file and cross-correlation plot for review to estimate the fragment size.
chip_getbigwig.sh:
A second script will generate bigwig tracks based on estimate fragment size normalized to 15M reads.
cutrun_getbigwig.sh:
It start from a bam file mapped by bwa after trim_galore and marked duplicated reads by biobambam.
will generate bigwig file normalized to 10M fragments size < 2kb. i.e. a sample with 5M fragments(<2kb) will double the height.
genome_sizes.tgz:
genome sizes for hg38/hg19/mm10 et.al needed for these scripts.
Peak calling and get reproducible peaks:
For peak calling of sharp peaks, we usually call peak twice by MACS2 using "-q 0.05" and "-q 0.5":
- macs2 callpeak -t ChIP.bam -g hs --nomodel --extsize fragmentsize(detected by chip_getfrag.sh) -n ChIP -q 0.05
- macs2 callpeak -t ChIP.bam -g hs --nomodel --extsize fragmentsize(detected by chip_getfrag.sh) -n ChIP.FDR50 -q 0.5
We also routinely remove artifact peaks overlap with "ENCODE blacklist".
and get reproducible peaks from replicates such as:
- bedtools intersect -a WT.rep1_macs2_peaks.narrowPeak -b WT.rep2_macs2_FDR50_peaks.narrowPeak -u > WT.rep1kept.tmp.bed
- bedtools intersect -a WT.rep1kept.tmp.bed -b WT.rep3_macs2_FDR50_peaks.narrowPeak -u > WT.rep1kept.bed
- bedtools intersect -a WT.rep2_macs2_peaks.narrowPeak -b WT.rep1_macs2_FDR50_peaks.narrowPeak -u > WT.rep2kept.tmp.bed
- bedtools intersect -a WT.rep2kept.tmp.bed -b WT.rep3_macs2_FDR50_peaks.narrowPeak -u > WT.rep2kept.bed
- bedtools intersect -a WT.rep3_macs2_peaks.narrowPeak -b WT.rep1_macs2_FDR50_peaks.narrowPeak -u > WT.rep3kept.tmp.bed
- bedtools intersect -a WT.rep3kept.tmp.bed -b WT.rep2_macs2_FDR50_peaks.narrowPeak -u > WT.rep3kept.bed
- cat WT.rep1kept.bed WT.rep2kept.bed WT.rep3kept.bed | sortBed -i | mergeBed -i stdin > WT.Repro.bed
- rm WT.rep*kept.bed WT.rep*kept.tmp.bed
If prefer highest confidence peaks, for example, design CRIPSR screen gRNA, you could choose to use all high confidence peaks.
- bedtools intersect -a WT.rep1_macs2_peaks.narrowPeak -b WT.rep2_macs2_peaks.narrowPeak -u > WT.rep1kept.bed
- bedtools intersect -a WT.rep1_macs2_peaks.narrowPeak -b WT.rep3_macs2_peaks.narrowPeak -u >> WT.rep1kept.bed
- bedtools intersect -a WT.rep2_macs2_peaks.narrowPeak -b WT.rep1_macs2_peaks.narrowPeak -u > WT.rep2kept.bed
- bedtools intersect -a WT.rep2_macs2_peaks.narrowPeak -b WT.rep3_macs2_peaks.narrowPeak -u >> WT.rep2kept.bed
- bedtools intersect -a WT.rep3_macs2_peaks.narrowPeak -b WT.rep1_macs2_peaks.narrowPeak -u > WT.rep3kept.bed
- bedtools intersect -a WT.rep3_macs2_peaks.narrowPeak -b WT.rep2_macs2_peaks.narrowPeak -u >> WT.rep3kept.bed
- cat WT.rep1kept.bed WT.rep2kept.bed WT.rep3kept.bed | sortBed -i | mergeBed -i stdin > WT.Repro.bed
- rm WT.rep*kept.bed
Homer motif analyses:
- conda install -c bioconda homer==4.10
- perl ~/condaenv/chip/share/homer-4.10-0/configureHomer.pl -install hg19
- findMotifsGenome.pl WT.Repro.bed hg19 homerout_WT.Repro -bits -local 2 -size given -mcheck ~/condaenv/chip/share/homer-4.10-0/data/knownTFs/all/known.motifs -mknown ~/condaenv/chip/share/homer-4.10-0/data/knownTFs/all/known.motifs