Trying ABC pipeline on a small example¶
We just verify on our own the pipeline of Moore et al. (2019) following the README file in their repository. We obtained nearly identical results (up to least significant digits in some files).
Running the ABC model¶
First of all we activate the abcmodel
environment: conda activate abcmodel
.
Running the ABC model consists of the following steps:
- Define candidate enhancer regions
- Quantify enhancer activity
- Compute ABC Scores
Step 1: defining candidate elements¶
macs2
requires Python 2, so we installed it on a separate conda environment py2
.
conda activate py2
macs2 callpeak \
-t test_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
-n wgEncodeUwDnaseK562AlnRep1.chr22.macs2 \
-f BAM \
-g hs \
-p .1 \
--call-summits \
--outdir test_chr22/ABC_output/Peaks/
This creates the following in ABC_output/Peaks (which was empty before):
wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.xls wgEncodeUwDnaseK562AlnRep1.chr22.macs2_summits.bed wgEncodeUwDnaseK562AlnRep1.chr22.macs2_model.r
For information purpose, input_data
and reference
contain the following:
$ l test_chr22/input_data Chromatin/ Expression/ HiC/ $ l test_chr22/input_data/Chromatin/ ENCFF384ZZM.chr22.bam ENCFF384ZZM.chr22.bam.bai wgEncodeUwDnaseK562AlnRep1.chr22.bam wgEncodeUwDnaseK562AlnRep1.chr22.bam.bai wgEncodeUwDnaseK562AlnRep2.chr22.bam wgEncodeUwDnaseK562AlnRep2.chr22.bam.bai wgEncodeUwDnaseK562.mergedPeaks.slop175.withTSS500bp.chr22.bed $ l test_chr22/input_data/Expression/ K562.ENCFF934YBO.TPM.txt $ l test_chr22/input_data/HiC/raw/chr22/ chr22.bedpe* chr22.bedpe.gz* chr22.KRnorm.gz chr22.KRobserved.gz $ l test_chr22/input_data/HiC/raw/powerlaw/ hic.mean_var.txt hic.powerlaw.txt $ l test_chr22/reference/ chr22 RefSeqCurated.170308.bed.CollapsedGeneBounds.chr22.bed chr22.bed RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed
Now we sort narrowPeak file and invoke makeCandidateRegions.
conda deactivate #after that we must be in abcmodel conda environment
#Sort narrowPeak file
bedtools sort -faidx test_chr22/reference/chr22 -i test_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak > test_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted
python src/makeCandidateRegions.py \
--narrowPeak test_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
--bam test_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
--outDir test_chr22/ABC_output/Peaks/ \
--chrom_sizes test_chr22/reference/chr22 \
--regions_blacklist reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed \
--regions_whitelist test_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed \
--peakExtendFromSummit 250 \
--nStrongestPeaks 3000
We should insert here some information about blacklists and whitelists for clarification.
$ l test_chr22/ABC_output/Peaks/ params.txt wgEncodeUwDnaseK562AlnRep1.chr22.macs2_model.r wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.candidateRegions.bed wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.xls wgEncodeUwDnaseK562AlnRep1.chr22.macs2_summits.bed
Step 2: quantifying enhancer activity¶
python src/run.neighborhoods.py \
--candidate_enhancer_regions test_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.candidateRegions.bed \
--genes test_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.chr22.bed \
--H3K27ac test_chr22/input_data/Chromatin/ENCFF384ZZM.chr22.bam \
--DHS test_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam,test_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep2.chr22.bam \
--expression_table test_chr22/input_data/Expression/K562.ENCFF934YBO.TPM.txt \
--chrom_sizes test_chr22/reference/chr22 \
--ubiquitously_expressed_genes reference/UbiquitouslyExpressedGenesHG19.txt \
--cellType K562 \
--outdir test_chr22/ABC_output/Neighborhoods/
Results in the following:
[...] BEDTools completed successfully. Feature DHS completed in 1.1089489459991455 Assigning classes to enhancers Total enhancers: 3330 Promoters: 572 Genic: 1553 Intergenic: 1205 Neighborhoods Complete!
and in a new output directory Neighborhoods
containing the following:
$ l test_chr22/ABC_output/Neighborhoods/ EnhancerList.bed EnhancerList.txt Enhancers.DHS.wgEncodeUwDnaseK562AlnRep1.chr22.bam.CountReads.bedgraph Enhancers.DHS.wgEncodeUwDnaseK562AlnRep2.chr22.bam.CountReads.bedgraph Enhancers.H3K27ac.ENCFF384ZZM.chr22.bam.CountReads.bedgraph GeneList.bed GeneList.TSS1kb.bed GeneList.txt Genes.DHS.wgEncodeUwDnaseK562AlnRep1.chr22.bam.CountReads.bedgraph Genes.DHS.wgEncodeUwDnaseK562AlnRep2.chr22.bam.CountReads.bedgraph Genes.H3K27ac.ENCFF384ZZM.chr22.bam.CountReads.bedgraph Genes.TSS1kb.DHS.wgEncodeUwDnaseK562AlnRep1.chr22.bam.CountReads.bedgraph Genes.TSS1kb.DHS.wgEncodeUwDnaseK562AlnRep2.chr22.bam.CountReads.bedgraph Genes.TSS1kb.H3K27ac.ENCFF384ZZM.chr22.bam.CountReads.bedgraph
Step 3: computing the ABC score¶
python src/predict.py \
--enhancers test_chr22/ABC_output/Neighborhoods/EnhancerList.txt \
--genes test_chr22/ABC_output/Neighborhoods/GeneList.txt \
--HiCdir test_chr22/input_data/HiC/raw/ \
--hic_resolution 5000 \
--scale_hic_using_powerlaw \
--threshold .02 \
--cellType K562 \
--outdir test_chr22/ABC_output/Predictions/ \
--make_all_putative
$ l test_chr22/ABC_output/Predictions/ EnhancerPredictionsAllPutativeNonExpressedGenes.txt.gz EnhancerPredictions.txt EnhancerPredictionsAllPutative.txt.gz GenePredictionStats.txt EnhancerPredictions.bedpe parameters.predict.txt EnhancerPredictionsFull.txt
Test ABC pipeline through SSH¶
Load environments¶
Load the abcmodel
environment from base
and load all required modules.
conda activate base && conda activate abcmodel && module load bioinfo/samtools-1.9 bioinfo/bedtools-2.27.1 bioinfo/tabix-0.2.5 bioinfo/juicer-1.5.6
Using macs2
¶
Use the conda environment py2
and load the system/Python-2.7.2
module whenever you need to use macs2
.
conda activate py2 && module load system/Python-2.7.2 # comes with macs2
Remember to deactivate the py2
environment and to unload the Python2
module when done with macs2
.
conda deactivate && module unload system/Python-2.7.2
Create test_chr22
¶
cp -r example_chr22/ test_chr22/ && rm -r test_chr22/ABC_output/
Then follow the above pipeline (later this markdown will be included at the end of test_abc_pipeline.md
).
Run the ABC model¶
Don't forget to use srun --pty bash
to connect to a node before processing.
Step 1: define candidate elements¶
conda activate base && conda activate abcmodel && module load bioinfo/samtools-1.9 bioinfo/bedtools-2.27.1 bioinfo/tabix-0.2.5 bioinfo/juicer-1.5.6 && conda activate py2 && module load system/Python-2.7.2
srun --pty bash
macs2 callpeak \
-t test_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
-n wgEncodeUwDnaseK562AlnRep1.chr22.macs2 \
-f BAM \
-g hs \
-p .1 \
--call-summits \
--outdir test_chr22/ABC_output/Peaks/
conda deactivate && module unload system/Python-2.7.2
#First sort narrowPeak file
srun --pty bash
bedtools sort -faidx test_chr22/reference/chr22 -i test_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak > test_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted
#Then
srun --pty bash
python src/makeCandidateRegions.py \
--narrowPeak test_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
--bam test_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
--outDir test_chr22/ABC_output/Peaks/ \
--chrom_sizes test_chr22/reference/chr22 \
--regions_blacklist reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed \
--regions_whitelist test_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed \
--peakExtendFromSummit 250 \
--nStrongestPeaks 3000
Step 2: quantifying enhancer activity¶
srun --pty bash
python src/run.neighborhoods.py \
--candidate_enhancer_regions test_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.candidateRegions.bed \
--genes test_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.chr22.bed \
--H3K27ac test_chr22/input_data/Chromatin/ENCFF384ZZM.chr22.bam \
--DHS test_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam,test_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep2.chr22.bam \
--expression_table test_chr22/input_data/Expression/K562.ENCFF934YBO.TPM.txt \
--chrom_sizes test_chr22/reference/chr22 \
--ubiquitously_expressed_genes reference/UbiquitouslyExpressedGenesHG19.txt \
--cellType K562 \
--outdir test_chr22/ABC_output/Neighborhoods/
Step 3: computing the ABC score¶
srun --pty bash
python src/predict.py \
--enhancers test_chr22/ABC_output/Neighborhoods/EnhancerList.txt \
--genes test_chr22/ABC_output/Neighborhoods/GeneList.txt \
--HiCdir test_chr22/input_data/HiC/raw/ \
--hic_resolution 5000 \
--scale_hic_using_powerlaw \
--threshold .02 \
--cellType K562 \
--outdir test_chr22/ABC_output/Predictions/ \
--make_all_putative