# Summary statistics on Nasser et al. 2021 ABC predictions in our 5 biosamples of interest
## Resources

### Data

All data we use are available in `/work2/project/regenet/results/multi/abc.model/Nasser2021/` (private).
### Code

We use Sarah Djebali's `bedpe.sumstats.sh` script. See example here.

The R packages `optparse` and `reshape2` are required. Loading the following module is sufficient:

```shell
module load system/R-4.0.4_gcc-9.3.0
```
### Miscellaneous

The script `bedpe.sumstats.sh` requires a lot of memory. The default allocation of `srun --pty bash` is too small and the job gets killed, so request more memory explicitly:

```shell
srun --mem=32G --pty bash
```
## Compute summary statistics

For each file of interest, the idea is to run the following (here for the liver file with merged enhancers):

```shell
srun --mem=32G --pty bash
module load system/R-4.0.4_gcc-9.3.0
# 2nd argument: "min-max" range of the ABC score (column 8).
# Starting from min=1 and max=0 assumes that scores lie in [0, 1].
/work2/project/regenet/workspace/thoellinger/scripts/bedpe.sumstats.sh \
  ../Nasser2021ABCPredictions.liver.all_putative_enhancers.merged_enhancers.sorted.bedpe.gz \
  "$(awk 'BEGIN{FS="\t"; min=1; max=0;} {if($8>max){max=$8}; if($8<min){min=$8}} END{print min"-"max;}' ../Nasser2021ABCPredictions.liver.all_putative_enhancers.merged_enhancers.sorted.bedpe)" \
  "500-500"
```

Running this by hand for every file would be tedious, so we do it once and for all in a loop:
```shell
srun --mem=32G --pty bash
module load system/R-4.0.4_gcc-9.3.0

declare -a files=(
  "Nasser2021ABCPredictions.all_biosamples.all_putative_enhancers.merged_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.all_biosamples.all_putative_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.intestine.all_putative_enhancers.merged_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.intestine.all_putative_enhancers.merged_intestine_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.intestine.all_putative_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.liver.all_putative_enhancers.merged_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.liver.all_putative_enhancers.merged_liver_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.liver.all_putative_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.liver_and_intestine.all_putative_enhancers.merged_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.liver_and_intestine.all_putative_enhancers.merged_liver_and_intestine_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.liver_and_intestine.all_putative_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.hepatocyte-ENCODE.all_putative_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.liver-ENCODE.all_putative_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.HepG2-Roadmap.all_putative_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.large_intestine_fetal-Roadmap.all_putative_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.small_intestine_fetal-Roadmap.all_putative_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.hepatocyte-ENCODE.all_putative_enhancers.merged_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.liver-ENCODE.all_putative_enhancers.merged_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.HepG2-Roadmap.all_putative_enhancers.merged_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.large_intestine_fetal-Roadmap.all_putative_enhancers.merged_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.small_intestine_fetal-Roadmap.all_putative_enhancers.merged_enhancers.sorted.bedpe"
)

cd /work2/project/regenet/results/multi/abc.model/Nasser2021/sumstats
for file in "${files[@]}"
do
  # Compress a copy; the uncompressed original is kept for the awk call below
  gzip -c "../$file" > "../$file.gz"
  dir_name=$(echo "$file" | sed -e 's/Nasser2021ABCPredictions\.//g' -e 's/\.sorted\.bedpe//g')
  mkdir -p "../sumstats_results/$dir_name"
  # 2nd argument: "min-max" range of the ABC score (column 8)
  /work2/project/regenet/workspace/thoellinger/scripts/bedpe.sumstats.sh "../$file.gz" \
    "$(awk 'BEGIN{FS="\t"; min=1; max=0;} {if($8>max){max=$8}; if($8<min){min=$8}} END{print min"-"max;}' "../$file")" \
    "500-500"
  # Move the outputs written to the current directory into their own folder
  mv ./* "../sumstats_results/$dir_name"
done
```
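The awk one-liner above starts from `min=1` and `max=0`, which silently assumes that ABC scores lie in [0, 1], and it reads the uncompressed file. A minimal sketch of a hypothetical helper, `abc_score_range`, that initialises both bounds from the first pair and reads `.gz` files directly, still assuming the ABC score is in column 8:

```shell
# Hypothetical helper: print the "min-max" range of the ABC score (column 8)
# of a bedpe file; files ending in .gz are decompressed on the fly.
abc_score_range() {
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;
    *)    cat -- "$1" ;;
  esac | awk 'BEGIN { FS = "\t" }
    NR == 1 { min = $8; max = $8 }   # initialise both bounds from the first pair
    $8 + 0 > max + 0 { max = $8 }    # numeric comparisons, original text kept
    $8 + 0 < min + 0 { min = $8 }
    END { print min "-" max }'
}
```

It could then replace the inline awk call, e.g. `bedpe.sumstats.sh "../$file.gz" "$(abc_score_range "../$file.gz")" "500-500"`, so that the uncompressed copy is no longer needed for this step.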
For testing purposes, one can print the output directory name derived from each file name without running the script:

```shell
declare -a files=(
  "Nasser2021ABCPredictions.all_biosamples.all_putative_enhancers.merged_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.all_biosamples.all_putative_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.intestine.all_putative_enhancers.merged_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.intestine.all_putative_enhancers.merged_intestine_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.intestine.all_putative_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.liver.all_putative_enhancers.merged_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.liver.all_putative_enhancers.merged_liver_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.liver.all_putative_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.liver_and_intestine.all_putative_enhancers.merged_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.liver_and_intestine.all_putative_enhancers.merged_liver_and_intestine_enhancers.sorted.bedpe"
  "Nasser2021ABCPredictions.liver_and_intestine.all_putative_enhancers.sorted.bedpe"
)
for file in "${files[@]}"
do
  dir=$(echo "$file" | sed -e 's/Nasser2021ABCPredictions\.//g' -e 's/\.sorted\.bedpe//g')
  echo "$dir"
  # echo "${file//.sorted.bedpe/}"
done
```
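The `sed` patterns above are unanchored and their unescaped dots match any character. A stricter sketch with bash parameter expansion, in the spirit of the commented `${file//.sorted.bedpe/}` line; the function name `sumstats_dir_name` is hypothetical:

```shell
# Hypothetical helper: output directory name for a bedpe file, i.e. the file
# name without its directory, the "Nasser2021ABCPredictions." prefix and the
# ".sorted.bedpe" (or ".sorted.bedpe.gz") suffix.
sumstats_dir_name() {
  local name="${1##*/}"                       # drop any leading directory
  name="${name%.gz}"                          # drop an optional .gz extension
  name="${name#Nasser2021ABCPredictions.}"    # drop the fixed prefix (literal dot)
  name="${name%.sorted.bedpe}"                # drop the fixed suffix
  printf '%s\n' "$name"
}
```

Usage: `for file in "${files[@]}"; do sumstats_dir_name "$file"; done`. Because `#` and `%` only strip a matching prefix or suffix, a file name that happens to contain these strings elsewhere is left unchanged.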
## Results

### Distances of enhancer-TSS pairs
| Biosample \ Enhancer type | Original enhancers | Merged enhancers (across considered biosamples) | Merged enhancers (across all 131 biosamples) |
|---|---|---|---|
| All 131 biosamples | | Same as the next column | |
| Liver and Intestine | | | |
| Liver | | | |
| Intestine | | | |
| hepatocyte-ENCODE | | No overlap between enhancers of a single biosample, so same as original enhancers. | |
| HepG2-Roadmap | | No overlap between enhancers of a single biosample, so same as original enhancers. | |
| liver-ENCODE | | No overlap between enhancers of a single biosample, so same as original enhancers. | |
| large_intestine_fetal-Roadmap | | No overlap between enhancers of a single biosample, so same as original enhancers. | |
| small_intestine_fetal-Roadmap | | No overlap between enhancers of a single biosample, so same as original enhancers. | |
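The exact distance definition used by `bedpe.sumstats.sh` is not documented here. As an illustration only, assuming element 1 (columns 1 to 3) is the enhancer, element 2 (columns 4 to 6) is the TSS, and the distance is measured between midpoints, a hypothetical helper could compute one distance per pair:

```shell
# Hypothetical helper: for each enhancer-TSS pair on the same chromosome, print
# the absolute distance in bp between the enhancer midpoint (columns 2-3) and
# the TSS midpoint (columns 5-6). Reads bedpe files given as arguments, or stdin.
pair_distances() {
  awk 'BEGIN { FS = "\t" }
    $1 == $4 {
      d = int(($2 + $3) / 2) - int(($5 + $6) / 2)
      if (d < 0) d = -d
      print d
    }' "$@"
}
```

For a compressed file: `gzip -dc file.bedpe.gz | pair_distances | sort -n`.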
### Distribution of the number of connections of enhancers (red) and of TSS (green)

Number of connections of element 1 (enhancers) and of element 2 (TSS): a TSS is usually connected to more enhancers than an enhancer is connected to TSSs. The four panels correspond to the four quartiles of `ABC.score`.
| Biosample \ Enhancer type | Original enhancers | Merged enhancers (across considered biosamples) | Merged enhancers (across all 131 biosamples) |
|---|---|---|---|
| All 131 biosamples | | Same as the next column | |
| Liver and Intestine | | | |
| Liver | | | |
| Intestine | | | |
| hepatocyte-ENCODE | | No overlap between enhancers of a single biosample, so same as original enhancers. | |
| HepG2-Roadmap | | No overlap between enhancers of a single biosample, so same as original enhancers. | |
| liver-ENCODE | | No overlap between enhancers of a single biosample, so same as original enhancers. | |
| large_intestine_fetal-Roadmap | | No overlap between enhancers of a single biosample, so same as original enhancers. | |
| small_intestine_fetal-Roadmap | | No overlap between enhancers of a single biosample, so same as original enhancers. | |
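To make "number of connections" concrete: an enhancer's number of connections is the number of pairs in which it appears as element 1, and likewise for a TSS as element 2. A minimal sketch of a hypothetical helper computing that distribution, assuming elements are identified by their coordinates (columns 1 to 3 and 4 to 6); unlike the panels, it does not split pairs by `ABC.score` quartile:

```shell
# Hypothetical helper: distribution of the number of connections per element.
# Usage: connection_distribution 1|2 [file.bedpe ...]
#   1 = enhancers (element 1, columns 1-3), 2 = TSS (element 2, columns 4-6)
# Prints one "k<TAB>number of elements with k connections" line per k.
connection_distribution() {
  local element="$1"
  shift
  awk -v el="$element" 'BEGIN { FS = OFS = "\t" }
    {
      if (el == 1) key = $1 ":" $2 "-" $3
      else         key = $4 ":" $5 "-" $6
      n[key]++                       # connections of this element
    }
    END {
      for (k in n) hist[n[k]]++      # elements having n[k] connections
      for (c in hist) print c, hist[c]
    }' "$@" | sort -k1,1n
}
```

Comparing `connection_distribution 1 file.bedpe` with `connection_distribution 2 file.bedpe` should reflect the observation above that TSSs tend to have more connections than enhancers.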
### Summary statistics on enhancer lengths (bp)
| Enhancers | 1st Qu. | Median | Mean | 3rd Qu. | Max |
|---|---|---|---|---|---|
| All 131 biosamples | 200 | 308 | 482 | 637 | 6991 |
| All 131 biosamples (merged) | 288 | 631 | 808 | 1078 | 11616 |
| Liver + intestine | 200 | 353 | 491 | 644 | 4626 |
| Liver + intestine (merged across considered biosamples) | 200 | 423 | 575 | 756 | 6298 |
| Liver + intestine (merged across all 131 biosamples) | 520 | 982 | 1231 | 1668 | 11616 |
| Liver | 200 | 296 | 455 | 585 | 4626 |
| Liver (merged across considered biosamples) | 200 | 348 | 507 | 650 | 4737 |
| Liver (merged across all 131 biosamples) | 528 | 1027 | 1292 | 1780 | 11616 |
| Intestine | 200 | 428 | 543 | 724 | 3679 |
| Intestine (merged across considered biosamples) | 221 | 488 | 607 | 814 | 4461 |
| Intestine (merged across all 131 biosamples) | 710 | 1209 | 1469 | 1978 | 11073 |
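A minimal sketch of how such length statistics could be recomputed from a bedpe file, assuming enhancers are element 1 (columns 2 and 3 hold the 0-based start and the end, so length = end - start) and that each distinct enhancer is counted once; whether the table above was produced the same way is an assumption. Quartiles follow R's default (type 7) definition, which R's `summary()` uses; the function name is hypothetical:

```shell
# Hypothetical helper: print 1st Qu., Median, Mean, 3rd Qu. and Max of enhancer
# lengths (end - start of element 1), counting each distinct enhancer once.
# Reads bedpe files given as arguments, or stdin.
enhancer_length_summary() {
  cut -f1-3 "$@" | sort -u |
    awk 'BEGIN { FS = "\t" } { print $3 - $2 }' |
    sort -n |
    awk '
      # Type-7 quantile (R default) of the sorted values x[1..NR]
      function q(p,    h, lo) {
        h = (NR - 1) * p + 1
        lo = int(h)
        if (lo >= NR) return x[NR]
        return x[lo] + (h - lo) * (x[lo + 1] - x[lo])
      }
      { x[NR] = $1; sum += $1 }
      END {
        if (NR == 0) exit 1
        printf "%g\t%g\t%g\t%g\t%g\n", q(0.25), q(0.5), sum / NR, q(0.75), x[NR]
      }'
}
```

For a compressed file: `gzip -dc file.bedpe.gz | enhancer_length_summary`.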