GO enrichment of all genes obtained

We use GO_enrichment_FDR.R, which lets us choose among several multiple-testing correction procedures. Here we want to control the FDR, so we can use either pAdjustMethod = "BH" (Benjamini-Hochberg, the default) or pAdjustMethod = "fdr"; in R's p.adjust, "fdr" is an alias for "BH", so both give the same adjusted p-values.
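
To make the Benjamini-Hochberg adjustment concrete, here is a minimal sketch of the computation in shell/awk. It is only an illustration of the procedure, not the code used by GO_enrichment_FDR.R or clusterProfiler; the function name bh_adjust is hypothetical, and the input is assumed to be one raw p-value per line.

```shell
# Benjamini-Hochberg adjustment (illustrative sketch, hypothetical helper).
# Reads one raw p-value per line on stdin and prints "p<TAB>adjusted p",
# sorted by increasing p.
bh_adjust() {
  LC_ALL=C sort -g | awk '
    { p[NR] = $1 }
    END {
      n = NR
      running_min = 1                 # adjusted p-values are capped at 1
      for (i = n; i >= 1; i--) {      # walk from the largest p to the smallest
        v = p[i] * n / i              # p * (number of tests) / rank
        if (v < running_min) running_min = v
        adj[i] = running_min          # enforce monotonicity
      }
      for (i = 1; i <= n; i++) printf "%s\t%.4g\n", p[i], adj[i]
    }'
}

# Example with four made-up p-values.
printf '0.01\n0.04\n0.03\n0.20\n' | bh_adjust
```

With these four made-up values, the adjusted p-values are 0.04, 0.05333, 0.05333 and 0.2, so only the first would pass a 0.05 threshold.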

All results (list of genes and detailed GO enrichment) can be found here:

/work2/project/regenet/workspace/thoellinger/shared/2022/promoter_capture_hic/

Dependencies

We use R 4.0. We need the clusterProfiler package. See custom_go.md for installation details.

Main remarks

  • From time to time, one may get the following error while using GO_enrichment_FDR.R:
Error in `[.data.frame`(d, , 2) : undefined columns selected

This is expected and should be treated as a warning only (I did not take the time to handle this case properly in the script). It happens when no significant GO terms are found; in that case, no graphical outputs are produced.
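
Since this failure is expected, it can help to tell it apart from a real failure. The sketch below assumes that the script's output has been redirected to a log file, which these notes do not do, and relies on the "Of those N GO terms, K have a BH-adjusted p-val < T" line printed by the script. The helper names n_significant and classify_run are hypothetical.

```shell
# Print the number of significant GO terms reported in a log read from stdin.
n_significant() {
  sed -n 's/.*Of those [0-9][0-9]* GO terms, \([0-9][0-9]*\) have a .*/\1/p' | head -n 1
}

# Classify a finished run from its exit status ($1) and its log file ($2):
#   ok                    the script exited normally
#   no-significant-terms  it failed, but the log reports 0 significant terms
#   failed                it failed for some other reason
classify_run() {
  status="$1"
  log="$2"
  n="$(n_significant < "$log")"
  if [ "$status" -eq 0 ]; then
    echo "ok"
  elif [ "$n" = "0" ]; then
    echo "no-significant-terms"
  else
    echo "failed"
  fi
}

# Possible usage (run.log is an assumed file name):
#   Rscript GO_enrichment_FDR.R ... > run.log 2>&1; classify_run "$?" run.log
```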

Collecting the list of genes of interest

cd /work2/project/regenet/workspace/thoellinger/shared/2022/promoter_capture_hic/

We obtained this list from our R Markdown network analysis of E-G pairs, starting from genes directly involved in hemochromatosis or in the regulation of iron metabolism.

We saved this list of 54 genes (12 known + 42 inferred) as new_genes_v1.list. For each of the 13 initial genes, we also saved the corresponding inferred genes separately. In a few cases, no genes were inferred from the initial gene, and for one of the 13 we had no CHiC data at all:

└── results
    ...
    ├── 13_initial_genes.list # 13
    ├── new_genes_v1.list # 54
    └── separate
        ├── BMP6.list # 1
        ├── CIAPIN1.list # 8
        ├── CYBRD1.list # 1
        ├── HFE2.list # 5
        ├── HFE.list # 1
        ├── NEO1.list # 2
        ├── SLC11A2.list # 8
        ├── SLC39A14.list # 4
        ├── SLC40A1.list # 1
        ├── TFR2.list # 15
        ├── TFRC.list # 1
        └── TMPRSS6.list # 7
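
The counts shown next to each file can be checked quickly. This is a minimal sketch assuming each .list file holds one gene symbol per line; count_genes is a hypothetical helper, not part of the project.

```shell
# Print "file name<TAB>number of non-empty lines" for each list file given.
count_genes() {
  for f in "$@"; do
    printf '%s\t%s\n' "$(basename "$f")" "$(grep -c . "$f")"
  done
}

# Possible usage from the project directory:
#   count_genes results/*.list results/separate/*.list
```

Counting non-empty lines rather than using wc -l avoids being off by one when a file lacks a trailing newline or ends with a blank line.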

GO enrichment

module load system/R-4.0.4_gcc-9.3.0

All 54 genes (original + inferred)

mkdir -p GO_FDR/all_genes/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/new_genes_v1.list -f 0.05 -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/all_genes/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "54 provided genes; 49 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 1086 distinct GO terms"
[1] "Of those 1086 GO terms, 18 have a BH-adjusted p-val < 0.05"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
...
[1] "Done."

All 42 genes (inferred only; w/o original genes)

mkdir -p GO_FDR/all_genes_wo_orig/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/new_genes_v1_without_original.list -f 0.05 -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/all_genes_wo_orig/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "42 provided genes; 38 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 740 distinct GO terms"
[1] "Of those 740 GO terms, 0 have a BH-adjusted p-val < 0.05"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
...
[1] "Done."
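
These notes do not say how new_genes_v1_without_original.list, used above, was produced. One way to derive such a list, assuming one gene symbol per line in both files, is to drop every exact match to an initial gene from the full list. The helper name remove_originals is hypothetical.

```shell
# Print the genes of the full list ($2) that are not in the list of
# original genes ($1). -x matches whole lines only and -F treats the
# symbols as fixed strings, so "TFR2" does not remove "TFR2B".
remove_originals() {
  grep -vxFf "$1" "$2"
}

# Possible usage (an assumption, not necessarily how the file was built):
#   remove_originals results/13_initial_genes.list results/new_genes_v1.list \
#     > results/new_genes_v1_without_original.list
```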

All genes co-expressed in liver

Those are: HFE + TFR2 + HJV (HFE2) + HAMP + TMPRSS6

Note: we have no CHiC data for HAMP, and there are no inferred genes from HFE.

mkdir -p GO_FDR/genes_inferred_from_that_expressed_in_liver/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/genes_inferred_from_that_expressed_in_liver.list -f 0.05 -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/genes_inferred_from_that_expressed_in_liver/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "24 provided genes; 21 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 421 distinct GO terms"
[1] "Of those 421 GO terms, 4 have a BH-adjusted p-val < 0.05"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
...
[1] "Done."
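
How the combined list for this group was built is not stated here. A minimal sketch, assuming it is the union of per-gene lists with one symbol per line, is given below; union_lists is a hypothetical helper, and the choice of input files in the usage comment is an assumption. The sizes of the TFR2, HFE2 and TMPRSS6 lists without their original gene (14, 4 and 6 in the logs of the per-gene runs) do add up to the 24 provided genes reported above, but that is only suggestive.

```shell
# Print the sorted union of several gene lists, without duplicates.
union_lists() {
  cat "$@" | LC_ALL=C sort -u
}

# Possible usage (the input files are assumed):
#   union_lists results/separate/TFR2_wo_original.list \
#               results/separate/HFE2_wo_original.list \
#               results/separate/TMPRSS6_wo_original.list \
#     > results/genes_inferred_from_that_expressed_in_liver.list
```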

All genes expressed in intestine

Those are: DCYTB (CYBRD1) / DMT1 (SLC11A2) / SLC40A1

Note: no genes were inferred from CYBRD1 or SLC40A1, so the GO enrichment for this group is essentially that of SLC11A2.

Each original gene separately
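
The per-gene runs in this section all follow the same pattern, with and without the original gene. As a sketch, the commands can be generated with a small loop; this version is a dry run that only prints them. The function name go_commands is hypothetical, and the file and directory naming is taken from the commands in this section.

```shell
# Dry run: print (do not execute) the commands for one original gene,
# first with the gene itself included, then without it.
# $1: gene symbol, $2: FDR threshold (0.05 if omitted).
go_commands() {
  gene="$1"
  fdr="${2:-0.05}"
  for variant in "" "_wo_original"; do
    suffix="${variant:+_wo_orig}"   # output directory suffix for the second variant
    list="results/separate/${gene}${variant}.list"
    outdir="GO_FDR/separate/${gene}${suffix}/symbol"
    printf 'mkdir -p %s\n' "$outdir"
    printf 'Rscript GO_enrichment_FDR.R -k "SYMBOL" -G %s -f %s -c "BP" -a "BH" -o "default_universe" -d "%s"\n' \
      "$list" "$fdr" "$outdir"
  done
}

for gene in CIAPIN1 HFE2 NEO1 SLC11A2 SLC39A14 TFR2 TMPRSS6; do
  go_commands "$gene"
done
```

Printing the commands first makes it easy to review them before piping the output to sh.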

Inferred from CIAPIN1

mkdir -p GO_FDR/separate/CIAPIN1/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/separate/CIAPIN1.list -f 0.05 -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/separate/CIAPIN1/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "8 provided genes; 7 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 313 distinct GO terms"
[1] "Of those 313 GO terms, 64 have a BH-adjusted p-val < 0.05"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
...
[1] "Done."

Inferred from CIAPIN1 w/o CIAPIN1

mkdir -p GO_FDR/separate/CIAPIN1_wo_orig/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/separate/CIAPIN1_wo_original.list -f 0.05 -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/separate/CIAPIN1_wo_orig/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "7 provided genes; 6 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 310 distinct GO terms"
[1] "Of those 310 GO terms, 73 have a BH-adjusted p-val < 0.05"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
...
[1] "Done."

Inferred from HFE2

mkdir -p GO_FDR/separate/HFE2/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/separate/HFE2.list -f 0.05 -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/separate/HFE2/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "5 provided genes; 3 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 17 distinct GO terms"
[1] "Of those 17 GO terms, 15 have a BH-adjusted p-val < 0.05"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
...
[1] "Done."

Inferred from HFE2 w/o HFE2

mkdir -p GO_FDR/separate/HFE2_wo_orig/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/separate/HFE2_wo_original.list -f 0.05 -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/separate/HFE2_wo_orig/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "4 provided genes; 3 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 17 distinct GO terms"
[1] "Of those 17 GO terms, 15 have a BH-adjusted p-val < 0.05"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
...
[1] "Done."

Inferred from NEO1

Note: the NEO1, SLC11A2, SLC39A14 and TFR2 runs below do not pass -f, so the threshold reported in their logs is 0.1 (presumably the script's default) rather than 0.05.

mkdir -p GO_FDR/separate/NEO1/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/separate/NEO1.list -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/separate/NEO1/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "2 provided genes; 2 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 45 distinct GO terms"
[1] "Of those 45 GO terms, 45 have a BH-adjusted p-val < 0.1"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
...
[1] "Done."

Inferred from NEO1 w/o NEO1

mkdir -p GO_FDR/separate/NEO1_wo_orig/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/separate/NEO1_wo_original.list -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/separate/NEO1_wo_orig/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "1 provided genes; 1 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 31 distinct GO terms"
[1] "Of those 31 GO terms, 31 have a BH-adjusted p-val < 0.1"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
wrong orderBy parameter; set to default `orderBy = "x"`
[1] "Writing last output image (this one might take some time)..."
`universe` is not in character and will be ignored...
No gene set have size > 10 ...
--> return NULL...
Warning message:
The `add` argument of `group_by()` is deprecated as of dplyr 1.0.0.
Please use the `.add` argument instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in UseMethod("mutate") : 
  no applicable method for 'mutate' applied to an object of class "NULL"
Calls: pbar -> %>% -> ggplot -> summarise -> group_by -> mutate
Execution halted

Inferred from SLC11A2

mkdir -p GO_FDR/separate/SLC11A2/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/separate/SLC11A2.list -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/separate/SLC11A2/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "8 provided genes; 8 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 104 distinct GO terms"
[1] "Of those 104 GO terms, 18 have a BH-adjusted p-val < 0.1"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
...
[1] "Done."

Inferred from SLC11A2 w/o SLC11A2

mkdir -p GO_FDR/separate/SLC11A2_wo_orig/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/separate/SLC11A2_wo_original.list -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/separate/SLC11A2_wo_orig/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "7 provided genes; 7 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 68 distinct GO terms"
[1] "Of those 68 GO terms, 0 have a BH-adjusted p-val < 0.1"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
Error in `[.data.frame`(d, , 2) : undefined columns selected
Calls: upsetplot ... eval_tidy -> split -> split.default -> [ -> [.data.frame
Execution halted

No significant enrichment found in any GO term.

Inferred from SLC39A14

mkdir -p GO_FDR/separate/SLC39A14/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/separate/SLC39A14.list -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/separate/SLC39A14/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "4 provided genes; 4 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 70 distinct GO terms"
[1] "Of those 70 GO terms, 69 have a BH-adjusted p-val < 0.1"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
...
[1] "Done."

Inferred from SLC39A14 w/o SLC39A14

mkdir -p GO_FDR/separate/SLC39A14_wo_orig/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/separate/SLC39A14_wo_original.list -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/separate/SLC39A14_wo_orig/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "3 provided genes; 3 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 19 distinct GO terms"
[1] "Of those 19 GO terms, 19 have a BH-adjusted p-val < 0.1"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
...
[1] "Done."

Inferred from TFR2

mkdir -p GO_FDR/separate/TFR2/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/separate/TFR2.list -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/separate/TFR2/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "15 provided genes; 14 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 305 distinct GO terms"
[1] "Of those 305 GO terms, 0 have a BH-adjusted p-val < 0.1"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
Error in `[.data.frame`(d, , 2) : undefined columns selected
Calls: upsetplot ... eval_tidy -> split -> split.default -> [ -> [.data.frame
Execution halted

No significant enrichment found in any GO term.

Inferred from TFR2 w/o TFR2

mkdir -p GO_FDR/separate/TFR2_wo_orig/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/separate/TFR2_wo_original.list -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/separate/TFR2_wo_orig/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "14 provided genes; 13 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 276 distinct GO terms"
[1] "Of those 276 GO terms, 0 have a BH-adjusted p-val < 0.1"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
Error in `[.data.frame`(d, , 2) : undefined columns selected
Calls: upsetplot ... eval_tidy -> split -> split.default -> [ -> [.data.frame
Execution halted

No significant enrichment found in any GO term.

Inferred from TMPRSS6

mkdir -p GO_FDR/separate/TMPRSS6/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/separate/TMPRSS6.list -f 0.05 -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/separate/TMPRSS6/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "7 provided genes; 6 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 185 distinct GO terms"
[1] "Of those 185 GO terms, 46 have a BH-adjusted p-val < 0.05"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
...
[1] "Done."

Inferred from TMPRSS6 w/o TMPRSS6

mkdir -p GO_FDR/separate/TMPRSS6_wo_orig/symbol
Rscript GO_enrichment_FDR.R -k "SYMBOL" -G results/separate/TMPRSS6_wo_original.list -f 0.05 -c "BP" -a "BH" -o "default_universe" -d "GO_FDR/separate/TMPRSS6_wo_orig/symbol"
...
[1] "Loading input data..."
[1] "Warning: using defaut universe automatically provided by the clusterProfiler package"
[1] "Done."
[1] "Computing GO enrichment..."
`universe` is not in character and will be ignored...
[1] "Done."
[1] "18866 (default) background genes"
[1] "6 provided genes; 5 found by `enrichGO`"
[1] "Computed GO enrichment (whether significant or not) for 165 distinct GO terms"
[1] "Of those 165 GO terms, 69 have a BH-adjusted p-val < 0.05"
[1] "Writing outputs tables..."
[1] "Done. Writing output images..."
...
[1] "Done."