Annotated Regulatory Index

2017-

Regulatory information encoded in the human genome is activated by sequence-specific DNA binding factors, creating focal alterations in chromatin structure that are hypersensitive to DNase I. We created deep reference maps of DNase I hypersensitive sites (DHSs) from 733 human biosamples encompassing 439 cell and tissue types and states, and integrated these to delineate and numerically index ~3.6 million DHSs encoded within the human genome, providing a common coordinate system for regulatory DNA.

We find that the scale of cell and tissue states sampled exposes a large degree of stereotyped actuation of large sets of elements, signaling the operation of distinct genome-scale regulatory programs. These actuation patterns of individual elements can be captured comprehensively by a simple regulatory vocabulary reflecting their dominant program. This vocabulary, in turn, enables regulatory annotation of both protein-coding genes and the vast array of well-defined but poorly-characterized non-coding RNA genes.

Finally, we find that regulatory vocabularies open new avenues for systematically interpreting non-coding genetic variation, and substantially empower the connection of disease-associated variation with specific cell and tissue states.

Taken together, our results provide a common and extensible coordinate system and vocabulary for human regulatory DNA, and open a new global perspective on the architecture of human gene regulation.

Material available for download:

  • DHS Index and Vocabulary of ~3.6M DHSs (tsv, hg38)

    Column       Example      Description
    seqname      chr1         Chromosome
    start        1782520      Start position
    end          1782770      End position
    identifier   1.10643      Unique identifier (chr#.position%)
    mean_signal  1.030869481  Mean DNase-seq signal per biosample (“confidence score”)
    numsamples   54           Number of biosamples with a DHS
    summit       1782650      Estimated DHS summit position
    core_start   1782590      Start position of core region
    core_end     1782710      End position of core region
    component    Digestive    Main DHS Vocabulary component
  • 733 biosample metadata (html, pdf, tsv, xlsx, Google spreadsheet)

  • DHS-by-biosample matrices
    Rows correspond to DHS Index elements, columns correspond to rows in biosample metadata (both in order).
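
As a rough illustration of the column layout listed above, the following minimal sketch parses DHS Index rows into typed records using only the Python standard library. It assumes the tsv file starts with a header row whose names match the column table, which is not confirmed here; the `DHS` class and `read_dhs_index` function are hypothetical names introduced for this example, and the only input is the single example row from the table.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class DHS:
    """One DHS Index element; fields follow the column table (hypothetical class)."""
    seqname: str
    start: int
    end: int
    identifier: str
    mean_signal: float
    numsamples: int
    summit: int
    core_start: int
    core_end: int
    component: str


def read_dhs_index(handle):
    """Yield DHS records from a tab-separated stream.

    Assumption: the first line is a header naming the columns as in the table.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield DHS(
            seqname=row["seqname"],
            start=int(row["start"]),
            end=int(row["end"]),
            identifier=row["identifier"],  # kept as text, e.g. "1.10643"
            mean_signal=float(row["mean_signal"]),
            numsamples=int(row["numsamples"]),
            summit=int(row["summit"]),
            core_start=int(row["core_start"]),
            core_end=int(row["core_end"]),
            component=row["component"],
        )


# Stand-in input: the example row from the column table, with an assumed header.
EXAMPLE_TSV = (
    "seqname\tstart\tend\tidentifier\tmean_signal\tnumsamples\t"
    "summit\tcore_start\tcore_end\tcomponent\n"
    "chr1\t1782520\t1782770\t1.10643\t1.030869481\t54\t"
    "1782650\t1782590\t1782710\tDigestive\n"
)

records = list(read_dhs_index(io.StringIO(EXAMPLE_TSV)))
dhs = records[0]

# Width of the full element and of its core region, in base pairs.
print(dhs.identifier, dhs.end - dhs.start, dhs.core_end - dhs.core_start)
```

For the downloaded file, the same function could be given an open file handle instead of the in-memory string. Because the matrix rows follow the DHS Index order, the position of a record in `records` would also be its row position in the DHS-by-biosample matrices.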

Meuleman et al., bioRxiv (2019)
Work with Sasha Muratov, John Stamatoyannopoulos and others.

Wouter Meuleman
Investigator

My research interests include computational (epi)genomics, genome organization, and data visualization.