Page 1: FROM ENCODE DATA TO ENCODE ANALYSES

J. Seth Strattan, PhD
ENCODE Data Coordinating Center (DCC)
Asia Pacific Bioinformatics Conference
January, 2016

Page 2: ENCODE: Metadata, Data, and Analyses

So far, you have learned

• The ENCODE Portal is the canonical source for ENCODE metadata and data.
• The Portal also documents ENCODE standards, such as antibody standards and data release.
• The Portal links to documentation and tutorials.
• How to use the Portal to browse and search what ENCODE has done.

Focus for the rest of the course

• Visualization of ENCODE data.
• Programmatic search and download of ENCODE metadata and data.
• ENCODE data analyses, and how you can replicate them.

Page 3: Find an experiment

Use metadata to find data:

• Search for “H3K9ac neural tube”
• Facet on ChIP-seq; mouse; mm10 assembly
• Select an experiment, for example https://www.encodeproject.org/experiments/ENCSR087PLZ/
• Note metadata on protocols, replicates
• Graph: files are related by processing steps
• Download from the graph or a list
• Click on “Visualize Data” to visualize the results of this experiment.

Page 4: Visualize the experiment

Adjust the browser settings to display fold-over-signal in “full”

Page 5: Find several experiments

Use metadata to find data:

• Search for “H3K9ac neural tube”
• Facet on ChIP-seq; mouse; mm10 assembly
• Get a list of several experiments
• Click on “Visualize Data” to visualize all the experiments matching this search.

Page 6: Visualize several experiments

Stage-dependent H3K9ac signal present at Pax9 in neural tube at e11.5, e13.5.

Page 7: Find & download several experiments

Use metadata to find data:

• Search for “H3K9ac neural tube”
• Facet on ChIP-seq; mouse; mm10 assembly
• Get a list of several experiments
• Click on “Download” to download selected metadata and complete links to data.

Page 9: Download several experiments

• “Download” produces a file with a list of links to all the files for all the experiments in your search.
• You can iterate through the list in your own script.
• Or:

xargs -n 1 curl -O -L < files.txt

• The first link is to a file called metadata.tsv that contains metadata you need to interpret what each file is.
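
The "iterate through the list in your own script" option might look like the following minimal Python sketch. It assumes the batch-download list is saved as files.txt (the name used in the xargs example) with one URL per line; the helper names read_links, local_name and download_all are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen
import shutil


def read_links(list_path):
    """Return the non-empty lines of the link list, in file order."""
    text = Path(list_path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def local_name(url):
    """Name the local copy after the last path segment of the URL."""
    name = Path(urlparse(url).path).name
    return name or "download"


def download_all(list_path, dest_dir="."):
    """Fetch every link in turn (needs network access).

    urlopen follows HTTP redirects, which is what the -L flag does for curl.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in read_links(list_path):
        with urlopen(url) as response, open(dest / local_name(url), "wb") as out:
            shutil.copyfileobj(response, out)
```

Calling download_all("files.txt") would then save each file, metadata.tsv included, into the current directory.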

Page 10: Download several experiments

• metadata.tsv: Each line contains metadata on a file from the download package.
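
As a rough illustration of using metadata.tsv to decide which files you want, the sketch below parses a tab-separated table with a header row and filters it by column value. The column names ("File accession", "File format", "Output type", "File download URL") are assumptions about the header and should be checked against your own file; the accessions and URLs in SAMPLE are made-up placeholders.

```python
import csv
import io


def load_metadata(handle):
    """Parse metadata.tsv into one dict per file, keyed by the header row."""
    return list(csv.DictReader(handle, delimiter="\t"))


def select_files(rows, wanted):
    """Keep rows whose columns equal every value requested in `wanted`."""
    return [row for row in rows
            if all(row.get(col) == val for col, val in wanted.items())]


# Placeholder table standing in for a real metadata.tsv (values are invented).
SAMPLE = (
    "File accession\tFile format\tOutput type\tFile download URL\n"
    "ENCFF000AAA\tbigWig\tfold change over control\thttps://example.org/ENCFF000AAA.bigWig\n"
    "ENCFF000AAB\tbed narrowPeak\tpeaks\thttps://example.org/ENCFF000AAB.bed.gz\n"
)

rows = load_metadata(io.StringIO(SAMPLE))
bigwigs = select_files(rows, {"File format": "bigWig"})
urls = [row["File download URL"] for row in bigwigs]
```

For a real download, replace io.StringIO(SAMPLE) with open("metadata.tsv", newline="").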

Page 11: Programmatic access via the ENCODE REST API

• All Portal content is accessible via URLs; just add ?format=json
• The database record is returned in JSON format
• JSON can be parsed in your language of choice
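
A minimal Python sketch of the ?format=json convention, using only the standard library. The helper names record_url and fetch_record are hypothetical; the experiment accession is the one used earlier in this deck. The structure of the returned record is not assumed here, so inspect its keys before relying on any field.

```python
import json
from urllib.request import Request, urlopen

PORTAL = "https://www.encodeproject.org"


def record_url(path):
    """Build the JSON URL for a Portal page such as experiments/ENCSR087PLZ."""
    return f"{PORTAL}/{path.strip('/')}/?format=json"


def fetch_record(path):
    """GET the record and decode the JSON body (needs network access)."""
    request = Request(record_url(path), headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)
```

For example, record = fetch_record("experiments/ENCSR087PLZ") returns a Python dict, and sorted(record) lists the top-level fields available for that experiment.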

Page 15: The ENCODE Portal: Recap

• Interactive access to ENCODE metadata via faceted browsing and search
• Interactive retrieval of ENCODE data one file at a time
• Batch download of ENCODE metadata and data files
• Programmatic access using the ENCODE REST API

Next: ENCODE Data Analysis Pipelines

• What do they produce?
• How can they be run?

Page 16: Pipelines Demonstration and Exercise

To set up an account: https://www.encodeproject.org/tutorials/apbc-2016/
Click “Prepare to run web-based pipelines”
Log in

Page 17: DCC Delivers ENCODE Data

[Figure: example FASTQ reads, with stages labeled Sample, Library, Primary Data, Processed Data; storage labeled AWS S3 Bucket, ENCODE Files]

Page 18: ENCODE DCC Delivers ENCODE Metadata

[Figure: example FASTQ reads, with stages labeled Sample, Library, Primary Data, Processed Data]

Page 19: ENCODE Analysis Pipelines as Deliverables

Goals:
1. Deploy ENCODE-defined pipelines for ChIP-seq, RNA-seq, DNase-seq, methylation.
2. Use those pipelines to generate the standard ENCODE peaks, quantitations, CpG.
3. Capture metadata to make clear what software, versions, parameters, inputs were used.
4. Capture, accession, and distribute the output.
5. Deliver exactly the same pipelines in a form that anyone can run on their data or with ENCODE data, whether one experiment or 1000.

Replicability – Provenance – Ease of Use – Scalability
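
To make the metadata-capture goal concrete, one could write a small record per pipeline step. The sketch below is illustrative only: the StepRecord class and its field names are hypothetical and are not the DCC's metadata schema, and the version string, parameter and file names are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StepRecord:
    """What was run for one step: software, version, parameters, files."""
    software: str
    version: str
    parameters: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def to_json(record):
    """Serialize a step record with stable key order so records diff cleanly."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Placeholder values, not taken from any real ENCODE run.
mapping_step = StepRecord(
    software="bwa",
    version="x.y.z",
    parameters={"threads": 4},
    inputs=["rep1.fastq.gz"],
    outputs=["rep1.bam"],
)
serialized = to_json(mapping_step)
```

Storing one such record next to every output file is one simple way to keep the software, version, parameters and inputs recoverable later.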

Page 20: Deployment Platform Considerations

Replicability – Provenance – Ease of Use – Scalability

We chose to deploy first to a web/cloud-based platform, DNAnexus.

Code is open source and adaptable for deployment to your HPC environment:
https://github.com/ENCODE-DCC

Platform               Develop   Share     Run       Elastic            Provenance  Cost
HPC Cluster (Scripts)  Hard      Hard      Hard      Cluster-Dependent  Moderate    Obscure/Subsidized
HPC Container          Hard      Moderate  Moderate  Cluster-Dependent  Good        Obscure/Subsidized
Web/Cloud              Moderate  Easy      Easy      Highly             Excellent   Apparent but Low

Page 21: Schema: ENCODE ChIP-seq IDR Pipeline

[Pipeline diagram. Steps: fastq reads, Map, Pool Replicates, Subsample Pseudoreplicates, Call Peaks, IDR (TF) or Overlap (Histone), Signal Tracks. Files: BAM, BAI (processed, mapped reads); 2 pseudoreplicates per replicate; 2 pseudoreplicates per pool; peak calls; IDR-thresholded/replicable peak calls; bigWig.]

Target: TFs
Key software: bwa; Picard markDuplicates; samtools; MACS2 (signal tracks); SPP (PeakSeq, GEM future); IDR2
Input files: fastq files (SE or PE); two biological replicates; matched controls
Output files: one bam per replicate; bigWig fold signal over control; bigWig p-value signal over control; bed/bigBed true replicates peaks; bed/bigBed pooled replicates peaks; bed/bigBed IDR thresholded peaks
QA metrics: NRF (non-redundant fraction); PBC1 and 2 (PCR bottleneck coefficients); number of distinct uniquely-mapping reads; NSC/RSC (strand cross-correlation); IDR Rescue Ratio; IDR Self-Consistency Ratio; IDR Reproducibility Test

Target: Histone mods
Key software: MACS2 for peaks; overlap thresholding; IDR2 (future)
Output files: bed/bigBed replicated peaks

https://github.com/ENCODE-DCC/chip-seq-pipeline
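
The "2 pseudoreplicates per replicate" and "2 pseudoreplicates per pool" steps amount to randomly splitting a set of reads in two. The sketch below shows the idea on a plain Python list of read identifiers. It is not the ENCODE implementation, which works on alignment files; the function names and the fixed seed are assumptions made for illustration.

```python
import random


def pseudoreplicates(reads, seed=0):
    """Shuffle the reads and split them into two halves of near-equal size."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def pooled_pseudoreplicates(rep1, rep2, seed=0):
    """Pool both true replicates, then split the pool the same way."""
    return pseudoreplicates(list(rep1) + list(rep2), seed=seed)


rep1 = [f"rep1_read{i}" for i in range(10)]
rep2 = [f"rep2_read{i}" for i in range(7)]
self_a, self_b = pseudoreplicates(rep1)
pool_a, pool_b = pooled_pseudoreplicates(rep1, rep2)
```

Peaks are then called on each half, and agreement between the halves indicates how reproducible the peak calls are within one sample.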

Page 22: Pipelines Demonstration and Exercise

To set up an account: https://www.encodeproject.org/tutorials/apbc-2016/
Log in -> Exercises: Histone ChIP-seq, RNA-seq

Page 23: Uniformly Processed Data On the ENCODE Portal

Histone ChIP-seq Example
https://www.encodeproject.org/experiments/ENCSR087PLZ/
• Pipeline graph shows relationships between files
• Click on files to see more file metadata and download links
• Click on steps to see more software metadata and download links

Transcription Factor ChIP-seq Example
https://www.encodeproject.org/experiments/ENCSR077DKV/
• Same mapping, signal tracks and peak calls
• Also have the IDR-thresholded peak calls
• “Conservative” set, based on “true” replicates; “optimal” set if peaks can be rescued by pseudo-replication.
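
One way to read the conservative/optimal distinction is sketched below. It assumes two IDR-thresholded peak lists are already available, one from comparing the true replicates and one from comparing pooled pseudoreplicates, and that the "optimal" set is simply whichever list is longer. That selection rule and the function name are assumptions for illustration, not a statement of the pipeline's exact logic.

```python
def conservative_and_optimal(true_rep_peaks, pooled_pseudorep_peaks):
    """Choose the two reported peak sets from IDR-thresholded peak lists.

    The conservative set is always the true-replicate list. The optimal set
    is the longer of the two lists, so pooled pseudoreplicates can "rescue"
    peaks when the true replicates yield fewer. Ties keep the true replicates.
    """
    conservative = true_rep_peaks
    optimal = max(true_rep_peaks, pooled_pseudorep_peaks, key=len)
    return conservative, optimal


# Peaks as (chrom, start, end) tuples; coordinates are invented.
true_peaks = [("chr1", 100, 300), ("chr2", 500, 750)]
pooled_peaks = [("chr1", 100, 300), ("chr2", 500, 750), ("chr3", 40, 260)]
conservative, optimal = conservative_and_optimal(true_peaks, pooled_peaks)
```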

Page 24: ENCODE ChIP-seq Quality Metrics: Resources

[Pipeline diagram. Steps: fastq reads, Map, Pool Replicates, Subsample Pseudoreplicates, Call Peaks, IDR, Signal Tracks. Files: BAM, BAI (processed, mapped reads); 2 pseudoreplicates per replicate; 2 pseudoreplicates per pool; peak calls; IDR-thresholded peak calls; bigWig.]

https://github.com/ENCODE-DCC/chip-seq-pipeline

Estimate: Depth
Description: Number of uniquely mapping reads; number of distinct uniquely mapping reads
Reference: Jung YL, et al. Nucleic Acids Research. 2014;42(9):e74

Estimate: Library Complexity
Description: Non-Redundant Fraction; PCR Bottleneck Coefficient
Reference: Landt S, et al. Genome Res. 2012. 22: 1813-1831

Estimate: ChIP Quality
Description: Normalized Strand Cross-Correlation; Relative Strand Cross-Correlation

Estimate: Replicate Concordance
Description: IDR Rescue Ratio; IDR Self-Consistency Ratio; IDR Reproducibility Test
Reference: Li Q, et al. Annals Applied Statistics. 2011, Vol. 5, No. 3, 1752–1779
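
The library-complexity and replicate-concordance metrics are simple ratios, sketched below under stated assumptions. NRF is taken as distinct read positions over total reads, PBC1 as positions with exactly one read over distinct positions, and PBC2 as positions with exactly one read over positions with exactly two. The two IDR ratios are taken as larger peak count over smaller peak count, and the cutoff of 2 for the reproducibility test is an assumption about the guideline. Function names are hypothetical, and inputs are assumed non-empty with non-zero peak counts.

```python
from collections import Counter


def library_complexity(positions):
    """NRF, PBC1 and PBC2 from one (chrom, start, strand) tuple per mapped read."""
    counts = Counter(positions)
    total = sum(counts.values())       # all reads
    distinct = len(counts)             # distinct positions
    m1 = sum(1 for n in counts.values() if n == 1)  # positions seen once
    m2 = sum(1 for n in counts.values() if n == 2)  # positions seen twice
    return {
        "NRF": distinct / total,
        "PBC1": m1 / distinct,
        "PBC2": m1 / m2 if m2 else float("inf"),
    }


def ratio(a, b):
    """Larger count over smaller count."""
    return max(a, b) / min(a, b)


def reproducibility(n_true, n_pooled, n_self1, n_self2, cutoff=2.0):
    """Return (rescue ratio, self-consistency ratio, verdict).

    n_true / n_pooled: peaks passing IDR for true replicates / pooled
    pseudoreplicates. n_self1 / n_self2: peaks passing IDR for each
    replicate's own pseudoreplicates.
    """
    rescue = ratio(n_pooled, n_true)
    self_consistency = ratio(n_self1, n_self2)
    failures = (rescue > cutoff) + (self_consistency > cutoff)
    return rescue, self_consistency, ("pass", "borderline", "fail")[failures]


reads = [("chr1", 100, "+")] * 2 + [("chr1", 200, "+"), ("chr1", 300, "-"), ("chr2", 50, "+")]
metrics = library_complexity(reads)
```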

Page 25: Schema: ENCODE WGBS Pipeline

Ben Hitz, PhD, ENCODE DCC

[Pipeline diagram, repeated for each replicate. Steps: FASTQ (SE/PE), Trim Reads, Map (converted genome), BAM (Bismark), Extract methyl calls, BigWigs and BigBEDs (.bb). QC metrics: Map to λ genome, non bisulfite conversion rate.]

BISMARK (v 0.10)

Bed/BigBed files for:
• CG context
• CHG context
• CHH context

https://github.com/ENCODE-DCC/dna-me-pipeline
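
The three contexts are defined by the bases that follow a cytosine, where H stands for A, C or T. The sketch below classifies a cytosine on the forward strand of a plain sequence string. It is only an illustration of the definitions, not how Bismark reports calls, and the function name is hypothetical.

```python
def cytosine_context(seq, i):
    """Classify the C at index i of a 5'->3' sequence as CG, CHG or CHH.

    Returns None when the context cannot be determined, for example when it
    runs off the end of the sequence or contains an ambiguous base such as N.
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position does not hold a cytosine")
    following = seq[i + 1:i + 3]
    if following[:1] == "G":
        return "CG"
    if len(following) == 2 and following[0] in "ACT":
        if following[1] == "G":
            return "CHG"
        if following[1] in "ACT":
            return "CHH"
    return None


contexts = [cytosine_context(s, 1) for s in ("ACGT", "ACAGT", "ACATT")]
```

Cytosines on the reverse strand appear as G in the forward sequence, so a complete classifier would apply the same rule to the reverse complement.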

Page 26: Schema: ENCODE RNA-seq Pipeline

Ben Hitz, PhD, ENCODE DCC

[Pipeline diagram, repeated for each replicate. Steps: FASTQ (SE/PE), Map Reads giving BAM (tophat) and BAM (STAR), Signal Tracks giving BigWigs (.bw), Quantification giving RSEM file. Across replicates: IDR/MAD, QC & filtered quantification.]

For each mapper (STAR, tophat)

BAM files:
• mapped to genome
• mapped to transcriptome

BigWig files:
• plus/minus strand (paired)
• uniquely mapped
• multi+uniquely mapped

Quantifications (RSEM):
• genome
• transcriptome

https://github.com/ENCODE-DCC/long-rna-seq-pipeline

Page 27: Uniformly Processed Data On the ENCODE Portal

RNA-seq Example
https://www.encodeproject.org/experiments/ENCSR368QPC/
• Pipeline graph shows relationships between files
• Click on files to see more file metadata and download links
• Click on steps to see more software metadata and download links

Pages 28-33: Results from the ChIP-seq exercise

“Download” to generate temporary URLs to the selected files

Pages 34-35: Visualize on the UCSC Genome Browser

Page 36: Pipeline Workshop Summary

DCC Goals:
1. Deploy ENCODE-defined pipelines for ChIP-seq, RNA-seq, DNase-seq, methylation.
2. Use those pipelines to generate the standard ENCODE peaks, quantitations, CpG.
3. Capture metadata to make clear what software, versions, parameters, inputs were used.
4. Capture, accession, and distribute the output.
5. Deliver exactly the same pipelines in a form that anyone can run on their data or with ENCODE data, whether one experiment or 1000.

Replicability – Provenance – Ease of Use – Scalability

Page 37: Contributors

ENCODE Data Coordinating Center
Mike Cherry, PI, Stanford
Jim Kent, co-PI, UCSC
Eurie Hong, Project Manager
Pipeline Developers: Ben Hitz (WGBS, Software Lead); Tim Dreszer (RNA-seq, DNase-seq); J. Seth Strattan (ChIP-seq)
Portal Developers: Laurence Rowe; Nikhil Podduturi; Forrest Tanaka
Data Wranglers: Esther Chan; Jean Davidson; Venkat Malladi; Cricket Sloan; J. Seth Strattan
QA & Biocuration Assistance: Brian Lee; Marcus Ho; Aditi Narayanan
Support Staff: Stuart Miyasato; Matt Simison; Zhenhua Wang

ENCODE Data Analysis Center
Zhiping Weng, PI, University of Massachusetts
Mark Gerstein, co-PI, Yale
Methylation: Junko Tsuji (U Mass); Eric Mendenhall (U Alabama, HAIB)
RNA-seq: Alex Dobin (CSHL); Carrie Davis (CSHL); Rafael Irizarry (Harvard); Xintao Wei (UConn); Brent Graveley (UConn); Colin Dewey (U Wisconsin); Roderic Guigó (CRG); Sarah Djebali (CRG)
ChIP-seq: Anshul Kundaje (Stanford); Nathan Boley (Stanford); Jin Lee (Stanford)

DNAnexus
Mike Lin; Andrey Kislyuk; Singer Ma; Brett Hannigan; Ohad Rodeh; Joe Dale; George Asimenos

@encodedcc   encode-[email protected]   https://github.com/ENCODE-DCC/