cinet: a cyberinfrastructure for network science


Post on 07-Aug-2015


Page 1: CINET: A CyberInfrastructure for Network Science

CINET: A CyberInfrastructure for Network Science

S. M. Shamimul Hasan, on behalf of the CINET team

Technical Report # 15-060

Network Dynamics and Simulation Science Lab (NDSSL), Virginia Bioinformatics Institute

Virginia Tech

Page 2: CINET: A CyberInfrastructure for Network Science

CINET Team

•  Virginia Tech: Keith Bisset, Abhijin Adiga, Edward Fox, Maleq Khan, Chris Kuhlman, Henning Mortveit, Madhav Marathe, Samarth Swarup, Anil Vullikanti
•  Indiana University: Geoff Fox, Judy Qiu, Stephen Wu
•  SUNY Albany: S. S. Ravi
•  Jackson State University: Richard Aló, Chris Cassidy
•  University of Houston Downtown: Ongard Sirisaengtaksin
•  Argonne National Lab and U. Chicago: Pete Beckman
•  VT Students: S. M. Shamimul Hasan, Md Hasanuzzaman, S M Arifuzzaman, Maksudul Alam, Sherif Abdelhamid, Zalia Shams, Tirtha Bhattacharjee
•  Persistent Systems: Harsha, Gaurav, Tanmay, Rakhi, Abhijeet, Niranjan and Team

Page 3: CINET: A CyberInfrastructure for Network Science

CINET: Team (cont.)

•  Several evaluators are incorporating CINET into courses
–  S. S. Ravi at the University at Albany, SUNY
–  Edward Fox at Virginia Tech
–  Anil Vullikanti at Virginia Tech
–  Henning Mortveit at Virginia Tech
–  Aravind Srinivasan at University of Maryland
–  Albert Esterline (NCAT)
•  Other evaluators planning to use CINET in research
–  Zsuzsanna Fagyal at UIUC
–  Matt Macauley at Clemson University
–  T. M. Murali at Virginia Tech

 

Page 4: CINET: A CyberInfrastructure for Network Science

Network  

“Network is a group or system of interconnected people or things” - Oxford Dictionaries

“Network science is the study of network representations of physical, biological, and social phenomena” - National Research Council

Page 5: CINET: A CyberInfrastructure for Network Science

Network  Science  

•  Research in network science has grown rapidly over the last decade across many scientific fields.
•  Networks can be very large (~10^8 nodes, ~10^10 edges), requiring HPC for analysis.
•  There is a need for middleware, i.e., an interface layer:
o  Domain experts don’t need to become experts in graph theory, data mining, and high-performance computing
o  It provides an abstraction layer that allows separation of innovation above and below this layer
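As a rough illustration of why networks of this size call for HPC, the following sketch estimates the memory needed just to hold an edge list. The storage layout (two 8-byte integer node IDs per edge) is an assumption made for the arithmetic, not a description of how CINET stores graphs.

```python
# Back-of-the-envelope memory estimate for a plain edge list.
# Assumption: each edge is stored as two 8-byte integer node IDs.
BYTES_PER_NODE_ID = 8


def edge_list_bytes(num_edges: int) -> int:
    """Bytes needed to hold an edge list with two node IDs per edge."""
    return 2 * BYTES_PER_NODE_ID * num_edges


num_edges = 10**10  # the edge count quoted on the slide
size_gb = edge_list_bytes(num_edges) / 10**9
print(f"{num_edges:.0e} edges need about {size_gb:.0f} GB")
```

Under these assumptions the edge list alone is on the order of 160 GB, before any index or per-node data is added.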

Page 6: CINET: A CyberInfrastructure for Network Science

CINET: Vision

•  Self-sustainable
–  Users can contribute new networks, data, algorithms, hardware, and research results
•  Self-manageable
–  End users will be insulated from the complexities of resource allocation, scheduling, cross-platform interactions, and other low-level concerns
•  Repeatable Science
–  The exact version of a model that produced a result is kept
–  All model input parameters are captured
–  Any system configuration information is captured
–  All input data versions are kept
–  The entire set of configuration information for an experiment (multiple runs) should be accessible by providing a URL
–  Encourage users of the system to include pointers to results in published work
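One way to picture the repeatable-science goals is a record that bundles the model version, input parameters, system configuration, and input data versions, and derives a stable identifier that a URL can point to. The sketch below is only an illustration: the function name, the field names, the sample values, and the `BASE_URL` are hypothetical and are not taken from CINET.

```python
import hashlib
import json

BASE_URL = "https://example.org/experiments/"  # hypothetical URL prefix


def experiment_record(model_version, input_params, system_config, input_data_versions):
    """Bundle what is needed to repeat a run and derive a stable ID from it."""
    record = {
        "model_version": model_version,
        "input_params": input_params,
        "system_config": system_config,
        "input_data_versions": input_data_versions,
    }
    # Sorting keys makes the serialization, and so the ID, independent of dict order.
    canonical = json.dumps(record, sort_keys=True)
    record_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return record_id, record


# All values below are made up for illustration.
record_id, record = experiment_record(
    model_version="1.4.2",
    input_params={"measure": "PageRank", "damping": 0.85},
    system_config={"resource": "cluster-a", "cores": 16},
    input_data_versions={"Karate": "v1"},
)
print(BASE_URL + record_id)
```

Because the identifier is derived from the captured configuration, two runs with the same settings map to the same URL, which is what makes a pointer in a published paper meaningful.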

Page 7: CINET: A CyberInfrastructure for Network Science

System  Architecture  

Page 8: CINET: A CyberInfrastructure for Network Science

Version 2.0

•  Provides 150+ networks, 18 graph generators, and 80+ measures
•  New, improved UI for Granite
•  Components (apps) that allow researchers to interact with CINET: visualization of networks, adding networks, adding structural analysis tools
•  Structural analysis using Galib, NetworkX, and SNAP
•  Version 1.0 of a Python-based DSL for computing complex workflows
•  Resource manager 1.0 completed: allows multiple computational and analytical resources to be used and selected
•  Website with additional resources (course notes, etc.)

Page 9: CINET: A CyberInfrastructure for Network Science

Digital Library

Digital Library:
•  Support network science research
•  Manage continuously produced, large-scale scientific output
•  Provide simulation-specific services to support science
•  Manage large network graphs and workflow of content collections

 

Page 10: CINET: A CyberInfrastructure for Network Science

Digital Library

Data:
–  List of networks & metadata
–  List of measures & metadata
–  Parameters for measures
–  List of generators & metadata
–  Parameters for generators

Services:
–  Memoization: record details of every experiment run
–  Incentivization: report how many times a particular graph was used
–  Browsing and searching: graphs, measures, results
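The memoization and incentivization services can be sketched together as a cache keyed by network, measure, and parameters, with a usage counter per graph. This is a minimal illustration under assumed names (`AnalysisCache`, `run`, `graph_usage` are hypothetical); the slides do not describe the actual service interface.

```python
from collections import Counter


class AnalysisCache:
    """Remember the result of each (network, measure, parameters) run."""

    def __init__(self):
        self._results = {}
        self.graph_usage = Counter()  # how many times each graph was requested

    def run(self, network, measure, params, compute):
        # Parameters are sorted so that dict ordering does not change the key.
        key = (network, measure, tuple(sorted(params.items())))
        self.graph_usage[network] += 1
        if key not in self._results:
            self._results[key] = compute()  # only computed on a cache miss
        return self._results[key]


cache = AnalysisCache()
calls = []


def fake_pagerank():
    """Stand-in for an expensive analysis; records that it was invoked."""
    calls.append(1)
    return {"node-1": 0.5, "node-2": 0.5}


first = cache.run("Karate", "PageRank", {"damping": 0.85}, fake_pagerank)
second = cache.run("Karate", "PageRank", {"damping": 0.85}, fake_pagerank)
print(len(calls), cache.graph_usage["Karate"])
```

The second request is served from the cache, while the usage counter still records both requests.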

Page 11: CINET: A CyberInfrastructure for Network Science

Transactional Data

•  The following data is stored in the database
–  Users
–  Details of network analyses run by users, including the parameters set for each
–  Details of generator analyses run by users, including the parameters set for each
•  The following is stored in the file system
–  Output files of network and generator analyses
•  A mapping exists between the data stored in the database and the file system

Page 12: CINET: A CyberInfrastructure for Network Science

Performance  Improvements  

•  The blackboard is used ONLY for placing job requests
•  Simpler and fewer components
•  Components are fully distributed
–  Web app, blackboard, and brokers exist on separate VMs
•  Brokers are no longer required to poll for data; the blackboard container notifies them directly
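The change from polling to notification can be pictured with a small observer-style sketch: brokers register a callback, and the blackboard calls it when a job request is posted. The class and method names are hypothetical and only illustrate the idea; they are not the CINET middleware API.

```python
class Blackboard:
    """Holds job requests and notifies registered brokers when one arrives."""

    def __init__(self):
        self._brokers = []

    def register(self, broker_callback):
        """Brokers subscribe once instead of polling repeatedly."""
        self._brokers.append(broker_callback)

    def post_job(self, job):
        """Push the new job request to every registered broker."""
        for notify in self._brokers:
            notify(job)


received = []
board = Blackboard()
board.register(received.append)  # a broker that simply records incoming jobs
board.post_job({"measure": "PageRank", "network": "Karate"})
print(received)
```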

 

Page 13: CINET: A CyberInfrastructure for Network Science

Resource  Manager  

•  Decides which resource is best for a given job request
–  Through a set of defined rules
•  Tracks the health of and load on compute resources
–  And considers this knowledge in determining the best resource(s)
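A rule-based choice that takes health and load into account might look like the sketch below. The two rules used here (skip unhealthy or too-small resources, then prefer the least loaded) and all names and numbers are assumptions for illustration; the slides do not list the actual rules.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    healthy: bool
    load: float            # fraction of capacity in use, 0.0 to 1.0
    free_memory_gb: float


def pick_resource(resources, required_memory_gb):
    """Return the least-loaded healthy resource with enough memory, or None."""
    candidates = [
        r for r in resources
        if r.healthy and r.free_memory_gb >= required_memory_gb
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.load)


# Hypothetical resource pool.
pool = [
    Resource("small-vm", healthy=True, load=0.10, free_memory_gb=8),
    Resource("cluster-a", healthy=True, load=0.60, free_memory_gb=512),
    Resource("cluster-b", healthy=False, load=0.05, free_memory_gb=1024),
]
choice = pick_resource(pool, required_memory_gb=100)
print(choice.name if choice else "no resource available")
```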

Page 14: CINET: A CyberInfrastructure for Network Science

Granite: Structural Analysis of Complex Networks

Page 15: CINET: A CyberInfrastructure for Network Science

Graph Analysis Resources and Challenges

•  Resources:
–  Static analysis tools: provide efficient implementations of various graph measures or algorithms (e.g., Galib, NetworkX)
–  Large collection of data sets (of networks)
•  Challenge 1: How can we make an analytic engine that will
–  Reduce programming overhead
–  Reuse existing resources
•  Challenge 2: Provide a simple computational interface for domain experts to use available resources and program interactively

Page 16: CINET: A CyberInfrastructure for Network Science

CINET - Granite

•  Granite allows users to run various network measures on a variety of networks
–  Measures can be either static (e.g., degree distribution, clustering coefficient) or dynamic (e.g., disease diffusion)
–  Network size can range from tiny (10s of nodes) to very large (100s of millions of nodes)
•  Granite automatically picks the best implementation of the specified measure
•  Granite automatically picks the most appropriate compute resource
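To make the two static measures named above concrete, here is a small pure-Python sketch of degree distribution and local clustering coefficient on a toy undirected graph. It is an illustration only and is not the Galib, NetworkX, or SNAP implementation that Granite would dispatch to; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_adjacency(edges):
    """Adjacency sets for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 3.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(edges)
print(degree_distribution(adj))
print(clustering_coefficient(adj, 3))
```

In the toy graph, node 3 has three neighbors and only one of the three possible neighbor pairs is connected, so its clustering coefficient is 1/3.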

Page 17: CINET: A CyberInfrastructure for Network Science

Graph Libraries

•  Granite includes modules from three graph algorithm libraries:
–  Galib (developed at NDSSL)
–  NetworkX (developed at Los Alamos National Lab)
–  SNAP (developed at Stanford University)

Page 18: CINET: A CyberInfrastructure for Network Science

Graph Centrality Measures in CINET

•  Degree list <Node-ID, Degree>
•  Degree statistics
•  Degree distribution
•  Average neighbor degree
•  Hub-authority
•  PageRank
•  Clustering coefficient distribution
•  Streaming-based CC distribution (approx.)
•  Betweenness centrality
•  Closeness centrality
•  Degree centrality
•  Eigenvalue centrality
•  k-core
•  k-crust
•  k-corona
•  k-clique coefficient
•  Core number
•  Ro distribution
•  Coreness of nodes <ID, coreness>
•  CC list <Node-ID, CC>
•  External-memory CC algorithm (exact)
•  Parallel CC algorithm
•  Generate degree sequence
•  Closeness centrality - weighted
•  Closeness vitality - unweighted
•  Closeness vitality - weighted
•  Communicability centrality
•  In-degree centrality
•  Out-degree centrality

Page 19: CINET: A CyberInfrastructure for Network Science

Graph Shortest Path and Connectivity Measures in CINET

•  Number of connected components
•  Component graph
•  Component size distribution
•  Strongly connected component
•  Weakly connected component
•  Bi-connected component
•  Check bi-connectivity
•  BFS tree / forest
•  BFS predecessor list
•  BFS successor list
•  Partitioning by BFS traversal
•  DFS predecessor list
•  DFS successor list
•  DFS: nodes in post-order visits
•  DFS tree
•  Articulation point
•  Bridge edges
•  Diameter
•  Center
•  Periphery
•  Check connectivity
•  Eccentricity
•  Radius
•  DFS: nodes in pre-order visits
•  Check if graph is a DAG
•  Topological sort

Page 20: CINET: A CyberInfrastructure for Network Science

Weighted Shortest Path and Motif Counting

Weighted shortest path related
•  Minimum spanning tree
•  Single source shortest path
•  Shortest path tree/forest
•  Weighted diameter (exact and approx.)
•  Average pairwise distance (exact and approx.)
•  Distribution of pairwise distance (exact and approx.)

Subgraph / Motif counting
•  Count triangle
•  Clique counts (specialized)
•  Graph transitivity
•  All maximal clique
•  Clique number
•  Largest clique containing a node

Flow
•  Maximum flow
•  Minimum cut

Page 21: CINET: A CyberInfrastructure for Network Science

Other Measures

•  Shuffle edges
•  Degree-assortative shuffle
•  Age-assortative shuffle
•  Compare graphs
•  Remove nodes
•  Remove edges
•  Remove high degree nodes (top x%)
•  Remove high degree nodes (degree >= x)
•  Check if a degree sequence is graphical
•  Isolated nodes
•  Vertex cover
•  Dominating set
•  Minimum edge dominating set
•  Check graph consistency
•  Check if bipartite graph
•  Check if chordal graph
•  Maximal independent set
•  Number of common neighbors

Page 22: CINET: A CyberInfrastructure for Network Science

Simple Generative Models of Networks in CINET

•  Random graph generators
–  Erdos-Renyi random graph
–  G(n, p) graph
–  G(n, p) component
–  G(n, m) graph
–  G(n, r) graph
–  Watts-Strogatz small-world graph
–  Waxman random graph
–  Chung-Lu
–  Havel-Hakimi
–  Preferential Attachment
–  Small world
–  Circle
–  Star
–  Chain
–  Lattice
•  Deterministic graph generators
–  Binary tree graph
–  Star
–  Wheel
–  Grid
–  Torus
–  Hypercube
–  Petersen
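As an illustration of the two generator families, the sketch below implements one random generator, G(n, p), and one deterministic generator, the star graph, in plain Python. These are textbook definitions written for this example, not the generator code shipped with CINET; the function names and the choice of node 0 as the star's hub are assumptions.

```python
import random
from itertools import combinations


def gnp_random_graph(n, p, seed=None):
    """G(n, p): include each of the n*(n-1)/2 possible edges with probability p."""
    rng = random.Random(seed)  # a local generator keeps runs reproducible
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


def star_graph(n):
    """Deterministic star on n nodes: node 0 is the hub, nodes 1..n-1 are leaves."""
    return [(0, leaf) for leaf in range(1, n)]


random_edges = gnp_random_graph(n=10, p=0.3, seed=42)
star_edges = star_graph(5)
print(len(random_edges), star_edges)
```

Passing the same seed reproduces the same random graph, which matters if a generated network is later referenced in a repeatable experiment.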

Page 23: CINET: A CyberInfrastructure for Network Science

Currently Available Networks

•  150+ small and large networks
–  Sizes vary from 100 edges to 110M edges
–  Social contact networks: Chicago, Washington DC, Detroit, New York, Seattle
–  Multi-modal urban transportation networks (e.g., subway, cars, buses): Portland, OR
–  Adolescent friendship networks: high school in New River Valley
–  Blog and other online networks: Slashdot, Epinions
–  Infrastructure networks: ad hoc and mesh, phone call, electrical power
–  Biological networks

Page 24: CINET: A CyberInfrastructure for Network Science

Networks in CINET (cont.)

Types of Networks
•  Web graph
•  Autonomous System/Internet
•  Road/transport networks
•  Collaboration networks
•  Co-appearance networks
•  Social networks
•  Biological networks
•  Infrastructure (e.g., power)
•  Others

Original Sources
•  Stanford SNAP
•  Pajek Dataset
•  http://www-personal.umich.edu/~mejn/netdata/
•  Some other publicly available sources

Page 25: CINET: A CyberInfrastructure for Network Science

List of Networks

Autonomous System/Internet
•  Autonomous systems - Oregon-1 - 010331
•  Autonomous systems - Oregon-1 - 010407
•  Autonomous systems - Oregon-1 - 010414
•  Autonomous systems - Oregon-1 - 010421
•  Autonomous systems - Oregon-1 - 010428
•  Autonomous systems - Oregon-1 - 010505
•  Autonomous systems - Oregon-1 - 010512
•  Autonomous systems - Oregon-1 - 010519
•  Autonomous systems - Oregon-1 - 010526
•  Autonomous systems - Oregon-2 - 010331
•  Autonomous systems - Oregon-2 - 010407
•  Autonomous systems - Oregon-2 - 010414
•  Autonomous systems - Oregon-2 - 010421
•  Autonomous systems - Oregon-2 - 010428
•  Autonomous systems - Oregon-2 - 010505
•  Autonomous systems - Oregon-2 - 010512
•  Autonomous systems - Oregon-2 - 010519
•  Autonomous systems - Oregon-2 - 010526
•  The Internet Topology Zoo - AboveNet
•  The Internet Topology Zoo - AGIS

Web Graph
•  California Web Graph
•  EPA Web Graph
•  EuroSiS web mapping study
•  Web Graph of Berkeley and Stanford

Collaboration Graph
•  Condensed Matter collaboration network
•  Condensed Matter collaborations 1999
•  Condensed Matter collaborations 2003
•  Condensed Matter collaborations 2005
•  CS PhD supervision relation graph
•  Erdos Collaboration Network
•  General Relativity and Quantum Cosmology collaboration network
•  High-Energy Theory Collaboration Network 2001
•  High-Energy Theory Collaboration Network 2003
•  Network Science Collaboration
•  Phenomenology Collaboration Network

Page 26: CINET: A CyberInfrastructure for Network Science

Social, Proximity and Infrastructure Networks

Social and Proximity Networks
•  Dolphins' Social Network in NZ
•  Brightkite Friendship Network
•  Enron Email Data with Manager-Subordinate Relationship Metadata
•  Enron Email Network
•  Enron Giant Component
•  Epinions Social Network
•  Giant Component of Brightkite Network
•  Giant Component of Epinions Networks
•  Giant Component of Gowalla Network
•  Giant Component of Max Planck's Facebook Network
•  Giant Component of Slashdot0811 Network
•  Giant Component of Slashdot0902 Network
•  Gowalla Friendship Network
•  Hypertext 2009 dynamic contact network
•  Hyves Social Network
•  Infectious SocioPatterns - 2009-04-28
•  Infectious SocioPatterns - 2009-04-29
•  Karate network
•  LiveJournal Social Network
•  Max Planck - Flickr Social Network
•  Miami Chung-Lu
•  Miami Contact Network
•  Portland Contact Network
•  Primary School Cumulative Networks 1
•  Primary School Cumulative Networks 2
•  Seattle Contact Network
•  Slashdot Social Network 2008
•  Slashdot Social Network 2009
•  Youtube Social Network

Road/Transport/Infrastructure Networks
•  Airlines
•  California transportation network
•  Pennsylvania transportation network
•  Texas transportation network
•  US Air Lines
•  US Power Grid
•  Western States Power Grid

Page 27: CINET: A CyberInfrastructure for Network Science

List of Networks (Contd.)

Biological Networks
•  C. Elegans Neural Network
•  Yeast PPI network

Co-appearance/co-purchase Networks
•  Les Miserables
•  Network Glossary
•  Politics books
•  Word adjacencies

Games/Sports Networks
•  American College Football Network
•  Soccer WorldCup'98

Others/misc. Networks
•  Dynamic Java code
•  Small World Network

Page 28: CINET: A CyberInfrastructure for Network Science

Making  Granite  Self-­‐Sustainable:  Concept  of  Services  and  Apps  

Page 29: CINET: A CyberInfrastructure for Network Science

User  Management  

•  A user can request an account. The account is operational only after an admin activates it.
•  An admin can activate or deactivate accounts.
•  A user can change their password.
•  All the entities (Networks, Measures, Generators, Analyses) have owners.

Page 30: CINET: A CyberInfrastructure for Network Science

User  Management  

Page 31: CINET: A CyberInfrastructure for Network Science

Add Network

•  A user can add a network by uploading a network file
•  The uploaded network is validated
•  For valid networks, the numbers of edges and nodes are calculated automatically
•  Networks are converted into .gph and .nx formats
•  The user can specify metadata for the uploaded network
•  The user can specify whether the network is
–  Public: available to all users for analysis
–  Private: available only to the owner, which is the default option
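The validate-then-count step can be sketched as follows. The input format assumed here (one whitespace-separated pair of node IDs per line, `#` for comments, undirected edges) is an assumption for illustration; the slides do not specify the upload format or the .gph and .nx layouts, and `parse_edge_list` is a hypothetical name.

```python
def parse_edge_list(text):
    """Validate a simple edge list and return (node_count, edge_count).

    Assumed format: one undirected edge per line as two whitespace-separated
    node IDs; blank lines and lines starting with '#' are ignored.
    """
    nodes, edges = set(), set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(
                f"line {line_no}: expected two node IDs, got {len(parts)} fields"
            )
        u, v = parts
        nodes.update((u, v))
        edges.add((min(u, v), max(u, v)))  # store each undirected edge once
    return len(nodes), len(edges)


sample = "a b\nb c\n# a comment line\nb a\n"
node_count, edge_count = parse_edge_list(sample)
print(node_count, edge_count)
```

A malformed line raises `ValueError` with the offending line number, which is the kind of feedback an upload form could show to the user.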

Page 32: CINET: A CyberInfrastructure for Network Science

Add  Network  

Page 33: CINET: A CyberInfrastructure for Network Science

Visualization

•  The CINETViz app is fully integrated in Granite.
•  A user can submit a visualization job for a network.
•  The visualization process is scalable and abstracted from the backend through middleware (blackboard and brokers).
•  Once a visualization job is completed, the user can view and download the generated visualization.
•  Visualization has 2 user interfaces in Granite
–  Quick view while selecting a network for analysis
–  Detailed view in the Visualization tab

Page 34: CINET: A CyberInfrastructure for Network Science

Features – Visualization

Page 35: CINET: A CyberInfrastructure for Network Science

Visualization of Networks (Contd.)

[Figures: Karate Club Network; Miami Graph]

Page 36: CINET: A CyberInfrastructure for Network Science

Visualization of Networks (Contd.)

[Figure: Amazon Co-purchase Network]

Page 37: CINET: A CyberInfrastructure for Network Science

CINET website

•  Central location of CINET
•  Portal for course materials
•  Web address: http://www.vbi.vt.edu/ndssl/cinet

Page 38: CINET: A CyberInfrastructure for Network Science

Graph Dynamical Systems Calculator (GDSC): Overview

•  Provides a web application that enables users to compute dynamics for their systems.
•  Evaluates arbitrary (small) graphs, a range of vertex functions, and update schemes.
•  GDSC is an application in CINET.
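The three ingredients listed for GDSC (a graph, vertex functions, and an update scheme) can be illustrated with a toy example. The sketch below uses a threshold vertex function on a three-node path and compares a synchronous update with a sequential one. It is a generic illustration of a graph dynamical system with hypothetical function names, not GDSC code.

```python
def threshold_function(k):
    """Vertex function: next state is 1 if at least k states in the closed
    neighborhood (the vertex itself plus its neighbors) are 1, else 0."""
    def f(neighborhood_states):
        return 1 if sum(neighborhood_states) >= k else 0
    return f


def synchronous_step(adj, state, vertex_fn):
    """All vertices update at once, each reading the old state."""
    return {v: vertex_fn([state[v]] + [state[u] for u in adj[v]]) for v in adj}


def sequential_step(adj, state, vertex_fn, order):
    """Vertices update one at a time in `order`, seeing earlier updates."""
    new_state = dict(state)
    for v in order:
        new_state[v] = vertex_fn([new_state[v]] + [new_state[u] for u in adj[v]])
    return new_state


# Path graph 0 - 1 - 2 with only vertex 0 initially in state 1.
adj = {0: [1], 1: [0, 2], 2: [1]}
state = {0: 1, 1: 0, 2: 0}
f = threshold_function(1)

sync_result = synchronous_step(adj, state, f)
seq_result = sequential_step(adj, state, f, order=[0, 1, 2])
print(sync_result, seq_result)
```

With the same graph and vertex function, the synchronous step activates only vertex 1, while the sequential step in order 0, 1, 2 also activates vertex 2, which shows why the update scheme is part of the system's definition.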

Page 39: CINET: A CyberInfrastructure for Network Science

Future Work

•  Add graph modification algorithms
–  Remove edges
–  Swap edges
•  Add data model to manage system workflow
•  Domain-specific language
•  Registry service

Page 40: CINET: A CyberInfrastructure for Network Science

Digital Library to Support Computational Epidemiology Datasets

Page 41: CINET: A CyberInfrastructure for Network Science

Synthetic Information Based Epidemiological Laboratory (SIBEL)

Page 42: CINET: A CyberInfrastructure for Network Science

The Problem

•  Computational epidemiology employs computer models and informatics tools to reason about the spatio-temporal spread of diseases.
•  Studies are conducted, in general, through the use of a simulation and require information on the population structure, agent behavior, disease transmission, and a model of the disease.
•  The heterogeneous content includes metadata, text, tables, spreadsheets, experimental descriptions, and large result files.

Page 43: CINET: A CyberInfrastructure for Network Science

NDSSL’s networked epidemiology data repository

Category                     Data                                  Size       Representation
Synthetic Population         Household, Person Activity            566 GB     Relational
Social Network and Output    Contact Network, Simulation Output    1.84 TB    File
Experiment                   Experiment                            240 GB     Relational

Page 44: CINET: A CyberInfrastructure for Network Science

The Problem (cont.)

•  Data access and digital library services in current setups are cumbersome due to heterogeneity and fragmentation across datasets.
•  There is no accepted framework that allows unified access to such content.
•  The diversity of models, data sources, data representations, and modalities that are collected, used, and modified motivates the development of a digital library (DL) framework to support computational epidemiology.
•  We propose a data mapping framework for digital library systems for computational epidemiology datasets.
•  The proposed framework provides a unified view to access and query complete epidemiology workflow data.

Page 45: CINET: A CyberInfrastructure for Network Science

Unified  View  to  Access  and  Query  Complete  Epidemiology  Workflow  Data  

Page 46: CINET: A CyberInfrastructure for Network Science

Resource Description Framework (RDF)

•  Directed labeled graphs
•  Model elements
–  Resource: the things being described by RDF expressions
–  Property: a specific aspect, characteristic, attribute, or relation used to describe a resource; each property takes a value
–  Statement: a statement in RDF consists of resource + property + value, also called subject + predicate + object

Page 47: CINET: A CyberInfrastructure for Network Science

RDF Example

•  For the statement “Shamimul Hasan is the creator of the web page www.vt.edu/~shasan2”:
•  We have the RDF statement:
–  Subject (resource): www.vt.edu/~shasan2
–  Predicate (property): creator
–  Object (literal): “Shamimul Hasan”
•  Node and arc diagram:
–  www.vt.edu/~shasan2 --creator--> “Shamimul Hasan”
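The subject, predicate, and object of this example can be written down as a plain tuple and serialized as one N-Triples line. In the sketch below, the `http://` scheme on the subject and the Dublin Core `creator` URI for the predicate are assumptions added for illustration (the slide only says "creator"), and `to_ntriples` is a hypothetical helper, not part of any RDF library.

```python
# Assumed predicate URI: the Dublin Core "creator" element.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"


def to_ntriples(subject_uri, predicate_uri, literal):
    """Serialize one (subject, predicate, literal object) statement as N-Triples."""
    # Backslashes and double quotes must be escaped inside a literal.
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject_uri}> <{predicate_uri}> "{escaped}" .'


# subject (resource), predicate (property), object (literal)
triple = ("http://www.vt.edu/~shasan2", DC_CREATOR, "Shamimul Hasan")
line = to_ntriples(*triple)
print(line)
```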

Page 48: CINET: A CyberInfrastructure for Network Science

Framework

•  Data mapping gives us the flexibility to switch between various databases and execute queries on them.

Page 49: CINET: A CyberInfrastructure for Network Science

Experimental  Study  

•  We considered a real-time epidemiology simulation study conducted in the Seattle area. The study assumed that influenza transmits in various regional populations through person-to-person contact.
•  We use the D2RQ Mapping Language to convert relational and file data to RDF graphs, Virtuoso Open-Source Edition 6.1.6 as the RDF data engine, and the SPARQL query language.

Page 50: CINET: A CyberInfrastructure for Network Science

Experimental Study (cont.)

Database                        RDF Graph Size (GB)    Number of Triples    RDF Graph Generation Time (Minutes)
Seattle Synthetic Population    177                    661,848,662          317
Output                          3.10                   12,979,996           6
Experiment                      0.01                   66,654               0.37

Page 51: CINET: A CyberInfrastructure for Network Science

Experimental Study (cont.)

SPARQL query runtime in seconds:

Query                                                            Bottom-up Approach    Top-down Approach
How many people of a particular demographic are sick?            0.04                  7.18
Find who infected whom of a particular demographic               0.38                  9.18
How many people get infected on a particular simulation day?     0.03                  5.76

Page 52: CINET: A CyberInfrastructure for Network Science

Reference

•  Sherif Hanie El Meligy Abdelhamid, Md. Maksudul Alam, Richard Aló, Shaikh Arifuzzaman, Peter H. Beckman, Tirtha Bhattacharjee, Md Hasanuzzaman Bhuiyan, Keith R. Bisset, Stephen Eubank, Albert C. Esterline, Edward A. Fox, Geoffrey Fox, S. M. Shamimul Hasan, Harshal Hayatnagarkar, Maleq Khan, Chris J. Kuhlman, Madhav V. Marathe, Natarajan Meghanathan, Henning S. Mortveit, Judy Qiu, S. S. Ravi, Zalia Shams, Ongard Sirisaengtaksin, Samarth Swarup, Anil Kumar S. Vullikanti, Tak-Lon Wu: CINET 2.0: A CyberInfrastructure for Network Science. eScience 2014: 324-331
•  S. M. Shamimul Hasan, Sandeep Gupta, Edward A. Fox, Keith R. Bisset, Madhav V. Marathe: Data mapping framework in a digital library with computational epidemiology datasets. JCDL 2014: 449-450
•  S. M. Shamimul Hasan, Keith R. Bisset, Edward A. Fox, Kevin Hall, Jonathan Leidig, Madhav V. Marathe: An Extensible Digital Library Service to Support Network Science. ICCS 2013: 419-428
•  Sherif Elmeligy Abdelhamid, Richard Aló, S. M. Arifuzzaman, Peter H. Beckman, Md Hasanuzzaman Bhuiyan, Keith R. Bisset, Edward A. Fox, Geoffrey Charles Fox, Kevin Hall, S. M. Shamimul Hasan, Anurodh Joshi, Maleq Khan, Chris J. Kuhlman, Spencer J. Lee, Jonathan Leidig, Hemanth Makkapati, Madhav V. Marathe, Henning S. Mortveit, Judy Qiu, S. S. Ravi, Zalia Shams, Ongard Sirisaengtaksin, Rajesh Subbiah, Samarth Swarup, Nick Trebon, Anil Vullikanti, Zhao Zhao: CINET: A cyberinfrastructure for network science. eScience 2012: 1-8
•  Resource Description Framework (RDF), developed by the World Wide Web Consortium (W3C): http://bit.ly/1aXP5k2

Page 53: CINET: A CyberInfrastructure for Network Science

Student Activity

•  Please visit the Granite website: http://ndssl.vbi.vt.edu/apps/cinet/
•  Launch App
•  Login
–  Username: demo
–  Password: demo1234
•  Start a New Analysis with the “Karate” network and the “PageRank” measure.
•  Check the analysis report.

Page 54: CINET: A CyberInfrastructure for Network Science

Many  Thanks!  

         

Page 55: CINET: A CyberInfrastructure for Network Science

Additional Slides

Page 56: CINET: A CyberInfrastructure for Network Science

Extensible Memoization Service

•  Query a set of digital objects that exactly match a metadata pattern
•  Utilization
–  Education: students
–  Baseline scenarios
–  Comparisons, body base, similar regions

 

Page 57: CINET: A CyberInfrastructure for Network Science

Architecture  

Page 58: CINET: A CyberInfrastructure for Network Science

Architecture  (Cont.)  

Page 59: CINET: A CyberInfrastructure for Network Science

Architecture  (Cont.)  

Page 60: CINET: A CyberInfrastructure for Network Science

Network Category

•  Small: |G| < 100,000
–  Example: RND-G(n,p) Random Graph 1 (nodes: 1,000, edges: 4,971)
•  Medium: 100,000 ≤ |G| < 10,000,000
–  Example: RND-G(n,p) Random Graph 500 (nodes: 500,000, edges: 5.00E+06)
•  Large: |G| ≥ 10,000,000
–  Example: Seattle contact network (nodes: 3,207,037, edges: 8.66E+07)
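The three size categories translate directly into a threshold check. The slide does not say how |G| is measured; the sketch below assumes it is the edge count, which is consistent with all three listed examples, and `network_category` is a hypothetical name.

```python
def network_category(size):
    """Classify a network by |G| using the thresholds from the slide.

    Assumption: `size` is the number of edges; the slide does not define |G|.
    """
    if size < 100_000:
        return "small"
    if size < 10_000_000:
        return "medium"
    return "large"


# Edge counts of the three examples given on the slide.
examples = {
    "RND-G(n,p) Random Graph 1": 4_971,
    "RND-G(n,p) Random Graph 500": 5_000_000,
    "Seattle contact network": 86_600_000,
}
for name, edge_count in examples.items():
    print(name, network_category(edge_count))
```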

Page 61: CINET: A CyberInfrastructure for Network Science

Performance

•  Shadowfax (Virginia Tech)
•  912 cores, 5 TB RAM, 80 TB storage, 7168 CUDA cores
•  100+ networks
•  100+ measures

Page 62: CINET: A CyberInfrastructure for Network Science

Performance