CI Report for NSF Visit

Building a Scalable, Cost-effective Cyberinfrastructure for Multidisciplinary Scientific Research in Big Data Era. Students: Xiangron Ma and Zhao Fu. Advisors: Yingtao Jiang, Mei Yang. University of Nevada, Las Vegas, Department of Electrical and Computer Engineering.

Upload: zhao-fu

Post on 23-Jan-2018


Page 1: CI Report for NSF Visit

Building a Scalable, Cost-effective Cyberinfrastructure for Multidisciplinary Scientific Research in Big Data Era

Students: Xiangron Ma and Zhao Fu

Advisors: Yingtao Jiang, Mei Yang

University of Nevada, Las Vegas

Department of Electrical and Computer Engineering

Page 2: CI Report for NSF Visit

Overview

• Infrastructure

• Distributed Storage System

• Connectivity

• Processing and Visualization Framework

• User Interface

• Research Results and Plan

• Education/Outreach/Broader Impact

Page 3: CI Report for NSF Visit

Infrastructure

• Hardware

• System Performance

• Network Architecture

• Virtualization Resources

Page 4: CI Report for NSF Visit

Hardware

Unit (Qty.): Details (per unit)

Head node (Qty. 1)
• Processor: 2x 6-core 2.4 GHz Xeon
• RAM: 64 GB
• Disk: 2x 1.2 TB SAS
• Network: 2x 10 GE, 1x 10 GE QSFP+

Compute/Storage node (Qty. 10)
• Processor: 2x quad-core 2.5 GHz Xeon
• RAM: 32 GB
• Disk: 4x 1 TB SATA, 1x 150 GB SAS
• Network: 2x 10 GE

Ethernet switch (Qty. 1)
• 48-port 10 GE RJ45
• 4-port 40 GE QSFP+

Misc. (Qty. N/A)
• Enclosure, Ethernet cables, accessories, KVM, labor

Table 1 Hardware Specification
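The per-unit figures in Table 1 can be rolled up into cluster-wide totals. The sketch below is only an illustration of that arithmetic: it assumes 1 TB = 1000 GB, counts raw disk capacity before any replication or filesystem overhead, and uses dictionary keys that are invented here rather than taken from the slides.

```python
# Per-unit figures copied from Table 1; key names are illustrative only.
NODES = [
    {"unit": "head node", "qty": 1, "sockets": 2, "cores_per_socket": 6,
     "ram_gb": 64, "disks_gb": [1200, 1200]},
    {"unit": "compute/storage node", "qty": 10, "sockets": 2, "cores_per_socket": 4,
     "ram_gb": 32, "disks_gb": [1000, 1000, 1000, 1000, 150]},
]


def cluster_totals(nodes):
    """Sum cores, RAM, and raw disk capacity over all units."""
    totals = {"cores": 0, "ram_gb": 0, "raw_disk_gb": 0}
    for node in nodes:
        qty = node["qty"]
        totals["cores"] += qty * node["sockets"] * node["cores_per_socket"]
        totals["ram_gb"] += qty * node["ram_gb"]
        totals["raw_disk_gb"] += qty * sum(node["disks_gb"])
    return totals


if __name__ == "__main__":
    print(cluster_totals(NODES))
```

Under these assumptions the cluster comes to 92 cores, 384 GB of RAM, and 43,900 GB of raw disk.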

Page 5: CI Report for NSF Visit

System Performance

• System Performance Evaluation

Fig. 1. Benchmark results (Parsec.Bodytrack). [Chart: Base vs. Opt at 1, 2, 4, 8, and 16 cores; vertical axis from 0 to 9.]

Fig. 2. Benchmark results (Matrix Multiplication).

Page 6: CI Report for NSF Visit

Network Architecture

Nodes

• Namenode (n1): Interface 1: 192.168.1.3/28

• Controller/Network Node (n2): Interface 1: 10.1.1.2/24; Interface 2: 10.1.1.102/24; IPMI Interface: 10.1.0.102/23

• Compute/Network Node (n3): Interface 1: 10.1.1.3/24; Interface 2: 10.1.1.103/24; IPMI Interface: 10.1.0.103/23

• ...

• Compute/Network Node (n11): Interface 1: 10.1.1.11/24; Interface 2: 10.1.1.111/24; IPMI Interface: 10.1.0.111/23

Networks

• External Network: 192.168.0.0/28 (Internet, Firewall, NAT)

• Management Network: 10.1.1.1 – 10.1.1.11

• Tunnel Network: 10.1.1.102 – 10.1.1.111

• IPMI Network: 10.1.0.101 – 10.1.0.111

• Gateway: 10.1.0.1/23

Unattributed label: Interface 2: 10.1.1.2/23

Fig. 3. Network Architecture in NRDC
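The addresses shown for n2, n3, and n11 appear to follow a regular pattern: node k uses 10.1.1.k on the management network, 10.1.1.(100+k) on the tunnel network, and 10.1.0.(100+k) for IPMI. The sketch below encodes that inferred pattern with Python's standard `ipaddress` module. The function names are hypothetical, and the rule is an assumption read off the figure labels, not a configuration taken from the cluster.

```python
import ipaddress

# Base addresses inferred from the ranges listed in Fig. 3 (assumption).
MGMT_BASE = ipaddress.ip_address("10.1.1.0")
IPMI_BASE = ipaddress.ip_address("10.1.0.0")
# The gateway 10.1.0.1/23 implies this enclosing network.
GATEWAY_NET = ipaddress.ip_network("10.1.0.0/23")


def node_addresses(k):
    """Return the inferred addresses for node n<k>, for k in 2..11."""
    if not 2 <= k <= 11:
        raise ValueError("pattern is only shown for nodes n2 through n11")
    return {
        "management": str(MGMT_BASE + k),
        "tunnel": str(MGMT_BASE + 100 + k),
        "ipmi": str(IPMI_BASE + 100 + k),
    }


def in_gateway_net(address):
    """Check whether an address falls inside the gateway's /23 network."""
    return ipaddress.ip_address(address) in GATEWAY_NET


if __name__ == "__main__":
    for k in (2, 3, 11):
        print(f"n{k}", node_addresses(k))
```

All three address families fall inside 10.1.0.0/23, which is consistent with a single gateway at 10.1.0.1 serving them.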

Page 7: CI Report for NSF Visit

Distributed Storage System

[Architecture diagram. Labeled components:]

• Virtual Storage Cluster: Instances, VHDFS Volumes, Virtual Storage Namespace

• Physical Storage Cluster: Compute Nodes, HDFS Volumes, Cinder Volumes, Physical Storage Namespace (Name Node)

• Services: Cinder Storage Service, Nova Compute Service, Neutron Network Service

Fig. 4. Distributed Storage System Architecture

Page 8: CI Report for NSF Visit

Virtualization Resources

Fig. 5. Cloud Computing Resources

• OpenStack Dashboard

Page 9: CI Report for NSF Visit

Connectivity

Query APIs

• list_sites(): enumerate all deployed sites (sensor towers and cameras).

• list_properties(site_id): enumerate all monitored properties available at the specified site.

• list_streams(site_id): enumerate the camera streams.

• list_image_sites_names(): list the names of all sites in the storage system.

Transfer APIs

• get_sensor_data(sensor_ids): download sensor data from NRDC; returns a dataframe.

• get_csv(sensor_id): download sensor data and save it as a CSV file.

• import_images(siteid, presets, startdate, enddate, timerange, savedir): download images from the specified direction of one site in the given time range.

Synchronization APIs

• db_sync(): synchronize the entire remote database to HDFS.

Table 2. Connectivity API
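A typical session with the query and transfer calls might first discover sites, then their properties, then pull the data. The sketch below mimics that flow with in-memory stand-ins so it can run without the NRDC service. Only the function names and parameters come from Table 2; the site names, the "site/property" sensor-id format, the readings, and the `discover_and_fetch` helper are invented for illustration, and the stand-in returns a list of dicts where the real `get_sensor_data` is said to return a dataframe.

```python
# Invented sample data standing in for the remote NRDC service.
_FAKE_SITES = {
    "site-a": ["air_temperature", "wind_speed"],
    "site-b": ["air_temperature"],
}
_FAKE_READINGS = {
    "site-a/air_temperature": [("2018-01-01T00:00", 3.5), ("2018-01-01T01:00", 3.1)],
    "site-b/air_temperature": [("2018-01-01T00:00", 5.0)],
}


def list_sites():
    """Stand-in for list_sites(): return the known site ids."""
    return sorted(_FAKE_SITES)


def list_properties(site_id):
    """Stand-in for list_properties(site_id)."""
    return list(_FAKE_SITES[site_id])


def get_sensor_data(sensor_ids):
    """Stand-in for get_sensor_data(sensor_ids); returns rows as dicts."""
    rows = []
    for sensor_id in sensor_ids:
        for timestamp, value in _FAKE_READINGS.get(sensor_id, []):
            rows.append({"sensor_id": sensor_id, "timestamp": timestamp, "value": value})
    return rows


def discover_and_fetch(property_name):
    """Hypothetical helper: fetch one property from every site that offers it."""
    sensor_ids = [
        f"{site}/{property_name}"
        for site in list_sites()
        if property_name in list_properties(site)
    ]
    return get_sensor_data(sensor_ids)


if __name__ == "__main__":
    for row in discover_and_fetch("air_temperature"):
        print(row)
```

Keeping discovery (query calls) separate from download (transfer calls) lets a notebook user inspect what is available before moving any data.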

Page 10: CI Report for NSF Visit

Processing and Visualization Framework

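[Framework diagram. Labeled components:]

• Services: Storage Service, Connectivity Service, Processing & Visualization Service

• Storage and cluster computing: Hadoop, Hive, Apache Spark (cluster)

• Connectivity: Connectivity API (REST Client)

• Libraries: Domain Specific Libraries, Numpy, Scipy, Sympy, SKLearn, GNU Octave, R

• Visualization Libraries (Matplotlib, ggplot, plotly, etc.)

• Front end: iPython Kernel, iPython Notebook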

Fig. 6. Processing and Visualization Framework

Page 11: CI Report for NSF Visit

User Interface

• IPython Notebook

– Code execution

– Rich text

– Visualization

– Rich media

– Documentation

Fig. 7. User Interface

Page 12: CI Report for NSF Visit

Research Results and Plan

• Research

– Publications

• Published 2 journal papers and 3 conference papers

• Submitted 1 conference paper; 2 journal papers in preparation

– Proposals

• Submitted two NSF grant proposals and one proposal to Toyota

• One proposal to DoD, in collaboration with a Nexus researcher, is in preparation

– Future plan

• Service engagement

• Further optimization/improvement

• CI-enabled data mining

Page 13: CI Report for NSF Visit

Education/Outreach/Broader Impact

• Results

– Hosted two CI workshops to train Nexus researchers and UNLV students

– Trained more than 35 graduate students

– Engaged 3 undergraduate students in development work

• Future plan

– Develop more workshops/tutorials for Nexus researchers

– Engage more Nexus researchers to use the CI node in their research work

– Better user management

– Public user support

Page 14: CI Report for NSF Visit

Demo