CI Report for NSF Visit
Building a Scalable, Cost-effective Cyberinfrastructure for Multidisciplinary Scientific Research in the Big Data Era
Students: Xiangron Ma and Zhao Fu
Advisors: Yingtao Jiang and Mei Yang
University of Nevada, Las Vegas
Department of Electrical and Computer Engineering
Overview
• Infrastructure
• Distributed Storage System
• Connectivity
• Processing and Visualization Framework
• User Interface
• Research Results and Plan
• Education/Outreach/Broader Impact
Infrastructure
• Hardware
• System Performance
• Network Architecture
• Virtualization Resources
Hardware
• Head node (qty. 1): processors 2X 6-core 2.4 GHz Xeon; RAM 64 GB; disks 2X 1.2 TB SAS; network 2X 10 GE, 1X 10 GE QSFP+
• Compute/storage node (qty. 10): processors 2X quad-core 2.5 GHz Xeon; RAM 32 GB; disks 4X 1 TB SATA, 1X 150 GB SAS; network 2X 10 GE
• Ethernet switch (qty. 1): 48-port 10 GE RJ45, 4-port 40 GE QSFP+
• Misc. (qty. N/A): enclosure, Ethernet cables, accessories, KVM, labor
Table 1. Hardware Specification (details per unit)
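Adding up the per-unit figures in Table 1 gives the cluster's aggregate capacity. The Python sketch below only does that arithmetic. It counts physical cores (no hyper-threading) and raw disk before any HDFS replication or RAID, treats 150 GB as 0.15 TB (decimal units), and uses a dictionary layout chosen purely for illustration.

```python
# Per-unit figures copied from Table 1 (decimal TB; 150 GB = 0.15 TB).
# Tuple layout is an illustrative choice: (quantity, cores, ram_gb, disk_tb)
NODES = {
    "head": (1, 2 * 6, 64, 2 * 1.2),
    "compute_storage": (10, 2 * 4, 32, 4 * 1.0 + 0.15),
}


def cluster_totals(nodes):
    """Return (physical cores, RAM in GB, raw disk in TB) summed over all units."""
    cores = sum(qty * c for qty, c, _, _ in nodes.values())
    ram_gb = sum(qty * r for qty, _, r, _ in nodes.values())
    disk_tb = sum(qty * d for qty, _, _, d in nodes.values())
    # Round away floating-point noise from the decimal TB figures.
    return cores, ram_gb, round(disk_tb, 2)


print(cluster_totals(NODES))  # (92, 384, 43.9)
```

The head node and the ten compute/storage nodes together provide 92 physical cores, 384 GB of RAM, and about 43.9 TB of raw disk.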
System Performance
(Chart: Base vs. Opt on 1, 2, 4, 8, and 16 cores)
Fig. 1. Benchmark results (PARSEC bodytrack)
• System Performance Evaluation
Fig. 2. Benchmark results (Matrix Multiplication)
Network Architecture
(Diagram: Internet, firewall, and NAT in front of the cluster nodes)
Networks
• Management Network: 10.1.1.1 – 10.1.1.11
• Tunnel Network: 10.1.1.102 – 10.1.1.111
• External Network: 192.168.0.0/28
• IPMI Network: 10.1.0.101 – 10.1.0.111
• Gateway: 10.1.0.1/23
Nodes
• Namenode (n1): Interface 1 192.168.1.3/28
• Controller/Network Node (n2): Interface 1 10.1.1.2/24; Interface 2 10.1.1.102/24; IPMI Interface 10.1.0.102/23
• Compute/Network Node (n3): Interface 1 10.1.1.3/24; Interface 2 10.1.1.103/24; IPMI Interface 10.1.0.103/23
• ...
• Compute/Network Node (n11): Interface 1 10.1.1.11/24; Interface 2 10.1.1.111/24; IPMI Interface 10.1.0.111/23
• Also labeled in the diagram: Interface 2 10.1.1.2/23
Fig. 3. Network Architecture in NRDC
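The addresses in Fig. 3 follow a regular per-node pattern for n2, n3, and n11: Interface 1 is 10.1.1.N/24 (management range), Interface 2 is 10.1.1.(100+N)/24 (tunnel range), and the IPMI interface is 10.1.0.(100+N)/23. The Python sketch below encodes that pattern with the standard ipaddress module. The helper name node_addresses is hypothetical, and the assumption that the pattern holds for every node from n2 to n11 is inferred from the figure. It is not a documented NRDC convention.

```python
import ipaddress

# Prefixes as labeled in Fig. 3.
NODE_NET = ipaddress.ip_network("10.1.1.0/24")  # Interface 1 and Interface 2 (/24)
IPMI_NET = ipaddress.ip_network("10.1.0.0/23")  # IPMI interfaces (/23)


def node_addresses(n: int) -> dict:
    """Hypothetical helper: assumed addresses of node n (2 <= n <= 11).

    The pattern is inferred from nodes n2, n3 and n11 in Fig. 3 (assumption).
    """
    if not 2 <= n <= 11:
        raise ValueError("pattern inferred only for nodes n2 to n11")
    return {
        "interface1": ipaddress.ip_interface(f"10.1.1.{n}/24"),        # management range
        "interface2": ipaddress.ip_interface(f"10.1.1.{100 + n}/24"),  # tunnel range
        "ipmi": ipaddress.ip_interface(f"10.1.0.{100 + n}/23"),
    }


for node in (2, 3, 11):
    addrs = node_addresses(node)
    # Sanity check: each address lies inside the prefix it is labeled with.
    assert addrs["interface1"].ip in NODE_NET and addrs["ipmi"].ip in IPMI_NET
    print(f"n{node}: " + ", ".join(str(a) for a in addrs.values()))
```

ipaddress also makes the prefix layout easy to inspect. For example, 10.1.0.0/23 spans 10.1.0.0 through 10.1.1.255, so the IPMI prefix contains the 10.1.1.0/24 management and tunnel addresses (NODE_NET.subnet_of(IPMI_NET) is True).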
Distributed Storage System
(Diagram components. Virtual Storage Cluster: instances, each with a VHDFS volume, under a Virtual Storage Namespace. Physical Storage Cluster: compute nodes, each providing an HDFS volume and a Cinder volume, under a Physical Storage Namespace (Name Node). Supporting OpenStack services: Nova Compute Service, Neutron Network Service, Cinder Storage Service.)
Fig. 4. Distributed Storage System Architecture
Virtualization Resources
Fig. 5. Cloud Computing Resources
• OpenStack Dashboard
Connectivity
Query APIs
• list_sites(): enumerate all deployed sites (sensor towers and cameras).
• list_properties(site_id): enumerate all monitored properties available at the specified site.
• list_streams(site_id): enumerate the camera streams at the specified site.
• list_image_sites_names(): list the names of all sites in the storage system.
Transfer APIs
• get_sensor_data(sensor_ids): download sensor data from NRDC; returns a DataFrame.
• get_csv(sensor_id): download sensor data and save it as a CSV file.
• import_images(siteid, presets, startdate, enddate, timerange, savedir): download images taken from the specified direction (presets) of one site within the given time range.
Synchronization APIs
• db_sync(): synchronize the entire remote database to HDFS.
Table 2. Connectivity API
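Table 2 gives each Connectivity API call only a name and a purpose. The Python sketch below shows one way such a REST client could be structured. Using only the standard library, it implements list_sites, list_properties, list_streams, get_sensor_data, and get_csv over HTTP. All of the following are assumptions for illustration, not the actual NRDC interface: the base URL, the endpoint paths (sites/…, sensors/…/data), the JSON response shape, and the default CSV file name. The transport is injectable, so the client can be exercised offline.

```python
import csv
import json
from typing import Callable, List, Optional
from urllib.parse import quote
from urllib.request import urlopen

Fetch = Callable[[str], bytes]


def http_fetch(url: str) -> bytes:
    """Default transport: plain HTTP GET returning the raw response body."""
    with urlopen(url) as resp:
        return resp.read()


class ConnectivityClient:
    """Sketch of a REST client for some of the Table 2 calls.

    ASSUMPTION: every endpoint path below is a hypothetical placeholder;
    the real NRDC URLs and JSON payloads are not given in this report.
    """

    def __init__(self, base_url: str, fetch: Optional[Fetch] = None):
        self.base_url = base_url.rstrip("/")
        self._fetch = fetch or http_fetch  # injectable for offline testing

    def _get_json(self, path: str):
        return json.loads(self._fetch(f"{self.base_url}/{path}"))

    # Query APIs
    def list_sites(self) -> list:
        return self._get_json("sites")

    def list_properties(self, site_id) -> list:
        return self._get_json(f"sites/{quote(str(site_id))}/properties")

    def list_streams(self, site_id) -> list:
        return self._get_json(f"sites/{quote(str(site_id))}/streams")

    # Transfer APIs
    def get_sensor_data(self, sensor_ids: List[str]) -> List[dict]:
        # The report's version returns a DataFrame; this stdlib sketch returns
        # row dicts, which pandas.DataFrame(rows) could wrap.
        rows: List[dict] = []
        for sid in sensor_ids:
            rows.extend(self._get_json(f"sensors/{quote(str(sid))}/data"))
        return rows

    def get_csv(self, sensor_id, path: Optional[str] = None) -> str:
        # ASSUMPTION: the output file name defaults to "<sensor_id>.csv", and
        # all rows share the fields of the first row.
        path = path or f"{sensor_id}.csv"
        rows = self.get_sensor_data([sensor_id])
        with open(path, "w", newline="") as f:
            if rows:
                writer = csv.DictWriter(f, fieldnames=list(rows[0]))
                writer.writeheader()
                writer.writerows(rows)
        return path
```

import_images() and db_sync() are not sketched because they depend on the image store and HDFS layout, and the report does not describe either. To test the client without a server, inject a fake transport, e.g. ConnectivityClient("http://nrdc.example/api", fetch=lambda url: b"[]").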
Processing and Visualization Framework
(Diagram layers: Storage Service, Connectivity Service, Processing & Visualization Service. Components: Hadoop; Hive; Apache Spark (cluster); Connectivity API (REST client); domain-specific libraries (GNU Octave, R, SKLearn, SciPy, SymPy, NumPy); iPython kernel; visualization libraries (Matplotlib, ggplot, plotly, etc.); iPython Notebook.)
Fig. 6. Processing and Visualization Framework
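A common processing step in a notebook-plus-Spark stack like Fig. 6 is aggregating sensor readings before plotting them. The sketch below computes a per-sensor daily mean in plain Python, so it runs without a cluster. It stands in for a Spark job and is not taken from the report. The row field names (sensor_id, timestamp, value) and the sample rows are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def daily_means(rows: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Mean reading per (sensor_id, ISO date).

    Field names sensor_id, timestamp (ISO 8601) and value are hypothetical.
    """
    # "Map" each row to a key plus a running (sum, count), then "reduce" by key,
    # mirroring a Spark map + reduceByKey pipeline on a single machine.
    sums: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for row in rows:
        day = datetime.fromisoformat(row["timestamp"]).date().isoformat()
        key = (row["sensor_id"], day)
        sums[key] += float(row["value"])
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up sample rows for illustration only.
readings = [
    {"sensor_id": "t1", "timestamp": "2016-05-01T10:00:00", "value": 2},
    {"sensor_id": "t1", "timestamp": "2016-05-01T14:00:00", "value": 4},
    {"sensor_id": "t1", "timestamp": "2016-05-02T09:30:00", "value": 1},
]
print(daily_means(readings))
# {('t1', '2016-05-01'): 3.0, ('t1', '2016-05-02'): 1.0}
```

On the Spark cluster, the same logic maps onto RDD.map, which emits ((sensor, day), (value, 1)) pairs, followed by reduceByKey to add them. The reduction carries (sum, count) pairs rather than averaging partial means. This keeps the result correct when partitions hold different numbers of readings.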
User Interface
• IPython Notebook
– Code execution
– Rich text
– Visualization
– Rich media
– Documentation
Fig. 7. User Interface
Research Results and Plan
• Research
  – Publications
    • Published 2 journal papers and 3 conference papers
    • Submitted 1 conference paper; 2 journal papers in preparation
  – Proposals
    • Submitted two NSF grant proposals and one proposal to Toyota
    • One proposal to DoD, in collaboration with a Nexus researcher, is under preparation
  – Future plan
    • Service engagement
    • Further optimization/improvement
    • CI-enabled data mining
Education/Outreach/Broader Impact
• Results
  – Hosted two CI workshops to train Nexus researchers and UNLV students
  – Trained more than 35 graduate students
  – Engaged 3 undergraduate students in development work
• Future plan
  – Develop more workshops/tutorials for Nexus researchers
  – Engage more Nexus researchers to use the CI node in their research work
  – Improve user management
  – Add support for public users
Demo