Revenue earned from students in USA
Jongwook Woo
HiPIC
CSULA
Revenue & Employment Analysis of International Students in the USA
A CIS 528 Project by:
Priyanka Kale, Apekshit Bhingardive, Aditya Verma, Prof. Jongwook Woo
High Performance Information Computing Center, Jongwook Woo
CSULA
Contents
Introduction
System Development Cycle
Requirement Analysis
Design
Implementation
Results/Visualization
References
Purpose
To develop a system that helps determine the revenue generated by international students.
To examine the relationship between new international enrollments and institutional income at public colleges, universities and professional organizations in the US.
Adherence to SDLC
Why?
To understand the effect of increased international student enrollment on net revenue generation in the US
To find out the income earned by universities
To predict the impact of international students on revenue generation
To predict employment opportunities in the US
How?
• Basic formula for calculating economic benefit
How?
Estimate the economic benefit, i.e. the total dollars imported from international students, without any multiplier effect
Determine the direct import dollars from international students studying at U.S. institutions of higher education
Continued...
The analysis is specific to each institution's student expenses and to the type of student (undergraduate, graduate, non-degree) reported by each institution.
The analysis is broken down into tuition and fees at specific institutions and a derived living expense, based on the reported institutional living expenses plus estimated incidentals.
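The calculation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's actual formula (which appears only as an image on the original slide): the record layout and the `InstitutionCosts`, `direct_import_dollars` and `economic_benefit` names are hypothetical, and the same derived living expense (reported living expense plus incidentals) is assumed to apply to every student type.

```python
from dataclasses import dataclass


# Hypothetical per-institution record; field names are illustrative only.
@dataclass
class InstitutionCosts:
    name: str
    tuition_and_fees: dict[str, float]  # per student, keyed by student type
    living_expense: float               # reported institutional living expense
    incidentals: float                  # estimated incidentals (assumed flat)
    enrollment: dict[str, int]          # international students by type


def direct_import_dollars(inst: InstitutionCosts) -> float:
    """Tuition and fees plus derived living expense, summed over student types.

    No multiplier effect is applied.
    """
    # Assumption: one derived living expense for all student types.
    living = inst.living_expense + inst.incidentals
    total = 0.0
    for student_type, count in inst.enrollment.items():
        total += count * (inst.tuition_and_fees[student_type] + living)
    return total


def economic_benefit(institutions: list[InstitutionCosts]) -> float:
    """Sum the direct import dollars over all institutions."""
    return sum(direct_import_dollars(i) for i in institutions)
```

Keeping the per-institution step separate mirrors the slide's point that the analysis is institution-specific before anything is aggregated.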
Implementation
The analysis requires processing large data sets, which is done with the Hadoop Distributed File System (HDFS)
Hadoop environment using the Hortonworks Sandbox on Azure
Python and Hive (PyHive) in an IPython Notebook
HUE
Google Fusion Tables
WEKA framework
GitLab and GitHub
Hortonworks Sandbox configuration
Number of nodes: 3; size: Basic A4 (8 cores, 14 GB memory)
Loading data into HDFS:
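Loading usually amounts to a couple of `hdfs dfs` shell commands. A minimal sketch that drives them from Python, assuming the `hdfs` client is on the `PATH` of the sandbox; the helper names and the HDFS directory are hypothetical.

```python
import subprocess


def hdfs_put_commands(local_files, hdfs_dir):
    """Build `hdfs dfs` commands that create hdfs_dir and upload the files."""
    # -p creates parent directories and does not fail if the directory exists
    cmds = [["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]]
    for f in local_files:
        # -f overwrites a file of the same name already in HDFS
        cmds.append(["hdfs", "dfs", "-put", "-f", str(f), hdfs_dir])
    return cmds


def load_into_hdfs(local_files, hdfs_dir):
    """Run the upload commands, stopping at the first failure."""
    for cmd in hdfs_put_commands(local_files, hdfs_dir):
        subprocess.run(cmd, check=True)
```

For example, `load_into_hdfs(["fees.csv"], "/data/ipeds")` would upload one file into a hypothetical `/data/ipeds` directory. Building the command lists separately from running them keeps them easy to inspect.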
Creating tables from the command line
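A table over files already in HDFS can be created with a `CREATE EXTERNAL TABLE` statement passed to `hive -e`. The sketch below generates that DDL from Python; the table name, columns and location used with it are hypothetical, and it assumes comma-separated files with a single header line.

```python
import subprocess


def create_table_ddl(table, columns, hdfs_location):
    """Return HiveQL DDL for an external, comma-delimited text table."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_location}'\n"
        # Assumes each CSV file starts with one header row
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


def run_hive(statement):
    """Run a HiveQL string through the Hive CLI (`hive -e`)."""
    subprocess.run(["hive", "-e", statement], check=True)
```

An external table leaves the files in place, so dropping the table later does not delete the uploaded data.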
Creating tables in HUE from existing data
Continued...
Connecting to Hive through Python
• Using an IPython notebook to write the Python code
• Embedding HiveQL inside the Python code
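Because PyHive exposes a standard Python DB-API 2.0 interface, the query code can be written against any DB-API connection. A minimal sketch, assuming HiveServer2 listens on its default port 10000; the host, user and helper names are placeholders, not the project's configuration.

```python
def hive_connection(host, port=10000, username=None, database="default"):
    """Open a PyHive connection to HiveServer2."""
    # Imported lazily so the rest of this code works without PyHive installed
    from pyhive import hive
    return hive.Connection(host=host, port=port, username=username,
                           database=database)


def fetch_rows(conn, query):
    """Run a query on any DB-API 2.0 connection and return all rows."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        cursor.close()
```

In the notebook, `fetch_rows(hive_connection("sandbox-host"), "SELECT state, SUM(tuition_fees) FROM fees GROUP BY state")` would return a list of tuples; the host, table and column names here are hypothetical.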
Executing Hive commands through a script:
Example: Input.sql
Executing the Hive script from Python code:
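One way to execute a script such as `Input.sql` from Python is to call the Hive CLI with `hive -f` and parse what it prints to standard output, where result rows are tab-separated. A minimal sketch, assuming the `hive` command is available on the machine running the notebook; the function names are hypothetical.

```python
import subprocess


def run_hive_script(script_path):
    """Run a HiveQL script with the Hive CLI and return its stdout."""
    result = subprocess.run(
        ["hive", "-f", script_path],
        check=True, capture_output=True, text=True,
    )
    # Query results go to stdout; Hive's log messages go to stderr
    return result.stdout


def parse_hive_output(stdout):
    """Split the tab-separated result rows into lists of strings."""
    return [line.split("\t") for line in stdout.splitlines() if line.strip()]
```

Usage would be `rows = parse_hive_output(run_hive_script("Input.sql"))`; all values come back as strings and need converting before any arithmetic.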
Attempt at using HiveQL in Spark (future prospect)
Executing Hive queries using Spark:
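A Spark session built with Hive support can run the same HiveQL against tables registered in the Hive metastore. A minimal sketch, assuming PySpark is installed and configured to reach the metastore; the application name is hypothetical, and an existing session can be passed in instead.

```python
def run_hiveql_with_spark(sql, spark=None):
    """Run a HiveQL query through Spark SQL and return a Spark DataFrame."""
    if spark is None:
        # Imported lazily so the helper can be defined without PySpark
        from pyspark.sql import SparkSession
        spark = (
            SparkSession.builder
            .appName("intl-student-revenue")  # hypothetical app name
            .enableHiveSupport()              # read tables from the metastore
            .getOrCreate()
        )
    return spark.sql(sql)
```

The returned DataFrame is evaluated lazily; calling `.show()` or `.toPandas()` on it triggers the actual Spark job.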
Fetching output into a CSV file for further visualization
The generated CSV file can be used directly for visualization.
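Writing the query results to CSV needs only Python's standard `csv` module. A minimal sketch; the helper name, header and file name are hypothetical.

```python
import csv


def write_csv(rows, header, out):
    """Write a header and result rows to a file-like object as CSV."""
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)


# Example (hypothetical file and column names):
# with open("earnings_by_state.csv", "w", newline="") as f:
#     write_csv(rows, ["state", "total_earnings"], f)
```

Opening the file with `newline=""` lets the `csv` module control line endings, as its documentation recommends.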
Visualizing data with graphs
[Chart: Total earning from fees (USD, billions)]
Visualizing data with graphs
[Chart: Total earning from other expenses (USD, millions)]
Visualizing data with graphs
[Chart: Total earnings (USD, billions)]
Total earning from fees: $200.25 billion
Total earning from other expenses: $0.14 billion
Major earning states
[Chart: Percentage of total income by state]
• California: 9.55%
• New York: 10.84%
• Pennsylvania: 7.36%
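Each percentage, such as New York's 10.84%, is read here as that state's income divided by the income summed over all states. A minimal sketch of that arithmetic, assuming the per-state totals are already available as a dictionary; the function name is hypothetical.

```python
def percentage_of_total(income_by_state):
    """Return each state's share of the total income, as a percentage."""
    total = sum(income_by_state.values())
    return {state: 100.0 * value / total
            for state, value in income_by_state.items()}
```

Rounding, for example `round(share, 2)`, is left to the presentation step so the stored shares still sum to 100.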
Supervised Learning Using Classification:
The WEKA framework was used to classify the states according to their total earnings.
The UserClassifier algorithm provided by the WEKA tool was used to generate the classification graph below.
The final output of the Hive script executed in Python was processed using this algorithm.
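WEKA reads data in its ARFF format, so the Hive output can be converted before it is loaded into the Explorer. A minimal sketch, assuming one row per state with its total earnings and a class label assigned beforehand; the attribute names and the `to_arff` helper are hypothetical. The class attribute is written last because the WEKA Explorer treats the last attribute as the class by default.

```python
def to_arff(relation, rows):
    """Build WEKA ARFF text from (state, total_earnings, label) rows."""
    # Nominal values are quoted so multi-word state names stay intact
    states = ",".join(f"'{state}'" for state, _, _ in rows)
    labels = ",".join(sorted({label for _, _, label in rows}))
    lines = [
        f"@RELATION {relation}",
        f"@ATTRIBUTE state {{{states}}}",
        "@ATTRIBUTE total_earnings NUMERIC",
        f"@ATTRIBUTE class {{{labels}}}",  # last attribute = class
        "@DATA",
    ]
    lines += [f"'{state}',{earnings},{label}"
              for state, earnings, label in rows]
    return "\n".join(lines) + "\n"
```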
Classification
The class colors divide the states into categories: for instance, New York lies in the orange zone, being among the top revenue-generating states.
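UserClassifier lets the user draw the class boundaries interactively, so its splits are not reproduced here. As a swapped-in, non-interactive illustration of dividing states into categories, the sketch below ranks states by total earnings and cuts the ranking into equal-sized tiers; the tier names and the `revenue_tiers` helper are hypothetical.

```python
def revenue_tiers(earnings_by_state, labels=("high", "medium", "low")):
    """Assign states to equal-sized tiers by descending total earnings."""
    ranked = sorted(earnings_by_state, key=earnings_by_state.get, reverse=True)
    n = len(labels)
    # Position i in the ranking maps to tier i * n // len(ranked), in [0, n)
    return {state: labels[i * n // len(ranked)]
            for i, state in enumerate(ranked)}
```

Rank-based tiers avoid picking dollar thresholds by hand, at the cost of ignoring how far apart the earnings actually are.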
Visualizing data in Google Fusion Tables
Employment Analysis: How?
• Finding data on where international students work after graduation
• Based on the number of students employed in current and past years
• Number of employers hiring international students in each field of graduate study (job positions)
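Counting employers per field reduces to grouping records by field of study and counting distinct employers. A minimal sketch, assuming the employment data can be read as (field of study, employer) pairs; that layout and the function name are hypothetical.

```python
from collections import defaultdict


def employers_per_field(records):
    """Count distinct employers hiring international graduates per field.

    `records` is an iterable of (field_of_study, employer) pairs, a
    hypothetical stand-in for the employment data set.
    """
    employers = defaultdict(set)
    for field, employer in records:
        employers[field].add(employer)  # a set ignores repeat hires
    return {field: len(names) for field, names in employers.items()}
```

The same grouping could be pushed into Hive as `COUNT(DISTINCT employer) ... GROUP BY field` once the data lives in a table.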
Files on GitHub
Coming Next...
Predict future income and revenue patterns and, from them, the different types of employment opportunities in the USA
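One simple starting point for such predictions is a linear trend fitted to yearly totals by ordinary least squares. This is an illustrative assumption, not a method chosen in the project; the helper names are hypothetical.

```python
def fit_linear_trend(years, values):
    """Ordinary least-squares fit of values = slope * year + intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx  # requires at least two distinct years
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(years, values, future_year):
    """Extrapolate the fitted trend to a future year."""
    slope, intercept = fit_linear_trend(years, values)
    return slope * future_year + intercept
```

A straight line is only a baseline; enrollment and revenue can shift with policy changes that a trend fitted to past years cannot anticipate.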
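References:
• NCES IPEDS Data Center: https://nces.ed.gov/ipeds/datacenter/
• Project repository (GitHub): https://github.com/priya708/Project-528
• Project data sources (GitLab): https://gitlab.com/Addylad/Project528BigData/tree/47b3e6469bff4e9b7cbe0d743cb8ad9520dbb786/DataSource
• Apache Hive Tutorial: https://cwiki.apache.org/confluence/display/Hive/Tutorial
• Hortonworks tutorials: https://hortonworks.com/tutorials
• NAFSA (Association of International Educators): http://www.nafsa.org/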
Thank you!