final presentation v1.8

Jobs & Skills Team Grant MMA 865

Upload: chad-koziel

Post on 15-Aug-2015


TRANSCRIPT

Page 1: Final Presentation V1.8

Jobs & Skills

Team Grant

MMA 865

Page 2: Final Presentation V1.8


What if you wanted to…

Plan your career: how your key skills are trending

Develop labour policy: skill deficits by region or by industry

Train job-ready graduates: add skills to programs and syllabi

(Syllabi is an awesome word)

2014 Aug 16 Team Grant for Queen's School of Business


Page 3: Final Presentation V1.8


Source → Extract → Store → Distill → Analyze → Answer Questions

Page 4: Final Presentation V1.8


from linkedin import linkedin
import json
import time
import pymongo

authentication = linkedin.LinkedInDeveloperAuthentication(...)
application = linkedin.LinkedInApplication(authentication)

client = pymongo.MongoClient()
db = client.jobengine
# Highest LinkedIn job id already stored; postings at or below it are skipped
max_id = db.posting.find({'source':'linkedin'}).sort('raw_data.id',-1).limit(1)[0]['raw_data']['id']

while True:
    list_of_jobs = application.search_job(
        selectors=[{'jobs': ['id', 'posting-date',...]}],
        params={'count': 100, 'sort':'DD',...})
    # Walk the batch in reverse so max_id only ever moves forward
    for job in reversed(list_of_jobs):
        if job['id'] <= max_id:
            continue
        max_id = job['id']
        location = job['locationDescription']
        raw_date = job['postingDate']
        posteddate = time.strftime("%d/%m/%Y",...)
        skills = job['skillsAndExperience']
        db.posting.insert({"posted_date": posteddate, "skills": skills, "city": location, "source": 'linkedin', "raw_data": job})
    # Poll again in five minutes
    time.sleep(300)
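The polling loop above keeps only postings whose id is larger than the largest id already stored. Here is a minimal, database-free sketch of that high-water-mark idea. It assumes that ids grow with posting time, which the slide implies but does not state. `select_new_jobs` is a hypothetical helper, not part of python-linkedin or the team's code, and the ids are made up.

```python
def select_new_jobs(jobs_newest_first, max_id):
    """Return (new jobs in oldest-first order, updated high-water mark)."""
    new_jobs = []
    for job in reversed(jobs_newest_first):
        if job["id"] <= max_id:
            continue  # already stored on an earlier pass
        new_jobs.append(job)
        max_id = job["id"]
    return new_jobs, max_id


# Made-up batch, newest first, with one posting (101) already seen
batch = [{"id": 105}, {"id": 104}, {"id": 101}]
fresh, high_water = select_new_jobs(batch, max_id=103)
print([job["id"] for job in fresh], high_water)
```

Returning the updated mark instead of mutating a global makes the filter easy to test apart from the API and MongoDB calls.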


Linked in to CareerBuilder, Indeed

Source Extract Store Distill Analyze


Page 5: Final Presentation V1.8


Results from Canada

60k results per week

300 MB per week

3+ data structures


         "formattedRelativeTime": "5 days ago", 

         "city": "Lillooet", 

         "date": "Thu, 24 Jul 2014 20:21:52 GMT", 

         "formattedLocationFull": "Lillooet, BC", 

         "url": "http://ca.indeed.com/viewjob?jk=7779e5fbf4d0613f&qd=cvKKr6L_4R6jh64NGGBfipMcUh0i4g5C-X18qE0gAzC3Ws-qTrT0d3CswmkqzrsGxdgmiLA9Fpf3adh66N9NEAN9-HvuJGR2pUApIXI2XAs&indpubnum=1243433210984925&atk=18u2anmkg0mqi68p", 

         "jobtitle": "Executive Assistant", 

         "company": "Xaxli'p", 

         "onmousedown": "indeed_clk(this, '834');", 

         "snippet": "The Executive Assistant is responsible for providing administrative and secretarial services and support to the Chief and Council and the Band Administrator... ", 

         "source": "WorkBC", 

         "state": "BC", 

         "sponsored": false, 

         "country": "CA", 

         "formattedLocation": "Lillooet, BC", 

         "jobkey": "7779e5fbf4d0613f", 

         "expired": false, 

         "indeedApply": false


Sample result

Page 6: Final Presentation V1.8


[Diagram labels: Source API, Import IO, MongoDB; Python, Hadoop, SAS; Unstructured, Structured]

Page 7: Final Presentation V1.8


Storage & structure

“Postings” collection: stores documents from different sources, with different structures

Wrapper structure allows uniform retrieval: posted date, skills, source, raw data, location
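A minimal sketch of how such a wrapper might look, using the field names that appear in the loaders in the appendices (`posted_date`, `skills`, `city`, `source`, `raw_data`). The helper names `wrap_posting` and `wrap_indeed` are hypothetical, and the sample dict is trimmed from the Indeed result shown in this deck.

```python
import time


def wrap_posting(source, posted_date, skills, city, raw):
    """Build the common envelope stored for every posting."""
    return {
        "posted_date": posted_date,  # DD/MM/YYYY string
        "skills": skills,            # list, string, or "" depending on the source
        "city": city,
        "source": source,
        "raw_data": raw,             # the untouched API result
    }


def wrap_indeed(job):
    """Map one Indeed result onto the envelope (Indeed supplies no skills)."""
    parsed = time.strptime(job["date"], "%a, %d %b %Y %H:%M:%S GMT")
    return wrap_posting("indeed", time.strftime("%d/%m/%Y", parsed), "", job["city"], job)


sample = {
    "city": "Lillooet",
    "date": "Thu, 24 Jul 2014 20:21:52 GMT",
    "jobtitle": "Executive Assistant",
}
doc = wrap_indeed(sample)
print(doc["posted_date"], doc["city"], doc["source"])
```

Queries can then rely on the five envelope fields, while anything source-specific stays reachable under `raw_data`.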


Page 8: Final Presentation V1.8


Challenge & Solution

Identifying new information

Differing data formats

Duplicates between sources

Differing skill set data structures
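The slide lists the challenges, but the transcript does not preserve the solutions. As an illustration only, here is one way two of them could be approached: flattening the differing skill structures into a single list, and dropping cross-source duplicates with a normalized key. The function names and the choice of (title, company, city) as the key are assumptions, not the team's method, and the second posting is invented.

```python
def normalize_skills(value):
    """Return lowercase skill strings whether the source gave a list, a string, or ''."""
    if isinstance(value, list):
        items = value
    elif value:
        items = [value]
    else:
        items = []
    return [s.replace(",", "").strip().lower() for s in items if s.strip()]


def dedupe(postings):
    """Keep the first posting seen for each (title, company, city) key."""
    seen = set()
    unique = []
    for posting in postings:
        key = tuple(posting[f].strip().lower() for f in ("title", "company", "city"))
        if key not in seen:
            seen.add(key)
            unique.append(posting)
    return unique


postings = [
    {"title": "Executive Assistant", "company": "Xaxli'p", "city": "Lillooet", "source": "indeed"},
    # Invented duplicate of the same job arriving from a second source
    {"title": "executive assistant ", "company": "Xaxli'p", "city": "Lillooet", "source": "careerbuilder"},
]
print(normalize_skills(["SAS", "Database, SQL"]))
print(normalize_skills("Bookkeeping"))
print(len(dedupe(postings)))
```

An exact-match key like this misses near-duplicates such as reworded titles, so it is a starting point and not a complete answer.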


Page 9: Final Presentation V1.8



Example: identify in-demand skills

getPostedDateSkill.py → <date>,<skill> → getSkillsCount.py → sort -n

4 "html"

4 "system integration"

5 "software development"

6 "database"

7 "bookkeeping"

8 "audit"
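For small samples the same count can be reproduced without Hadoop. The sketch below stands in for getSkillsCount.py followed by a numeric sort, taking `date,skill` lines like those printed by getPostedDateSkill.py. It uses `collections.Counter` in place of mrjob, and the input lines are made-up examples.

```python
from collections import Counter


def count_skills(lines):
    """Count skill mentions in 'date,skill' lines; return (count, skill) sorted ascending."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _date, skill = line.split(",", 1)
        counts[skill] += 1
    return sorted((n, skill) for skill, n in counts.items())


lines = [
    "29/07/2014,audit",
    "29/07/2014,html",
    "30/07/2014,audit",
    "30/07/2014,database",
    "31/07/2014,audit",
]
for n, skill in count_skills(lines):
    print(n, skill)
```

The most in-demand skill ends up on the last line, as it does in the sorted output shown on the slide.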


Page 10: Final Presentation V1.8


Trends


Run MapReduce (MR) algorithms to return skill mention frequencies by date

Leverage analytics to understand trends, identify seasonality and predict growth / decline

Package to help employers find untapped labour sources and governments target immigration policies
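One simple way to "predict growth / decline" is a least-squares line through the daily mention counts. The deck does not say which forecasting method produced its charts, so the following is only an illustrative stand-in with made-up counts. `fit_line` and `forecast` are hypothetical names.

```python
def fit_line(ys):
    """Ordinary least squares for y = a + b*x with x = 0, 1, 2, ... (needs 2+ points)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def forecast(ys, steps):
    """Extend the fitted line 'steps' periods past the last observation."""
    intercept, slope = fit_line(ys)
    return [intercept + slope * (len(ys) + k) for k in range(steps)]


daily_mentions = [10, 12, 14, 16]  # made-up counts for one skill
intercept, slope = fit_line(daily_mentions)
print(slope, forecast(daily_mentions, 2))
```

A straight line captures growth or decline but not seasonality, which would need a longer history and a seasonal model.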


Page 11: Final Presentation V1.8


Banks: “communication”

[Chart: Actual and Forecast series; x-axis Jun-01 to Aug-01; y-axis 0 to 70]

Page 12: Final Presentation V1.8


Banks: “SAS”

[Chart: Actual and Forecast series; x-axis Jun-01 to Aug-01; y-axis 0 to 10]

Page 13: Final Presentation V1.8


Clustering


Run algorithms to return complementary clusters of skills

Analyze for frequency of association to understand relative importance and trends over time

Package to help job seekers learn “next” skills and post-secondary institutions adapt programs and course syllabi

(Used twice in a single presentation!)
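The transcript does not name the clustering algorithm. As a hedged starting point, "frequency of association" can be approximated by counting how often two skills appear in the same posting. The pair counts could then feed whichever clustering method is chosen. The postings below are invented for illustration, and `cooccurrence` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(skill_lists):
    """Count how many postings mention each unordered pair of skills."""
    pairs = Counter()
    for skills in skill_lists:
        unique = sorted(set(s.lower() for s in skills))
        pairs.update(combinations(unique, 2))
    return pairs


postings = [
    ["SAS", "SQL", "communication"],
    ["sql", "sas"],
    ["communication", "audit"],
]
pairs = cooccurrence(postings)
print(pairs.most_common(1))
```

Sorting each posting's skills before pairing keeps ("sas", "sql") and ("sql", "sas") from being counted as different pairs.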


Page 14: Final Presentation V1.8


Big data…

Big questions?

Syllabi (third time’s the charm)

Page 15: Final Presentation V1.8

Appendix 1: LinkedIn API

from linkedin import linkedin

import json

CONSUMER_KEY='7559rpvtim1fcq'

CONSUMER_SECRET='8mpfyOlPLggQjuvp'

USER_TOKEN='570511eb-3f62-4423-b365-40d78d96a31a'

USER_SECRET='a2795c55-3094-498f-8234-a56a2fc304f0'

RETURN_URL='http://127.0.0.1'

authentication = linkedin.LinkedInDeveloperAuthentication(CONSUMER_KEY, CONSUMER_SECRET,

USER_TOKEN, USER_SECRET,

RETURN_URL, linkedin.PERMISSIONS.enums.values())

application = linkedin.LinkedInApplication(authentication)

profile = application.get_profile(selectors=['id', 'first-name', 'last-name', 'skills'])

print json.dumps(profile, indent=3)

print "*" * 120

jobs = application.search_job(selectors=[{'jobs': ['id', 'customer-job-code', 'posting-date']}], params={'title': 'python', 'count': 2})

print json.dumps(jobs, indent=3)

Page 16: Final Presentation V1.8

Appendix 2: CareerBuilder API

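from careerbuilder import CareerBuilder
import json
import time
import pymongo

# DEV_KEY is the CareerBuilder developer key (its value is not shown on the slide)
cb = CareerBuilder(DEV_KEY)

# Search the Canadian host site
search = cb.job_search(HostSite='CA', PostedWithin='1')
list_of_jobs = search['ResponseJobSearch']['Results']['JobSearchResult']

client = pymongo.MongoClient()
db = client.jobengine

for job in list_of_jobs:
    location = job['Location']
    # PostedDate arrives as M/D/YYYY; store DD/MM/YYYY to match the LinkedIn and
    # Indeed loaders (the slide re-emitted MM/DD/YYYY, which mixed two formats)
    posteddate = time.strftime("%d/%m/%Y", time.strptime(job['PostedDate'], "%m/%d/%Y"))
    skills = job['Skills']['Skill']
    db.posting.insert({"posted_date": posteddate, "skills": skills, "city": location, "source": 'careerbuilder', "raw_data": job})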

Page 17: Final Presentation V1.8

Appendix 3: CareerBuilder Result

      "Company": "Robert Half Technology", 

      "CompanyDID": "c8432266b3wfjhdhwpx", 

      "CompanyDetailsURL": "http://www.careerbuilder.ca/jobs/company-name/c8432266b3wfjhdhwpx/robert-half-technology/?sc_cmp1=13_JobRes_ComDet", 

      "DID": "J3G6PM69F3QVJ2MY15G", 

      "OnetCode": "15-1099.04", 

      "ONetFriendlyTitle": "Web Developers", 

      "DescriptionTeaser": "Ref ID: 05090-9688475 Classification: Programmer/Analyst Compensation: DOE Our client is currently looking for candidate with strong understanding of...", 

      "Distance": null, 

      "EmploymentType": "Full-Time Employee", 

      "EducationRequired": "Not Specified", 

      "ExperienceRequired": "Not Specified", 

      "JobDetailsURL": "http://api.careerbuilder.com/v1/joblink?TrackingID=UNTRKD&HostSite=CA&DID=J3G6PM69F3QVJ2MY15G", 

      "JobServiceURL": "https://api.careerbuilder.com/v1/job?DID=J3G6PM69F3QVJ2MY15G&HostSite=CA&DeveloperKey=WDHT5Y26MLSBGLS2HC7G", 

      "Location": "Toronto-M5J 2T3", 

      "LocationLatitude": "43.6432", 

      "LocationLongitude": "-79.3806", 

      "PostedDate": "7/29/2014", 

      "PostedTime": "7/29/2014 8:16:48 PM", 

      "Pay": "N/A", 

    …

Page 18: Final Presentation V1.8

Appendix 4: Indeed API

from indeed import IndeedClient
import json
import pymongo
import time

client = IndeedClient('123456')
params = {
    'l' : "Anywhere",
    'co' : "ca",
    'userip' : "1.2.3.4",
    'useragent' : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2)"
}
search_response = client.search(**params)
list_of_jobs = search_response['results']

client = pymongo.MongoClient()
db = client.jobengine

for job in list_of_jobs:
    location = job['city']
    # Convert e.g. "Thu, 24 Jul 2014 20:21:52 GMT" to DD/MM/YYYY
    posteddate = time.strftime("%d/%m/%Y", time.strptime(job['date'], "%a, %d %b %Y %H:%M:%S GMT"))
    # Indeed results carry no skills field, so an empty string is stored
    db.posting.insert({"posted_date": posteddate, "skills": "", "city": location, "source": 'indeed', "raw_data": job})

Page 19: Final Presentation V1.8

Appendix 5: Indeed Result

         "formattedRelativeTime": "5 days ago", 

         "city": "Lillooet", 

         "date": "Thu, 24 Jul 2014 20:21:52 GMT", 

         "formattedLocationFull": "Lillooet, BC", 

         "url": "http://ca.indeed.com/viewjob?jk=7779e5fbf4d0613f&qd=cvKKr6L_4R6jh64NGGBfipMcUh0i4g5C-X18qE0gAzC3Ws-qTrT0d3CswmkqzrsGxdgmiLA9Fpf3adh66N9NEAN9-HvuJGR2pUApIXI2XAs&indpubnum=1243433210984925&atk=18u2anmkg0mqi68p", 

         "jobtitle": "Executive Assistant", 

         "company": "Xaxli'p", 

         "onmousedown": "indeed_clk(this, '834');", 

         "snippet": "The Executive Assistant is responsible for providing administrative and secretarial services and support to the Chief and Council and the Band Administrator... ", 

         "source": "WorkBC", 

         "state": "BC", 

         "sponsored": false, 

         "country": "CA", 

         "formattedLocation": "Lillooet, BC", 

         "jobkey": "7779e5fbf4d0613f", 

         "expired": false, 

         "indeedApply": false

Page 20: Final Presentation V1.8

Appendix 6: getPostedDateSkill

import json
import pymongo

client = pymongo.MongoClient()
db = client.jobengine

# Query to get only the skills and posted_date fields
postings = db.posting.find({}, {"posted_date": 1, "skills": 1, "_id": 0})

# Iterate over each posting
for posting in postings:
    # Continue processing only if the skills field is not empty
    if posting['skills'] != "":
        skills = posting['skills']
        # If the skills field is a list, print the date and each skill on its own line;
        # otherwise print the date and the whole content of the skills field.
        # Commas are stripped so each output line splits cleanly into date,skill.
        if isinstance(skills, list):
            for skill in skills:
                print("%s,%s" % (posting['posted_date'], skill.replace(',', '').lower()))
        else:
            print("%s,%s" % (posting['posted_date'], skills.replace(',', '').lower()))

Page 21: Final Presentation V1.8

Appendix 7: getSkillsCount

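from mrjob.job import MRJob

class skillsCount(MRJob):

    def mapper(self, _, value):
        # Each input line is "date,skill"; emit one count per skill mention
        date, skill = value.split(",")
        yield skill, 1

    def reducer(self, key, values):
        # Emit the count first so the output can be sorted numerically
        yield sum(values), key

if __name__ == '__main__':
    skillsCount.run()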

Page 22: Final Presentation V1.8

Attributions

Text for Big Data graphic: http://www.bigdata-startups.com/job-descriptions/

Big Data graphic: http://www.wordle.net/