Crawling, Parsing and Semantic Matching of Vacancies and CV’s
Semantic Recruitment Technology
Jakub Zavrel, Textkernel
InGRID Workshop 11-2-2014
Textkernel:
• Spinoff from R&D in machine learning and language technology
• Founded 2001, offices in Amsterdam (HQ), Frankfurt, Paris, 45 employees; strong R&D focus
• Deloitte Fast 50 2007, 2010, 30% YoY growth
• Core technology: Understanding unstructured text data. Multi-lingual
Market:
• Job boards, Recruitment Software, Staffing and recruitment, Mobility, Large Employers
• Products:
• Multi-lingual tools (15 languages) to extract CVs and jobs
• Jobfeed: largest real time DB for job market analysis
• Search! & Match! to connect people and jobs
• Customers: UWV, Pole Emploi, Adecco, Randstad, USG, Monster, Stepstone, XING, SAP, Unisys, Bosch, Axa, Philips, etc. (350 direct, 2000+ indirect)
• Large partner network (HR & recruitment software)
I like programming, but I’m interested to take on more project management responsibility
Is there a job in our organisation that better fits my degree?
I’d like to work on our mobile strategy. I’ve helped a friend develop a mobile app.
I’d like to do more with my organisational talent.
We are looking to hire: An experienced tech team lead
Language gap
The ideal candidate has:
- min. 5yr of experience
- Certified scrummaster
- Exp. w/iOS, Android
Completed academic studies Computer Science or related
30% travel for customer presentations
The Job ad searches directly in a database and identifies relevant candidates (or vice versa) …
Automatically convert each document into a complete record
Extract! CV/Job Parsing
Extract! – Zero data entry job application
Extract!
• Time savings coding CVs and Jobs
• If you accept noise, 100% time savings
• Structured data allows better search: Semantic Searching and Matching
• Coding enables reporting and statistics
Extract!
• Coding follows Extraction
• Customer specific or standard taxonomies
• String similarity based normalization
• Lots of synonyms per language
• Distance = confidence
• Problem cases: ambiguity, context, long tail
• More complex models can help (classifiers, multi-variate models)
• Semantic matching better (occupation coding errors are counterbalanced by other variables)
Occupation coding!
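A minimal sketch of string similarity based normalization, where the similarity of the best-matching synonym doubles as a confidence. The mini-taxonomy, the codes, the `code_occupation` helper and the 0.6 threshold are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure is actually used in Extract!.

```python
from difflib import SequenceMatcher

# Hypothetical mini-taxonomy: occupation code -> known synonyms.
TAXONOMY = {
    "ADMIN_ASSISTANT": ["administrative assistant", "administrative employee",
                        "assistant clerk", "office support"],
    "JAVA_DEVELOPER": ["java developer", "java software engineer",
                       "java programmer"],
}


def code_occupation(title, taxonomy=TAXONOMY, threshold=0.6):
    """Return (code, confidence) for a free-text job title.

    The confidence is the similarity ratio (0..1) of the closest synonym.
    Titles whose best match falls below the threshold get code None.
    """
    query = " ".join(title.lower().split())  # lowercase, collapse whitespace
    best_code, best_score = None, 0.0
    for code, synonyms in taxonomy.items():
        for synonym in synonyms:
            score = SequenceMatcher(None, query, synonym).ratio()
            if score > best_score:
                best_code, best_score = code, score
    if best_score < threshold:
        return None, best_score
    return best_code, best_score


exact = code_occupation("Java Developer")
noisy = code_occupation("Senior Java Developper")
unknown = code_occupation("xyzzy")
```

An exact synonym hit yields confidence 1.0, a misspelled or decorated title yields a lower value, and a title far from every synonym is left uncoded. Pure string distance cannot resolve the problem cases listed above (ambiguity, context, long tail).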
• Semantic search: “Lets you find what you mean, not what you type”
Impression...
Search!
Match!
Semantic Matching Technology:
• Natural Language Processing
• Machine Learning
• Semantic Analysis
• Probabilistic Language Model
• Search Engine
• Multi-lingual taxonomies
• Recruitment knowledge-bases
Demo
Search and analyse real-time online job ads as well as historical data
Jobfeed
Jobfeed!
Knowledge of all demand for labour in European job market
– Sales leads for recruitment and staffing companies
– Real time labour market analytics tools
– Largest database of jobs for matching unemployed
– Perfect data source for text mining
Jobfeed!
• Real time collection of online job ads from any (unstructured) source
• Available in NL, DE, FR, IT
• Gradually rolling out in rest of Europe
• Richly semantically structured data
Jobfeed: Multilingual Occupation Taxonomy
Occupations:
- >4000 codes
- 4 languages
- 3 layer hierarchy
- >50K synonyms
Link to other concepts:
- Skills
- Education level
- Sector
- O*NET
- UWV (Dutch Employment Agency)
- ROME
Based on millions of jobs, years of customer feedback and experience!
Example:
NL: administratief medewerker, EN: administrative assistant, FR: employé administratif, DE: Verwaltungsassistent (m/w)
Group: administrative personnel
Class: Administration and Customer Service
Synonyms: administrative employee, assistant clerk, office support
Skills: ms office, excel, english language, etc.
O*NET: 43-9199.00: Office and Administrative Support Workers, All Other
UWV: 1000402563: Administratief medewerker secretariaat
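One way to picture such a taxonomy entry is as a record with labels per language plus links to other concepts. The sketch below is an illustration only: the `Occupation` class, its field names and the `build_index` helper are assumptions, not Jobfeed's actual data model. The values are taken from the example entry.

```python
from dataclasses import dataclass, field


@dataclass
class Occupation:
    """Hypothetical record for one occupation in a multilingual taxonomy."""
    labels: dict                      # language code -> preferred label
    group: str
    occupation_class: str
    synonyms: list = field(default_factory=list)
    skills: list = field(default_factory=list)
    external_codes: dict = field(default_factory=dict)  # scheme -> code


ADMIN = Occupation(
    labels={
        "NL": "administratief medewerker",
        "EN": "administrative assistant",
        "FR": "employé administratif",
        "DE": "Verwaltungsassistent (m/w)",
    },
    group="administrative personnel",
    occupation_class="Administration and Customer Service",
    synonyms=["administrative employee", "assistant clerk", "office support"],
    skills=["ms office", "excel", "english language"],
    external_codes={"O*NET": "43-9199.00", "UWV": "1000402563"},
)


def build_index(occupations):
    """Map every lowercased label and synonym to its occupation record."""
    index = {}
    for occupation in occupations:
        for name in list(occupation.labels.values()) + occupation.synonyms:
            index[name.lower()] = occupation
    return index


INDEX = build_index([ADMIN])
```

With such an index, a label in any of the four languages or any synonym resolves to the same record, and from there to the linked skills and external codes.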
Demo
Jobfeed as material for Research
Frequent words for "Java developer"
en, van, de, een, je, met, in, het, Java, of, Je, op, is, voor, te, ervaring, aan, als, and, software, om, team, zijn, kennis, bij, Ervaring, die, the, naar, a, jaar, jij, bent, Developer, HBO, hebt, to, werken, werk
Frequent words for all professions
en, van, de, een, in, het, je, met, op, Je, voor, te, is, of, zijn, aan, bent, naar, bij, om, als, ervaring, die, Het, hebt, deze, werken, zoek, De, wij, functie, onze, ben, tot, over, werk, opleiding, uit, and, werkzaamheden, dat, binnen, u, Als, Voor, zelfstandig, kennis, ook, s, verantwoordelijk
Solution: contrast frequencies
• Observed frequency of w: O(w) = A
• Expected frequency of w: E(w) = C * B / D
• Pick words with highest score: score(w) = (O - E)^2 / E

                         Java developer jobs   All jobs
# jobs where w occurs    A                     B
Total # jobs             C                     D
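The contrast score can be sketched in a few lines. The helper names (`doc_freq`, `contrast_scores`, `top_words`), the whitespace tokenization and the four toy job ads are assumptions for illustration; A, B, C and D follow the table above, and the Java developer jobs are taken to be a subset of all jobs.

```python
from collections import Counter


def doc_freq(docs):
    """For each word, count the number of documents it occurs in."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))  # a set, so each doc counts once
    return df


def contrast_scores(target_docs, all_docs):
    """score(w) = (O - E)^2 / E, with O = A and E = C * B / D."""
    a = doc_freq(target_docs)   # A: target jobs where w occurs
    b = doc_freq(all_docs)      # B: all jobs where w occurs
    c = len(target_docs)        # C: total number of target jobs
    d = len(all_docs)           # D: total number of jobs
    scores = {}
    for word, observed in a.items():
        expected = c * b[word] / d   # > 0 because target_docs is a subset
        scores[word] = (observed - expected) ** 2 / expected
    return scores


def top_words(scores, n):
    """The n highest-scoring words, ties broken alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]


# Toy data (invented): the first two ads play the role of "Java developer" jobs.
ALL_JOBS = [
    "wij zoeken een java developer met ervaring",
    "java developer met kennis van spring en scrum",
    "wij zoeken een verpleegkundige met ervaring",
    "een administratief medewerker met kennis van excel",
]
JAVA_JOBS = ALL_JOBS[:2]

SCORES = contrast_scores(JAVA_JOBS, ALL_JOBS)
TOP = top_words(SCORES, 2)
```

On this toy data "met" occurs in every ad, so its observed and expected counts coincide and it scores 0, while "java" and "developer" score highest. Because the difference is squared, words that are rarer than expected also receive a positive score; a filter on O > E would be one possible refinement.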
Top words for "Java developer"
java, developer, software, spring, scrum, agile, hibernate, ontwikkelaar, u, j2ee, development, maven, applicaties, ervaring, web, de, frameworks, jboss, mbo, senior, wij, xml, jee, o, javascript, you, kennis, ontwikkelen, oracle, ontwikkeling, architectuur, webservices, informatica, werkzaamheden, technologie, developers, eclipse, bezit, het, team, wo, rijbewijs, technieken, tomcat, the, vca, zelfstandig, architect, werklocatie, html
Building rich skills profiles for thousands of occupations from millions of real time jobs…
… new trends and occupations…
Supply & Demand
• Have: lots of data, technology, ideas
• Want: labor market expertise, students, research
Semantic Recruitment Technology
Thanks!