Big Data and AI in the P2P Industry: Knowledge Graph and Inference
![Page 2: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/2.jpg)
Puhui Finance (www.puhuifinance.com)
• Internet financing (P2P) company, headquartered in Beijing
• Founded in July 2013
• $50M Series A funding in Dec 2014
• ~5,500 employees, 100+ offline stores
Services
• 爱钱进 (iqianjin)
• 普惠信贷 (Puhui Credit)
• 创新资产 (Innovative Assets)
• 普惠财富 (Puhui Wealth)
Service types: Offline Financing, Online Financing, Online Lending, Offline Lending
![Page 3: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/3.jpg)
Puhui Finance (cont.)
Fastest-growing P2P company. Big data technology is the key.
![Page 4: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/4.jpg)
What the talk is about
In this talk, I will mainly focus on the techniques used in lending-side risk control. Similar techniques can be applied to the financing side.
![Page 5: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/5.jpg)
Outline
• Why we need big data and AI
• Intro to FC Engine and Knowledge Graph
• Case 1: Anti-Fraud
• Case 2: Lost Contact Recovery
• Case 3: Detect Bad People via Search
• More use cases
• Challenges
![Page 6: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/6.jpg)
Why big data & AI
• The credit system in China is not mature
• We target the under-served market: people who don’t have enough credit history to borrow from a bank
• Credit history data alone is not enough to build the scoring models
• A more efficient application review process is needed as we move more transactions from offline to online
![Page 8: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/8.jpg)
The central problem is risk control.
The solution is to use big data.
![Page 9: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/9.jpg)
Measure the risk for a person
• Individual feature analysis: Feature Compute (FC) Engine
• Relation analysis: Knowledge Graph
![Page 10: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/10.jpg)
Data
• Data the user enters explicitly (e.g. application form)
• Authorized* user data
  • Mobile history
  • Purchasing history
  • ……
• Open search
  • Baidu.com
  • 360.com
  • Others (e.g. craigslist)
• Third-party data (e.g. blacklist)
Unstructured Data
* The user authorizes us to use their data
![Page 11: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/11.jpg)
Feature Compute Engine
The goal is to convert unstructured
data to structured features
![Page 12: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/12.jpg)
Feature Compute Engine
[Figure: raw data (credit card history, mobile history, purchasing history, ......) flows through the Feature Compute Engine into a feature container (tens of thousands of features), which feeds the scoring models behind the risk score, the fraud score, and precision marketing]
![Page 13: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/13.jpg)
Purchasing History
e.g. features computed from purchasing history:
• Total amount spent during the last 6 months
• User level (e.g. Prime, Normal…)
• Total number of transactions during the last 6 months
• How long he/she has used the account
• Total number of transactions related to virtual products
• Total number of transactions related to luxury products
• ………
A few thousand features
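As a rough illustration of what "converting raw data into structured features" can look like for purchasing history, here is a minimal sketch. The record layout (`date`, `amount`, `category`), the function name `purchase_features`, and the 183-day approximation of "6 months" are all assumptions made for this example; the talk does not describe the actual engine.

```python
from datetime import date, timedelta

def purchase_features(transactions, account_opened, today):
    """Turn raw purchase records into a flat feature dict (hypothetical schema)."""
    cutoff = today - timedelta(days=183)  # assumption: "6 months" ~ 183 days
    recent = [t for t in transactions if t["date"] >= cutoff]
    return {
        "amount_6m": sum(t["amount"] for t in recent),
        "txn_count_6m": len(recent),
        "account_age_days": (today - account_opened).days,
        "virtual_txn_count": sum(t["category"] == "virtual" for t in transactions),
        "luxury_txn_count": sum(t["category"] == "luxury" for t in transactions),
    }

# Hypothetical sample data
transactions = [
    {"date": date(2015, 5, 1), "amount": 120.0, "category": "virtual"},
    {"date": date(2015, 3, 10), "amount": 3000.0, "category": "luxury"},
    {"date": date(2014, 6, 1), "amount": 50.0, "category": "grocery"},
]
features = purchase_features(transactions, date(2013, 1, 1), date(2015, 6, 1))
print(features)
```

Each key in the returned dict corresponds to one bullet above; a real feature container would hold thousands of such values per applicant.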
![Page 14: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/14.jpg)
What is a knowledge graph
• It is a semantic network.
• It is based on a graph data structure made of points (nodes) and edges. A point represents an entity; an edge represents a relationship.
• A knowledge graph connects heterogeneous information and makes it possible to analyze data from the perspective of relationships.
![Page 15: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/15.jpg)
Some knowledge graphs
![Page 16: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/16.jpg)
Knowledge graph – search engine
![Page 17: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/17.jpg)
Knowledge graph – search engine
![Page 18: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/18.jpg)
Knowledge graph – recommendation [1]
![Page 19: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/19.jpg)
Storing a knowledge graph

| Ranking | DBMS |
| --- | --- |
| 21 | Neo4j (graph database) |
| 32 | MarkLogic (XML) |
| 42 | Titan (graph database) |
| 46 | OrientDB (graph database) |
| 61 | Virtuoso (RDF) |
| 80 | Jena (RDF) |
| 88 | Sesame (RDF) |
| 90 | ArangoDB (graph database) |
| 120 | AllegroGraph (RDF) |

Trends for different types of database [2]
Graph/RDF database ranking [3]
![Page 20: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/20.jpg)
Key techniques for knowledge graph
Link prediction
• Logic-based approach
• Probabilistic approach (e.g. distributed representation)
• Hybrid approach
![Page 21: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/21.jpg)
Logic-based approach
Simple approach: pre-define some rules
e.g. (Peter FatherOf Tom) -> (Tom SonOf Peter)
(Peter ColleagueOf Tom), (Sarah ColleagueOf Peter)
-> (Peter ColleagueOf Sarah)
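The rule-based idea can be sketched as a small forward-chaining loop over (subject, relation, object) triples. This is only an illustration: the function name `apply_rules` is hypothetical, and the colleague rule is implemented here as plain symmetry, which is enough to derive the conclusion shown on the slide.

```python
def apply_rules(triples):
    """Apply two hand-written rules until no new triple is produced."""
    known = set(triples)
    while True:
        new = set()
        for s, r, o in known:
            if r == "FatherOf":
                new.add((o, "SonOf", s))        # inverse relation, as on the slide
            if r == "ColleagueOf":
                new.add((o, "ColleagueOf", s))  # assumption: ColleagueOf is symmetric
        if new <= known:                        # fixed point reached
            return known
        known |= new

facts = {
    ("Peter", "FatherOf", "Tom"),
    ("Peter", "ColleagueOf", "Tom"),
    ("Sarah", "ColleagueOf", "Peter"),
}
inferred = apply_rules(facts)
print(sorted(inferred - facts))
```

Pre-defined rules like these are easy to audit, but every rule has to be written by hand, which is one motivation for the probabilistic methods that follow.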
![Page 22: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/22.jpg)
Methods based on distributed representation
• Translating Embeddings (TransE) [4]
• Tensor Factorization (RESCAL) [5]
• Neural Tensor Network (NTN) [6]
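To make the first method concrete: the translating-embeddings model [4] represents entities and relations as vectors and treats a triple (h, r, t) as plausible when h + r lands close to t. The sketch below computes that distance with hand-picked 2-d vectors; the vectors are illustrative assumptions, not trained embeddings, and no training loop is shown.

```python
import math

def transe_distance(h, r, t):
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Hypothetical 2-d embeddings, chosen by hand for illustration (not trained)
entity = {"Peter": [0.0, 0.0], "Tom": [1.0, 0.0], "Sarah": [0.0, 3.0]}
relation = {"FatherOf": [1.0, 0.0]}

d_true = transe_distance(entity["Peter"], relation["FatherOf"], entity["Tom"])
d_false = transe_distance(entity["Peter"], relation["FatherOf"], entity["Sarah"])
print(d_true, d_false)
```

Link prediction then amounts to ranking candidate tail entities by this distance and keeping the closest ones.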
![Page 23: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/23.jpg)
Hybrid Approach – Logic + Probabilistic
Simple approach:
1. Generate all the new links using pre-defined rules
2. Apply statistical learning
Advanced approaches (e.g.):
• Incorporation of Rules into Embeddings [7]
• Injecting Logical Background Knowledge [8]
![Page 24: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/24.jpg)
Use Cases
![Page 25: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/25.jpg)
Connects person, phone, address, email, company……
Domain-specific knowledge graph
![Page 26: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/26.jpg)
Domain-specific knowledge graph (cont.)
• 10 types of entities
• ~50 types of relations
• ~50M entities
• 0.2B relations
• We expect it to become ~20 times bigger by the end of this year due to business growth
![Page 28: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/28.jpg)
Antifraud – rules
• The applicant shares the same personal phone with another applicant
[Figure: Applicant and Other applicant, each linked to the same Phone node by a "Personal Phone" edge]
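A rule like this reduces to grouping applications by phone number. The sketch below is a minimal in-memory version; the field names `applicant_id` and `personal_phone` and the sample numbers are hypothetical, and a production system would more likely express this as a graph query.

```python
from collections import defaultdict

def shared_personal_phones(applications):
    """Return {phone: applicant ids} for personal phones used by 2+ applicants."""
    owners = defaultdict(set)
    for app in applications:
        owners[app["personal_phone"]].add(app["applicant_id"])
    return {phone: ids for phone, ids in owners.items() if len(ids) > 1}

# Hypothetical applications
applications = [
    {"applicant_id": "A1", "personal_phone": "555-0100"},
    {"applicant_id": "A2", "personal_phone": "555-0100"},
    {"applicant_id": "A3", "personal_phone": "555-0199"},
]
print(shared_personal_phones(applications))
```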
![Page 29: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/29.jpg)
Antifraud – rules (cont.)
• The applicant and another applicant share the same colleague phone, but with different company names
[Figure: Applicant (Company 1) and Other applicant (Company 2), each linked to the same Phone node by a "Colleague phone" edge]
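This rule can be sketched the same way: collect the company names declared alongside each colleague phone and flag phones that map to more than one name. Field names and data are hypothetical. Note that the comparison here is an exact string match, so two spellings of the same company would be flagged as a conflict.

```python
from collections import defaultdict

def conflicting_colleague_phones(applications):
    """Return {phone: company names} where one colleague phone maps to 2+ companies."""
    companies = defaultdict(set)
    for app in applications:
        companies[app["colleague_phone"]].add(app["company"])
    return {phone: names for phone, names in companies.items() if len(names) > 1}

# Hypothetical applications
colleague_apps = [
    {"applicant_id": "A1", "colleague_phone": "555-0142", "company": "Company 1"},
    {"applicant_id": "A2", "colleague_phone": "555-0142", "company": "Company 2"},
    {"applicant_id": "A3", "colleague_phone": "555-0177", "company": "Company 1"},
]
print(conflicting_colleague_phones(colleague_apps))
```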
![Page 30: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/30.jpg)
Antifraud – rules (cont.)
• Some of the applicant’s contacts didn’t pay back their loans on time
[Figure: Applicant linked by a "Personal phone" edge to a Phone node that connects to several other Phone nodes, some of which are marked "Overdue"]
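One simple way to turn this rule into a number is the share of an applicant's contacts that belong to overdue borrowers. The function name, the inputs, and the choice of a ratio (rather than a raw count or a threshold) are assumptions made for this sketch.

```python
def overdue_contact_ratio(contact_phones, overdue_phones):
    """Fraction of the applicant's contact phones that belong to overdue borrowers."""
    if not contact_phones:
        return 0.0
    hits = sum(1 for phone in contact_phones if phone in overdue_phones)
    return hits / len(contact_phones)

# Hypothetical data: five contacts, two of them overdue
contacts = ["555-0101", "555-0102", "555-0103", "555-0104", "555-0105"]
overdue = {"555-0102", "555-0105"}
ratio = overdue_contact_ratio(contacts, overdue)
print(ratio)
```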
![Page 31: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/31.jpg)
Antifraud – cycle detection
• Triangle relationship
[Figure: Person 1, Person 2, and Person 3 connected in a triangle]
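Triangle detection on an undirected graph can be sketched with adjacency sets: for each node, check whether any two of its neighbors are also connected to each other. The edge list below is made up for illustration, and the sketch assumes there are no self-loops.

```python
from itertools import combinations

def find_triangles(edges):
    """Return every set of three people who are pairwise connected."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    triangles = set()
    for node, nbrs in neighbors.items():
        for u, v in combinations(sorted(nbrs), 2):
            if v in neighbors[u]:
                triangles.add(frozenset((node, u, v)))
    return triangles

# Hypothetical relationship edges
edges = [
    ("Person 1", "Person 2"),
    ("Person 2", "Person 3"),
    ("Person 3", "Person 1"),
    ("Person 3", "Person 4"),
]
triangles = find_triangles(edges)
print(triangles)
```

Longer cycles need a general cycle search, but triangles are cheap to enumerate and already catch the pattern shown on the slide.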
![Page 32: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/32.jpg)
Antifraud – inconsistent relationship
• Inconsistent relations
[Figure: Applicant, Applicant 1, and Applicant 2 linked by two "Parent of" edges and one "Spouse" edge; the declared relations contradict each other]
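The slide does not spell out the exact configuration, so the sketch below checks one plausible reading: two people who are declared spouses but also share a declared parent. The relation names `ParentOf` and `Spouse` and the function name are assumptions for illustration; other contradictions would need their own checks.

```python
def spouses_sharing_parent(triples):
    """Flag spouse pairs who also share a declared parent."""
    parents = {}
    for s, r, o in triples:
        if r == "ParentOf":
            parents.setdefault(o, set()).add(s)
    flagged = []
    for s, r, o in triples:
        if r == "Spouse" and parents.get(s, set()) & parents.get(o, set()):
            flagged.append((s, o))
    return flagged

# Hypothetical declared relations
declared = [
    ("Applicant", "ParentOf", "Applicant 1"),
    ("Applicant", "ParentOf", "Applicant 2"),
    ("Applicant 1", "Spouse", "Applicant 2"),
]
print(spouses_sharing_parent(declared))
```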
![Page 33: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/33.jpg)
Antifraud – suspicious group
• A group of people who share a lot of common attributes
[Figure: Person 1, Person 2, and Person 3 linked through shared attributes]
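A minimal way to surface such groups is to count, for every pair of people, how many attribute values they have in common. The attribute names, the threshold `min_shared`, and the pairwise (quadratic) comparison are assumptions for this sketch; at the scale mentioned in the talk a graph-based or clustering approach would be needed instead.

```python
from itertools import combinations

def shared_attribute_pairs(people, min_shared=2):
    """Return {(a, b): shared keys} for pairs sharing at least `min_shared` values."""
    result = {}
    for (a, attrs_a), (b, attrs_b) in combinations(sorted(people.items()), 2):
        common = {k for k in attrs_a if k in attrs_b and attrs_a[k] == attrs_b[k]}
        if len(common) >= min_shared:
            result[(a, b)] = common
    return result

# Hypothetical attribute records
people = {
    "Person 1": {"address": "Addr X", "company": "Company 1", "email": "p1@example.com"},
    "Person 2": {"address": "Addr X", "company": "Company 1", "email": "p2@example.com"},
    "Person 3": {"address": "Addr Y", "company": "Company 1", "email": "p3@example.com"},
}
pairs = shared_attribute_pairs(people)
print(pairs)
```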
![Page 34: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/34.jpg)
Antifraud – design by observation
Knowledge graph visualization
• Visualize entities and relationships
• Design anti-fraud rules via observational study
![Page 35: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/35.jpg)
Antifraud – evolution of graph structure
• Rapid change of relationship structure within a short time period
![Page 36: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/36.jpg)
Antifraud – fraud score
• Variables: features extracted from raw data, results from anti-fraud rules, and the user’s direct attributes
• Models: LR, decision tree, random forest, SVM, ANN, DNN
• Prediction: a fraud score, used to directly reject or accept the loan
![Page 38: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/38.jpg)
Lost contact recovery – what is it
The borrower disappears, and all the contact information they explicitly provided becomes invalid. How do we reach them?
Implicitly infer potential contact information
![Page 39: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/39.jpg)
Building phone network – 1st order extension
[Figure: Applicant linked by a "Personal phone" edge to a Phone node, which connects to its 1st-order neighboring Phone nodes]
• Rank the phone numbers, and predict the relationship
![Page 40: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/40.jpg)
Building phone network – 2nd order extension
[Figure: the same phone network extended by one more hop, to the Phone nodes connected to the 1st-order neighbors]
• Rank the phone numbers, and predict the relationship
![Page 41: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/41.jpg)
3rd order ..
[Figure: the phone network extended by a third hop of Phone nodes]
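The 1st, 2nd, and 3rd order extensions correspond to a breadth-first search from the applicant's phone, stopping after a fixed number of hops. The adjacency-dict representation and the name `phones_within` are assumptions for this sketch.

```python
from collections import deque

def phones_within(call_graph, start, max_order):
    """Breadth-first search returning {phone: order} for phones within max_order hops."""
    order = {start: 0}
    queue = deque([start])
    while queue:
        phone = queue.popleft()
        if order[phone] == max_order:
            continue  # do not expand beyond the requested order
        for other in call_graph.get(phone, ()):
            if other not in order:
                order[other] = order[phone] + 1
                queue.append(other)
    del order[start]  # the applicant's own phone is not a candidate
    return order

# Hypothetical call graph: phone -> phones it has been in contact with
call_graph = {"A": ["B", "C"], "B": ["D"], "D": ["E"]}
reachable = phones_within(call_graph, "A", 2)
print(reachable)
```

The number of candidates grows quickly with each extra hop, which is why the candidates then have to be ranked.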
![Page 42: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/42.jpg)
Building phone network – Rank
Simple ranking criteria
• The total length of call time
• The frequency of calls
Advanced approach
• Learn the ranking score with machine learning
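The simple criteria can be sketched as a sort over per-phone aggregates. Using total call time as the primary key and call count as the tie-breaker is an assumption made here; the talk lists both criteria without saying how they are combined.

```python
def rank_contacts(calls):
    """Rank contact phones by total call time, then by number of calls."""
    totals = {}
    for phone, seconds in calls:
        duration, count = totals.get(phone, (0, 0))
        totals[phone] = (duration + seconds, count + 1)
    return sorted(totals, key=lambda p: totals[p], reverse=True)

# Hypothetical call log: (contact phone, call length in seconds)
call_log = [("B", 60), ("C", 300), ("B", 30), ("D", 90)]
ranking = rank_contacts(call_log)
print(ranking)
```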
![Page 43: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/43.jpg)
Building phone network – Predict the relation
~100 features, for example:
• Total # of outgoing calls
• Total length of call time
• Total # of incoming calls
• Average time per call
• Maximum call length
• # of calls between 0-4am
• # of calls between 4-8am
• ……
Models: LR, decision tree, random forest, SVM, ANN, DNN
Prediction: the relation
With very limited training data, our model provides ~30% accuracy
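A few of the listed features can be computed from a call log as follows. The record layout `(direction, start_hour, seconds)` is hypothetical, and two interpretation choices are assumptions: durations are summed over both directions, and the time-of-day counts cover outgoing calls only.

```python
def call_features(calls):
    """Aggregate call records between an applicant and one phone number.

    Each record is (direction, start_hour, seconds), direction "out" or "in".
    """
    out = [c for c in calls if c[0] == "out"]
    durations = [c[2] for c in calls]
    return {
        "n_outgoing": len(out),
        "n_incoming": len(calls) - len(out),
        "total_seconds": sum(durations),
        "avg_seconds": sum(durations) / len(durations) if durations else 0.0,
        "max_seconds": max(durations, default=0),
        "n_out_0_4am": sum(0 <= c[1] < 4 for c in out),
        "n_out_4_8am": sum(4 <= c[1] < 8 for c in out),
    }

# Hypothetical records for one applicant/phone pair
records = [("out", 1, 120), ("in", 14, 60), ("out", 5, 30)]
feats = call_features(records)
print(feats)
```

A feature vector like this, one per applicant/phone pair, is what the listed models would take as input to predict the relation.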
![Page 44: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/44.jpg)
Other approach – Link prediction (on-going work)
[Figure: Applicant, Other applicant, and Person nodes connected through a "Personal phone" edge; link prediction is used to infer a missing "knows?" edge]
![Page 46: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/46.jpg)
Detect Bad People via Search
From the search results, we label each entity in the knowledge graph, e.g. black, green, etc.
![Page 47: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/47.jpg)
Search for basic information….
Search fields
• Phone number
• Other IDs
Search engines & public sites
• Baidu.com
• 360.com
• Other public websites
![Page 48: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/48.jpg)
Search for phone number…
![Page 49: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/49.jpg)
Search for Email…
Fraud
![Page 50: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/50.jpg)
Other applications we are working on
• Clustering analysis
• Precision marketing
• ……
![Page 52: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/52.jpg)
Challenges : Unstructured Data
• Unstructured data: images, text, audio, video
• Techniques: machine learning, natural language processing, data mining
![Page 53: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/53.jpg)
Challenges : Name Disambiguation
[Figure: Applicant is linked to "Puhui Finance Ltd." and Other applicant is linked to "Puhui Finance"]
• Same company, can we merge?
• It is a very important problem to deal with!
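A naive first step toward merging such names is to normalize them before comparison. The sketch below lowercases, strips punctuation, and drops trailing legal suffixes; the suffix list is a hypothetical starting point, the token pattern only handles ASCII names, and real disambiguation would need far more than string normalization.

```python
import re

# Hypothetical list of legal suffixes to ignore when comparing company names
_SUFFIXES = {"ltd", "limited", "inc", "co", "corp", "llc"}

def normalize_company(name):
    """Lowercase, strip punctuation, and drop trailing legal suffixes (ASCII only)."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

same = normalize_company("Puhui Finance Ltd.") == normalize_company("Puhui Finance")
print(normalize_company("Puhui Finance Ltd."), same)
```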
![Page 54: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/54.jpg)
Challenges : Reasoning
Link prediction
• Logic-based approach
• Probabilistic approach (e.g. distributed representation)
• Hybrid approach
However, it is still an open problem
![Page 55: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/55.jpg)
Challenges : Insufficient Samples
Big data, but small samples
![Page 56: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/56.jpg)
We are hiring! (in Beijing)
Open positions include, but are not limited to:
• Senior/Lead Machine Learning/NLP Engineers
• Senior/Lead Data Engineer/Scientist
• Senior/Lead Architect
• Senior/Lead Software Engineer
Contact
Company websites
• www.puhuifinance.com
• www.iqianjin.com
![Page 58: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/58.jpg)
References
[1] http://www.datapop.com/
[2] http://db-engines.com/en/blog_post//43
[3] http://db-engines.com/en/ranking
[4] Bordes, A., et al. "Translating Embeddings for Modeling Multi-relational Data." Advances in Neural Information Processing Systems (2013): 2787-2795.
[5] Nickel, M., Tresp, V., and Kriegel, H.-P. "A Three-Way Model for Collective Learning on Multi-Relational Data." International Conference on Machine Learning (2011): 809-816.
![Page 59: Bigdata and ai in p2 p industry: Knowledge graph and inference](https://reader033.vdocuments.net/reader033/viewer/2022051709/587756601a28ab84388b75ad/html5/thumbnails/59.jpg)
References (cont.)
[6] Socher, R., Chen, D., Manning, C. D., and Ng, A. "Reasoning With Neural Tensor Networks for Knowledge Base Completion." Advances in Neural Information Processing Systems (2013).
[7] Wang, Q., Wang, B., and Guo, L. "Knowledge Base Completion Using Embeddings and Rules." Proceedings of the 24th International Conference on Artificial Intelligence. AAAI Press, 2015.
[8] Rocktäschel, T., Singh, S., and Riedel, S. "Injecting Logical Background Knowledge into Embeddings for Relation Extraction." http://talks.cam.ac.uk/talk/index/58360