machine learning and data science develop innovative
TRANSCRIPT
![Page 1: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/1.jpg)
We help industry to design and develop innovative solutions leveraging
Machine Learning and Data Science
Expert ou machine ? Les deux !
![Page 2: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/2.jpg)
Who we are
Guillaume Forcade
CEO & Co-founder
Jean-Baptiste Pautrizel
PhD Physics & Co-founder
Adrien Todeschini
PhD Stat ML & Chief Data scientist
Kévin Baudin
MSc. Stat Finance & Data scientist
Wassek Al Chahid
Full stack developer
![Page 3: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/3.jpg)
What we do
Credit scoring
Recommender systems
Betting fraud detection
Traffic prediction
Advertising targeting
Genomics meta-analysis
Buy/Sell signals
Algo trading
Technical/fundamental analysis
Commodities
GlobalWineScore.com
Noodle.vote
NIPS / ICML
Data Science
Consulting
Quant Finance
ResearchThe Lab
![Page 4: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/4.jpg)
What we useData Engineering
Data collection and storage
Data transformation
Data visualization
Machine Learning and Statistics
Supervised learning: classifiers, regressorsUnsupervised learning: dimensionality reduction, clusteringTrees and ensemble methodsTopic modelingLatent factor modelsMatrix factorizationBayesian inferenceProbabilistic programmingNon parametric statisticsMissing data imputationDecision theoryMonte-Carlo simulationScore aggregation
Programming
Languages and tools
Web development
Cloud computing
![Page 5: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/5.jpg)
GlobalWineScore.com
![Page 6: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/6.jpg)
![Page 7: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/7.jpg)
Genomics Meta-analysis
![Page 8: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/8.jpg)
WineRec
![Page 9: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/9.jpg)
Nips17.ml
![Page 10: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/10.jpg)
Noodle.vote
![Page 11: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/11.jpg)
Noodle.vote
![Page 12: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/12.jpg)
Market predictions
![Page 13: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/13.jpg)
Algo Trading
![Page 14: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/14.jpg)
Preacor.fr
![Page 15: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/15.jpg)
Preacor.fr
![Page 16: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/16.jpg)
Preacor.fr
![Page 17: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/17.jpg)
![Page 18: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/18.jpg)
Data ~8 ans d’historique, ~35000 dossiers, ~70 colonnes
Dossiernumero_dossierdate_creationetape_dossier
nb_emp
Revenusrevenu_allocrevenu_mensuel
catprotype_contratemployeur
anciennete_job
Créditsnb_prets_immocrd_prets_immomensu_prets_immonb_prets_consocrd_prets_consomensu_prets_conso
Chargescharge_pensioncharge_impotscharge_loyer
charge_garde_enfantscharge_autre
charge_autre_nature
(Co)Emprunteurcp, villenaissance
isproprio, isloc, ishebergesitu
nbenfantsbanque
anciennete_banque
Patrimoinemnt_epargnenb_biens
valeur_biensrevenu_biens
Projetcp_bien_projet
ville_bien_projetsurface_bien_projetterrain_bien_projet
usage_projetisprimoaccedant
Financementmnt_terrainmnt_logementmnt_travaux
mnt_frais_agencemnt_frais_notairehonos_courtiermnt_apportmnt_financebanque_pret
nb_prets_differe_partielduree_mois_financement
refus_motif
![Page 19: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/19.jpg)
Variables absentes
● Type profession libérale● Comportement bancaire : solde moyen, découverts● Taille de l’entreprise● Liasses fiscales entrepreneurs● Faire appel à un constructeur● Assurance dommage ouvrage
![Page 20: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/20.jpg)
Variable à prédire
![Page 21: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/21.jpg)
● type_op = Achat ancien avec travaux, Achat ancien sans travaux, Achat neuf clés en main, Achat neuf en VEFA, Achat terrain + construction
● refus_motif = Fonctionnement des comptes, Taux endettement, Reste à vivre, Hors critères, Prime trop élevée● On obtient ~13000 dossiers, ~8% de refus (Déséquilibre)
Filtrage des cas pertinents
![Page 22: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/22.jpg)
Pipeline
Import
Clean
Transform Explore
Impute
Model
Evaluate
![Page 23: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/23.jpg)
Nettoyage
![Page 24: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/24.jpg)
Transformation
Feature engineering
● Pourcentage d’endettement● Reste à vivre par personne● Valeur patrimoine net● Revenu annuel net total● Pourcentage des travaux sur prix du bien● Pourcentage d’apport sur coût du projet● Epargne après opération sur coût du projet
![Page 25: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/25.jpg)
Imputation
![Page 26: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/26.jpg)
Problèmes rencontrés
● Dossiers non pertinents → écartés
● Données aberrantes → seuillées
● Données non typées, mal encodées → parsing
● Données manquantes → imputation
● Classes déséquilibrées → oversampling
● Variables absentes ou sous-représentées → modèle bayésien
![Page 27: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/27.jpg)
Inférence bayésienne
![Page 28: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/28.jpg)
Régression logistique bayésienneVraisemblance
A priori
A posteriori
![Page 29: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/29.jpg)
JAGS codemodel {
# likelihood for (i in 1:n) { y[i] ~ dbern(prob[i])
logit(prob[i]) <- ( beta_0 + inprod(X[i,], beta) + inprod(X_coemp[i,], beta[1:p_coemp] ) / scale[i] }
# prior beta_0 ~ dnorm(0, 1/sigma^2)
for (k in 1:n_var) { for (j in from[v]:to[v]) { beta[j] ~ dnorm(mu[j], 1/(sigma_var[k])^2) } }
}
![Page 30: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/30.jpg)
Coefficients
![Page 31: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/31.jpg)
Coefficients
![Page 32: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/32.jpg)
Coefficients
![Page 33: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/33.jpg)
Coefficients
![Page 34: Machine Learning and Data Science develop innovative](https://reader030.vdocuments.net/reader030/viewer/2022012800/61b5b2b02f852b34b67bf939/html5/thumbnails/34.jpg)
Comparaison