NLP text recommender system: Journey to auto model training at scale
Aditya Sakhuja
Engineering Lead, Salesforce Einstein
@sakhuja
Agenda
● Goal
● Scenario
● Approach & Metrics
● ML System Architecture
  ○ Recs Serving
  ○ Feature Engineering
  ○ Model Training
● ML System Evolution
● Training CI, Deployments & Rollbacks
● Cloud Native
● Challenges & Takeaways
Customer Service Agent Assist
Goal
Agents rely on traditional search results to find relevant answers to customer questions that are often long and time-sensitive.
Scenario
Approach
Business Metrics
● Agent Time to Resolution
● Agent Time spent per case
● Case-Article Attach Rate

● # of recommendations served
● MAO, MAU
● Serving Latency
ML System Architecture
Recommendations Serving
Layer 1: Candidate Generation
● NLP: Extract POS, NER, Noun and key terms from user query
● IR specific Query Formulation
● Candidates Generated

Layer 2: Ranking Model
● <question, article> pairwise feature generation
● Candidates evaluated by model
● Candidates above the threshold are recommended
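
The two serving layers above can be sketched roughly as follows. This is a minimal illustration, not the system described in the talk: plain term overlap stands in for the NLP and IR query formulation of Layer 1, a Jaccard score stands in for the trained ranking model of Layer 2, and the names `generate_candidates`, `jaccard_score`, `recommend` and the threshold value are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def generate_candidates(query: str, articles: Dict[str, str], top_n: int = 10) -> List[str]:
    """Layer 1 stand-in: keep articles sharing at least one term with the query."""
    terms = set(query.lower().split())
    scored = []
    for article_id, text in articles.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, article_id))
    # Highest overlap first; ties broken by article id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [article_id for _, article_id in scored[:top_n]]


def jaccard_score(question: str, article: str) -> float:
    """Layer 2 stand-in for a trained <question, article> ranking model."""
    q, a = set(question.lower().split()), set(article.lower().split())
    return len(q & a) / len(q | a) if q | a else 0.0


def recommend(
    query: str,
    articles: Dict[str, str],
    score_fn: Callable[[str, str], float] = jaccard_score,
    threshold: float = 0.2,  # assumed value, for illustration only
) -> List[Tuple[str, float]]:
    """Score every candidate and return those strictly above the threshold."""
    candidates = generate_candidates(query, articles)
    scored = [(aid, score_fn(query, articles[aid])) for aid in candidates]
    kept = [pair for pair in scored if pair[1] > threshold]
    return sorted(kept, key=lambda pair: -pair[1])
```

Keeping the scoring function as a parameter means the cheap candidate step and the more expensive ranking step can change independently, which is the usual reason for a two-layer design.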
Recommendations Serving
Data Prep & Feature Engineering
● Multi tenant data ingestion pipeline
● Data Cleansing and Sanity checks
● Precompute TDF, Corpus Statistics
● Feature Vectors computation
● 100+ NLP features across different statistical feature categories
● Serving Training Drift
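
As a rough illustration of how precomputed corpus statistics can feed pairwise feature vectors, the sketch below computes document frequencies once and reuses them for each <question, article> pair. The three features and the smoothed IDF formula are assumptions chosen for brevity; the talk does not list its 100+ features, and the function names here are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def document_frequencies(corpus: List[str]) -> Counter:
    """Precompute, once per corpus, how many documents contain each term."""
    df: Counter = Counter()
    for doc in corpus:
        df.update(set(doc.lower().split()))
    return df


def idf(term: str, df: Counter, n_docs: int) -> float:
    """Smoothed inverse document frequency (one common variant)."""
    return math.log((1 + n_docs) / (1 + df.get(term, 0))) + 1.0


def pair_features(question: str, article: str, df: Counter, n_docs: int) -> Dict[str, float]:
    """Build a small feature vector for one <question, article> pair."""
    q = question.lower().split()
    a = article.lower().split()
    shared = set(q) & set(a)
    return {
        "term_overlap": float(len(shared)),
        "idf_weighted_overlap": sum(idf(t, df, n_docs) for t in shared),
        "length_ratio": len(q) / len(a) if a else 0.0,
    }
```

Calling the same `pair_features` function from both the training pipeline and the serving path is one simple way to limit drift between the two.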
Model Training
● Ranking Model
● Auto tuned hyperparams
● Auto Model comparison
● Metrics
  ○ AUC
  ○ F-Measure
  ○ Precision, Recall
  ○ Hit Rate @K
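
Of the metrics listed, Hit Rate @K is the least standardized. The sketch below uses one common definition, assumed here because the talk does not spell it out: the fraction of queries for which at least one relevant article appears in the top K recommendations. The function name and input shapes are hypothetical.

```python
from typing import Dict, List, Set


def hit_rate_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant item in the top k results."""
    if not ranked:
        return 0.0
    hits = sum(
        1
        for query_id, items in ranked.items()
        if set(items[:k]) & relevant.get(query_id, set())
    )
    return hits / len(ranked)
```

For an agent-assist use case this maps naturally onto the question "did the agent see at least one attachable article among the K shown?".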
Model Auto Training Pipeline
ML System Evolution
version 0
● Heuristic based answer recommendations POC. First pilot sign up.
● Communities use case: community selected bestAnswer, as positive label.
● Generic model trained on open source dataset Stanford SQuAD

version 1
● Ranking model : <question, answer> pairwise probability
● Notebooks based on-demand training
● Static configured data filtering
ML System Evolution
version 2
● Dynamically configured training dataset attributes
● Model retraining
● Multilingual Support
● Multitenant Auto-trained models
● Observability
● Trained Model Deployments & Rollbacks
Model Deployment, CI & Rollbacks
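
A deployment gate with rollback might look roughly like the sketch below. It assumes a freshly trained model is promoted only when it beats the active model on a single metric (AUC here) by a minimum margin, and that rollback means reactivating the previously deployed version. The `ModelRegistry` class, `should_promote` function and margin are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelRegistry:
    """Tracks deployed model versions for one tenant, newest last."""
    history: List[str] = field(default_factory=list)

    @property
    def active(self) -> Optional[str]:
        return self.history[-1] if self.history else None

    def deploy(self, version: str) -> None:
        self.history.append(version)

    def rollback(self) -> str:
        """Drop the active version and reactivate the previous one."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.history[-1]


def should_promote(candidate_auc: float, active_auc: Optional[float], min_gain: float = 0.0) -> bool:
    """Promote when there is no active model or the candidate wins by more than min_gain."""
    return active_auc is None or candidate_auc - active_auc > min_gain
```

In a multitenant setup, one such registry per tenant would keep a bad retrain for one customer from affecting the others.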
Cloud Native Training
Challenges
● Data
  ○ Privacy and sharing compliance – GDPR, HIPAA, Accessibility
  ○ Freshness / Hydration
  ○ Handling encrypted data at rest and in motion
  ○ Too sparse, not meeting thresholds
  ○ Too dense, training performance SLA not met
● Custom, non standard fields and datatypes
● Building ML Infrastructure along the way
● Training Serving Skew
● Cold start problem
Takeaways
● Start small, Ship and Iterate
● Prioritize ML infrastructure
● Start with simple interpretable models
● Scale model learning to the size of your data
● Prioritize Observability
● Prioritize Data privacy over model quality
Thank you!