Auto Content Moderation in C2C e-Commerce
Shunya Ueta, Suganprabu Nagarajan, Mizuki Sango (Mercari, Inc.)
2020 USENIX Conference on Operational Machine Learning, July 27–August 7, 2020
Contents
1. Content Moderation
2. Auto Content Moderation in C2C e-Commerce
3. Task design and model strategy
4. Offline/online evaluation
5. System architecture
6. Business Impact
Content Moderation
Identify potentially unsafe or inappropriate content in a service
● App Discovery with Google Play, Part 3: Machine Learning to Fight Spam and Abuse at Scale
● YouTube Community Guidelines enforcement
● AI advances to better detect hate speech by Facebook
● Advances in content understanding, self-supervision to protect people by Facebook
● Facebook Transparency Report
● A Safe and Secure Marketplace by Mercari
● etc.
What is Mercari?
The Mercari app is a C2C marketplace where individuals can easily sell used items
Japan / U.S.
Monthly active users: 16+ Million
Total number of items: 1.5+ Billion
Why Content Moderation in C2C e-Commerce?
We want to decrease risk for customers and the marketplace
● Sellers may unintentionally violate policy
● Buyers may buy policy-violating items without knowing
Policy cases: counterfeits, weapons, etc.
[Diagram: Sellers and Buyers in C2C e-Commerce]
Content Moderation system
[Diagram: Sellers sell items on the C2C e-Commerce marketplace and Buyers discover them. The Moderation Service hides items and alerts the Moderator, who performs manual review, so that Buyers see a screened marketplace.]
Concept of Moderation Service: Rule based
Pros
● Easy to develop and can be quickly released to production
Cons
● Hard to manage
● Difficult to cover inconsistencies in spelling, e.g. {NIKE, nike, ないき, ナイキ}
[Diagram: Moderation Service (Rule based) and Moderator (Manual review)]
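The spelling problem listed under the cons can be made concrete with a minimal sketch. It assumes a hypothetical keyword rule for one brand; `BRAND_KEYWORDS`, `normalize`, and `matches_rule` are illustrative names and not part of the system described in the talk. Unicode normalization plus a hiragana-to-katakana shift folds some variants together, but each script still needs its own keyword entry, which hints at why such rules become hard to manage.

```python
import unicodedata

# Hypothetical rule: keywords are stored already normalized, one per script.
BRAND_KEYWORDS = {"nike", "ナイキ"}


def normalize(text: str) -> str:
    """Fold width and case variants, and map hiragana to katakana."""
    text = unicodedata.normalize("NFKC", text).casefold()
    # Hiragana U+3041..U+3096 sit exactly 0x60 below their katakana forms.
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def matches_rule(title: str) -> bool:
    """Return True if any normalized keyword occurs in the normalized title."""
    normalized = normalize(title)
    return any(keyword in normalized for keyword in BRAND_KEYWORDS)
```

With this sketch, all four spellings from the slide match the rule, but a misspelling or a new transliteration would still need another hand-written entry.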
Concept of Moderation Service: ML
Pros
● Automatically learns the features of items deleted by moderators
● Adapts to spelling inconsistencies
Cons
● Model update is hard
● Concept drift (a.k.a. training-serving skew)
[Diagram: Moderation Service (Rule based + Machine Learning) and Moderator (Manual review)]
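How to create the data for ML
Dataset
● Positive: items deleted by Moderator
● Negative: items not deleted by Moderator
[Diagram: sellers sell items; the Moderation Service (Rule based, Machine Learning) and item reports feed Hide items & Alert; the Moderator reviews the alerted items, and the review results form the dataset.]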
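The labeling rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `ReviewedItem` record and its field names are hypothetical, and the talk does not describe the actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedItem:
    """One item a moderator reviewed; all field names are hypothetical."""
    item_id: str
    text: str
    deleted_by_moderator: bool


def build_dataset(reviewed):
    """Return (text, label) pairs: 1 = deleted (positive), 0 = kept (negative)."""
    return [(item.text, int(item.deleted_by_moderator)) for item in reviewed]
```

Only items that reached a moderator get a label in this sketch, which matches the slide's definition of positives and negatives.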
Task Design
● Data is highly imbalanced
● Each violated topic's total number of alerts is bounded by the moderator team
All models are trained as one-vs-all
● No side effect on other classes when deploying a trained model
● Hard to improve performance for each topic in a multi-class model
[Diagram: Negative vs. Positive (Violated Topic A ... Violated Topic N); one model per topic (Model A, Model B, ...), e.g. counterfeits, weapons]
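The one-vs-all setup can be sketched as building one binary label vector per violated topic, each of which would train its own model. This is an illustration only: `one_vs_all_labels` is a hypothetical helper, and treating items from other topics as negatives for a given topic is an assumption, since the slide does not spell this out.

```python
def one_vs_all_labels(item_topics, topics):
    """Build one binary label vector per topic.

    item_topics: for each item, the violated topic assigned by a moderator,
                 or None if the item was not deleted.
    """
    return {
        topic: [int(item_topic == topic) for item_topic in item_topics]
        for topic in topics
    }
```

Because each topic has its own labels and its own model, retraining or replacing the counterfeits model leaves the weapons model untouched.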
Multimodality of content
Items have multimodal data:
● Image
● Text
● Category
● Brand
● Price, etc.
We use a multimodal model to improve model performance. See our article: https://tech.mercari.com/entry/2019/09/12/130000
Model selection based on dataset size
● Gradient Boosted Decision Trees (GBDT)
→ Efficient for training and inference when training data size is not large
* Image features are not used in GBDT
● Gated Multimodal Unit (GMU)
→ Potentially most accurate using multimodal data
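For readers unfamiliar with the Gated Multimodal Unit, the following is a minimal sketch of its bimodal form as given in the GMU paper by Arevalo et al.: each modality gets a tanh projection, and a sigmoid gate computed from both inputs decides how much each modality contributes. The talk does not describe the production architecture, so the two-modality setup, the plain-Python implementation, and all function and parameter names here are assumptions for illustration.

```python
import math


def _matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def _sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def gmu(x_image, x_text, w_image, w_text, w_gate):
    """Bimodal GMU: h = z * tanh(W_i x_i) + (1 - z) * tanh(W_t x_t)."""
    h_image = [math.tanh(v) for v in _matvec(w_image, x_image)]
    h_text = [math.tanh(v) for v in _matvec(w_text, x_text)]
    # The gate sees both raw inputs concatenated and outputs values in (0, 1).
    z = [_sigmoid(v) for v in _matvec(w_gate, x_image + x_text)]
    return [zi * hi + (1.0 - zi) * ht for zi, hi, ht in zip(z, h_image, h_text)]
```

With all gate weights at zero the gate is 0.5 everywhere and the output is the plain average of the two projections; training moves the gate toward whichever modality is more informative for each hidden unit.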
Offline evaluation
Metric is Precision@K, where K is the bound on the daily total number of alerts in each violated topic, decided by Moderators.
Evaluate the new model against the current model using the same item ids.
e.g. for 2020-07-13:
● Current model: prediction results in production (top K)
● New model: prediction results on the test dataset, using the same item ids as the production top K
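Precision@K as defined above can be sketched as follows. The function name, the `(item_id, score)` input format, and the use of a set of moderator-confirmed ids are assumptions made for illustration; the talk does not show the evaluation code.

```python
def precision_at_k(scored_items, violating_ids, k):
    """Precision among the k highest-scored items.

    scored_items: iterable of (item_id, score) pairs from one model.
    violating_ids: set of item ids moderators confirmed as violating.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = sorted(scored_items, key=lambda pair: pair[1], reverse=True)[:k]
    hits = sum(1 for item_id, _ in top_k if item_id in violating_ids)
    return hits / k
```

Scoring the current and the new model on the same item ids and calling this function with the same `k` gives two directly comparable numbers.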
Online evaluation
Classic A/B testing can take several months. It was difficult to collect enough transactions for a t-test.
Instead, the current model and the new model receive the same traffic, and their alerts go to the Moderator for manual review.
● Each model's number of alerts: K/2
● Metric: Precision@K/2
Some time after a new model is released, we decide which model should be deprecated based on the above metric.
→ Faster decision making leads to efficient operation
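The split of the alert budget can be sketched as below. This is a simplified illustration: the function names and the dictionary inputs are hypothetical, and details such as how overlapping alerts from the two models are handled are not covered by the talk and are ignored here.

```python
def top_ids(scores, budget):
    """Ids of the `budget` highest-scored items (scores: item_id -> score)."""
    return sorted(scores, key=scores.get, reverse=True)[:budget]


def compare_online(current_scores, new_scores, moderator_deleted, k):
    """Give each model K/2 alerts on the same traffic and compare precision.

    moderator_deleted: set of alerted item ids that moderators deleted.
    Returns (current_precision, new_precision), both Precision@K/2.
    """
    budget = k // 2
    if budget == 0:
        raise ValueError("k must be at least 2")
    results = []
    for scores in (current_scores, new_scores):
        alerted = top_ids(scores, budget)
        hits = sum(1 for item_id in alerted if item_id in moderator_deleted)
        results.append(hits / budget)
    return tuple(results)
```

Since moderators review every alert anyway, both precisions come from the normal review workflow, without waiting for transaction-level outcomes.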
Offline/online evaluation result
Algorithm    Offline    Online
GBDT         +18.2%     Not released
GMU          +21.2%     +23.2%
The table shows the relative performance gain on one violated topic. The offline evaluation metric is Precision@K; the online evaluation metric is Precision@K/2.
The baseline model is logistic regression, which was already released in production.
Container based Training Pipeline
Write manifest files containing requirements like CPU, GPU and Storage
[Diagram: Data Load (CPU) → Training (CPU or GPU) → Offline Evaluation (CPU); BigQuery is shown at two points in the pipeline.]
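To illustrate what per-stage requirements in a manifest might look like, here is a minimal sketch that renders Kubernetes-style `resources` mappings. The talk does not show its manifest format, so the Kubernetes-style layout, the stage names, and every resource value below are assumptions; only the resource keys (`cpu`, `memory`, `ephemeral-storage`, `nvidia.com/gpu`) are standard Kubernetes names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageRequirements:
    """Resource needs of one pipeline stage; all values are hypothetical."""
    name: str
    cpu: str
    memory: str
    storage: str
    gpus: int = 0


def to_resources(stage):
    """Render a Kubernetes-style `resources` mapping for one container."""
    resources = {
        "requests": {
            "cpu": stage.cpu,
            "memory": stage.memory,
            "ephemeral-storage": stage.storage,
        }
    }
    if stage.gpus:
        # GPUs are requested through limits under the device-plugin name.
        resources["limits"] = {"nvidia.com/gpu": stage.gpus}
    return resources


PIPELINE = [
    StageRequirements("data-load", cpu="2", memory="8Gi", storage="50Gi"),
    StageRequirements("training", cpu="8", memory="32Gi", storage="100Gi", gpus=1),
    StageRequirements("offline-evaluation", cpu="2", memory="8Gi", storage="20Gi"),
]
```

Declaring requirements per stage means only the training container needs to land on a GPU node, while data loading and evaluation can run on cheaper CPU nodes.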
Serving system architecture
We manage over 15 Machine Learning models in production
[Diagram: message queues connect to a proxy layer (Pod with a Proxy container) via subscribe and publish; the proxy layer sits in front of a prediction layer. Prediction layer Pods: a GBDT based model Pod with a single Preprocessing + inference container, and Deep Learning based model Pods with a Preprocessing container and an Inference container (Caffe2).]
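The proxy layer's role can be sketched with in-memory queues standing in for the message queues. This is a loose illustration only: the fan-out of each item to every model, the message fields, and the `run_proxy` function are assumptions, and a real deployment would use a message queue client and network calls to the model Pods rather than local callables.

```python
import queue


def run_proxy(incoming, outgoing, models):
    """Drain `incoming`, call every model on each item, publish predictions.

    models: mapping of model name -> callable(item) -> score (all hypothetical).
    """
    while True:
        try:
            item = incoming.get_nowait()
        except queue.Empty:
            break
        for name, predict in models.items():
            outgoing.put(
                {"item_id": item["id"], "model": name, "score": predict(item)}
            )
```

Keeping the queue handling in one proxy means each model container only has to expose a prediction function, which makes adding another model a matter of registering one more entry.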
Horizontal Pod Autoscaler (HPA) by Kubernetes
● Reliable system: traffic changes with time, and HPA can adapt to varying traffic
● Cheaper billing cost: reduced to 1/6 by HPA
[Figure: Billing cost transition after applying HPA. x-axis: day; y-axis: billing cost; each color is one machine learning model.]
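How HPA adapts to traffic can be sketched with the scaling rule from the Kubernetes documentation: the desired replica count is the current count scaled by the ratio of the observed metric to its target, rounded up. The sketch below omits the tolerance band and stabilization behaviour of the real controller, and the default replica bounds are placeholder values, not the configuration used in the talk.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Kubernetes HPA scaling rule, without its tolerance and stabilization.

    desired = ceil(current_replicas * current_metric / target_metric),
    clamped to [min_replicas, max_replicas].
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

When traffic drops, the same rule shrinks each model's Pod count toward the minimum, which is where a billing reduction would come from.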
Impact of Machine Learning system
e.g.
● Rule based: discovered 100 violating items
● Machine Learning: discovered 554 additional violating items
The Machine Learning system has increased coverage by 554% over the rule based approach.
[Diagram: Moderation Service (Rule based + Machine Learning) → Hide & Alert → Moderator (Manual review)]
Questions and Thanks to Collaborators
If you have a question about this talk, please e-mail the first author, Shunya Ueta: [email protected]
Acknowledgements
Co-Authors: Suganprabu Nagarajan, Mizuki Sango
Contributors:
● Abhishek Vilas Munagekar, Yusuke Shido, Vamshi Teja Racha, Sumit Verma and Keisuke Umezawa for their contributions to this system
● Dr. Antony for his feedback on the paper
● Yushi Kurita and Yuki Ito as Product Managers, all Trust and Safety project members, and all Customer Service staff as Moderators, for making this project a success