AI based language learning tools
![Page 1: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/1.jpg)
Oct.28.2017
Ewa Szymanska, PhD
Head of Rakuten Institute of Technology Singapore
![Page 2: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/2.jpg)
Source: https://unsplash.com/ by Element5 Digital
![Page 3: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/3.jpg)
“I am watching shows in Chinese to get used to ‘actual’ spoken Mandarin, and not just what I see in my textbooks”
- VIKI user
![Page 4: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/4.jpg)
* Images from Rakuten VIKI, Rakuten TV
![Page 5: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/5.jpg)
5
1.8 billion people are learning foreign languages
Source: The Washington Post: https://www.washingtonpost.com/news/worldviews/wp/2015/04/23/the-worlds-languages-in-7-maps-and-charts
Languages with most native speakers
Most commonly studied foreign languages
![Page 6: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/6.jpg)
6
Online individual language learning market is growing at 12% CAGR
Source: Rosetta Stone Investor Day 2017
![Page 7: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/7.jpg)
7
I. Entertaining Content II. Global Users III. Technology
*Photo by Jakob Owens on Unsplash
![Page 8: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/8.jpg)
8
1 Interactive subtitles
2 Quizzes
3 Video dictionary
* Images from Rakuten VIKI
![Page 9: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/9.jpg)
9
1 Interactive subtitles
Fast adoption: 30,000 DAU (daily active users)
High engagement: Korean Learn Mode users view 10% more than Viki average
High satisfaction: 83 NPS (net promoter score)
*cnet.com @ CBS Interactive Inc. Apr 13, 2017; Keia.org, Korean Economic Institute, Apr 2017; Forbes Oct 24, 2017; The Verge, Sep 28, 2017
![Page 10: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/10.jpg)
Shows availability
Learn Chinese (Japan): “Daughter Back”, “Return of Happiness”, “Ice and Fire of Youth”
Learn Korean (USA): “My Love from the Star”, “Boys Over Flowers”, “Descendants of the Sun”
* Images from Rakuten VIKI
[ Learn Mode collection on viki.com ]
![Page 11: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/11.jpg)
2 Drama Vocab Quiz [ languagequiz.viki.com ]
• 60,000+ quizzes taken
• 35,000+ users completed the quiz
• Very positive social media engagement
![Page 12: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/12.jpg)
12
3 Video-based Dictionary
Integrate with the classroom curriculum:
![Page 13: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/13.jpg)
“If you talk to a man in a language he understands, that goes to his head.
If you talk to him in his language, that goes to his heart.”
- Nelson Mandela
![Page 14: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/14.jpg)
14
![Page 15: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/15.jpg)
Oct 28, 2017
Stanley Kok
Principal Research Scientist
Rakuten Institute of Technology (Singapore)
![Page 16: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/16.jpg)
Splitting a sentence into pieces, each preserving its original semantics
你是辣妹,也是名门贵族
你 是 辣妹 , 也是 名门贵族: you are (a) hot chick and also (of) the gentry
你 是 辣妹 , 也是 名门贵 族: you are (a) hot chick and also tribe
![Page 17: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/17.jpg)
努力的人才会成功
努力 的 人 才 会 成功: only hardworking people will succeed
努力 的 人才 会 成功: hardworking talent will succeed
![Page 18: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/18.jpg)
18
![Page 19: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/19.jpg)
Tokenization
Dictionary Lookup
![Page 20: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/20.jpg)
20
Many open-source tokenizers available
Good, but not perfect
Different mistakes
Why not use more (or all) of them to improve tokenization?
Strengths of one tokenizer overcome shortcomings of another
![Page 21: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/21.jpg)
21
How to quantify “goodness” of tokenization?
Take human learner’s perspective
#Dictionary look-ups needed to understand all tokens
Non-existent tokens assumed to need large #lookups (10)
你 是 辣妹 (you / are / hot chick): 1 + 1 + 1 = 3
你 是 辣 妹 (you / are / spicy / younger sister): 1 + 1 + 1 + 1 = 4
你 是辣 妹 (you / ? / younger sister): 1 + 10 + 1 = 12
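The look-up count on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the scorer used in the talk: `lookup_cost`, `TOY_DICTIONARY` and the candidate lists are names made up for the example, and only the cost of 1 per known token and 10 per non-existent token comes from the slide.

```python
# Assumption: a tiny stand-in dictionary, just large enough for the slide's example.
TOY_DICTIONARY = {"你", "是", "辣", "妹", "辣妹"}
MISSING_TOKEN_COST = 10  # penalty from the slide for a token with no dictionary entry


def lookup_cost(tokens, dictionary=TOY_DICTIONARY):
    """Number of dictionary look-ups a learner needs to understand all tokens."""
    return sum(1 if token in dictionary else MISSING_TOKEN_COST for token in tokens)


# The three candidate tokenizations shown on the slide.
candidates = [
    ["你", "是", "辣妹"],
    ["你", "是", "辣", "妹"],
    ["你", "是辣", "妹"],
]
costs = [lookup_cost(tokens) for tokens in candidates]
best = min(candidates, key=lookup_cost)  # lowest cost = easiest for the learner
print(costs, best)
```

Picking the minimum over whole tokenizations is the baseline that the next slide improves on.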
![Page 22: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/22.jpg)
22
Can do better than picking the lowest-cost tokenization from the tokenizers
Treat common tokens as “anchor points”
Pick best tokens from remaining ones
![Page 23: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/23.jpg)
你 是 辣妹 也是 名门贵 族: you are hot chick and also tribe (look-up cost 15)
你 是辣 妹 也是 名门贵族: you younger sister and also (of) the gentry (look-up cost 14)
你 是 辣妹 也是 名门贵族 (look-up cost 5)
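One possible reading of the "anchor point" idea is sketched below. It assumes that anchors are the character offsets where every tokenizer places a token boundary, and that between two consecutive anchors the cheapest candidate segment is kept. The slides do not spell out the exact procedure, so the function names, the toy dictionary and this boundary-based interpretation are assumptions.

```python
from itertools import accumulate

MISSING_TOKEN_COST = 10  # penalty for a token with no dictionary entry
# Assumption: toy dictionary covering only the slide's example sentence.
TOY_DICTIONARY = {"你", "是", "辣", "妹", "辣妹", "也是", "族", "名门贵族"}


def lookup_cost(tokens):
    """Look-ups needed to understand all tokens (1 if known, 10 if not)."""
    return sum(1 if t in TOY_DICTIONARY else MISSING_TOKEN_COST for t in tokens)


def boundaries(tokens):
    """Character offsets at which tokens start or end, beginning with 0."""
    return [0] + list(accumulate(len(t) for t in tokens))


def combine(tokenizations):
    """Merge tokenizations of the same text, segment by segment."""
    bounds = [boundaries(tokens) for tokens in tokenizations]
    # Offsets on which all tokenizers agree act as anchor points.
    common = sorted(set(bounds[0]).intersection(*map(set, bounds[1:])))
    merged = []
    for start, end in zip(common, common[1:]):
        segments = [
            tokens[b.index(start):b.index(end)]
            for tokens, b in zip(tokenizations, bounds)
        ]
        merged.extend(min(segments, key=lookup_cost))  # cheapest segment wins
    return merged


tokenizer_a = ["你", "是", "辣妹", "也是", "名门贵", "族"]
tokenizer_b = ["你", "是辣", "妹", "也是", "名门贵族"]
combined = combine([tokenizer_a, tokenizer_b])
print(lookup_cost(tokenizer_a), lookup_cost(tokenizer_b), combined, lookup_cost(combined))
```

With these toy inputs the two tokenizers disagree in different places, so the merged result is cheaper than either one taken whole.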
![Page 24: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/24.jpg)
24
Dictionaries are important for language learning
Manual approach provides high-quality dictionary,
but not scalable
About 7000 languages in the world
About 49 million bilingual dictionaries (7000 × 7000 language pairs)
Thus need automatic approach
![Page 25: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/25.jpg)
25
Lots of online dictionaries available
Could we automatically learn new dictionaries from them?
Focus on Chinese-English (C-E) & Korean-English (K-E) bilingual dictionaries
![Page 26: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/26.jpg)
26
Lots of dictionaries online
Some are C-E and K-E, but many are not
Many dictionaries are C-X and X-E
Use language X as bridge/pivot
C-X + X-E => C-E, e.g.,
辣妹 -> fille sexy + fille sexy -> hot chick
=> 辣妹 -> hot chick
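The pivoting step can be sketched as a join of two dictionaries on the bridge language. This is a minimal illustration, assuming each dictionary is a mapping from a headword to a set of translations; `pivot` and the two toy dictionaries are hypothetical names, and the only entries used are the ones on the slide.

```python
from collections import defaultdict


def pivot(source_to_bridge, bridge_to_target):
    """Compose a C-X and an X-E dictionary into a C-E dictionary (2 hops)."""
    result = defaultdict(set)
    for source_word, bridge_words in source_to_bridge.items():
        for bridge_word in bridge_words:
            # Keep a source word only if the bridge word has a target translation.
            for target_word in bridge_to_target.get(bridge_word, ()):
                result[source_word].add(target_word)
    return dict(result)


chinese_french = {"辣妹": {"fille sexy"}}
french_english = {"fille sexy": {"hot chick"}}
chinese_english = pivot(chinese_french, french_english)
print(chinese_english)
```

Because a bridge word can be ambiguous, a join like this can also produce wrong pairs, which is one plausible reason the resulting dictionaries are not 100% correct.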
![Page 27: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/27.jpg)
27
Take 2 hops for now
Chinese-English dictionary has 750K entries
90% correct
Korean-English dictionary has 100K entries
99% correct
![Page 28: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/28.jpg)
28
Learn bilingual dictionary using
Seed lexicon
Monolingual data (plentiful)
Maps bilingual phrases to vector space
Examples: dolphin / 海豚, Tokyo / 东京, Sushi / 寿司
![Page 29: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/29.jpg)
29
![Page 30: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/30.jpg)
30
![Page 31: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/31.jpg)
31
Artifact of standard machine translation pipeline
Parallel sentences aligned word for word
Compute probability of mapping tokens of a source language to those of a target language
A correct source token will be more consistently aligned to its corresponding target token(s)
Add high-probability mappings to dictionary
![Page 32: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/32.jpg)
32
Chinese English P(C|E) P(E|C) AveProb
辣妹 hot chick 0.8 0.9 0.85
是辣 is curry 0.1 0.1 0.1
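The AveProb column is consistent with the plain average of the two conditional probabilities: (0.8 + 0.9) / 2 = 0.85 and (0.1 + 0.1) / 2 = 0.1. The sketch below applies that average and keeps only high-probability pairs. The cut-off `THRESHOLD = 0.5` is a hypothetical value chosen for illustration, since the slides do not state the one actually used.

```python
THRESHOLD = 0.5  # assumption: illustrative cut-off, not taken from the slides


def average_probability(p_c_given_e, p_e_given_c):
    """Average of the two translation directions, as in the AveProb column."""
    return (p_c_given_e + p_e_given_c) / 2


# Rows from the slide: (Chinese, English, P(C|E), P(E|C)).
phrase_pairs = [
    ("辣妹", "hot chick", 0.8, 0.9),
    ("是辣", "is curry", 0.1, 0.1),
]

dictionary = {}
for chinese, english, p_ce, p_ec in phrase_pairs:
    score = average_probability(p_ce, p_ec)
    if score >= THRESHOLD:  # add only high-probability mappings
        dictionary.setdefault(chinese, []).append((english, round(score, 2)))

print(dictionary)
```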
![Page 33: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/33.jpg)
33
Chinese-English Dictionary
3 million Chinese tokens (Jan’17)
89% in dictionary
Korean-English Dictionary
4 million Korean tokens (Jan’17)
86% in dictionary
![Page 34: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/34.jpg)
[Chart: #KoreanTokens vs. #Definitions]
[Chart: #ChineseTokens vs. #Definitions]
![Page 35: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/35.jpg)
35
Match parallel sentences to
Phrase table
Dictionary
![Page 36: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/36.jpg)
36
他放弃梦想
He gave up his dreams
Chinese English AveProb
放弃 gave up his 0.74
放弃 quit 0.83
放弃 abdicate 0.68
Phrase Table
![Page 37: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/37.jpg)
![Page 38: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/38.jpg)
他放弃梦想
He gave up his dreams
best match
Phrase Table
Chinese English AveProb
放弃 gave up his 0.74
放弃 quit 0.83
放弃 abdicate 0.68
best match
Dictionary
Chinese English
放弃 abandon
放弃 give up
放弃 abdicate
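The slides do not say how the "best match" is computed, so the following is only a sketch under stated assumptions: a phrase-table entry matches when its English side occurs word for word in the English subtitle, and the dictionary definition sharing the most words with that phrase is then chosen. The helper names `contains_phrase`, `best_phrase` and `best_definition` are hypothetical.

```python
def contains_phrase(sentence_words, phrase_words):
    """True if phrase_words occurs as a contiguous run inside sentence_words."""
    n = len(phrase_words)
    return any(
        sentence_words[i:i + n] == phrase_words
        for i in range(len(sentence_words) - n + 1)
    )


def best_phrase(chinese_sentence, english_sentence, phrase_table):
    """Highest-probability phrase pair that is present in both sentences."""
    words = english_sentence.lower().split()
    matches = [
        (chinese, english, prob)
        for chinese, english, prob in phrase_table
        if chinese in chinese_sentence
        and contains_phrase(words, english.lower().split())
    ]
    return max(matches, key=lambda row: row[2], default=None)


def best_definition(phrase, definitions):
    """Definition with the largest word overlap with the matched phrase."""
    phrase_words = set(phrase.lower().split())
    return max(definitions, key=lambda d: len(phrase_words & set(d.lower().split())))


phrase_table = [("放弃", "gave up his", 0.74), ("放弃", "quit", 0.83), ("放弃", "abdicate", 0.68)]
definitions = ["abandon", "give up", "abdicate"]

match = best_phrase("他放弃梦想", "He gave up his dreams", phrase_table)
definition = best_definition(match[1], definitions)
print(match, definition)
```

In this toy run "quit" has the higher AveProb but does not occur in the subtitle, so the phrase found in the sentence is preferred.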
![Page 39: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/39.jpg)
Drama Vocabulary Quiz
Liling Tan
Rakuten Institute of Technology (Singapore)
28 Oct 2017 @ Rakuten Tech. Conference
![Page 40: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/40.jpg)
Overview
• Introduction
• Demo
• How did We Create the Quiz?
![Page 41: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/41.jpg)
Introduction
• Quizzes are fun and could be viral
• But manually creating quizzes is tedious
• We created #DramaVocabQuiz that generates new vocabulary quizzes automatically
![Page 42: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/42.jpg)
42
![Page 43: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/43.jpg)
43
![Page 44: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/44.jpg)
44
![Page 45: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/45.jpg)
45
![Page 46: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/46.jpg)
46
![Page 47: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/47.jpg)
47
![Page 48: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/48.jpg)
48
How do we Generate Quizzes Automatically?
![Page 49: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/49.jpg)
49
Korean Drama Word List
• The word 미남 [minam] “handsome guy” can be followed by multiple suffixes at once, e.g. -이시라구요 [-issilaguyo], to form a single word meaning “someone said that he is handsome”.
• We only extract the root word 미남 [minam], and count it as a unique word type
![Page 50: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/50.jpg)
50
Korean Drama Word List
![Page 51: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/51.jpg)
51
Korean Drama Word List
![Page 52: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/52.jpg)
52
Korean Drama Word List
![Page 53: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/53.jpg)
53
Splitting Word List into 3 Difficulty Levels
![Page 54: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/54.jpg)
54
Generate the Distractors
• Distractor 1: Select the top 5th to 20th closest words (cosine)
• Distractor 2: Use Distractor 1 as negative and question word as positive, select 1st to 20th closest word (cosmul)
References:
• Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In ICLR.
• Omer Levy and Yoav Goldberg. 2014. Linguistic Regularities in Sparse and Explicit Word Representations. In CoNLL.
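The two distractor rules can be sketched with plain Python on toy vectors. Everything here is illustrative: the random vectors stand in for real word embeddings, "select" is assumed to mean a random pick within the stated rank range, and the multiplicative score follows the 3CosMul form of Levy and Goldberg with cosines shifted into [0, 1]. The function names are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_by_cosine(word, vectors):
    """All other words, most similar to `word` first."""
    others = (w for w in vectors if w != word)
    return sorted(others, key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)


def rank_by_cosmul(positive, negative, vectors, eps=1e-6):
    """Words close to `positive` and far from `negative` (multiplicative score)."""
    def score(w):
        pos = (1 + cosine(vectors[w], vectors[positive])) / 2
        neg = (1 + cosine(vectors[w], vectors[negative])) / 2
        return pos / (neg + eps)
    others = (w for w in vectors if w not in (positive, negative))
    return sorted(others, key=score, reverse=True)


def make_distractors(word, vectors, rng):
    """Distractor 1 from ranks 5 to 20 (cosine); distractor 2 from ranks 1 to 20 (cosmul)."""
    first = rng.choice(rank_by_cosine(word, vectors)[4:20])
    second = rng.choice(rank_by_cosmul(word, first, vectors)[0:20])
    return first, second


# Assumption: random toy embeddings; a real quiz would use trained word vectors.
seed_rng = random.Random(0)
vectors = {f"word{i}": [seed_rng.gauss(0, 1) for _ in range(8)] for i in range(30)}
d1, d2 = make_distractors("word0", vectors, random.Random(1))
print(d1, d2)
```

Skipping the four nearest neighbours for the first distractor avoids near-synonyms of the answer, which is one plausible motivation for starting at rank 5.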
![Page 55: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/55.jpg)
55
Language Learners Like Quizzes!!
• 60,000+ quizzes taken
• 35,000+ unique users completed quiz
• 16% of the users repeated quiz
![Page 56: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/56.jpg)
56
Word Frequency is a Good Indicator of Difficulty
[Bar chart by difficulty level: Easy, Medium, Hard]
Easy = Frequent words
Medium = Less Frequent words
Hard = Least Frequent words
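A frequency-based split into three levels might look like the sketch below. It assumes the word list is already reduced to root words and that the three bands are equal-sized slices of the frequency ranking. The slides do not give the actual cut-offs, so the equal split, the function name and the placeholder tokens are assumptions.

```python
from collections import Counter


def split_by_frequency(words, levels=("easy", "medium", "hard")):
    """Rank unique words by frequency and cut the ranking into equal bands."""
    ranked = [word for word, _ in Counter(words).most_common()]
    band = -(-len(ranked) // len(levels))  # ceiling division
    return {
        level: ranked[i * band:(i + 1) * band]
        for i, level in enumerate(levels)
    }


# Placeholder tokens with made-up counts, standing in for drama subtitles.
subtitle_words = ["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d"] * 2 + ["e", "f"]
levels = split_by_frequency(subtitle_words)
print(levels)
```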
![Page 57: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/57.jpg)
57
Conclusion
Watch Drama, Learn Language
Quiz: https://languagequiz.viki.com
Techblog: https://techblog.rakuten.co.jp/2017/05/26/lang-quiz/
![Page 58: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/58.jpg)
Oct.28.2017
Pang Zineng
Senior Technologist
Rakuten Institute of Technology Singapore
![Page 59: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/59.jpg)
* Images from Rakuten VIKI
![Page 60: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/60.jpg)
Web Search: pages
In-Video Search: clips
* Images from Rakuten VIKI
![Page 61: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/61.jpg)
Web Search
• The meta data of the site
• The meta data of the page
• The word tokens in the page
• The topic of the page
• The originality of the page
• Hyperlinks (page rank)
Diagram labels: site identifier, page identifier, search relevancy, content ranking
In-Video Search
• The meta data of the video
• The meta data of this clip (timestamp, length, URI, etc.)
• The caption text of the clip
• The frames & audio signal
• Complexity of the sentence
• Diversity of the clips
Diagram labels: video identifier, clip identifier, search relevancy, content ranking
* Images from Rakuten VIKI
![Page 62: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/62.jpg)
62
Job:
• Make some data ready for consumption.
Questions:
• How does the data come?
• What needs to be done for it to be ready?
• How will the data be consumed?
Diagram components: Data provider, FTP / API, Raw Data, Trigger / monitor function, Pre-processing function, database, Data access function, Data consumer
![Page 63: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/63.jpg)
63
Job:
• Let outsiders use a function.
Questions:
• How frequently will the function be used?
• What data does the function need?
Diagram components: Web Application, API Endpoint, API Cache, Request Queue, Application logic, Application Cache, Internal/External Data
![Page 64: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/64.jpg)
Motion Dictionary
Diagram components: Rakuten VIKI video contents, Rakuten TV video contents, Other video contents, Search function, 3rd Party Platform
* Images from Rakuten VIKI
![Page 65: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/65.jpg)
Interactive Subtitles (version 2)
Interactive Subtitles (version 3)
Diagram components:
• Functions: dictionary function, voice function, tokenization function
• Data: Japanese, Korean and Chinese Dictionary Data; Japanese, Korean and Chinese Tokenization Data
• Sources: 3rd party solution, open source framework, In-house solution
* Images from Rakuten VIKI
![Page 66: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/66.jpg)
Interactive Subtitles (version 2)
Interactive Subtitles (version 3)
Diagram components:
• Functions: dictionary function, voice function, tokenization function
• Data: Japanese, Korean, Chinese and Global Dictionary Data; Japanese, Korean, Chinese and Global Tokenization Data
• Sources: 3rd party solution, open source framework, In-house solution
* Images from Rakuten VIKI
![Page 67: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/67.jpg)
Vocab Quiz (version 1)
Diagram components: Take Quiz function, Chinese Quiz Data, Korean Quiz Data
* Images from Rakuten VIKI
![Page 68: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/68.jpg)
Vocab Quiz (version 2)
Diagram components: Take Quiz function, voice function, Chinese Quiz Data, Korean Quiz Data
* Images from Rakuten VIKI
![Page 69: AI based language learning tools](https://reader031.vdocuments.net/reader031/viewer/2022022415/5a6479967f8b9a3b568b47ad/html5/thumbnails/69.jpg)
Fast iteration in R&D wouldn’t be possible if we had many things bundled or coupled.
-- Pang
Vocab Quiz
• https://languagequiz.viki.com/
Learn Mode (PC/Mac only)
• https://www.viki.com/collections/316981l-learn-the-basics-chinese
• https://www.viki.com/collections/316939l-learn-the-basics-korean
Motion Dictionary
• TBD