Source: cs.kangwon.ac.kr/~leeck/nlp/rnn_nlp.pdf (lecture slide transcript)
RNN & NLP Application
College of IT, Kangwon National University
Changki Lee (이창기)
Contents
• RNN
• NLP application
Recurrent Neural Network
• "Recurrent" property: a dynamical system evolving over time
Bidirectional RNN
• Exploit future context as well as past
Long Short-Term Memory (LSTM) RNN
• Vanishing gradient problem for RNNs
• LSTM can preserve gradient information
LSTM Block Architecture
Gated Recurrent Unit (GRU)
• $r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)$
• $z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)$
• $\tilde{h}_t = \phi(W_{xh} x_t + W_{hh} (r_t \odot h_{t-1}) + b_h)$
• $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$
• $y_t = g(W_{hy} h_t + b_y)$
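As a concrete illustration, the four GRU equations above can be sketched in a few lines of NumPy. This is a minimal single-step sketch with hypothetical weight names (Wxr, Whz, ...), not a trained model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU step following the slide's equations; p holds the weights."""
    r = sigmoid(p["Wxr"] @ x_t + p["Whr"] @ h_prev + p["br"])   # reset gate r_t
    z = sigmoid(p["Wxz"] @ x_t + p["Whz"] @ h_prev + p["bz"])   # update gate z_t
    # candidate state: reset gate decides how much of h_{t-1} to use
    h_tilde = np.tanh(p["Wxh"] @ x_t + p["Whh"] @ (r * h_prev) + p["bh"])
    return z * h_prev + (1.0 - z) * h_tilde                     # interpolate old/new

# tiny usage example with random weights
rng = np.random.default_rng(0)
d_x, d_h = 4, 3
p = {"Wxr": rng.normal(size=(d_h, d_x)), "Whr": rng.normal(size=(d_h, d_h)), "br": np.zeros(d_h),
     "Wxz": rng.normal(size=(d_h, d_x)), "Whz": rng.normal(size=(d_h, d_h)), "bz": np.zeros(d_h),
     "Wxh": rng.normal(size=(d_h, d_x)), "Whh": rng.normal(size=(d_h, d_h)), "bh": np.zeros(d_h)}
h = np.zeros(d_h)
for x in rng.normal(size=(5, d_x)):   # run over a length-5 input sequence
    h = gru_step(x, h, p)
```

Because $h_t$ is a convex combination of the previous state and a tanh candidate, the hidden state stays bounded in [-1, 1].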
Contents
• RNN
• NLP application
Sequence Labeling – RNN, LSTM
Word embedding
Feature embedding
FFNN (or CNN), CNN+CRF (SENNA)
[Figure: unrolled sequence-labeling networks; inputs x(t-1), x(t), x(t+1), hidden states h(t-1), h(t), h(t+1), outputs y(t-1), y(t), y(t+1)]
RNN, CRF, Recurrent CRF
[Figure: RNN (with hidden layer) vs. CRF (without) over the same input/output sequence]
[Figure: LSTM block; input x(t), cell state C(t), hidden state h(t), input gate i(t), forget gate f(t), output gate o(t)]
LSTM RNN + CRF: LSTM-CRF (KCC 15)
[Figure: LSTM-CRF; the LSTM hidden layer feeds a CRF output layer with transitions between y(t-1), y(t), y(t+1)]
LSTM-CRF
• $i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$
• $f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$
• $c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$
• $o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$
• $h_t = o_t \odot \tanh(c_t)$
• The softmax output $y_t = g(W_{hy} h_t + b_y)$ is replaced by unnormalized scores $y_t = W_{hy} h_t + b_y$
• $s(\mathbf{x}, \mathbf{y}) = \sum_{t=1}^{T} \left( A_{y_{t-1}, y_t} + y_t \right)$
• $\log P(\mathbf{y} \mid \mathbf{x}) = s(\mathbf{x}, \mathbf{y}) - \log \sum_{\mathbf{y}'} \exp(s(\mathbf{x}, \mathbf{y}'))$
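The two CRF quantities above, the path score s(x, y) and the log-partition term, can be computed directly from per-position emission scores. A minimal sketch; the (K+1)-row transition matrix with a virtual start state is an assumption of this sketch, not something stated on the slide:

```python
import numpy as np

def crf_score(emissions, A, tags):
    """s(x, y) = sum_t A[y_{t-1}, y_t] + emissions[t, y_t].
    emissions: (T, K) tag scores from the LSTM (the linear y_t = W h_t + b);
    A: (K+1, K) transitions, with row K used as the start transition."""
    K = emissions.shape[1]
    score, prev = 0.0, K          # K = virtual start state
    for t, y in enumerate(tags):
        score += A[prev, y] + emissions[t, y]
        prev = y
    return score

def crf_log_partition(emissions, A):
    """log sum_{y'} exp(s(x, y')) via the forward algorithm in log space."""
    K = emissions.shape[1]
    alpha = A[K] + emissions[0]                      # start into each tag
    for t in range(1, emissions.shape[0]):
        m = alpha.max()                              # log-sum-exp over previous tag
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(A[:K])) + emissions[t]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

# log P(y|x) = s(x, y) - log partition, for a toy 3-step, 2-tag problem
emissions = np.array([[1.0, 0.5], [0.2, 0.8], [0.3, 0.3]])
A = np.array([[0.1, -0.2], [0.4, 0.0], [0.0, 0.0]])  # last row: start transitions
logZ = crf_log_partition(emissions, A)
log_p = crf_score(emissions, A, [0, 1, 1]) - logZ
```

With only K^T = 8 possible tag paths here, the forward-algorithm result can be checked against brute-force enumeration.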
GRU+CRF
• $r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)$
• $z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)$
• $\tilde{h}_t = \phi(W_{xh} x_t + W_{hh} (r_t \odot h_{t-1}) + b_h)$
• $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$
• The softmax output $y_t = g(W_{hy} h_t + b_y)$ is replaced by unnormalized scores $y_t = W_{hy} h_t + b_y$
• $s(\mathbf{x}, \mathbf{y}) = \sum_{t=1}^{T} \left( A_{y_{t-1}, y_t} + y_t \right)$
• $\log P(\mathbf{y} \mid \mathbf{x}) = s(\mathbf{x}, \mathbf{y}) - \log \sum_{\mathbf{y}'} \exp(s(\mathbf{x}, \mathbf{y}'))$
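At prediction time the CRF layer needs the single highest-scoring tag sequence, found with Viterbi decoding over the same score s(x, y). A minimal sketch; the (K+1, K) transition-matrix layout with a virtual start row is an assumption of this sketch:

```python
import numpy as np

def viterbi_decode(emissions, A):
    """Return the tag sequence maximizing s(x, y) = sum_t A[y_{t-1}, y_t] + emissions[t, y_t].
    emissions: (T, K); A: (K+1, K) with row K as the start transition."""
    T, K = emissions.shape
    delta = A[K] + emissions[0]          # best score ending in each tag at t = 0
    back = np.zeros((T, K), dtype=int)   # backpointers
    for t in range(1, T):
        cand = delta[:, None] + A[:K]    # cand[i, j]: best path ending in i, then i -> j
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + emissions[t]
    tags = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):        # follow backpointers right to left
        tags.append(int(back[t, tags[-1]]))
    return tags[::-1]

# toy example: emissions favor tag 0, transitions reward staying in place
emissions = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
A = np.array([[0.5, -1.0], [-1.0, 0.5], [0.0, 0.0]])
best = viterbi_decode(emissions, A)
```

Here the transition bonus for staying on the same tag outweighs the middle position's preference for tag 1, so the decoded path is [0, 0, 0].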
[Figure: GRU+CRF; the recurrent hidden layer feeds CRF outputs y(t-1), y(t), y(t+1)]
Bi-LSTM CRF
• Bidirectional LSTM+CRF
• Bidirectional GRU+CRF
• Stacked Bi-LSTM+CRF …
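The bidirectional variants above all follow one recipe: run one recurrent pass left-to-right, another right-to-left, and concatenate the two hidden states at each position before the output/CRF layer. A sketch with a plain tanh-RNN cell standing in for the LSTM/GRU (all weights random and hypothetical):

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    # simple tanh RNN cell standing in for an LSTM/GRU cell
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

def bidirectional_encode(xs, fwd, bwd, d_h):
    """Concatenate forward states h(t) and backward states bh(t) per position."""
    T = len(xs)
    hs, bhs = np.zeros((T, d_h)), np.zeros((T, d_h))
    h = np.zeros(d_h)
    for t in range(T):                        # left-to-right pass
        h = rnn_step(xs[t], h, *fwd)
        hs[t] = h
    h = np.zeros(d_h)
    for t in range(T - 1, -1, -1):            # right-to-left pass
        h = rnn_step(xs[t], h, *bwd)
        bhs[t] = h
    return np.concatenate([hs, bhs], axis=1)  # (T, 2*d_h), fed to the output/CRF layer

rng = np.random.default_rng(1)
d_x, d_h = 4, 3
fwd = (rng.normal(size=(d_h, d_x)), rng.normal(size=(d_h, d_h)), np.zeros(d_h))
bwd = (rng.normal(size=(d_h, d_x)), rng.normal(size=(d_h, d_h)), np.zeros(d_h))
H = bidirectional_encode(rng.normal(size=(5, d_x)), fwd, bwd, d_h)
```

Each position thus sees both past context (forward state) and future context (backward state), which is what motivates Bi-LSTM CRF over plain LSTM CRF.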
[Figure: Bi-LSTM CRF; forward hidden states h(t) and backward hidden states bh(t) both feed the CRF outputs y(t)]
Stacked LSTM CRF
[Figure: stacked LSTM CRF; a second hidden layer h2(t) on top of h(t) (and a backward layer bh(t)) before the CRF outputs]
LSTM CRF with Context Words = CNN + LSTM CRF
[Figure: LSTM CRF with a context window; inputs x(t-2) … x(t+2) feed h(t)]
• Bi-LSTM CRF ≈ LSTM CRF with context words > LSTM CRF (in labeling accuracy)
Neural Architectures for NER (arXiv 16)
• LSTM-CRF model + character-based word representation
– Char: Bi-LSTM RNN
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF (ACL 16)
• LSTM-CRF model + character-level representation
– Char: CNN
NER with Bidirectional LSTM-CNNs (arXiv 16)
Korean Sentiment Analysis with LSTM RNNs
• LSTM RNN-based encoding
– Sentence embedding as input
– Fully connected NN for the output
– GRU encoding behaves similarly
[Figure: sequence x(1) … x(t) encoded through h(1) … h(t) into a single output y]
Data set: Mobile (Train: 4,543 / Test: 500)
Model                                              Accuracy
SVM (word features)                                85.58
CNN (ReLU, kernel 3, hidden 50) + word embedding   91.20
GRU encoding + fully connected NN                  91.12
LSTM RNN encoding + fully connected NN             90.93
Neural Machine Translation
Example soft-alignment (attention) matrix. Rows: target (English) words; columns: source (Korean) tokens.
T|S 777항공편 은 3시간 동안 지상 에 있 겠 습니다 . </s>
flight 0.5 0.4 0 0 0 0 0 0 0 0 0 0 0
777 0.3 0.6 0 0 0 0 0 0 0 0 0 0 0
is 0 0.1 0 0 0.1 0.2 0 0.4 0 0.1 0 0 0
on 0 0 0 0 0 0 0 0.7 0.2 0.1 0 0 0
the 0 0 0 0.2 0.3 0.3 0.1 0 0 0 0 0
ground 0 0 0 0.1 0.2 0.5 0.3 0 0 0 0 0 0
for 0 0 0 0.1 0.2 0.5 0.1 0.1 0 0 0 0 0
three 0 0 0 0.2 0.2 0.6 0 0 0 0 0 0 0
hours 0 0 0 0.1 0.3 0.5 0 0 0 0 0 0 0
. 0 0 0 0.4 0 0.1 0.2 0.1 0.1 0.1 0 0 0
</s> 0 0 0 0 0 0 0 0.1 0 0.1 0.1 0.3 0.3
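Each row of this matrix is a softmax over scores between the decoder state and every encoder state, and the context vector fed to the decoder is the corresponding weighted sum of encoder states. A minimal sketch using dot-product scoring as a simplifying assumption (Bahdanau-style attention scores with a small feed-forward network instead):

```python
import numpy as np

def attention_context(dec_state, enc_states):
    """Soft alignment: softmax over scores, then a weighted sum of encoder states."""
    scores = enc_states @ dec_state                  # (T_src,) alignment scores
    scores -= scores.max()                           # for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # attention over source positions
    context = weights @ enc_states                   # (d_h,) context vector c_t
    return weights, context

# 3 source positions, 2-dim encoder states; decoder state points along position 0
enc = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
w, c = attention_context(np.array([1.0, 0.0]), enc)
```

The weight vector plays the role of one row in the matrix above: it sums to 1 and concentrates on the source positions most similar to the decoder state.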
Recurrent NN Encoder-Decoder for Statistical Machine Translation (EMNLP 14)
• GRU RNN encoding
• GRU RNN decoding
• Vocab: 15,000 (src, tgt)
Sequence to Sequence Learning with Neural Networks (NIPS 14, Google)
• Source vocab: 160,000; target vocab: 80,000
• Deep LSTMs with 4 layers
• Train: 7.5 epochs (12M sentences; 10 days on an 8-GPU machine)
Neural MT by Jointly Learning to Align and Translate (ICLR 15)
• GRU RNN + alignment encoding
• GRU RNN decoding
• Vocab: 30,000 (src, tgt)
• Train: 5 days
Abstractive Text Summarization (Hangul and Korean Information Processing 16)
로드킬로 숨진 친구의 곁을 지키는 길고양이의 모습이 포착되었다. ("A stray cat was spotted keeping watch beside its friend killed in a roadkill accident.")
RNN-search + input-feeding + CopyNet
End-to-End Korean Morphological Analysis (Winter Conference 16)
Attention + Input-feeding + Copying mechanism
Sequence-to-sequence Korean Phrase-Structure Parsing (Hangul and Korean Information Processing 16)
Input example 1: 43/SN 국/NNG <sp> 참가/NNG
Input example 2: 4 3 <SN> 국 <NNG> <sp> 참 가 <NNG>
Morphological analysis: 43/SN 국/NNG + 참가/NNG
[Figure: parse tree for the phrase]
Output: (NP (NP 43/SN + 국/NNG) (NP 참가/NNG))
[Figure: two-layer GRU encoder-decoder with attention; inputs x1 … xT, hidden layers h1, h2, context vector ct, outputs yt-1, yt]
Attention + Input-feeding
Input
선 생 <NNG> 님 <XSN> 의 <JKG> <sp> 이 야 기 <NNG> <sp> 끝 나 <VV> 자 <EC> <sp> 마 치 <VV> 는 <ETM> <sp>
종 <NNG> 이 <JKS> <sp> 울 리 <VV> 었 <EP> 다 <EF> . <SF>
Gold: (S (S (NP_SBJ (NP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) (S (NP_SBJ (VP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) )
RNN-search[7] (Beam size 10)
(S (VP (NP_OBJ (NP_MOD XX ) (NP_OBJ XX ) ) (VP XX ) ) (S (NP_SBJ (VP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) )
RNN-search + Input-feeding + Dropout (Beam size 10)
(S (S (NP_SBJ (NP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) (S (NP_SBJ (VP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) )
Sequence-to-sequence Korean Phrase-Structure Parsing (Hangul and Korean Information Processing 16)
Model (input)                                            F1
Stanford parser [13]                                     74.65
Berkeley parser [13]                                     78.74
Morphemes + <sp>: RNN-search [7] (beam size 10)          87.34 (baseline) / 87.65* (+0.31)
Morpheme syllables + POS tags + <sp>:
  RNN-search [7] (beam size 10)                          87.69 (+0.35) / 88.00* (+0.66)
  RNN-search + input-feeding (beam size 10)              88.23 (+0.89) / 88.68* (+1.34)
  RNN-search + input-feeding + dropout (beam size 10)    88.78 (+1.44) / 89.03* (+1.69)
Neural Responding Machine for Short-Text Conversation (ACL 15)
Neural Responding Machine – cont’d
Experimental results (ACL 15)
Short-Text Conversation (Winter Conference 16)
- Data: Clien (클리앙) "ask anything" Q&A board
- 77,346 question-answer pairs
- Train : dev : test = 8 : 1 : 1
Image Caption Generation: Overview
• Understand image content and automatically generate a caption describing it
– Image recognition (understanding) + natural-language processing (generation)
• Applications
– Image search
– Photo descriptions and navigation aids for the blind
– Early-childhood education, …
Prior Work
• Multimodal RNN (m-RNN) [2] (Baidu): CNN + vanilla RNN
– CNN: VGGNet
• Neural Image Caption generator (NIC) [4] (Google): CNN + LSTM RNN
– CNN: GoogLeNet
• Deep Visual-Semantic Alignments (DeepVS) [5] (Stanford University)
– Alignment (training): RCNN + Bi-RNN
– Generation: CNN + vanilla RNN
– CNN: AlexNet, VGGNet
Image Caption Generation with RNNs (Winter Conference 15)
• CNN + RNN
– CNN: VGGNet, 15th layer (4,096-dim)
– RNN: GRU (an LSTM RNN variant)
• Hidden layer units: 500, 1000 (best)
• Multimodal layer units: 500, 1000 (best)
– Word embedding
• SENNA: 50-dim (best)
• Word2Vec: 300-dim
– Data sets
• Flickr 8K: 8,000 images x 5 captions each
– 6,000 train, 1,000 validation, 1,000 test
• Flickr 30K: 31,783 images x 5 captions each
– 29,000 train, 1,014 validation, 1,000 test
– Four model variants
• GRU-DO1, GRU-DO2, GRU-DO3, GRU-DO4
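The m-RNN-style architecture on these slides merges three signals in a multimodal layer, the current word embedding, the GRU hidden state, and the CNN image feature, then predicts the next word with a softmax. A sketch with tiny illustrative dimensions (the slides use 4,096-dim VGGNet features and 500-1000 hidden/multimodal units); the weight names and the tanh nonlinearity are assumptions, not the exact published model:

```python
import numpy as np

def multimodal_step(w_emb, h_t, img_feat, p):
    """Merge word embedding, recurrent state, and CNN image feature,
    then score the vocabulary with a softmax."""
    m = np.tanh(p["Wm_w"] @ w_emb + p["Wm_h"] @ h_t + p["Wm_i"] @ img_feat)
    logits = p["Wout"] @ m
    e = np.exp(logits - logits.max())   # stable softmax
    return e / e.sum()                  # next-word distribution over the vocabulary

# tiny demo dimensions: embedding, hidden, image feature, multimodal, vocab
rng = np.random.default_rng(2)
d_e, d_h, d_i, d_m, V = 8, 16, 32, 16, 30
p = {"Wm_w": rng.normal(size=(d_m, d_e)) * 0.01,
     "Wm_h": rng.normal(size=(d_m, d_h)) * 0.01,
     "Wm_i": rng.normal(size=(d_m, d_i)) * 0.01,
     "Wout": rng.normal(size=(V, d_m)) * 0.01}
probs = multimodal_step(rng.normal(size=d_e), rng.normal(size=d_h),
                        rng.normal(size=d_i), p)
```

Caption generation then loops this step: sample or argmax the next word, embed it, advance the GRU, and repeat until an end-of-sentence token.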
[Figure: model variants GRU-DO1, GRU-DO2, GRU-DO3, GRU-DO4; each diagram combines Image -> CNN, word Wt -> Embedding -> GRU, a Multimodal layer, and a Softmax predicting Wt+1]
Image Caption Generation with RNNs (Winter Conference 15)
Flickr 30K B-1 B-2 B-3 B-4
m-RNN (Baidu)[2] 60.0 41.2 27.8 18.7
DeepVS (Stanford)[5] 57.3 36.9 24.0 15.7
NIC (Google)[4] 66.3 42.3 27.7 18.3
Ours-GRU-DO1 63.01 43.60 29.74 20.14
Ours-GRU-DO2 63.24 44.25 30.45 20.58
Ours-GRU-DO3 62.19 43.23 29.50 19.91
Ours-GRU-DO4 63.03 43.94 30.13 20.21
Flickr 8K B-1 B-2 B-3 B-4
m-RNN (Baidu)[2] 56.5 38.6 25.6 17.0
DeepVS (Stanford)[5] 57.9 38.3 24.5 16.0
NIC (Google)[4] 63.0 41.0 27.0 -
Ours-GRU-DO1 63.12 44.27 29.82 19.34
Ours-GRU-DO2 61.89 43.86 29.99 19.85
Ours-GRU-DO3 62.63 44.16 30.03 19.83
Ours-GRU-DO4 63.14 45.14 31.09 20.94
Flickr 30K experimental results
A black and white dog is jumping in the grass
A group of people in the snow
Two men are working on a roof
New data
A large clock tower in front of a building
A man in a field throwing a frisbee
A little boy holding a white frisbee
A man and a woman are playing with a sheep
Korean Image Caption Generation
한 어린 소녀가 풀로 덮인 들판에 서 있다 ("A young girl is standing in a grass-covered field")
건물 앞에 서 있는 한 남자 ("A man standing in front of a building")
분홍색 개를 데리고 있는 한 여자와 한 여자 ("A woman and a woman with a pink dog")
구명조끼를 입은 한 작은 소녀가 웃고 있다 ("A little girl wearing a life jacket is smiling")
Residual Network + Korean Image Caption Generation (Winter Conference 16)