
Page 1

Supervised Learning: Artificial Neural Networks

Some slides adapted from Dan Klein et al. (Berkeley) and Percy Liang (Stanford)

Page 2

Plan

• Project 4: http://www.mathcs.emory.edu/~eugene/cs325/p4/

• Review: Perceptron, MIRA

• New: Neural Networks (Supervised version)

– +Hidden Layer

– +Training algorithms (forward- and backward-propagation)

Page 3

Single Neuron as Linear Classifier

• Inputs are feature values

• Each feature has a weight

• Sum is the activation

• If the activation is:

– positive, output +1

– negative, output -1

[Diagram: features f1, f2, f3 enter with weights w1, w2, w3; the weighted sum is compared to 0 (>0?) to produce the output.]

Page 4

Weights

• Binary case: compare features to a weight vector

• Learning: figure out the weight vector from examples

Weight vector w:

# free : 4
YOUR_NAME : -1
MISSPELLED : 1
FROM_FRIEND : -3
...

Example feature vectors f(x):

# free : 2
YOUR_NAME : 0
MISSPELLED : 2
FROM_FRIEND : 0
...

# free : 0
YOUR_NAME : 1
MISSPELLED : 1
FROM_FRIEND : 1
...

A positive dot product w · f(x) means the positive class.
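A minimal JavaScript sketch of this decision rule (the feature names and values are the hypothetical spam features above, not from any library):

// Score a feature vector against a weight vector (dot product).
function score(weights, features) {
  var total = 0;
  for (var name in features) {
    total += (weights[name] || 0) * features[name];
  }
  return total;
}

// Classify: positive activation -> +1 (SPAM), negative -> -1 (HAM).
function classify(weights, features) {
  return score(weights, features) > 0 ? +1 : -1;
}

var w = {'free': 4, 'YOUR_NAME': -1, 'MISSPELLED': 1, 'FROM_FRIEND': -3};
console.log(classify(w, {'free': 2, 'YOUR_NAME': 0, 'MISSPELLED': 2, 'FROM_FRIEND': 0}));  // +1 (SPAM)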

Page 5

Binary Decision Rule

• In the space of feature vectors

– Examples are points

– Any weight vector is a hyperplane

– One side corresponds to Y = +1

– The other corresponds to Y = -1

w:

BIAS : -3
free : 4
money : 2
...

[Plot: feature space with axes "free" and "money"; the hyperplane w·f = 0 separates +1 = SPAM from -1 = HAM.]

Page 6

Learning: Binary Perceptron

• Start with weights = 0

• For each training instance:

– Classify with current weights

– If correct (i.e., y=y*), no change!

– If wrong: adjust the weight vector

Page 7

Learning: Binary Perceptron

• Start with weights = 0

• For each training instance:

– Classify with current weights

– If correct (i.e., y=y*), no change!

– If wrong: adjust the weight vector by adding or subtracting the feature vector. Subtract if y* is -1.
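A runnable JavaScript sketch of this update rule (array-based feature vectors; the function name is illustrative, not from the slides):

// One pass of binary perceptron training.
// xs: array of feature vectors, ys: labels in {+1, -1}, w: weight vector.
function perceptronEpoch(w, xs, ys) {
  for (var i = 0; i < xs.length; i++) {
    var activation = 0;
    for (var j = 0; j < w.length; j++) activation += w[j] * xs[i][j];
    var y = activation > 0 ? +1 : -1;        // classify with current weights
    if (y !== ys[i]) {                       // wrong: add or subtract f(x)
      for (var k = 0; k < w.length; k++) w[k] += ys[i] * xs[i][k];
    }
  }
  return w;
}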

Page 8

Multiclass Decision Rule

• If we have multiple classes:

– A weight vector w_y for each class y

– Score (activation) of a class y: w_y · f(x)

– Prediction: the highest score wins

Binary = multiclass where the negative class has weight zero

Page 9

Learning: Multiclass Perceptron

• Start with all weights = 0

• Pick up training examples one by one

• Predict with current weights

• If correct, no change!

• If wrong: lower score of wrong answer, raise score of right answer
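A hedged JavaScript sketch of that update (W maps each class label to its weight vector; names are illustrative):

// Multiclass perceptron update for one example (x, yStar).
function multiclassUpdate(W, x, yStar) {
  var best = null, bestScore = -Infinity;
  for (var y in W) {                          // predict with current weights
    var s = 0;
    for (var j = 0; j < x.length; j++) s += W[y][j] * x[j];
    if (s > bestScore) { bestScore = s; best = y; }
  }
  if (best !== yStar) {                       // wrong prediction:
    for (var j = 0; j < x.length; j++) {
      W[best][j]  -= x[j];                    // lower score of wrong answer
      W[yStar][j] += x[j];                    // raise score of right answer
    }
  }
  return W;
}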

Page 10

MIRA*: Fixing the Perceptron

• What about “outliers” – correct or incorrect data points with abnormal feature values?

• MIRA: choose an update size that fixes the current mistake…

• … but, minimizes the change to w

• The +1 helps to generalize (see the update on the next slide)

* Margin Infused Relaxed Algorithm

Page 11

Minimum Correcting Update

The minimum-change update is not zero (otherwise no error would have been made), so the minimum is attained where the correctness constraint holds with equality.
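The update formula itself did not survive this transcript; a standard reconstruction, matching the Berkeley slides this lecture adapts, is (in LaTeX):

\[
\min_{w}\ \tfrac{1}{2}\sum_{y}\lVert w_y - w'_y\rVert^2
\quad\text{s.t.}\quad
w_{y^*}\cdot f(x)\ \ge\ w_{y}\cdot f(x) + 1
\]

\[
w_{y} = w'_{y} - \tau f(x),\qquad
w_{y^*} = w'_{y^*} + \tau f(x),\qquad
\tau = \frac{(w'_{y} - w'_{y^*})\cdot f(x) + 1}{2\, f(x)\cdot f(x)}
\]

Here w' are the weights before the update, y is the (wrong) predicted class, and y* is the true class; the +1 is the margin term mentioned above.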

Page 12

Maximum Step Size

• In practice, it’s also bad to make updates that are too large

– Example may be labeled incorrectly

– You may not have enough features

– Solution: cap the maximum possible value of τ with some constant C

– Corresponds to an optimization that assumes non-separable data

– Usually converges faster than perceptron

– Usually better, especially on noisy data

Page 13

Linear Separators

• Which of these linear separators is optimal?

Page 14

Support Vector Machines

• Maximizing the margin: good according to intuition, theory, practice

• Only support vectors matter; other training examples are ignorable

• Support vector machines (SVMs) find the separator with max margin

• Basically, SVMs are MIRA where you optimize over all examples at once

[Figure: separators found by MIRA vs. SVM on the same data.]

Page 15

Classification: Comparison

• Naïve Bayes:

– Builds a model of the training data

– Gives prediction probabilities

– Strong assumptions about feature independence

– One pass through data (counting)

• Perceptrons / MIRA:

– Makes fewer assumptions about data

– Mistake-driven learning

– Multiple passes through data (prediction)

– Often more accurate

Page 16

Different Non-Linearly Separable Problems

Types of decision regions, by network structure:

• Single-layer: half plane bounded by a hyperplane

• Two-layer: convex open or closed regions

• Three-layer: arbitrary (complexity limited by the number of nodes)

[Figure: for each structure, example decision regions for the exclusive-OR problem and for classes with meshed regions (classes A and B).]

Page 17

Non-Linearly Separable Data

• Demo: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html


layer_defs = [];

layer_defs.push({type:'input', out_sx:1, out_sy:1, out_depth:2});

layer_defs.push({type:'fc', num_neurons:1, activation: 'sigmoid'});

layer_defs.push({type:'softmax', num_classes:2});

net = new convnetjs.Net();

net.makeLayers(layer_defs);

trainer = new convnetjs.SGDTrainer(net, {learning_rate:0.01, momentum:0.1, batch_size:10, l2_decay:0.001});

Page 18

Multilayer Perceptron (MLP)

[Diagram: the input layer receives the input signals (external stimuli); adjustable weights connect it through the network; the output layer produces the output values.]

Page 19

The Key Elements of Neural Networks

• Neural computing requires a number of neurons to be connected together into a neural network. Neurons are arranged in layers.

• Each neuron within the network is usually a simple processing unit which takes one or more inputs and produces an output. At each neuron, every input has an associated weight which modifies the strength of that input. The neuron simply adds together all the weighted inputs and calculates an output to be passed on.

[Diagram: inputs p1, p2, p3 with weights w1, w2, w3, plus a bias b on a fixed input of 1, feed a neuron with activation function f.]

a = f(w1·p1 + w2·p2 + w3·p3 + b) = f(Σi wi·pi + b)

Page 20

Activation functions

Page 21

Sigmoid Unit

[Diagram: inputs x0 = 1, x1, ..., xn with weights w0, w1, ..., wn feed a sigmoid unit with output o.]

net = Σ_{i=0..n} wi·xi

o = σ(net) = 1 / (1 + e^(-net))

σ(x) is the sigmoid function: 1 / (1 + e^(-x))

dσ(x)/dx = σ(x)·(1 - σ(x))

Derive gradient descent rules to train:

• one sigmoid unit:

∂E/∂wi = -Σd (td - od)·od·(1 - od)·xi,d

• multilayer networks of sigmoid units: backpropagation
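A runnable JavaScript sketch of the unit and the gradient rule above (a minimal reconstruction; variable names are illustrative):

function sigmoid(x) { return 1 / (1 + Math.exp(-x)); }

// Output of a sigmoid unit; x[0] is fixed at 1 so w[0] acts as the bias.
function unitOutput(w, x) {
  var net = 0;
  for (var i = 0; i < w.length; i++) net += w[i] * x[i];
  return sigmoid(net);
}

// Gradient of the squared error E = 1/2 * sum_d (t_d - o_d)^2 w.r.t. w[i].
function gradE(w, xs, ts, i) {
  var g = 0;
  for (var d = 0; d < xs.length; d++) {
    var o = unitOutput(w, xs[d]);
    g += -(ts[d] - o) * o * (1 - o) * xs[d][i];
  }
  return g;
}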

Page 22

Online Demo: Add 2nd Layer

• Demo: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html


layer_defs = [];

layer_defs.push({type:'input', out_sx:1, out_sy:1, out_depth:2});

layer_defs.push({type:'fc', num_neurons:5, activation: 'sigmoid'});

layer_defs.push({type:'softmax', num_classes:2});

net = new convnetjs.Net();

net.makeLayers(layer_defs);

trainer = new convnetjs.SGDTrainer(net, {learning_rate:0.01, momentum:0.1, batch_size:10, l2_decay:0.001});

Page 23

2-Layer Neural Network

[Figure: a 2-layer neural network.]

Page 24

Hidden Layers

• Intermediate hidden units: learned features

– The activation vector h behaves like our feature vector f(x) for a linear classifier, but h is "learned" automatically.

• What kind of features can be learned?

– They must be functions of the input x.

Page 25

Feedforward NNs

• The basic structure of a feedforward neural network

• When the desired outputs are known, we have supervised learning, or learning with a teacher.

Page 26

(Feed-forward) Neural Network

[Diagram: inputs x1, x2, x3 and a bias input +1, with weights w1, w2, w3 and bias b, feed a summation unit Σ that produces the output.]

Page 27

(Feed-forward) Neural Network

[Diagram: the same network, with the weights (w1, w2, w3) and the bias (b) labeled.]

Page 28

(Feed-forward) Neural Network

Stack many non-linear units (neurons).

[Diagram: inputs x1, x2, x3 plus bias +1 (Layer 1, input) connect to hidden units plus bias +1 (Layer 2, hidden), which connect to the output unit (Layer 3, output).]

wij: the weight between the i-th input and the j-th hidden unit

Page 29

Neural networks

• Very expressive

– Able to learn highly non-linear functions

• Supervised training

– Binary classification

– Multi-class classification

– Regression / composite loss

• Unsupervised training

– Dimensionality reduction

– Complex sequence modeling

– High-level feature learning

Page 30

Backpropagation Algorithm – Main Idea: Error in Hidden Layers

The idea of the algorithm can be summarized as follows:

1. Compute the error term for the output units using the observed error.

2. From the output layer, repeat:

– propagating the error term back to the previous layer, and

– updating the weights between the two layers,

until the earliest hidden layer is reached.

Page 31

Backpropagation Algorithm

• Initialize weights (typically random!)

• Keep doing epochs

– For each example e in the training set:

• forward pass to compute

– O = neural-net-output(network, e)

– miss = (T - O) at each output unit

• backward pass to calculate deltas to weights

• update all weights

– end

• until tuning-set error stops improving

(Forward pass explained earlier; backward pass explained on the next slide.)

Page 32

Backward Pass

• Compute deltas to weights

– from hidden layer

– to output layer

• Without changing any weights (yet), compute the actual contributions

– within the hidden layer(s)

– and compute deltas

Page 33

Training: Learning Details

• Method for learning weights in feed-forward (FF) nets

• Can't use the Perceptron Learning Rule

– no teacher values are available for hidden units

• Use gradient descent to minimize the error

– propagate deltas to adjust for errors, backward from the outputs, to the hidden layers, to the inputs

[Diagram: the forward pass flows from inputs to outputs; the backward pass flows from outputs back to inputs.]

Page 34

Error Surface

Error as a function of the weights, in multidimensional space.

[Plot: error surface over weight space (axes: weights, error).]

Page 35

Gradient

• Trying to make the error decrease the fastest

• Compute the gradient: ∇E = [∂E/∂w1, ∂E/∂w2, ..., ∂E/∂wn]

• Change the i-th weight by: Δwi = -α·∂E/∂wi

• We need a derivative!

• The activation function must be continuous, differentiable, non-decreasing, and easy to compute
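As a minimal JavaScript sketch of that update (the gradient vector dEdw is assumed to come from backpropagation; names are illustrative):

// One gradient descent step: w_i <- w_i - alpha * dE/dw_i
function gradientStep(w, dEdw, alpha) {
  for (var i = 0; i < w.length; i++) w[i] -= alpha * dEdw[i];
  return w;
}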

Page 36

Can’t use LTU

• To effectively assign credit / blame to units in hidden layers, we want to look at the first derivative of the activation function

• The sigmoid function is easy to differentiate and easy to compute forward

[Figure: a linear threshold unit (step function) vs. the sigmoid function.]

Page 37

Training: Loss Function

Stack many non-linear units (neurons).

[Diagram: inputs x1, x2, x3 plus bias +1 (Layer 1, input), hidden units plus bias +1 (Layer 2, hidden), and the output unit (Layer 3, output); the output feeds a loss function.]


Page 39

Loss functions

• Classification

– Cross entropy (binary and multiclass)

• Regression

– Squared deviation (mean squared error)
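Minimal JavaScript sketches of both losses for a single example (illustrative names, not library code):

// Squared deviation (squared error) for a single real-valued output.
function squaredLoss(y, f) { return 0.5 * (y - f) * (y - f); }

// Binary cross entropy: y in {0, 1}, p is the predicted probability of class 1.
function crossEntropy(y, p) {
  return -(y * Math.log(p) + (1 - y) * Math.log(1 - p));
}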

Page 40

Training: Loss Function

[Diagram: the same network with hidden activations a1^(2), a2^(2) and output a1^(3) labeled; the output feeds the loss function.]

Page 41

Training: Forward Pass

[Diagram: the forward pass flows from the inputs x1, x2, x3 (Layer 1) through the hidden layer (Layer 2) to the output (Layer 3) and into the loss function.]

Page 42

Training: Forward Pass

[Diagram: two consecutive layers, k-1 and k, each with a bias unit +1.]

Each unit in layer k computes its activation from the activations of layer k-1 (with sigmoid units: a^(k) = σ(W^(k-1)·a^(k-1) + b^(k-1))).
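A JavaScript sketch of that layer-to-layer computation (a generic reconstruction, not code from the slides; W is stored as an array of rows):

function sigmoid(x) { return 1 / (1 + Math.exp(-x)); }

// Forward one layer: a_k = sigmoid(W * a_prev + b).
function forwardLayer(W, b, aPrev) {
  var a = [];
  for (var j = 0; j < W.length; j++) {          // one row of W per unit in layer k
    var net = b[j];
    for (var i = 0; i < aPrev.length; i++) net += W[j][i] * aPrev[i];
    a.push(sigmoid(net));
  }
  return a;
}

// Full forward pass of a 2-layer net:
// f(x) = sigmoid(W2 * sigmoid(W1*x + b1) + b2)
function forward(x, W1, b1, W2, b2) {
  return forwardLayer(W2, b2, forwardLayer(W1, b1, x));
}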

Page 43

Training: Forward Pass

Stack many non-linear units (neurons).

[Diagram: inputs x1, x2, x3 plus bias +1 (Layer 1, input), hidden layer plus bias +1 (Layer 2, hidden), output unit (Layer 3, output).]

Page 44

Training: Forward Pass

Stack many non-linear units (neurons).

[Diagram: as above, with weights w11, w21 labeled; the output feeds the loss function.]

Page 45

Backpropagation Network training

• 1. Initialize the network with random weights

• 2. For all training cases (called examples):

– a. Present the training inputs to the network and calculate the output

– b. For all layers (starting with the output layer, back to the input layer):

• i. Compare the network output with the correct output (error function)

• ii. Adapt the weights in the current layer
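Putting the forward and backward passes together, a self-contained JavaScript sketch for a net with one hidden layer, a single sigmoid output, and squared error (a hedged reconstruction under those assumptions, not the course's reference code):

function sigmoid(x) { return 1 / (1 + Math.exp(-x)); }

// One training step on example (x, t) with learning rate alpha.
// net.W1[j][i]: input i -> hidden j; net.b1[j]: hidden biases;
// net.W2[j]: hidden j -> output; net.b2: output bias.
function backpropStep(net, x, t, alpha) {
  // Forward pass.
  var h = [];
  for (var j = 0; j < net.W1.length; j++) {
    var s = net.b1[j];
    for (var i = 0; i < x.length; i++) s += net.W1[j][i] * x[i];
    h.push(sigmoid(s));
  }
  var o = net.b2;
  for (var j = 0; j < h.length; j++) o += net.W2[j] * h[j];
  o = sigmoid(o);

  // Backward pass: error terms (deltas), output unit first, then hidden units.
  var deltaO = (t - o) * o * (1 - o);
  var deltaH = [];
  for (var j = 0; j < h.length; j++) {
    deltaH.push(deltaO * net.W2[j] * h[j] * (1 - h[j]));
  }

  // Update all weights (gradient descent on the squared error).
  for (var j = 0; j < h.length; j++) {
    net.W2[j] += alpha * deltaO * h[j];
    net.b1[j] += alpha * deltaH[j];
    for (var i = 0; i < x.length; i++) net.W1[j][i] += alpha * deltaH[j] * x[i];
  }
  net.b2 += alpha * deltaO;
  return 0.5 * (t - o) * (t - o);   // loss before the update, for monitoring
}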

Page 46

The Learning Rule: Backpropagation

• The delta rule is often utilized by the most common class of ANNs, called backpropagation neural networks.

[Diagram: a network mapping an input to an output, compared against the desired output.]

Page 47

Training: Back-propagation

• Use gradient descent optimization

f(x) = σ(W^(2)·σ(W^(1)·x + b^(1)) + b^(2))

Loss(x, y; W, b) = ||y - f(x)||^2


Page 49

What is back-propagation?

• An algorithm to compute the gradients of the loss with respect to the parameters (W, b)

Page 50

Chain Rule for Derivatives

[Diagram: x → g → f, computing f(g(x)).]

d/dx f(g(x)) = f′(g(x))·g′(x)

Page 51

Chain Rule for Derivatives

[Diagram: x feeds g1 and g2, which both feed f, computing f(g1(x) + g2(x)).]

d/dx f(g1(x) + g2(x)) = f′(g1(x) + g2(x))·(g1′(x) + g2′(x))

Page 52

Chain Rule for Derivatives

[Diagram: x feeds g1, ..., gk, which all feed f, computing f(g1(x) + … + gk(x)).]

d/dx f(g1(x) + … + gk(x)) = f′(g1(x) + … + gk(x))·(g1′(x) + … + gk′(x))
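This is the pattern back-propagation exploits. As an illustration in the notation of the loss on page 47 (the decomposition is standard, not taken from the slides):

\[
\frac{\partial\,\mathrm{Loss}}{\partial W^{(1)}_{ji}}
= \frac{\partial\,\mathrm{Loss}}{\partial f(x)}
\cdot \frac{\partial f(x)}{\partial a^{(2)}_{j}}
\cdot \frac{\partial a^{(2)}_{j}}{\partial W^{(1)}_{ji}},
\qquad
a^{(2)} = \sigma\!\left(W^{(1)} x + b^{(1)}\right)
\]

Each factor is local to one layer, which is why all the gradients can be computed in a single backward sweep.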

Page 53

Training: Forward Pass

Stack many non-linear units (neurons).

[Diagram: the same network and loss function; the backward sweep produces the gradients of the W's and b's.]

Page 54

How do we pick α (the learning rate)?

1. Tuning set, or

2. Cross validation, or

3. Small for slow, conservative learning

Page 55

How many hidden layers?

• Usually just one (i.e., a 2-layer net)

• How many hidden units in the layer?

– Too few ==> can’t learn

– Too many ==> poor generalization

Page 56

How big a training set?

• Determine your target error rate, e

• Success rate is 1- e

• Typical training set approx. n/e, where n is the number of weights in the net

• Example:

– e = 0.1, n = 80 weights

– training set size: 800

A net trained until 95% correct training-set classification should produce about 90% correct classification on the testing set (typical).

Page 57

Back-propagation: Summary

• For each input (x, y):

• Compute the forward pass (values of hidden units at all layers) using training data (x, y)

• For each layer:

– Compute gradients of the outputs w.r.t. the inputs

• Compute the gradients of the loss w.r.t. the weights

• Update the weights

Page 58

Neural Networks: Summary

[Diagram: inputs x1, x2, x3 and bias +1 (Layer 1), with weights w1, w2, w3 and bias b feeding a summation unit Σ (Layer 2) that produces the output.]

Page 59

Neural Networks: Applications

• Image Modeling / Classification

• Feature Learning (Deep Learning; many layers)

• Compositional Text Representation

• Sequence Modeling