semi-supervised single-label text categorization using...

Post on 18-Mar-2020

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Semi-supervised Single-label Text Categorizationusing Centroid-based Classifiers

Ana Cardoso-Cachopo Arlindo Oliveira

Instituto Superior Tecnico — Technical University of Lisbon / INESC-ID

SAC-IAR 2007, March 12th

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 1 / 19

Outline

1 Problem Description

2 Characteristics of the Datasets

3 Why use Centroid-based Methods

4 Why use Unlabeled Data

5 Incorporate Unlabeled Data using EM

6 Incrementally Incorporate Unlabeled Data

7 Experimental Results

8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19

Outline

1 Problem Description

2 Characteristics of the Datasets

3 Why use Centroid-based Methods

4 Why use Unlabeled Data

5 Incorporate Unlabeled Data using EM

6 Incrementally Incorporate Unlabeled Data

7 Experimental Results

8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19

Outline

1 Problem Description

2 Characteristics of the Datasets

3 Why use Centroid-based Methods

4 Why use Unlabeled Data

5 Incorporate Unlabeled Data using EM

6 Incrementally Incorporate Unlabeled Data

7 Experimental Results

8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19

Outline

1 Problem Description

2 Characteristics of the Datasets

3 Why use Centroid-based Methods

4 Why use Unlabeled Data

5 Incorporate Unlabeled Data using EM

6 Incrementally Incorporate Unlabeled Data

7 Experimental Results

8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19

Outline

1 Problem Description

2 Characteristics of the Datasets

3 Why use Centroid-based Methods

4 Why use Unlabeled Data

5 Incorporate Unlabeled Data using EM

6 Incrementally Incorporate Unlabeled Data

7 Experimental Results

8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19

Outline

1 Problem Description

2 Characteristics of the Datasets

3 Why use Centroid-based Methods

4 Why use Unlabeled Data

5 Incorporate Unlabeled Data using EM

6 Incrementally Incorporate Unlabeled Data

7 Experimental Results

8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19

Outline

1 Problem Description

2 Characteristics of the Datasets

3 Why use Centroid-based Methods

4 Why use Unlabeled Data

5 Incorporate Unlabeled Data using EM

6 Incrementally Incorporate Unlabeled Data

7 Experimental Results

8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19

Outline

1 Problem Description

2 Characteristics of the Datasets

3 Why use Centroid-based Methods

4 Why use Unlabeled Data

5 Incorporate Unlabeled Data using EM

6 Incrementally Incorporate Unlabeled Data

7 Experimental Results

8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19

Problem Description

Text Categorization

Single-label

DatasetsI Reuters 21578 - R8I 20 Newsgroups - 20NgI Web Knowledge Base - Web4I Cade - Cade12

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 3 / 19

Problem Description

Text Categorization

Single-label

DatasetsI Reuters 21578 - R8I 20 Newsgroups - 20NgI Web Knowledge Base - Web4I Cade - Cade12

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 3 / 19

Problem Description

Text Categorization

Single-label

DatasetsI Reuters 21578 - R8I 20 Newsgroups - 20NgI Web Knowledge Base - Web4I Cade - Cade12

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 3 / 19

Problem Description

Text Categorization

Single-label

DatasetsI Reuters 21578 - R8I 20 Newsgroups - 20NgI Web Knowledge Base - Web4I Cade - Cade12

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 3 / 19

Problem Description

Text Categorization

Single-label

DatasetsI Reuters 21578 - R8I 20 Newsgroups - 20NgI Web Knowledge Base - Web4I Cade - Cade12

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 3 / 19

Problem Description

Text Categorization

Single-label

DatasetsI Reuters 21578 - R8I 20 Newsgroups - 20NgI Web Knowledge Base - Web4I Cade - Cade12

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 3 / 19

Problem Description

Text Categorization

Single-label

DatasetsI Reuters 21578 - R8I 20 Newsgroups - 20NgI Web Knowledge Base - Web4I Cade - Cade12

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 3 / 19

Characteristics of the Datasets

Train Test Total Smallest LargestDocs Docs Docs Class Class

R8 5485 2189 7674 51 3923

20Ng 11293 7528 18821 628 999

Web4 2803 1396 4199 504 1641

Cade12 27322 13661 40983 625 8473

Numbers of documents for the datasets: number of training documents,number of test documents, total number of documents, number ofdocuments in the smallest class, and number of documents in the largestclass.

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 4 / 19

Why use Centroid-based Methods

Very fast

Good Accuracy

0.0

0.2

0.4

0.6

0.8

1.0Centroid

SVM

k-NN

LSI

Vector

Dumb

R8 20Ng Web4 Cade12

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 5 / 19

Why use Centroid-based Methods

Very fast

Good Accuracy

0.0

0.2

0.4

0.6

0.8

1.0Centroid

SVM

k-NN

LSI

Vector

Dumb

R8 20Ng Web4 Cade12

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 5 / 19

Why use Centroid-based Methods

Very fast

Good Accuracy

0.0

0.2

0.4

0.6

0.8

1.0Centroid

SVM

k-NN

LSI

Vector

Dumb

R8 20Ng Web4 Cade12

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 5 / 19

Why use Centroid-based Methods

Very fast

Good Accuracy

0.0

0.2

0.4

0.6

0.8

1.0Centroid

SVM

k-NN

LSI

Vector

Dumb

R8 20Ng Web4 Cade12

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 5 / 19

Why use Centroid-based Methods

Very fast

Good Accuracy

0.0

0.2

0.4

0.6

0.8

1.0Centroid

SVM

k-NN

LSI

Vector

Dumb

R8 20Ng Web4 Cade12

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 5 / 19

Why use Centroid-based Methods

Very fast

Good Accuracy

0.0

0.2

0.4

0.6

0.8

1.0Centroid

SVM

k-NN

LSI

Vector

Dumb

R8 20Ng Web4 Cade12

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 5 / 19

Why use Unlabeled Data

Small amounts of labeled data available

Large amounts of unlabeled data available

Hard or expensive to label new data

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 6 / 19

Why use Unlabeled Data

Small amounts of labeled data available

Large amounts of unlabeled data available

Hard or expensive to label new data

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 6 / 19

Why use Unlabeled Data

Small amounts of labeled data available

Large amounts of unlabeled data available

Hard or expensive to label new data

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 6 / 19

Incorporate Unlabeled Data using EM

If the entire dataset is available from the start, like in a library.

Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Estimation step: For each unlabeled document dj ∈ U, classify itaccording to the available centroids.Maximization step: For each class cj , update its centroid −−→cjnew ,considering the labeled documents and the labels for the unlabeleddocuments obtained in the previous step.Iterate: Until the centroids do not change in two consecutive iterations.Outputs: For each class cj , the centroid −→cj .

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 7 / 19

Incorporate Unlabeled Data using EM

If the entire dataset is available from the start, like in a library.

Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Estimation step: For each unlabeled document dj ∈ U, classify itaccording to the available centroids.Maximization step: For each class cj , update its centroid −−→cjnew ,considering the labeled documents and the labels for the unlabeleddocuments obtained in the previous step.Iterate: Until the centroids do not change in two consecutive iterations.Outputs: For each class cj , the centroid −→cj .

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 7 / 19

Incorporate Unlabeled Data using EM

If the entire dataset is available from the start, like in a library.

Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Estimation step: For each unlabeled document dj ∈ U, classify itaccording to the available centroids.Maximization step: For each class cj , update its centroid −−→cjnew ,considering the labeled documents and the labels for the unlabeleddocuments obtained in the previous step.Iterate: Until the centroids do not change in two consecutive iterations.Outputs: For each class cj , the centroid −→cj .

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 7 / 19

Incorporate Unlabeled Data using EM

If the entire dataset is available from the start, like in a library.

Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Estimation step: For each unlabeled document dj ∈ U, classify itaccording to the available centroids.Maximization step: For each class cj , update its centroid −−→cjnew ,considering the labeled documents and the labels for the unlabeleddocuments obtained in the previous step.Iterate: Until the centroids do not change in two consecutive iterations.Outputs: For each class cj , the centroid −→cj .

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 7 / 19

Incorporate Unlabeled Data using EM

If the entire dataset is available from the start, like in a library.

Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Estimation step: For each unlabeled document dj ∈ U, classify itaccording to the available centroids.Maximization step: For each class cj , update its centroid −−→cjnew ,considering the labeled documents and the labels for the unlabeleddocuments obtained in the previous step.Iterate: Until the centroids do not change in two consecutive iterations.Outputs: For each class cj , the centroid −→cj .

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 7 / 19

Incorporate Unlabeled Data using EM

If the entire dataset is available from the start, like in a library.

Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Estimation step: For each unlabeled document dj ∈ U, classify itaccording to the available centroids.Maximization step: For each class cj , update its centroid −−→cjnew ,considering the labeled documents and the labels for the unlabeleddocuments obtained in the previous step.Iterate: Until the centroids do not change in two consecutive iterations.Outputs: For each class cj , the centroid −→cj .

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 7 / 19

Incorporate Unlabeled Data using EM

If the entire dataset is available from the start, like in a library.

Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Estimation step: For each unlabeled document dj ∈ U, classify itaccording to the available centroids.Maximization step: For each class cj , update its centroid −−→cjnew ,considering the labeled documents and the labels for the unlabeleddocuments obtained in the previous step.Iterate: Until the centroids do not change in two consecutive iterations.Outputs: For each class cj , the centroid −→cj .

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 7 / 19

Incrementally Incorporate Unlabeled Data

If the dataset changes over time, like a news feed or the web.

Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Iterate: For each unlabeled document dj ∈ U:

Classify dj according to its similarity to each of the centroids.

Update the centroids with the new document dj classified in theprevious step.

Outputs: For each class cj , the centroid −→cj .

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 8 / 19

Incrementally Incorporate Unlabeled Data

If the dataset changes over time, like a news feed or the web.

Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Iterate: For each unlabeled document dj ∈ U:

Classify dj according to its similarity to each of the centroids.

Update the centroids with the new document dj classified in theprevious step.

Outputs: For each class cj , the centroid −→cj .

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 8 / 19

Incrementally Incorporate Unlabeled Data

If the dataset changes over time, like a news feed or the web.

Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Iterate: For each unlabeled document dj ∈ U:

Classify dj according to its similarity to each of the centroids.

Update the centroids with the new document dj classified in theprevious step.

Outputs: For each class cj , the centroid −→cj .

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 8 / 19

Incrementally Incorporate Unlabeled Data

If the dataset changes over time, like a news feed or the web.

Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Iterate: For each unlabeled document dj ∈ U:

Classify dj according to its similarity to each of the centroids.

Update the centroids with the new document dj classified in theprevious step.

Outputs: For each class cj , the centroid −→cj .

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 8 / 19

Incrementally Incorporate Unlabeled Data

If the dataset changes over time, like a news feed or the web.

Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Iterate: For each unlabeled document dj ∈ U:

Classify dj according to its similarity to each of the centroids.

Update the centroids with the new document dj classified in theprevious step.

Outputs: For each class cj , the centroid −→cj .

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 8 / 19

Incrementally Incorporate Unlabeled Data

If the dataset changes over time, like a news feed or the web.

Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Iterate: For each unlabeled document dj ∈ U:

Classify dj according to its similarity to each of the centroids.

Update the centroids with the new document dj classified in theprevious step.

Outputs: For each class cj , the centroid −→cj .

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 8 / 19

Incrementally Incorporate Unlabeled Data

If the dataset changes over time, like a news feed or the web.

Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Iterate: For each unlabeled document dj ∈ U:

Classify dj according to its similarity to each of the centroids.

Update the centroids with the new document dj classified in theprevious step.

Outputs: For each class cj , the centroid −→cj .

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 8 / 19

Experimental Results - Synthetic Dataset

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

-5 -4 -3 -2 -1 0 1 2 3 4 5

Gaussian Distributionsµ1 = 1.0, σ1 = 1.0

µ2 = −1.0, σ2 = 1.0

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 9 / 19

Experimental Results - Synthetic Dataset

0.72

0.74

0.76

0.78

0.8

0.82

0.84

0.86

0 5 10 15 20

Acc

ura

cy

Labeled documents per class

µ1 = 1.0, σ1 = 1.0, µ2 = −1.0, σ2 = 1.0

CentroidEMInc

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

-5 -4 -3 -2 -1 0 1 2 3 4 5

Gaussian Distributionsµ1 = 1.0, σ1 = 1.0

µ2 = −1.0, σ2 = 1.0

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 10 / 19

Experimental Results - Synthetic Dataset

0.58

0.6

0.62

0.64

0.66

0.68

0.7

0 5 10 15 20

Acc

ura

cy

Labeled documents per class

µ1 = 1.0, σ1 = 2.0, µ2 = −1.0, σ2 = 2.0

CentroidEMInc

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

-5 -4 -3 -2 -1 0 1 2 3 4 5

Gaussian Distributionsµ1 = 1.0, σ1 = 2.0

µ2 = −1.0, σ2 = 2.0

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 11 / 19

Experimental Results - Synthetic Dataset

0.52

0.53

0.54

0.55

0.56

0.57

0.58

0.59

0 5 10 15 20

Acc

ura

cy

Labeled documents per class

µ1 = 1.0, σ1 = 4.0, µ2 = −1.0, σ2 = 4.0

CentroidEMInc

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

-5 -4 -3 -2 -1 0 1 2 3 4 5

Gaussian Distributionsµ1 = 1.0, σ1 = 4.0

µ2 = −1.0, σ2 = 4.0

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 12 / 19

Experimental Results - Real World Datasets

0.55

0.6

0.65

0.7

0.75

0.8

0.85

0.9

0.95

0 5 10 15 20 25 30 35 40

Acc

ura

cy

Labeled documents per class

R8

CentroidEMInc

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 13 / 19

Experimental Results - Real World Datasets

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0 10 20 30 40 50 60 70

Acc

ura

cy

Labeled documents per class

20Ng

CentroidEMInc

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 14 / 19

Experimental Results - Real World Datasets

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0 10 20 30 40 50 60 70

Acc

ura

cy

Labeled documents per class

Web4

CentroidEMInc

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 15 / 19

Experimental Results - Real World Datasets

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0 10 20 30 40 50 60 70

Acc

ura

cy

Labeled documents per class

Cade12

CentroidEMInc

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 16 / 19

Experimental Results - Real World Datasets

0.55

0.6

0.65

0.7

0.75

0.8

0.85

0.9

0.95

0 5 10 15 20 25 30 35 40

Acc

ura

cy

Labeled documents per class

R8

CentroidEMInc

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0 10 20 30 40 50 60 70

Acc

ura

cy

Labeled documents per class

20Ng

CentroidEMInc

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0 10 20 30 40 50 60 70

Acc

ura

cy

Labeled documents per class

Web4

CentroidEMInc

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0 10 20 30 40 50 60 70

Acc

ura

cy

Labeled documents per class

Cade12

CentroidEMInc

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 17 / 19

Conclusions and Future Work

If the initial model of the data is sufficiently precise, using unlabeleddata improves performance.

Using unlabeled data degrades performance if the initial model is notprecise enough.

As future work, we plan to extend this approach to multi-labeldatasets.

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 18 / 19

Conclusions and Future Work

If the initial model of the data is sufficiently precise, using unlabeleddata improves performance.

Using unlabeled data degrades performance if the initial model is notprecise enough.

As future work, we plan to extend this approach to multi-labeldatasets.

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 18 / 19

Conclusions and Future Work

If the initial model of the data is sufficiently precise, using unlabeleddata improves performance.

Using unlabeled data degrades performance if the initial model is notprecise enough.

As future work, we plan to extend this approach to multi-labeldatasets.

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 18 / 19

Thank You.

Any Questions?

(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 19 / 19

top related