Weave-D - 2nd Progress Evaluation Presentation

Weave-D: A cognitive approach towards data accumulation and fusion. Thushan Ganegedara, Ruwan Gunarathne, Lasindu Vidana Pathiranage, Buddhima Wijeweera

Upload: lasinducharith

Post on 11-Apr-2017

TRANSCRIPT

Page 1: Weave-D - 2nd Progress Evaluation Presentation

Weave-D A cognitive approach towards data accumulation and fusion

Thushan Ganegedara, Ruwan Gunarathne, Lasindu Vidana Pathiranage, Buddhima Wijeweera

Page 2: Weave-D - 2nd Progress Evaluation Presentation

Why Weave-D?

Growth of amount of information

•Handle data
▫Temporal
▫Multi-modal
•Prevent catastrophic interference
▫Incremental learning algorithms
•Visualizing information
▫Intuitive
▫Simple
•Apply previous knowledge to acquire new knowledge
▫Conceptualization
▫Generalization of acquired knowledge

????

Page 3: Weave-D - 2nd Progress Evaluation Presentation

What is Weave-D?

•Accumulate data (e.g. images, text): accumulates temporal, multi-modal or multi-source data in an organized manner
•Feature extraction: extracts information from data (e.g. color, edge and shape information of images)
•Incremental learning: learns incrementally using the IKASL algorithm
•Link generation: links represent relationships between multi-modal data
•Query & Visualize UI

Page 4: Weave-D - 2nd Progress Evaluation Presentation

Major Research Problems

•Integrating an incremental learning algorithm into the selected artificial perception model
•What are the potential performance improvements for the selected unsupervised learning algorithm?
•What are suitable feature extraction techniques for images and text?
•How can complex learning outcomes be visualized for the user?

Page 5: Weave-D - 2nd Progress Evaluation Presentation

Major Challenges

•General
▫Limited resources and novelty of the algorithms
▫Finding suitable datasets
•Image Feature Extraction
▫Deciding the best color space to represent images
▫Researching shape descriptor implementations
•Text Feature Extraction Techniques
▫Researching suitable text feature extraction techniques

Page 6: Weave-D - 2nd Progress Evaluation Presentation

Major Challenges (cont.)

•Unsupervised Learning Algorithms
▫Implementing IKASL
▫Testing and verifying correctness of IKASL
•Researching information visualization tools to fit our requirements

Page 7: Weave-D - 2nd Progress Evaluation Presentation

Project Scope

•The proposed system will be implemented to handle only image and text inputs
•The system will be designed for use by data analysts
•The system will:
▫Extract feature vectors of images and texts
▫Acquire knowledge using the data input to Weave-D
▫Generate links between data

Page 8: Weave-D - 2nd Progress Evaluation Presentation

Project Scope

•A persistence technique (e.g. SQL DB, XML) will be used to store acquired knowledge and generated links
•Provide an interface for users to query and visualize information

Page 9: Weave-D - 2nd Progress Evaluation Presentation

Assumptions & Limitations

•Selected features (e.g. color, shape, …) provide a good representation of the data (e.g. images, …)
•In the artificial perception model, perception at a certain layer can be represented by the single most significant feature from the layer below
•Input data should be compatible with the feature extractors (i.e. type, format, …)
•Required tools (e.g. feature extraction, information visualization) can be used in the project with no or slight modifications

Page 10: Weave-D - 2nd Progress Evaluation Presentation

Deliverables

•Java implementation of the proposed system, including several sub-components
▫Incremental learning
▫Information persistence
▫Information linking
▫Information visualization
•Documentation
▫Research Proposal
▫Literature Review
▫Project Scope Document
▫Architectural Document
▫Project Report
▫User Manual

Page 11: Weave-D - 2nd Progress Evaluation Presentation

Deliverables

•Image: color feature extraction tools (C#)
•Image: shape feature extraction tools (Matlab)
•Image: edge feature extraction tools (Java)
•Text feature extraction tools (Java)
•Unsupervised learning algorithm testing and visualization tools (GSOM, IKASL algorithms) (Java)
•Modified information visualization tool (Java)

Page 12: Weave-D - 2nd Progress Evaluation Presentation

Feasibility of Deliverables

•Feature extraction tools
▫Images (e.g. img (Rummager))
▫Text (e.g. WordNet, uClassify)
•Unsupervised learning algorithm tools
▫GSOM
▫IKASL
•Visualization
▫Arena 3D (with modifications)

Page 13: Weave-D - 2nd Progress Evaluation Presentation

Literature Review

•Artificial Perception Model
•Unsupervised Learning Algorithms
▫SOM
▫GSOM
▫IKASL
•Information Visualization
▫2D
▫3D
•Feature Extraction Techniques
▫Text
▫Images
•Other Similar Systems

Page 14: Weave-D - 2nd Progress Evaluation Presentation

Which Fits Where?

User

Page 15: Weave-D - 2nd Progress Evaluation Presentation

Perception Model

Page 16: Weave-D - 2nd Progress Evaluation Presentation

Artificial Perception Model [1]

•Inspired by the human perceptive and cognitive system
•Close resemblance to the human brain
•Key features
▫Supports multiple modalities
▫Ability to generate high-level perceptions by aggregating input stimuli belonging to multiple modalities
▫Conceptualization of information

[1] Bamunusinghe, Jeewanee, and Damminda Alahakoon. "Artificial Visual Percepts for Image Understanding." In Proceedings of the International Conference on Intelligent Systems. 2010.

Page 17: Weave-D - 2nd Progress Evaluation Presentation

Artificial Perception Model

Specialization of vision

Specialization of color

Page 18: Weave-D - 2nd Progress Evaluation Presentation

Unsupervised Learning Algorithms

Page 19: Weave-D - 2nd Progress Evaluation Presentation

Self-Organizing Maps (SOM) [2]

•A visualization technique that reduces the dimensionality of data to help humans understand high-dimensional data.

•Self-Organizing Map (SOM) is a type of unsupervised artificial neural network.

•Topology preserving map

[2] Kohonen, Teuvo. "The self-organizing map." Proceedings of the IEEE 78, no. 9 (1990):1464-1480
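The SOM idea on this slide can be illustrated with a minimal sketch in Python. It is not the project's implementation: the grid size, learning-rate schedule, Gaussian neighborhood and the names `best_matching_unit` and `train_som` are all assumptions chosen for illustration. Each input is matched to its closest node, and that node and its grid neighbors are pulled towards the input, which is what makes the map topology preserving.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the node whose weight vector is closest to x."""
    return min(weights, key=lambda pos: math.dist(weights[pos], x))


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    """Train a small fixed-size SOM; all hyperparameters are illustrative."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid position, initialized at random in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays linearly from 1 towards 0
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        for x in data:
            bmu = best_matching_unit(weights, x)
            for pos, w in weights.items():
                d = math.dist(pos, bmu)      # distance on the 2-D output grid
                if d <= radius:
                    h = math.exp(-(d * d) / (2 * radius * radius))
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
    return weights
```

Calling `train_som(data)` on a list of equal-length numeric vectors returns a dictionary from grid positions to weight vectors; `best_matching_unit` then maps any new input to a position in the 2-dimensional output space.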

Page 20: Weave-D - 2nd Progress Evaluation Presentation

SOM

2-Dimensional output space High dimensional input space

Page 21: Weave-D - 2nd Progress Evaluation Presentation

Growing Self-Organizing Maps (GSOMs) [3]

• GSOM is an extension of Self-Organizing maps (SOM), which is very popular in knowledge discovery applications.

• GSOM algorithm overcomes several limitations of SOM.

• The main advantage of GSOM over SOM is that GSOM can grow and modify its shape to represent the data space better.

• Other similar work includes:

▫ Growing Cell Structures (GCS)
▫ Neural Gas Algorithm (NGA)
▫ Incremental Grid Growing (IGG)

[3] Alahakoon, Damminda, Saman K. Halgamuge, and Bala Srinivasan. "Dynamic self-organizing maps with controlled growth for knowledge discovery." Neural Networks, IEEE Transactions on 11, no. 3 (2000): 601-614

Page 22: Weave-D - 2nd Progress Evaluation Presentation

GSOM Algorithm

•GSOM is an unsupervised neural network, which is initialized with four nodes and develops to represent the input data space.
•Three main phases can be distinguished in the GSOM algorithm
▫Initialization phase
▫Growing phase
▫Smoothing phase

Page 23: Weave-D - 2nd Progress Evaluation Presentation

Initialization Phase

•The four starting nodes are initialized with random values from the input vector space.

[Figure: four starting nodes at grid positions (0,0), (1,0), (0,1) and (1,1)]

Page 24: Weave-D - 2nd Progress Evaluation Presentation

Growing phase

[Figure: an input is presented to the four-node map at grid positions (0,0), (1,0), (0,1) and (1,1). Labels: Input, Euclidean Distance, Winner, Neighborhood]
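One growing-phase step can be sketched as follows, assuming a much simplified version of GSOM. The growth threshold formula GT = -D × ln(SF) follows the cited GSOM paper [3]; everything else (the function names, copying the winner's weights into new nodes, using the same learning rate for neighbors, growing whenever a grid neighbor is free) is an assumption made to keep the example short.

```python
import math


def growth_threshold(dim, spread_factor):
    """GT = -D * ln(SF), with D the input dimension and 0 < SF < 1."""
    return -dim * math.log(spread_factor)


def grow_step(nodes, errors, x, lr=0.3, gt=1.0):
    """Present one input x; nodes maps grid position (col, row) -> weight list."""
    # 1. Winner: the node with the smallest Euclidean distance to the input.
    winner = min(nodes, key=lambda p: math.dist(nodes[p], x))
    # 2. Accumulate the winner's error before its weights are adapted.
    errors[winner] = errors.get(winner, 0.0) + math.dist(nodes[winner], x)
    cx, cy = winner
    neighbours = [(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)]
    # 3. Adapt the winner and its existing neighbours towards the input.
    for p in [winner] + [n for n in neighbours if n in nodes]:
        nodes[p] = [w + lr * (xi - w) for w, xi in zip(nodes[p], x)]
    # 4. Grow into free neighbouring positions once the error exceeds GT.
    free = [n for n in neighbours if n not in nodes]
    if errors[winner] > gt and free:
        for n in free:
            nodes[n] = list(nodes[winner])   # simplification: copy the winner
            errors[n] = 0.0
        errors[winner] = 0.0
    return winner
```

Starting from the four nodes at (0,0), (1,0), (0,1) and (1,1), repeated calls to `grow_step` let the map extend outwards from whichever boundary node keeps accumulating error.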

Page 25: Weave-D - 2nd Progress Evaluation Presentation

Smoothing phase

•Growing phase stops when new node growth saturates

•Reduce learning rate and fix a small starting neighborhood.

•Find winner and adapt the weights of winner and neighbors in the same way as in growing phase.

Page 26: Weave-D - 2nd Progress Evaluation Presentation

•SOM: fixed number of nodes and grid size
•GSOM: ability to grow and change shape

Page 27: Weave-D - 2nd Progress Evaluation Presentation

IKASL Algorithm [4]

• Most current Hebbian rule based algorithms do not encompass incremental learning and life-long learning

• Hebbian rule based unsupervised incremental learning algorithm

• Is both stable and plastic
• Can be understood as an n-layer structure
• A single layer comprises 2 sub-layers

▫Learning Layer
▫Generalized Layer

[4] De Silva, Daswin, and Damminda Alahakoon. "Incremental knowledge acquisition and self learning from text." In Neural Networks (IJCNN), The 2010 International Joint Conference on, pp. 1-8. IEEE, 2010.

Page 28: Weave-D - 2nd Progress Evaluation Presentation

IKASL Algorithm

Learning layer (L1)

Generalized layer (G1)

Learning layer (L2)

Input 1

Input 2

Page 29: Weave-D - 2nd Progress Evaluation Presentation

IKASL Algorithm (cont.)

Learning layer (L1)

Generalized layer (G1)

Learning layer (L2)

Generalized layer (G2)

Learning layer (L3)

Generalized layer (G3)
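The alternation of learning and generalized layers can be sketched in Python under strong simplifying assumptions. This is not the IKASL algorithm itself: a single pass of nearest-prototype updates stands in for the learning layer, a plain mean stands in for the aggregation that forms the generalized layer, and the names `learn_layer`, `generalize` and `ikasl_like` are hypothetical. The sketch only shows the data flow: each generalized layer seeds the next learning layer, so earlier knowledge is carried forward.

```python
import math


def learn_layer(seeds, batch, lr=0.5):
    """Learning layer: adapt copies of the seed prototypes to one input batch."""
    protos = [list(s) for s in seeds]
    hits = [[] for _ in protos]
    for x in batch:
        k = min(range(len(protos)), key=lambda i: math.dist(protos[i], x))
        protos[k] = [w + lr * (xi - w) for w, xi in zip(protos[k], x)]
        hits[k].append(x)
    return protos, hits


def generalize(protos, hits):
    """Generalized layer: one summary vector per prototype that won any input."""
    general = []
    for proto, members in zip(protos, hits):
        if members:
            vecs = members + [proto]
            # Assumption: a plain mean is used here as the aggregation step.
            general.append([sum(col) / len(vecs) for col in zip(*vecs)])
    return general


def ikasl_like(batches, initial_seeds):
    """Process batches in order; each generalized layer seeds the next learning layer."""
    layers = []
    seeds = initial_seeds
    for batch in batches:
        protos, hits = learn_layer(seeds, batch)
        general = generalize(protos, hits)
        layers.append(general)
        seeds = general or seeds
    return layers
```

Each entry of the returned list corresponds to one generalized layer (G1, G2, ...), built from the input batch that arrived at that time step.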

Page 30: Weave-D - 2nd Progress Evaluation Presentation

Feature Extraction

Page 31: Weave-D - 2nd Progress Evaluation Presentation

Image Feature Extraction

•In this project we do not work directly with raw images
•Images contain a lot of redundant data
•The solution is to use feature extraction techniques
•The process of transforming input data into a set of feature vectors is known as feature extraction
•The Moving Picture Experts Group (MPEG) has developed several standards
•MPEG-7 defines the Multimedia Content Description Interface

Page 32: Weave-D - 2nd Progress Evaluation Presentation

MPEG-7 Descriptors [5-7]

•Descriptors: a core set of quantitative measures of audio-visual features
•Some of the MPEG-7 descriptors are:
▫Dominant Colour Descriptor
▫Colour Layout Descriptor
▫Edge Histogram Descriptor

[5] Ortiz, Edward, Cesar Pantoja, and María Trujillo. "An MPEG-7 Browser." In Latin American Conference on Networked Electronic Media. 2009.
[6] Wu, Peng, Yong Man Ro, Chee Sun Won, and Yanglim Choi. "Texture descriptors in MPEG-7." In Computer Analysis of Images and Patterns, pp. 21-28. Springer Berlin Heidelberg, 2001.
[7] Chatzichristofis, Savvas A., Yiannis S. Boutalis, and Mathias Lux. "Img (rummager): An interactive content based image retrieval system." In Similarity Search and Applications, 2009. SISAP'09. Second International Workshop on, pp. 151-153. IEEE, 2009.

Page 33: Weave-D - 2nd Progress Evaluation Presentation

Dominant Colour Descriptor

Page 34: Weave-D - 2nd Progress Evaluation Presentation

Colour Layout Descriptor

Page 35: Weave-D - 2nd Progress Evaluation Presentation

Edge Histogram Descriptor[17]

[17] Eitz, Mathias, Kristian Hildebrand, Tamy Boubekeur, and Marc Alexa. "An evaluation of descriptors for large-scale image retrieval from sketched feature lines." Computers & Graphics 34, no. 5 (2010): 482-498.

Page 36: Weave-D - 2nd Progress Evaluation Presentation

Feature vectors generated with the Edge Histogram Descriptor (EHD)

Proportion
20018.Jpg, 0.197, 0.162, 0.323, 0.319, 0.437
20019.jpg, 0.340, 0.076, 0.282, 0.303, 0.374
20020.jpg, 0.180, 0.212, 0.333, 0.275, 0.333
20021.jpg, 0.165, 0.222, 0.278, 0.335, 0.409
20024.jpg, 0.324, 0.100, 0.295, 0.281, 0.243
…
20066.jpg, 0.069, 0.317, 0.257, 0.358, 0.362

Position
20018.jpg, 4,4,4,4,4,4,4,3,4,4,4,4,4,4,4,4
20019.jpg, 4,0,4,4,0,4,4,4,4,4,0,4,3,3,4,4
20020.jpg, 4,4,4,4,2,2,4,2,4,4,1,1,4,4,2,4
20021.jpg, 4,3,3,4,4,4,4,4,4,4,4,4,1,1,1,2
20024.jpg, 3,0,2,0,4,4,4,4,4,3,4,1,3,3,0,0
…

Existence
20018.Jpg , 0, 0, 1, 1, 1
20019.Jpg , 1, 0, 0, 1, 1
20020.Jpg , 0, 0, 1, 1, 1
20021.Jpg , 0, 0, 1, 1, 1
20024.Jpg , 1, 0, 1, 1, 0
…
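The slides do not say how the Existence, Proportion and Position vectors are derived, so the following Python sketch shows only one plausible reading. It assumes the MPEG-7 layout of 16 sub-images (a 4×4 grid) with 5 edge types each, and treats Position as the index of the dominant edge type per sub-image, Proportion as the share of each edge type over the whole image, and Existence as a thresholded Proportion. The function name, the edge-type order and the threshold of 0.2 are all hypothetical, and the values on the slide are evidently normalized differently (they do not sum to 1 and do not follow a single threshold).

```python
EDGE_TYPES = ["vertical", "horizontal", "diagonal_45", "diagonal_135", "non_directional"]


def summarize_edges(block_hists, exist_threshold=0.2):
    """block_hists: 16 lists (4x4 sub-images), each holding 5 edge-type counts."""
    n_types = len(EDGE_TYPES)
    # Proportion: share of each edge type over the whole image.
    totals = [sum(h[t] for h in block_hists) for t in range(n_types)]
    grand = sum(totals) or 1
    proportion = [round(t / grand, 3) for t in totals]
    # Position: index of the dominant edge type in each sub-image.
    position = [max(range(n_types), key=lambda t: h[t]) for h in block_hists]
    # Existence: 1 if an edge type is prominent enough (threshold is an assumption).
    existence = [1 if p >= exist_threshold else 0 for p in proportion]
    return existence, proportion, position
```

Under these assumptions each image yields a 5-value Existence vector, a 5-value Proportion vector and a 16-value Position vector, matching the lengths of the vectors shown on the slide.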

Page 37: Weave-D - 2nd Progress Evaluation Presentation

Results (Texture)

Existence Proportion Position

Page 38: Weave-D - 2nd Progress Evaluation Presentation

Shape Descriptors

•PHOG Descriptor [8]
▫Outcomes: local shape (given by each divided region) and spatial layout (given by the HOGs of regions at finer spatial grids)

[8] Bosch, Anna, Andrew Zisserman, and Xavier Munoz. "Representing shape with a spatial pyramid kernel." In Proceedings of the 6th ACM international conference on Image and video retrieval, pp. 401-408. ACM, 2007.
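The pyramid idea behind PHOG can be sketched in a few lines of Python. This is a simplified illustration rather than the descriptor from [8]: it histograms the gradient orientation of every interior pixel instead of first extracting edge contours, and the function name `phog`, the number of levels and the number of bins are assumptions. At level l the image is split into 2^l × 2^l cells, one orientation histogram is built per cell, and all histograms are concatenated.

```python
import math


def phog(image, levels=2, bins=8):
    """image: 2-D list of grayscale values; returns a normalized descriptor."""
    h, w = len(image), len(image[0])
    # Gradient magnitude and orientation bin (0..pi) via central differences.
    grads = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag > 0:
                angle = math.atan2(gy, gx) % math.pi
                grads[(y, x)] = (mag, min(int(angle / math.pi * bins), bins - 1))
    descriptor = []
    for level in range(levels + 1):
        cells = 2 ** level                   # cells per side at this level
        hists = [[0.0] * bins for _ in range(cells * cells)]
        for (y, x), (mag, b) in grads.items():
            cy = min(y * cells // h, cells - 1)
            cx = min(x * cells // w, cells - 1)
            hists[cy * cells + cx][b] += mag  # magnitude-weighted vote
        for hist in hists:
            descriptor.extend(hist)
    total = sum(descriptor) or 1.0
    return [v / total for v in descriptor]
```

With `levels=2` and `bins=8` the descriptor has 8 × (1 + 4 + 16) = 168 values: the level-0 histogram captures overall shape, while the finer levels capture spatial layout.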

Page 39: Weave-D - 2nd Progress Evaluation Presentation

Shape Descriptors

•GIST Descriptor [9]
▫A holistic representation of an image
▫Spatial envelope: described by the boundary of the surface of the image and its inner textures
▫Spatial envelope properties: naturalness, openness, roughness, ruggedness, expansion
▫The spatial envelope properties are estimated by calculating the energy spectrum of the image (DFT)

[9] Oliva, Aude, and Antonio Torralba. "Modeling the shape of the scene: A holistic representation of the spatial envelope." International journal of computer vision 42, no. 3 (2001): 145-175.

Page 40: Weave-D - 2nd Progress Evaluation Presentation

Text Feature Extraction

•Suitable text feature extraction techniques are limited, why?
•Techniques
▫The document is encoded as a histogram of words [10]
▫A set of keywords that are usually regarded as important keys is selected to create a feature vector [11]
▫WordNet lexical categories are used to create the feature vector [12]
▫The uClassify web service is used to create the feature vector

[10] Kaski, Samuel, Timo Honkela, Krista Lagus, and Teuvo Kohonen. "WEBSOM–self-organizing maps of document collections." Neurocomputing 21, no. 1 (1998): 101-117.
[11] Chumwatana, Todsanai, K. Wong, and Hong Xie. "A SOM-Based Document Clustering Using Frequent Max Substring for Non-Segmented Texts." Journal of Intelligent Learning Systems & Applications 2 (2010): 117-125.
[12] Gharib, Tarek F., Mohammed M. Fouad, Abdulfattah Mashat, and Ibrahim Bidawi. "Self Organizing Map-based Document Clustering Using WordNet Ontologies." International Journal of Computer Science 9 (2012).
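The first technique, encoding a document as a histogram of words, can be sketched in Python as follows. The tokenizer (lowercase alphabetic runs) and the function names are assumptions made for illustration; a real system would likely add stop-word removal and stemming.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs):
    """Sorted list of all distinct tokens across the document collection."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def word_histogram(doc, vocabulary):
    """Feature vector: count of each vocabulary word in the document."""
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocabulary]
```

For the two documents "Football is a sport" and "A sport is a game", the vocabulary is `['a', 'football', 'game', 'is', 'sport']` and the feature vectors are `[1, 1, 0, 1, 1]` and `[2, 0, 1, 1, 1]`. Vectors of this kind have a fixed length, so they can be fed to a SOM-style learner.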

Page 41: Weave-D - 2nd Progress Evaluation Presentation

Wordnet Lexical Categories

Act, Animal, Artifact, Food, ..., Communication

Body, Creation, Emotion, Motion, ..., Weather

Page 42: Weave-D - 2nd Progress Evaluation Presentation

uClassify output

docs   Sport  Games  Society  Recreation  Arts  Science  Business  Computers  Health  Home
doc1   95.5   4.3    0.1      0           0     0        0         0          0       0
doc2   0      0      0        0           0     84       16        0          0       0

“Football refers to a number of sports that involve, to varying degrees, kicking a ball with the foot to score a goal. The most popular of these sports worldwide is association football, more commonly known as just "football" or "soccer". Unqualified, the word football applies to whichever form of football is the most popular in the regional context in which the word appears, including association football, as well as American football, Australian rules football, Canadian football, Gaelic football, rugby league, rugby union and other related games. These variations of football are known as football codes.”

http://en.wikipedia.org/wiki/Football

Page 43: Weave-D - 2nd Progress Evaluation Presentation

Information Visualization

Page 44: Weave-D - 2nd Progress Evaluation Presentation

Information Visualization

•The process of showing information in a more intuitive manner
•Today, data analysts prefer to use computer-generated models
•Information visualization can be represented by the following taxonomy

Visualizing Tools
•2D
▫2D perspective
•3D
▫2D perspective
▫3D perspective

Page 45: Weave-D - 2nd Progress Evaluation Presentation

Tools

Gephi [13]

Arena [16]

3D BioLayout [14]

UbiGraph [15]

Page 46: Weave-D - 2nd Progress Evaluation Presentation

Other Similar Systems

Page 47: Weave-D - 2nd Progress Evaluation Presentation

Existing Similar Systems

•Watson is an artificial intelligence computer system capable of answering questions in natural language.

IBM Watson [17]

[17] IBM Watson. n.d. http://www-03.ibm.com/innovation/us/watson/index.shtml (accessed April 28, 2013).

Page 48: Weave-D - 2nd Progress Evaluation Presentation

Significance of Watson

•The ability to discern double meanings of words, puns, rhymes, and inferred hints
•Extremely rapid responses
•The ability to process vast amounts of information to make complex and subtle logical connections

Limitations of Watson

•Cannot process multi-modal data
•Cannot build a higher-level perception of its data
•Does not learn incrementally
•Requires complex infrastructure

Page 49: Weave-D - 2nd Progress Evaluation Presentation

Contributions of Project Members

Major Task(s)                          Contributor
Implement SOM                          Ruwan
Research and Implement GSOM            Thushan
Testing GSOM                           Lasindu
Research IKASL                         Thushan
Research Fuzzy integral                Lasindu
Research Image Feature Extraction
  Color                                Buddhima
  Edge                                 Ruwan
  Shape                                Thushan
Research Text Feature Extraction       Ruwan
Research WSD                           Lasindu
Research Information Visualization     Buddhima

Page 50: Weave-D - 2nd Progress Evaluation Presentation

References

[13] Gephi, an open source graph visualization and manipulation software. n.d. http://gephi.org/ (accessed April 28, 2013).

[14] BioLayout Express 3D. n.d. http://www.biolayout.org/ (accessed April 28, 2013).

[15] Ubigraph: Free dynamic graph visualization software. n.d. http://ubietylab.net/ubigraph/ (accessed April 28, 2013).

[16] Secrier, Maria. Arena3D. n.d. http://arena3d.org/ (accessed April 28, 2013).

Page 51: Weave-D - 2nd Progress Evaluation Presentation

Thank You!