
Page 1: by Santi Caballé , Claudi Paniagua, Fatos Xhafa, and  Thanasis Daradoumis

by Santi Caballé, Claudi Paniagua, Fatos Xhafa, and Thanasis Daradoumis
Open University of Catalonia
Barcelona - Spain

Second International Workshop on Grid Computing and its Application to Data Analysis
GADA'05
Agia Napa, Cyprus, November 1-2, 2005

A Grid-aware Implementation for Providing Effective Feedback to On-line Learning Groups

Index

• Introduction: the process of embedding information and knowledge into CSCL applications.
• Approach: need for structuring and processing of large amounts of group activity information.
• Problem: lack of computational resources.
• Solution: a Grid-aware approach based on the Master-Worker paradigm.
• An application: a Grid-based prototype to process group activity log files.
• Processing results: empirical analysis.
• Conclusions and future work.

Computer-Supported Collaborative Learning is a paradigm for research in educational technology that focuses on the use of Information and Communications Technology (ICT) as a mediation tool within collaborative methods of learning.

B. Wasson (1998)

In CSCL environments, the analysis of the information related to the collaborative group activity is crucial for understanding collaboration and group processes.

P. Dillenbourg (1999)

Introduction (I): The process of embedding information and knowledge into CSCL applications

The whole picture

Four stages in event management: classification, processing, analysis and presentation.

Introduction (II): The process of embedding information and knowledge into CSCL applications

Stage I: Classification

• Collection of information.
• Extraction of actions.
• Identification of events.
• Categorization according to:
    • Task performance
    • Group functioning
    • Scaffolding
• Store as system log files.

Classification in synchronous environments is very similar.

Introduction (III): The process of embedding information and knowledge into CSCL applications

Stage II: Processing

• Obtain event information from large log files.
• Process log files according to desired criteria, e.g.:
    • time
    • workspace
• Store processing results in a suitable database.

Processing of events needs great computational power.
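Processing a log "according to desired criteria" can be pictured as splitting one large event list into smaller lists keyed by time or by workspace. The following is a minimal sketch under stated assumptions: the LogEvent fields and the LogPartitioner class are hypothetical names chosen for illustration and do not reflect the actual BSCW log format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical event record: the fields are illustrative assumptions,
// not the actual BSCW log format.
class LogEvent {
    final String day;
    final String workspace;
    final String action;

    LogEvent(String day, String workspace, String action) {
        this.day = day;
        this.workspace = workspace;
        this.action = action;
    }
}

class LogPartitioner {
    // Splits one large event list into smaller lists, one per key value.
    // LinkedHashMap keeps the partitions in first-seen order.
    static Map<String, List<LogEvent>> partition(List<LogEvent> events,
                                                 Function<LogEvent, String> criterion) {
        Map<String, List<LogEvent>> parts = new LinkedHashMap<>();
        for (LogEvent e : events) {
            parts.computeIfAbsent(criterion.apply(e), k -> new ArrayList<>()).add(e);
        }
        return parts;
    }

    // Criterion "time": one partition per day.
    static Map<String, List<LogEvent>> byDay(List<LogEvent> events) {
        return partition(events, e -> e.day);
    }

    // Criterion "workspace": one partition per group workspace.
    static Map<String, List<LogEvent>> byWorkspace(List<LogEvent> events) {
        return partition(events, e -> e.workspace);
    }
}
```

Each partition can then be processed independently of the others, which is what makes the processing stage a candidate for parallel execution.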

Introduction (IV): The process of embedding information and knowledge into CSCL applications

Stage III: Analysis

• Need for extracting complex knowledge from the database.
• Define query criteria.
• Send criteria and data to an external statistics package.
• Obtain useful statistical results from the analysis.

External analysis makes it possible to use the best existing statistical packages.

Introduction (V): The process of embedding information and knowledge into CSCL applications

Stage IV: Presentation

• Predefine an XML coding to represent ad hoc statistical measurements.
• Structure statistical results into XML output.
• Convert XML into the desired presentation format.
• Present results to users.

Users constantly receive knowledge in the form of appropriate feedback that influences their motivation and emotional state.

Approach (I): Motivation

• Support for real on-line environments with a large number of students and tutors that are geographically distributed.
• High degree of user-user and user-system interaction generates lots of event information.
• Constant provision of complex knowledge to group participants.
• Need to supply efficient and useful feedback for improving the motivation, emotional state, and problem-solving abilities of groups in on-line collaborative learning.

Approach (II): Context at Open University of Catalonia

• Group activity at Open University of Catalonia involves hundreds of students and dozens of tutors in several on-line courses.
• The complexity of the learning practices entails intensive collaboration activity.
• BSCW is used as a groupware system to capture group activity interaction in log files.
• BSCW provides neither log file processing nor statistical analysis capabilities.
• BSCW generates a single, huge daily log file and neither classifies nor structures the data in any way.

Statement of the problem: Lack of computational resources

• Need for processing of a huge amount of event information gathered in single log files.
• Essential to have the processing results of group activity constantly available in real time.
• Event information in log files should be partitioned into multiple log files according to particular needs.
• Event information must be constantly processed in an efficient manner during the processing stage.
• Lack of sufficient computational resources is the main obstacle to the constant processing of multiple data log files in real time.

Solution (I): Redefining the processing stage

• Obtain event information from large log files.
• Structure the information according to particular needs.
• Create log files of different degrees of granularity.
• Process all log files at the same time.
• Store results in the database.

Need for the processing of all log files to be parallelized.

Solution (II): A Grid-based solution

• Grid technology provides broad access to massive information and computational resources.
• In this context, the Grid computing paradigm:
    • overcomes the lack of computational resources to process a large amount of event information.
    • allows processing of the log files taking advantage of the parallelism inherent in the distributed nature of the Grid.
    • provides load balancing in the processing of log files of different granularity.
• Master-Worker paradigm using the Planetlab platform: a Grid-based approach for processing log files.

Solution (III): Master-Worker paradigm

• Distinguishes two types of processors:
    • master: performs the control and coordination tasks.
    • workers: perform most of the computational work.
• Advantages:
    • flexibility: workers can be implemented in different ways.
    • scalability: workers can be easily added.
    • separation of concerns: master does coordination and workers do specific tasks.
• Target: parallel applications with weak synchronization and reasonably large grain size.
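As a generic illustration of the paradigm (not the deck's Grid implementation), the sketch below uses a local thread pool from java.util.concurrent as the set of workers. The class name MasterWorkerSketch and the toy task, measuring the length of a text chunk, are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MasterWorkerSketch {
    // Master: creates one independent task per chunk, hands the tasks to a pool
    // of workers, and combines the partial results. Workers never talk to each other.
    static long totalLength(List<String> chunks, int nWorkers) {
        ExecutorService workers = Executors.newFixedThreadPool(nWorkers);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String chunk : chunks) {
                Callable<Integer> task = () -> chunk.length(); // stand-in for real work
                pending.add(workers.submit(task));
            }
            long total = 0;
            for (Future<Integer> result : pending) {
                total += result.get(); // the only point where the master waits
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            workers.shutdown();
        }
    }
}
```

The master only coordinates and collects, the workers share nothing, and adding workers is a matter of changing nWorkers, which mirrors the flexibility, scalability and separation-of-concerns points above.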

Solution (IV): Architecture

[Figure: the architecture of an application for processing log files.]

Solution (V): Implementation (I)

The workers receive and carry out the following task (MWTask):
• address of the location of the log file;
• name of the log file;
• size of the log file;
• address of the location where the processing routine is found;
• URL of the database where the processed information will be stored.

The master processor (MWDriver) is programmed as follows:

    while (true) do
        check for new log files generated from the Collaborative Learning Application Server;
        update the list of <log file description> for the new incoming log files;
        for each new log file
            generate a task;
            submit the newly generated task;
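The task description and one pass of the master loop can be sketched in Java as follows. This is a hedged illustration, not the actual MWTask or MWDriver code: LogTask, DriverSketch, pollOnce and all field names are hypothetical, and discovering files on the Collaborative Learning Application Server is replaced by a map passed in by the caller.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Task descriptor holding the five items listed on the slide.
// Class and field names are illustrative assumptions.
class LogTask {
    final String logFileLocation;
    final String logFileName;
    final long logFileSizeBytes;
    final String routineLocation;
    final String databaseUrl;

    LogTask(String logFileLocation, String logFileName, long logFileSizeBytes,
            String routineLocation, String databaseUrl) {
        this.logFileLocation = logFileLocation;
        this.logFileName = logFileName;
        this.logFileSizeBytes = logFileSizeBytes;
        this.routineLocation = routineLocation;
        this.databaseUrl = databaseUrl;
    }
}

class DriverSketch {
    private final Set<String> knownLogFiles = new HashSet<>();
    private final String logFileLocation;
    private final String routineLocation;
    private final String databaseUrl;

    DriverSketch(String logFileLocation, String routineLocation, String databaseUrl) {
        this.logFileLocation = logFileLocation;
        this.routineLocation = routineLocation;
        this.databaseUrl = databaseUrl;
    }

    // One pass of the loop body: given the files currently on the server
    // (name -> size in bytes), returns one task per file not seen before.
    List<LogTask> pollOnce(Map<String, Long> filesOnServer) {
        List<LogTask> newTasks = new ArrayList<>();
        for (Map.Entry<String, Long> f : filesOnServer.entrySet()) {
            if (knownLogFiles.add(f.getKey())) { // add() is true only for unseen names
                newTasks.add(new LogTask(logFileLocation, f.getKey(), f.getValue(),
                                         routineLocation, databaseUrl));
            }
        }
        return newTasks;
    }
}
```

A real driver would call pollOnce repeatedly and submit each returned task to a worker. Tracking the names already seen is what keeps a file from being dispatched twice.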

Solution (VI): Implementation (II)

The worker processor (MWWorker) is programmed as follows:

    receive the task;
    receive the specified log file from the location specified in the task description;
    run the processing routine on the log file;
    send the master the task's report (execution time, …) on completion;
    send the database the processing results;

Efficiency issues:
• weak synchronization between master and workers ensures the application runs without loss of performance.
• log files of different granularity allow an efficient load balance among workers and minimize data transmission.
• the number of workers can be adapted dynamically when a new resource appears.

A Grid prototype (I): An application for processing log files

• EventExtractor: an ad hoc application for extracting event information from BSCW
    • converts event information into well-formatted data.
    • stores the extraction results in a database.
    • needs a lot of time to process sequentially.
• MW model: appropriate in this context given that
    • log files of different granularity are processed.
    • workers are not synchronized with each other.
    • communication load between master and workers is low.
• Planetlab platform: using a real Grid environment
    • by installing the Globus Toolkit 3 Grid service container,
    • and deploying the prototype on Planetlab.

A Grid prototype (II): Master-Worker algorithm (I): overview

A minimal Grid implementation made up of:
• the worker, a Grid service that does the main work as follows:
    • wraps the EventExtractor routine,
    • publishes an interface that the master calls in order to dispatch a task,
    • is passed a string representation of the events to be processed, and
    • returns a data structure containing performance information.
  After completing the task, the worker is put back into a queue of idle workers.
• the master, which first obtains the event log file to be processed, the available workers, the task size to be dispatched to workers and the number of workers to use, which it puts in an idle queue. It then enters the following loop:
    • reads a specific number of events from an event log file,
    • calls an idle worker and sends it the events to be processed.
  The master exits the loop when all events in the current log file have been read and all dispatched tasks have been completed.
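The master loop and the idle-worker queue can be sketched with standard java.util.concurrent classes. This is an illustrative local simulation under stated assumptions, not the prototype's Grid code: IdleQueueMaster and its nested Worker are hypothetical, the workers are plain objects driven by a thread pool rather than Grid services, and each worker only counts the events it receives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class IdleQueueMaster {
    // Hypothetical worker: counts the events it is given instead of parsing them.
    static class Worker {
        final AtomicLong processed = new AtomicLong();
        void processEvents(List<String> batch) { processed.addAndGet(batch.size()); }
    }

    // Dispatches the events in batches of taskSize; returns the total number processed.
    static long run(List<String> events, int nWorkers, int taskSize) {
        if (taskSize <= 0) throw new IllegalArgumentException("taskSize must be positive");
        BlockingQueue<Worker> idle = new ArrayBlockingQueue<>(nWorkers);
        List<Worker> all = new ArrayList<>();
        for (int i = 0; i < nWorkers; i++) {
            Worker w = new Worker();
            all.add(w);
            idle.add(w); // every worker starts out idle
        }
        ExecutorService pool = Executors.newFixedThreadPool(nWorkers);
        try {
            for (int from = 0; from < events.size(); from += taskSize) {
                // Read a fixed number of events (fewer for the last batch).
                List<String> batch =
                        events.subList(from, Math.min(from + taskSize, events.size()));
                Worker w = idle.take(); // blocks until some worker is idle
                pool.execute(() -> {
                    try {
                        w.processEvents(batch);
                    } finally {
                        idle.add(w); // the worker goes back to the idle queue
                    }
                });
            }
            pool.shutdown();
            // Exit only when every dispatched task has finished.
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        long total = 0;
        for (Worker w : all) total += w.processed.get();
        return total;
    }
}
```

The two exit conditions from the slide correspond to the end of the for loop (all events read) and the awaitTermination call (all dispatched tasks completed).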

A Grid prototype (III): Master-Worker algorithm (II): the Master

• The Master implements the EventExtractorMaster interface, with a single operation that calls the worker's processEvents operation and returns performance statistics about the execution.
• The EventExtractorMasterImp class aggregates an instance of EventExtractorMasterDispatcher to dispatch all tasks to available workers.

A Grid prototype (IV): Master-Worker algorithm (III): the Task

    private void _dispatchEventsToWorker(String events, long nEvents,
            double workerDBInsertTime, EventExtractorMasterStatsBean masterStats)
            throws Exception {
        // Take the next available worker from the queue.
        EventExtractorWorker worker = null;
        worker = m_queue.getNextWorker();
        this.beforeDispatch(worker);
        // Synchronous call: returns only when the worker has processed the events.
        EventExtractorWorkerStatsBean workerStats =
                worker.processEvents(events.toString(), workerDBInsertTime);
        this.afterDispatch(worker);
        this.decrementPendingDispatchs();
    }

This operation synchronously sends a sequence of events (a single task) to an available worker.

A Grid prototype (V): Master-Worker algorithm (IV): the Task

Two strategies to dispatch tasks to workers:
• by blocking until the queue of idle workers is empty.
• by implementing the queue of idle workers with the round-robin scheme.
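The round-robin scheme can be sketched as a selector that hands out workers in a fixed cyclic order and never blocks. RoundRobinSelector is a hypothetical class written for this example, and its getNextWorker method only borrows the name used in the deck's dispatch code. One possible reading of the blocking strategy, by contrast, is a java.util.concurrent.BlockingQueue whose take() call makes the master wait for a worker to be returned.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector: an illustration of the scheme,
// not the prototype's actual queue implementation.
class RoundRobinSelector<W> {
    private final List<W> workers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSelector(List<W> workers) {
        if (workers.isEmpty()) throw new IllegalArgumentException("no workers");
        this.workers = workers;
    }

    // Returns workers in fixed cyclic order, regardless of whether they are busy.
    W getNextWorker() {
        // floorMod keeps the index valid even if the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), workers.size());
        return workers.get(i);
    }
}
```

Round-robin spreads tasks evenly when tasks and nodes are similar, while a blocking idle queue adapts to uneven task durations at the cost of making the master wait.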

A Grid prototype (VI): Master-Worker algorithm (V): the Worker

• The worker Grid service implements the EventExtractorWorker interface, which has only a single operation: processEvents(String events, double dbInsertTimeInMs);
• The implementation parses the events passed in order to extract the required information.
• processEvents returns a data structure with performance information about the task executed (elapsed time, number of events and bytes processed).
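A single-operation worker interface that returns performance information could look like the sketch below. Everything beyond the processEvents signature quoted on the slide is an assumption: the names WorkerStatsSketch, EventWorkerSketch and LineCountingWorker are hypothetical, events are assumed to be newline-separated, and dbInsertTimeInMs is treated as a simulated per-event database insert cost, which the slide does not confirm.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stats structure; fields follow the slide's description
// (elapsed time, number of events, bytes processed).
class WorkerStatsSketch {
    final double elapsedMs;
    final long events;
    final long bytes;

    WorkerStatsSketch(double elapsedMs, long events, long bytes) {
        this.elapsedMs = elapsedMs;
        this.events = events;
        this.bytes = bytes;
    }
}

// Single-operation interface modelled on the signature shown on the slide.
interface EventWorkerSketch {
    WorkerStatsSketch processEvents(String events, double dbInsertTimeInMs);
}

// Toy implementation: treats each non-blank line as one event.
class LineCountingWorker implements EventWorkerSketch {
    @Override
    public WorkerStatsSketch processEvents(String events, double dbInsertTimeInMs) {
        long start = System.nanoTime();
        long count = 0;
        for (String line : events.split("\n")) {
            if (!line.trim().isEmpty()) count++;
        }
        long bytes = events.getBytes(StandardCharsets.UTF_8).length;
        // Assumption: the insert cost is charged once per event, on top of parsing time.
        double elapsedMs = (System.nanoTime() - start) / 1e6 + count * dbInsertTimeInMs;
        return new WorkerStatsSketch(elapsedMs, count, bytes);
    }
}
```

Returning the statistics from the call itself lets the master aggregate timings without any extra communication with the worker.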

A Grid prototype (VII): Test battery

An ad hoc test battery was designed, made up of:
• an exhaustive collection of log files
    • from the spring term of a course with 140 students arranged in 5-member groups and 2 tutors.
• a selected sample of a few log files
    • as a representative stratum of file size and event complexity.

The whole test battery was processed by the EventExtractor on single-processor nodes of Planetlab
• involving usual configurations.
• with different workloads.
• repeating the execution several times.

Experimental results (I): Sequential approach

[Table: comparison scale for 8 representative log files. Columns: File size (KB), Number of events, Processing time (sec).]

[Figure: Processing time. Results of over 100 log files processed. x-axis: File size (bytes); y-axis: Time (sec).]

Sequential processing shows that the processing time is linear in the size of the log file processed.

Experimental results (II): Parallel approach (I)

The parallel processing results were obtained by
• running tests for different task sizes and numbers of workers.
• observing efficiency and speed-up for each set of workers.

[Figure: Task Size = 5 events. Observed speed-up and efficiency for a 5-event task and different numbers of workers. x-axis: Number of Workers (2, 4, 8, 16); y-axis: Speed-up; series: Observed Speed-up, Observed Efficiency.]
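For reference, speed-up and efficiency are conventionally defined as the sequential time divided by the parallel time, and the speed-up divided by the number of workers, respectively. The slides do not spell out the formulas used, so the helper below assumes these standard definitions. ParallelMetrics is a hypothetical class name.

```java
// Standard definitions, assumed here since the slides do not state them:
//   speed-up   S(p) = T_sequential / T_parallel(p)
//   efficiency E(p) = S(p) / p
class ParallelMetrics {
    static double speedUp(double sequentialSec, double parallelSec) {
        if (sequentialSec <= 0 || parallelSec <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        return sequentialSec / parallelSec;
    }

    static double efficiency(double sequentialSec, double parallelSec, int workers) {
        if (workers <= 0) throw new IllegalArgumentException("workers must be positive");
        return speedUp(sequentialSec, parallelSec) / workers;
    }
}
```

With purely illustrative numbers (not measurements from the experiments), a run that takes 100 s sequentially and 25 s on 8 workers has a speed-up of 4 and an efficiency of 0.5.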

Experimental results (III): Parallel approach (II)

[Figure: Observed speed-up with increasing number of workers. Four panels, each plotting Speed-up against Task Size (number of events):
• 2 Workers: task sizes 1, 5, 10, 25, 50, 100, 250
• 4 Workers: task sizes 1, 5, 10, 25, 50, 100, 250
• 8 Workers: task sizes 1, 5, 10, 25, 125
• 16 Workers: task sizes 1, 5, 25, 63]

• Reasonable speed-up is achieved in every test.
• However, parallel efficiency tends to decrease with the number of workers.

Experimental results (IV): Analysis of the results

• Apart from very small task sizes, the speed-up observed showed the feasibility of the parallelization.
    • small task sizes were affected by the transmission time.
• The more workers used in our tests, the further from the maximum was the speed-up achieved.
    • trade-off between number of workers and task size.
• Results were a little biased due to the homogeneous behaviour observed in Planetlab.
    • they should be adjusted to the dynamic workload of a real Grid.
• Results are dependent on the low complexity of BSCW's log files.
    • event complexity is the key to taking advantage of the Grid.

Conclusions and future work

• Efficient embedding of information and knowledge into group activity is a crucial factor for the success of the online collaborative learning activity.
• Strong need for computational resources to process large amounts of group activity log data.
• Grid-aware application based on the Master-Worker paradigm for processing log files of group activity in an efficient yet simple manner.
• According to the results, the benefits of the Grid increase with the volume and complexity of the event log files to be processed.
• We plan to improve our prototype in terms of master-worker communication, fault tolerance and dynamic discovery of idle workers.

Thank you!

Questions?