
Pijitra Jomsri, JMTI Vol 4 Issue 1 2016


MACROJOURNALS

The Journal of MacroTrends in Technology and Innovation

Enhancement of Indexing for Social Knowledge-Sharing Community

Pijitra Jomsri
Department of Information Technology, Suan Sunandha Rajabhat University, Bangkok, Thailand

Abstract

Internet technology provides an efficient way to store and share information. Search engines and social bookmarking systems are important tools for discovering web resources. This research investigated two indexing approaches applied to Diigo, a social bookmarking system for knowledge sharing: “Tag only” and “Tag with Title”. The two approaches were evaluated using mean values of Normalized Discounted Cumulative Gain (NDCG). The results suggest that “Tag with Title” indexing performed better. This initial evaluation implies that the design might improve the accuracy and efficiency of web resource searching on social bookmarking systems, and that the technique can be applied in other domains.

Keywords: social bookmarking; indexing; knowledge-sharing community

1. Introduction

The number of people using the internet to exchange information is increasing. A search engine is therefore an important tool that helps users find documents on the internet. A social bookmarking system is another important tool: it allows people to share interesting web resources and also to attach a set of tags to each resource.

Diigo (https://www.diigo.com) is a social bookmarking system that presents itself as a multi-tool for personal knowledge management, intended to improve workflow and productivity while being easy and intuitive yet versatile and powerful. The name Diigo is an abbreviation of “Digest of Internet Information, Groups and Other stuff”. Diigo also provides a social annotation service. One can highlight text passages and add notes on any web page that one is reading at any time. On a web page, one can not only read public comments published by others but also discuss and interact with them. Diigo is not only a powerful personal tool and social sharing platform for knowledge workers; as it develops, the whole Web can become a writable, participatory, and interactive medium.

While the primary goal of these applications is to serve the needs of individual users, the tags attached to each web resource, which in this case is a link to a knowledge-sharing community, should also help other users to categorize, browse, and find items. Tags can be used for information discovery, sharing, and community ranking, and they are useful for tasks such as search, navigation, and information extraction. It is therefore interesting to investigate how well the set of tags attached to links to knowledge-sharing communities on Diigo contributes to search results.

In this research, social tagging was investigated as a way to improve knowledge-sharing indexing, and an indexing method that uses tagging information together with the title of the knowledge resource was proposed. We refer to it as the “Tag with Title” (TT) indexing method. To evaluate it, the proposed method was compared with an indexing method that uses tagging information only, the “Tag only” (T) indexing method.

The paper is structured as follows. Section 2 discusses related work. Section 3 describes the framework for social tagging based knowledge-sharing searching. Section 4 describes the experimental setting, and Section 5 presents the results and discussion. Finally, Section 6 contains the conclusion and future work.

2. Related Work

Several researchers have studied Diigo. Peng et al. (2010) analyzed the functions and features of Diigo in relation to collaborative learning and designed a collaborative learning model for the Diigo environment. Chen et al. (2011) explored whether a Web 2.0-based note-sharing cooperative learning method worked more effectively than classroom cooperative learning with the Student’s Team Achievement Division (STAD) method, and than traditional lectures, in teaching Chinese rhetoric comprehension, using Diigo and Google Docs as tools. Khoii et al. investigated the impact of learning with Schoology (the LMS selected for their study) on learners’ autonomy and use of reading strategies while incorporating Diigo, a social bookmarking website.

Other researchers have studied and improved social tagging. Suchanek et al. (2008) found that tags are “meaningful” and that the tagging process is influenced by tag suggestions, while Thom-Santelli et al. (2008) explored the use of tags for communication in social tagging systems. Gelernter (2007) compared the information retrieval value of cloud-format tags and of the tag words themselves as found in the LibraryThing catalog; the results also show that high recall matters whether searchers are working toward research or personal ends. Budura et al. (2008) presented HAMLET to promote efficient and precise reuse of shared metadata in highly dynamic environments where tags are scarce. Gelernter (2009) offered a method of evaluating user tag preference and the relative strength of social tag versus LCSH string retrieval performance. Choochaiwattana and Spring (2009) examined the use of social annotations to improve the quality of web searches. Li and Zhu (2008) used the self-organizing characteristics of SOM neural networks to classify the popular tags on the "Del.icio.us" website. Jomsri (2009, 2015) investigated three different indexing approaches applied to CiteULike. The preliminary results illustrated that indexing using “Tag, Title, with Abstract” performed the best.


3. Framework for Social Tagging Based Knowledge-Sharing Searching

This section discusses the experimental design and the evaluation method. The experiment was divided into five steps, as shown in Fig. 1.

Fig. 1. Framework for Social Tagging Based knowledge-sharing Searching.

A. Research Methods

1) Crawler: A knowledge-sharing crawler is a small computer program that browses the knowledge-sharing systems of the WWW in a predetermined manner. The crawler is responsible for gathering knowledge-sharing information such as the author and the tags used. This information helps the system to determine a user's interests and to create an index entry for each knowledge-sharing record. The crawler in this framework was implemented in Java.

2) Knowledge corpus: The corpus is a collection of knowledge-sharing records extracted from the knowledge-sharing system.

The knowledge-sharing data were crawled from Diigo between January and September 2015. The final set consisted of 32,450 records related to computer science.

3) Indexer: TF-IDF (term frequency–inverse document frequency) was used to create the indices. TF-IDF is a weight often used in information retrieval and text mining. It is a statistical measure of how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus.

In this experiment, two different indexers were developed. Equations (1) and (2) show the modified term frequency/inverse document frequency (tf-idf) formulas for the two indexers, where T is “Tag only” and TT is “Tag with Title”:

$$\mathrm{tfidf}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \times \log \frac{|T|}{|\{d : t_i \in d\}|} \tag{1}$$

$$\mathrm{tfidf}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \times \log \frac{|TT|}{|\{d : t_i \in d\}|} \tag{2}$$

Here $n_{i,j}$ is the number of occurrences of the considered term $t_i$ in document $d_j$, $|T|$ is the total number of “Tag only” documents in the corpus, $|TT|$ is the total number of “Tag with Title” documents in the corpus, and $|\{d : t_i \in d\}|$ is the number of documents in which the term $t_i$ appears (that is, $n_{i,j} \neq 0$). If the term is not in the corpus, this leads to a division by zero. It is therefore common to use $1 + |\{d : t_i \in d\}|$ as the denominator.
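As an illustration of the two indexers, the following is a minimal Python sketch of the weighting described for Equations (1) and (2). It is not the author's implementation: the record layout (`title`, `tags`), the whitespace tokenization, and the function names `build_documents` and `tfidf_index` are assumptions made for this example, and the three sample records are invented. Because every term comes from the corpus itself, the document frequency is never zero here, so the "1 +" smoothing is omitted.

```python
import math
from collections import Counter


def build_documents(records, use_title):
    """Turn each record into a list of terms: tags only (T) or tags plus title words (TT)."""
    docs = []
    for rec in records:
        terms = [tag.lower() for tag in rec["tags"]]
        if use_title:
            # Assumption: titles are tokenized by simple whitespace splitting.
            terms += rec["title"].lower().split()
        docs.append(terms)
    return docs


def tfidf_index(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    index = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        index.append({
            term: (n / total) * math.log(n_docs / df[term])
            for term, n in counts.items()
        })
    return index


# Hypothetical records, invented for illustration only.
records = [
    {"title": "Intro to Python", "tags": ["python", "tutorial"]},
    {"title": "Java crawler basics", "tags": ["java", "crawler"]},
    {"title": "Python web crawler", "tags": ["python", "crawler"]},
]

t_index = tfidf_index(build_documents(records, use_title=False))   # "Tag only"
tt_index = tfidf_index(build_documents(records, use_title=True))   # "Tag with Title"
```

In this sketch the only difference between the two indexers is which fields contribute terms to a document, so title words such as "intro" receive a weight in `tt_index` but are absent from `t_index`.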

4) Search Function: Cosine similarity is a similarity measure between two vectors of n dimensions, obtained by finding the cosine of the angle between them. This measure is often used to compare documents in text mining. Given two vectors of attributes, A and B, the cosine similarity, cos(θ), is calculated as the dot product of the vectors divided by the product of their magnitudes, as in Equation (3).

$$\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} \tag{3}$$

5) Ranking: The similarity score is used to rank the search results.
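The search and ranking steps can be sketched together in Python as follows. This is a minimal sketch under stated assumptions rather than the system's actual code: vectors are stored as sparse `{term: weight}` dictionaries, the names `cosine_similarity` and `rank` are hypothetical, and the document vectors and query are invented for illustration. The default cut-off of 20 mirrors the top 20 results shown to subjects.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # An empty vector has no direction; treat it as not similar to anything.
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs, top_k=20):
    """Return up to top_k (doc_id, score) pairs ordered by decreasing similarity."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


# Hypothetical index entries and query, invented for illustration only.
doc_vecs = {
    "d1": {"python": 0.8, "tutorial": 0.6},
    "d2": {"java": 1.0},
    "d3": {"python": 0.5, "crawler": 0.5},
}
query = {"python": 1.0}
results = rank(query, doc_vecs)
```

Here `d1` ranks first because its vector points most nearly in the direction of the query, while `d2` shares no term with the query and receives a score of zero.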

Knowledge-sharing Searching

Two search engines based on the two indexers were developed. For each search result, subjects can see the title ID of the document and the title name, which links to the corresponding data on Diigo.

4. Experimental Setting

Thirty subjects, lecturers and students from Suan Sunandha Rajabhat University, were asked to participate. In the experiments, each subject was assigned to find knowledge using our search engines. Each subject was given two questions and formulated his or her own queries according to them. Subjects were asked to use the same query for each search engine. They were then asked to rate the relevancy of the search result set on a five-point scale: 0 is not relevant at all, 1 is probably not relevant, 2 is less relevant, 3 is probably relevant, and 4 is extremely relevant.

The top 20 search results of each search engine were displayed for relevancy judgment, and the subjects' relevancy ratings for each query were treated as perfect judgments.

The evaluation metric is NDCG (Normalized Discounted Cumulative Gain), as originally proposed by Järvelin and Kekäläinen (Jäschke et al., 2007).
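The paper does not spell out the NDCG formula it used, so the following Python sketch shows one common formulation as an illustration only. The assumptions are: the gain is the raw 0-4 rating, the discount is log2(rank + 1), and the ideal ordering is obtained by sorting the ratings of the same result list. The function names are hypothetical.

```python
import math


def dcg(ratings):
    """Discounted cumulative gain of ratings listed in ranked order (rank 1 first)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(ratings, start=1))


def ndcg_at_k(ratings, k):
    """NDCG at cut-off k: DCG of the actual order divided by DCG of the ideal order."""
    ideal = sorted(ratings, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    if ideal_dcg == 0.0:
        # No relevant result at all: define the score as 0 to avoid dividing by zero.
        return 0.0
    return dcg(ratings[:k]) / ideal_dcg


def mean_ndcg_at_k(all_ratings, k):
    """Average NDCG at cut-off k over several queries (one ratings list per query)."""
    return sum(ndcg_at_k(r, k) for r in all_ratings) / len(all_ratings)
```

Calling `mean_ndcg_at_k` for k = 1 to 20 on the ratings collected for each search engine would give one curve per indexing method, which is the kind of comparison the evaluation reports.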


5. Experimental Result

This section is separated into two parts: the results of the experiment and the discussion.

Results

Fig. 2 shows the average NDCG scores over the first 20 ranks for the “Tag only” (T) and “Tag with Title” (TT) indexing methods.

The x-axis represents the first 20 documents of the search results, whereas the y-axis denotes the NDCG score. The figure suggests that the “Tag with Title” indexing method appears to outperform the “Tag only” method.

Fig.2 Comparison of the average NDCG for two indexing methods.

Furthermore, a paired-sample t-test was employed for the top 20 ranks, assuming that the samples come from populations that are approximately normal with equal variances. The level of significance was set to 0.05 (α = 0.05).
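For readers unfamiliar with the test, a paired-sample t statistic can be computed as in the Python sketch below. It uses only the standard library; the function name and the four score pairs are hypothetical and are not the study's data. The sketch stops at the t statistic and degrees of freedom; obtaining the two-tailed significance requires the t distribution, for example from a statistics package such as SciPy (`scipy.stats.ttest_rel`).

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (mean difference, standard error, t statistic, degrees of freedom)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation of the differences divided by sqrt(n).
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, std_err, mean_diff / std_err, n - 1


# Hypothetical paired NDCG scores, invented for illustration only.
tt_scores = [0.82, 0.79, 0.85, 0.80]
t_scores = [0.80, 0.78, 0.81, 0.79]
mean_diff, std_err, t_stat, dof = paired_t_statistic(tt_scores, t_scores)
```

The difference is judged significant at α = 0.05 when the absolute value of the t statistic exceeds the two-tailed critical value for the given degrees of freedom.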

The paired differences were used to find the difference between the two indexing methods. Table I reports the mean difference between the search results provided by the “Tag with Title” (TT) indexing method and those provided by the “Tag only” (T) method at k = 1-20. The results indicate that the TT search results are statistically different from those provided by the Tag only approach.

TABLE I. RESULT OF PAIRED-SAMPLE T-TEST

Rank (K)   Indexing (I)   Indexing (J)   Mean Difference (I-J)   Std. Error   Sig. (2-tailed)
1-20       TT-indexer     T-indexer      0.0050                  0.0078       0.04

Discussion

There are some indications that the proposed heuristic method, “Tag with Title”, can improve knowledge searching on social bookmarking systems. This might be because the method utilizes information about user behavior. The results also imply that the T indexer remains important in this particular study. Finally, the chosen experimental factor can help the system to adjust the ranking and improve the search results of knowledge searching.

6. Conclusion and Future Works

This preliminary study focuses on the comparison of heuristic search engines. Here, the heuristic indexer implemented uses “Tag with Title”. Thirty subjects were assigned to investigate the results obtained from the search engines. Each subject specified three different queries, and each query was applied to the two search engines. The first 20 documents from each search engine were displayed for relevancy judgment. Finally, the subjects were asked to rate the relevancy of the search results on a five-point scale.

The results show that the TT indexer returns a higher NDCG score, which implies that TT performs better. To further analyze the results, a paired-sample t-test was utilized. However, the number of subjects in the experiment is small. To confirm the finding, more subjects may be needed.

In addition, the experiment should be extended to different search domains. Improving indexing enhances the performance not only of academic knowledge searches but also of document searches in general. Future research in the area consists of extending the scale of the experiments, developing ranking methods, and optimizing the parameters.

ACKNOWLEDGMENT

The authors would like to thank Suan Sunandha Rajabhat University for scholarship support.

REFERENCES

Peng, Z., Mei, L., Yuhua, N., and Yi, Z. (2010). The Application of the Diigo-based Collaborative Learning Model in the Course "Fundamentals of Computers". International Conference on Educational and Information Technology (ICEIT 2010), 446-449.

Chen, C., Wang, C., and Shih, J. (2011). The Effects of Employing Web 2.0-based Notesharing Strategy in Teaching Chinese Rhetoric for Elementary School Students. International Conference on Electrical and Control Engineering (ICECE), 6902-6906.

Khoii, R., Ahmadi, N., and Gharib, M. The Effects of Integrating Diigo Social Bookmarking into Schoology Learning Management System on EFL Learners’ Autonomy and Use of Reading Strategies. International Conference ICT for Language Learning.

Suchanek, F. M., Vojnović, M., and Gunawardena, D. (2008). Social Tags: Meaning and Suggestions. CIKM’08, Napa Valley, California, USA, 26-30 October 2008.

Thom-Santelli, J., Muller, M. J., and Millen, D. R. (2008). Social Tagging Roles: Publishers, Evangelists, Leaders. CHI 2008, Florence, Italy, 5-10 April 2008.

Gelernter, J. (2007). A Quantitative Analysis of Collaborative Tags: Evaluation for Information Retrieval - a Preliminary Study. International Conference on Collaborative Computing: Networking, Applications and Worksharing, 12-15 Nov. 2007, New York, NY, 376-381.

Budura, A., Michel, S., Cudre-Mauroux, P., and Aberer, K. (2008). To Tag or Not to Tag: Harvesting Adjacent Metadata in Large-Scale Tagging Systems. SIGIR’08, Singapore, 20-24 July 2008.

Choochaiwattana, W., and Spring, M. B. (2009). Applying Social Annotations to Retrieve and Re-rank Web Resources. Proceedings of 2009 International Conference on Information Management and Engineering (ICIME 2009), Kuala Lumpur, Malaysia, 3-5 April 2009.

Li, B., and Zhu, Q. (2008). The Determination of Semantic Dimension in Social Tagging System Based on SOM Model. Second International Symposium on Intelligent Information Technology Application 2008 (IITA’08), 20-22 Dec. 2008, Shanghai, 909-913.

Jomsri, P., Sanguansintukul, S., and Choochaiwattana, W. (2009). A Comparison of Search Engine Using “Tag Title and Abstract” with CiteULike: An Initial Evaluation. The 4th IEEE Int. Conf. for Internet Technology and Secured Transactions (ICITST-2009), United Kingdom, 2009.

Jäschke, R., Marinho, L. B., Hotho, A., Schmidt-Thieme, L., and Stumme, G. (2007). Tag Recommendations in Folksonomies. In Proceedings of PKDD 2007, volume 4702 of Lecture Notes in Computer Science, Springer Verlag, pp. 506-514.

Jomsri, P., and Prangchumpol, D. (2015). A Hybrid Model Ranking Search Result for Research Paper Searching on Social Bookmarking. 1st International Conference on Industrial Networks and Intelligent Systems (INISCom), IEEE, pp. 38-43.
