slr 150613

Upload: zzatiee

Post on 06-Jul-2018

  • 8/17/2019 SLR 150613

    1/17

    A Systematic Literature Review for Topic Detection in Cyber-crime Investigation.

    Abstract

    The most popular social networking sites for chatting are Facebook, Twitter, Yahoo Messenger, and Skype. Users normally use chat conversations to communicate with other people, exchange ideas and hold discussions. Recently, criminals have used chat conversations to commit crimes such as money fraud, sexual harassment, cyberbullying, and murder. This systematic literature review (SLR) was therefore carried out to investigate existing research in the chat forensics area, specifically studies of topic detection. An SLR is an evidence-based approach to reviewing the literature systematically. The method follows SLR guidelines, which cover defining the research questions, the search process, inclusion and exclusion criteria, quality assessment, data collection, data analysis, and deviations from protocol. Of the 56 publications found, only 9 were selected for the review. Only 3 of these were done specifically for chat forensics, while the other 6 address topic classification in chat messages. The number of studies on topic detection for chat forensics is therefore limited.

    Keywords: Systematic literature review, topic detection, chat forensics, cyber-crime investigation, chat message.

    1 Introduction.

    Online social networking is an innovative communication technology that is used extensively by people all around the world. Popular social networking applications include Facebook, Friendster, and Yahoo (Ho et al., 2009). Online chatting is one such online social networking service; it lets a user communicate with other users. Online chat is also known as Instant Messaging (IM), and it can be defined as a form of computer-mediated communication that occurs in real time and requires the simultaneous participation of users (Orebaugh and Allnutt, 2010). This means that users get feedback or responses directly, without having to wait, as long as the other people are still connected to the chat service. A conversation may be one-to-one or one-to-many. Chat messaging began with the UNIX command-line application and continued with traditional client-based messaging programs before the growth of the web-based chat programs that are popular today (Kiley et al., 2008). Most people connect to these online social networks to build social relationships, for example with family, friends, and new acquaintances. Normally, users use chat conversations to communicate with other people, exchange ideas and hold discussions.

    Users can exchange text messages, images, and documents through chat. From the criminal point of view, however, online social networking is one of the means of committing crime. Criminals can easily hide their identity behind virtual identities, which means they may use fake information about themselves. Searching for evidence to identify criminals and their activities therefore becomes a difficult process.

    At present, 13,085,000 Malaysians have subscribed to Facebook, which places Malaysia 18th in the world ranking of Facebook users by country (Facebook, 2012). Facebook added chat functionality to its features on April 23, 2008. Facebook chat currently supports instant messaging clients such as Yahoo Messenger, Skype, AOL Instant Messenger, and Live Messenger. This makes it an attractive target for perpetrators. Some crimes, such as murder, drug trafficking and kidnapping, do not require any electronic or computing component. Nevertheless, technology-based systems, including chat messaging clients, can play a role in facilitating these crimes and other common criminal activities. Instant messaging clients provide ideal settings for gathering intelligence, and such information may enable criminals to execute their crimes, for instance by determining that someone is a "suitable" victim. Online chatting might also utiliected when collecting as much information as possible from suspect and victim machines (Kiley et al., 2008; Simon and Slay, 2010).

    The objective of this article is therefore to carry out a systematic literature review (SLR) of existing studies of chat messages for forensic investigation. The review follows the systematic reviews by Kitchenham (2009) and Bala et al. (2013). It is conducted to list the research areas in chat forensic investigation and the techniques used in each area. The next sections present the method used in the SLR, the results of the SLR, the discussion of the derived research questions, and the conclusion.

    2 Method.

    This article follows the guidelines from the SLR articles by Kitchenham (2009) and Bala et al. (2013). This section describes the steps taken to prepare this systematic literature review. The steps and guidelines given by Kitchenham (2009) are: defining the research questions, the search process, inclusion and exclusion criteria, quality assessment, data collection, data analysis, and deviations from protocol.

    2.1 Research question.

    The research question is an important part of a systematic review, since it guides the entire process of the study. The research questions in this article follow the question structure suggested by Kitchenham (2004), which comprises population, intervention, comparison, and outcome. This structure, also known as the PICO paradigm, is also used in the article by Bala et al. (2013). Each category is defined as follows:

    Population (P): The population is the application area, for example people, project type, and application type. The context of this article is chat forensics.

    Intervention (I): The intervention is the software methods, tools, or procedures applied in the selected area. In this context, the intervention is either a digital forensic tool or stylometrics.

    Comparison (C): The comparison relates the intervention to the procedure or methodology used in the articles. This article compares the limitations of the method used in each experiment.

    Outcome (O): The outcome defines the effect of the technology in each experiment. In this article's context, the outcome is the best method or tool used in the chat forensics area.

    The research questions (RQ) addressed in this article are as follows:

    RQ1: How many research articles related to chat forensics have been produced since 2005?

    RQ2: How many research articles related to topic detection for chat forensics have been produced since 2005?

    RQ3: What techniques or methods are used in the related studies?

    RQ4: What are the limitations of the studies?

    RQ1 is derived from the first element of the question structure, which falls under the population category. The purpose of this question is to analy

    Table 1 - Closely related keywords

    Keywords | Closely related keywords
    Chat | AOL Instant Messenger, MSN Messenger, Yahoo Messenger, IRC channel, Instant messenger, IM, Windows Live Messenger, Pidgin Messenger, Online messages, Trillian, computer mediated communication, social networks, online messages, Google Talk, Skype, textual communication, unstructured text.
    Digital forensics analysis | Authorship analysis, stylometric, classification technique, contact identification, topic identification, threat detection.
    Stylometric | Writing style, write print.
    Technique | Model, framework.
    Most appropriate model | High accuracy, compatible, applicable.
    Authorship analysis | Author identification, gender prediction, gender identification, author attribution.
    Digital forensics | Cybercrime investigation, cyber forensics.

    The closely related articles were then selected manually from the search results in the digital database sources.
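    Table 1 lists each keyword with its closely related terms, but the exact search strings submitted to each database are not reproduced in this excerpt. A common convention is to OR a keyword with its related terms and AND the resulting groups. The Python sketch below assumes that convention; the helper names (`quote`, `or_group`, `build_query`) and the two-group example are hypothetical illustrations, not the authors' actual query.

```python
def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def or_group(keyword: str, related: list[str]) -> str:
    """OR a keyword with its closely related keywords, dropping case-insensitive duplicates."""
    seen = set()
    terms = []
    for term in [keyword, *related]:
        key = term.strip().lower()
        if key and key not in seen:
            seen.add(key)
            terms.append(term.strip())
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query(groups: dict[str, list[str]]) -> str:
    """AND the OR-groups so each hit matches at least one term from every concept."""
    return " AND ".join(or_group(k, v) for k, v in groups.items())


# Hypothetical two-concept query built from a subset of Table 1.
groups = {
    "chat": ["instant messenger", "IM", "Yahoo Messenger"],
    "digital forensics": ["cybercrime investigation", "cyber forensics"],
}
print(build_query(groups))
# (chat OR "instant messenger" OR IM OR "Yahoo Messenger") AND ("digital forensics" OR "cybercrime investigation" OR "cyber forensics")
```

    Phrase quoting and boolean syntax differ between the databases in Table 2, so a real query would still need adjusting for each one.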

    2.3 Inclusion and exclusion criteria.

    The inclusion and exclusion criteria specify how the review articles are selected. An article was selected if its title and abstract relate to chat forensic studies and topic detection in chat messages, since these are the focus of this article; this is the inclusion criterion. The exclusion criterion is that any article not related to digital forensics is excluded during the selection process. However, an article can still be selected for review if the work is applicable to the digital forensics area, even though its main area is not forensics.
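    The selection rule above combines an inclusion test with an exception to the exclusion test, which is easy to misread. The Python sketch below encodes one reading of it: an article must concern topic detection in chat, and is then kept if it is either a digital forensics study or applicable to digital forensics. The `Screening` fields and `is_selected` are hypothetical names, and the judgements themselves are assumed to come from a reviewer reading the title and abstract, since the selection was done manually.

```python
from dataclasses import dataclass


@dataclass
class Screening:
    """A reviewer's judgements about one article's title and abstract (hypothetical fields)."""
    title: str
    topic_detection_in_chat: bool   # inclusion: the article studies topic detection on chat messages
    digital_forensics: bool         # the article's main area is chat or digital forensics
    applicable_to_forensics: bool   # not a forensics paper, but the work carries over to forensics


def is_selected(s: Screening) -> bool:
    """Apply the inclusion criterion, then the exclusion criterion with its exception."""
    if not s.topic_detection_in_chat:
        return False
    return s.digital_forensics or s.applicable_to_forensics


general = Screening("Topic classification of chat logs", True, False, True)
print(is_selected(general))  # True: kept under the exception for applicable work
```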

    3 Result.

    This section presents three forms of results: a summary of the search process, followed by the result of the quality assessment and the quality factors.

    3.1 Search results.

    After the search process was run, 56 published articles were found in the digital library databases. The articles were then divided by the topic area they discuss: authorship analysis, topic detection, and message attribution. Authorship analysis accounts for 21 articles, topic detection for 12 articles, and message attribution for 16 articles. The remaining chat forensics articles, which cover different topic areas, are combined into an "Other" category of seven published articles. Table 2 summarises the search results: the number of publications found in each digital library database and the number found in each area of study.

    Table 2 - Summary of Search Result

    Database name | No. of publications found | Authorship analysis | Topic detection | Message attribution | Other
    IEEE Xplore | 13 | 4 | 4 | 4 | 1
    Springer Link | 15 | 8 | 2 | 4 | 1
    Science Direct | 16 | 5 | 2 | 8 | 1
    ACM digital library | 2 | 2 | 0 | 0 | 0
    Emerald | 2 | 0 | 2 | 0 | 0
    Wiley | 1 | 1 | 0 | 0 | 0
    Google Scholar | 7 | 1 | 2 | 0 | 4
    Total | 56 | 21 | 12 | 16 | 7

    Although 12 articles were found for topic detection, only nine were selected for the review. Three articles were excluded because one unselected article

    … and Kose | … | …determination of chat conversations' topic in Turkish text based chat mediums | …messenger log files and mIRC.
    S7 | Miah et al. | 2011 | Detection of child-exploiting chats from a mixed chat dataset as a text classification task | Chat logs from Perverted Justice Foundation Incorporated (PJFI), and a collection of anonymous chats from websites like http://www.fugly.com and http://chatdump.com.
    S8 | Chen et al. | 2012 | A Topic Detection Method Based on Semantic Dependency Distance and PLSA | A real-world interactive text set collected from a QQ group named 'Linux group' (QQ chat).
    S9 | M. A. Basher and C. M. Fung | 2013 | Analy

    - "No" indicates that the answer to the question is negative or that the authors do not address the question in the article. The score assigned for this answer is 0.

    - "Partially" indicates that the article addresses the question only implicitly or is unclear about it. The score assigned for this answer is 0.5.

    Table 4 shows the result of the quality assessment for the reviewed articles. Total scores range from 3.5 to 6: two studies (S7 and S9) scored 6, two (S4 and S6) scored 5.5, three (S1, S3 and S8) scored 5, one (S5) scored 4.5, and one (S2) scored 3.5. The percentage of compliance ranges from 58.33% (S2) to 100% (S7 and S9), and eight of the nine studies comply at 75% or above.

    Table 4 - Quality assessment result

    ID | QA1 | QA2 | QA3 | QA4 | QA5 | QA6 | Total | Percentage of Compliance (%)
    S1 | 1 | 1 | 0.5 | 1 | 0.5 | 1 | 5 | 83.33
    S2 | 1 | 1 | 0.5 | 0.5 | 0.5 | 0 | 3.5 | 58.33
    S3 | 1 | 1 | 1 | 0.5 | 0.5 | 1 | 5 | 83.33
    S4 | 1 | 0.5 | 1 | 1 | 1 | 1 | 5.5 | 91.67
    S5 | 1 | 1 | 0.5 | 0.5 | 0.5 | 1 | 4.5 | 75.00
    S6 | 1 | 1 | 1 | 1 | 0.5 | 1 | 5.5 | 91.67
    S7 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 100.00
    S8 | 1 | 1 | 1 | 0.5 | 0.5 | 1 | 5 | 83.33
    S9 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 100.00
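    Table 4 can be reproduced from the answer scale: each of the six questions scores 1 for a full answer (the "Yes" case, whose definition falls outside this excerpt), 0.5 for "Partially" and 0 for "No", and the percentage of compliance is the total divided by the maximum of 6, times 100. The Python sketch below checks this against row S2; `ANSWER_SCORE` and `assess` are hypothetical names.

```python
# Score per answer, as used in Table 4 (the "Yes" definition is outside this excerpt).
ANSWER_SCORE = {"Yes": 1.0, "Partially": 0.5, "No": 0.0}


def assess(answers: list[str]) -> tuple[float, float]:
    """Return (total score, percentage of compliance) for one study's QA answers."""
    total = sum(ANSWER_SCORE[a] for a in answers)
    maximum = len(answers) * ANSWER_SCORE["Yes"]
    return total, round(100 * total / maximum, 2)


# Row S2 of Table 4: QA1..QA6 = 1, 1, 0.5, 0.5, 0.5, 0
print(assess(["Yes", "Yes", "Partially", "Partially", "Partially", "No"]))  # (3.5, 58.33)
```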

    3.3 Quality factors.

    This section follows the article by Kitchenham et al. (2009). The relationship between the quality scores of the published articles and their year of publication is investigated, as shown in Table 5.

    Table 5 - Average quality score for published articles by year of publication.

    Years | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013
    Number of studies | 1 | 0 | 2 | 1 | 2 | 1 | 1 | 1
    Mean quality score | 5 | 0 | 4.25 | 5.5 | 5 | 6 | 5 | 6
    Standard deviation of quality score | 0 | 0 | 1.06 | 0 | 0.71 | 0 | 0 | 0

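    The standard deviation of the quality score is 0 for most years because only one of the selected topic detection articles was published in each of those years; no selected article was published in 2007, so both its mean and its standard deviation are shown as 0.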
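    The standard deviations in Table 5 are consistent with the sample (n - 1) standard deviation: scores of 3.5 and 5 give a mean of 4.25 and a standard deviation of about 1.06, and scores of 4.5 and 5.5 give 5 and about 0.71. These are the only pairs of Table 4 scores that reproduce the 2008 and 2010 columns, but this excerpt does not say which studies they belong to. The sketch below assumes Python's `statistics` module; `quality_by_year` is a hypothetical helper, and reporting 0 for empty years follows Table 5's convention.

```python
from statistics import mean, stdev


def quality_by_year(scores_by_year: dict[int, list[float]]) -> dict[int, tuple[int, float, float]]:
    """Return (number of studies, mean score, sample standard deviation) per year.

    Following Table 5, a year with no studies reports 0 for both statistics and a
    year with a single study reports a standard deviation of 0.
    """
    summary = {}
    for year, scores in sorted(scores_by_year.items()):
        n = len(scores)
        avg = mean(scores) if n else 0.0
        sd = stdev(scores) if n >= 2 else 0.0  # stdev divides by n - 1
        summary[year] = (n, round(avg, 2), round(sd, 2))
    return summary


# Hypothetical input: score pairs that reproduce the 2008 and 2010 columns.
print(quality_by_year({2007: [], 2008: [3.5, 5.0], 2010: [4.5, 5.5]}))
# {2007: (0, 0.0, 0.0), 2008: (2, 4.25, 1.06), 2010: (2, 5.0, 0.71)}
```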

    4 Discussion.

    4.1 How many research articles related to chat forensics have been produced since 2005?

    The results in Table 2 and Table 6 show that 56 published articles related to chat forensic studies were found. Publications appeared in every year from 2005 to 2013 across the topic areas, which include authorship analysis, topic detection, message attribution, threat detection, monitoring systems, data forgery detection, and social network security.

    Authorship attribution can be defined as the process of examining the characteristics of a document to find or validate its author. Studies of authorship attribution can be divided into three categories (Zheng, 2006; Orebaugh and Allnutt, 2010; Nirkhi et al., 2012):

    • Authorship identification: identifying the real author of a text message by examining other samples of text by a particular author.

    • Authorship characteri

    Topic detection, also known as topic classification, is the process of tracing the main topic discussed in a conversation. The difference between the two areas is that authorship attribution is concerned with detecting and attributing the author, whereas topic detection focuses on the content of the conversation in a chat message.
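    Authorship identification, as defined in Section 4.1, attributes a message to the candidate whose known writing it most resembles. One generic stylometric baseline, chosen here only for illustration and not taken from any reviewed study, compares character trigram profiles with cosine similarity. The function names and the sample chats in the Python sketch below are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def identify_author(message: str, samples: dict[str, list[str]]) -> str:
    """Return the candidate whose pooled known samples are most similar to the message."""
    target = char_ngrams(message)
    profiles = {author: char_ngrams(" ".join(texts)) for author, texts in samples.items()}
    return max(profiles, key=lambda author: cosine(target, profiles[author]))


samples = {
    "user_a": ["lol ok c u l8r", "lol thats gr8"],
    "user_b": ["Certainly, I will send the report tomorrow.", "Please confirm the meeting time."],
}
print(identify_author("lol gr8 c u", samples))  # user_a
```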

    Message attribution is the process of examining chat message logs to find the artifacts left behind that can be used as evidence, for example time, user name, data exchanged, and IP address (Dickson, 2006 (1), 2006 (2)).
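    Message attribution works on the artifacts a chat client leaves behind. Real log formats differ from client to client and are not described in this section, so the Python sketch below assumes a simple, hypothetical line format (`[timestamp] user (IP): text`) and shows how the time, user name, IP address and exchanged text might be extracted from each line with a regular expression. The pattern, the `Artifact` record and `extract_artifacts` are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical log format, one message per line:
# [2013-05-01 10:22:31] alice (192.0.2.10): see you at 9
LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<user>\S+) \((?P<ip>\d{1,3}(?:\.\d{1,3}){3})\): (?P<text>.*)$"
)


@dataclass
class Artifact:
    time: datetime
    user: str
    ip: str
    text: str


def extract_artifacts(lines):
    """Yield an Artifact for each line in the assumed format; other lines are skipped."""
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if m is None:
            continue
        yield Artifact(
            time=datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
            user=m["user"],
            ip=m["ip"],
            text=m["text"],
        )


log = ["[2013-05-01 10:22:31] alice (192.0.2.10): see you at 9", "*** session closed ***"]
for artifact in extract_artifacts(log):
    print(artifact.time.isoformat(), artifact.user, artifact.ip, artifact.text)
# 2013-05-01T10:22:31 alice 192.0.2.10 see you at 9
```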

    Table 6 - Number of studies according to year of publication.

    Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | Total
    Authorship analysis | 0 | 4 | 2 | 3 | 2 | 2 | 7 | 1 | 0 | 21
    Topic detection | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 1 | 1 | 12
    Message attribution | 0 | 4 | 3 | 1 | 1 | 3 | 2 | 2 | 0 | 16
    Other | 1 | 1 | 0 | 0 | 2 | 1 | 2 | 0 | 0 | 7
    Number of publications | 2 | 11 | 6 | 6 | 6 | 8 | 12 | 4 | 1 | 56

    Table 6 shows that authorship analysis has the highest number of publications with 21 studies, followed by message attribution with 16 studies and topic detection with 12 studies, whereas the other topic areas have only seven published articles. The result shows that most studies are done for criminal identification purposes.
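    The ranking above follows directly from the row totals of Table 6. The Python sketch below copies the counts from Table 6 and recomputes the area totals, the per-year totals and the overall figure of 56, which is a quick way to check the table for transcription errors.

```python
YEARS = list(range(2005, 2014))
# Rows of Table 6: studies per year, 2005 to 2013.
COUNTS = {
    "Authorship analysis": [0, 4, 2, 3, 2, 2, 7, 1, 0],
    "Topic detection": [1, 2, 1, 2, 1, 2, 1, 1, 1],
    "Message attribution": [0, 4, 3, 1, 1, 3, 2, 2, 0],
    "Other": [1, 1, 0, 0, 2, 1, 2, 0, 0],
}

area_totals = {area: sum(row) for area, row in COUNTS.items()}
# zip(*rows) turns the rows into per-year columns.
year_totals = dict(zip(YEARS, (sum(column) for column in zip(*COUNTS.values()))))
ranking = sorted(area_totals.items(), key=lambda kv: kv[1], reverse=True)

print(ranking)
# [('Authorship analysis', 21), ('Message attribution', 16), ('Topic detection', 12), ('Other', 7)]
print(sum(year_totals.values()))  # 56
```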

    4.2 How many research articles related to topic detection for chat forensics have been produced since 2005?

    Section 3.1 mentioned that 12 publications were found for the topic detection area, but only nine of them are used for the systematic literature review. Table 7 shows the number of publications that specifically address chat forensics and the number that focus generally on topic detection in chat.

    Table 7 - Average quality score for published articles based on chat forensics purpose.

    Published for chat forensics. | Published generally for te

    contextual features. The study tried three basic approaches: n-grams, foul language, and TF-IDF features.
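    TF-IDF weights an n-gram by how often it occurs in one chat document and how rare it is across the collection. The excerpt does not say which TF-IDF variant the reviewed study used, so the Python sketch below assumes the common form tf × log(N / df), with tf normalised by document length and simple whitespace tokenisation. The foul-language features are not modelled, and `word_ngrams` and `tfidf` are hypothetical names.

```python
from collections import Counter
from math import log


def word_ngrams(text: str, n: int = 1) -> list[str]:
    """Lower-case, split on whitespace, and join each run of n tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs: list[str], n: int = 1) -> list[dict[str, float]]:
    """Weight each n-gram by (count / document length) * log(N / document frequency)."""
    grams = [Counter(word_ngrams(doc, n)) for doc in docs]
    df = Counter(g for counts in grams for g in counts)  # documents containing each n-gram
    total_docs = len(docs)
    vectors = []
    for counts in grams:
        length = sum(counts.values()) or 1  # avoid dividing by zero for empty chats
        vectors.append({g: (c / length) * log(total_docs / df[g]) for g, c in counts.items()})
    return vectors


weights = tfidf(["send me your photo", "send the file", "what is your age"])
print(sorted(weights[0].items(), key=lambda kv: -kv[1]))
```

    An n-gram that occurs in every chat gets weight 0, which is why TF-IDF tends to favour the distinctive vocabulary of a conversation over words shared by all of them.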

    Unlike the other studies, Chen et al. (2012) apply statistical information retrieval techniques, integrating semantic dependency distance (SDD) and probabilistic latent semantic analysis (PLSA) for topic detection in Chinese chat.
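    PLSA, one of the two components used by Chen et al. (2012), models each chat document as a mixture of latent topics, P(w|d) = Σ_z P(z|d) P(w|z), and fits the two distributions with expectation-maximisation. The Python sketch below is a plain textbook PLSA under that assumption; it does not reproduce the semantic dependency distance (SDD) part or any detail specific to Chen et al., and the function name `plsa` is hypothetical.

```python
import random


def plsa(docs: list[list[str]], k: int, iters: int = 50, seed: int = 0):
    """Fit PLSA by expectation-maximisation on tokenised, non-empty documents.

    Returns (p_z_d, p_w_z), where p_z_d[d][z] = P(z|d) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)  # fixed seed so the random start is reproducible
    vocab = sorted({w for doc in docs for w in doc})
    counts = [{w: doc.count(w) for w in set(doc)} for doc in docs]  # n(d, w)

    def normalise(weights: dict) -> dict:
        s = sum(weights.values())
        return {key: v / s for key, v in weights.items()}

    # Strictly positive random start breaks the symmetry between topics.
    p_z_d = [normalise({z: rng.random() + 1e-3 for z in range(k)}) for _ in docs]
    p_w_z = [normalise({w: rng.random() + 1e-3 for w in vocab}) for _ in range(k)]

    for _ in range(iters):
        acc_w_z = [{w: 0.0 for w in vocab} for _ in range(k)]
        acc_z_d = [{z: 0.0 for z in range(k)} for _ in docs]
        for d, doc_counts in enumerate(counts):
            for w, n in doc_counts.items():
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(k)]
                total = sum(joint)
                for z in range(k):
                    expected = n * joint[z] / total  # n(d,w) * P(z|d,w)
                    acc_w_z[z][w] += expected
                    acc_z_d[d][z] += expected
        # M-step: renormalise the expected counts into new distributions
        p_w_z = [normalise(acc) for acc in acc_w_z]
        p_z_d = [normalise(acc) for acc in acc_z_d]
    return p_z_d, p_w_z


chats = [["linux", "kernel", "shell"], ["linux", "shell", "bash"], ["football", "match", "goal"]]
p_z_d, p_w_z = plsa(chats, k=2)
print([max(t, key=t.get) for t in p_z_d])  # dominant topic index per chat
```

    EM only finds a local optimum, so in practice several seeds are usually tried and the fit with the highest likelihood is kept.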

    4.4 What are the limitations of the study?

    After thoroughly examining the selected articles, several aspects were identified as limitations of topic detection in chat messages for forensic investigation.

    Language of chat data: Current studies focused on English chat data (Dong et al., 2006; Hui et al., 2008; Miah et al., 2011; M. A. Basher and C. M. Fung, 2013), while other languages were Turkish

    approach had been demonstrated to give the best performance for text classification (Hui et al., 2008). Several limitations of the existing studies were listed in Section 4.4.

    Appendix: Unselected studies

    No. | Author | Year | Title | Reason for rejection
    1 | Xiong et al. | 2005 | Web-chat monitor system-research and implementation | Monitoring system
    2
    13 | Chaski | 2007 | The keyboard dilemma and authorship attribution | Authorship analysis
    14 | Kose et al. | 2007 | Mining chat conversations for sex identification | Authorship analysis
    15 | Dickson | 2007 | An examination into Trillian Basic 3.x contact identification | Message attribution
    16 | Dongen | 2007 | Forensic artefacts left by Pidgin Messenger 2.0 | Message attribution
    17 | Dongen | 2007 | Forensic artefacts left by Windows Live Messenger 8.0 | Message attribution
    18 | Okolica et al. | 2007 | Using author topic to detect insider threats from email traffic | Topic detection on email
    19 | Kucukyilmaz et al. | 2008 | Chat mining: predicting user and message attributes in computer-mediated communication | Authorship analysis
    20 | Iqbal et al. | 2008 | A novel approach of mining write-prints for authorship attribution in e-mail forensics | Authorship analysis
    21 | Kose et al. | 2008 | A comparison of textual data mining methods for sex identification in chat conversations | Authorship analysis
    22 | Kiley et al. | 2008 | Forensics analysis of volatile instant messaging | Message attribution
    23 | Marjuni et al. | 2009 | Lexical criminal identification for chatting corpus | Authorship analysis
    24 | Cheng et al. | 2009 | Gender identification from e-mails | Authorship analysis
    25 | Ho et al. | 2009 | Identifying Google Talk packets | Message attribution
    26 | Cheng et al. | 2009 | Forensics tools for social network security solutions | Social network security
    27 | Silva et al. | 2009 | Virtual forensics: social network security solutions | Social network security
    28 | Orebaugh and Allnutt | 2010 | Data mining instant messaging communications to perform author identification for cybercrime investigations | Authorship analysis
    29 | Iqbal et al. | 2010 | Mining writeprints from anonymous e-mails for forensic investigation | Authorship analysis
    30 | Yang et al. | 2010 | Forensic analysis of popular Chinese internet application | Message attribution
    31 | Husain and Sridhar | 2010 | iForensics: forensic analysis of instant messaging on smart phones | Message attribution
    32 | Simon and Slay | 2010 | Recovery of Skype application activity data from physical memory | Message attribution
    33 | Kontostathis et al. | 2010 | Text mining and cybercrime | Crime classification
    34 | Iqbal et al. | 2011 | A unified data mining solution for authorship analysis in anonymous textual communications | Authorship analysis
    35 | Cheng et al. | 2011 | Author gender identification from text | Authorship analysis
    36 | Ali et al. | 2011 | Evaluation of authorship attribution software on a chat bot corpus | Authorship analysis
    37 | Hariharan and Rani.K.R | 2011 | Gender prediction in chat based medium's using text mining | Authorship analysis
    38 | Peersman et al. | 2011 | Predicting age and gender in online social networks | Authorship analysis
    39 | Ding et al. | 2011 | User identification for instant messages | Authorship analysis
    40 | Pateriya et al. | 2011 | Author identification of email forensic in service oriented architecture | Authorship analysis
    41 | Mutawa et al. | 2011 | Forensic artifacts of Facebook's instant messaging service | Message attribution
    42 | Simon and Slay | 2011 | Recovery of pidgin chat communication artefacts from physical memory (a pilot test to determine feasibility) | Message attribution
    43 | Nirkhi et al. | 2012 | Analysis of online messages for identity tracing in cybercrime investigation | Authorship analysis
    44 | Mutawa et al. | 2012 | Forensic analysis of social networking applications on mobile devices | Message attribution
    45 | Levendoski et al. | 2012 | Yahoo Messenger forensics on Windows Vista and Windows 7 | Message attribution
    46 | Al-Zaidy | 2012 | Forensic analysis of social networking applications on mobile devices | Discovering criminal network
    47 | Teng and Lin | 2012 | Skype chat data forgery detection | Data forgery detection