Crime Scene Investigation: SMS Spam Data Analysis Ilona Murynets AT&T Security Research Center New York, NY ilona@att.com Roger Piqueras Jover AT&T Security Research Center New York, NY [email protected] IMC’12, November 14–16, 2012, Boston, Massachusetts, USA.


Crime Scene Investigation: SMS Spam Data Analysis

Ilona Murynets
AT&T Security Research Center
New York, NY
ilona@att.com

Roger Piqueras Jover
AT&T Security Research Center
New York, NY
[email protected]

IMC’12, November 14–16, 2012, Boston, Massachusetts, USA.

Spam is the commonly adopted name for unwanted messages sent in bulk to a large number of recipients.

E-mail spam
• 90% of the daily e-mail sent over the Internet is spam
• Multiple solutions detect and block it
• Only a small amount of spam reaches inboxes

SMS spam
• ?

SMS spam
• Spammers connect aircards and cell phones to a PC
• Yearly growth larger than 500%
• Effective anti-abuse messaging filters injected
• Content-based algorithms (designed for e-mail) work less efficiently

Why?
• Acronyms, pruned spellings, emoticons
• SIM cards are shut down or swapped

SMS spam
• Consumes network resources that would otherwise be available for legitimate services
• The user pays on a per-received-message basis
• Exposes smartphone users to viruses
• Enables fraudulent messaging activities such as phishing, identity theft and fraud

This paper:
• The analysis is used for an SMS spam detection engine

Outline

• Three data sets for analysis
• Data analysis
  – Account information
  – Messaging abuse
    • Response ratio
    • Message timing and time series
  – The scene of the crime
    • Location & targets
    • Mobility
  – Hardware choice
  – Voice and IP traffic

Three data sets: SMS, cell, M2M

• Data from a tier-1 cellular operator
• Call Detail Records (CDR) of 9000 SMS spammers and 17000 legitimate accounts (cell and M2M)
• Mobile Originated (MO): transmitting party
• Mobile Terminated (MT): receiver
• Spammers were identified and disconnected from the network
• SMS: prepaid; cell: postpaid
• M2M: TAC (Type Allocation Code)
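The MO/MT terminology above can be illustrated with a minimal sketch. The `SmsRecord` schema and the `mo_mt_counts` helper are hypothetical; the slides do not describe the actual CDR layout, so only the sender, receiver and timestamp are assumed here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SmsRecord:
    """One simplified SMS Call Detail Record (hypothetical schema)."""
    sender: str       # Mobile Originated (MO) party
    receiver: str     # Mobile Terminated (MT) party
    timestamp: float  # seconds since an arbitrary epoch


def mo_mt_counts(records, account):
    """Count messages the account sent (MO) and received (MT)."""
    counts = Counter()
    for r in records:
        if r.sender == account:
            counts["MO"] += 1
        if r.receiver == account:
            counts["MT"] += 1
    return counts["MO"], counts["MT"]


records = [
    SmsRecord("A", "B", 0.0),
    SmsRecord("B", "A", 30.0),
    SmsRecord("A", "C", 60.0),
]
print(mo_mt_counts(records, "A"))  # (2, 1)
```

Account "A" originated two messages and terminated one, so its MO count is 2 and its MT count is 1.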


notes

• In all the figures throughout the paper, legitimate cellphone users, M2M systems and spammers (SMS) are represented in green, blue and red, respectively.

Account information

• Almost all spammers (99.64%) use pre-paid accounts with unlimited messaging plans
• SIM cards are constantly switched to circumvent detection schemes
• Spammers discard a SIM once its account is canceled and work with a new one
• Average account age is 7 to 11 days (for a legitimate user it is several months to a couple of years)


Messaging Abuse

• Spammers generate a large load of messages
• Spammers not only send but also receive more messages than legitimate customers do
  – Opt-out requests
  – Replies from recipients who were tricked

Actual spam messages often attempt to trick the recipient into replying. Although only a small percentage of users reply, the large number of accounts targeted in a spam campaign results in many responses.

• Legitimate accounts have a small set of recipients (7 on average)
• Spammers hit a couple of thousand victims
• Legitimate users send multiple messages to a small set of destinations
• Spammers send one message to each victim
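As a rough illustration of the contrast in these bullets, the sketch below computes the number of distinct recipients and the mean number of messages per recipient for one sender. The `recipient_profile` function and the sample data are made up for this example and are not taken from the paper.

```python
from collections import Counter


def recipient_profile(destinations):
    """Return (unique recipients, mean messages per recipient) for one sender.

    `destinations` lists the receiver of every message the account sent.
    """
    per_recipient = Counter(destinations)
    unique = len(per_recipient)
    mean_per_recipient = len(destinations) / unique if unique else 0.0
    return unique, mean_per_recipient


legit = ["bob", "bob", "carol", "bob", "carol", "dave"]
spammer = [f"victim{i}" for i in range(2000)]
print(recipient_profile(legit))    # (3, 2.0)
print(recipient_profile(spammer))  # (2000, 1.0)
```

A legitimate-looking account shows few recipients and several messages each; a spammer-looking account shows thousands of recipients and about one message each.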


Response ratio

• For legitimate users, messages are sent sequentially in response to a previous message, so the response ratio is close to 1
• For spammers, the number of MT SMSs is very small relative to the number of transmitted messages, so the response ratio is close to 0
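The slides do not give a formula, but the bullets suggest that the response ratio is the number of received (MT) messages divided by the number of sent (MO) messages. A minimal sketch under that assumption, with a hypothetical `response_ratio` helper and invented counts:

```python
def response_ratio(mt_count, mo_count):
    """MT/MO ratio for one account; None when the account sent nothing."""
    if mo_count == 0:
        return None
    return mt_count / mo_count


print(response_ratio(48, 50))    # conversational account: 0.96
print(response_ratio(40, 2000))  # bulk sender: 0.02
```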


Message timing and time series

• Inter-SMS intervals for spammers are short and less random: low entropy
• Legitimate messages are less frequent and their intervals are more random: higher entropy
• Messaging activities of certain M2M devices are prescheduled
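One way to make "entropy of inter-SMS intervals" concrete is the Shannon entropy of the binned gaps between consecutive messages. The sketch below assumes a 10-second bin width and invented timestamps; the slides do not state how the authors discretized the intervals, so `interval_entropy` and `bin_seconds` are illustrative choices only.

```python
import math
from collections import Counter


def interval_entropy(timestamps, bin_seconds=10):
    """Shannon entropy (bits) of binned gaps between consecutive messages."""
    times = sorted(timestamps)
    gaps = [b - a for a, b in zip(times, times[1:])]
    if not gaps:
        return 0.0
    bins = Counter(int(g // bin_seconds) for g in gaps)
    total = len(gaps)
    return sum((n / total) * math.log2(total / n) for n in bins.values())


machine_like = [0, 5, 10, 15, 20, 25]     # one message every 5 s
human_like = [0, 12, 95, 130, 600, 4000]  # irregular gaps
print(interval_entropy(machine_like))  # 0.0
print(interval_entropy(human_like))
```

A sender that transmits at a fixed pace puts every gap in the same bin and gets zero entropy, while irregular gaps spread across bins and raise it.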


Location & targets

• California
  – Sacramento and Orange County
  – Los Angeles
• New York / New Jersey / Long Island
• Miami Beach
• Illinois, Michigan
• North Carolina and Texas

• Legitimate recipients are concentrated in the sender's local area (i.e. the area around the subscriber’s home or areas where the subscriber works, used to live, or where friends and relatives reside)
• Spam recipients are distributed uniformly over the US population

• Spammers are characterized by messaging a large number of area codes, always more than cell-phone users and M2M systems do

• Low entropy (legitimate cell): the account repeatedly contacts the same area codes
• High entropy (SMS): the account sends messages to a more random set of area codes
• Network-enabled appliances (M2M): they message a predefined set of cell-phones, so the entropy is the lowest
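The ordering described above (M2M lowest, cell-phone users low, spammers high) can be reproduced with a small sketch that computes the Shannon entropy of destination area codes. It assumes 10-digit US numbers whose first three digits are the area code; `area_code_entropy` and the sample numbers are invented for illustration.

```python
import math
from collections import Counter


def area_code_entropy(numbers):
    """Shannon entropy (bits) of the area codes of the destination numbers.

    Assumes 10-digit US numbers given as digit strings, so the area code
    is the first three digits.
    """
    codes = Counter(n[:3] for n in numbers)
    total = sum(codes.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in codes.values())


m2m = ["2125550001"] * 8
cell = ["2125550001"] * 6 + ["9175550002"] * 2
spam = ["2125550001", "3105550002", "3055550003", "9165550004"] * 2
for name, dests in [("M2M", m2m), ("cell", cell), ("spam", spam)]:
    print(name, round(area_code_entropy(dests), 3))
```

With these toy inputs the M2M device scores 0 bits, the cell-phone user under 1 bit, and the spammer 2 bits.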

• Linear relation: SMS spammers
• Both M2M systems and cell-phone users cluster around the bottom-left area of the graph
• M2M systems send up to 20000 messages to a single destination?

• Cell-phone users have a low destinations-to-messages ratio and message a small set of area codes
• A great majority of spammers exhibit the opposite behavior
• Spammers in the bottom-right corner target very specific geographical regions: their ratio is about one destination per message, but the number of targeted area codes is limited
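The two quantities discussed here can be sketched as a pair of per-account features: the destinations-to-messages ratio and the number of distinct area codes. The `targeting_features` helper, the 10-digit number format and the sample accounts are assumptions made for this example.

```python
def targeting_features(numbers):
    """Return (destinations-to-messages ratio, distinct area codes).

    `numbers` holds one destination number per sent message, as 10-digit
    strings whose first three digits are taken as the area code.
    """
    if not numbers:
        return 0.0, 0
    ratio = len(set(numbers)) / len(numbers)
    area_codes = len({n[:3] for n in numbers})
    return ratio, area_codes


cell_user = ["2125550001"] * 5 + ["2125550002"] * 5
wide_spammer = [f"{200 + i}5550000" for i in range(100)]
regional_spammer = [f"305555{i:04d}" for i in range(100)]
print(targeting_features(cell_user))         # (0.2, 1)
print(targeting_features(wide_spammer))      # (1.0, 100)
print(targeting_features(regional_spammer))  # (1.0, 1)
```

The regional spammer matches the last bullet: one destination per message, yet only a single area code.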


mobility


Hardware choice

1. USB Modem/Aircard A1
2. Feature mobile-phone M1
3. Feature mobile-phone M2
4. USB Modem/Aircard A2
5. USB Modem/Aircard A3


Voice call


IP traffic


STOPPING THE CRIME

• An advanced SMS spam detection algorithm is proposed based on an ensemble of decision trees

• Over 40 specific features are extracted from messaging patterns and processed through a combination of decision trees
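The slides do not say which trees are used or how their outputs are combined. As a toy stand-in only, the sketch below trains one depth-1 decision tree (a stump) per feature and combines them by majority vote. The three features, the sample rows and all function names are invented for illustration; this is not the authors' detection engine.

```python
def fit_stump(rows, labels, feature):
    """Fit a depth-1 tree on one feature: pick the threshold and direction
    with the fewest training errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for spam_if_above in (True, False):
            preds = [(row[feature] >= threshold) == spam_if_above for row in rows]
            errors = sum(p != y for p, y in zip(preds, labels))
            if best is None or errors < best[0]:
                best = (errors, threshold, spam_if_above)
    _, threshold, spam_if_above = best
    return feature, threshold, spam_if_above


def fit_ensemble(rows, labels):
    """Train one stump per feature."""
    return [fit_stump(rows, labels, f) for f in range(len(rows[0]))]


def predict(ensemble, row):
    """Majority vote of the stumps; True means 'spammer'."""
    votes = sum((row[f] >= t) == above for f, t, above in ensemble)
    return votes * 2 > len(ensemble)


# Toy rows: (response ratio, distinct area codes, messages per recipient)
rows = [
    (0.95, 2, 6.0), (1.05, 3, 4.5), (0.90, 1, 8.0),       # legitimate
    (0.02, 120, 1.0), (0.05, 80, 1.1), (0.01, 200, 1.0),  # spammers
]
labels = [False, False, False, True, True, True]
ensemble = fit_ensemble(rows, labels)
print(predict(ensemble, (0.03, 150, 1.0)))  # True
print(predict(ensemble, (0.98, 2, 5.0)))    # False
```

A production system would more likely use deeper trees and many more features; the point here is only how several weak tree-based decisions can be combined into one verdict.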

CONCLUSIONS

• Spammers use pre-paid accounts with an average age of 7 to 11 days
• A large number of messages is sent to a wide set of targets (spammers also receive a large amount)
• Five different models of hardware
• A large number of phone calls of very short duration
• Main geographical sources in the US: Sacramento, Los Angeles-Orange County and Miami Beach
• Certain networked appliances have messaging behavior close to that of a spammer


Thank you !