![Page 1: A Statistical Approach to Typosquatting Detection DNS Ops Workshop, 4-5 June 2008 Alessandro Linari alessandro@nominet.org.uk and Oxford Brookes University](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649e615503460f94b5c0f9/html5/thumbnails/1.jpg)
A Statistical Approach to Typosquatting Detection
DNS Ops Workshop, 4-5 June 2008
Alessandro Linari, alessandro@nominet.org.uk
and Oxford Brookes University
Introduction
Typosquatting is the practice of registering a domain name that contains a typographical error with respect to a trademark or a famous domain name
• Growing phenomenon on the Internet
– Well understood from a legal point of view
– Lacks a technical characterisation
• This work: a first attempt at
– A technical definition
– A statistical characterisation
Typosquatting: gooogle.co.uk
Syntactic and Confusing Similarity
• Syntactically similar: gooogle.co.uk, googgle.co.uk, bgoogle.co.uk
• Confusingly similar: google-news.co.uk, google-groups.co.uk, askgoogles.co.uk
• Visually similar: GOOGL3.co.uk, GO0GLE.co.uk
Syntactic Neighbourhood
Given a domain D, the syntactic neighbourhood of D is the set of all domains in the registry whose edit distance from D is equal to 1
(Figure: domain D inside the Registry, surrounded by its neighbourhood Ndist = 1)
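In set notation, writing $\mathcal{R}$ for the registry and $d$ for the edit distance (symbols chosen here for illustration, not taken from the talk):

$$N_{\mathrm{dist}=1}(D) = \{\, X \in \mathcal{R} \;:\; d(D, X) = 1 \,\}$$

Since $d(D, D) = 0$, D itself never belongs to its own neighbourhood.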
• Edit distance
– Minimum number of operations needed to transform one string into the other
– An operation is an insertion, deletion, or substitution of a single character
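With insertions, deletions and substitutions as the only operations, this is the Levenshtein distance. A minimal dynamic-programming sketch in Python follows; the function name `edit_distance` is illustrative, and only two rows of the usual table are kept in memory.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions that turn a into b."""
    # prev[j] = distance between the first i-1 chars of a and the first j of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Labels from the "Syntactic and Confusing Similarity" slide
    for other in ["gooogle", "bgoogle", "google-news", "askgoogles"]:
        print(other, edit_distance("google", other))
```

gooogle and bgoogle are at distance 1 from google and so fall in its neighbourhood, whereas google-news (distance 5) and askgoogles (distance 4) do not: confusingly similar names are not captured by the distance-1 definition.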
Example (D = nominet):
• d=1: ominet, mominet, noominet
• d=2: onminet
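onminet is at distance 2 because swapping two adjacent characters is not one of the three operations, so it costs two substitutions. For a large registry, one alternative to computing the distance to every registered name (not necessarily how this study did it) is to enumerate every string at distance 1 from D and intersect that set with the registered labels. The sketch below assumes an alphabet of lowercase letters, digits and hyphen; the helper names are hypothetical.

```python
import string

# Assumption: labels use lowercase letters, digits and hyphen (LDH).
# Rules such as "no leading or trailing hyphen" are not enforced here.
ALPHABET = string.ascii_lowercase + string.digits + "-"


def distance_one_variants(label: str) -> set[str]:
    """Every string reachable from `label` by exactly one insertion,
    deletion or substitution over ALPHABET."""
    splits = [(label[:i], label[i:]) for i in range(len(label) + 1)]
    deletions = {left + right[1:] for left, right in splits if right}
    substitutions = {
        left + c + right[1:]
        for left, right in splits if right
        for c in ALPHABET if c != right[0]
    }
    insertions = {left + c + right for left, right in splits for c in ALPHABET}
    return deletions | substitutions | insertions


def neighbourhood(label: str, registry: set[str]) -> set[str]:
    """Registered labels at edit distance exactly 1 from `label`."""
    return distance_one_variants(label) & registry


if __name__ == "__main__":
    variants = distance_one_variants("nominet")
    for candidate in ["ominet", "mominet", "noominet", "onminet"]:
        print(candidate, candidate in variants)
```

For a label of n characters there are at most n deletions, 36n substitutions and 37(n + 1) insertions, i.e. at most 74n + 37 candidates, so each neighbourhood costs a few hundred set lookups instead of one distance computation per registered name.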
Outline
• Correlation between the popularity of a domain name and the size of its neighbourhood
• Presence of “typosquatter-friendly” registrars in the neighbourhood of popular domains
(Diagram: a domain ANY-DOMAIN (.co.uk) with its NEIGHBOURHOOD, e.g. anyYdomain, anyA-domain, Aany-domain, …, and the REST-OF-THE-REGISTRY; example domains shown: amazon, bbc, nominet, yahoo, ebay, theregister, aol)
Experimental Setting
• Choose a domain name X
• Compute the distance between X and all domains in the registry
• Compute the size of X’s neighbourhood
• Compute the average size of a neighbourhood for domains of each length
– Length matters: e.g., bbc.co.uk (3 characters) and allianceandleicester.co.uk (20 characters) have different distributions
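The steps above can be sketched in Python under simplifying assumptions: domains are represented by their third-level labels (without .co.uk), and because only distance 1 matters, a linear-time check `is_distance_one` is swapped in for a full edit-distance table. All function names and the toy registry are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def is_distance_one(a: str, b: str) -> bool:
    """True iff the Levenshtein distance between a and b is exactly 1.
    Runs in linear time, without building the full DP table."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]  # b[i] was inserted into a at position i


def neighbourhood_size(x: str, registry: list[str]) -> int:
    """Number of registered labels at distance 1 from x (x itself excluded)."""
    return sum(is_distance_one(x, d) for d in registry)


def average_size_by_length(sample: list[str], registry: list[str]) -> dict[int, float]:
    """Average neighbourhood size of the sampled labels, grouped by length."""
    sizes: dict[int, list[int]] = defaultdict(list)
    for x in sample:
        sizes[len(x)].append(neighbourhood_size(x, registry))
    return {n: mean(v) for n, v in sorted(sizes.items())}


if __name__ == "__main__":
    # Toy registry of third-level labels (hypothetical data)
    registry = ["bbc", "bbb", "bc", "abbc", "cbc", "amazon", "amazn"]
    print(average_size_by_length(["bbc", "cbc", "amazon"], registry))
```

Scanning the whole registry once per sampled label is linear in the registry size, which stays manageable for samples of a few thousand labels.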
Experimental Setting
• Only .co.uk web sites considered (March 2008)
– Length refers to the third-level label
• Set of random domains (expected behaviour)
– 1000 domains for each length (random sample)
• Set of top-1000 popular domains (source NetCraft.com)
– Band A: domains with ranking in [1,100]
– Band B: domains with ranking in [101,500]
– Band C: domains with ranking in [501,1000]
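A sketch of how the two sets could be organised, assuming domain names arrive as full .co.uk strings and popularity as an integer rank (1 = most popular). `third_level_label`, `band` and `sample_by_length` are hypothetical helpers; the band boundaries are those listed above, and the fixed seed only makes the random sample reproducible.

```python
import random
from collections import defaultdict
from typing import Optional

SUFFIX = ".co.uk"


def third_level_label(domain: str) -> str:
    """'bbc.co.uk' -> 'bbc': the label whose length is measured."""
    if not domain.endswith(SUFFIX):
        raise ValueError(f"not a .co.uk domain: {domain!r}")
    return domain[: -len(SUFFIX)]


def band(rank: int) -> Optional[str]:
    """Map a popularity rank to Band A, B or C (None outside the top 1000)."""
    if 1 <= rank <= 100:
        return "A"
    if 101 <= rank <= 500:
        return "B"
    if 501 <= rank <= 1000:
        return "C"
    return None


def sample_by_length(labels: list[str], per_length: int = 1000,
                     seed: int = 0) -> dict[int, list[str]]:
    """Draw up to `per_length` random labels for every label length."""
    by_length: dict[int, list[str]] = defaultdict(list)
    for label in labels:
        by_length[len(label)].append(label)
    rng = random.Random(seed)
    return {n: rng.sample(group, min(per_length, len(group)))
            for n, group in sorted(by_length.items())}


if __name__ == "__main__":
    print(third_level_label("allianceandleicester.co.uk"))
    print([band(r) for r in (1, 101, 1000, 1001)])
```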
Neighbourhood and Popularity
(Figure: Size of the neighbourhood for popular domains. Average number of domains in the neighbourhood vs. number of characters in the third-level domain (3 to 12), for Random, Band A, Band B and Band C; google and amazon are labelled.)
Distribution of Registrars
• Fraction of domain names owned by each registrar
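Computing this distribution amounts to counting domains per registrar and normalising by the total. A sketch, assuming a mapping from each domain to its registrar is available (the mapping and the registrar names R1 to R3 are hypothetical). Applied to the whole registry, or to a subset such as a neighbourhood, it yields the FracDom values used in the percent-increase step.

```python
from collections import Counter


def registrar_fractions(domain_to_registrar: dict[str, str]) -> dict[str, float]:
    """FracDom: share of the given domains held by each registrar."""
    counts = Counter(domain_to_registrar.values())
    total = sum(counts.values())
    return {registrar: n / total for registrar, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical registrars and domains
    sample = {"a.co.uk": "R1", "b.co.uk": "R1", "c.co.uk": "R2", "d.co.uk": "R3"}
    print(registrar_fractions(sample))
```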
Experimental Setting
• Consider only domains in Band A’s neighbourhood
– i.e., any domain at dist=1 from at least one domain in Band A
• Compute the number of domains owned by each registrar in this set (distribution)
• For each registrar, compute the percent increase with respect to the registry-wide distribution
$$I\% = \frac{\mathrm{FracDom}(\mathrm{Band\ A}) - \mathrm{FracDom}(\mathrm{registry})}{\mathrm{FracDom}(\mathrm{registry})} \cdot 100$$
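The formula translates directly into code. A sketch taking the two distributions as dictionaries from registrar to fraction (hypothetical inputs and names). A registrar absent from the Band A neighbourhood gets I% = -100, and an increase of 10000% means a share 101 times the registry-wide one.

```python
def percent_increase(frac_band_a: dict[str, float],
                     frac_registry: dict[str, float]) -> dict[str, float]:
    """I% per registrar: relative change of its share of the Band A
    neighbourhood compared with its share of the whole registry."""
    return {
        registrar: (frac_band_a.get(registrar, 0.0) - f) / f * 100
        for registrar, f in frac_registry.items()
        if f > 0  # skip registrars with no domains (division by zero)
    }


if __name__ == "__main__":
    # Hypothetical shares: R2 holds 25% of the registry but 75% of the
    # Band A neighbourhood, i.e. a 200% increase.
    registry = {"R1": 0.5, "R2": 0.25, "R3": 0.25}
    band_a = {"R1": 0.25, "R2": 0.75}
    print(percent_increase(band_a, registry))
```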
Distribution of Registrars (Band A)
(Figure: Presence of a registrar in the Band A neighbourhood. For each registrar: increase percent (0% to 10000%) and total number of registered domain names (1 to 10000, logarithmic scale).)
Discussion
• Manual analysis of 25 registrars holding between 100 and 1000 domains each
– Big registrars are complex to analyse (not shown in the chart)
– Small registrars are too small to give reliable statistics
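Selecting which registrars to inspect by hand can be expressed as a simple filter. A sketch, assuming the per-registrar I% values and total domain counts are available as dictionaries (hypothetical inputs and names) and that the bounds 100 and 1000 are inclusive; ordering by I% puts the most over-represented registrars first.

```python
def candidates_for_review(increase: dict[str, float],
                          total_domains: dict[str, int],
                          min_size: int = 100,
                          max_size: int = 1000) -> list[str]:
    """Registrars whose total size lies in [min_size, max_size],
    ordered from the highest to the lowest percent increase."""
    eligible = [r for r in increase
                if min_size <= total_domains.get(r, 0) <= max_size]
    return sorted(eligible, key=lambda r: increase[r], reverse=True)


if __name__ == "__main__":
    # Hypothetical values: R2 is too small and R4 too big to be reviewed
    increase = {"R1": 500.0, "R2": 9000.0, "R3": 50.0, "R4": 8000.0}
    totals = {"R1": 150, "R2": 20, "R3": 900, "R4": 5000}
    print(candidates_for_review(increase, totals))
```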
Discussion
• One of the big domain names owns the majority of its neighbourhood
• Interesting activity for 6 5 registrars
– A large fraction of their domains are syntactically or confusingly similar to popular domain names
• Normal activity for 8 registrars (false positives)
• No relevant findings in the other cases
Further Research Directions
• Insight into the typosquatting phenomenon
– Domain name neighbourhood
– First step towards a statistical characterisation
• More questions than answers
– Name servers used by typosquatters
– Domain names containing common words
– Content of the website
– …
Thank you!!!
Questions?
Backup Slides
Length of a domain name
(Figure: Distribution of lengths of domain names. Number of domain names, in thousands, vs. number of characters (0 to 60), with an inset magnifying lengths 40 to 60.)
• .co.uk domains only
• Length always refers to the third-level domain
Length of a domain name
• 3- and 4-character domains are not meaningful
• The neighbourhood of a 5-character domain extends into the 4-character space
(Figure: Fraction of namespace registered. Domains registered vs. domains free, as a percentage of the namespace on a logarithmic scale (0.01% to 100%), for third-level domains of 2 to 7 characters.)
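The fraction registered is the number of registered labels of a given length divided by the number of labels of that length that could exist. A sketch of that denominator, assuming the LDH host-name rules of RFC 1123 (letters, digits and hyphen, no hyphen at either end, case-insensitive) and ignoring any further registry-specific restrictions; the helper names are hypothetical.

```python
def namespace_size(n: int) -> int:
    """Number of possible LDH labels of exactly n characters.

    Assumes 36 symbols (a-z, 0-9) at the first and last position and
    37 (adding '-') in between; other registry rules are ignored."""
    if n < 1:
        raise ValueError("label length must be positive")
    if n == 1:
        return 36
    return 36 * 36 * 37 ** (n - 2)


def fraction_registered(n: int, registered: int) -> float:
    """Share of the n-character namespace that is already registered."""
    return registered / namespace_size(n)


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, namespace_size(n))
```

Under these assumptions the namespace grows by a factor of 37 with every additional character (from 1,296 labels at 2 characters to 47,952 at 3), so the same number of registrations covers a much smaller fraction of the longer lengths.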
Distance between domain names
• ~100 domains for each length compared against the whole dataset
• Average number of domains at a given distance
(Figure: Avg. number of domains at a given distance. Number of domains (x1000) vs. edit distance (0 to 60), for 3-, 5-, 8- and 10-character domains.)
Statistical characterisation
Top-100 (Band A) domain names
• ~10 domains for each length compared against the whole dataset
• Average number of domains at a given distance
(Figure: Top-100 domains partitioned on length. Number of domain names (x1000) vs. edit distance (0 to 60), for 3-, 5-, 8- and 10-character domains.)