Olav ten Bosch, MSIS, Dublin, 14-16 April 2014: On the use of internet robots for official statistics
![Page 1: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/1.jpg)
Olav ten Bosch
MSIS, Dublin, 14-16 April 2014
On the use of internet robots for official statistics
![Page 2: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/2.jpg)
Overview
– Why internet as a data source (IAD)?
– Internet robots, how do they work?
– Applications:
  ‐ Airline tickets
  ‐ Housing market
  ‐ Clothing
  ‐ “Robot assisted data collection”
– Conclusion
![Page 3: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/3.jpg)
Why IAD? (1)
Administrative sources
– Tax, social security services
– Municipalities / provinces
– Supermarkets
Surveys
Internet sources
Less!!!
Faster, better, more efficient
New indicators
![Page 4: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/4.jpg)
![Page 5: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/5.jpg)
Why IAD? (2)
Internet sources
Which content is original, reliable, stable, representative and accessible?
– Internet prices for CPI?
– Real estate sites for housing statistics?
– Internet vacancies for job statistics?
– Social media sentiment for consumer confidence?
– Trade in second-hand goods as economic indicators?
– Travel activity for tourism statistics?
![Page 6: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/6.jpg)
Robots / crawlers / bots / spiders / scrapers: how do they work? (1)
[Diagram: you give commands to a browser; the browser sends requests over the Internet to a website, receives code, images, style, data, etc., and presents them to you as graphical markup.]
![Page 7: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/7.jpg)
Robots / crawlers / bots / spiders / scrapers: how do they work? (2)
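[Diagram: a robot / spider / crawler takes the place of the browser; it navigates the website by sending requests over the Internet, receives code, images, style, data, etc., and delivers data to you.]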
![Page 8: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/8.jpg)
Robots / crawlers / bots / spiders / scrapers: how do they work? (3)
[Diagram: the robot / spider / crawler navigates the website by sending requests over the Internet and receives code, images, style, data, etc.; the robot is monitored actively and produces a series of data sets.]
Generic software for:
- site navigation
- product details
- monitoring
Agile
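The division of work on these slides (site navigation, product details, data output) can be illustrated with a minimal Python sketch. It is not the software used in the presentation: the page content, the CSS class names (`product`, `name`, `price`, `next`) and the function `scrape` are all hypothetical, and the page is an in-memory string where a real robot would issue an HTTP request.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real robot would download this via an HTTP request.
SAMPLE_PAGE = """
<html><body>
  <div class="product"><span class="name">Jeans</span><span class="price">49.95</span></div>
  <div class="product"><span class="name">T-shirt</span><span class="price">12.50</span></div>
  <a class="next" href="/products?page=2">Next</a>
</body></html>
"""


class ProductParser(HTMLParser):
    """Collects (name, price) pairs and the link to the next page."""

    def __init__(self):
        super().__init__()
        self.products = []    # extracted product details
        self.next_url = None  # navigation target, if the page has one
        self._field = None    # which span ("name" or "price") we are inside
        self._current = {}    # fields of the product being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class
        elif tag == "a" and css_class == "next":
            self.next_url = attrs.get("href")

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span" and self._field:
            self._field = None
            # A product is complete once both name and price have been seen.
            if "name" in self._current and "price" in self._current:
                self.products.append(
                    (self._current["name"], float(self._current["price"]))
                )
                self._current = {}


def scrape(page_html):
    """Return the product details on a page and the URL of the next page."""
    parser = ProductParser()
    parser.feed(page_html)
    return parser.products, parser.next_url


products, next_url = scrape(SAMPLE_PAGE)
print(products)
print(next_url)
```

A full robot would repeat `scrape` on `next_url` until no further page is found and would store each run with a timestamp, so that the same items can be followed over time.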
![Page 9: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/9.jpg)
Airline tickets (1)
Robot collection versus manual collection
[Chart: ticket price Amsterdam - Milano; vertical axis 0 to 250, horizontal axis 11 Feb to 10 Aug; series: Robot, Manual.]
![Page 10: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/10.jpg)
Airline tickets (2)
Price of a ticket over time
[Chart: price with respect to average (-80% to 60%) against days before departure (-120 to 0); series: Barcelona, London, Milan, Rome.]
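The vertical axis of this chart expresses each price relative to an average. The slide does not define that average, so the following Python sketch shows one plausible reading: the deviation of each observation from the mean of all observations for the same destination. The function name and the numbers are illustrative assumptions, not data from the presentation.

```python
from statistics import mean


def price_relative_to_average(prices):
    """Return each price as a fractional deviation from the mean price."""
    avg = mean(prices)
    return [(p - avg) / avg for p in prices]


# Illustrative observations only: days before departure -> ticket price.
observations = {-90: 80.0, -60: 90.0, -30: 100.0, 0: 130.0}

deviations = price_relative_to_average(list(observations.values()))
for days, dev in zip(observations, deviations):
    print(f"{days:>4} days: {dev:+.0%}")
```

Normalising by the average in this way puts destinations with different price levels on one scale, which is what allows several routes to be drawn in a single chart.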
![Page 11: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/11.jpg)
Housing market (1)
![Page 12: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/12.jpg)
Housing market (2)
Dynamics of the ‘database behind’ become visible
![Page 13: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/13.jpg)
Clothing (1):
![Page 14: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/14.jpg)
Clothing (2):
2 sites: very volatile data
Seasonal pattern
Challenges:
- from volatile data to stable statistics
- how to classify multiple less structured data sources
![Page 15: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/15.jpg)
Robot-assisted data collection (1)
– Use case: few price observations on many sites
– Example: price of a cinema ticket
– “Robot tool” to automatically check whether prices have changed
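The checking step of such a tool can be sketched in Python as follows. This is an assumption about how a tool of this kind might work, not a description of the actual "robot tool": the site names, page texts, price pattern and function names are all hypothetical, and the page texts are supplied directly where a real tool would download them.

```python
import re

# Assumed price notation: digits, a decimal comma or point, two decimals.
PRICE_PATTERN = re.compile(r"(\d+[.,]\d{2})")


def extract_price(page_text):
    """Return the first price found in the text, or None if there is none."""
    match = PRICE_PATTERN.search(page_text)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))


def check_sites(last_known, current_pages):
    """Return the sites whose price changed or could not be read."""
    to_review = {}
    for site, old_price in last_known.items():
        new_price = extract_price(current_pages.get(site, ""))
        if new_price is None:
            to_review[site] = "price not found"
        elif new_price != old_price:
            to_review[site] = f"changed from {old_price:.2f} to {new_price:.2f}"
    return to_review


# Hypothetical previous observations and freshly downloaded page texts.
last_known = {"cinema-a.example": 9.50, "cinema-b.example": 10.00, "cinema-c.example": 8.75}
current_pages = {
    "cinema-a.example": "Ticket price: 9,50",
    "cinema-b.example": "Ticket price: 11,00",
    "cinema-c.example": "Temporarily closed",
}

review_list = check_sites(last_known, current_pages)
print(review_list)
```

Only the sites in `review_list` need to be visited by a person; unchanged prices are confirmed automatically, which is where the "assisted" part saves work.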
![Page 16: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/16.jpg)
Robot-assisted data collection (2)
![Page 17: Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics](https://reader036.vdocuments.net/reader036/viewer/2022062511/5516a3f5550346a25b8b55a3/html5/thumbnails/17.jpg)
Conclusion
– Using the internet as a data source, we can measure statistical phenomena in a completely different way
– It is powerful to combine fast internet data with reliable (but slower) administrative data
– We should redesign statistics with the possibilities of internet data in mind
Challenges:
– Legal framework
– The internet changes continuously: how to turn volatile data sources into reliable statistics?
– We need advanced statistical methods, processes and IT