Data wrangling
Our data cleaning toolkit
Data wrangling
Sometimes we have to do dirty jobs
Michele Mauri, DensityDesign Research Lab
Data is often messy and needs to be cleaned, or at least converted
My data cleaning toolkit
1. TextWrangler* ** http://www.barebones.com/products/textwrangler/
* (Notepad++ on Windows)
** (actually, any advanced text editor)
useful to:
- remove text formatting
- clean hidden characters
- replace separator characters
- structure data
- apply regexp
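The text-editor steps above can also be scripted. What follows is only a minimal sketch in Python, not something from the original slides: the function name `clean_line`, the `;` and `,` separators, and the sample line are all assumptions made for illustration. It drops invisible format characters, normalizes whitespace, and swaps the separator.

```python
import re
import unicodedata

def clean_line(line: str, old_sep: str = ";", new_sep: str = ",") -> str:
    """Strip hidden characters and swap the separator in one line of text."""
    # Drop invisible format characters (Unicode category "Cf"),
    # such as the byte order mark or the zero-width space.
    line = "".join(ch for ch in line if unicodedata.category(ch) != "Cf")
    # Turn non-breaking spaces into ordinary spaces.
    line = line.replace("\u00a0", " ")
    # Collapse runs of spaces and tabs into a single space (a regexp step).
    line = re.sub(r"[ \t]+", " ", line)
    # Split on the old separator, trim each field, join with the new one.
    fields = [field.strip() for field in line.split(old_sep)]
    return new_sep.join(fields)

# Hypothetical sample line: BOM at the start, a non-breaking space,
# extra spaces, and a zero-width space at the end.
raw = "\ufeffMilan ;\u00a01350000;  Italy\u200b"
print(clean_line(raw))  # prints: Milan,1350000,Italy
```

The plain `split` ignores quoted fields, so a line whose values contain the separator itself would need a proper CSV parser instead.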
2. OpenRefine http://openrefine.org/
useful to:
- convert formats
- reconcile data
- structure data
- enrich (link) data with Freebase
- apply GREL functions
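To give an idea of what grouping near-duplicate values involves, here is a rough Python sketch of a "fingerprint" key, the general idea behind key-collision clustering. It is a simplified illustration and not OpenRefine's own code; the function names and the sample values are assumptions.

```python
import string
from collections import defaultdict

def fingerprint(value: str) -> str:
    """Simplified key: trim, lowercase, drop punctuation, sort unique tokens."""
    value = value.strip().lower()
    value = value.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(value.split()))
    return " ".join(tokens)

def cluster(values):
    """Group values that share a fingerprint; keep groups with 2+ members."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]

# Hypothetical sample values with inconsistent spelling and word order.
names = [
    "Politecnico di Milano",
    "politecnico di milano.",
    "Milano, Politecnico di",
    "DensityDesign",
]
print(cluster(names))
# [['Politecnico di Milano', 'politecnico di milano.', 'Milano, Politecnico di']]
```

Each cluster is a candidate for merging into a single spelling; a person still decides which groups really refer to the same thing.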
3. Data Wrangler http://vis.stanford.edu/wrangler/
useful to:
- reformat data values
- correct erroneous or missing values
- (re)structure dataset
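The three kinds of operation listed above can be sketched in a few lines of Python. This is an illustrative sketch and not the tool's own implementation: the helper names (`to_number`, `fill_down`, `fold`), the column names, and the sample rows are all hypothetical.

```python
def to_number(text):
    """Reformat a value: '1,200' becomes 1200, an empty cell becomes None."""
    text = text.strip().replace(",", "")
    return int(text) if text else None

def fill_down(rows, column):
    """Fill missing values: copy the last non-empty value into empty cells."""
    last = None
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not mutated
        if row[column] in ("", None):
            row[column] = last
        else:
            last = row[column]
        filled.append(row)
    return filled

def fold(rows, id_columns, value_columns):
    """Restructure: turn wide rows into long (ids, key, value) rows."""
    long_rows = []
    for row in rows:
        for col in value_columns:
            out = {c: row[c] for c in id_columns}
            out["key"] = col
            out["value"] = to_number(row[col])
            long_rows.append(out)
    return long_rows

# Hypothetical wide table: one column per year, a blank country cell.
rows = [
    {"country": "Italy", "city": "Milan", "2012": "1,200", "2013": "1,350"},
    {"country": "", "city": "Rome", "2012": "", "2013": "980"},
]
filled = fill_down(rows, "country")
for long_row in fold(filled, ["country", "city"], ["2012", "2013"]):
    print(long_row)
```

The result has one row per country, city, and year, which is usually the shape a visualization tool expects.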
4. Excel http://office.microsoft.com/en-us/excel/
useful to:
- use formulas
- rearrange & filter
- pivot tables
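For readers who have not used one, a pivot table groups rows by two fields and aggregates a third. A minimal Python sketch of the same idea follows; the function name `pivot_sum` and the sample sales rows are assumptions for illustration and have nothing to do with how Excel works internally.

```python
from collections import defaultdict

def pivot_sum(rows, row_key, col_key, value_key):
    """Sum `value_key` for every (row_key, col_key) combination."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    # Convert the nested defaultdicts to plain dicts for readable output.
    return {r: dict(cols) for r, cols in table.items()}

# Hypothetical sample data.
sales = [
    {"region": "North", "year": 2012, "amount": 10.0},
    {"region": "North", "year": 2013, "amount": 12.5},
    {"region": "South", "year": 2012, "amount": 7.0},
    {"region": "North", "year": 2012, "amount": 5.0},
]
print(pivot_sum(sales, "region", "year", "amount"))
# {'North': {2012: 15.0, 2013: 12.5}, 'South': {2012: 7.0}}
```

Filtering first is a one-line list comprehension, for example `[row for row in sales if row["year"] == 2012]`.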
5. Code (Processing, JavaScript…)
useful to:
- do everything