Finding stories in spreadsheets

@PaulBradshaw Leanpub.com/u/paulbradshaw Birmingham City University, City University London Online Journalism Blog, HelpMeInvestigate Saturday, 10 May 14

Posted on 06-May-2015

DESCRIPTION

Presentation at Data Harvest 2014

TRANSCRIPT

Page 1: Finding stories in spreadsheets


Page 2: Show of hands

Who has...
- Calculated a proportion
- Used a function like SUM
- Used pivot tables
- Used a function like VLOOKUP

Page 3: PART ONE: BASICS

Page 6: Speed: keyboard shortcuts for checking the data

- Make a copy and work on that
- Use CTRL+arrow keys to skip to the edges of the data
- Clean the first few rows to create a single heading row
- Remove the grand total row
- Remove empty rows (OpenRefine)

Page 7: Numbers, strings and calculations

       | Column A | Column B   | Column C
-------|----------|------------|-------------
Row 1  | Numbers  | Strings    | Calculations
Row 2  | 10       | John Smith | =10+20+30
Row 3  | 20       | Kate Brown | =A2+A3+A4
Row 4  | 30       | Mike Moore | =SUM(A2:A4)
Row 5  | N/A      | Kim Smith  | =COUNT(A:A)
Row 6  | 50       |            | =COUNTA(B:B)
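
The following minimal Python sketch recomputes column C from the table. It shows why the three sums agree and the two counts differ. It assumes N/A is typed in as text and that B6 is empty. The Python lists only stand in for the sheet.

```python
# Columns A and B from the table; index 0 holds Row 1 (the headings).
col_a = ["Numbers", 10, 20, 30, "N/A", 50]
col_b = ["Strings", "John Smith", "Kate Brown", "Mike Moore", "Kim Smith", None]


def cell(column, row):
    """Return the value at a 1-based spreadsheet row, so cell(col_a, 2) is A2."""
    return column[row - 1]


def is_number(value):
    """COUNT only counts numbers, so text such as "N/A" or a heading is skipped."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


c2 = 10 + 20 + 30                                          # =10+20+30
c3 = cell(col_a, 2) + cell(col_a, 3) + cell(col_a, 4)      # =A2+A3+A4
c4 = sum(cell(col_a, row) for row in range(2, 5))          # =SUM(A2:A4)
c5 = sum(1 for value in col_a if is_number(value))         # =COUNT(A:A)
c6 = sum(1 for value in col_b if value not in (None, ""))  # =COUNTA(B:B)

print(c2, c3, c4, c5, c6)  # 60 60 60 4 5
```

COUNTA counts the heading in B1 as well, which is why it gives 5 while COUNT gives 4.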

Page 8: Two types of datasets: aggregate and granular

Granular data has a row for every payment, person, crime, etc.
Aggregate data has rows for totals: total crimes, total payments, etc.
Granular is always better: you can calculate your own aggregates.

Page 9: Granular: pivot tables

To turn granular data into aggregate data:
- put the focus (the thing you want totals for, e.g. each supplier) in Rows
- put the numbers (money, crimes) in Values
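
Here is a rough Python equivalent of what a pivot table does with granular rows. It groups by the Rows field and sums the Values field. The payments and supplier names are made up for illustration.

```python
from collections import defaultdict

# Made-up granular data: one row per payment.
payments = [
    {"supplier": "Acme Ltd", "amount": 1200.0},
    {"supplier": "Bolt plc", "amount": 300.0},
    {"supplier": "Acme Ltd", "amount": 800.0},
    {"supplier": "Cog & Co", "amount": 450.0},
    {"supplier": "Bolt plc", "amount": 700.0},
]


def pivot_sum(rows, row_field, value_field):
    """Total value_field for each distinct row_field, like a pivot table showing Sum of Values."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[row_field]] += row[value_field]
    # Biggest totals first, as you might sort the finished pivot table.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


summary = pivot_sum(payments, "supplier", "amount")
for supplier, total in summary:
    print(f"{supplier}: £{total:,.2f}")
```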

Page 11: Using functions - and arguments

The parts of a formula such as =SUM(D2:D300):
- = indicates this is a formula
- SUM is the function to be applied
- ( opens the list of ingredients (arguments) for that formula
- D2:D300 is a range (array) of cells
- , separates each ingredient
- ) ends the list of ingredients
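
Here is a small Python sketch that pulls a simple formula apart into its function and its ingredients, then counts the cells in a range. It only handles single, un-nested calls and A1-style ranges such as D2:D300. It does not handle whole-column ranges like D:D, or commas inside quoted text.

```python
import re


def split_formula(formula):
    """Split a simple, un-nested call such as "=SUM(D2:D300)" into (function, arguments)."""
    match = re.fullmatch(r"=([A-Z]+)\((.*)\)", formula.strip())
    if match is None:
        raise ValueError(f"not a simple function call: {formula}")
    name, inner = match.groups()
    # Commas separate the ingredients (this sketch ignores commas inside quoted text).
    arguments = [part.strip() for part in inner.split(",")] if inner else []
    return name, arguments


def column_number(letters):
    """Convert column letters to a number: A -> 1, Z -> 26, AA -> 27."""
    number = 0
    for letter in letters.upper():
        number = number * 26 + (ord(letter) - ord("A") + 1)
    return number


def range_size(reference):
    """Count the cells in a rectangular range such as "D2:D300"."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", reference)
    if match is None:
        raise ValueError(f"not a simple cell range: {reference}")
    first_col, first_row, last_col, last_row = match.groups()
    width = abs(column_number(last_col) - column_number(first_col)) + 1
    height = abs(int(last_row) - int(first_row)) + 1
    return width * height


print(split_formula("=SUM(D2:D300)"))  # ('SUM', ['D2:D300'])
print(range_size("D2:D300"))           # 299
```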

Page 12: More speed: use column ranges

=SUM(D:D) - ignores any text or empty cells
=MAX(D:D)
=MIN(D:D)
=AVERAGE(D:D)

MAX, MIN and AVERAGE skip text and empty cells in the same way, so a heading row doesn't break them.
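
The following Python sketch mimics how these functions treat a whole column. They keep only the numeric cells, then calculate. The column values are made up. Note that AVERAGE divides by the number of numeric cells, not by the number of rows.

```python
def numbers_only(cells):
    """Keep numeric cells only, as SUM, MAX, MIN and AVERAGE do with a range."""
    return [v for v in cells if isinstance(v, (int, float)) and not isinstance(v, bool)]


# Made-up whole column D: a heading, some amounts, a text note and empty cells.
column_d = ["Amount", 250.0, 1200.0, None, "see note", 50.0, None]

values = numbers_only(column_d)
total = sum(values)              # =SUM(D:D)
biggest = max(values)            # =MAX(D:D)
smallest = min(values)           # =MIN(D:D)
average = total / len(values)    # =AVERAGE(D:D): empty cells are not treated as zeros
```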

Page 13: Sense-checking: misleading averages

=AVERAGE(D:D)
=MEDIAN(D:D)
=MODE(D:D) - for 'most common': useful for ordinal ratings, which shouldn't be averaged
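
Here is a quick illustration with Python's statistics module, using made-up numbers. A single huge payment drags the mean (AVERAGE) far above the typical payment, while the median barely moves. For ratings, the mode reports the most common answer.

```python
import statistics

# Made-up payments: one very large value distorts the mean.
payments = [100, 120, 130, 150, 10000]
mean_payment = statistics.mean(payments)      # like =AVERAGE(D:D)
median_payment = statistics.median(payments)  # like =MEDIAN(D:D)

# Made-up ordinal ratings (1 = poor ... 5 = excellent): report the most common value.
ratings = [1, 2, 2, 3, 5, 2, 4]
most_common_rating = statistics.mode(ratings)  # like =MODE(D:D)

print(mean_payment, median_payment, most_common_rating)  # 2100 130 2
```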

Page 14: Combining functions to quickly make numbers meaningful

=MAX(D:D)/SUM(D:D) - how much of the total is accounted for by the biggest value?
=SUM(D35:D64)/SUM(D:D) - what proportion comes from one entity? (here rows 35 to 64, after sorting so that entity's rows sit together)
=SUM(D:D)/365 - how much per day? (for annual data)
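
The same three calculations are sketched below in Python on made-up annual spending rows. One change is deliberate: the "one entity" share filters by supplier name instead of relying on a sorted block of row numbers.

```python
# Made-up annual spending rows: (supplier, amount in £).
rows = [
    ("Acme Ltd", 146000.0),
    ("Bolt plc", 73000.0),
    ("Acme Ltd", 36500.0),
    ("Cog & Co", 109500.0),
]
amounts = [amount for _, amount in rows]
total = sum(amounts)

# =MAX(D:D)/SUM(D:D): share of the total taken by the single biggest payment.
biggest_share = max(amounts) / total
# =SUM(D35:D64)/SUM(D:D): share going to one entity (filtered by name here).
acme_share = sum(amount for supplier, amount in rows if supplier == "Acme Ltd") / total
# =SUM(D:D)/365: average spending per day, for a year's data.
per_day = total / 365

print(f"Biggest single payment: {biggest_share:.0%} of spending")
print(f"Acme Ltd receives {acme_share:.0%} of spending")
print(f"Org spending £{per_day:,.0f} per day")
```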

Page 15: Stories you can report quickly

- Org spending £X per day
- Company receives X% of spending
- Org spent £X on Y

Page 17: Data health warning!

Remember the context: e.g. the data may only cover spending over £500, and figures from different years may need adjusting for inflation.

Page 18: PART TWO: CHECKING

Page 20: COUNT functions: checking data coverage

=COUNT(D:D) - how many cells contain numbers
=COUNTA(D:D) - how many cells are not empty
=COUNTBLANK(D2:D15000) - you have to use a specific range, or the blank cells underneath the table are counted
=COUNTIF(D:D, "Other") - how many cells say "Other"
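
The following Python sketch shows what each COUNT function counts, using a made-up column with gaps. The final line shows why COUNTBLANK needs a specific range. The padding list stands in for the empty rows below the table, which a whole-column range would also include.

```python
def count(cells):
    """=COUNT: cells holding numbers."""
    return sum(1 for v in cells if isinstance(v, (int, float)) and not isinstance(v, bool))


def counta(cells):
    """=COUNTA: cells that are not empty."""
    return sum(1 for v in cells if v not in (None, ""))


def countblank(cells):
    """=COUNTBLANK: empty cells in whatever range you pass in."""
    return sum(1 for v in cells if v in (None, ""))


def countif_equals(cells, text):
    """=COUNTIF(range, "Other"): exact text match, ignoring case."""
    return sum(1 for v in cells if isinstance(v, str) and v.lower() == text.lower())


# Made-up column D, rows 2 to 8: categories with some gaps.
d2_to_d8 = ["Individual", "Company", None, "Other", "other", None, "Company"]

# A whole-column range also sweeps up the empty rows below the table.
whole_column = d2_to_d8 + [None] * 100

print(countblank(d2_to_d8), countblank(whole_column))  # 2 102
```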

Page 21: IF functions: drill down further

=COUNTIF(D:D, "Individual")
=COUNTIFS(D:D, "Individual", B:B, "<10000") - rows that meet both conditions
=SUMIF(D:D, "<10000")
=IF(this is true, then that, otherwise this)
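
Here is a simplified Python sketch of how COUNTIFS and SUMIF apply criteria such as "Individual" or "<10000". It only understands plain text and the <, >, <= and >= comparisons. Excel also supports "<>", "=" and wildcards. The two columns are made-up data.

```python
import operator

# Comparison criteria this sketch understands.
COMPARISONS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt}


def meets(value, criterion):
    """Test one cell against a criterion such as "Individual" or "<10000"."""
    for symbol in ("<=", ">=", "<", ">"):  # two-character operators checked first
        if criterion.startswith(symbol):
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            return is_number and COMPARISONS[symbol](value, float(criterion[len(symbol):]))
    # Anything else is text to match exactly, ignoring case.
    return isinstance(value, str) and value.lower() == criterion.lower()


def countifs(*ranges_and_criteria):
    """=COUNTIFS(range1, criterion1, range2, criterion2, ...): rows meeting every condition."""
    ranges = ranges_and_criteria[0::2]
    criteria = ranges_and_criteria[1::2]
    return sum(
        1 for row in zip(*ranges)
        if all(meets(value, criterion) for value, criterion in zip(row, criteria))
    )


def sumif(cells, criterion):
    """=SUMIF(range, criterion): add up the cells that meet the criterion."""
    return sum(value for value in cells if meets(value, criterion))


# Made-up columns: D holds the type of recipient, B the amount paid.
column_d = ["Individual", "Company", "Individual", "Individual"]
column_b = [5000, 2000, 12000, 9999]

print(countifs(column_d, "Individual", column_b, "<10000"))  # 2
```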

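Page 22: COUNTIF: use wildcards - and spaces

=COUNTIF(D:D, "*hire*")
=COUNTIF(D:D, "Scottish*")
=COUNTIF(D:D, "* hire*")

* stands for any number of characters, so "*hire*" also matches words like "Cheshire". Adding a space, as in "* hire*", only matches "hire" where it follows a space.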
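
To see the effect of the space, here is a Python sketch that translates Excel-style wildcards (* for any run of characters, ? for one character) into a regular expression, matching case-insensitively as COUNTIF does. It ignores Excel's ~ escape character. The cell values are made up.

```python
import re


def wildcard_to_regex(pattern):
    """Translate an Excel-style wildcard pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")          # any run of characters, including none
        elif ch == "?":
            parts.append(".")           # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def countif_wildcard(cells, pattern):
    """=COUNTIF(range, pattern): the whole cell must match the pattern."""
    regex = wildcard_to_regex(pattern)
    return sum(1 for v in cells if isinstance(v, str) and regex.fullmatch(v))


cells = ["Car hire", "Cheshire Police", "Hire of venue", "Scottish Water"]
print(countif_wildcard(cells, "*hire*"))   # 3: includes "Cheshire Police"
print(countif_wildcard(cells, "* hire*"))  # 1: only "Car hire"
```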

Page 24: COUNTIF: test free text data

=COUNTIF(D2, "*adidas*")
=COUNTIF(D3, "*adidas*")
=COUNTIF(D4, "*adidas*")
...
Each formula gives 1 if that cell mentions adidas and 0 if not. Then sort to bring the 1s to the top.
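
The same flag-and-sort approach, sketched in Python with made-up descriptions. A case-insensitive "contains" test stands in for the "*adidas*" wildcard.

```python
# Made-up free-text descriptions, one per row (D2, D3, D4, ...).
descriptions = [
    "Sponsorship - Nike",
    "Kit supplied by Adidas UK",
    "Catering",
    "adidas footballs x 20",
]

# One flag per row: 1 if the text mentions "adidas" in any case, like =COUNTIF(D2, "*adidas*").
flagged = [(1 if "adidas" in text.lower() else 0, text) for text in descriptions]

# Sort the 1s to the top; Python's sort is stable, so matching rows keep their original order.
flagged.sort(key=lambda pair: pair[0], reverse=True)

for flag, text in flagged:
    print(flag, text)
```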

Page 25: THE BLACK CROSS: DOUBLE-CLICK

Double-click the small black cross (the fill handle at the bottom right of the selected cell) to copy a formula down to the end of your data.

Page 27: PART THREE: CLEANING

Page 29: Cleaning text: TRIM, SEARCH, SUBSTITUTE

=TRIM(D2) - removes spaces from the start and end, and squeezes repeated spaces down to one
=SUBSTITUTE(D2, " ", "") - (target cell, what you want to substitute, what you want to replace it with)
=SEARCH("Wales", A2) - gives the position of the first match
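
Here are rough Python equivalents of the three functions. SEARCH counts from 1 and ignores case, and this sketch returns None where Excel would show #VALUE!. The sketch does not support SEARCH's own wildcards, and the example strings are made up.

```python
import re


def trim(text):
    """=TRIM: squeeze runs of spaces to one, then strip spaces from both ends."""
    return re.sub(" +", " ", text).strip(" ")


def substitute(text, old, new):
    """=SUBSTITUTE(target, old, new): replace every occurrence of old with new (case-sensitive)."""
    return text.replace(old, new)


def search(find_text, within_text):
    """=SEARCH: 1-based position of the first match, ignoring case; None if not found."""
    index = within_text.lower().find(find_text.lower())
    return index + 1 if index >= 0 else None


print(trim("  Cardiff   City  "))            # Cardiff City
print(substitute("CF10 1AA", " ", ""))       # CF101AA
print(search("Wales", "North Wales Police")) # 7
```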

Page 30: Cleaning text: UPPER, LOWER, PROPER

If D2 contains "mr SMITH":
=UPPER(D2) = MR SMITH
=LOWER(D2) = mr smith
=PROPER(D2) = Mr Smith
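
Python's built-in string methods behave in a similar way. Like PROPER, str.title() capitalises any letter that follows a non-letter, so a name such as "mcdonald" still needs fixing by hand.

```python
name = "mr SMITH"
upper_name = name.upper()   # like =UPPER(D2)  -> MR SMITH
lower_name = name.lower()   # like =LOWER(D2)  -> mr smith
proper_name = name.title()  # like =PROPER(D2) -> Mr Smith
print(upper_name, lower_name, proper_name)
```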

Page 31: Cleaning text: LEFT, RIGHT, MID

=LEFT(E2,3) = the first 3 characters in E2
=RIGHT(E2,3) = the last 3 characters in E2
=MID(E2,10,3) = the 3 characters in E2 starting from position 10
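
In Python the same extractions are string slices. The one trap is that spreadsheet positions start at 1 while Python's start at 0, so MID has to subtract 1. The reference code is made up.

```python
def left(text, n):
    """=LEFT(text, n): the first n characters."""
    return text[:n]


def right(text, n):
    """=RIGHT(text, n): the last n characters (guarded because text[-0:] returns everything)."""
    return text[-n:] if n > 0 else ""


def mid(text, start, n):
    """=MID(text, start, n): n characters from 1-based position start."""
    return text[start - 1:start - 1 + n]


ref = "REF-2014-00123"
print(left(ref, 3), right(ref, 3), mid(ref, 10, 3))  # REF 123 001
```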

Page 32: Combine with LEN

=LEN(E2) = how many characters are in E2
=LEFT(E2,LEN(E2)-3) = take the length of E2, subtract 3, and grab that many characters from the left, i.e. drop the last 3 characters:
- if E2 is 5 characters, it will grab the first 2 (5-3=2)
- if E2 is 7 characters, it will grab the first 4 (7-3=4)
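
Here is a Python sketch of LEFT(E2,LEN(E2)-3) generalised to "drop the last n characters". One behaviour differs: where the text is shorter than n, Excel returns #VALUE!, but this sketch returns an empty string.

```python
def drop_last(text, n):
    """Like =LEFT(text, LEN(text)-n): keep everything except the last n characters."""
    return text[:max(len(text) - n, 0)]


print(drop_last("12345", 3))      # 12
print(drop_last("1234567", 3))    # 1234
print(drop_last("Leeds LS1", 4))  # Leeds
```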

Page 34: Combine with SEARCH

=SEARCH(" ",E2) = the position of the first space
=LEFT(E2,SEARCH(" ",E2)) = grab all the characters up to (and including) that space
=TRIM(LEFT(E2,SEARCH(" ",E2))) = the same, with the trailing space trimmed off
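
Here is the nested formula as a Python sketch that grabs the first word of a cell. Where there is no space, Excel's SEARCH gives #VALUE!, and this sketch returns None instead. The names are made up.

```python
def first_word(text):
    """Like =TRIM(LEFT(E2, SEARCH(" ", E2))): everything before the first space."""
    position = text.find(" ") + 1      # like SEARCH(" ", E2); 0 here means no space found
    if position == 0:
        return None                    # Excel would show #VALUE!
    return text[:position].strip(" ")  # LEFT keeps the space; TRIM removes it


print(first_word("Smith John"))  # Smith
print(first_word("Madonna"))     # None
```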

Page 35: Finding errors: ISERROR, ISNA, ISBLANK

=ISERROR(D2) = TRUE or FALSE
See also: ISNUMBER, ISTEXT, ISNONTEXT, ISLOGICAL, ISEVEN, ISODD, ISERR (all errors except #N/A)
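
Here is a rough Python sketch of how these checks differ. It makes one important assumption: error values are represented by their Excel codes as text. A real spreadsheet stores errors as a separate kind of value, so in this sketch a cell that literally contains the text "#N/A" would be misread.

```python
# Assumption for this sketch: error values are stored as their Excel codes.
ERRORS = {"#N/A", "#VALUE!", "#REF!", "#DIV/0!", "#NUM!", "#NAME?", "#NULL!"}


def iserror(value):
    """=ISERROR: TRUE for any error."""
    return isinstance(value, str) and value in ERRORS


def isna(value):
    """=ISNA: TRUE only for #N/A."""
    return value == "#N/A"


def iserr(value):
    """=ISERR: TRUE for any error except #N/A."""
    return iserror(value) and not isna(value)


def isblank(value):
    """=ISBLANK: TRUE only for a truly empty cell (None here), not for an empty string."""
    return value is None


print(iserror("#N/A"), iserr("#N/A"), iserr("#DIV/0!"))  # True False True
```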

Page 36: PART FOUR: ADDING

Page 38: Save time typing search URLs
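
The slide's formula isn't in the transcript. A common approach is to join a fixed search address to each cell's text, which in a spreadsheet you might do with the & operator. Here is a Python sketch of that idea. BASE_URL is a hypothetical placeholder, not a real search service, and the company names are made up.

```python
from urllib.parse import quote_plus

# Hypothetical search address: replace with the real one for the site you are searching.
BASE_URL = "https://www.example.com/search?q="


def search_url(term):
    """Join the base URL and the cell's text, encoding spaces and symbols such as &."""
    return BASE_URL + quote_plus(term)


companies = ["Acme Ltd", "Bolt & Co"]
urls = [search_url(name) for name in companies]
for url in urls:
    print(url)
```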

Page 41: Merging data: VLOOKUP

=VLOOKUP(what you're looking for, the range that contains a match and what you want back, which column of that range you want back, nearest match?)
=VLOOKUP(D2,Sheet1!D:E,2,FALSE)

The value you're looking for has to be in the first column of the range, and FALSE asks for an exact match rather than the nearest one.
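
Here is a Python sketch of an exact-match VLOOKUP (the FALSE option). It looks in the first column, returns the requested column of the first matching row, ignores case for text as Excel does, and gives "#N/A" when nothing matches. The company numbers are made up.

```python
def same(a, b):
    """Compare two cells, ignoring case when both are text."""
    if isinstance(a, str) and isinstance(b, str):
        return a.lower() == b.lower()
    return a == b


def vlookup(lookup_value, table, col_index):
    """Exact-match lookup: search the first column, return column col_index (1-based)."""
    for row in table:
        if same(row[0], lookup_value):
            return row[col_index - 1]  # the first match wins
    return "#N/A"                      # what Excel shows when there is no match


# Made-up Sheet1 columns D:E: company name and company number.
sheet1_d_e = [("Acme Ltd", "01234567"), ("Bolt plc", "07654321")]

print(vlookup("Bolt plc", sheet1_d_e, 2))  # 07654321
```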

Page 42: Convert dates to years: TEXT functions

=TEXT(D2, "dddd") = the day of the week, e.g. 'Monday'
=YEAR(D2)
=MONTH(D2) = 1
=TEXT(D2, "mmmm") = 'January'
=TEXT(D2, "mmm") = 'Jan'
If the cell is not formatted as a date, use LEFT.
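
Here are the same conversions as a Python sketch with the standard datetime module, using a made-up date in January. The %A, %B and %b codes give English names under the default locale. The last line assumes text dates are written year first (YYYY-MM-DD), which may not match your data.

```python
from datetime import date

d2 = date(2014, 1, 15)          # made-up example date

day_name = d2.strftime("%A")    # like =TEXT(D2, "dddd") -> Wednesday
year = d2.year                  # like =YEAR(D2)         -> 2014
month = d2.month                # like =MONTH(D2)        -> 1
month_name = d2.strftime("%B")  # like =TEXT(D2, "mmmm") -> January
month_abbr = d2.strftime("%b")  # like =TEXT(D2, "mmm")  -> Jan

# A date stored as text (assumed YYYY-MM-DD): take the first four characters, like =LEFT(D2, 4).
year_from_text = "2014-01-15"[:4]
```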

Page 44: Convert amounts to categories: nested IF functions

=IF(B2>2500,"High","Low")
=IF(B2>2500,"High",IF(B2<1000,"Low","Mid"))

The second IF sits inside the first: anything over 2,500 is High; of the rest, anything under 1,000 is Low, and everything else is Mid.
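
The nested IF reads naturally as an if/elif chain. The Python sketch below is useful for checking the boundaries: exactly 2,500 and exactly 1,000 both land in Mid, because the tests are "greater than" and "less than".

```python
def category(amount):
    """Like =IF(B2>2500,"High",IF(B2<1000,"Low","Mid"))."""
    if amount > 2500:
        return "High"
    elif amount < 1000:
        return "Low"
    return "Mid"


for amount in (3000, 2500, 1000, 999):
    print(amount, category(amount))
```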

Page 45: IF can't use wildcards on its own: combine it with COUNTIF

=IF(COUNTIF(B2, "*dropped*"), "Dropped", "Not dropped")

COUNTIF gives 1 if B2 contains "dropped" and 0 if not, and IF treats 1 as TRUE and 0 as FALSE.
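
The same IF-plus-COUNTIF trick is sketched below in Python. A case-insensitive "contains" test plays the part of the "*dropped*" wildcard, and the example text is made up.

```python
def dropped_label(text):
    """Like =IF(COUNTIF(B2, "*dropped*"), "Dropped", "Not dropped")."""
    matches = 1 if "dropped" in text.lower() else 0  # COUNTIF on one cell gives 1 or 0
    return "Dropped" if matches else "Not dropped"   # IF treats 1 as TRUE and 0 as FALSE


print(dropped_label("Case dropped by CPS"))  # Dropped
print(dropped_label("Convicted"))            # Not dropped
```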

Page 46: Recap

1. Save time.
2. Check your data.
3. Clean your data.
4. Add to your data.
5. Feel clever. But don't be too clever.

Page 47: Thank you

Leanpub.com/u/spreadsheetstories
@paulbradshaw