Andrew Smith Describing childhood diet with cluster analysis Young Statisticians’ meeting. 12th April 2011


Page 1: Andrew Smith

Andrew Smith

Describing childhood diet with cluster analysis

Young Statisticians’ meeting. 12th April 2011

Page 2: Andrew Smith

Describing diet with cluster analysis

• Pauline M. Emmett

• P. Kirstin Newby

• Kate Northstone

• World Cancer Research Fund

• MRC, Wellcome Trust, University of Bristol


Page 3: Andrew Smith

Outline

• Introductions

– ALSPAC

– Food frequency questionnaires

– Dietary patterns

– Cluster analysis

• k-means cluster analysis

• Results

– 3 cluster solution

– Associations with socio-demographic variables

Page 4: Andrew Smith

ALSPAC

• Avon Longitudinal Study of Parents and Children

• Birth cohort study

• 14,541 pregnant women and their children

• www.bris.ac.uk/alspac


Page 5: Andrew Smith

Food frequency questionnaires

Page 6: Andrew Smith

Dietary patterns

• Examine diet as a whole

• Analyse multivariate FFQ data

• Use correlations between foods

• PCA

• Cluster analysis


Image: Paul / FreeDigitalPhotos.net

Page 7: Andrew Smith

Cluster analysis

• Separate subjects into non-overlapping groups

• Based on ‘distances’ between individuals

• Unsupervised learning

Image: Boaz Yiftach / FreeDigitalPhotos.net

Page 8: Andrew Smith

k-means cluster analysis

• Most widely used for dietary patterns

• Number of clusters, k, is specified beforehand

• Minimises

– Distance from each subject to his/her cluster mean

– Summed over all subjects in that cluster

– Summed over all clusters
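The quantity being minimised can be written out directly. The following is a minimal Python sketch, assuming squared Euclidean distance (the usual k-means choice; the slide only says "distance"). The function name and the toy data are illustrative and do not come from the talk.

```python
def within_cluster_ss(points, labels, k):
    """Sum of squared Euclidean distances from each subject to its cluster mean,
    summed over all subjects in a cluster and then over all k clusters."""
    dims = len(points[0])
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        mean = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        total += sum((p[d] - mean[d]) ** 2 for p in members for d in range(dims))
    return total


# Toy data (hypothetical): two tight pairs of subjects, far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = within_cluster_ss(points, [0, 0, 1, 1], k=2)  # pairs kept together
bad = within_cluster_ss(points, [0, 1, 0, 1], k=2)   # pairs split up
print(good, bad)
```

Grouping the nearby subjects together gives a much smaller total than splitting them, which is what the algorithm searches for.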

Page 9: Andrew Smith

k-means cluster analysis

Page 10: Andrew Smith

Problems with the standard algorithm

• Short-sighted

• Tends to find solutions that are at a local minimum

– So run algorithm 100 times and choose solution that is minimum out of all minima
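The restart strategy can be sketched as follows. This assumes a plain Lloyd-style iteration with random starting centres, which may differ from the implementation used for the analysis. All function names, the seed and the toy data are illustrative.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from k randomly chosen starting centres."""
    centres = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centre.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged, possibly only to a local minimum
        labels = new_labels
        # Move each centre to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    cost = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return cost, labels, centres


def kmeans_restarts(points, k, n_starts=100, seed=0):
    """Run k-means n_starts times and keep the lowest-cost solution."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(n_starts)),
               key=lambda run: run[0])


# Toy data (hypothetical): three well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
best_cost, best_labels, _ = kmeans_restarts(points, k=3)
print(best_cost, best_labels)
```

A single run depends on its starting centres, so keeping the best of many runs guards against an unlucky start at little extra cost.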

Page 11: Andrew Smith

Standardising the input variables
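The talk does not say here which standardisation was applied. As one possible illustration, the sketch below uses z-scores (subtract the mean, divide by the standard deviation); that choice, the function name and the food variables are all assumptions.

```python
from statistics import mean, pstdev


def standardise(columns):
    """z-score each variable so that all foods are on a comparable scale.
    Assumes no variable is constant (its standard deviation must be non-zero)."""
    out = {}
    for name, values in columns.items():
        m, s = mean(values), pstdev(values)
        out[name] = [(v - m) / s for v in values]
    return out


# Hypothetical FFQ variables with very different ranges.
ffq = {"crisps_per_week": [0, 7, 14], "fruit_per_week": [0, 1, 2]}
z = standardise(ffq)
print(z)
```

Without some rescaling, foods recorded with a wide numeric range would dominate the distances between children, whatever their dietary importance.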

Page 12: Andrew Smith

Reliability of the cluster solution

• Split sample in half

• Perform separate analyses on each half

• See how many children change clusters

• Repeat 5 times

– 32 out of 8,279 children changed cluster (0.4%)
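Counting the children who change cluster needs some care, because cluster numbers are arbitrary and can differ between two analyses. The sketch below assumes each child's cluster in one solution is compared with their cluster in another, after matching the cluster numbering. The function name and example labels are hypothetical.

```python
from itertools import permutations


def n_changed(labels_a, labels_b, k):
    """Fewest disagreements between two labelings of the same children,
    taken over every possible renumbering of the clusters in labels_b."""
    best = len(labels_a)
    for perm in permutations(range(k)):
        changed = sum(1 for a, b in zip(labels_a, labels_b) if a != perm[b])
        best = min(best, changed)
    return best


# Hypothetical labels for eight children from two separate analyses.
first = [0, 0, 0, 1, 1, 2, 2, 2]
second = [1, 1, 1, 2, 2, 0, 0, 1]  # same groups, renumbered, one child moved
moved = n_changed(first, second, k=3)

# The percentage reported on the slide: 32 of 8,279 children.
pct = round(100 * 32 / 8279, 1)
print(moved, pct)
```

Trying every renumbering is only practical for a small number of clusters, such as the three used here.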

Page 13: Andrew Smith

Processed: 4,177 children

Image: Suat Eman, Rawich, Master Isolated Images / FreeDigitalPhotos.net

Page 14: Andrew Smith

Plant-based: 2,065 children

Image: Suat Eman, Paul, Rob Wiltshire, Simon Howden, winnond / FreeDigitalPhotos.net

Page 15: Andrew Smith

Traditional British: 2,037 children

Image: Suat Eman, Filomena Scalise, Maggie Smith / FreeDigitalPhotos.net

Page 16: Andrew Smith

Associations with socio-demographic vars

                      Processed vs        Plant-based vs        Traditional British
                 n    Plant-based         Traditional British   vs Processed
Girls        3,115    1                   1                     1
Boys         2,941    0.82 (0.72, 0.93)   1.03 (0.89, 1.20)     1.18 (1.04, 1.34)

Page 17: Andrew Smith

Associations with socio-demographic vars

Maternal age

                      Processed vs        Plant-based vs        Traditional British
                 n    Plant-based         Traditional British   vs Processed
< 21           130    1                   1                     1
21-25          994    0.59 (0.33, 1.07)   1.07 (0.56, 2.05)     1.57 (1.02, 2.43)
26-30        2,644    0.52 (0.29, 0.92)   1.20 (0.64, 2.28)     1.60 (1.04, 2.46)
31+          2,288    0.37 (0.21, 0.67)   1.50 (0.79, 2.88)     1.77 (1.13, 2.76)

Page 18: Andrew Smith

Associations with socio-demographic vars

Maternal education

                      Processed vs        Plant-based vs        Traditional British
                 n    Plant-based         Traditional British   vs Processed
CSE            740    1                   1                     1
Vocational     504    0.84 (0.60, 1.17)   1.19 (0.82, 1.72)     1.01 (0.76, 1.32)
O level      2,163    0.65 (0.51, 0.83)   1.46 (1.10, 1.94)     1.05 (0.86, 1.30)
A level      1,604    0.42 (0.33, 0.55)   2.01 (1.50, 2.69)     1.18 (0.95, 1.48)
Degree       1,045    0.30 (0.23, 0.39)   2.75 (2.00, 3.76)     1.22 (0.94, 1.57)

Page 19: Andrew Smith

Associations with socio-demographic vars

Siblings

                      Processed vs        Plant-based vs        Traditional British
                 n    Plant-based         Traditional British   vs Processed
0 older      2,755    1                   1                     1
1 older      2,317    1.21 (1.03, 1.42)   1.12 (0.94, 1.36)     0.73 (0.62, 0.86)
2+ older       984    1.58 (1.28, 1.97)   0.99 (0.76, 1.27)     0.64 (0.52, 0.80)

Page 20: Andrew Smith

Associations with socio-demographic vars

Siblings

                      Processed vs        Plant-based vs        Traditional British
                 n    Plant-based         Traditional British   vs Processed
0 younger    2,946    1                   1                     1
1 younger    2,490    1.01 (0.86, 1.19)   0.58 (0.48, 0.71)     1.69 (1.44, 1.99)
2+ younger     620    1.21 (0.92, 1.57)   0.43 (0.33, 0.58)     1.90 (1.50, 2.40)

Page 21: Andrew Smith

Summary

• Multivariate methods to compress FFQ data into dietary patterns

• k-means cluster analysis is widespread but must be applied carefully

• Processed, Plant-based and Traditional British clusters in 7-year-old children

• Associated with various socio-demographic variables