Powerset Explorer: A Datamining Application
Jordan Lee

Posted on 15-Jan-2016




Background

PAST: Datamining accomplished with human intuition

PRESENT: Computer-aided, with AI and brute-force CPU cycles

FUTURE: Enter PowersetViewer...


Dataset

Alphabet: items that can be found in transactions
– e.g. apples, bread, chips

Transaction: a set of items (unordered)
– e.g. Tx1 = { Apples, Chips }
– e.g. Tx2 = { Bread }

Transaction database: a collection of transactions (unordered, possibly containing repeats)
– e.g. Walmart transaction logs
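These three definitions map directly onto ordinary collection types. A minimal Java sketch, assuming a transaction is a `Set<String>` and the database is a `List` of such sets (so repeated transactions are allowed). The class `TransactionDb` and its `support` helper are hypothetical; counting how many transactions contain an itemset is the usual frequent-itemset notion of "support", not something these slides specify.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration: a transaction is an unordered Set of items and a
// transaction database is a List of transactions, so repeats are allowed.
final class TransactionDb {

    // Support of an itemset: the number of transactions containing all of its items.
    static int support(List<Set<String>> db, Set<String> itemset) {
        int count = 0;
        for (Set<String> tx : db) {
            if (tx.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(
                Set.of("Apples", "Chips"),  // Tx1
                Set.of("Bread"),            // Tx2
                Set.of("Apples", "Chips")); // a repeat of Tx1
        System.out.println(support(db, Set.of("Apples", "Chips"))); // prints 2
        System.out.println(support(db, Set.of("Bread")));           // prints 1
    }
}
```

Using a `Set` per transaction captures "unordered", while the outer `List` keeps duplicates that a set of sets would silently merge.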


Example Dataset

Student enrollment database
– Alphabet = courses
  { CPSC124, CPSC126, PHIL120, ANTH100, ENGL112 }
– Transaction = courses a student is enrolled in
  #29389002 -> { CPSC124, PHIL120, ENGL112 }
– Transaction DB = list of student course schedules
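Because the alphabet is fixed, a transaction can also be stored as a bitmask: bit i is set when the i-th course is present, so every subset of an n-course alphabet is a number from 0 to 2^n - 1. A hypothetical Java sketch for the five-course example (the class `EnrollmentMask` and its methods are illustrative, not taken from the project):

```java
import java.util.List;

// Hypothetical sketch: bit i of a mask is set when ALPHABET.get(i) is in the transaction.
final class EnrollmentMask {

    static final List<String> ALPHABET =
            List.of("CPSC124", "CPSC126", "PHIL120", "ANTH100", "ENGL112");

    // Builds the bitmask for a list of course codes; throws if a code is not in the alphabet.
    static int encode(List<String> courses) {
        int mask = 0;
        for (String course : courses) {
            int i = ALPHABET.indexOf(course);
            if (i < 0) {
                throw new IllegalArgumentException("Unknown course: " + course);
            }
            mask |= 1 << i;
        }
        return mask;
    }

    // An itemset is contained in a transaction when all of its bits are set there.
    static boolean contains(int transaction, int itemset) {
        return (transaction & itemset) == itemset;
    }

    public static void main(String[] args) {
        int student = encode(List.of("CPSC124", "PHIL120", "ENGL112")); // #29389002
        System.out.println(Integer.toBinaryString(student));             // prints 10101
        System.out.println(contains(student, encode(List.of("CPSC124", "ENGL112")))); // prints true
        System.out.println(contains(student, encode(List.of("CPSC126"))));            // prints false
    }
}
```

With this encoding, student #29389002 is the number 21 (binary 10101), and a containment test is a single bitwise AND.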


Example Dataset (cont’d)

72423298 5 676 1701 3046 3900 1327
38578546 7 175 178 1182 1701 3038 680 391
27660625 5 326 676 1701 3038 3908
43359163 3 1177 1699 4317
26495781 6 676 1177 1701 3038 3900 4275
48536452 4 1699 2339 1327 2826
64251972 6 676 1177 1701 3038 3900 2549
23212318 5 676 1701 3040 3813 3900
19820119 5 104 676 1699 3038 3900
65954629 4 480 676 3040 3908
54392012 5 676 1701 3038 3813 3899
85833501 5 676 1699 3040 3813 3900
65136197 5 676 1699 3038 3900 2580
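Each record in this listing appears to be a student ID, the number of courses, and then that many course IDs; in every record the second field matches the count of IDs that follow. Assuming that layout, a small Java reader (the class `EnrollmentReader` is hypothetical) could parse the whitespace-separated stream into transactions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical reader for the assumed layout:
// <student ID> <n> <course ID 1> ... <course ID n>
final class EnrollmentReader {

    // One transaction: the student ID and the course IDs it contains.
    static final class Record {
        final long studentId;
        final List<Integer> courses;

        Record(long studentId, List<Integer> courses) {
            this.studentId = studentId;
            this.courses = courses;
        }
    }

    // Parses records from whitespace-separated text; line breaks are not required.
    // Throws NoSuchElementException if a record has fewer course IDs than its count.
    static List<Record> read(String text) {
        List<Record> db = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            while (in.hasNextLong()) {
                long studentId = in.nextLong();
                int n = in.nextInt();
                List<Integer> courses = new ArrayList<>(n);
                for (int k = 0; k < n; k++) {
                    courses.add(in.nextInt());
                }
                db.add(new Record(studentId, courses));
            }
        }
        return db;
    }

    public static void main(String[] args) {
        String sample = "72423298 5 676 1701 3046 3900 1327\n"
                + "43359163 3 1177 1699 4317\n";
        for (Record r : read(sample)) {
            System.out.println(r.studentId + " -> " + r.courses);
        }
    }
}
```

Because the count field delimits each record, the reader does not depend on line breaks, but it does need every number to be a separate token.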


Why?

Why is this interesting?
– Consumer transaction logs -> trends in consumer buying
– Student enrollment database -> trends in enrollment
  What electives do most undergraduate computer science students take?
  Departments can determine which joint majors would fit the student population.


Why? (cont’d)

Dataset sizes are growing exponentially
– Human intuition has reached its limits
– Computers and AI are required (expensive)
– Information visualization can scale the power of human intuition


Powerset Explorer

Code base from TreeJuxtaposer (Munzner)
– AccordionDrawer package


TreeJuxtaposer


Powerset Explorer


Goals
– Focus + context exploration using grids
– Guaranteed visibility
– Marking of groups


Milestones Status Update

#1 Completion of the basic visualization of a randomized database of small set size (~10)
#2 Addition of a single level of “marking”
#3 Addition of multiple levels of “marking” (6)
#4 Addition of background marking to demarcate areas of sets containing different numbers of items
#5 Implementation of multiple constraints
#6 Increase of the maximum possible dataset size to at least 100
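Milestone #4 separates sets by how many items they contain. One way to obtain such bands, sketched here under the assumption that subsets are indexed by bitmask (the `CardinalityBands` helper is hypothetical, not the project's layout code), is to bucket every subset mask by its bit count:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical layout helper: groups all subsets of an n-item alphabet by
// cardinality so that sets with the same number of items form one contiguous band.
final class CardinalityBands {

    // Returns a list where element k holds the subset bitmasks with exactly k items.
    static List<List<Integer>> bands(int n) {
        if (n < 0 || n > 30) {
            // 1 << 31 overflows a signed int, so int masks cap n at 30 here.
            throw new IllegalArgumentException("n must be in [0, 30] for int masks");
        }
        List<List<Integer>> bands = new ArrayList<>();
        for (int k = 0; k <= n; k++) {
            bands.add(new ArrayList<>());
        }
        for (int mask = 0; mask < (1 << n); mask++) {
            bands.get(Integer.bitCount(mask)).add(mask);
        }
        return bands;
    }

    public static void main(String[] args) {
        List<List<Integer>> b = bands(3);
        for (int k = 0; k < b.size(); k++) {
            System.out.println(k + " items: " + b.get(k));
        }
    }
}
```

For n = 3 the bands are [0], [1, 2, 4], [3, 5, 6] and [7]; in general band k holds C(n, k) sets, so the middle bands dominate the display area.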


Difficulties

Multiple constraints are difficult to implement on the current server-side dataminer

Cannot enumerate the powerset of an alphabet with more than 14 elements (integer = 32 bits)
– Solution: use the Java class BigInteger

High CPU and memory usage
– Solution: upgrade the computer! (a hack)
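The BigInteger workaround can be sketched as follows: a subset index ranges over 0 to 2^n - 1, and `BigInteger.testBit(i)` reports whether item i belongs to that subset, so the index is no longer limited by the width of a 32-bit int. This is an illustrative sketch (the class `BigPowerset` is hypothetical), not the project's code.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Sketch: subset indices held in BigInteger, so 2^n is not capped by int width.
final class BigPowerset {

    // Number of subsets of an n-element alphabet: 2^n.
    static BigInteger size(int n) {
        return BigInteger.ONE.shiftLeft(n);
    }

    // Decodes a subset index into the alphabet items whose bits are set.
    static List<String> decode(BigInteger index, List<String> alphabet) {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < alphabet.size(); i++) {
            if (index.testBit(i)) {
                items.add(alphabet.get(i));
            }
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(size(14));  // prints 16384
        System.out.println(size(100)); // prints 1267650600228229401496703205376
        List<String> alphabet = List.of("CPSC 325", "PHIL 327", "ANTH 329");
        BigInteger total = size(alphabet.size());
        // Enumerate every subset index and list the items it selects.
        for (BigInteger i = BigInteger.ZERO; i.compareTo(total) < 0; i = i.add(BigInteger.ONE)) {
            System.out.println(i + " -> " + decode(i, alphabet));
        }
    }
}
```

BigInteger removes the width limit but not the cost: the number of subsets still doubles with every added item, which is consistent with the high CPU and memory usage noted on this slide.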


Current Status

Reduced database
8680433 3 0 7 5
2768129 2 6 4
6385608 5 1 9 10 9 11
147924 5 5 2 9 5 2
234140 3 11 4 8
4331093 4 4 6 0 0
3158394 5 12 1 12 5 4
5797538 6 11 4 3 13 12 4
6243191 1 5
5872060 4 3 8 9 6


Property file
0 CPSC 325 75.0 3
1 PHIL 327 84.0 1
2 ANTH 329 45.0 2
3 MATH 327 0.0 3
4 PSYC 328 0.0 1
5 ENGL 329 0.0 2
6 APSC 540 0.0 1
7 MECH 541 0.0 1
8 STAT 543 0.0 1
9 SPAN 201 71.0 1
10 FREN 258 76.0 2
11 ECON 260 84.0 1
12 LING 295 42.0 1
13 EECE 302 73.0 1
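The items in the reduced database (0 to 13) appear to be indices into the property file, whose lines begin with an index, a department code and a course number. The meaning of the last two columns is not stated on the slides, so this sketch ignores them. A hypothetical Java decoder under that assumption (the class `PropertyDecoder` is illustrative only):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: assumes reduced-database items index into the property file,
// whose lines start with "<index> <department> <course number>".
// The remaining columns are not interpreted here.
final class PropertyDecoder {

    // Maps each property-file index to a "DEPT NUMBER" label; blank lines are skipped.
    static Map<Integer, String> readProperties(List<String> lines) {
        Map<Integer, String> names = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] f = trimmed.split("\\s+");
            names.put(Integer.parseInt(f[0]), f[1] + " " + f[2]);
        }
        return names;
    }

    // Replaces item indices with course labels; unknown indices become "?<index>".
    static List<String> decode(List<Integer> items, Map<Integer, String> names) {
        List<String> out = new ArrayList<>();
        for (int item : items) {
            out.add(names.getOrDefault(item, "?" + item));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> names = readProperties(List.of(
                "0 CPSC 325 75.0 3", "5 ENGL 329 0.0 2", "7 MECH 541 0.0 1"));
        // Items of record "8680433 3 0 7 5" from the reduced database.
        System.out.println(decode(List.of(0, 7, 5), names)); // prints [CPSC 325, MECH 541, ENGL 329]
    }
}
```

Keeping the numeric indices in the database and the labels in a separate file keeps the transactions compact, and a 14-entry alphabet lines up with the 14-element enumeration limit noted under Difficulties.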


Questions?