The Stream Star Schema (Stephen A. Broeker)
![Page 1: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/1.jpg)
1
The Stream Star Schema
Stephen A. Broeker
10
![Page 2: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/2.jpg)
2
Conclusion
The Stream Star Schema processes data streams in real time, at rates up to gigabits per second.
Stream Star Schema insertion time is O(1).
20
![Page 3: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/3.jpg)
3
Phone calls, road traffic, network traffic, website traffic, power supplies, credit card transactions, sensor arrays, and financial markets
are data rich. But real-time analysis of them is poor.
Large Fast Dynamic Data Streams
30
![Page 4: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/4.jpg)
4
Phone calls, road traffic, network traffic, website traffic, power supplies, credit card transactions, sensor arrays, financial markets
Data rich. But poor in real-time analysis.
Large Fast Dynamic Data Streams
40
![Page 5: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/5.jpg)
5
Phone calls, road traffic, network traffic, website traffic, power supplies, credit card transactions, sensor arrays, financial markets
What are the consequences?
Large Fast Dynamic Data Streams
50
![Page 6: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/6.jpg)
6
Patterns are hard to see.
Problems are therefore difficult to detect.
Large Fast Dynamic Data Streams
60
![Page 7: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/7.jpg)
7
Network monitoring at high speed is difficult:
On a 1 Gbps NIC a bit arrives every nanosecond, so minimum-size packets can arrive roughly every 672 ns
Must use SRAM for per-packet processing
The traditional solution, sampling, is inherently inaccurate because it discards data.
Challenge of Network Monitoring
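The timing budget on this slide can be worked out from standard Ethernet framing. The sketch below is illustrative only: the constant and function names are assumptions, and the frame figures (64-byte minimum frame, 8-byte preamble, 12-byte inter-frame gap) come from standard Ethernet, not from the slides.

```python
# Timing budget on a 1 Gbps link, using standard Ethernet framing figures.
LINK_RATE_BPS = 1_000_000_000   # 1 Gbps
MIN_FRAME_BYTES = 64            # minimum Ethernet frame
OVERHEAD_BYTES = 8 + 12         # preamble/SFD plus inter-frame gap


def bit_time_ns(rate_bps: int) -> float:
    """Time one bit occupies the wire, in nanoseconds."""
    return 1e9 / rate_bps


def min_packet_interval_ns(rate_bps: int) -> float:
    """Shortest spacing between back-to-back minimum-size frames."""
    bits_on_wire = (MIN_FRAME_BYTES + OVERHEAD_BYTES) * 8
    return bits_on_wire * bit_time_ns(rate_bps)


print(bit_time_ns(LINK_RATE_BPS))             # 1.0 ns per bit
print(min_packet_interval_ns(LINK_RATE_BPS))  # 672.0 ns per minimum-size frame
```

Even in the worst case a monitor has well under a microsecond per packet, which is why per-packet state has to live in fast memory.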
70
![Page 8: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/8.jpg)
8
Achieve real-time OLAP for massive data streams.
Achieve cybernetic control for systems that depend on rapid data analysis.
Vision
80
![Page 9: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/9.jpg)
9
Detection
90
![Page 10: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/10.jpg)
10
Forensics
10
![Page 11: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/11.jpg)
11
Data RATES are measured in bits per second.
So, Gigabits (Gb) ≠ Gigabytes (GB).
Data Rates versus Data Storage
Lowercase ‘b’
11
![Page 12: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/12.jpg)
12
Data RATES are measured in bits per second.
Data STORAGE is measured in Bytes.
So, Gigabits (Gb) ≠ Gigabytes (GB).
Data Rates versus Data Storage
Lowercase ‘b’ Uppercase ‘B’
12
![Page 13: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/13.jpg)
13
Ethernet Network Interface Card transferring data at 1 Gbps.
Data accumulates at 450 GB per hour.
That’s about 10.5 TB per day and 73.8 TB per week (taking 1 TB = 1024 GB)!
Data Storage based on Data Rate
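The conversion from a data rate in bits per second to stored bytes can be sketched as follows. This assumes the link is saturated the whole time. It also assumes a mixed unit convention (decimal GB per hour, then 1024 GB per TB), which is what reproduces the 10.5 TB and 73.8 TB figures. The function name is hypothetical.

```python
def bytes_accumulated(rate_bps: float, seconds: float) -> float:
    """Bytes stored when a link runs at rate_bps for the given time."""
    return rate_bps / 8 * seconds   # 8 bits per byte


HOUR = 3600
RATE_BPS = 1e9                      # 1 Gbps, assumed fully utilised

per_hour_gb = bytes_accumulated(RATE_BPS, HOUR) / 1e9   # decimal gigabytes
per_day_tb = per_hour_gb * 24 / 1024                    # 1 TB taken as 1024 GB
per_week_tb = per_day_tb * 7

print(per_hour_gb)            # 450.0
print(round(per_day_tb, 1))   # 10.5
print(round(per_week_tb, 1))  # 73.8
```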
13
![Page 14: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/14.jpg)
14
What if BYTES were pennies?
Picturing Orders of Magnitude
Used with permission: © Copyright 2001 Alan Taylor – The Mega Penney Project - KOKOGIAK MEDIA
$10^{6} \approx 2^{20}$, $10^{9} \approx 2^{30}$, $10^{12} \approx 2^{40}$, $10^{15} \approx 2^{50}$
14
![Page 15: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/15.jpg)
15
What if BYTES were pennies?
Picturing Orders of Magnitude
Used with permission: © Copyright 2001 Alan Taylor – The Mega Penney Project - KOKOGIAK MEDIA
$10^{6} \approx 2^{20}$, $10^{9} \approx 2^{30}$, $10^{12} \approx 2^{40}$, $10^{15} \approx 2^{50}$
15
![Page 16: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/16.jpg)
16
What if BYTES were pennies?
Picturing Orders of Magnitude
Used with permission: © Copyright 2001 Alan Taylor – The Mega Penney Project - KOKOGIAK MEDIA
$10^{6} \approx 2^{20}$, $10^{9} \approx 2^{30}$, $10^{12} \approx 2^{40}$, $10^{15} \approx 2^{50}$
16
![Page 17: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/17.jpg)
17
What if BYTES were pennies?
Picturing Orders of Magnitude
At 1 Gbps, about 0.3 PB accumulate per month (10.5 TB per day × 30 days).
Used with permission: © Copyright 2001 Alan Taylor – The Mega Penney Project - KOKOGIAK MEDIA
$10^{6} \approx 2^{20}$, $10^{9} \approx 2^{30}$, $10^{12} \approx 2^{40}$, $10^{15} \approx 2^{50}$
17
![Page 18: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/18.jpg)
18
What if BYTES were pennies?
Picturing Orders of Magnitude
Used with permission: © Copyright 2001 Alan Taylor – The Mega Penney Project - KOKOGIAK MEDIA
$10^{18} \approx 2^{60}$
17
![Page 19: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/19.jpg)
19
The network stream is segmented into flows, which are inserted into a database.
Observed database input rate for 1 Gb Ethernet NIC: 700,000 flows per hour.
Existing databases can’t keep up!
From Streaming Data to Database
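One common way to segment a packet stream into flows is to group packets by their 5-tuple. The slides do not say which flow definition was used, so the `Packet` type, its fields, and the 5-tuple key below are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical packet summary; field names are illustrative only.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    size: int  # bytes


def segment_into_flows(packets):
    """Group packets by 5-tuple and total packet and byte counts per flow."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p.size
    return dict(flows)


stream = [
    Packet("10.0.0.1", "10.0.0.2", 5000, 80, "tcp", 100),
    Packet("10.0.0.1", "10.0.0.2", 5000, 80, "tcp", 200),
    Packet("10.0.0.3", "10.0.0.2", 6000, 25, "tcp", 50),
]
flows = segment_into_flows(stream)
print(len(flows))  # 2 flows; each would become one database record
```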
18
![Page 20: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/20.jpg)
20
Disk Star Schema
STREAM Star Schema
Consider 2 Database Schemas
19
![Page 21: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/21.jpg)
21
Disk Star Schema
From Fact Table to Dimension Tables
So where’s the star?
Fact table columns: Content, Destination IP, Sender, Recipient, Subject
Dimension tables: Content Table, Sender Table, Subject Table, Recipient Table, Destination IP Table
Here’s the star.
That’s all there is to the “star” concept.
20
![Page 22: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/22.jpg)
22
Value of the Disk Star Schema
Conserve Disk Space
21
![Page 23: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/23.jpg)
23
Dimensions
Each Dimension gets a key.
22
![Page 24: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/24.jpg)
24
Resulting in a Dimension Table
1NF: No Repeating Groups
23
![Page 25: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/25.jpg)
25
Thus deriving a Fact Table.
Substitute Keys for Facts
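The key substitution described on these slides can be sketched with Python's built-in `sqlite3` module. The talk's own tests used MySQL, so this is a stand-in, not the author's setup. The table layout and the helper names `dimension_key` and `insert_fact` are assumptions, and only two of the five dimensions are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sender    (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
    CREATE TABLE recipient (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
    CREATE TABLE fact      (sender_id    INTEGER REFERENCES sender(id),
                            recipient_id INTEGER REFERENCES recipient(id));
""")


def dimension_key(table: str, value: str) -> int:
    """Return the key for value, adding it to the dimension table if new.

    The table name is interpolated directly, so it must be a trusted constant.
    """
    conn.execute(f"INSERT OR IGNORE INTO {table} (value) VALUES (?)", (value,))
    row = conn.execute(f"SELECT id FROM {table} WHERE value = ?", (value,)).fetchone()
    return row[0]


def insert_fact(sender: str, recipient: str) -> None:
    """Store one fact row, substituting dimension keys for the strings."""
    conn.execute(
        "INSERT INTO fact VALUES (?, ?)",
        (dimension_key("sender", sender), dimension_key("recipient", recipient)),
    )


insert_fact("alice@example.com", "bob@example.com")
insert_fact("alice@example.com", "carol@example.com")

sender_rows = conn.execute("SELECT COUNT(*) FROM sender").fetchone()[0]
recipient_rows = conn.execute("SELECT COUNT(*) FROM recipient").fetchone()[0]
fact_rows = conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0]
print(sender_rows, recipient_rows, fact_rows)  # 1 2 2
```

Each string is stored once, which saves space, but every fact insert has to search each dimension table's index first.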
24
![Page 26: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/26.jpg)
26
Disk Star Schema = slow data insertion.
Relational databases are normalized to conserve space, and speed is sacrificed.
So real-time analysis is compromised.
25
Slow
Bottleneck
![Page 27: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/27.jpg)
27
Disk Star Schema
26
![Page 28: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/28.jpg)
28
Disk Star Schema
27
![Page 29: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/29.jpg)
29
Disk Star Schema
28
![Page 30: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/30.jpg)
30
Disk Star Schema
29
![Page 31: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/31.jpg)
31
Dimension table insertion time depends on the table size: it is $O(\log n)$, where $n$ is the number of records in the table.
Disk Star Schema insertion time is the sum of all dimension table insert times, $O\left(\sum_{1 \le i \le l} \log n_i\right)$, where $l$ is the number of attributes in the database and $n_i$ is the number of values for attribute $i$.
Can’t fill dimension tables fast enough!
Bottleneck
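To get a feel for how this bound grows, the sketch below evaluates the sum of logarithms for a few hypothetical dimension table sizes. It is a cost model in abstract "index steps", not a timing measurement. The base-2 logarithm and the function name are assumptions; the choice of base only changes a constant factor.

```python
import math


def disk_star_insert_cost(dimension_sizes):
    """Model the cost of one fact insert as the sum of log2(n_i) over all dimension tables."""
    return sum(math.log2(n) for n in dimension_sizes if n > 1)


# Five dimension tables, as in the email example (sizes are made up).
small = disk_star_insert_cost([1024] * 5)
large = disk_star_insert_cost([2**20] * 5)
print(small)  # 50.0
print(large)  # 100.0
```

The per-insert cost keeps rising as the dimension tables fill, which is the bottleneck the slide points to.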
30
![Page 32: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/32.jpg)
32
1,000,000,000 bits per second Ethernet NIC (1 Gbps)
700,000 observed flows per hour
450 GB per hour, 10.5 TB per day
All we can get is a snapshot analysis!
Short Pause to Review Numbers
31
![Page 33: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/33.jpg)
33
Disk Star Schema
STREAM Star Schema
Consider 2 Database Schemas
32
![Page 34: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/34.jpg)
34
Stream Star Schema
33
![Page 35: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/35.jpg)
35
34
Stream Star Schema
![Page 36: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/36.jpg)
36
Stream Star Schema
35
![Page 37: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/37.jpg)
37
Disk Star Schema
Nearly 1:1 Correspondence between string attributes and Dimension tables.
36
![Page 38: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/38.jpg)
38
Disk Star Schema
Two kinds of tables: fact and dimension.
All string dimensions have dimension tables.
Minimizes disk space.
Dimension tables can be large.
Long insert time = $O\left(\sum_{1 \le i \le l} \log n_i\right)$.
No string duplication.
37
![Page 39: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/39.jpg)
39
Many:1
38
Stream Star Schema
![Page 40: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/40.jpg)
40
Three kinds of tables: fact, dimension, and string.
Few dimension tables.
Dimension tables are small.
Minimizes insertion time.
Insert time is constant.
Allows string duplication.
39
Stream Star Schema
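The slides do not show the Stream Star Schema's physical layout, so the following is only a guess at why allowing string duplication gives constant insert time. Strings are appended to a string table without any lookup, and the fact row stores the resulting row ids. The class names and the single shared string table are assumptions, not the author's design.

```python
class StringTable:
    """Append-only string storage; a repeated string is simply stored again."""

    def __init__(self):
        self.rows = []

    def append(self, value: str) -> int:
        self.rows.append(value)      # amortized O(1), independent of table size
        return len(self.rows) - 1    # row id referenced from the fact table


class StreamStarStore:
    """Hypothetical store: one fact table plus one shared string table."""

    def __init__(self, string_attributes):
        self.string_attributes = list(string_attributes)
        self.strings = StringTable()
        self.facts = []

    def insert(self, record: dict) -> None:
        # No search for an existing copy of each string: just append and link.
        fact = {name: self.strings.append(record[name])
                for name in self.string_attributes}
        self.facts.append(fact)


store = StreamStarStore(["sender", "subject"])
store.insert({"sender": "alice@example.com", "subject": "hello"})
store.insert({"sender": "alice@example.com", "subject": "again"})
print(len(store.strings.rows))  # 4: the repeated sender is stored twice
print(store.facts[1])           # {'sender': 2, 'subject': 3}
```

The trade-off is the reverse of the Disk Star Schema's: more storage is used, but no insert ever waits on an index search.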
![Page 41: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/41.jpg)
41
Side x Side Comparison
Old and slow (Disk Star Schema) versus new and fast (Stream Star Schema)
40
![Page 42: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/42.jpg)
42
Test Results
41
![Page 43: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/43.jpg)
43
Test Results
The magnified area is different because I measured the insert time for (1, 10, 100) streams, as opposed to (1000, 2000, 3000) streams.
42
![Page 44: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/44.jpg)
44
Test Results
The magnified area is different because of how MySQL works. I can only present a hypothesis, since I don’t have the MySQL source code, but I suspect that MySQL is optimized for fewer than 100 streams for this problem.
43
![Page 45: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/45.jpg)
45
Conclusion
44
![Page 46: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/46.jpg)
46
Conclusion
The Stream Star Schema processes data streams in real time, at rates up to gigabits per second.
Stream Star Schema insertion time is O(1).
45
![Page 47: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/47.jpg)
47
Hope
Detection
Forensics
RFID
46
![Page 48: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/48.jpg)
48
There’s data flow
47
![Page 49: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/49.jpg)
49
And then there’s DATA FLOW!
48
![Page 50: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/50.jpg)
50
Disk Star Schema handles 3 million flows per hour, about this much.
49
![Page 51: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/51.jpg)
51
The Stream Star Schema handles 113 million flows per hour!
Disk Star Schema handles 3 million flows per hour, about this much.
50
![Page 52: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/52.jpg)
52
Nearly 40x Faster!
51
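The "nearly 40x" figure follows directly from the two throughput numbers quoted on the preceding slides. The short check below also converts them to flows per second. The variable names are illustrative.

```python
DISK_STAR_FLOWS_PER_HOUR = 3_000_000
STREAM_STAR_FLOWS_PER_HOUR = 113_000_000

speedup = STREAM_STAR_FLOWS_PER_HOUR / DISK_STAR_FLOWS_PER_HOUR
disk_per_second = DISK_STAR_FLOWS_PER_HOUR / 3600
stream_per_second = STREAM_STAR_FLOWS_PER_HOUR / 3600

print(round(speedup, 1))         # 37.7
print(round(disk_per_second))    # 833
print(round(stream_per_second))  # 31389
```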
![Page 53: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/53.jpg)
53
For The Future
Implement the Stream Star Schema in the Cloud.
Use multiple Stream Star Schema compute nodes to handle an unbounded stream. Storage could be handled similarly to S3.
52
![Page 54: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/54.jpg)
54
For The Future
The Stream Star Schema fully supports the analysis of high-speed data streams, thus enabling security applications and forensic processing.
53
![Page 55: 1 The Stream Star Schema Stephen A. Broeker 1010](https://reader035.vdocuments.net/reader035/viewer/2022062318/5519b5a255034660578b4799/html5/thumbnails/55.jpg)
55 END