Pattern Mining: Extracting Value from Log Data
![Page 1: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/1.jpg)
Pattern Mining: Getting the most out of your log data.
Krishna Sridhar
Staff Data Scientist, Dato Inc.
krishna_srd
![Page 2: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/2.jpg)
• Background
  - Machine Learning (ML) research.
  - Ph.D. in Numerical Optimization @Wisconsin.
• Now
  - Build ML tools for data scientists & developers @Dato.
  - Help deploy ML algorithms.
@krishna_srd, @DatoInc
About Me!
![Page 3: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/3.jpg)
45+ and growing fast!
About Us!
![Page 4: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/4.jpg)
Questions?
• (Now) We are monitoring the chat window.
• (Later) Email me [email protected].
Webinars
![Page 5: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/5.jpg)
About you?
![Page 6: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/6.jpg)
Creating a model pipeline
Figure: Data Science Workflow, from unstructured data through Ingest → Transform → Model → Deploy (labels: exploration, data, modeling).
![Page 7: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/7.jpg)
Figure: Dato's products (GraphLab Create, Dato Predictive Services, Dato Distributed) across the model lifecycle: Prototype & Develop Model Pipelines, Train Model Pipeline, Deploy Models, Serve Requests (REST API), Monitor Services, Get Live Feedback, Update Pipelines, Update Live Experiment, Deploy New Pipeline.
We can help!
![Page 8: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/8.jpg)
Log Journey
Lots of data → Insights → Profits
![Page 9: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/9.jpg)
Log Mining: Pattern Mining
![Page 10: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/10.jpg)
Logs are everywhere!
![Page 11: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/11.jpg)
Machine Learning in Logs
Source: Mining Your Logs - Gaining Insight Through Visualization
![Page 12: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/12.jpg)
Coffee shop
Coffee Shop Menu
![Page 13: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/13.jpg)
Receipts
Coffee Shop Menu
![Page 14: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/14.jpg)
Coffee Store Logs
![Page 15: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/15.jpg)
Frequent Pattern Mining
What sets of items were bought together?
![Page 16: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/16.jpg)
Real Applications
![Page 17: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/17.jpg)
Real Applications
![Page 18: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/18.jpg)
Real Applications
![Page 19: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/19.jpg)
Log Mining: Rule Mining
![Page 20: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/20.jpg)
Can we recommend items?
Rule Mining
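Rule mining turns frequent patterns into recommendations of the form "receipts with X also contain y". The deck does not say how rules are scored, so this minimal sketch assumes the common confidence measure, support(X ∪ {y}) / support(X). The function names and the six toy receipts over items A to D are illustrative, and the frequent itemsets are supplied by hand rather than mined.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def mine_rules(frequent_itemsets, transactions, min_confidence):
    """Rules X -> y (single-item consequent) whose confidence
    support(X | {y}) / support(X) is at least min_confidence."""
    rules = []
    for itemset in frequent_itemsets:
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and a consequent
        joint = support(itemset, transactions)
        for y in sorted(itemset):
            antecedent = itemset - {y}
            confidence = joint / support(antecedent, transactions)
            if confidence >= min_confidence:
                rules.append((antecedent, y, confidence))
    return rules

# Six toy receipts over items A-D, and two frequent itemsets mined from them.
TRANSACTIONS = [frozenset(t) for t in ("BCD", "ACD", "ABCD", "AD", "BCD", "BCD")]
rules = mine_rules([frozenset("CD"), frozenset("BCD")], TRANSACTIONS, min_confidence=0.9)
for antecedent, y, confidence in rules:
    print(sorted(antecedent), "->", y, round(confidence, 2))
```

Here D → C is dropped (confidence 5/6 ≈ 0.83) while C → D is kept (5/5), which is why rules are directional even though the underlying pattern {C, D} is not.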
![Page 21: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/21.jpg)
Real Applications
![Page 22: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/22.jpg)
Log Mining: Feature Extraction
![Page 23: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/23.jpg)
Feature Extraction
Receipt space → Features in menu space → ML
Example binary features: 0 1 0 0 0 0 1 1 0 1 1 0 0 1 0 0 0 0 0 0 1 1 1 0
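A minimal sketch of the receipt-to-feature mapping, assuming one binary feature per pattern (1 if the receipt contains every item of the pattern, else 0). With single-item patterns this is the menu-space encoding; adding mined multi-item patterns gives features in pattern space. The menu items and `to_features` are hypothetical.

```python
def to_features(receipt, patterns):
    """1 if the receipt contains every item of the pattern, else 0."""
    return [1 if pattern <= receipt else 0 for pattern in patterns]

menu = ["latte", "muffin", "espresso", "bagel"]  # hypothetical menu
# One feature per menu item, plus one mined pattern.
patterns = [frozenset([item]) for item in menu] + [frozenset(["latte", "muffin"])]

receipt = frozenset(["latte", "muffin"])
print(to_features(receipt, patterns))  # [1, 1, 0, 0, 1]
```

The resulting fixed-length vectors can be fed to any standard classifier or clustering method.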
![Page 24: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/24.jpg)
3 Useful Data Mining Tasks
• Rule Mining
• Pattern Mining
• Feature Extraction
![Page 25: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/25.jpg)
Demo
![Page 26: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/26.jpg)
Transparency: ML is not a black box.
Interpretability: Learning is also about understanding.
Diagnosis: Whatever can go wrong, will go wrong.
Moving on
![Page 27: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/27.jpg)
Pattern Mining Explained
![Page 28: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/28.jpg)
Formulating Pattern Mining
N distinct items → 2^N itemsets
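To make the blow-up concrete, this sketch enumerates every itemset over N items with `itertools.combinations`. The count is always 2^N, so N = 4 gives 16 and N = 100 already gives about 1.3 × 10^30. The helper name is illustrative.

```python
from itertools import combinations

def all_itemsets(items):
    """Every subset of `items`, including the empty set: 2**len(items) of them."""
    items = sorted(items)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

itemsets = all_itemsets("ABCD")
print(len(itemsets))  # 16 == 2**4
```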
![Page 29: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/29.jpg)
Formulating Pattern Mining
Find the top K most frequent sets of length at least L that occur at least M times.
![Page 30: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/30.jpg)
Formulating Pattern Mining
- K: max_patterns
- L: min_length
- M: min_support
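A brute-force reference for this problem statement helps pin down its semantics before looking at faster algorithms. This sketch takes the three parameters by the names on the slide (K, L and M in that order); it is an illustration, not GraphLab Create's implementation, and it only works for a handful of items because it scans all 2^N itemsets.

```python
from itertools import combinations

def brute_force_top_k(transactions, max_patterns, min_length, min_support):
    """Exponential reference solution: the max_patterns most frequent itemsets
    with at least min_length items and support >= min_support."""
    items = sorted(set().union(*transactions))
    results = []
    for k in range(min_length, len(items) + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            count = sum(1 for t in transactions if pattern <= t)
            if count >= min_support:
                results.append((pattern, count))
    # Most frequent first; ties broken by the sorted item list for determinism.
    results.sort(key=lambda pc: (-pc[1], sorted(pc[0])))
    return results[:max_patterns]

TRANSACTIONS = [frozenset(t) for t in ("BCD", "ACD", "ABCD", "AD", "BCD", "BCD")]
for pattern, count in brute_force_top_k(TRANSACTIONS, max_patterns=3, min_length=2, min_support=4):
    print(sorted(pattern), count)
```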
![Page 31: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/31.jpg)
Pattern Mining
N distinct items → 2^N itemsets
![Page 32: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/32.jpg)
Pattern Mining: Principles
![Page 33: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/33.jpg)
Principle 1: What is frequent?
A pattern is frequent if it occurs at least M times.
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
M = 4
{C, D}: 5 → frequent
{A, D}: 3 → not frequent
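The frequency test itself is just a count over the transactions. A minimal sketch on the six example transactions (helper names are illustrative):

```python
TRANSACTIONS = [frozenset(t) for t in ("BCD", "ACD", "ABCD", "AD", "BCD", "BCD")]

def support(pattern, transactions):
    """Number of transactions containing every item of the pattern."""
    return sum(1 for t in transactions if frozenset(pattern) <= t)

def is_frequent(pattern, transactions, min_support):
    """Principle 1: a pattern is frequent if it occurs at least min_support (M) times."""
    return support(pattern, transactions) >= min_support

print(support("CD", TRANSACTIONS), is_frequent("CD", TRANSACTIONS, 4))  # 5 True
print(support("AD", TRANSACTIONS), is_frequent("AD", TRANSACTIONS, 4))  # 3 False
```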
![Page 34: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/34.jpg)
Principle 1: What is frequent?
The threshold M is the min_support parameter.
![Page 35: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/35.jpg)
Principle 2: Apriori principle
A pattern is frequent only if all of its subsets are frequent.
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
M = 4
{B, C, D}: 4 is frequent, therefore its subset {C, D} (count 5) is also frequent.
{A}: 3 is not frequent, therefore its superset {A, D} (count 3) is not frequent.
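In code, the Apriori principle becomes a pruning test that needs no pass over the data: a candidate k-itemset can be discarded as soon as one of its (k-1)-subsets is known to be infrequent. A sketch with an illustrative function name:

```python
from itertools import combinations

def survives_apriori(candidate, frequent_smaller):
    """Keep a k-itemset only if every one of its (k-1)-subsets is frequent."""
    k = len(candidate)
    return all(frozenset(sub) in frequent_smaller for sub in combinations(candidate, k - 1))

# With M = 4, {A} (count 3) is infrequent, so no superset of {A} needs counting.
frequent_singletons = {frozenset("B"), frozenset("C"), frozenset("D")}
print(survives_apriori(frozenset("AD"), frequent_singletons))  # False: pruned without a data pass
print(survives_apriori(frozenset("CD"), frequent_singletons))  # True: still has to be counted
```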
![Page 36: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/36.jpg)
Two Main Algorithms
• Candidate Generation
  - Apriori
  - Eclat
• Pattern Growth
  - FP-Growth
  - TopK FP-Growth [GLC 1.6]
![Page 37: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/37.jpg)
Lots of Generalizations
Source: http://www.philippe-fournier-viger.com/spmf/
![Page 38: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/38.jpg)
Candidate Generation
Two phases:
1. Candidate generation.
2. Candidate filtering.
Exploit Apriori Principle!
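The two phases can be sketched as a level-wise loop in the style of Apriori: join frequent (k-1)-itemsets into k-item candidates, prune them with the Apriori principle, then make one pass over the data to count the survivors. This is a simplified illustration (a plain set-based join, no hash trees), not an optimized implementation; with M = 4 on the six example transactions it should reproduce the counts in the walkthrough that follows.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise candidate generation and filtering; returns {itemset: support}."""
    def count(candidates):
        # One pass over the data per level.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = set().union(*transactions)
    level = {c: n for c, n in count(frozenset([i]) for i in items).items() if n >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # 1. Candidate generation: join frequent (k-1)-itemsets into k-itemsets ...
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # ... and prune any candidate with an infrequent (k-1)-subset (Apriori principle).
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # 2. Candidate filtering: count the survivors and keep the frequent ones.
        level = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(level)
        k += 1
    return frequent

TRANSACTIONS = [frozenset(t) for t in ("BCD", "ACD", "ABCD", "AD", "BCD", "BCD")]
for itemset, n in sorted(apriori(TRANSACTIONS, min_support=4).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```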
![Page 39: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/39.jpg)
Candidate Generation
{ } : 6
{A} : ? {B} : ? {C} : ? {D} : ?
{AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ?
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
![Page 40: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/40.jpg)
![Page 41: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/41.jpg)
Candidate Generation
{ } : 6
{A} : 3 {B} : 4 {C} : 5 {D} : 6
{AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ?
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
![Page 42: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/42.jpg)
![Page 43: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/43.jpg)
![Page 44: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/44.jpg)
Candidate Generation
{ } : 6
{A} : 3 {B} : 4 {C} : 5 {D} : 6
{AB} : ? {AC} : ? {AD} : ? {BC} : 4 {BD} : 4 {CD} : 5
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
![Page 45: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/45.jpg)
![Page 46: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/46.jpg)
Pattern Growth
Two phases:
1. Candidate filtering.
2. Conditional database construction.
Avoid full scans over the data & large candidate sets!
![Page 47: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/47.jpg)
Pattern Growth - Depth First
{B, C, D}
{A, C, D}
{B, D}
{A, C, D}
{B, C, D}
{A, B, D}
{ } : 6
{A} : 3 {B} : 4 {C} : 4 {D} : 6
{AB} : 1 {AC} : 2 {AD} : 3 {BC} : 2 {BD} : 4 {CD} : 4
{ABC} : 0 {ABD} : 1 {ACD} : 2 {BCD} : 2
![Page 48: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/48.jpg)
Pattern Growth - Preprocessing
{B, C, D}
{A, C, D}
{B, D}
{A, C, D}
{B, C, D}
{A, B, D}
{ } : 6
{A} : 3 {B} : 4 {C} : 4 {D} : 6
![Page 49: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/49.jpg)
Pattern Growth - Depth First
{B, C, D}
{A, C, D}
{B, D}
{A, C, D}
{B, C, D}
{A, B, D}
{ } : 6
{A} : ? {B} : ? {C} : ? {D} : ?
{AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ?
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
![Page 50: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/50.jpg)
Pattern Growth - Depth First
{B, C, D}
{A, C, D}
{B, D}
{A, C, D}
{B, C, D}
{A, B, D}
{ } : 6
{A} : 3 {B} : 4 {C} : 4 {D} : 6
{AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ?
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
![Page 51: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/51.jpg)
![Page 52: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/52.jpg)
Pattern Growth - Depth First
{B, C, D}
{A, C, D}
{B, D}
{A, C, D}
{B, C, D}
{A, B, D}
{ } : 6
{A} : 3 {B} : 4 {C} : 4 {D} : 6
{AB} : X {AC} : ? {AD} : ? {BC} : 2 {BD} : 4 {CD} : ?
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
![Page 53: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/53.jpg)
Pattern Growth
{B} : 4
{ } : 6
Call: Growth(db = DB{}, item = B, freq = {B,C,D})
DB{}
{B, C, D}
{A, C, D}
{B, D}
{A, C, D}
{B, C, D}
{A, B, D}
![Page 54: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/54.jpg)
Pattern Growth
{B} : 4
{ } : 6
Conditional Database Construction: DB{} → DB{B}
DB{}:
{B, C, D}
{A, C, D}
{B, D}
{A, C, D}
{B, C, D}
{A, B, D}
DB{B} (items after B in each transaction that contains B):
{C, D}
{D}
{C, D}
{D}
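Building a conditional database is a filter-and-truncate step. This sketch assumes items are processed in alphabetical order, as on the slides, and stores DB{} as plain lists rather than an FP-tree; `conditional_db` is an illustrative name.

```python
ORDER = "ABCD"  # assumption: items are processed in alphabetical order, as on the slides

def conditional_db(db, item):
    """DB{item}: from each transaction that contains `item`, keep the items after it."""
    pos = ORDER.index(item)
    return [[x for x in t if ORDER.index(x) > pos] for t in db if item in t]

DB = [["B", "C", "D"], ["A", "C", "D"], ["B", "D"], ["A", "C", "D"], ["B", "C", "D"], ["A", "B", "D"]]
print(conditional_db(DB, "B"))  # [['C', 'D'], ['D'], ['C', 'D'], ['D']]
```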
![Page 55: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/55.jpg)
Pattern Growth
{B} : 4
{ } : 6
Candidate Filtering on DB{B}
{C, D}
{D}
{C, D}
{D}
{D} : 4, {C} : 2
Add {BD} as frequent; {BC} (count 2) is not.
![Page 56: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/56.jpg)
Pattern Growth - Depth First
DB{B}:
{C, D}
{D}
{C, D}
{D}
{ } : 6
{A} : 3 {B} : 4 {C} : 4 {D} : 6
{AB} : X {AC} : ? {AD} : ? {BC} : 2 {BD} : 4 {CD} : ?
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
![Page 57: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/57.jpg)
Pattern Growth
Recurse: Growth(db = DB{B}, item = D, freq = {D})
DB{B}:
{C, D}
{D}
{C, D}
{D}
{ } : 6, {B} : 4, {BD} : 4
DB{BD}: no items follow D, so this branch of the recursion ends.
![Page 58: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/58.jpg)
Pattern Growth - Depth First
{ } : 6
{A} : 3 {B} : 4 {C} : 4 {D} : 6
{AB} : X {AC} : ? {AD} : ? {BC} : 2 {BD} : 4 {CD} : ?
{ABC} : ? {ABD} : X {ACD} : ? {BCD} : X
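Candidate filtering and conditional database construction combine into a recursive, depth-first search. This is a simplified pattern-growth sketch that keeps conditional databases as plain lists; FP-Growth performs the same recursion over a compressed FP-tree instead. With M = 4 on the six transactions it finds {BD} and {CD}, matching the slides, and never explores the branches under {A} or {BC}.

```python
from collections import Counter

def growth(db, prefix, min_support, out):
    """Depth-first pattern growth: filter frequent items in the current
    (conditional) database, record prefix + item, then recurse on DB{prefix + item}."""
    counts = Counter(x for t in db for x in t)
    for item in sorted(counts):
        if counts[item] < min_support:
            continue  # candidate filtering: no superset can be frequent either
        pattern = prefix + (item,)
        out[pattern] = counts[item]
        # Conditional database: items after `item` in the transactions that contain it.
        conditional = [[x for x in t if x > item] for t in db if item in t]
        growth(conditional, pattern, min_support, out)
    return out

DB = [["B", "C", "D"], ["A", "C", "D"], ["B", "D"], ["A", "C", "D"], ["B", "C", "D"], ["A", "B", "D"]]
patterns = growth(DB, (), min_support=4, out={})
print(patterns)  # {('B',): 4, ('B', 'D'): 4, ('C',): 4, ('C', 'D'): 4, ('D',): 6}
```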
![Page 59: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/59.jpg)
Compare & Contrast
• Candidate Generation
  + Better than brute force
  + Filters candidate sets
  - Multiple passes over the data
• Pattern Growth
  + Fewer passes over the data
  + Space efficient
![Page 60: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/60.jpg)
Compare & Contrast
Better choice: Pattern Growth.
![Page 61: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/61.jpg)
FP-Tree Compression
Figures from Florian Verhein’s slides on FP-Growth
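The FP-tree figures are not reproduced in this transcript. As a rough sketch of the compression idea: drop infrequent items, sort each transaction by descending global frequency, and insert it into a prefix tree so that transactions with a common prefix share nodes. A real FP-tree also keeps a header table linking all nodes of the same item; that part is omitted here, and the class and function names are illustrative.

```python
from collections import Counter

class FPNode:
    """One FP-tree node: an item, how many transactions pass through it, and its children."""
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    freq = Counter(item for t in transactions for item in t)
    root = FPNode(None, None)
    for t in transactions:
        # Keep frequent items only, most frequent first (ties broken alphabetically).
        items = sorted((i for i in t if freq[i] >= min_support), key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
            child.count += 1
            node = child
    return root

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())

DB = [["B", "C", "D"], ["A", "C", "D"], ["B", "D"], ["A", "C", "D"], ["B", "C", "D"], ["A", "B", "D"]]
tree = build_fp_tree(DB, min_support=4)
print(count_nodes(tree) - 1)  # 4: the 14 frequent-item occurrences share 4 tree nodes
```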
![Page 62: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/62.jpg)
FP-Growth Algorithm
Figures from Florian Verhein’s slides on FP-Growth
Two phases:
1. Candidate filtering.
2. Conditional database construction.
![Page 63: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/63.jpg)
TopK FP-Growth Algorithm
Similar to FP-Growth, except:
1. min_support is raised dynamically as top-K patterns are found.
2. Good initial estimates of min_support greatly help.
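The deck does not describe TopK FP-Growth's internals beyond these two points. The sketch below shows one way to raise min_support dynamically on top of a simple list-based pattern-growth recursion (not an FP-tree): keep the best K patterns in a min-heap and, once the heap is full, raise the threshold to the K-th best support. Any branch below that threshold cannot reach the top K, because supersets are never more frequent. Names are hypothetical, and ties at the K-th support are broken arbitrarily.

```python
import heapq
from collections import Counter

def top_k_growth(db, k, min_length, min_support=1):
    """Pattern growth that keeps the k most frequent patterns with at least
    min_length items, raising the effective min_support as results come in."""
    if k < 1:
        return []
    heap = []                # min-heap of (support, pattern); heap[0] is the k-th best so far
    threshold = min_support  # an initial estimate; higher values prune earlier

    def grow(cdb, prefix):
        nonlocal threshold
        counts = Counter(x for t in cdb for x in t)
        for item in sorted(counts):
            if counts[item] < threshold:
                continue     # supersets are never more frequent, so skip the whole branch
            pattern = prefix + (item,)
            if len(pattern) >= min_length:
                heapq.heappush(heap, (counts[item], pattern))
                if len(heap) > k:
                    heapq.heappop(heap)          # evict the weakest pattern
                if len(heap) == k:
                    threshold = max(threshold, heap[0][0])  # dynamically raise min_support
            grow([[x for x in t if x > item] for t in cdb if item in t], pattern)

    grow(db, ())
    return sorted(heap, key=lambda sp: (-sp[0], sp[1]))

DB = [["B", "C", "D"], ["A", "C", "D"], ["B", "D"], ["A", "C", "D"], ["B", "C", "D"], ["A", "B", "D"]]
print(top_k_growth(DB, k=2, min_length=2))  # [(4, ('B', 'D')), (4, ('C', 'D'))]
```

Passing a good initial estimate as `min_support` prunes from the very start: on this data, `min_support=4` returns the same two patterns while skipping the {A} branch entirely.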
![Page 64: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/64.jpg)
Performance on Website Logs
• 1.5M events
• 84k sessions
• 3k unique IDs
![Page 65: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/65.jpg)
Future Work
![Page 66: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/66.jpg)
Distributed FP-Growth
Partition the database on item IDs.
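The slide only says to partition the database on item IDs. This sketch shows one way to do it, in the style of parallel FP-growth (PFP): shard g receives, from each transaction, the prefix ending at the last item that g owns. Any pattern whose last item (in a fixed item order) belongs to g then has the same support in shard g as in the full database, so each shard can be mined independently. The item-to-shard rule and all names are assumptions.

```python
def partition_by_item(transactions, order, num_shards):
    """Split a transaction database into shards keyed on item IDs.
    Shard g gets, from each transaction, the prefix that ends at the last
    item owned by g, so any pattern whose last item (in `order`) is owned
    by g has the same support in shard g as in the full database."""
    rank = {item: r for r, item in enumerate(order)}

    def owner(item):
        return rank[item] % num_shards  # hypothetical item-to-shard rule

    shards = [[] for _ in range(num_shards)]
    for t in transactions:
        items = sorted(t, key=rank.__getitem__)
        sent = set()
        for j in range(len(items) - 1, -1, -1):  # walk from the last item backwards
            g = owner(items[j])
            if g not in sent:                    # at most one prefix per shard
                sent.add(g)
                shards[g].append(items[: j + 1])
    return shards

DB = [["B", "C", "D"], ["A", "C", "D"], ["B", "D"], ["A", "C", "D"], ["B", "C", "D"], ["A", "B", "D"]]
shards = partition_by_item(DB, order="ABCD", num_shards=2)
for g, shard in enumerate(shards):
    print(g, shard)
```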
![Page 67: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/67.jpg)
Bags + Sequences
Itemset: {Item}
Bags: {Item: quantity}
Sequences: (item)
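Bags and sequences are listed as future work, so there is nothing in the deck to implement against. What changes with each representation is the containment test used for support counting; this sketch (hypothetical receipt and helper names) shows the three tests side by side.

```python
from collections import Counter

receipt = ["latte", "muffin", "latte"]  # hypothetical receipt, in purchase order

as_itemset = frozenset(receipt)         # Itemset: {Item}
as_bag = Counter(receipt)               # Bag: {Item: quantity}
as_sequence = tuple(receipt)            # Sequence: (item, ...)

def bag_contains(bag, pattern):
    """A bag pattern matches if each item occurs at least the required number of times."""
    return all(bag[item] >= qty for item, qty in pattern.items())

def sequence_contains(seq, pattern):
    """A sequence pattern matches if its items occur in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)

print(frozenset(["latte", "muffin"]) <= as_itemset)                  # True
print(bag_contains(as_bag, Counter({"latte": 2})))                   # True
print(sequence_contains(as_sequence, ("muffin", "latte")))           # True
print(sequence_contains(as_sequence, ("latte", "latte", "muffin")))  # False
```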
![Page 68: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/68.jpg)
Model built, now what?
![Page 69: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/69.jpg)
Creating a model pipeline
Figure: Data Science Workflow, from unstructured data through Ingest → Transform → Model → Deploy (labels: exploration, data, modeling).
![Page 70: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/70.jpg)
Demo
![Page 71: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/71.jpg)
Summary
Log Data Mining ≠ Rocket Science
• FP-Growth for finding frequent patterns.
• Find rules from patterns to make predictions.
• Extract features in pattern space for useful ML.
![Page 72: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/72.jpg)
SELECT questions FROM audience WHERE difficulty = 'Easy'
Thanks!
![Page 73: Pattern Mining: Extracting Value from Log Data](https://reader033.vdocuments.net/reader033/viewer/2022051300/58f2b7741a28ab716f8b4583/html5/thumbnails/73.jpg)
Extra Slides