TRANSCRIPT
Fastly’s Customers
“Spotify’s users expect immediate access to their favorite songs, podcasts and playlists, at home, work, and anywhere in between. Fastly is a critical part of our toolkit, helping us deliver an amazing listening experience. Fastly helps us innovate on content delivery and ensure high quality performance for our users around the world.”
Niklas Gustavsson, Principal Engineer at Spotify.
“I’m a huge fan of Fastly. On election night, we have 100,000 requests per second, and Fastly performed flawlessly - we had no problem at all.”
Nick Rockwell, CTO New York Times
“When you do a Super Bowl ad, over 100 million people are tuning in. For us, working with Fastly was all about setting up the website to be scalable and secure, and to deliver a really seamless experience for anybody that came to the site that day. We had tons of traffic, and the site held up and scaled exactly how we hope it would thanks to Fastly.”
Michael Dublin, CEO Dollar Shave Club
64 Net Promoter Score
Fastly by the Numbers
70M+ Lines of Edge Code Deployed Monthly
400B+ Daily Internet Requests
Fastly’s Domo Ecosystem
• 820 DataSets for 280 DataFlows (2 inputs to 40+)
• 35 Sources
• 2B Rows (1.8B from Data Warehouse)
• 2,500 Data Cards over 115 dashboards (3 have 60+)
Fastly’s Domo Challenges
• External system empowerment imbalance vs. Domo
• Larger DataSets invite more (and longer) scrutiny and runtime
• Over-reliance on one giant DataFlow’s output; prefer smaller, more manageable, logic-specific DataFlows
• Data Lineage only looks forward; we wish it looked backwards
RDFs: interdependent DataFlows connected by the reliability (trust) of their data, its dimensions, and the frequency of its updates.
Recursion uses the output DataSets of these DataFlows as input DataSets, redefining those connections.
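The recursion can be sketched as one append-and-de-duplicate cycle: this run's output DataSet becomes the next run's historic input. A minimal Python sketch, assuming field names like `line_unique_key` that mirror the slides (Domo itself would do this inside a SQL or Magic ETL DataFlow):

```python
# Hypothetical sketch of one Recursive DataFlow cycle. Rows are dicts;
# the newest row (by ISO date string) wins for each unique key.

def recursive_update(historic, dynamic):
    """Append the dynamic (recent) rows onto the historic set,
    then de-duplicate by unique key, keeping the newest row."""
    merged = {}
    for row in historic + dynamic:          # Append ALL first
        key = row["line_unique_key"]
        # Keep the most recent row for each key (ISO dates sort lexically)
        if key not in merged or row["date"] > merged[key]["date"]:
            merged[key] = row
    return list(merged.values())

historic = [{"line_unique_key": "T1", "date": "2019-03-01", "amount": 100}]
dynamic = [
    {"line_unique_key": "T1", "date": "2019-03-05", "amount": 120},  # revised
    {"line_unique_key": "T2", "date": "2019-03-05", "amount": 50},   # new
]
output = recursive_update(historic, dynamic)
# `output` now serves as `historic` for the next run - that is the recursion.
```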
Recursive DataFlows 101
Reliability
Your final output should be as trusted
as your first input
Frequency
Your business logic should define
your recursion
Dimensionality
Your data should be more than a
single point in time
6 Rules to Designing Recursive DataFlows
1. Start with Data you trust
2. Define a Successful Update
3. Focus on your Frequency (DataFlow #1)
4. Lean on your Legacy (DataFlow #2)
5. Stop! Append, De-Duplicate and Iterate
6. End with Data you trust
#1. Start with Data You Trust
For Fastly:
• Verified Transaction Data from Accounting
• Every day we pull the past 5 days of data - Dynamic Input
• Every 15th day we pull ALL objects - Historic Input
For your organization:
• Verified outputs of complex DataFlows
• Large DataSets with constant refreshes
• Entire imports that require a new dimension
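Fastly's two-input cadence (a rolling 5-day Dynamic Input daily, a full Historic Input on the 15th) can be sketched as a small scheduling helper. The function name and return shape are illustrative assumptions, not a Domo API:

```python
from datetime import date, timedelta

# Hypothetical pull planner mirroring the two inputs on the slide:
# every day pull the past 5 days (Dynamic); on the 15th, pull ALL (Historic).

def plan_pull(today):
    """Return which input to refresh and, for the dynamic pull,
    the start of its rolling window."""
    if today.day == 15:
        return ("historic", None)            # full, all-time pull
    window_start = today - timedelta(days=5)
    return ("dynamic", window_start)         # rolling 5-day window

kind, since = plan_pull(date(2019, 3, 10))   # a regular day: dynamic pull
```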
#2. Define a Successful Update
For Fastly:
• Our Dynamic Input updates with a Line Unique Key & Date
• Our Historic Input doesn’t fail after 14 hours
• A null value means no new transactions
For your organization:
• A new time period, a new historic set of records
• Salesforce Accounts were updated and pushed to Domo
• A purchase order was successfully processed
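A "successful update" check in Fastly's terms (every row carries a Line Unique Key and Date; a null or empty batch just means no new transactions) might look like this sketch; the field names are hypothetical:

```python
def update_succeeded(batch):
    """Hypothetical success check for a Dynamic Input batch:
    an empty batch means no new transactions (still a success);
    otherwise every row must carry a unique key and a date."""
    if not batch:                  # null / empty: no new transactions
        return True
    return all(row.get("line_unique_key") and row.get("date")
               for row in batch)

ok = update_succeeded([{"line_unique_key": "T3", "date": "2019-03-06"}])
```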
Pause: Known Domo “Oh no…”s
• Outputs of DataFlows can’t be their own inputs
• Naming things is hard, tagging our outputs is key
• There are multiple ways to solve problems, this is one solution
• Focus on the recursion, not supplementary logic
• Start simple, then iterate (make sure it works!)
Frequency / Dynamic
The new record of a new time / event partition
Our 5 Day Update
Legacy / Historic
The old records with all time / event data
Our All Time Update
Recursive DataFlows 201
Recursive DataFlows: Naming and Visual Help
• Slides will be available afterwards, focus on process
• Mnemonics and color labeling:
  • Update
  • Static
  • Dashed Boxes = Replaceable
  • Solid Boxes = Final
Frequency
Legacy
#5. Stop! Append, De-Duplicate and Iterate
• Recursions: Go! | Loops: Stop!
• First, Append ALL! Then Shared Rows!
• Determine duplicate Data, De-Duplicate your Data
• Iterate! (Some Variant Examples)
#5. Defining Duplicate Duplicates
• For Fastly Netsuite:
• Line Unique Key - Only one Transaction Detail should exist
• Composite Keys:
  • Identifier + _BATCH_ID_ + _BATCH_LAST_RUN_
  • Salesforce Account ID + SysModStamp
  • Identifier + Yesterday’s Date
• Sanity Check: Production Output # of rows == Historic Input
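The composite-key de-duplication above can be sketched in Python; in Domo this would live in a SQL or Magic ETL DataFlow, and the field names here are illustrative:

```python
def dedupe(rows, key_fields):
    """De-duplicate on a composite key (e.g. Salesforce Account ID
    + SysModStamp), keeping the last-seen row for each key."""
    seen = {}
    for row in rows:
        seen[tuple(row[f] for f in key_fields)] = row
    return list(seen.values())

rows = [
    {"account_id": "A1", "sysmodstamp": "2019-03-01T00:00:00", "v": 1},
    {"account_id": "A1", "sysmodstamp": "2019-03-01T00:00:00", "v": 2},  # same key
    {"account_id": "A1", "sysmodstamp": "2019-03-02T00:00:00", "v": 3},
]
out = dedupe(rows, ["account_id", "sysmodstamp"])
# Sanity check from the slide: compare the de-duplicated row count
# against what the historic input should contain.
```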
#6. Ending with Data We Trust
Parallel Thinking
Label everything, simultaneously
Trust
Consistency is reliability is trustworthy
Automated Updates
When your inputs update, so too
should your RDFs
Fastly Successes with RDF
• Netsuite Saved Search Import
  • ~500k Rows + ~30k Rows a Month
  • Critical Financial Data
  • Was: 13+ Hours, failed often, no error explanation
  • Now: < 13 min, 2 Imports, 2 DataFlows
• Fastly Service Configuration Imports
  • 1.7M Rows + 3k Rows a Day
  • Saves on time / stress with incremental updates
  • Allows us to see changes in configurations over time
Take Home Work from Professor Pugliese:
• Build your own Dynamic DataSet:
  • Define a Successful Update (Time / Event)
  • Create an additional Output DataSet Object from the DataFlow
  • Make the update the focus of your DataFlows, not everything!
• Categorically manage your DataFlows and DataSets:
  • Use Tags to define Input Sources
  • Use Nomenclature to define types of DataFlows
• Think broader and smarter about your Domo and how it’s utilized in your company
Key Takeaways
• Process time is a function of DataSet size, input connector, and Domo; DataSet size is the one you have the most control over.
• Recursive DataFlows are a challenging yet rewarding way to improve the consistency of, and trust in, your DataSets.
• Multiple, logic-distinct DataFlows give you more control and insight but burden your mental capacity - a trade-off versus one giant DataFlow that does everything.
--Outro Slide
CASE WHEN `presentation_rating` = 'Amazing'
  AND COUNT(DISTINCT `key_takeaways`) > 0
THEN 'Applause'
ELSE 'Questions?'
END