DNA - Einstein - Data science and big data
Bigdata -> Data Science -> AI, and some $$$ in between
DNA’s journey in data science & big data
prologue of prologue
you have to have an idea
THE IDEA
ALL OF THE DATA WE HAVE
PROFIT
[Diagram: many separate "some data" boxes and many separate "some report" boxes]
ONE SOURCE OF TRUTH
+ CUSTOMER FIRST
+ AUTOMATE ALL THE THINGS
WTF?
PROFIT? activities?
who cares
webdata? who cares
Agenda
Prologue: The big thing(s)
The four things of analytics ~ the roadmap on how to do those things
Achievements
What’s inside: AWS good stuff & hype & love
Culture stuff
Upcoming
prologue
The BIG THING(s)
1. Business: it was the omnichannel customer
a. the ever-more-demanding, influential and independent customer
b. rise of need for analytical insight & data
c. demanding inf. management and analytics to be operational, not finance-driven
d. stop sub-optimizing the system (customer)
2. Tech: it was cloud, open-source, and data science
a. suddenly - endless scale & processing power
b. reduced time-to-environment from weeks to minutes
c. reduced cost
d. ability to create intelligent data products that reduce time-to-insight and time-to-action
                | hard for machines              | easy for machines
hard for humans | data science, machine learning | data engineering, data pipelines
easy for humans | AI / NLP                       | reporting, basic calculus
System requirements
- Infinite scale
- Process 10’000++ messages per sec
- Automated deploy & tests
- Version control
- Pay-for-use, not for-licence
- Real-time pipeline, disaster recovery, exactly-once guarantees
- Real-time analytics, sub-second latency for everything
- Infinite processing power for data science stuff & large analytical deployments
- Array of libraries to make the data scientist’s life easier
- Modular, I can change any part of it, being that software or hardware
- Secure, EU referendums and Safe Harbour etc.
- Pipeline and persistent storage & data platform can be done from scratch to production in 6 months
- Can’t cost really anything, since had to scrape a small budget. 3 developers max.
OKAY! SOUNDS FAIR.
Business requirements
- Understand the omnichannel customer
- Reduce churn
- Increase cross-sales
- Increase product usage & increase retention
- Increase marketing ROI
- Insight should be real-time
- Actions should be near-real-time and everyone can do them
- Know where to put infrastructure better than before
- Make sense of unstructured data & text & speech & so forth
- Automate 80% of insight / data that was previously done by hand
- Your system shall not cost anything
- But it should deliver competitive advantage
OKAY! SOUNDS FAIR.
WHAT WOULD MACGYVER DO?
WOULD HE:
a) go and buy a licence and servers and then wait around
b) build the damn thing from what he happens to find with zero cost
WHAT WOULD MACGYVER DO?
YES!
b) build the damn thing from what he happens to find with zero cost
Achievements & upcoming
Done (within a year):
- Assisted investments & business (1) operations: xx-xxx mil. / year
- Directly optimized / machine learning (2) -handled operations: x-xx mil. / year
- Machine learning* & Data Science introduced
- Marketing efforts from weeks to minutes
- Automation from 10% to 80%
- Conversion on direct channels up from 50 to 300 percent
- Amount of automated & personalized channels from 1 to 5 (all)
- One source of truth & self-made -> we know how it works
- Ability to handle all types of data
Upcoming 2017:
- Artificial intelligence (AI)*
- Chatbots (AI)
- “Acquisition” of display advertising
- Understanding speech (AI)
- Moving from CPU to GPU
- DNA.FI fully personalized (w/ new concept)
* Data Science -> Machine Learning -> Artificial Intelligence
what’s inside
code! (surprise)
clojure, python, c++, tensorflow, syntaxnet, spark, scala, sql, postgres, redshift, ec2, R, random forest, s3, jenkins, ansible, cnn / rnn / lstm,
jupyter, aerospike, kafka, snowplow, scikit learn, matplotlib, als, k means, mllib, numpy, pandas, scipy… etc
COLLECT
- real-time
- batch
- omnichannel
COMBINE
- digital to brick n mortar
- digital to everything
- context to everything
- customer to everything
COMPUTE
- recommendations
- analysis
- reports
- segments
- predictions
- descriptions
- next best actions
- customer journey
EXECUTE
- churn prevention
- cross-sales
- targeted marketing
- customer service efficiency
- customer experience improvement
- omnichannel optimization
- react in real time
- product development
CONTROL
- continuous deployment
- infrastructure as code
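The first four stages above (COLLECT, COMBINE, COMPUTE, EXECUTE) can be pictured as a chain of small functions. What follows is only a toy sketch under stated assumptions: the event fields, the customer ids, and the "offer_phone" rule are all invented for illustration and are not taken from DNA's actual system, where COMPUTE would be a model rather than an if-statement.

```python
from collections import defaultdict

# Hypothetical events from two channels; every field name and value is invented.
web_events = [
    {"customer": "c1", "channel": "web", "action": "viewed_phone"},
    {"customer": "c2", "channel": "web", "action": "viewed_phone"},
]
store_events = [
    {"customer": "c2", "channel": "store", "action": "bought_phone"},
]


def collect(*sources):
    """COLLECT: merge events from every channel into one stream."""
    for source in sources:
        yield from source


def combine(events):
    """COMBINE: group events per customer so digital and brick n mortar meet."""
    profiles = defaultdict(list)
    for event in events:
        profiles[event["customer"]].append((event["channel"], event["action"]))
    return dict(profiles)


def compute(profiles):
    """COMPUTE: pick a next best action per customer (toy rule, not a model)."""
    actions = {}
    for customer, history in profiles.items():
        done = {action for _, action in history}
        if "viewed_phone" in done and "bought_phone" not in done:
            actions[customer] = "offer_phone"
        else:
            actions[customer] = "no_action"
    return actions


def execute(actions):
    """EXECUTE: hand actions to a channel; here they are only formatted as text."""
    return [f"{c}: {a}" for c, a in sorted(actions.items()) if a != "no_action"]


result = execute(compute(combine(collect(web_events, store_events))))
print(result)  # ['c1: offer_phone']
```

Customer c2 gets no offer only because the store purchase was combined with the web visit, which is the point of combining channels before computing anything.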
Customer interface layer
Channel layer
Delivery layer
Data / Machine learning layer
Collecting layer
realtime 1.3T, batch ~ 100gb
-> to redshift, we load 5’511’649’731 rows
Why redshift?
Reporting on top of raw data: 17’072’941 rows joined to 110’773’366 rows joined to 24’945’364 rows joined to 2’297’076 rows joined to 1’841’262 rows + some dimensions, and result returned in < 10 sec
-> no db-admins, no indexes, no “tuning”
Class: TV, Liiga
Rank: 0.87, 0.90
What happens in social media? What is talked about?
What’s wrong?
from reporting sales to reporting potential (and the ways of going from potential to sales)
R is still goooood. And jupyter.
ALS recommendations w/ 1.3 T data = good
1 0 1 1 0 1 0 0 1 1 1 0 1
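For readers who have not met ALS (alternating least squares): it factorizes a customer-by-item matrix of 1s and 0s into two small factor matrices, fixing one and solving a ridge regression for the other, back and forth. The sketch below is a minimal pure-Python illustration, not the production setup. It assumes a tiny invented 4x4 matrix and treats every 0 as an observed value, which is a simplification compared with the implicit-feedback variants found in libraries. All function names here are hypothetical.

```python
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def update(ratings, fixed, k, reg):
    """Ridge-regress every row of `ratings` onto the fixed factor matrix."""
    # fixed^T fixed + reg * I is the same for every row, so build it once.
    gram = [[sum(f[p] * f[q] for f in fixed) + (reg if p == q else 0.0)
             for q in range(k)] for p in range(k)]
    out = []
    for row in ratings:
        rhs = [sum(r * f[p] for r, f in zip(row, fixed)) for p in range(k)]
        out.append(solve(gram, rhs))
    return out


def als(ratings, k=2, reg=0.1, iters=20, seed=0):
    """Alternate between solving for user factors and item factors."""
    rng = random.Random(seed)
    items = [[rng.random() for _ in range(k)] for _ in range(len(ratings[0]))]
    transposed = [list(col) for col in zip(*ratings)]
    for _ in range(iters):
        users = update(ratings, items, k, reg)
        items = update(transposed, users, k, reg)
    return users, items


def predict(users, items, u, i):
    """Predicted affinity of user u for item i (dot product of factors)."""
    return sum(a * b for a, b in zip(users[u], items[i]))


# Invented toy data: rows are customers, columns are items, 1 = interacted.
ratings = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
users, items = als(ratings)
# Customer 3 shares item 2 with customer 2, so item 3 should outrank item 0.
print(predict(users, items, 3, 3) > predict(users, items, 3, 0))
```

At real data volumes this would be run on a distributed implementation rather than hand-rolled loops; the sketch only shows the alternating structure.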
culture stuff
more important than you’d think
http://www.slideshare.net/reed2001/culture-1798664/
MacGyver (remember?, what would MacGyver do) = The thinker-doer
- Usually development methods split thinkers (project managers, scrum managers, product owners and the lot) from doers (developers, analysts)
- This is (mostly) shit
- You’d need people leading who also know their stuff
- Saves money, time and nerves
- People communicate better
- Thinker-doers can communicate with business and translate to development actions, even develop the things themselves
Demos & openness = The secret sauce to success (and freedom to do more stuff)
- We sit on the “business floor”, right in between basically everyone
- And we almost always have something displayed on a screen
- We make it easy to come and talk to us
- We make demos available to everyone
- We connect
- This makes all the difference
                                | nothing changes (or we close our eyes that it does) | everything changes all-the-time
always connected                | kindergarten - no output but loads of fun           | if done right, ultimate success
forced connection (procedures!) | basic IT waterfall project                          | basic IT “agile” project
never connected                 | cave-people?                                        | chaos
business - IT alignment
Bigdata/AI
Business
Directors* are doing their own marketing automation activities without any help
*ping Solita, how many directors code...
And now, we have business even writing their own code! (no, really)
upcoming
1st try: word2vec + naive bayes
2nd try: convolutional neural net
3rd try: LSTM/RNN
4th try: syntaxnet
5th “try”: -> include speech recognition
6th try: spaCy
7th try, part I: latent dirichlet allocation
8th try: ?
Nth try: ?
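To give a feel for the naive bayes half of the 1st try, here is a minimal text classifier sketch. It is not the team's implementation: it swaps word2vec features for plain word counts (a multinomial naive Bayes with add-one smoothing), and the training sentences are invented. Only the class names "TV" and "Liiga" come from the slides.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Collect the counts a multinomial naive Bayes needs from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability for `text`."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the log prior of the class.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing keeps unseen class/word pairs above zero.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented training sentences for illustration only.
examples = [
    ("my tv box shows no picture", "TV"),
    ("how do i record a tv show", "TV"),
    ("when is the next liiga game", "Liiga"),
    ("liiga game stream keeps buffering", "Liiga"),
]
model = train(examples)
print(classify(model, "tv picture is frozen"))
print(classify(model, "liiga game tonight"))
```

A bag-of-words model like this ignores word order and context, which is one plausible reason the later tries on the list move to neural and syntactic approaches.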
Now?
in a good place. can’t fully disclose what we’re running though. :)
basically we can understand both speech and written natural language so that the language can “flow” and it can be in a chat context or in longer formats;
ex:
- hi do you happen to have iPhones in stock?
- yea!
- cool. what’s the price? <- have to link to previous parts of conversation
NB! this is quite simple in English but tear-your-eyes-off-to-scratch-your-brain* -hard with Finnish. we might be the first ones actually there.
*modified from: Friends, 1995, The One with the Baby on the Bus
Lessons learned
Understand the BIG THINGS (cloud, open source, omnichannel customer, data science, time-to-x)
Sit where business sits. And sit together. DO STUFF TOGETHER.
Don’t use project managers who can’t code (or who are not really good at the subject domain).
Apply advanced analytics to automate 80% of small decisions made all the time.
Continuous communication beats meetings. Don’t meet.
At least start with AI. Don’t just tweet about that shit.