SDS PODCAST EPISODE 265: DATA SCIENCE IN THE WORLD OF BIG DATA

Post on 22-May-2020

Kirill Eremenko: This is episode number 265 with top instructor in the

space of big data, Frank Kane.

Kirill Eremenko: Welcome to the SuperDataScience podcast. My name

is Kirill Eremenko, Data Science Coach and Lifestyle

Entrepreneur. And each week we bring you inspiring

people and ideas to help you build your successful

career in data science. Thanks for being here today

and now let's make the complex simple.

Kirill Eremenko: This episode of the SuperDataScience podcast is

brought to you by our very own Data Science Insider.

The Data Science Insider is a weekly newsletter for

data scientists, which is designed specifically to help

you find out what have been the latest updates and

what is the most important news in the space of data

science, artificial intelligence and other technologies. It

is completely free and you can sign up at

superdatascience.com/dsi. And the way this works is

that every week there's plenty of updates and

seemingly important information coming out in the

world of technology. But at the same time it is virtually

impossible for a single person on a weekly basis to go

through all of this and find out what is actually really

relevant to a career of a data scientist and what is

actually very important. And that's why our team

curates the top five updates of the week, puts them

into an email and sends it to you.

Kirill Eremenko: So once you sign up for the Data Science Insider, every

single Friday you will receive this email in your inbox.

It doesn't spam your inbox, it just arrives and has the

top five updates with brief descriptions. And that's

what I like the most about it, the descriptions. So you

don't actually even have to read every single article. So

our team has already read these articles for you and

put the summaries into the email so you can simply

just read the updates in the email and be up to speed

in a matter of seconds. And if you like a certain article,

you can click on it and read into it further.

Kirill Eremenko: And so whether you want great ideas that can be used

to boost your next project or you're just curious about

the latest news in technology, the Data Science Insider

is perfect for you. So once again, you can sign up at

www.superdatascience.com/dsi. So make sure not to

miss this opportunity and sign up for the data science

insider today and that way you will join the rest of our

community and start receiving the most important

technology updates relevant to your career already this

week.

Kirill Eremenko: Welcome back to the SuperDataScience podcast, ladies

and gentlemen, super excited to have you back here on

the show today. And the guest for today is somebody

who I've wanted to interview for quite a while now,

Frank Kane. Frank is an expert in the space of big

data. He worked at Amazon for over a decade and you

might actually know him quite well from his courses

on Udemy where he's one of the top instructors in the

space of data science and big data. And today's

conversation was very interesting because we

approached it from two spaces, from the space of data

science and the space of big data. And in this podcast

you'll find out how the two areas have been different

but are now slowly but surely converging into

something that is very intertwined and why it is

important or why it is becoming more and more

important for a data scientist to be well adept in the

space of big data as well.

Kirill Eremenko: Also in this podcast we will talk about Frank's

background, which was very interesting, spending over

a decade at Amazon and working on lots of different

systems. There you'll find out very useful tips on

recommender systems such as user-based and item-

based collaborative filtering as well as other types of

recommender systems and where this space of

recommender systems is going. So you can probably

already tell that this podcast is quite heavy on

recommender systems. So if that's your thing, then

this podcast is definitely for you. And you'll also find out

why recommender systems are important across all

spaces, not just in retail, so how many different

industries can use recommender systems.

Kirill Eremenko: We'll also touch on singular value decomposition or

SVD model based methods, deep learning and Amazon

DSSTNE. And finally, towards the end of this podcast, we

will talk about hiring. So Frank had a huge say at

Amazon on who's hired and who's not hired into the

teams and he's got some really exciting tips to share

with you on this podcast. So can't wait for you to

check out all the great insights from Frank here. And

without further ado, I bring you Frank Kane, one of the

top experts and instructors in the space of big data.

Kirill Eremenko: Welcome back to the SuperDataScience podcast, ladies

and gentlemen, super excited to have you back here on

the show because I've got a great colleague of mine

and a great online instructor and entrepreneur. On the

phone, Frank Kane calling in from Orlando, Florida.

Frank, how are you doing today?

Frank Kane: Doing great, Kirill. How are you?

Kirill Eremenko: Doing well as well. Such an honor to talk to you again.

We met at Udemy Live, I think it was last year and had

some interesting chats and now we're here on the

podcast. How have things been for you over the past

almost year now?

Frank Kane: Yeah, it's been going great. Things continue to grow

and as I'm sure you know, there seems to be a

boundless demand for online education in the fields of

data science and machine learning and big data. So

we're all kind of riding that wave.

Kirill Eremenko: Yeah, yeah, for sure. And exciting to see new courses

popping up from you. And you mentioned you're

working on some really exciting things right now. What

are the courses that you're working on right now?

Frank Kane: Well, I just released an update to our Elasticsearch

course. So kind of lately I've been focusing on the big

data side of things and Elasticsearch is a really

interesting technology that kind of diverged from its

original purpose. That's kind of the cool thing

about it. So you hear about Elasticsearch and you

think it's just for search engines, right? Like powering

search on Wikipedia or something. But it's sort of

morphed into this tool for doing large scale data

analytics and web log dashboards and things like that.

So that's the latest thing I've been up to. Prior to that I

released a new course on recommender systems,

which I think is something we want to talk about as

well.

Kirill Eremenko: Yeah, very cool. And a big shout out goes to Manning

Publications for helping us arrange this podcast. And

it's really funny, like you mentioned, they reached out

to us to arrange the podcast and to promote your new

work while we already knew each other. So like you

said, it's very serendipitous how these things happen

sometimes.

Frank Kane: Yeah, I love that word. Serendipitous. And I mean,

that's a big part of what we do in recommender

systems too, is what we call serendipitous discovery.

This is like a serendipitous connection, small world

kind of a thing.

Kirill Eremenko: Awesome. Awesome. Okay. So yeah, we've got so much

to talk about. You have such a broad, I mean, such an

interesting career path with your time at Amazon and

how you moved to courses. So to kick us off for, I'm

sure a lot of our listeners, those of you who take my

courses on Udemy or SuperDataScience or those of

you who take Frank's courses, there's a huge overlap

in the sense like there's a lot of people who already

know you, but for somebody who doesn't know you or

doesn't know you well, give us a quick rundown, who

is Frank Kane and what has your career been like?

Where has it taken you?

Frank Kane: Yeah, man, it's a long story. I kind of started off as a

software engineer in the video game development

industry of all things. And from that, I went on to

developing flight simulators and one day I got a call

out of the blue from amazon.com in Seattle. And they

said, "Hey, you know, we're looking for good engineers.

Do you want to do a phone interview?" I'm like, "Sure."

And next thing I knew I was moving to Seattle. Right?

And they hired me into their personalization

department and that's basically what we call

recommender systems today. So this is back in 2003, I

think. So real early days of this field and we didn't

even call it-

Kirill Eremenko: Yeah. Data science didn't even exist back then.

Frank Kane: That's exactly what I was going to say. That wasn't

even a thing. That term wasn't even coined yet, but we

were doing it. It was kind of a-

Kirill Eremenko: Yeah, the minus-seventh year of data science.

Frank Kane: Yeah. And we were kind of inventing it as we went.

Right?

Kirill Eremenko: Yeah.

Frank Kane: So it was exciting to be a part of that. And yeah, I

stuck it out at Amazon for 10 years, almost 10 years

anyway.

Kirill Eremenko: Wow.

Frank Kane: And worked my way up from software engineer to a

senior manager. And by the end of my career there, I

was actually running the engineering department of

IMDB.com, which is a subsidiary of Amazon. So that

was fun. It's a big movie website if you're not familiar

with it. But yeah, after 10 years, it was time for

something new. My family was itching to get out of the

rainy environment of Seattle. So we decided to make a

go of it on our own and packed up and moved down

here to Orlando, and I've been working for myself ever

since.

Kirill Eremenko: Yeah. Orlando is great. Right? I was there once and

you guys have Universal Studios parks, theme parks

there, right?

Frank Kane: Yeah. Universal, Disney World, SeaWorld. It's

definitely a fun place to be, especially if you have kids.

Kirill Eremenko: That's awesome. How many kids do you have?

Frank Kane: Two daughters, they're both grown up now, but when

we moved here, they were still young enough to enjoy

it. So it's been fun.

Kirill Eremenko: That's awesome. Okay. And so 10 years in Amazon,

amazing. Really, really cool. And you moved there from

a software engineer to senior manager and then

managing a whole department at IMDB. What was that

like? What was it like working at Amazon?

Frank Kane: It was exciting. I mean, the thing that I love the most

about it was that you're always surrounded by really

smart people and you're never going to have a problem

finding people that are smarter than you to learn from.

Right? So a lot of people say that if you're not learning,

you're in the wrong job, right?

Kirill Eremenko: That's true.

Frank Kane: So you're always learning at Amazon because they're

just so picky about who they hire. And there were just

some amazing people there that you can learn new

techniques, new ways of thinking from, and not just in

engineering too, right? Also from the business side,

just being able to sort of absorb how Jeff Bezos thinks

in itself is hugely valuable as well. Right? So it was

tricky sometimes.

Kirill Eremenko: Got you. Did you ever get to meet him?

Frank Kane: Yeah, yeah, quite a bit. I mean, back then Amazon was

a much smaller company than it is today. So we were

all in the same building and you'd find yourself in

the men's room next to him for all you knew. But yeah,

I had a lot of meetings with him and got to talk to

him quite a bit actually.

Kirill Eremenko: What was he like as a person?

Frank Kane: He's intense, but super smart. Definitely the smartest

guy that I've ever met in my life and that's saying a lot.

But yeah, just his ability to sort of analyze any

situation and just be right about it really quickly is

pretty admirable.

Kirill Eremenko: That's awesome. That's fantastic. All right. And so then

you moved to Orlando and you founded Sundog

Software. Why the name Sundog?

Frank Kane: Oh, that's a long story. So it actually has nothing at all

to do with data mining, sorry data science or machine

learning. After I left Amazon, I had a noncompete

agreement like a lot of people do, so I couldn't really do

anything directly related to what I was doing at

Amazon. So instead I got into the field of visual

simulation, basically making 3D simulations of

clouds and weather and oceans for simulation and

training products. So that's where Sundog Software

came from. A sundog, if you don't know, is actually an

atmospheric effect that is like a rainbow on either side

of the sun, under certain conditions. So since I was

building software that simulates the sky, we kind of

drew our name from that because it was basically the

only thing that wasn't trademarked yet. So that is the

genesis of the Sundog. It wasn't actually named after a

dog.

Kirill Eremenko: Got you. Okay. And so you were providing, creating

this software for simulation and how did that morph

into online education? I'm always curious about these

stories, because so far nothing in your story even

flagged that you were going to be a super successful

online instructor. When did that transition happen?

Frank Kane: Yeah, I didn't see it coming either. So I mean, how did

it go down? Basically, after I quit Amazon and decided

to go on my own, I was kind of freaking out. Right?

Because I left behind these hugely valuable stock

options and stuff and I came down here with enough

money to get by for a while, but I was still pretty

nervous about it. Right?

Kirill Eremenko: Yeah.

Frank Kane: If you've never been self-employed before, it's a very

scary thing to jump into. So I started doing some

freelance work on the side to sort of supplement what I

was getting from selling my own software that I had

written. And one of those freelance gigs was actually

doing curriculum development for a company called

General Assembly in New York City. So they were

looking for someone to put together a data science

curriculum for an in-person training class that they

were putting together. So I did that and somehow,

because I had this Amazon pedigree, they plastered my

face all over their websites saying, "This course was

developed by an Amazon guy." And basically, so what

happened then was someone from Udemy was trying

to recruit new instructors in the field of machine

learning and data science. And they somehow found

me spelunking on the Internet and gave me a call out

of the blue and said, "Hey Frank, we're looking for

instructors on Udemy to teach big data and data

science topics, want to give it a shot?" And I'm like,

"Oh, why not? How hard can it be?" Right?

Kirill Eremenko: Yeah.

Frank Kane: Little did I know. It's actually really hard. But yeah,

that's kind of how my first online course came to be.

They reached out to me and I said, "Well, let's give it a

shot." And the funny thing is, the first course that I

made was really kind of a flop. The first month that we

put it out, it made like 200 bucks or something. I'm

like, "Well, aw, crap. Well, we tried." But after

putting in so much effort into a course, I mean, as you

know, it takes many months to put all of these things

together, right?

Kirill Eremenko: Yeah.

Frank Kane: I didn't want to give up on it that soon, right? So I'm

like, "Fine, I'll try making another course and see if I

can sort of like build up on this and not give up quite

yet." And as a result of that, things started to actually

take off. So it was just sort of a hockey stick of growth

after that for a few years where you kind of have this

compound interest effect where you make one good

course and the students from that course, they're

people that you can sell your next course to, and so

on and so forth. And you just keep building upon that

audience. Right? So that's kind of how it all

snowballed.

Kirill Eremenko: Very, very interesting. Yeah, totally, totally can relate

to that story. It's [inaudible 00:14:55] but I guess as

long as you have that inner drive or you get this feeling

of not just accomplishment but fulfillment when

somebody takes your course and feels that they've

learned something and that they can now use these

skills and especially if they tell you about it, if they

say, "Hey, Kirill or Frank, I took your course and I feel

empowered to do something in my job." Or "I actually

already did something with that knowledge and I

finished the project, I got a promotion or I helped a

colleague learn". It really gives you that additional

inspiration to keep moving forward and not to give up.

Would you say that you get that feeling as well?

Frank Kane: Oh yeah. There's so much to keep you motivated,

right? I mean, like you said, just that positive feedback

of how you're actually changing people's lives in a

positive way. I mean, what's not to love about that?

LinkedIn has been great for that. Right? Like I'll see

people posting online, "Hey, I actually got this

certification because of you or I got this job because of

you, or thanks for your career advice on getting

interviewed at Amazon. Thanks to you, I actually got a

job." I'm like, "That's awesome." Everyone wants to

make the world a better place. Right?

Kirill Eremenko: Yeah.

Frank Kane: Yeah, so that's awesome. And also just the scope of

the impact, right? I mean, I had no idea there was so

huge of an audience for this stuff out there in the

world. And if you think about how many football

stadiums you'd have to fill out to put all of our

students in them at one time, it's some crazy number,

right? It's just hard to visualize even.

Kirill Eremenko: That's crazy. Yeah, I'm looking at your Udemy profile.

You have 248,000 students for those out there, it's like

almost a quarter of a million students. That's crazy,

one fit-

Frank Kane: Yeah, that's not new to me.

Kirill Eremenko: Yeah it's just-

Frank Kane: Then there's also Manning and all the other platforms

that we're on too. So it adds up to quite a bit.

Kirill Eremenko: Yeah, for sure. And so what I wanted to touch on here

is that like our area of expertise and area of where we

teach overlaps to some extent, but it's also slightly

different. So you mostly teach in the space of big data,

plus how it overlaps with data science, machine

learning. And that's what I wanted to touch on. With

the passing years since data science came around, big

data appeared. These two have been kind of close and

also the relationship between them has been also

developing over the years. So can you tell us a bit

about that? How has the relationship between big

data, data science and machine learning on the other

hand, how has it developed over the past couple years

and what is it like now?

Frank Kane: Yeah, I mean, kind of my perception of it is that they

started off going in kind of their own directions, right?

And now they're kind of all starting to converge it

seems. I mean, that's kind of my high level take of it.

So originally when we started teaching data science, it

was all about messing around with Jupyter Notebook

on your own individual PC somewhere or individual

Linux host or whatever. And it's messing around with

smaller data sets. And to be fair, you can analyze a lot

of data on one machine if it's a beefy enough machine.

Then we have machine learning, which is off playing

around with the neural networks and stuff these days.

And you can still do quite a bit on a single GPU or a

machine with multiple GPUs.

Frank Kane: And then almost orthogonally, we have this world of

big data where people are using things like Hadoop-

based platforms like Cloudera or Apache Spark and

things like that to distribute the processing of data at

massive scale. And there's been these efforts to kind of

slap one on top of another. Spark has their Spark

MLlib library for doing machine learning on Spark.

Obviously, tools like Cloudera have tools for doing

large scale data analysis using their platforms. But it's

only recently I think that it's starting to converge.

Right? We have things like the data pipeline on... Sorry,

the deep learning pipeline on Spark coming out where

you can actually do large scale machine learning and

deep learning on Apache Spark. So that's coming

together.

Frank Kane: We have TensorFlow being distributed on clusters,

that's coming together. So it seems like there's still like

10 different ways to do everything, but at least we're

starting to all come together at the same thing, that

it's not just about data science, it's not just about

machine learning, it's not just about big data, it's

about doing machine learning in a big data

environment.

Kirill Eremenko: Right. Why do you think now, why is the time now that

they're converging?

Frank Kane: That's a great question. I mean, I think it's just sort of

a natural process that's happening. There's definitely a

lot of interest and market forces that are behind this.

But really I think it's just that these technologies have

all been maturing at a similar rate and now they're all

at a point where they're like, "Okay, how do we all get

together and do something even better together?"

Right?

Kirill Eremenko: Yeah. Okay. Fair enough. Fair enough. What has been

your favorite course to teach? What has been your

favorite topic to share with the world?

Frank Kane: Oh, I always have a soft spot for recommender systems

because that was kind of what I specialized in during my

time at Amazon. So if I had to choose one child that I

love the most, it would probably be my recommender

system course.

Kirill Eremenko: Okay. Got you. So you did recommender systems at

Amazon, are you able to tell us a bit about that? To go

into a bit of detail without sharing any IP or sensitive

information?

Frank Kane: Yeah, I mean it was seven years ago when I left

Amazon. So everything that I can tell you is well

beyond the range of their nondisclosure agreements

because it's history at this point. Right? But there's

still some good stories about it that I-

Kirill Eremenko: Good, let's talk about it. Sounds like it's still a very

relevant and really cool topic and a lot of companies

really enhance their sales. Netflix, Amazon, online

marketplaces, they... Even Udemy itself, right? You

take a course and then you get recommended to other

courses on what to take. So please do tell us about

that. What was your role at Amazon? I mean, what

kind of recommender systems were you exploring back

then?

Frank Kane: Yeah, I mean, let's see. I mean, originally I was

working on things like people who bought also

bought... I actually ran the team for that for a while. So

if you're shopping on amazon.com and you're looking

at specific items, there'll be a little widget that says,

"People who bought this also bought this." Or "People

who viewed this also bought this." Or something along

those lines. So that was kind of like the heart of the

whole thing and this is all published publicly, so I can

definitely talk about it. So kind of like the main

component of doing any recommender system back in

those days was this item-to-item similarities matrix.

Right? So we would take these vectors of everybody

that bought a given item, right? And make this 2D

matrix, and try to find similarity distances between

every item based on what customers they had in

common. And by doing that you can create a database.

It's basically like, "Okay, here's item ID whatever, which

corresponds to this book, and it is similar to this list of

other books sorted in order by similarity." Right?
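
The item-to-item similarity table Frank describes can be sketched roughly as follows. This is a minimal illustration, not Amazon's actual code: the purchase data, the function name build_similar_items, and the choice of cosine similarity over binary "who bought it" vectors are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical purchase log: (customer_id, item_id) pairs.
purchases = [
    ("alice", "calculus"), ("alice", "algebra"),
    ("bob", "calculus"), ("bob", "algebra"), ("bob", "cookbook"),
    ("carol", "cookbook"), ("carol", "algebra"),
]

def build_similar_items(purchases):
    """Return {item: [(other_item, similarity), ...]}, most similar first."""
    buyers = defaultdict(set)  # item -> set of customers who bought it
    for customer, item in purchases:
        buyers[item].add(customer)

    similar = defaultdict(list)
    for a, b in combinations(sorted(buyers), 2):
        shared = len(buyers[a] & buyers[b])
        if shared == 0:
            continue  # no customers in common, so no relationship to store
        # Cosine similarity between two binary "who bought it" vectors
        # (an assumed metric; the transcript does not name one).
        score = shared / sqrt(len(buyers[a]) * len(buyers[b]))
        similar[a].append((b, score))
        similar[b].append((a, score))

    for item in similar:
        # Sort by descending similarity, breaking ties by item name.
        similar[item].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(similar)

similar_items = build_similar_items(purchases)
for other, score in similar_items["calculus"]:
    print(f"calculus ~ {other}: {score:.3f}")
```

In this toy data, "calculus" ranks "algebra" ahead of "cookbook" because it shares two buyers with the former and only one with the latter. Comparing every pair of items is fine for a sketch but would not scale to a real catalog.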

Kirill Eremenko: Okay. Could you tell us a bit more about that. So how

is the vector created? What are the dimensions of this

vector?

Frank Kane: Well, it's a very, very sparsely populated matrix, right?

So the main problem of recommender systems is that

most people did not buy most items. So a given person

only bought a very, very small percentage of everything

that Amazon sells. Right?

Kirill Eremenko: Yeah.

Frank Kane: So, basically these are all sparse vectors that you

think of as a matrix, but when you actually get down

to the code of actually constructing that matrix it's not

really a 2D matrix. Basically you have customers on

one dimension and items on the other dimension,

right? And you just try to find how it's all interrelated.
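
One way to picture why this is "not really a 2D matrix" in code: store only the purchases that happened, rather than a cell for every customer and item pair. The sketch below is an illustration under assumed data; the customer names, item IDs, and the catalog size of 10,000 are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: each customer bought only a sliver of the catalog.
purchases = {
    "alice": {"item_17", "item_402"},
    "bob": {"item_17"},
    "carol": {"item_402", "item_9001"},
}
catalog_size = 10_000  # assumed number of items for sale

# Sparse "rows" are customer -> items. Inverting them gives sparse
# "columns", item -> customers, which is what item similarity needs.
buyers_by_item = defaultdict(set)
for customer, items in purchases.items():
    for item in items:
        buyers_by_item[item].add(customer)

stored_cells = sum(len(items) for items in purchases.values())
dense_cells = len(purchases) * catalog_size
density = stored_cells / dense_cells
print(f"{stored_cells} stored entries vs {dense_cells} dense cells "
      f"({density:.4%} filled)")
```

The two dictionaries of sets play the role of the customer and item dimensions, and the empty cells are never materialized.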

Kirill Eremenko: Got you. Okay.

Frank Kane: Yeah, I mean that's kind of like the building block for

doing other cool stuff because once you know what

items are similar to other items, first of all, that's a

very permanent relationship, relatively speaking. So a

math book will always be similar to another math

book, is how we put it. These relationships aren't going

to change overnight. So you can get away with

computing that relatively infrequently. Right? And

once you have that, you can actually do things like

build up personalized recommendations by saying,

"Okay, here's the vector of everything that I personally

have liked either by buying it or looking at it or reading

it or something." Some indication of interest, I can go

out and get all the similar items that are similar to

everything that I expressed interest in, de-duplicate

those, score them and that becomes your personalized

recommendations. So that's what we call item-based

collaborative filtering, basically.
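
The steps Frank lists (gather the items similar to everything you liked, de-duplicate, score) can be sketched as below. It is a minimal illustration under stated assumptions: the similar_items table is a hypothetical precomputed lookup, and summing similarity scores is just one plausible scoring rule, not necessarily the one Amazon used.

```python
from collections import defaultdict

# Hypothetical precomputed table: item -> [(similar_item, similarity), ...]
similar_items = {
    "calculus": [("algebra", 0.8), ("statistics", 0.6)],
    "algebra": [("calculus", 0.8), ("statistics", 0.5), ("geometry", 0.4)],
}

def recommend(liked_items, similar_items, top_n=3):
    """Item-based collaborative filtering over a precomputed similarity table."""
    scores = defaultdict(float)
    for item in liked_items:
        for candidate, similarity in similar_items.get(item, []):
            if candidate in liked_items:
                continue  # don't recommend what the user already has
            # Accumulating into a dict de-duplicates candidates that are
            # reachable from several liked items and scores them at once.
            scores[candidate] += similarity
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]

recs = recommend({"calculus", "algebra"}, similar_items)
print(recs)
```

Here "statistics" comes out on top because it is similar to both liked books, while "geometry" is reachable from only one of them.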

Kirill Eremenko: Okay. Got you. So that was back then. How have

recommender systems progressed now, in the courses

for instance, you teach these days, how are they

different?

Frank Kane: Yeah, I mean obviously the thing that's changed

everything has been the advent of deep learning, right?

So, now the modern way of doing it is to actually build

a big deep neural network. And again the challenge

there is getting a neural network to work with sparse

data. But Amazon for one has cracked that nut. They

have a system they've published called DSSTNE. You

can find it on GitHub, that does that and it works

really, really well. I was actually very impressed with

the results. But it's still hard to beat the old school

way of doing it. Item-based collaborative filtering still

produces great results. So while it is true that a deep

neural network can be a great tool for solving just

about any machine learning problem you can dream

up, these simpler approaches still give it a run for its

money.

Kirill Eremenko: Yeah. And also they're more cost effective I guess in

terms of computing power and time, to create and

things like that.

Frank Kane: Oh, absolutely. I mean, you'd be amazed how little

computing power we needed to actually produce those

item-to-item similarities, because it was all very highly

optimized code written in C. It was really, really tight.

But we used to really... A very Amazonian way of

thinking is to really favor simple solutions over more

complex solutions given the choice. Right? So given a

solution that will run on one system versus one that's

going to run on a hundred, if the end result to the

customer is going to be the same, we're going to take

the simpler solution because it's going to be easier to

maintain.

Kirill Eremenko: Yeah. Makes sense. So is that just a question of how,

it's not the same result if it's only 80% of the original

results and that's the question. Do you use the

simplest solution and get 80% of the results or do you

go for the more complex one, aim for the 100% of the

result? That's kind of the trade off, but probably-

Frank Kane: Yeah. I mean we definitely spent a lot of time trying to

squeeze every percentage of improvement that we could

get out of it. Because it was such a huge lever. Right? I

mean, you can imagine, I think it's been published

that like 20% of Amazon sales was attributed to

personalization at that time. And that's not really the

real number, which I can't tell you the real number,

but that's the one that people talked about and it's not

that far off.

Kirill Eremenko: Yeah. It's crazy.

Frank Kane: But, yeah, when you have a lever that big, you think

about how many billions of dollars Amazon makes

every month, a one percent improvement is a

really big deal. Right? So if it really came down to a

more complicated solution will give us a 1% boost in

sales and yeah, we would do that. But generally

speaking, you didn't have to, you know what I mean?

The algorithms themselves can still be relatively simple

and you can still have a simple framework for blending

different algorithms together. So there are ways of

experimenting and trying simple changes and simple

solutions that will achieve those results.

Kirill Eremenko: Got you. And what would you say to somebody who

first of all, do you think any kind of business can

benefit from a recommender system or is it only just

B2C?

Frank Kane: Ooh, well, I wouldn't say any business can, but it's

obviously a useful thing. I mean, it depends mainly on

the size of your catalog, right? So if you're like the New

York Times and you have like a jillion articles and

somehow they're all still timely, which isn't actually

the case. Great, a recommender system might help

people find content that's relevant to their interests.

Maybe a magazine would be more of a relevant

example there, but if you're just running a little like

mom and pop ecommerce store where you're selling

five greeting cards that you've made by hand, a

recommender system isn't going to be helpful. Right?

You'd be better served just like manually, creating

those pairings, based on your human intuition than by

trying to build some algorithms. It's not going to

have enough data to work with in the first place.

Kirill Eremenko: Okay. Yeah, I know. It makes sense, makes total

sense. Tell us a bit about the difference between when

you have a recommender system that looks at content,

like for instance, you as an individual, you consume

certain content or you purchase certain items and

then it looks at similarities between items to

recommend to you versus recommender systems that

look at your similarities as an individual to other

individuals. And then it looks like what purchases they

made, what content they consume and makes

recommendations that way.

Frank Kane: Yeah, I mean that's basically what we call

user-based collaborative filtering, as

opposed to item-based collaborative filtering. So the

idea of user-based collaborative filtering is that instead of

finding similar items, you find similar users by flipping

the problem on its head basically. And then you

recommend stuff that the similar users like that you

didn't indicate an interest in yet. That works too. The

problem is that people are more fickle than things,

right? So before I said that a math book will always be

similar to a math book. But Kirill might not always be

similar to Frank. I might go off and get interested in

astronomy tomorrow and say, forget about all this

data science stuff.
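
To make the "flipping the problem on its head" idea concrete, here is a minimal Python sketch. It is not Amazon's implementation: the ratings, the user names, and the helper functions (`cosine`, `transpose`, `most_similar`) are all invented for illustration. The point is only that the same similarity code can serve both approaches, and that the difference is which dimension you compare.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not data from the episode.
ratings = {
    "kirill": {"math_book": 5, "stats_book": 4, "telescope": 1},
    "frank": {"math_book": 4, "stats_book": 5, "telescope": 5},
    "ana": {"math_book": 5, "stats_book": 5},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def transpose(user_to_items):
    """Flip user -> item ratings into item -> user ratings."""
    item_to_users = {}
    for user, items in user_to_items.items():
        for item, rating in items.items():
            item_to_users.setdefault(item, {})[user] = rating
    return item_to_users


def most_similar(key, vectors):
    """Rank every other key by cosine similarity to `key`, best first."""
    scores = [
        (other, cosine(vectors[key], vectors[other]))
        for other in vectors
        if other != key
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# User-based: compare people to people.
similar_users = most_similar("ana", ratings)

# Item-based: the same code, run on the transposed data.
similar_items = most_similar("math_book", transpose(ratings))
```

In a real system the item-to-item table would typically be precomputed offline, which is one reason the item-based variant tends to be cheaper to serve.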

Kirill Eremenko: Which you are, which you are interested in astronomy,

which is really cool.

Frank Kane: That is my latest side hobby for sure. But still sticking

with the big data stuff for now. That's my day job.

Kirill Eremenko: Okay. And so people are more fickle and so therefore

it's harder to create those recommender systems, is

that what you're saying?

Frank Kane: I wouldn't say it's harder. It's actually exactly the same

technique. Just flipping the dimensions, one for the

other, but the results aren't going to be as good, I

would posit.

Kirill Eremenko: Got you. Okay. Are there any other types of

recommender systems, in addition to user-based and

item-based collaborative filtering, more innovative or

newer experimental types of recommender systems

that you can share with us?

Frank Kane: Yeah, definitely. Before I forget though, on the previous

point, another downside of user-based collaborative

filtering is that there are usually more users than things

on a given website. So you have a much greater

computing requirement to actually compute user

similarities than item similarities.

Kirill Eremenko: Interesting. I wouldn't say that about Amazon though.

They have so many things that they sell. I guess it's a

bit debatable question-

Frank Kane: They do. Yeah, I mean, I actually don't know what

their current numbers are, but you're right. It's

probably not that far off at this point. They sell

everything you can imagine they can sell.

Kirill Eremenko: That's crazy-

Frank Kane: I think there's still more people interested in things

that they can buy-

Kirill Eremenko: And there's new things that are popping up. For

instance, I'm here, I'm in Bali right now and people

use this thing called AliExpress from China. I'm not

sure if it's related to Alibaba or not, but then there's

also Alibaba, there's eBay and Amazon seems to be... I

was thinking about this the other day, Amazon seems

to be very dominant in the US, Australia, now they are

in Australia as well. Some European countries, but

more in the Asia space, in the Asian market,

something that people don't recognize or realize that

there's these other players that are gaining so much

momentum that are growing so fast that there's some

countries here, where the people haven't even heard of

Amazon and yet they're shopping online, buying

everything. Even in China, what's it called? That

platform WeChat. I think you can get anything on

WeChat. You can get a car wash through WeChat, it's

ridiculous. It's crazy how big these things have gotten

and yet we just simply don't hear about them for now

until they come and start disrupting the normal world

that we are used to living in.

Frank Kane: Absolutely. I mean, right when I left Amazon was when

they were trying to get into the Asian market a little bit

more. And I mean, it's been a real challenge for pretty

much every US tech company that I can think of.

Right? I mean, it's just a completely different political

climate, completely different culture. And unless you

partner with a big company that's out there existing

already, which is hard to do by the way, it's hard to

break in there for sure.

Kirill Eremenko: Yeah, yeah, for sure. All right. What about the different

recommender systems like new, innovative?

Frank Kane: Yeah. I mean, kind of the thing that evolved after

collaborative filtering was what we call model-based

methods. So basically matrix factorization. So the idea

is if you can think of the recommendation problem as

multiplying two matrices together, that's basically like

your matrix of interests as an individual by some

matrix that ties those interests to other things. That's

just another way of approaching the problem basically.

So we have things like SVM that are used for that,

SVD rather. SVD++ is a specific variation on

SVD that's used for recommender systems that has

really good results.
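
As a rough illustration of "multiplying two matrices together," the sketch below learns a small user-factor matrix and item-factor matrix by stochastic gradient descent on the observed ratings. This is the style of factorization often loosely called "SVD" in recommender circles; it is not a textbook singular value decomposition and it is not SVD++. The ratings, the number of factors, the learning rate, and the regularization strength are arbitrary assumptions chosen to keep the example small.

```python
import random
from math import sqrt

# Hypothetical (user, item, rating) triples; made up for this sketch.
OBSERVED = [
    (0, 0, 5.0), (0, 1, 4.0), (0, 2, 1.0),
    (1, 0, 4.0), (1, 1, 5.0), (1, 2, 1.0),
    (2, 0, 1.0), (2, 2, 5.0),
    (3, 1, 1.0), (3, 2, 4.0),
]
N_USERS, N_ITEMS, N_FACTORS = 4, 3, 2


def init_factors(seed=0):
    """Small random starting values for the user and item factor matrices."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(N_FACTORS)]
             for _ in range(N_USERS)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(N_FACTORS)]
             for _ in range(N_ITEMS)]
    return users, items


def predict(users, items, u, i):
    """Predicted rating = dot product of a user row and an item row."""
    return sum(pu * qi for pu, qi in zip(users[u], items[i]))


def fit_error(users, items, triples):
    """Root mean squared error over the given ratings."""
    squared = [(r - predict(users, items, u, i)) ** 2 for u, i, r in triples]
    return sqrt(sum(squared) / len(squared))


def train(users, items, triples, epochs=500, lr=0.02, reg=0.02):
    """Nudge both factor matrices toward each observed rating, in place."""
    for _ in range(epochs):
        for u, i, r in triples:
            err = r - predict(users, items, u, i)
            for f in range(N_FACTORS):
                pu, qi = users[u][f], items[i][f]
                users[u][f] += lr * (err * qi - reg * pu)
                items[i][f] += lr * (err * pu - reg * qi)


users, items = init_factors()
error_before = fit_error(users, items, OBSERVED)
train(users, items, OBSERVED)
error_after = fit_error(users, items, OBSERVED)

# Score a pair that was never observed (user 2, item 1).
guess = predict(users, items, 2, 1)
```

Once the two matrices are learned, recommending is just scoring the unobserved user and item pairs and sorting them.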

Kirill Eremenko: What does SVD stand for?

Frank Kane: Singular value decomposition. So basically it's a

matrix factorization technique. But yeah, I mean that

was basically one of the winning approaches in what

they call the Netflix Prize a while ago, I don't know if

you've ever heard of that one.

Kirill Eremenko: Yeah.

Frank Kane: So Netflix put out this, I think it was a $1,000,000

bounty was it? If I remember right. For anyone that

could like make a recommender system that was, I'd

have to look at the number, but I think it was 10%

better than what they had, measured by RMSE score.

And as I recall the winning entry actually used SVD as

part of their solution. It was actually more of a hybrid

approach. But that was part of how they did it.
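
For reference, RMSE (root mean squared error) is the square root of the average squared gap between predicted and actual ratings, so lower is better. The short sketch below computes it and the relative improvement of one system over another. The rating lists are invented, and `relative_improvement` is a hypothetical helper, not part of any contest tooling.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    total = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(total / len(actual))


def relative_improvement(baseline_rmse, candidate_rmse):
    """Fraction by which the candidate lowers RMSE versus the baseline."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Invented example ratings on a 1-5 scale.
actual = [4, 3, 5, 2]
baseline_predictions = [3, 3, 4, 3]
candidate_predictions = [4, 3, 4, 2]

baseline_score = rmse(baseline_predictions, actual)
candidate_score = rmse(candidate_predictions, actual)
gain = relative_improvement(baseline_score, candidate_score)
beats_ten_percent = gain >= 0.10
```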

Kirill Eremenko: Yeah.

Frank Kane: So that was kind of like the next generation of

recommender algorithms at that point. And after that

we entered the age of deep learning, right? So now it's

all about, "How do I use a neural network to solve this

problem?" And that's where we get into things like

Amazon DSSTNE. And that's also how companies like

YouTube are doing it as well. They published a really

interesting paper that details exactly how they're doing

their recommendations, using a deep neural network.

Kirill Eremenko: Why do you think they're not afraid to disclose their

intellectual property like that?

Frank Kane: Well, I mean, they're part of Google and Google's

always kind of had this open academia-friendly stance,

right? So I think it's mostly just a company culture

thing. Plus they realize that nobody has their data. So

one thing that I learned at Amazon is you can have...

The quality of your data matters way more than the

quality of your algorithm. In Amazon, if you know

everything that everybody's actually bought that

they've actually spent their money on, you're not going

to get better data on their actual interest than that.

Right? So having that powerful interest data to start

with, means that you can do pretty much anything on

the algorithm side and still get awesome results. And I

think the same is probably true of YouTube as well.

They actually know if you're actually watching a video

and for how long did you actually stick with it all the

way through and they can use that view data to

actually figure out what you're actually interested in.

Right?

Kirill Eremenko: Yeah. This ties into an interesting question, that

value... And this is for real business owners out there

and for heads of departments and executives. The

value is not in your algorithms, the value is in your

data.

Frank Kane: Right.

Kirill Eremenko: I find, still to this day, companies sometimes sit there

and think that they're going to create some miraculous

world changing algorithm. They're super protective of

it. They either patent it, or in most cases they keep it

as, from what I understand that they keep it as a trade

secret so that nobody even [inaudible 00:34:50] get

access to it. But realistically, we live in a world where

Google publishes more than one research paper per

day about machine learning, AI, computer vision, deep

learning. So per day. That's crazy. So there's no way,

and that's all open source. So Python-based,

predominantly TensorFlow or PyTorch for Facebook.

Those things are open source. You can go and

download them and there's no way you're going to beat

Google.

Kirill Eremenko: There's no way you're going to invent something that's

so bespoke that Google's never going to be able to

create that on their side. And it's just going to take so

much resources and effort from the perspective of a

small, medium, even large business. It's just much

easier to go out there, read these research papers,

track what you need, apply it. It doesn't matter that

it's open source because at the end of the day, the

value's not in the algorithm, the value is in the data that

you have.

Frank Kane: Absolutely. I think another motivation for them to

share this research is from a recruiting standpoint too,

right? They want to get smart engineers out there,

learning about how to use their systems and get

excited about them and hopefully they can recruit

them to work at Google. I mean, that's ultimately their

goal. I mean, that's really the number one concern of

these tech companies. They just cannot hire enough

experts in these fields to meet their demand.

Kirill Eremenko: Yeah, yeah, totally. And for recommender systems,

we've seen this evolution that you kindly walked us

through on how they've changed. What I'm noticing is

that they're getting really good. They're getting crazy.

As a user, I go on Netflix and I... Something pops up

and I'm like, "Whoa, that's really cool. I didn't even

know that existed. So glad that I found out about this."

Or I gave this example, I think a couple of podcasts ago

where my mom has a special relationship with

YouTube that she just doesn't even search for videos

herself. She just relies on YouTube to recommend

things. And then she already knows she's going to love

it. And she just goes with the ball and just watches

whatever recommendation comes up. And so whenever

somebody else touches her iPad, she gets a bit

protective of it because she doesn't want-

Frank Kane: Yes, I have a feeling.

Kirill Eremenko: My dad's interests in her YouTube because that's going

to mess with her recommender system. So examples

like that illustrate that they've gotten really good, very

powerful and they know sometimes us better than

ourselves. What kind of future do you see for

recommender systems? Where's this whole space

going? If it's already that good, what can we expect to

appear next?

Frank Kane: Well I think you're right in that the algorithms aren't

going to get that much better. Already I would

say that the difference in quality between deep

learning systems and some of the older systems or

matrix factorization is pretty minimal, quite honestly.

Really comes down to the quality of the data, like you

were saying. So the big leap forward is going to be as

people amass more and more of this data to learn

more and more about you. But now we're like starting

to get into this world of ethics, right? And privacy. So

it's going to be interesting times for sure. Because at

the same time, we don't want these... You don't want

YouTube to know everything about you necessarily,

but you still want good recommendations from

YouTube. Right? You can't have both.

Kirill Eremenko: Yeah.

Frank Kane: So, I'm not really sure how that's going to play out

right now. It's an interesting time for that.

Kirill Eremenko: What do you think of this notion? I was discussing

this with somebody, I think a few podcasts ago as well,

but I'd love your opinion on this, that 100 years from

now, privacy will be such a foreign concept. People will

be looking back on it and be just thinking, "Why was

this even a thing? What did privacy even mean?

What's the definition of privacy?" Because we're so

rapidly moving to a world where people, especially

millennials, are trading in their privacy and anything,

any information they have on themselves, trading it in

for better services, better products, better user

experiences. And that's not even a question to them.

Kirill Eremenko: So this whole privacy issue, from my conversations, I

see it as, I'm more of a... My generation, older

generations, that's a concern for us. But the new

generations are coming around, they don't really worry

about that stuff so much. So right now, yes, there's

some legal struggles and barriers that are being

put in place, but there is a theory that in 100 years

from now there will be no such thing and everything

will be completely publicly available, fully exposed.

What do you think?

Frank Kane: Yeah, I mean I think like you said, the younger

generations are already there. They don't really have a

concept of what privacy even means, right? At least

online. They definitely want physical privacy still, but

online, it's not even a thing. It's not a concept. What

does that even mean to them? I don't know. So I think

we're already there to some extent, honestly. The

question is, what do we do with all that information

that people have given up? And if government started

abusing that information, to persecute people or

something, then people are going to care about privacy

real fast. Hopefully that won't happen. But the other

thing too is, we're using all this personal information

to... This is a very real problem right now, filter

bubbles, trying to create these echo chambers online.

Where we're using a lot of the same technologies that

we developed way back in the day to try to recommend

better books to you to figure out what are your

interests personally and how do we connect you with

more news and information and people and viewpoints

that are consistent with what you already like.

Frank Kane: This is how you end up in these online bubbles, right?

And that's very much a pressing issue right now. And

you have people quitting Facebook because they don't

want any more part of it. So that's what I'm kind of

talking about when I say, it'll be interesting to see

where this all goes. I mean, I myself quit Facebook in

January because of this stuff and I know a lot of my

friends have as well. So as for millennial they-

Kirill Eremenko: Tell a bit more about that, I didn't know you worked at

Facebook.

Frank Kane: Oh, no. I mean I meant I quit Facebook as a user.

Kirill Eremenko: Oh as a user. Okay, yeah.

Frank Kane: Yeah. I deleted my account.

Kirill Eremenko: Yeah. Got you. No, yeah, definitely some of these

things are very controversial. Yeah, it'll be

interesting to see where it goes. But one question that

you might be able to help guide our audience in the

right direction is, if somebody wants to get into the

space of recommender systems, right? There's lots of

spaces in data science, machine learning, deep

learning that are, sorry and big data that are very

exciting. But I guess recommender systems is one of

those that is kind of on the verge of these converging

or on the overlap of these converging areas that we

talked about of big data. In recommender systems,

there's often this big data, there's a lot of data.

Kirill Eremenko: At the same time machine learning and data science, it

could be an interesting place for people to dive into if

they want to be in between these fields. So what would

your advice be for somebody who wants to get into

recommender systems, but doesn't have much

experience in the space? Zero to not much. Where

should they start? What should they look into? And in

general, how would you recommend going about

getting into this space of recommender systems?

Frank Kane: Well, I would say first and foremost to be a good

software developer. When I was at Amazon, we hired

software development engineers primarily. We didn't

really care what their specialization was, we just cared

that they were smart enough to write code and do it

well. And we figured if you can do that, you can learn

anything because this stuff changes every freaking

day. Right? So we didn't really focus on hiring people

for specific skills. Like in my case, they hired a guy

that did visual simulation in video games and just

taught him how to do this stuff when he came in. So

step one is to be a good software engineer and maybe

that means Python, if you want to start off easy, that's

certainly, it's still a great choice, but just get proficient

in some sort of programming, if you aren't already.

Frank Kane: Beyond that, you're going to need some background in

linear algebra. To understand the algorithms, you

need to have at least that level of mathematical

background to understand what's going on. Right?

And from there you can start to actually learn the

actual algorithms and techniques, either from my

course or a book or however you want to do it, or

online resources. Everyone learns different ways.

That's cool. And then you can actually start playing

around with small datasets, on your own PC. One that

I like to use is called the MovieLens dataset. I don't

know if you know that one. Basically they have...

Really? Yeah, go to grouplens.org, I think it is. And

they have this free dataset of movie ratings that I

love to play with, probably because I used to work at

IMDB. So I have a soft spot for movies but they have

different sizes you can mess with.

Frank Kane: So they have like a 100,000-rating dataset and then

they have a 20 million dataset. And so you can work

your way up to bigger and bigger data, but you can

start just playing around, on small datasets, get a

sense of how these algorithms work, experiment with

them, try different ways of doing it. That's really what

it's all about. Just experimenting with different ideas

and different tweaks and different parameter tuning

and well, hyper parameter tuning I guess, that's the

technical term for it all these days. And then you can

think about scaling it up, right?
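
A first experiment on a small ratings file can be as simple as the sketch below. It assumes the tab-separated layout used by the 100K MovieLens ratings file (user id, item id, rating, timestamp); check the README that ships with whichever dataset size you download, since other sizes use different layouts. The sample rows are invented placeholders, and `parse_ratings` and `mean_rating_per_item` are hypothetical helper names.

```python
import csv
import io
from collections import defaultdict

# Invented placeholder rows in the assumed layout:
# user id, item id, rating, timestamp (tab-separated).
SAMPLE = "1\t10\t4\t881250949\n1\t20\t3\t881250950\n2\t10\t5\t881250951\n"


def parse_ratings(handle):
    """Yield (user_id, item_id, rating) tuples from a tab-separated file object."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        user_id, item_id, rating = row[0], row[1], row[2]
        yield int(user_id), int(item_id), float(rating)


def mean_rating_per_item(rows):
    """Average rating for each item id."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, rating in rows:
        totals[item] += rating
        counts[item] += 1
    return {item: totals[item] / counts[item] for item in totals}


item_means = mean_rating_per_item(parse_ratings(io.StringIO(SAMPLE)))
```

To run it on a downloaded file, swap `io.StringIO(SAMPLE)` for an opened file handle.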

Kirill Eremenko: Yeah.

Frank Kane: So then you can start to think about how do I blend

this with tools like Apache Spark. If I'm going to be

using neural networks, can I use TensorFlow to

distribute this across a cluster? That would be kind of

the final stage. And once you're at that stage, I would

say, start messing around and do some freelance

work. Prove that you can actually do this and build

something. And at that point you will probably be able

to find a job in this field.

Kirill Eremenko: So the jobs are there, people want to hire people for

recommender systems?

Frank Kane: Yeah, I mean that's just central to a lot of the big

technical companies out there, right? I mean, Amazon,

we talked about huge part of their revenue, YouTube

huge part of their views. Netflix, it's what they're all

about. Their entire company is about

recommendations, fundamentally. They're just built

around the whole thing. And a lot of people don't

realize that. Yes, I mean, deep neural networks are

hot, but really it's recommender systems that these

companies are built around and they cannot find

enough people who know this stuff.

Kirill Eremenko: Yeah, no, that's really great advice. Thank you so

much. At this stage I wanted to shift gears a little bit

and talk about what you mentioned just before we

started the podcast that at Amazon you were part of

the hiring and recruiting process. We'd love to learn a

bit more about that and maybe there's some tips and

tricks you can share for people to get hired at Amazon

or maybe even beyond that.

Frank Kane: Yeah, definitely. Yeah, so part of my duties at Amazon

is I was what they called a bar raiser. And this is

basically a role where you spend a lot of your time

doing interviews, both phone interviews and in-person

interviews, mostly in-person interviews. So whenever

there's an interview loop at Amazon there's one

person on that loop called a bar raiser that interviews

you and it's not necessarily someone that's in the team

you're interviewing with or even the same department.

Frank Kane: Their role is to sort of make sure that Amazon

standards for hiring are being applied consistently

across the entire company. So I was that guy. So it

meant that I had veto authority over every hire that

came across my desk basically. And I led all the hiring

discussions where we decided whether or not to hire

someone. Right? So a lot of influence there. And as a

result, I ended up interviewing over a thousand people,

I think while I was there or some crazy number.

Kirill Eremenko: Wow.

Frank Kane: Yeah. So as far as tips go for getting into Amazon, my

number one tip is to always think in terms of the

customer. It's not just lip service when Amazon says

that they're customer focused, it really does permeate

their entire culture. And anytime that you can tie a

question or a problem that you solved from the

viewpoint of the customer, you're going to get major

brownie points. All right. So anytime you're asked to

design a system, work backwards from the customer

experience, start with what will the customer get out of

this system? What do they want to see? What are

their requirements? How fast does it need to be for

them? Right? What results do they want to see? And

then figure out what technology you'd have to build to

deliver that experience. Don't start from the bottom,

don't say, "I know this cool algorithm and I would use

this cool algorithm and build it out and hopefully

customers would like it." That's the wrong answer. So,

always start with the customer experience is tip

number one.

Kirill Eremenko: Great tip, great tip. What else?

Frank Kane: Well, you can go online and look for Amazon's

leadership principles. And customer obsession is

number one, but there's others as well. And I would

just encourage you to familiarize yourself with all of

those leadership principles. The other ones are

ownership, invent and simplify, are right a lot, learn and be

curious, insist on high standards, think big and really

internalize what these all mean and come up with

stories that you can talk about where you've exhibited

these qualities on your own. Because again, you're

going to get a lot of interviews with managers and

bar raisers like myself who aren't necessarily part of

the team that you're interviewing, you're going to be

put on. And these are the things they're really looking

for. Do you fit with Amazon's culture and way of

thinking?

Frank Kane: Obviously you need to be technically competent as

well. It's going to be a very long and grueling day there

writing code on the whiteboard and solving design

problems on the whiteboard. So by all means, you

have to be ready to do that. You have to have really

strong coding skills, really strong systems design

skills. That's going to be the case for any interview.

But what's different about Amazon is they actually

care about what they say about their values and

principles that they live by. And you need to

demonstrate that yourself.

Kirill Eremenko: Very interesting. What would you say has been the

biggest mistake that you've seen recurring on the

entries that people make?

Frank Kane: Oh man, you'd be amazed. It's just like not knowing

how to code.

Kirill Eremenko: No way.

Frank Kane: Yeah. You'd be amazed how many, especially in phone

interviews, usually they get weeded out by the time

they actually come in the house.

Kirill Eremenko: Yeah.

Frank Kane: But we used to have a... Have you heard of fizzbuzz?

Kirill Eremenko: Nope.

Frank Kane: Okay. This is one of the interview questions that we

use for screening out people and it's widely known, so

I'm not giving away anything secret here. The problem

is this, iterate through the numbers one through 100

and write code that, if it's an even number, prints fizz,

and if it's an odd number, prints buzz, or something

like that. I forgot the exact structure of the problem,

but it's just that simple, right?
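
For readers who want to try it, here is a minimal Python sketch of both the version described above (even prints fizz, odd prints buzz) and the more commonly cited form of FizzBuzz, which uses multiples of 3 and 5. Since the exact wording used in the Amazon screens is not given in the conversation, treat both as illustrations; the function names are made up.

```python
def parity_fizzbuzz(limit=100):
    """The variant as described in the conversation: even -> fizz, odd -> buzz."""
    return ["fizz" if n % 2 == 0 else "buzz" for n in range(1, limit + 1)]


def classic_fizzbuzz(limit=100):
    """The commonly cited variant: multiples of 3, 5, and both."""
    lines = []
    for n in range(1, limit + 1):
        if n % 15 == 0:
            lines.append("fizzbuzz")
        elif n % 3 == 0:
            lines.append("fizz")
        elif n % 5 == 0:
            lines.append("buzz")
        else:
            lines.append(str(n))
    return lines


if __name__ == "__main__":
    print("\n".join(classic_fizzbuzz()))
```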

Kirill Eremenko: There's no catch trick?

Frank Kane: No, that's it.

Kirill Eremenko: Okay.

Frank Kane: I'd say about 5% of the people couldn't do it.

Kirill Eremenko: No way. That's like a five-minute exercise.

Frank Kane: Yeah, yeah. You'd be amazed. So make sure you can

write code guys. That's my main tip. But beyond that,

just make sure you're well rested. A lot of people come

in kind of like low energy because they flew from

someplace far away the night before and didn't have

enough coffee or whatever. But you just got to have a

lot of stamina to get through the day, if you do come in

house. So make sure you're rested, drink whatever

beverages you want to drink to stay alert and whatever

hack you have to do to make sure that you keep your

energy level up throughout that very challenging

day.

Kirill Eremenko: Very interesting. So know how to code and keep your

energy up. I wasn't expecting those two tips as the

most common mistakes. All right. What would you say

is like the biggest, I don't know biggest advantage of

somebody who comes in for an interview if they have

this skill or have this experience, or can demonstrate

something that... They're almost right away. Everybody

knows, "Okay, this is the person." Have you ever had

that feeling, you see a person, you haven't interviewed

them much, but almost right away you can tell this

person is going to make a great addition to the team.

We definitely want them on board.

Frank Kane: Ooh. I'm always careful in those situations because

sometimes your gut is wrong. Right? I mean, human

brains are fickle things as I'm sure you know, now that

we know how to simulate them to some extent. So I've

been a manager long enough to know that it is very

easy to make bad hiring decisions on someone that

looked great on paper or came across as very

charismatic. Right? You really need to separate that

charisma from how they are going to be able to interact

with your team. Are they going to be a "team player"

that doesn't have a huge ego to deal with, things like

that. So I've never been in a situation where like, "Oh

my God, I talked to this person once and we absolutely

have to hire them right now." But after two or three

interviews, yeah, there've definitely been cases where

I'm like, "We really got to get this person here. Pull out

all the stops, make them an awesome offer. Whatever

they want, give them twice that."

Frank Kane: But when it comes to stock grants and things like that,

they often had quite a bit of discretion as to what they

could offer people to get people that they really wanted.

Kirill Eremenko: Got you. And what has been the most common trigger

for you to use your veto power and not hire somebody

that maybe even others thought was a good hire?

Frank Kane: Yeah, I mean after doing that many interviews, you

kind of like learn what a good engineer looks like. And

I guess the thing that would probably give me the most

pause would be someone who pretended that they had

more experience and knowledge than they really did.

They're kind of a little bit deceptive on their resume. I

can uncover that pretty quickly. That's not cool. So

don't do that. Or someone whose coding ability at that

point just wasn't up to snuff. Right?

Kirill Eremenko: Yeah.

Frank Kane: The main problem that... The reason that the bar

raiser existed is because there's huge pressure to hire

at Amazon or any big technology company because

there just aren't enough good engineers in the world to

go around. And a lot of these teams are really

desperate to fill positions. Their number one

goal is to just fill seats within their team and get more

engineers working on whatever they have to deliver.

And my role was to make sure that they didn't get so

desperate that they lowered their standards. Right? So

that's what that's all about.

Kirill Eremenko: It's interesting, isn't it? That there's so many, as you

say, seats in the companies and they're just so eager

to hire people and on the other hand, we have such a

huge pool of candidates, so many data scientists,

engineers out there who want to get hired. It's just like

the bottleneck is that weeding out process and finding

the talented people, which there's plenty of as well, but

they're rare, right? Compared to millions or hundreds

of thousands of people who want to get hired. Those

hundreds or dozens of people that are really talented

and still also want to get hired. They really need to

stand out somehow for... If they had a beacon above

their head that, "Hey, I'm talented." You'd hire them in

a heartbeat. But that's not the case.

Kirill Eremenko: You have to go through this process. So is there

anything that talented people, which I'm assuming

many of our... most of the

people listening to this podcast are, who care about

their careers already by definition because they're

listening to career advice on these topics... is there

anything that they can do to help recruiters such as

yourself or such as who you were back in your past life

at Amazon to identify them, to make that whole process

easier and that match happen faster?

Frank Kane: Yeah, I mean, it's like you said, you've got to build that

beacon above your head, right? So here's the reality of

the situation. Everyone applies to Amazon and

Google and all these big companies and they don't

even look at the resumes that are submitted to them

because there's just so many of them. And weeding

through them all is impossible. Instead, they will come

to you, right? So you want to make sure that you've

done something that's going to catch the attention of a

hiring manager or recruiter at the company that you

want to get hired at. One way to do that is to know

somebody, right? So if you know somebody who

already works at the company you want to work with,

oftentimes they get referral bonuses, if someone that

they recommend gets hired. And that's probably the

best way to get your foot in the door.

Frank Kane: So really scour your social network, scour LinkedIn,

see if you know anybody or if you have a friend of a

friend at the company that you want to get into

because that might be your best way to get noticed.

But beyond that, if you don't, make sure you're

winning coding competitions. Make sure you have stuff

on GitHub that people can find, get published. Put out

a blog, make sure you're on LinkedIn and have the

right keywords that they're looking for there. Because

the recruiters are looking for you. They're not waiting

for you to come to them. Right? Beyond that, I mean,

obviously the more traditional channels like college

recruiting are an important source of new hires for

these companies as well.

Frank Kane: Career fairs and stuff at colleges or obviously, if you

graduated from Stanford, you're probably going to get

a call from all of these people, right? But not everybody

can afford to go to Stanford. So for everyone else, you

just have to make sure that your profile stands out

online and your accomplishments are easy for them to

find.

Kirill Eremenko: Fantastic. And I just want to add to that, that in the

process of you putting up all these things online,

whether on GitHub, on Medium, blog posts, videos,

whatnot, you're going to make connections, right?

People who are already at Amazon, they're not just sitting

there and twiddling their thumbs and just doing Amazon

work or whatever other company they're in. They also

go out there and they also read, they also want to

know what's new, what's been happening in the competition

space, what's new on GitHub, what's new... a

recommender system that somebody is exploring. So

inevitably the more stuff you put out there, sooner or

later somebody from Amazon's going to read it and

they might ask you a question and then you talk to

them and then you can build that connection.

Kirill Eremenko: So you don't have to just set yourself a target,

"I have to know somebody." And even if you do, like

Frank said, even if you do this part of

just building your online presence, eventually you'll

build these connections in a very natural way. And

sooner or later somebody from Amazon or Apple or

whoever else you want to get into is going to come

across your way. So yeah, these two come hand in

hand and it's a self-fulfilling prophecy as long as

you invest the time and effort and energy into it.

Frank Kane: Yeah, I agree completely. I mean everybody at these

companies is invested in hiring. It's not just the

hiring managers and recruiters, and if they come

across something you've done online and they like it,

they very well may reach out to you. So you're

absolutely right.

Kirill Eremenko: Fantastic. Well thanks a lot Frank. We've slowly come

to the end of this podcast and super pumped about

the chat that we had. Before I do let you go, please tell

us a couple of places where our listeners, our audience

can follow you, get to know you better and see what

new things you'll get up to in the coming months and

years.

Frank Kane: Yeah, I mean, if you want to check out what I'm up to,

you can head to my website, which is

sundog-education.com. And from there you can follow me on

whatever social media you wish. And also you'll find...

we've got to give a tip of the hat to Manning

Publications at manning.com and you can find my

couple of new courses from them under their live video

tab there. The Elasticsearch 6 and The Ultimate

Introduction to Big Data are found there.

Kirill Eremenko: Fantastic. And is it okay for our audience to connect

with you on LinkedIn as well?

Frank Kane: Absolutely. The more the merrier. So bring them on.

Kirill Eremenko: Fantastic. Awesome. Well, Frank, thanks so much.

One last question I have for you today is what's a book

that you can recommend to our listeners that's

changed your life?

Frank Kane: Ooh, that's changed my life. The most recent one that I

read is a big thick book called... let's see. I have it right

here, Recommender Systems Handbook. And it's

basically a huge collection of papers from various

researchers in the field including Netflix and people

like that. So as I was preparing my recommender

system course, that was a hugely valuable resource for

getting caught up on the current state of the art. And

for someone new to the field, I think it's sort of

required reading for figuring out what's out there and

getting a broad lay of the land of the techniques that

are being used today.

Kirill Eremenko: Awesome. Is this by Francesco Ricci? [crosstalk

00:58:38] up on Google.

Frank Kane: Yeah, it's published by Springer.

Kirill Eremenko: Springer, yeah. Published by Springer. Yeah, I found

it-

Frank Kane: Yes, I'll pull it out here from my bookshelf. Yeah,

Francesco Ricci, that's right, they're the editors.

Kirill Eremenko: Okay.

Frank Kane: It's not cheap but it's worth it.

Kirill Eremenko: Yeah, definitely. The best things in life, sometimes they're

free, sometimes you've got to buy them and then they'll

change your life. Okay. On that note, Frank, thanks so

much once again for coming on the show and sharing

all the insights and knowledge. Really cool chat and

yeah, catch you soon. Maybe at Udemy Live this year.

You're going?

Frank Kane: No, I'm not going this year, but definitely next year.

Kirill Eremenko: Okay, no worries. We'll catch you around then. Thanks so

much for coming on the show.

Frank Kane: All right, good talking to you.

Kirill Eremenko: Thank you ladies and gentlemen, boys and girls for

being part of this conversation. My favorite part was

about the convergence of data science and big data.

It's very interesting how these two fields are becoming

more and more intertwined. And of course there were

plenty of other great and useful insights throughout

the podcast. A huge shout out goes to Manning

Publications, which are hosting some of Frank's

courses. So you can find Frank either on Manning

Publications or on Udemy, and if you haven't taken

any of his courses yet, highly recommend checking

them out, especially if you're interested in getting into

the space of big data after today's podcast.

Kirill Eremenko: As usual, you can get all the show notes at

superdatascience.com/265, that's

www.superdatascience.com/265. There, you'll find all

of the resources, materials that were mentioned on

this episode plus the transcript for the episode. And

plus of course, any links to Frank's social media where

you can get in touch with him, you can follow his

career or simply check out his courses. On that note,

thank you so much for being here today. I am very

grateful that you're part of the SuperDataScience

podcast and the SuperDataScience journey, and the

community that we're building. If you're

not aware yet, we actually just launched a

Slack channel for SuperDataScience members.

Kirill Eremenko: So if you're a member at SuperDataScience, you must

have gotten an email. Make sure to join that Slack

community that we're building. It's not just one Slack

channel, it's actually a multitude of Slack channels in

a Slack community, where you can chat to each other,

to me, to instructors and if you're not a

SuperDataScience member yet, then make sure to

check out superdatascience.com where we're adding

new features all the time. On that note, thank you so

much and I'll look forward to seeing you back here

next time. Until then, happy analyzing.