
Introduction to Data Gravity

By: John Tkaczewski

President of FileCatalyst

March 4, 2015

Data Gravity

• A term first coined by Dave McCrory circa 2010

• Data is difficult to move around

• Data attracts a greater and greater amount of apps, services, and other tools as it grows

Why is the data “stuck”?

Throughput and latency

• As the throughput and latency requirements for accessing the data increase, the gravitational pull of the data mass also increases

• This forces apps and services to move closer to the data (a toy sketch of the idea follows below)
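As a rough intuition, the pull can be pictured with a Newton-style formula in which latency plays the role of distance. The sketch below is a toy illustration of the concept only, not McCrory's published formula; the masses, latencies, and the formula itself are assumed values for demonstration.

```python
def data_gravity(data_mass_gb: float, app_mass_gb: float, latency_ms: float) -> float:
    """Toy analogue of F = m1 * m2 / r^2, with latency standing in for distance."""
    return (data_mass_gb * app_mass_gb) / (latency_ms ** 2)

# The same 10 TB dataset, seen by a 1 GB app from two locations:
near = data_gravity(10_000, 1, 0.5)  # same data center, 0.5 ms RTT
far = data_gravity(10_000, 1, 150)   # across an ocean, 150 ms RTT
print(f"pull is {near / far:,.0f}x stronger up close")  # 90,000x
```

The exact formula matters less than its shape: the pull grows with the mass of the data and falls off steeply with network distance, which is why apps and services drift toward big datasets.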

If the model stopped here… all apps and services would end up in a single giant online BLOB (the cloud) to be closer to the data

There are other forces that keep some data away…

Forces that push away

• Privacy

• Security

• Cost

• Features and convenience

There is a balance between the gravity and the “Forces that push away”

Real-Life Scenario: USB Thumb Drive vs. Amazon S3

Amazon S3:

• Unlimited, flexible, growing storage

• Easy sharing with the rest of the world

• Security

USB thumb drive:

• Convenience

• Fast access to the data

• Practically free

• Can be physically moved

Data Gravity on the Cloud

• Cloud providers make inbound data as "light" as possible (ingress is free or cheap)

• Outbound data is made as "heavy" as possible (egress is metered)

• Cost in vs. cost out (see the example below)

• Make the context of the data proprietary (example of a picture on Flickr, from http://datagravity.org/)
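To make "cost in vs. cost out" concrete, here is a minimal sketch with hypothetical prices that mirror the common public-cloud pattern of free ingress and metered egress; the rates are assumptions for illustration, not any vendor's actual price list.

```python
# "Cost in vs. cost out" with assumed prices: ingress free, egress metered.
INGRESS_PER_GB = 0.00  # uploading is free: inbound data is "light"
EGRESS_PER_GB = 0.09   # downloading is billed: outbound data is "heavy"

data_gb = 50_000  # a hypothetical 50 TB media archive

print(f"cost to move in:  ${data_gb * INGRESS_PER_GB:>9,.2f}")  # $     0.00
print(f"cost to move out: ${data_gb * EGRESS_PER_GB:>9,.2f}")   # $ 4,500.00
```

The asymmetry is the gravity well: once the archive is in, every byte that leaves has a price on it.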

Data Gravity as a computational theory

• Borrows from gravitational theory

• Similar to the way nations negotiate trade tariffs and trade agreements between countries and cities (ref)

• Shannon's law: how much information can be squeezed down a wire (see the worked example below)

• The von Neumann bottleneck: how fast data can move from persistent storage to memory to CPU cache to the CPU
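The Shannon's-law bullet refers to the Shannon–Hartley capacity, C = B · log2(1 + S/N), the hard ceiling on how much information a channel of bandwidth B and signal-to-noise ratio S/N can carry. A quick check of the formula, with an assumed 1 MHz channel at 30 dB SNR:

```python
import math

def shannon_capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley: C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

snr = 10 ** (30 / 10)  # 30 dB -> linear ratio of 1000
print(f"{shannon_capacity_bps(1e6, snr) / 1e6:.2f} Mbps")  # ~9.97 Mbps
```

No amount of clever encoding moves data down that wire faster; past the Shannon limit, the only options are more bandwidth or a better signal-to-noise ratio.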

How does accelerated file transfer fit into all of this?

Traditional File Transfers

FTP, SFTP, HTTP, WebDAV, SMTP, CIFS, etc.

• All use TCP

• Provides reliability, error checking, and ordered delivery of packets in a stream

• Congestion control built in

• The internet could not survive without it

• Works well for most internet traffic: email, web browsing, small ad-hoc transfers

Problems with TCP

• Flow control limits the transmission window, causing "dead air" on high-latency links (see the arithmetic below)

• Very aggressive in response to network congestion, and cannot be tuned in the application layer

• The result is less-than-ideal performance on wireless, satellite, or long-haul links

• Can be tuned, but still not ideal for many-to-one or one-to-many transfers
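The "dead air" bullet is bandwidth-delay-product arithmetic: TCP keeps at most one window of unacknowledged data in flight, so throughput is capped at window ÷ RTT no matter how fat the pipe is. A back-of-the-envelope sketch, where the 64 KB window (i.e. no window scaling) and the 100 ms RTT are assumed values:

```python
def tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Ceiling on TCP throughput: one window per round trip."""
    return (window_bytes * 8) / (rtt_ms / 1000) / 1e6

# A 64 KB window on a 1 Gbps transatlantic link with 100 ms RTT:
print(f"{tcp_throughput_mbps(64 * 1024, 100):.1f} Mbps")  # ~5.2 Mbps of 1000
```

For the other ~99.5% of every round trip, the sender sits idle waiting for ACKs: that idle time is the dead air.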

File Transfer Acceleration

• Ideal for bulk file transfer

• Predictable: can send at a precise, steady rate

• Not affected by latency or packet loss

• Congestion control implemented in the application layer (a sketch follows below)

• Tunable congestion-control aggression

• Instantly detects link capacity
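To make the application-layer approach concrete, here is a minimal sketch of a rate-paced, UDP-style sender: it pushes datagrams at a fixed target rate instead of letting TCP's window/ACK cycle throttle it. This shows only the general shape of the technique, not FileCatalyst's actual protocol; the target rate, block size, and receiver address are illustrative assumptions, and reliability (retransmitting lost blocks) and the tunable congestion control are omitted for brevity.

```python
import socket
import time

TARGET_MBPS = 100           # application-chosen send rate (assumed)
BLOCK_SIZE = 1400           # payload sized to fit a typical MTU
DEST = ("127.0.0.1", 9000)  # hypothetical receiver address

def send_paced(data: bytes) -> None:
    """Send data as UDP blocks at a steady TARGET_MBPS, regardless of RTT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    seconds_per_block = (BLOCK_SIZE * 8) / (TARGET_MBPS * 1e6)
    next_send = time.monotonic()
    for offset in range(0, len(data), BLOCK_SIZE):
        # Tag each block with its byte offset so a receiver could reorder
        # blocks and request retransmits of any that were lost.
        sock.sendto(offset.to_bytes(8, "big") + data[offset:offset + BLOCK_SIZE], DEST)
        next_send += seconds_per_block
        delay = next_send - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # pacing keeps the rate fixed on lossy or high-RTT links

send_paced(b"x" * 10_000_000)  # ~10 MB at ~100 Mbps, independent of latency
```

Because the send rate is set by the application, transfers stay predictable on the long-haul links where TCP collapses; the trade-off is that the application itself must now behave as a good network citizen, which is why tunable congestion control matters.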

Overall, the effects of Data Gravity are reduced (like anti-gravity)

• Data gravity still exists, but it is reduced by eliminating the latency component

• Gravity still pulls toward every storage location

• With faster-moving data, the owner now has more choices about where to store it.

Cloud growth vs. geographical location of the users

• It's not always possible to make cloud services available near all the users

• File Transfer Acceleration can help reach those faraway users at a lower cost than building a new data center

Future…

• Cloud services will continue to expand (they are the money maker)

• Local and personal storage will continue to be needed, but merely as a cache for what's in the cloud

• Throughput will continue to increase, but latency will stay the same (speed of light++, anyone?)

• The need for faster file transfers will continue to grow as the cloud, the data, and the links get bigger.

Thank you.