Data Governance and Schema Registry for Software Developers
TRANSCRIPT
1
Simplify Governance of Streaming Data
Gwen Shapira
Product Manager, Apache Kafka™ PMC
2
Attend the whole series!
Simplify Governance for Streaming Data in Apache Kafka
Date: Thursday, April 6, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Gwen Shapira, Product Manager, Confluent
Using Apache Kafka to Analyze Session Windows
Date: Thursday, March 30, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Michael Noll, Product Manager, Confluent
Monitoring and Alerting Apache Kafka with Confluent
Control Center
Date: Thursday, March 16, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Nick Dearden, Director, Engineering and Product
Data Pipelines Made Simple with Apache Kafka
Date: Thursday, March 23, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Ewen Cheslack-Postava, Engineer, Confluent
https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/
What’s New in Apache Kafka 0.10.2 and Confluent 3.2
Date: Thursday, March 9, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Clarke Patterson, Senior Director, Product
Marketing
3
What is Data Governance?
• Data Access
• Master Data Management
• Data Agility
• Data Reliability
• Data Quality
4
To Grow a Successful Data Platform, You Need Some Guarantees
5
If you can’t trust your data,
what can you trust?
7
Schemas Are a Key Part of This
1. Schemas: Enforced Contracts Between your Components
2. Schemas need to be agile but compatible
3. You need tools & processes that will let you:
• Move fast and not break things
9
Cautionary tale: Uber
[Diagram: Producers (Service 1, Service 2) → S3 → EMR → Redshift]
Producer payload: {created_at: "Dec 1, 2015"}
Warehouse table: CREATE ( created_at: STRING )
ETL: parseDate(created_at, "%M %d, %yyyy")
10
Cautionary tale: Uber (continued)
[Diagram: Producers (Service 1, Service 2) → S3 → EMR → Redshift]
Producer payload: {created_at: 1491368429}
Warehouse table: CREATE ( created_at: STRING )
ETL: parseDate(created_at, "%M %d, %yyyy")
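The failure above can be reproduced in a few lines. Here `parse_created_at` is a hypothetical stand-in for the warehouse ETL's `parseDate`, using Python's `"%b %d, %Y"` in place of the format string on the slide: the producer silently switches `created_at` from a date string to epoch seconds, and the downstream parser breaks.

```python
from datetime import datetime

def parse_created_at(value):
    """Downstream ETL parser that assumes created_at is a 'Mon D, YYYY' string."""
    return datetime.strptime(value, "%b %d, %Y")

# Old producer payload: parses fine.
old_event = {"created_at": "Dec 1, 2015"}
assert parse_created_at(old_event["created_at"]).year == 2015

# New producer payload: same field name, new type -- the pipeline blows up.
new_event = {"created_at": 1491368429}
try:
    parse_created_at(new_event["created_at"])
except TypeError:
    print("pipeline broken: created_at is no longer a string")
```

Nothing caught the change at produce time; the breakage only surfaced far downstream, in someone else's job.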
11
The right way to do things
[Diagram: services and Kafka topics – Customer, Message, RoomGifts, BookRoom, LoyalCustomers]
12
The right way to do things
[Diagram: the same services and topics – Customer, Message, RoomGifts, BookRoom, LoyalCustomers – plus a new BeachPromo service added without touching the others]
BeachPromo:
if (property.location == beach && customer.level == platinum)
  sendMessage({topic: RoomGifts, customer_id: customer.id, room: property.roomnum, gift: seaShell})
13
There is special value in standardizing parts of schema
{ "type": "record",
  "name": "Event",
  "fields": [
    { "name": "headers", "type": {
        "name": "headers",
        "type": "record",
        "fields": [
          { "name": "source_app", "type": "string" },
          { "name": "host_app", "type": "string" },
          { "name": "destination_app", "type": ["null", "string"] },
          { "name": "timestamp", "type": "long" }
        ] } },
    { "name": "body", "type": "bytes" } ] }
Standard headers allowed analyzing patterns of communication
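The header layout above can be exercised with a small stand-in validator. This is plain Python over dicts; `make_event`, `REQUIRED_HEADERS`, and the sample values are illustrative, not a Confluent API (the real enforcement comes from the Avro schema itself):

```python
# Required header fields, mirroring the Avro record above.
REQUIRED_HEADERS = {"source_app", "host_app", "timestamp"}

def make_event(source_app, host_app, body, destination_app=None, timestamp=0):
    """Build an event with the standard header block plus an opaque body."""
    return {
        "headers": {
            "source_app": source_app,
            "host_app": host_app,
            "destination_app": destination_app,  # nullable, like ["null", "string"]
            "timestamp": timestamp,
        },
        "body": body,
    }

def validate(event):
    """Reject events whose required headers are missing."""
    missing = REQUIRED_HEADERS - event["headers"].keys()
    if missing:
        raise ValueError(f"missing headers: {sorted(missing)}")
    return event

event = validate(make_event("checkout", "web-7", b"...", timestamp=1491368429))
```

Because every topic shares the same header block, one generic job can map who talks to whom across all services.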
15
In the Request-Response World – APIs between services are key
In the Async Stream Processing World – Message Schemas ARE the API
…except they stick around for a lot longer
16
Let's assume you see why you need schema compatibility…
How do we make it work for us?
17
Schema Registry
[Diagram: App 1 and App 2 each produce through a Serializer into a Kafka Topic; the Schema Registry sits alongside Kafka; example consumers: Elastic, Cassandra, a Streams App]
• Define the expected fields for each Kafka topic
• Automatically handle schema changes (e.g. new fields)
• Prevent incompatible changes
• Supports multi-datacenter environments
18
Key Points:
• Schema Registry is efficient – due to caching
• Schema Registry is transparent
• Schema Registry is Highly Available
• Schema Registry prevents bad data at the source
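One reason the registry is efficient: serialized messages carry only a small schema-ID prefix (which consumers cache), not the schema itself. A sketch of that framing, assuming the well-known 5-byte layout (one magic byte plus a big-endian 4-byte schema ID) used by Confluent's serializers:

```python
import struct

MAGIC_BYTE = 0  # Confluent wire format: 1 magic byte + 4-byte schema ID + payload

def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an encoded record with its registry schema ID (big-endian)."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload

def unframe(message: bytes):
    """Split a framed message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("not a schema-registry framed message")
    return schema_id, message[5:]

msg = frame(42, b"\x02\x06foo")
sid, payload = unframe(msg)
assert (sid, payload) == (42, b"\x02\x06foo")
```

Consumers look the ID up in the registry once and cache the schema, so every subsequent message costs just the 5-byte prefix instead of a full embedded schema.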
19
Big Benefits
• Schema Management is part of your entire app lifecycle – from dev to prod
• Single source of truth for all apps and data stores
• No more “Bad Data” randomly breaking things
• Increased agility! Add, remove and modify fields with confidence
• Teams can share and discover data
• Smaller messages on the wire and in storage
20
But the Best Part is…
[Diagram: RDBMS → JDBC Source → Kafka w/ Schema Registry → HDFS Sink → Hadoop / Hive, with the schema carried along every hop]
21
Everyone ♥ Schema Registry
22
Compatibility tips!
• Do you want to upgrade producers without touching consumers?
  • You want *forward* compatibility.
  • You can add fields.
  • You can delete fields with defaults.
• Do you want to update the schema in a DWH but still read old messages?
  • You want *backward* compatibility.
  • You can delete fields.
  • You can add fields with defaults.
• Both?
  • You want *full* compatibility.
  • Default all the things!
  • Except the primary key, which you never touch.
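A toy illustration of why defaults drive all three compatibility modes. `read_with_schema` is a simplified stand-in for Avro's schema resolution, not the real library; `...` marks a field with no default:

```python
def read_with_schema(record, reader_schema):
    """Read a record written with one schema using another, filling defaults."""
    out = {}
    for field, default in reader_schema.items():
        if field in record:
            out[field] = record[field]
        elif default is not ...:
            out[field] = default
        else:
            raise ValueError(f"no value and no default for {field!r}")
    return out

v1 = {"id": ..., "name": ...}                  # original schema
v2 = {"id": ..., "name": ..., "email": None}   # added a field WITH a default

# Backward compatible: the new schema (v2) reads old (v1) data.
old_record = {"id": 7, "name": "gwen"}
assert read_with_schema(old_record, v2) == {"id": 7, "name": "gwen", "email": None}

# Forward compatible: the old schema (v1) reads new (v2) data,
# simply ignoring the added field.
new_record = {"id": 7, "name": "gwen", "email": "g@x"}
assert read_with_schema(new_record, v1) == {"id": 7, "name": "gwen"}
```

Add the same field *without* a default and the backward-compatible read above raises instead, which is exactly the class of change the registry's compatibility check rejects before it reaches a topic.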
23
App Lifecycle
Stages where schema compatibility checks can be applied, and how advisable each is:
• Development – let's try
• Source Repository – let's do it!
• Pull Request – optional, but a good idea
• CI ("nightly build") – YES
• Staging / Test – almost too late
• Production – last resort
24
Finding Schemas
1. Write Schemas -> Generate Code
2. Get Schemas from registry -> Generate code
3. Write code -> Generate Schemas
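Option 3 above ("write code → generate schemas") can be sketched with stdlib dataclasses. `to_avro_schema` and the type map below are an illustrative sketch, not a Confluent tool:

```python
from dataclasses import dataclass, fields

# Map Python annotations to Avro primitive type names (illustrative subset).
AVRO_TYPES = {int: "long", str: "string", float: "double",
              bytes: "bytes", bool: "boolean"}

def to_avro_schema(cls):
    """Derive an Avro-style record schema from a dataclass's field annotations."""
    return {
        "type": "record",
        "name": cls.__name__,
        "fields": [{"name": f.name, "type": AVRO_TYPES[f.type]}
                   for f in fields(cls)],
    }

@dataclass
class Event:
    source_app: str
    timestamp: int

schema = to_avro_schema(Event)
assert schema["name"] == "Event"
assert schema["fields"][1] == {"name": "timestamp", "type": "long"}
```

Options 1 and 2 invert the direction: the schema (hand-written or fetched from the registry) is the source of truth, and the classes are generated from it, which is what Avro's standard code-generation tooling does.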
25
Summary!
1. Data Quality is critical
2. Schemas are critical
3. Schema Registry is used to maintain quality from Dev to Prod
4. Schema Registry is in Confluent Open Source
27
Get Started with Apache Kafka Today!
https://www.confluent.io/downloads/
THE place to start with Apache Kafka!
• Thoroughly tested and quality assured
• More extensible developer experience
• Easy upgrade path to Confluent Enterprise
28
30% off conference fee with code kafpcf17
New York City | May 8, 2017
Presented by
29
Thank You!