Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty
TRANSCRIPT
![Page 1: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/1.jpg)
© 2017 Bloomberg Finance L.P. All rights reserved.
February 8, 2017
Joy Chakraborty, Distributed System Architect
Secured (Kerberos-based) Spark Notebook for Data Science
Spark Summit East 2017
![Page 2: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/2.jpg)
Speaker Bio
I am a Distributed System Architect with 17+ years of application software development experience and 10+ years of experience in designing, architecting and developing distributed systems. I have a special interest in distributed and parallel computing, and currently work on Cloud and Big Data technologies. I also actively participate in various software architecture organizations.
I have been working in Bloomberg’s Data Platform team as a Data Engineer since 2014. My responsibility is to store and process petabytes of data reliably, predictably and securely.
![Page 3: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/3.jpg)
Agenda
1. Why Secured Data Science Notebook?
2. Design and technology considerations
3. Integration and Implementation
4. Questions/Answers
![Page 4: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/4.jpg)
Why Data Science Notebook?
• Create a distributed data platform to:
– Ingest various data sources across the organization
– Store data at the most granular level in a consistent format
– Provide tooling across the organization to perform data exploration, analysis and machine learning activities
![Page 5: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/5.jpg)
Data exploration, Analysis and Machine Learning
[Diagram labels: Other Sources, Databases, Files; Data (×4); Cluster]
![Page 6: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/6.jpg)
Data exploration, Analysis and Machine Learning
[Diagram labels: Other Sources, Databases, Files; Data (×4); Cluster]
![Page 7: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/7.jpg)
What are the organization's requirements for tooling?
![Page 8: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/8.jpg)
Jupyter Notebook for Spark
• Spark Notebook for
– Web-based
– Scala/Python libraries
– Templates
– Security and login integration
– Data discovery
– Enhanced SQL support
![Page 9: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/9.jpg)
Spark Notebooks – Tech Stack
• JupyterHub (notebook web application for multi-user environments)
• SparkMagic (Spark kernel for Jupyter Notebook supporting Python & Scala)
• Livy (HTTP REST web service to submit Spark jobs, manage sessions, etc.)
• HDFS/Yarn (HDFS and Yarn running Spark jobs)
![Page 10: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/10.jpg)
JupyterHub – Current State
![Page 11: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/11.jpg)
JupyterHub – Current State
[Diagram components: JupyterHub Web Service (running multiple Notebooks), SparkMagic (Spark-Scala, Spark-Python), Livy, Yarn Cluster (Spark Job)]
1. JupyterHub login using OAuth
2. Sends HTTP Request
3. Creates/maintains Spark session and submits the Spark job to the Yarn cluster
4. Spark job output
5. HTTP Response
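The request/response loop in steps 2 to 5 can be sketched as the JSON that a notebook kernel sends to Livy. This is a minimal sketch, assuming Livy's REST endpoints `POST /sessions` and `POST /sessions/{id}/statements`; the host `livy.example.com`, the port, and the helper names are hypothetical, and the requests are only built, never sent.

```python
import json
from urllib.request import Request

# Hypothetical Livy endpoint; replace with the real server address.
LIVY_URL = "http://livy.example.com:8998"
JSON_HEADERS = {"Content-Type": "application/json"}


def create_session_request(kind: str = "pyspark") -> Request:
    """Build the request that asks Livy to create a Spark session (step 3)."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return Request(f"{LIVY_URL}/sessions", data=body,
                   headers=JSON_HEADERS, method="POST")


def run_statement_request(session_id: int, code: str) -> Request:
    """Build the request that runs one notebook cell in an existing session."""
    body = json.dumps({"code": code}).encode("utf-8")
    return Request(f"{LIVY_URL}/sessions/{session_id}/statements", data=body,
                   headers=JSON_HEADERS, method="POST")


if __name__ == "__main__":
    req = run_statement_request(0, "spark.range(10).count()")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing either `Request` to `urllib.request.urlopen` would perform the call; the HTTP response (step 5) carries the statement state and output as JSON.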
![Page 12: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/12.jpg)
Requirement – Kerberos Integration
• Kerberos is a network authentication protocol that uses 'tickets' to let nodes communicating over a network prove their identity to one another in a secure manner. Kerberos uses account databases such as a domain’s Active Directory.
![Page 13: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/13.jpg)
Current State (with Kerberos)
• HDFS supports Kerberos
• Livy supports Kerberos (configurable in Livy)
– Can impersonate a user using the HDFS “proxyuser” setting and submit a Spark job on behalf of that user
– Example: a superuser with username ‘super’ wants to submit a job and access HDFS on behalf of user1. The superuser has Kerberos credentials but user1 does not have any. The tasks are required to run as user1, so user1 must be able to connect to the namenode or job tracker on a connection authenticated with super’s Kerberos credentials.
• JupyterHub and SparkMagic: no support for Kerberos
<property>
  <name>hadoop.proxyuser.livyusr.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.livyusr.groups</name>
  <value>LIVY_GRP</value>
</property>
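The effect of the two `hadoop.proxyuser.livyusr.*` properties can be sketched as a small check: the superuser `livyusr` may impersonate a target user only when the request comes from an allowed host and the target user belongs to an allowed group. This is a simplified model, not Hadoop's implementation; `ProxyUserRule` and `may_impersonate` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ProxyUserRule:
    """Simplified view of hadoop.proxyuser.<superuser>.hosts / .groups."""
    hosts: FrozenSet[str]   # hosts the superuser may connect from ("*" = any)
    groups: FrozenSet[str]  # groups whose members may be impersonated ("*" = any)


def may_impersonate(rule: ProxyUserRule, request_host: str,
                    target_user_groups: Iterable[str]) -> bool:
    """Return True when both the host and the group condition are satisfied."""
    host_ok = "*" in rule.hosts or request_host in rule.hosts
    group_ok = "*" in rule.groups or bool(rule.groups & set(target_user_groups))
    return host_ok and group_ok


# Values taken from the configuration fragment above.
LIVYUSR_RULE = ProxyUserRule(hosts=frozenset({"*"}),
                             groups=frozenset({"LIVY_GRP"}))

if __name__ == "__main__":
    print(may_impersonate(LIVYUSR_RULE, "livy-host", ["LIVY_GRP"]))
    print(may_impersonate(LIVYUSR_RULE, "livy-host", ["OTHER_GRP"]))
```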
![Page 14: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/14.jpg)
How does Kerberos work in an HDFS and Yarn cluster running Spark jobs?
![Page 15: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/15.jpg)
HDFS/Spark with Kerberos
Client
![Page 16: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/16.jpg)
HDFS/Spark with Kerberos
Client
0. Service Principals/Keys
![Page 17: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/17.jpg)
HDFS/Spark with Kerberos
Client
0. Service Principals/Keys
1. Client requests Ticket
2. KDC sends TGT
![Page 18: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/18.jpg)
HDFS/Spark with Kerberos
Client
0. Service Principals/Keys
1. Client requests Ticket
2. KDC sends TGT
![Page 19: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/19.jpg)
HDFS/Spark with Kerberos
Client
0. Service Principals/Keys
1. Client requests Ticket
2. KDC sends TGT
5. Sends Service Ticket and requests Authentication
![Page 20: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/20.jpg)
HDFS/Spark with Kerberos
Client
0. Service Principals/Keys
1. Client requests Ticket
2. KDC sends TGT
5. Sends Service Ticket and requests Authentication
6. User authenticated using Service Principal/key
Retrieves User roles/permissions
![Page 21: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/21.jpg)
HDFS/Spark with Kerberos
Client
0. Service Principals/Keys
1. Client requests Ticket
2. KDC sends TGT
5. Sends Service Ticket and requests Authentication
6. User authenticated using Service Principal/key
Retrieves User roles/permissions
Client/Server session established
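The ticket exchange above can be sketched as a toy model. Steps 3 and 4 are not legible in this transcript; in the standard Kerberos exchange the client presents its TGT to the KDC and receives a service ticket, which is what the sketch assumes. Everything here is illustrative: `ToyKDC`, `ToyService`, the keys, and the HMAC-based "sealing" are hypothetical stand-ins, not real Kerberos messages or cryptography.

```python
import hashlib
import hmac
import json
from typing import Dict


def seal(key: bytes, payload: dict) -> Dict[str, str]:
    """Attach an HMAC tag so only holders of `key` can issue or verify it."""
    body = json.dumps(payload, sort_keys=True)
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def unseal(key: bytes, ticket: Dict[str, str]) -> dict:
    """Verify the tag with `key` and return the payload, or raise ValueError."""
    expected = hmac.new(key, ticket["body"].encode("utf-8"),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, ticket["tag"]):
        raise ValueError("ticket was not issued with this key")
    return json.loads(ticket["body"])


class ToyKDC:
    def __init__(self, service_keys: Dict[str, bytes]):
        self._tgs_key = b"kdc-internal-key"  # known only to the KDC
        self._service_keys = service_keys    # step 0: service principals/keys

    def issue_tgt(self, user: str) -> Dict[str, str]:
        """Steps 1-2: client requests a ticket, KDC sends a TGT."""
        return seal(self._tgs_key, {"user": user})

    def issue_service_ticket(self, tgt: Dict[str, str], service: str) -> Dict[str, str]:
        """Assumed steps 3-4: TGT is exchanged for a service ticket."""
        user = unseal(self._tgs_key, tgt)["user"]
        return seal(self._service_keys[service], {"user": user, "service": service})


class ToyService:
    def __init__(self, name: str, key: bytes):
        self.name, self._key = name, key

    def authenticate(self, ticket: Dict[str, str]) -> str:
        """Steps 5-6: verify the service ticket with the service's own key."""
        claims = unseal(self._key, ticket)
        if claims["service"] != self.name:
            raise ValueError("ticket is for a different service")
        return claims["user"]


if __name__ == "__main__":
    kdc = ToyKDC({"hdfs": b"hdfs-key"})
    hdfs = ToyService("hdfs", b"hdfs-key")
    ticket = kdc.issue_service_ticket(kdc.issue_tgt("user1"), "hdfs")
    print(hdfs.authenticate(ticket))
```

The point of the model is the division of keys: the service never talks to the KDC during authentication, it only needs its own key to check the ticket.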
![Page 22: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/22.jpg)
Let’s have JupyterHub as the Client and bring in SparkMagic and Livy
![Page 23: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/23.jpg)
Jupyter + Spark with Kerberos
Client
0. Service Principals/Keys
1. Client requests Ticket
2. KDC sends TGT
5. Sends Service Ticket and requests Authentication
6. User authenticated using Service Principal/key
Retrieves User roles/permissions
Client/Server session established
![Page 24: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/24.jpg)
Jupyter + Spark with Kerberos
Client
0. Service Principals/Keys
1. Client requests Ticket
2. KDC sends TGT
The nature of communication between the browser client and HDFS will be different.
![Page 25: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/25.jpg)
Jupyter + Spark with Kerberos
Client
0. Service Principals/Keys
1. Client requests Ticket
2. KDC sends TGT
Also, the TGT process between the browser client and the KDC will change.
![Page 26: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/26.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
1. Client requests Ticket
2. KDC sends TGT
Notes:
1. KDCAuthenticator: JupyterHub authentication extensibility point
2. KDCSpawner: JupyterHub per-user session extensibility point
![Page 27: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/27.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
1. Client requests Ticket
2. KDC sends TGT
Not yet resolved: ??? ??? ???
![Page 28: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/28.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket
2. KDC sends TGT
Not yet resolved: ??? ??? ???
![Page 29: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/29.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
Not yet resolved: ??? ??? ???
![Page 30: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/30.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
Retrieves User roles/permissions
SPNEGO
Not yet resolved: ??? ???
![Page 31: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/31.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
Not yet resolved: ??? ???
![Page 32: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/32.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
7. Sends HTTP GET with Authorization: Negotiate <jhub service-ticket>
Not yet resolved: ??? ???
![Page 33: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/33.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
7. Sends HTTP GET with Authorization: Negotiate <jhub service-ticket>
Not yet resolved: ??? ???
Notes:
1. Supports SPNEGO
2. Authenticates user using HTTP service principal/key
3. Retrieves user-id
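Steps 3, 4 and 7 form the SPNEGO challenge and response over HTTP. This is a minimal sketch of the server-side decision, assuming the `WWW-Authenticate: Negotiate` / `Authorization: Negotiate <base64 token>` header pair; `handle_request` and the `validate_token` callback are hypothetical names. Real ticket validation would go through a GSSAPI library using the HTTP service keytab, which is deliberately left out here.

```python
import base64
from typing import Callable, Dict, Optional, Tuple

# (status code, response headers, authenticated user-id or "")
Response = Tuple[int, Dict[str, str], str]


def handle_request(headers: Dict[str, str],
                   validate_token: Callable[[bytes], Optional[str]]) -> Response:
    """Challenge unauthenticated requests; otherwise validate the ticket."""
    scheme, _, encoded = headers.get("Authorization", "").partition(" ")
    if scheme != "Negotiate" or not encoded:
        # Step 4: ask the client to come back with a Kerberos service ticket.
        return 401, {"WWW-Authenticate": "Negotiate"}, ""
    try:
        token = base64.b64decode(encoded, validate=True)
    except ValueError:  # malformed base64
        return 401, {"WWW-Authenticate": "Negotiate"}, ""
    # Step 7 onward: hypothetical callback that checks the ticket with the
    # HTTP service key and returns the user-id, or None on failure.
    user = validate_token(token)
    if user is None:
        return 403, {}, ""
    return 200, {}, user


if __name__ == "__main__":
    fake_validator = lambda t: "user1" if t == b"ticket" else None
    print(handle_request({}, fake_validator))
    good = {"Authorization": "Negotiate " + base64.b64encode(b"ticket").decode()}
    print(handle_request(good, fake_validator))
```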
![Page 34: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/34.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
7. Sends HTTP GET with Authorization: Negotiate <jhub service-ticket>
Not yet resolved: ??? ???
Connection/session established
![Page 35: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/35.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
7. Sends HTTP GET with Authorization: Negotiate <jhub service-ticket>
8. Spawns user session
Not yet resolved: ??? ???
Connection/session established
![Page 36: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/36.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
7. Sends HTTP GET with Authorization: Negotiate <jhub service-ticket>
8. Spawns user session
Not yet resolved: ??? ???
Notes:
1. Opens Notebook session
2. Encrypts user-id and puts it into env['PROXY_USER']
Connection/session established
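The spawner-side handling of `PROXY_USER` can be sketched as follows. The talk says the user-id is encrypted but does not name the scheme, so this sketch swaps in an HMAC-signed, base64-encoded value built from the standard library. That protects integrity only, not confidentiality, and is an assumption made for illustration. `SHARED_KEY`, `seal_user` and `build_child_env` are hypothetical names.

```python
import base64
import hashlib
import hmac
from typing import Dict

# Hypothetical secret that the spawner and Livy would both need to hold.
SHARED_KEY = b"demo-key-shared-with-livy"


def seal_user(user: str, key: bytes = SHARED_KEY) -> str:
    """Encode the user-id and append an HMAC tag (stand-in for encryption)."""
    payload = base64.urlsafe_b64encode(user.encode("utf-8")).decode("ascii")
    tag = hmac.new(key, payload.encode("ascii"), hashlib.sha256).hexdigest()
    return f"{payload}.{tag}"


def build_child_env(user: str, base_env: Dict[str, str]) -> Dict[str, str]:
    """Return the environment for the per-user notebook process."""
    env = dict(base_env)                 # never mutate the hub's own environment
    env["PROXY_USER"] = seal_user(user)  # read later by the SparkMagic kernel
    return env


if __name__ == "__main__":
    env = build_child_env("user1", {"PATH": "/usr/bin"})
    print(env["PROXY_USER"])
```

In JupyterHub, logic like this would typically sit in a custom Spawner's environment hook, so that the value is set before the child process starts.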
![Page 37: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/37.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
7. Sends HTTP GET with Authorization: Negotiate <jhub service-ticket>
8. Spawns user session
9. Uses SM keytab to ask for LIVY service ticket (kinit)
10. KDC sends Livy Service Ticket
Not yet resolved: ??? ???
Connection/session established
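Step 9 relies on a keytab rather than a password. This is a minimal sketch of that call, assuming the MIT Kerberos `kinit -k -t <keytab> <principal>` form; the keytab path and principal name are hypothetical. Note that `kinit` itself caches a TGT, and the Livy service ticket is then usually requested when the HTTP client performs SPNEGO against Livy.

```python
import subprocess
from typing import List


def kinit_command(keytab_path: str, principal: str) -> List[str]:
    """Build the kinit invocation that authenticates from a keytab."""
    return ["kinit", "-k", "-t", keytab_path, principal]


def obtain_ticket(keytab_path: str, principal: str) -> None:
    """Run kinit; raises CalledProcessError if authentication fails."""
    subprocess.run(kinit_command(keytab_path, principal), check=True)


if __name__ == "__main__":
    # Hypothetical keytab location and principal, printed rather than executed.
    print(" ".join(kinit_command("/etc/security/keytabs/sm.keytab",
                                 "sparkmagic@EXAMPLE.COM")))
```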
![Page 38: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/38.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
7. Sends HTTP GET with Authorization: Negotiate <jhub service-ticket>
8. Spawns user session
9. Uses SM keytab to ask for LIVY service ticket (kinit)
10. KDC sends Livy Service Ticket
11. Forwards the request to SparkMagic kernel
Not yet resolved: ??? ???
Connection/session established
![Page 39: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/39.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
7. Sends HTTP GET with Authorization: Negotiate <jhub service-ticket>
8. Spawns user session
9. Uses SM keytab to ask for LIVY service ticket (kinit)
10. KDC sends Livy Service Ticket
11. Forwards the request to SparkMagic kernel
Not yet resolved: ??? ???
Notes:
1. SparkMagic reads the encrypted env['PROXY_USER'] and adds it to the HTTP request body as “proxyUser”.
Connection/session established
![Page 40: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/40.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
7. Sends HTTP GET with Authorization: Negotiate <jhub service-ticket>
8. Spawns user session
9. Uses SM keytab to ask for LIVY service ticket (kinit)
10. KDC sends Livy Service Ticket
11. Forwards the request to SparkMagic kernel
12. Submits the Spark request over HTTP to Livy with Authorization: Negotiate <Livy service-ticket>
Not yet resolved: ???
Connection/session established
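Steps 11 and 12 can be sketched from the kernel's side: read `PROXY_USER` from the environment, place it in the Livy request body as `proxyUser`, and attach the service ticket in a `Negotiate` header. This is a minimal sketch; `build_livy_session_call` and its `add_proxy_user` switch are hypothetical names (the switch mirrors the talk's statement that this behavior is configurable and disabled by default), and the ticket bytes are a placeholder for what a GSSAPI client would produce.

```python
import base64
import json
import os
from typing import Dict, Mapping, Tuple


def build_livy_session_call(service_ticket: bytes,
                            env: Mapping[str, str] = os.environ,
                            add_proxy_user: bool = False,
                            kind: str = "pyspark") -> Tuple[Dict[str, str], str]:
    """Return (headers, JSON body) for a session-creation request to Livy."""
    body = {"kind": kind}
    proxy_user = env.get("PROXY_USER")
    if add_proxy_user and proxy_user:
        # Pass the value through untouched; Livy is the one that decodes it.
        body["proxyUser"] = proxy_user
    headers = {
        "Content-Type": "application/json",
        # SPNEGO carries the ticket base64-encoded after the scheme name.
        "Authorization": "Negotiate "
                         + base64.b64encode(service_ticket).decode("ascii"),
    }
    return headers, json.dumps(body)


if __name__ == "__main__":
    headers, body = build_livy_session_call(
        b"placeholder-livy-ticket",
        env={"PROXY_USER": "sealed-user-id"},
        add_proxy_user=True,
    )
    print(headers["Authorization"])
    print(body)
```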
![Page 41: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/41.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
7. Sends HTTP GET with Authorization: Negotiate <jhub service-ticket>
8. Spawns user session
9. Uses SM keytab to ask for LIVY service ticket (kinit)
10. KDC sends Livy Service Ticket
11. Forwards the request to SparkMagic kernel
12. Submits the Spark request over HTTP to Livy with Authorization: Negotiate <Livy service-ticket>
13. Uses Livy keytab to ask for HDFS service ticket
14. KDC sends HDFS Service Ticket
Not yet resolved: ???
Connection/session established
![Page 42: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/42.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
7. Sends HTTP GET with Authorization: Negotiate <jhub service-ticket>
8. Spawns user session
9. Uses SM keytab to ask for LIVY service ticket (kinit)
10. KDC sends Livy Service Ticket
11. Forwards the request to SparkMagic kernel
12. Submits the Spark request over HTTP to Livy with Authorization: Negotiate <Livy service-ticket>
13. Uses Livy keytab to ask for HDFS service ticket
14. KDC sends HDFS Service Ticket
Not yet resolved: ???
Notes:
1. Livy decrypts the “proxyUser” and sets the “proxy-user” value for the remote Spark-Submit
Connection/session established
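The Livy-side note can be sketched as: recover the user-id from the `proxyUser` field, then pass it to `spark-submit` through its `--proxy-user` option. The talk does not specify the encryption scheme, so this sketch assumes an HMAC-signed, base64-encoded value (integrity only, not confidentiality) purely for illustration. `SHARED_KEY`, `seal_user`, `open_user` and `spark_submit_command` are hypothetical names, and the command is built but not executed.

```python
import base64
import hashlib
import hmac
from typing import List

# Hypothetical secret shared between the JupyterHub spawner and Livy.
SHARED_KEY = b"demo-key-shared-with-spawner"


def seal_user(user: str, key: bytes = SHARED_KEY) -> str:
    """What the sending side is assumed to have produced: payload.tag."""
    payload = base64.urlsafe_b64encode(user.encode("utf-8")).decode("ascii")
    tag = hmac.new(key, payload.encode("ascii"), hashlib.sha256).hexdigest()
    return f"{payload}.{tag}"


def open_user(sealed: str, key: bytes = SHARED_KEY) -> str:
    """Verify the tag and return the plain user-id, or raise ValueError."""
    payload, _, tag = sealed.partition(".")
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode("ascii"), tag.encode("utf-8")):
        raise ValueError("proxyUser value failed verification")
    return base64.urlsafe_b64decode(payload.encode("ascii")).decode("utf-8")


def spark_submit_command(sealed_proxy_user: str, app: str) -> List[str]:
    """Build a spark-submit call that runs the job as the notebook user."""
    user = open_user(sealed_proxy_user)
    return ["spark-submit", "--master", "yarn", "--proxy-user", user, app]


if __name__ == "__main__":
    print(spark_submit_command(seal_user("user1"), "job.py"))
```

Rejecting a value that fails verification matters here: without it, anyone able to reach Livy with a valid service ticket could name an arbitrary user to impersonate.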
![Page 43: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/43.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
7. Sends HTTP GET with Authorization: Negotiate <jhub service-ticket>
8. Spawns user session
9. Uses SM keytab to ask for LIVY service ticket (kinit)
10. KDC sends Livy Service Ticket
11. Forwards the request to SparkMagic kernel
12. Submits the Spark request over HTTP to Livy with Authorization: Negotiate <Livy service-ticket>
13. Uses Livy keytab to ask for HDFS service ticket
14. KDC sends HDFS Service Ticket
15. Livy submits remote Spark job using HTTP SPNEGO with Authorization: Negotiate <HDFS service-ticket>
Connection/session established
![Page 44: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/44.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
7. Sends HTTP GET with Authorization: Negotiate <jhub service-ticket>
8. Spawns user session
9. Uses SM keytab to ask for LIVY service ticket (kinit)
10. KDC sends Livy Service Ticket
11. Forwards the request to SparkMagic kernel
12. Submits the Spark request over HTTP to Livy with Authorization: Negotiate <Livy service-ticket>
13. Uses Livy keytab to ask for HDFS service ticket
14. KDC sends HDFS Service Ticket
15. Livy submits remote Spark job using HTTP SPNEGO with Authorization: Negotiate <HDFS service-ticket>
16. User authenticated using Service Principal/key
Retrieves User roles/permissions
Connection/session established
![Page 45: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/45.jpg)
Jupyter + Spark with Kerberos
[Diagram components: Client, KDC, Spawner, SparkMagic, JupyterHub, KDC Authenticator, Web Service, Livy]
0. Service Principals/Keys
0. LIVY HTTP Service Principals/Keys
1. Client requests Ticket (kinit)
2. KDC sends TGT
3. Jhub sends URL request (GET)
4. 401 / WWW-Authenticate: Negotiate
5. Client sends TGT and asks for JHUB Service Ticket
6. KDC sends Service Ticket
7. Sends HTTP GET with Authorization: Negotiate <jhub service-ticket>
8. Spawns user session
9. Uses SM keytab to ask for LIVY service ticket (kinit)
10. KDC sends Livy Service Ticket
11. Forwards the request to SparkMagic kernel
12. Submits the Spark request over HTTP to Livy with Authorization: Negotiate <Livy service-ticket>
13. Uses Livy keytab to ask for HDFS service ticket
14. KDC sends HDFS Service Ticket
15. Livy submits remote Spark job using HTTP SPNEGO with Authorization: Negotiate <HDFS service-ticket>
16. User authenticated using Service Principal/key
Retrieves User roles/permissions
Connection/session established (shown on two links in the diagram)
![Page 46: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/46.jpg)
Jhub-Kerberos Development Summary
• JupyterHub
– KDC Authenticator (configurable using JupyterHub configuration)
  – Supports Kerberos-SPNEGO authentication using HTTP Service Principal and keys
– KDC Spawner (configurable using JupyterHub configuration)
  – Encrypts the current user-name and stores it in the “PROXY_USER” environment variable (before spawning a new user child process), which SparkMagic reads and uses later.
  – Runs kinit to get the Livy Service ticket for SPNEGO authentication with the Livy server.
• SparkMagic
– Adds the current user-name (read from the “PROXY_USER” environment variable) as “proxyUser” in the Livy HTTP request body. This behavior can be enabled or disabled (default) through SparkMagic configuration.
• Livy changes (configurable using Livy configuration)
– Decrypts the “proxyUser” from the request body and adds it to the remote Spark job request for HDFS impersonation
![Page 47: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/47.jpg)
Jhub-Kerberos Development Setup
• Learnings
– KDC Domain controller running the AS and TGS
– Multiple nodes running JupyterHub, Livy and Yarn (Spark) in different DNS farms, and networking between these farms
– Creating/modifying keytabs and principals on demand in a corporate environment for dev
– Corporate IT dependency
• How Docker helps
– Easy to bootstrap JupyterHub, Livy, Yarn and KDC using Docker scripts
– Seamless networking (easy to configure) between Docker instances
– Creating Service principals and keytabs on demand (without involving corporate IT)
– Custom DNS farm setup for POC and development activities
![Page 48: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/48.jpg)
Q&A
![Page 49: Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty](https://reader031.vdocuments.net/reader031/viewer/2022030307/58e5326e1a28abac7e8b5bd7/html5/thumbnails/49.jpg)
THANK YOU
Joy Chakraborty
Bloomberg L.P.