the datahub de/blending data in museumsrepozitar.techlib.cz/record/1263/files/idr-1263_1.pdf · the...
The datahub
De/blending museum data
Setting the stage
In which I’ll describe where we came from
The Datahub Project
In which I’ll show you an aggregation architecture
The story thus far
In which I’ll discuss the construction process
What we learned
In which I’ll conclude with a few takeaways
Multiple organisations
Different local traditions, thesauri, various cataloguing rules (SPECTRUM), organisational contexts,…
Multiple registration systems
TMS, Adlib, CollectiveAccess, closed/open source, lack of APIs, non-standardised APIs,…
Multiple end user applications
Various websites, historically grown, different contractors, various CMS systems, different ways to deliver data,…
Manual exchange
Different ways
Excel, CSV, vendor formats. WeTransfer, e-mail,…
Error prone
Corrupt exports, wrong data exported, wrong version passed on, stuff gets lost along the way,…
High overhead costs
Time and money (communication, $/hour)
High latency
What’s online is not really up to date
Herding cats
A modern ecosystem
The Datahub Project
Aggregator
Local aggregator
The story thus far
Assumptions
• We had a fixed, limited budget.
• Estimated timeline: 3 to 6 months.
• A production-ready version.
Reality
• Contractor delivered a prototype version.
• We over-extended the timing.
• Switch to DIY development after 6 months.
• Scope changes as we went along.
What happened?
• We underestimated the ETL workload
• We overestimated contractor engagement
• We underestimated organisational complexity
Wicked ETL
• Context really matters
• Getting intimate with the domain takes time
• Integrating data across a network is challenging
• Difficult to guesstimate complexity up front
Wicked ETL
Context really matters
• Machines: legacy software, lack of infrastructure,…
• People: data means nothing until it gets interpreted. But, different perceptions of reality…
• Content: driven by tradition, software, people.
Wicked ETL
Data modelling is a wicked challenge
• Mapping to standardised exchange formats… and their specific data models
• Normalisation and enrichment… are we talking about the same thing?
• Context specific concerns... Copyright, privacy, security, authority
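The mapping and normalisation concerns above can be illustrated with a minimal sketch. It maps records from two hypothetical registration-system exports onto one shared set of field names. The systems, field names, sample records and the `FIELD_MAPS` table are all assumptions made up for illustration; none of this is the Datahub's actual data model or pipeline.

```python
# Hypothetical field mappings: source field name -> shared field name.
# None of these names come from the Datahub project itself.
FIELD_MAPS = {
    "system_a": {"ObjectNumber": "identifier", "Title": "title", "Maker": "creator"},
    "system_b": {"inventory_no": "identifier", "object_name": "title", "artist": "creator"},
}


def normalise_value(value):
    """Trim and collapse whitespace so equal values compare equal."""
    return " ".join(str(value).split())


def map_record(source, record):
    """Map one source record onto the shared field names.

    Fields without a mapping are collected under 'unmapped' rather than
    silently dropped, so a person who knows the context can decide
    what they mean.
    """
    mapping = FIELD_MAPS[source]
    mapped = {"source": source, "unmapped": {}}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            mapped["unmapped"][field] = value
        else:
            mapped[target] = normalise_value(value)
    return mapped


# Made-up sample records from the two hypothetical systems.
record_a = {"ObjectNumber": " 1990-A ", "Title": "Portrait  of a man", "Location": "Depot 3"}
record_b = {"inventory_no": "0042", "object_name": "Still life", "artist": "Anonymous"}

print(map_record("system_a", record_a))
print(map_record("system_b", record_b))
```

Keeping unmapped fields visible is a deliberate choice in this sketch: it reflects the point that context matters and that someone has to interpret the data before it can be mapped.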
Procurement
• Build-to-print vs build-to-spec.
• You outsource the process, not the project.
• Is the contractor’s service a good fit?
• Relationship with the contractor!
• Procurement is part of the design process
DIY development
• Knowledge domain and technical experience
• Flexibility to build exactly what you need
• Reduces dependency on a specific contractor
• Requires in-house competences
• Payroll is a hidden cost
• The Bus Factor risk
Lessons learned
Own your project
Define the project process you’re going to follow
Actively involve your stakeholders
Challenge your own assumptions, but keep your focus!
Actively be involved in the process
Don’t assume a vendor will solve things for you.
Be mindful about the budget
Fixed price vs fixed budget
Insource the talented specialists you need
Identify the right profiles: IA, Dev, PM, UX,…
Outsource placing the kitchen sink
Stock off-the-shelf website or web app
Document all the things
Be mindful about the human who comes after you!
Don’t do elaborate specifications up front
Nobody is interested in paper tigers.
Get your hands dirty
Try tools up front. Identify the big hurdles early.
Thank you!
https://github.com/thedatahub
https://thedatahub.github.io
http://www.flemishartcollection.be
T: @netsensei