A survey on NoSQL database integration
Luiz Henrique Zambom Santana
Prof. Dr. Ronaldo dos Santos Mello
Profa. Dra. Carina Dorneles
Agenda
• Background
  • NoSQL
  • Global vs. Local
• Model
• Related Works
• Comparison
• Taxonomy
• Conclusions
Background: NoSQL
Sadalage and Fowler, 2012 (http://martinfowler.com/books/nosql.html)
Not only SQL
Nathan Marz, 2014 (http://www.slideshare.net/nathanmarz/runaway-complexity-in-big-data-and-a-plan-to-stop-it)
Relational databases will be a footnote in history
Background: Global-as-view Vs. Local-as-view
● GAV (global-as-view)
  ○ mapping from entities in the mediated schema to entities in the original sources
● LAV (local-as-view)
  ○ mapping from entities in the original sources to the mediated schema
● The latter approach requires more sophisticated inferences to resolve a query on the mediated schema, but makes it easier to add new data sources to a (stable) mediated schema.
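The contrast between the two mappings can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation from any of the surveyed papers: the two in-memory sources, the mediated relation `person(id, name)`, and the names `person_gav`, `LAV_SOURCES`, and `answer_lav` are all hypothetical.

```python
# Hypothetical sources with different local schemas (illustrative data only).
KV_SOURCE = {"u1": "Ada", "u2": "Linus"}            # key-value store: id -> name
DOC_SOURCE = [{"_id": "u3", "fullName": "Grace"}]   # document store


def person_gav():
    """GAV: the mediated relation person(id, name) is a fixed query over the sources."""
    rows = [{"id": key, "name": value} for key, value in KV_SOURCE.items()]
    rows += [{"id": doc["_id"], "name": doc["fullName"]} for doc in DOC_SOURCE]
    return rows


# LAV: each source is described in terms of the mediated schema.
# Adding a source means adding one entry here; the mediated schema is untouched.
LAV_SOURCES = {
    "kv": {
        "relation": "person",
        "attributes": {"id", "name"},
        "scan": lambda: [{"id": k, "name": v} for k, v in KV_SOURCE.items()],
    },
    "doc": {
        "relation": "person",
        "attributes": {"id", "name"},
        "scan": lambda: [{"id": d["_id"], "name": d["fullName"]} for d in DOC_SOURCE],
    },
}


def answer_lav(relation, wanted):
    """LAV: decide at query time which source descriptions can answer the query."""
    rows = []
    for source in LAV_SOURCES.values():
        # A source qualifies if it describes the relation and covers the attributes.
        if source["relation"] == relation and wanted <= source["attributes"]:
            rows += [{attr: row[attr] for attr in wanted} for row in source["scan"]()]
    return rows
```

In the GAV function, adding a third source means editing the view itself. In the LAV registry, it means adding one description, at the cost of the extra matching step inside `answer_lav`.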
Dey, Akon, Alan Fekete, and Uwe Röhm. "Scalable transactions across heterogeneous NoSQL key-value data stores." Proceedings of the VLDB Endowment 6.12 (2013): 1434-1439.
• VLDB Endowment
  • Qualis A1
  • Impact Factor 1.568
• Why is it important?
  • Seminal
  • Transactions
• “Weak” global-as-view
Zhang, Duo, Benjamin Rubinstein, and Jim Gemmell. "Principled graph matching algorithms for integrating multiple data sources." (2014).
• IEEE Transactions on Knowledge and Data Engineering (TKDE)
  • Qualis A1
  • Impact factor 2.067
• Why is it important?
  • Graph matching algorithms
  • Entity resolution
  • Shows that integration is far more complicated in NoSQL applications
• Local-as-view
Da Silva, Daniel L., et al. "A Computational Framework for Integrating and Retrieving Biodiversity Data on a Large Scale." Big Data (BigData Congress), 2014 IEEE International Congress on. IEEE, 2014.
• IEEE International Congress on Big Data
  • No Qualis (yet)
  • Impact factor
• Why is it important?
  • Integrating and Retrieving Biodiversity Data
• Global-as-view
• Resembles the Lambda Architecture
Kiran, V. K., and R. Vijayakumar. "Ontology based data integration of NoSQL datastores." Industrial and Information Systems (ICIIS), 2014 9th International Conference on. IEEE, 2014.
• 2014 9th International Conference on Industrial and Information Systems (ICIIS)
  • Qualis B1
• Why is it important?
  • Intermediate model
  • Global-Local-as-view
  • Information extraction may require sourcing data from multiple data sources, establishing relationships among them, and querying across these data sources together.
Kaur, Karamjit, and Rinkle Rani. "Managing Data in Healthcare Information Systems: Many Models, One Solution." Computer 3 (2015): 52-59.
• IEEE Computer
  • Qualis A1
  • Impact factor 1.443
• Global-as-view
• Why is it important?
  • Because healthcare data comes from multiple, vastly different sources, databases must adopt a range of models to process and store it. A polyglot-persistent framework combines relational, graph, and document data models to accommodate information variety.
Duggan, Jennie, et al. "The BigDAWG Polystore System." ACM SIGMOD Record 44.2 (2015): 11-16.
• SIGMOD Record
  • Qualis A1
  • Impact Factor 1.05
• Global-as-view
• A polystore architecture designed to unify querying over multiple data models.
• “No one size fits all”
• Why is it important?
  • Twitter guys and Stonebraker
  • Deals with the entire complexity
  • Introduces the Island abstraction
  • Casts data between the models of different DBMSs
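The island and cast ideas can be pictured with a toy sketch. Everything here is an assumption made for illustration: the `Island` class, the `cast_rows_to_documents` helper, and the engine labels `pg` and `mongo` are hypothetical and are not BigDAWG's actual API.

```python
class Island:
    """Toy island: one data model and one query style over its member engines."""

    def __init__(self, name, engines):
        self.name = name
        self.engines = engines  # engine label -> data held by that engine

    def query(self, engine, predicate):
        """Return the items of one engine that satisfy the predicate."""
        return [item for item in self.engines[engine] if predicate(item)]


def cast_rows_to_documents(rows, columns):
    """Toy cast: convert relational tuples into documents (dicts)."""
    return [dict(zip(columns, row)) for row in rows]


# Hypothetical islands, each with a single in-memory engine.
relational = Island("relational", {"pg": [(1, "Ada"), (2, "Linus")]})
document = Island("document", {"mongo": []})

# Query in the relational island, cast the result, then continue in the document island.
selected = relational.query("pg", lambda row: row[0] > 1)
document.engines["mongo"].extend(cast_rows_to_documents(selected, ["id", "name"]))
result = document.query("mongo", lambda doc: doc["name"].startswith("L"))
```

The point of the sketch is only the shape of the workflow: each island keeps its own model, and an explicit cast step moves intermediate results from one model to another.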
Comparison
Year  Main author  Summary                             NoSQL                   Taxonomy
2013  Dey          Transactional access                Key/Value               Schema unification > Polyglot
2014  Zhang        Graph matching                      Graph                   Schema unification > Unified language
2014  Da Silva     Biodiversity database integration   Document                Application integration > CAP
2014  Kiran        Ontology as canonical model         Column-oriented         Schema unification > Unified language
2015  Kaur         Medical                             Virtually any database  Application integration > CAP
2015  Duggan       BigDAWG                             Virtually any database  Federation > Independent access
Conclusions
• The problem is real
  • Important for many fields
• Most of the solutions use Global-as-View
• Most of the solutions expose a REST API as unified access
• Many works also cite SQL and NoSQL integration
• Concerns
  • The solution has to be scalable
  • The solution cannot be difficult to set up
• BigDAWG is the most complete approach