Data quality in Real Estate
Posted on 18-Mar-2018
- Data Quality in Real Estate
  Dimitris Kontokostas, Andy van der Hoeven, Samur Araujo
  Amsterdam, Sep 14th 2017, LDQ Workshop, SEMANTiCS Conference
- About Geophy
  ● Goal: map all buildings in the world
  ● Provide a quality score for each building
    ○ Based on location, building status, history, environmental metrics, etc.
  ● Semantic platform
    ○ RDF eases the data integration process
  ● Team of 45, aiming to double by next year
- Real Estate is a very complex domain. Really!
- Possible constraints on addresses? (None of these hold in general.)
  ● An address will start with, or at least include, a building number.
  ● When there is a building number, it will be all-numeric.
  ● No buildings are numbered zero.
  ● Well, at the very least no buildings have negative numbers.
  ● A building number will only be used once per street.
  ● A building will only have one number.
  ● A building name won't also be a number.
  ● [...]
  https://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/
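The assumptions on this slide are easy to turn into validation rules, and that is exactly the danger. The sketch below encodes a few of them in a hypothetical `naive_address_check` function and runs it on made-up example addresses; ordinary addresses, such as the Dutch "street name, then number" form, already break the rules. It illustrates the pitfall and is not a recommended validator.

```python
from __future__ import annotations

import re


def naive_address_check(address: str) -> list[str]:
    """Hypothetical validator built on the naive address assumptions.

    Returns the assumptions that `address` breaks. Every assumption is
    false in general, so a non-empty result does NOT mean the address
    is invalid.
    """
    tokens = [t for t in re.split(r"[\s,]+", address.strip()) if t]
    numbered = [t for t in tokens if any(ch.isdigit() for ch in t)]
    # Assumption: an address includes a building number.
    if not numbered:
        return ["includes a building number"]
    broken = []
    number = numbered[0]
    # Assumption: the address starts with the building number.
    if tokens[0] != number:
        broken.append("starts with the building number")
    # Assumption: a building number is all-numeric.
    if not (number.isascii() and number.isdigit()):
        broken.append("building number is all-numeric")
    # Assumption: no building is numbered zero.
    elif int(number) == 0:
        broken.append("building number is not zero")
    return broken


for addr in ["12 Example Street", "Example Street 12", "12A Example Street",
             "0 Example Street", "Rose Cottage, Example Lane"]:
    print(f"{addr!r}: {naive_address_check(addr)}")
```

Only the first address satisfies every rule. The others are all plausible real-world forms, which is why constraints like these belong in a data-quality discussion as warnings rather than hard failures.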
- Geophy [set of] ontologies
  ● 13 ontologies (+ 9 external)
  ● 125 classes
    ○ Buildings
    ○ Addresses
    ○ Companies
    ○ [...]
  ● 720 properties
    ○ 500 datatype properties
    ○ 160 relation properties
  ● Growing...
- Quality is expensive
  ● Quality of source data
    ○ Free, open, closed data sources, etc.
  ● Data clean-up process
    ○ Violations, deduplication, precision, etc.
    ○ How much time and effort can one afford? How much quality is good enough?
  → Fitness for use
- Quality of ...
  ● Source data
    ○ Accuracy of the source
  ● Translation of source data
    ○ RDF mappings: RML, D2RQ, scripts, etc.
  ● Model design
    ○ Modelling quality
    ○ How well the data fits the schema
  ● Model definition
    ○ Mapping of the model onto RDFS, OWL, ShEx/SHACL shapes, etc.
    ○ Semantics, e.g. RDFS, OWL DL/RL/Full, etc.
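For the "how well the data fits the schema" layer, a minimal sketch can mimic three SHACL core constraints (sh:datatype, sh:minCount, sh:maxCount) in plain Python. It is not a SHACL processor: triples are assumed to be (subject, predicate, object) tuples, and the `:yearBuilt` property, the `PropertyShape` class and the instance data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: str = "xsd:string"


@dataclass(frozen=True)
class PropertyShape:
    # Loosely mirrors sh:path, sh:datatype, sh:minCount and sh:maxCount.
    path: str
    datatype: Optional[str] = None
    min_count: int = 0
    max_count: Optional[int] = None


def validate(triples, focus_nodes, shapes):
    """Return (focus node, path, message) tuples, one per violation."""
    report = []
    for node in focus_nodes:
        for shape in shapes:
            values = [o for s, p, o in triples if s == node and p == shape.path]
            if len(values) < shape.min_count:
                report.append((node, shape.path, f"expected at least {shape.min_count} value(s)"))
            if shape.max_count is not None and len(values) > shape.max_count:
                report.append((node, shape.path, f"expected at most {shape.max_count} value(s)"))
            if shape.datatype is not None:
                for v in values:
                    if not isinstance(v, Literal) or v.datatype != shape.datatype:
                        report.append((node, shape.path, f"{v!r} does not have datatype {shape.datatype}"))
    return report


# Hypothetical instance data and shape.
triples = [
    (":b1", ":yearBuilt", Literal("1930", "xsd:gYear")),
    (":b2", ":yearBuilt", Literal("1930")),  # plain string: wrong datatype
    (":b3", ":hasAddress", ":a3"),           # no :yearBuilt at all
]
shapes = [PropertyShape(":yearBuilt", datatype="xsd:gYear", min_count=1, max_count=1)]

for violation in validate(triples, [":b1", ":b2", ":b3"], shapes):
    print(violation)
```

In a real pipeline the shapes would live in SHACL (or ShEx) files next to the ontology, so that model definition and data validation evolve together.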
- Evolution & quality
  ● Data evolves
  ● so do ontologies
  ● so do RDF mappings
  ● so does code
  ● so do SPARQL queries
  ● so do constraints
  http://aligned-project.eu
- Scaling quality ...
  ● Thousands of triples
  ● Millions of triples
  ● Billions of triples
  ● ?
  Try to move validation into the thousands-of-triples (K) range, when possible.
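One way to keep validation in the thousands-of-triples range is to check a small, reproducible excerpt on every change and reserve full-scale validation for less frequent runs. The sketch below assumes triples are (subject, predicate, object) tuples; the `subject_sample` helper and its default sizes are hypothetical choices, not a prescribed method.

```python
import random
from collections import defaultdict


def subject_sample(triples, max_subjects=1000, seed=42):
    """Return all triples of up to `max_subjects` randomly chosen subjects.

    Sampling whole subjects, rather than individual triples, keeps
    per-resource checks such as cardinality constraints meaningful.
    A fixed seed makes the excerpt reproducible between runs.
    """
    by_subject = defaultdict(list)
    for triple in triples:
        by_subject[triple[0]].append(triple)
    subjects = sorted(by_subject)  # stable order, so the seed fully decides the pick
    rng = random.Random(seed)
    chosen = rng.sample(subjects, min(max_subjects, len(subjects)))
    return [t for s in chosen for t in by_subject[s]]


# Hypothetical full dataset: 5,000 buildings with two triples each.
full = [(f":b{i}", p, "x") for i in range(5000) for p in (":p1", ":p2")]
excerpt = subject_sample(full, max_subjects=1000)
print(len(full), "->", len(excerpt))
```

An excerpt like this tends to catch systematic errors (a broken mapping rule affects many subjects) but can miss rare ones, which is why it complements rather than replaces occasional full validation.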
- Validate closer to the source
  ● Validate the model
  ● Validate the RDF mappings
  ● Validate RDF mapping excerpts
  ● Validate instance data
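Validating RDF mapping excerpts means running the mapping on a handful of source rows and checking its output before the full conversion. In the sketch below, the CSV columns, the `map_row` mapping and the checks are all hypothetical stand-ins for whatever RML, D2RQ or script-based mapping is actually in use.

```python
import csv
import io

# A few rows from a hypothetical source file. In practice this could be the
# first N lines of the file or a fixture collecting known tricky cases.
SOURCE_EXCERPT = """id,floors
1,5
2,
3,two
"""


def map_row(row):
    """Hypothetical RDF mapping: one CSV row -> list of (s, p, o) triples.

    Literals are (lexical form, datatype) pairs; IRIs are plain strings.
    """
    subject = f":building/{row['id']}"
    return [
        (subject, "rdf:type", ":Building"),
        (subject, ":floorCount", (row["floors"], "xsd:integer")),
    ]


def check_mapping_output(triples):
    """Return human-readable problems found in the mapped triples."""
    problems = []
    for s, p, o in triples:
        if not isinstance(o, tuple):
            continue  # IRIs are not checked here
        value, datatype = o
        if value == "":
            problems.append(f"{s} {p}: empty literal")
        elif datatype == "xsd:integer" and not value.isdigit():
            # Stricter than xsd:integer on purpose: a floor count is never negative.
            problems.append(f"{s} {p}: {value!r} is not a non-negative integer")
    return problems


triples = [t for row in csv.DictReader(io.StringIO(SOURCE_EXCERPT)) for t in map_row(row)]
for problem in check_mapping_output(triples):
    print(problem)
```

Both problems surface after mapping three rows, long before millions of malformed triples reach the store.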
- Automate, automate & automate
  Can you spot the error?
  Constraint: rdfs:label ⇒ rdf:langString
  ● :foo rdfs:label "foo @en" .
- Automate, automate & automate
  Can you spot the error?
  Constraint: rdfs:label ⇒ rdf:langString
  ● :foo rdfs:label "foo @en" .   (wrong: a plain string that contains "@en")
  ● :foo rdfs:label "foo"@en .    (right: a language-tagged string)
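This kind of error is hard to spot by eye but trivial for a machine. A minimal sketch of an automated check for the rdfs:label ⇒ rdf:langString constraint, assuming literals are modelled by a hypothetical `Literal` class with an optional language tag; the regular expression is only a heuristic to explain *why* a label failed.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    lexical: str
    lang: Optional[str] = None  # set for rdf:langString literals, e.g. "foo"@en


# Heuristic: a trailing " @xx" or "@xx-YY" inside the lexical form suggests a
# language tag that ended up inside the quotes.
MISPLACED_TAG = re.compile(r"\s?@[A-Za-z]{2,3}(-[A-Za-z0-9]+)*$")


def check_labels(triples):
    """Flag rdfs:label values that are not rdf:langString literals."""
    issues = []
    for s, p, o in triples:
        if p != "rdfs:label":
            continue
        if not isinstance(o, Literal) or o.lang is None:
            hint = ""
            if isinstance(o, Literal) and MISPLACED_TAG.search(o.lexical):
                hint = " (language tag inside the quotes?)"
            issues.append(f"{s}: label is not an rdf:langString{hint}")
    return issues


triples = [
    (":foo", "rdfs:label", Literal("foo @en")),         # wrong: "foo @en"
    (":bar", "rdfs:label", Literal("bar", lang="en")),  # right: "bar"@en
]
print(check_labels(triples))
```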
- CI/CD is your buddy
  ● Integrate validation with your CI/CD
    ○ Choose tools & technologies wisely
    ○ Jenkins, Travis CI, GitLab, TeamCity
  ● Fail the build until data issues are fixed
  ● Data integration validation checks
    ○ Datasets that pass CI on their own can still break once integrated
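Per-dataset checks cannot see problems that only appear when datasets are combined, such as references into another dataset that point nowhere. A minimal sketch of such an integration check, assuming triples as tuples and a hypothetical `:hasAddress` predicate; `main()` returns an exit status rather than calling `sys.exit` itself.

```python
import sys

# Hypothetical datasets that each pass their own, standalone checks.
DATASETS = {
    "buildings": [(":b1", ":hasAddress", ":a1"), (":b2", ":hasAddress", ":a2")],
    "addresses": [(":a1", ":street", "Example Street")],
}


def dangling_references(datasets, reference_predicates):
    """List references whose target is not described in any dataset."""
    described = {s for triples in datasets.values() for s, _, _ in triples}
    missing = []
    for name, triples in datasets.items():
        for s, p, o in triples:
            if p in reference_predicates and o not in described:
                missing.append(f"{name}: {s} {p} {o} (target not described)")
    return missing


def main():
    """Return a process exit status: 0 if the integrated data is clean, 1 otherwise."""
    problems = dangling_references(DATASETS, {":hasAddress"})
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

A CI job would run this as a script ending in `sys.exit(main())`. CI servers such as Jenkins, Travis CI, GitLab CI and TeamCity typically treat a non-zero exit status from a build step as a failure, so the build stays red until `:a2` is added or the reference is fixed.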
- Thank you for your attention. Questions?