Cavity Data


Uploaded by alan-dean on 28-Nov-2014





DESCRIPTION

A short overview of the Cavity data library.

TRANSCRIPT

Page 1: Cavity Data

Cavity

Page 2: Cavity Data

Background

• Project created on Google Code in June 2010 (under MIT licence)
  • https://code.google.com/p/cavity/

• Very much a “personal” open source project
  • I didn’t want to keep writing the same plumbing over and over
  • I figured other people might get some mileage from the code and packages
  • It’s probably rather idiosyncratic

• Everything is TDD with as near to total coverage as I can get

• Releases are pushed to NuGet
  • https://www.nuget.org/packages?q=Cavity

• At some point I will probably port it to GitHub

Page 3: Cavity Data

Visual Studio Solutions

• There are 43 solutions, but the ones I use heavily are:
  • Cavity Configuration
  • Cavity Core
  • Cavity Data
  • Cavity Diagnostics log4net
  • Cavity Domain
  • Cavity Domain (Royal Mail)
  • Cavity Service Location
  • Cavity Unit Testing

Page 4: Cavity Data

Cavity Data

• Depends on:
  • Cavity Core
  • .NET 2.0 / 3.5 / 4.0 (all three frameworks are separately targeted)

• Comma-separated and tab-separated formats are currently implemented
  • Designed as forward-only, read-only StreamReader implementations
  • Assumes data files are immutable (write-once, read-many)

• Reader implementations are wrapped by DataSheet encapsulations

• Data is primarily exposed as IEnumerable<KeyStringDictionary>
  • Enables the power of System.Linq
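The reader design on this slide can be illustrated with a minimal Python sketch. This is not the Cavity API (which is .NET): `read_entries` is a hypothetical name, a plain `dict` stands in for `KeyStringDictionary`, a generator stands in for a forward-only, read-only `StreamReader`, and a list comprehension plays the role of a System.Linq query. It assumes the first line of each file is a header row.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def read_entries(stream: TextIO, delimiter: str = ",") -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, reading the stream forward-only.

    Hypothetical analogue of a Cavity reader (not its real API): the first
    row is assumed to be the header, and each later row is mapped
    column name -> string value, roughly what a KeyStringDictionary holds.
    """
    reader = csv.reader(stream, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return  # empty input: no header, so no entries
    for row in reader:
        # Rows are produced lazily; only the current row is held in memory.
        yield dict(zip(header, row))


# A tab-separated sample; in practice this would be an open file handle.
sample = io.StringIO(
    "id\ttown\tprice\n"
    "1\tLeeds\t150000\n"
    "2\tYork\t210000\n"
    "3\tLeeds\t175000\n"
)

# LINQ-style Where + Select over the lazy sequence of entries.
leeds_prices = [
    int(entry["price"])
    for entry in read_entries(sample, delimiter="\t")
    if entry["town"] == "Leeds"
]
print(leeds_prices)  # [150000, 175000]
```

Because entries are yielded one at a time, the memory needed for reading stays flat regardless of file size, which is what lets a forward-only, read-only design cope with very large immutable files.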

Page 5: Cavity Data

• Uses Cavity extensively

• Drives much of the current development

• Processes between 200 and 300 million records per day (≈ 50 GB of data)

• Example read rate (whole of UK property model)
  • 30 million records (12½ GB), with 180 columns (5.4 billion data points)
  • StreamReader ReadLine()
    • 2 minutes total read time = 250,000 records/sec (15 million records/min)
  • CsvStreamReader ReadEntry<T>()
    • 10 minutes total read time = 50,000 records/sec

• The philosophy is to squeeze maximum value from dedicated tin (physical hardware)
  • Predictable fixed cost with near-zero marginal cost
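The read-rate figures follow from dividing the record count by the read time in seconds. A quick Python check using only the numbers quoted on this slide (30 million records, 180 columns, read times of 2 and 10 minutes):

```python
# Figures quoted on the slide for the whole UK property model.
RECORDS = 30_000_000
COLUMNS = 180


def records_per_second(records: int, minutes: float) -> float:
    """Throughput in records/sec for a full read taking `minutes`."""
    return records / (minutes * 60)


data_points = RECORDS * COLUMNS                 # 5.4 billion
readline_rate = records_per_second(RECORDS, 2)  # StreamReader ReadLine()
entry_rate = records_per_second(RECORDS, 10)    # CsvStreamReader ReadEntry<T>()

print(f"{data_points:,} data points")                    # 5,400,000,000 data points
print(f"ReadLine(): {readline_rate:,.0f} records/sec")   # 250,000 records/sec
print(f"ReadEntry<T>(): {entry_rate:,.0f} records/sec")  # 50,000 records/sec
print(f"Ratio: {readline_rate / entry_rate:.0f}x")       # 5x
```

So 30 million records in 2 minutes is 250,000 records/sec (15 million records per minute), and parsing each line into a typed entry is about five times slower than raw line reading in this example.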

Page 6: Cavity Data

Use Cases

The Good

• Consuming feeds

• Producing feeds

• (Near-)non-volatile data

• Excellent match for CQRS (+ REST)

• When storage is cheap

The Bad

• Volatile data

• Very large numbers of consumers

• Mismatch with classic n-tier architectures

• Limited tooling for ad hoc queries