Everything you wanted to know about Velocity (but were afraid to cache)
Scott Colestock
Marcato Partners, LLC
About Scott
• Scott Colestock
• Twitter: scolestock
• Marcato Partners (MarcatoPartners.com)
– One of three partners
– Focused on agile coaching
– Focused on helping early-stage startup ventures in the mobile space
What is it?
Velocity is a distributed key/value cache that provides .NET developers with a way to increase performance and scalability when writing data-centric applications.
What is it? (2)
• The combined RAM available to all servers in a Velocity cluster is presented to Velocity clients as a unified whole
• Any serializable CLR object can be stored
– Actual location within cluster is transparent
– Client is a simple key/value API at heart
• Run as a service accessed across the network
• Additional servers can be added on demand
What we’ll cover
• What motivates this product/technology
• Terms / Pictures / Concepts
• Deploy / Install Process
• A lap around the API & Admin model
• Demos
• Gotchas
Motivation
• Data-centric applications have been the norm for a long while
– Relational data
– More recently, “service-obtained” data
• Velocity is about increasing performance by bringing the data physically closer to the consumer
– Reduce pressure on underlying data stores/services
• Velocity can be about storing data in value-added form (logically closer to the consumer)
– Object graphs
– Output caching (not explicit in V1)
– Aggregated data in XML or other transformed formats
Motivation (2)
• Databases are always a point of high contention as you scale out, and tuning is expensive
– Are your data retrieval sprocs getting harder to maintain (excessive SQL chops required)?
• Service calls for reference data (internal/external) are often slow or intentionally throttled
• Caching has always been considered a solution for these issues…
Motivation (3)
• Machine-local caching solutions (like Microsoft’s “Enterprise Library Caching Application Block”) can provide a partial answer
– Easy key/value API
– Flexible store (memory, disk-backed, etc.)
– Flexible expiration and eviction policy
• Limitations:
– Limited by the memory available to a single node…
– Application recycles typically mean you lose the cache
– In a load-balanced environment, a large data set means you will frequently “miss” when attempting to load from cache…
Motivation (4)
[Diagram: a load balancer in front of three machines whose local caches hold keys 3, 5, 23; keys 7, 11, 47; and keys 12, 16, 33]
Machine-local caches wind up being sparsely populated when used with a load balancer (if the data set has many keys)
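The sparse-population effect can be illustrated with a small simulation. This is a sketch in plain Python, not Velocity code; the round-robin balancer, the three servers, and the 100-key data set are all assumptions chosen for illustration.

```python
def count_misses(num_servers, num_keys, num_requests):
    """Count cache misses for machine-local caches vs. one shared cache.

    Assumes a round-robin load balancer and requests that cycle through
    the keys in order (both are illustrative assumptions).
    """
    local_caches = [set() for _ in range(num_servers)]  # one cache per machine
    shared_cache = set()                                # one logical cache
    local_misses = 0
    shared_misses = 0

    for i in range(num_requests):
        key = i % num_keys
        server = i % num_servers  # round-robin routing

        if key not in local_caches[server]:
            local_misses += 1
            local_caches[server].add(key)

        if key not in shared_cache:
            shared_misses += 1
            shared_cache.add(key)

    return local_misses, shared_misses


local_misses, shared_misses = count_misses(num_servers=3, num_keys=100, num_requests=300)
print(local_misses, shared_misses)
```

With these inputs each key is requested three times and lands on a different machine each time, so every request misses its machine-local cache (300 misses), while the shared cache misses only once per key (100 misses).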
Motivation (5)
• With machine-local caches, you have no central place to update/delete cached items
• This means you can only cache data that can afford to be stale by some time period
– If the time period is short, you need a low TTL (time-to-live, aka expiration) which means more cache misses
• You can’t cache data that must have changes visible to the system in (near) real time
• With a single logical cache, you have one cache to shoot in the event of an update/delete
– Might be able to live with no expiration
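A minimal sketch of the "one cache to shoot" point, again in plain Python with dictionaries standing in for the database and the caches. The key name and the three-machine setup are hypothetical.

```python
source = {"price:42": 10}              # stand-in for the database

# Three machine-local caches, each already holding the old value.
local_caches = [dict(source) for _ in range(3)]
# One logical cache shared by every machine.
shared_cache = dict(source)


def cached_read(cache, key):
    """Use the cache, falling back to the source on a miss."""
    if key not in cache:
        cache[key] = source[key]
    return cache[key]


def update(key, value, handling_machine):
    """Update the source, then invalidate the caches this machine can reach."""
    source[key] = value
    local_caches[handling_machine].pop(key, None)  # only its own local cache
    shared_cache.pop(key, None)                    # the one cache to shoot


update("price:42", 12, handling_machine=0)

local_reads = [cached_read(c, "price:42") for c in local_caches]
shared_read = cached_read(shared_cache, "price:42")
print(local_reads, shared_read)
```

Only the machine that handled the update sees the new price through its local cache; the other two keep serving 10 until their entries expire. The shared cache returns 12 to everyone.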
Windows Server AppFabric Caching
• History: AppFabric caching was a separate component
– Public debut at TechEd 2008 (earlier?)
– Codename: Velocity
• “Dublin” was a separate effort, focused on providing a hosting and management environment around WCF/WF
• November 2009: Technologies grouped under heading of “Windows Server AppFabric”
• RTW (release to web) in June 2010…
Relationship to Windows Azure AppFabric
• Service bus: Handle communication and authentication for accessing applications
– Expose apps through firewalls, NAT gateways, etc.
– Assist cloud-based apps talking to on-premise apps
– Other composite app scenarios; pub/sub
• Access Control Service: Allow you to avoid setting up federated identity agreements just to grant partner/customer access to your cloud-based or on-premise apps.
• Today: Only common marketing/branding with Windows Server AppFabric.
• Later: Common services for both
Cache-Aside Pattern
• In the current version, the out-of-box support is for the “cache-aside” pattern.
– Check cache
– If miss, retrieve data, then populate the cache
• Lots of other patterns you might contemplate (and simulate) with what is provided
– Read-through/Write-through
– Refresh-ahead/Write-behind
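The two cache-aside steps can be sketched as follows. This is plain Python over a dictionary, not the Velocity client API; `CacheAside` and `load_product` are hypothetical names introduced for illustration.

```python
class CacheAside:
    """Cache-aside reads over any dict-like cache."""

    def __init__(self, cache, load_from_source):
        self.cache = cache
        self.load_from_source = load_from_source  # hypothetical loader callback

    def get(self, key):
        value = self.cache.get(key)          # 1. check cache
        if value is None:                    # 2. miss: retrieve the data...
            value = self.load_from_source(key)
            if value is not None:
                self.cache[key] = value      # 3. ...then populate the cache
        return value


calls = []


def load_product(key):
    """Stand-in for a database or service call; records each invocation."""
    calls.append(key)
    return {"id": key, "name": f"Product {key}"}


products = CacheAside(cache={}, load_from_source=load_product)
first = products.get(7)
second = products.get(7)
print(first == second, calls)
```

The second `get` is served from the cache, so the loader runs only once. The application, not the cache, owns the logic for loading on a miss, which is what distinguishes cache-aside from read-through.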
Logical Hierarchy
[Diagram: a cache cluster of Servers A, B, and C, each running a cache host (Cache Host A, B, C). Labels shown across the cluster: Named Cache: Product Catalog; Default Cache; Region: Sports; Region 1; Region 3]
• Client apps work with a single logical unit of cache
• Server process is DistributedCacheService.exe
• Caches are explicitly created with TTL, expiration, HA policy
• Regions represent a partition of data (subset of key/value pairs). Each region lives on one node and is the unit of replication/failover.
• Regions can be implicit or explicit. Use explicit only for bulk gets or searching.
Logical Hierarchy
[Diagram labels: Named Cache: Product Catalog; Default Cache; Region: Sports; Region 1. A region holds rows of the following shape:]
ID (Key)    Payload (Value)    Tags/VersionInfo
1           Foo                …
2           Bar                …
3           Baz                …
Physical Layout
[Diagram: a load balancer in front of Web Servers A, B, and C (IIS 7.x), which talk to a cache cluster of Cache Servers A, B, and C, each running a cache host]
Combined Deployment
[Diagram: a load balancer in front of Web Servers A, B, and C (IIS 7.x), each of which also runs a cache host]
Physical Layout
• The configuration store contains cache policies and the global partition map (how keys divide into regions, which servers have which regions)
• With a SQL Server config store, servers send their heartbeat to SQL Server. Otherwise, the heartbeat goes to one or more “lead hosts”
• The partition map is used by the “Global Partition Manager” (one node in the cluster, with automatic failover) to communicate routing information to Velocity clients
[Diagram: a load balancer in front of Web Servers A, B, and C (IIS 7.x); a cache cluster of Cache Servers A, B, and C, each running a cache host; and a config store (file share or SQL Server)]
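A rough model of how a partition map can route a key to a host is sketched below in plain Python. The region count, the CRC32 hashing scheme, and the host names are assumptions for illustration; the slides do not describe how Velocity actually assigns keys to regions.

```python
import zlib

NUM_REGIONS = 8  # assumed; the real region count is decided by the cluster

# Hypothetical partition map: region number -> cache host that owns it.
partition_map = {region: f"CacheHost{'ABC'[region % 3]}" for region in range(NUM_REGIONS)}


def region_for(key):
    """Map a key to a region with a stable hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_REGIONS


def host_for(key):
    """Route a key: key -> region -> owning host."""
    return partition_map[region_for(key)]


def fail_over(failed_host, replacement_host):
    """Reassign every region owned by a failed host, as a partition manager might."""
    for region, host in partition_map.items():
        if host == failed_host:
            partition_map[region] = replacement_host


region_before = region_for("product:1")
host_before = host_for("product:1")
fail_over(host_before, "CacheHostSpare")
host_after = host_for("product:1")
print(host_before, "->", host_after)
```

The key always hashes to the same region; only the region's owner changes. That is why clients need fresh routing information after a failover, rather than a new hashing scheme.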
Regions as unit of replication/failover (Global Partition Manager in action)
[Diagram: a cache cluster of Servers A, B, and C, each running a cache host (Cache Host A, B, C). Labels: Named Cache: Product Catalog; Default Cache; Region: Sports; Region 1]
Regions as unit of replication/failover (When using Secondaries)
[Diagram: a cache cluster of Servers A, B, and C, each running a cache host (Cache Host A, B, C). Labels: Named Cache: Product Catalog; Default Cache; Region: Sports; Region 1; Sports secondary; Region 1 secondary]
(Updates done synchronously)
Local Cache
• Local cache is an option that can be enabled when creating the cache client (DataCacheFactory)
• Allows a local cache to be populated that will prevent a network hop (and serialization) if the request can be satisfied locally
• Best when data set is (relatively) small, changes infrequently, and stale data is acceptable
• Can expire via TTL or notifications (which might be late/lost)
• Can specify max object count before evicting LRU
[Diagram: a load balancer in front of Web Servers A, B, and C (IIS 7.x), each with its own LocalCache, talking to a cache cluster of Cache Servers A, B, and C, each running a cache host]
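The trade-off behind the local cache option (no network hop, but possibly stale data until the TTL passes) can be modeled in a few lines. This is a plain-Python sketch, not the DataCacheFactory implementation; the `LocalCache` class here and its parameters are hypothetical, and notification-based expiry is left out.

```python
import time
from collections import OrderedDict


class LocalCache:
    """In-process cache in front of a remote cache, with TTL and LRU eviction."""

    def __init__(self, remote, ttl_seconds, max_objects, clock=time.monotonic):
        self.remote = remote              # dict-like stand-in for the cluster
        self.ttl_seconds = ttl_seconds
        self.max_objects = max_objects
        self.clock = clock                # injectable so time can be controlled
        self.items = OrderedDict()        # key -> (value, expires_at), LRU first
        self.remote_reads = 0

    def get(self, key):
        entry = self.items.get(key)
        if entry is not None and entry[1] > self.clock():
            self.items.move_to_end(key)   # mark as most recently used
            return entry[0]               # served locally: no network hop

        self.remote_reads += 1            # expired or absent: go to the cluster
        value = self.remote.get(key)
        self.items.pop(key, None)         # drop any expired entry
        if value is not None:
            self.items[key] = (value, self.clock() + self.ttl_seconds)
            if len(self.items) > self.max_objects:
                self.items.popitem(last=False)  # evict least recently used
        return value


now = [0.0]
remote = {"a": 1, "b": 2, "c": 3}
cache = LocalCache(remote, ttl_seconds=30, max_objects=2, clock=lambda: now[0])

cache.get("a")          # remote read
cache.get("a")          # local hit
remote["a"] = 99        # the cluster changes; the local copy is now stale
stale = cache.get("a")  # still the old value until the TTL passes
now[0] = 31.0
fresh = cache.get("a")  # TTL expired, so this is re-read from the cluster
print(stale, fresh, cache.remote_reads)
```

The stale read is the cost of skipping the network, which is why the option suits small, slow-changing data sets where staleness is acceptable.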
Data Types and Caching Considerations
• Reference Data: Product catalogs, “lookup” tables, other slow-moving content
– Safe to cache for a defined period of time because you probably live with staleness already
– “Local” cache option might be desirable for small data sets
• Activity Data: Shopping carts or other transient transaction state
– Accessed for read and write operations, but not shared. Low/no concurrency considerations: exclusive write.
– Safe to cache for reads and keep in cache for writes
• Resource Data: Inventory, Orders, and other core transactional data
– Accessed concurrently for read and write
– Caching will require a concurrency model to be chosen and managed
Deploy/Install Considerations
• Windows “Application Server” Role required
• A few critical updates (see install guide)
• .NET 3.5 SP1 for cache clients; .NET 4 for servers
• You’ll need PowerShell 2 (already in Windows 7 / Windows Server 2008 R2)
• Windows XP cannot be a client…
• “Install” and “Configure” for AppFabric are two distinct steps
Deploy/Install Considerations
• Primary screen of interest is choosing your configuration store:
– XML/File share
– SQL-based
• File share avoids the need for SQL Server, but requires that some nodes in the cache cluster be special (“Lead Hosts”)
• Using SQL Server as the configuration store is the better engineering choice for production, though you may have other reasons to avoid it.
Deploy/Install Considerations
• As you build out your AppFabric Cache Cluster, you will do “New Cluster” on the first node, and “Join Cluster” on subsequent nodes
• Ultimately, all of Windows Server AppFabric is a set of features underneath the Application Server Role, so standard command line installations work.
– Setup.exe /install /i cachingservice,cacheclient,cacheadmin /l:c:\temp\setup.log
AppFabric as Application Server “Role Service”
Deploy/Install Considerations
• Can do a “Cache client” install for clients, or for internal apps, just incorporate the client assemblies in your own build/deploy process:
– Microsoft.ApplicationServer.Caching.Core.dll
– Microsoft.ApplicationServer.Caching.Client.dll
– Microsoft.WindowsFabric.Common.dll
– Microsoft.WindowsFabric.Data.Common.dll
Caching Classes
DataCacheFactory
– DataCacheFactory()
– DataCacheFactory(configuration)
– DataCache GetCache(string cache)
– GetDefaultCache()
DataCacheFactoryConfiguration (can set these via configuration)
– LocalCacheProperties
– NotificationProperties
– SecurityProperties
– DataCacheServerEndpoint[] Servers
DataCache
– Add: Adds a new object to the cache. Exception if the item is already in the cache.
– Put: Adds a new object to the cache. Replaces if already in cache.
– Get: Returns an object from the cache.
– Remove: Removes an object from the cache.
Caching Classes
DataCache with DataCacheItemVersion
• GetCacheItem: returns tags and version info
• GetIfNewer: lets you use that version info!
• Put and Remove have overloads that take version info
– Allows for an optimistic concurrency model
– Will only succeed if the version information matches what is current for the cached item
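The optimistic concurrency model that version info enables can be sketched with a toy cache. This is plain Python, not the DataCache API: `VersionedCache`, `VersionMismatch`, and `add_to_stock` are hypothetical names, and the integer version counter is an assumption standing in for DataCacheItemVersion.

```python
class VersionMismatch(Exception):
    """Raised when the caller's version is not the current one."""


class VersionedCache:
    """Toy cache that stamps every item with an increasing version number."""

    def __init__(self):
        self.items = {}        # key -> (value, version)
        self.next_version = 1

    def get_cache_item(self, key):
        """Return (value, version), or None if the key is absent."""
        return self.items.get(key)

    def put(self, key, value, expected_version=None):
        """Store value; if expected_version is given it must match the current one."""
        current = self.items.get(key)
        if expected_version is not None:
            if current is None or current[1] != expected_version:
                raise VersionMismatch(key)
        version = self.next_version
        self.next_version += 1
        self.items[key] = (value, version)
        return version


def add_to_stock(cache, key, delta, max_attempts=5):
    """Read-modify-write with retry: the optimistic concurrency loop."""
    for _ in range(max_attempts):
        value, version = cache.get_cache_item(key)
        try:
            return cache.put(key, value + delta, expected_version=version)
        except VersionMismatch:
            continue  # someone else wrote first; re-read and try again
    raise RuntimeError("too much contention on " + key)


cache = VersionedCache()
cache.put("stock:42", 10)
value, version = cache.get_cache_item("stock:42")   # reader A sees the first version
cache.put("stock:42", 9)                            # writer B gets in first
try:
    cache.put("stock:42", value - 1, expected_version=version)
    conflict = False
except VersionMismatch:
    conflict = True                                 # A's write is rejected
add_to_stock(cache, "stock:42", -1)                 # re-read and retry succeeds
print(conflict, cache.get_cache_item("stock:42")[0])
```

No lock is ever held: a writer with an out-of-date version simply fails and retries, which works well when conflicts are rare.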
DataCache and Locking
• GetAndLock: Allows you to lock a cache item for a specified time period, even if not present
– (Will fail if already locked)
– public Object GetAndLock (string key, TimeSpan timeout, out DataCacheLockHandle lockHandle, bool forceLock)
• Useful when attempting to get multiple servers to coordinate “cache pre-load” activity
• PutAndUnlock: Unlock an item, with given key and lock handle
• Unlock: Explicitly unlock, optionally extending the TTL
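How a lock handle lets several servers coordinate a cache pre-load can be modeled with a toy cache. This is a plain-Python sketch loosely shaped after the GetAndLock and PutAndUnlock descriptions above; the class, the exception, the snake_case method names, and the handling of missing keys are assumptions rather than the documented behavior of the real client.

```python
import itertools
import time


class AlreadyLocked(Exception):
    """Raised when a key is locked by someone else."""


class LockingCache:
    """Toy cache with per-key locks that expire after a timeout."""

    def __init__(self, clock=time.monotonic):
        self.values = {}
        self.locks = {}                   # key -> (handle, expires_at)
        self.handles = itertools.count(1)
        self.clock = clock

    def _is_locked(self, key):
        lock = self.locks.get(key)
        return lock is not None and lock[1] > self.clock()

    def get_and_lock(self, key, timeout_seconds, force_lock=False):
        """Return (value, handle); with force_lock the key need not exist yet."""
        if self._is_locked(key):
            raise AlreadyLocked(key)
        if key not in self.values and not force_lock:
            raise KeyError(key)
        handle = next(self.handles)
        self.locks[key] = (handle, self.clock() + timeout_seconds)
        return self.values.get(key), handle

    def put_and_unlock(self, key, value, handle):
        """Store the value and release the lock, if the handle still owns it."""
        if not self._is_locked(key) or self.locks[key][0] != handle:
            raise ValueError("lock handle is not current for " + key)
        self.values[key] = value
        del self.locks[key]


cache = LockingCache()

# Server A wins the lock on a key that does not exist yet.
_, handle_a = cache.get_and_lock("catalog", timeout_seconds=60, force_lock=True)

# Server B tries the same thing while A is still loading.
try:
    cache.get_and_lock("catalog", timeout_seconds=60, force_lock=True)
    b_got_lock = True
except AlreadyLocked:
    b_got_lock = False            # B backs off and lets A do the pre-load

cache.put_and_unlock("catalog", ["Foo", "Bar", "Baz"], handle_a)
print(b_got_lock, cache.values["catalog"])
```

The lock timeout matters: if server A dies mid-load, the lock expires and another server can take over instead of waiting forever.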
DataCache and Tags/Regions
• Explicitly created regions live on a single node, which can create a hot spot for both call volume and memory growth
• But they offer bulk retrieval and flexible tag-based retrieval
• Instead of using regions for lookups by another attribute, you can simulate secondary indexes with your own cache that maps secondary keys to primary keys
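A secondary-to-primary mapping cache might look like the sketch below. It is plain Python with two dictionaries standing in for two caches; the customer records, the function names, and the choice of email as the secondary key are hypothetical.

```python
def put_customer(customers, email_index, customer):
    """Store a customer by id and keep the email -> id mapping in step."""
    previous = customers.get(customer["id"])
    if previous is not None and previous["email"] != customer["email"]:
        email_index.pop(previous["email"], None)   # drop the stale mapping
    customers[customer["id"]] = customer
    email_index[customer["email"]] = customer["id"]


def get_customer_by_email(customers, email_index, email):
    """Two lookups: secondary key -> primary key -> value."""
    customer_id = email_index.get(email)
    if customer_id is None:
        return None
    return customers.get(customer_id)


customers = {}      # primary cache: customer id -> customer
email_index = {}    # mapping cache: email -> customer id

put_customer(customers, email_index, {"id": 1, "email": "old@example.com"})
put_customer(customers, email_index, {"id": 1, "email": "new@example.com"})

found = get_customer_by_email(customers, email_index, "new@example.com")
missing = get_customer_by_email(customers, email_index, "old@example.com")
print(found, missing)
```

Every lookup by secondary key costs two cache reads, and the two entries are not updated atomically, so a real implementation has to tolerate a mapping that briefly points at a missing or changed item.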
Administrative Model
• Administration for AppFabric Caching is done purely through PowerShell
• Can administer the entire Cache Cluster from wherever the administrative portion of the install has been done; all nodes are addressable from a single command line location
• Use-CacheCluster points the shell at a particular cluster to administer
• Get-Command -module DistributedCacheAdministration lists the available commands
Gotchas
• Balance number of nodes in cluster with memory per node.
– Too many nodes = cluster overhead; too much memory per node = GC overhead
• If you don’t use the SQL config store, you need to manually run Start-CacheHost after reboot
• Consider the nature of data stored in cache, and secure appropriately (don’t let the cache be the weakest link)
• SQL config store requires high SQL privileges right now at point of install
• Currently the service runs as the Network Service account
• Consider what you will do when the cache is down
– You can go after the source of truth
– How do you avoid leaving stale data in the cache?
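One possible answer to the last two questions is sketched below in plain Python: read from the source of truth while the cache is unreachable, and remember which keys still need invalidating once it returns. `CacheUnavailable`, `FlakyCache`, and `Repository` are hypothetical names; the pending set lives in one process's memory, so this is an illustration of the idea rather than a complete solution.

```python
class CacheUnavailable(Exception):
    """Hypothetical error raised when the cache cluster cannot be reached."""


class FlakyCache:
    """Dict-backed stand-in for a cache client that can be switched off."""

    def __init__(self):
        self.items = {}
        self.available = True

    def _check(self):
        if not self.available:
            raise CacheUnavailable()

    def get(self, key):
        self._check()
        return self.items.get(key)

    def put(self, key, value):
        self._check()
        self.items[key] = value

    def remove(self, key):
        self._check()
        self.items.pop(key, None)


class Repository:
    """Cache-aside access that survives a cache outage."""

    def __init__(self, cache, source):
        self.cache = cache
        self.source = source            # dict standing in for the source of truth
        self.pending_removals = set()   # keys whose cached copy may be stale

    def read(self, key):
        try:
            self._flush_pending()
            value = self.cache.get(key)
            if value is None:
                value = self.source.get(key)
                if value is not None:
                    self.cache.put(key, value)
            return value
        except CacheUnavailable:
            return self.source.get(key)   # cache is down: go to the source

    def write(self, key, value):
        self.source[key] = value
        self.pending_removals.add(key)
        try:
            self._flush_pending()
        except CacheUnavailable:
            pass                          # remembered; retried on the next call

    def _flush_pending(self):
        for key in list(self.pending_removals):
            self.cache.remove(key)
            self.pending_removals.discard(key)


cache = FlakyCache()
repo = Repository(cache, source={"sku:1": "old"})
repo.read("sku:1")            # populates the cache with "old"
cache.available = False
repo.write("sku:1", "new")    # cache is down, so the removal is deferred
during_outage = repo.read("sku:1")
cache.available = True
after_outage = repo.read("sku:1")
print(during_outage, after_outage)
```

Both reads return the new value: during the outage the source answers directly, and afterwards the deferred removal runs before the cache is consulted. With several application servers, each would need its own way to learn about writes it missed, for example a short TTL on the affected caches.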
Resources
• AppFabric Caching and Deployment Guide
– http://bit.ly/AppFabMgmt
• AppFabric Development Center
– http://bit.ly/AppFabDevCtr
• AppFabric Forums
– http://bit.ly/AppFabForum
• NHibernate integration
– http://sourceforge.net/projects/nhcontrib/files/NHibernate.Caches/
• Entity Framework integration (basis for)
– http://code.msdn.microsoft.com/EFProviderWrappers
• Recent MSDN:
– http://msdn.microsoft.com/en-us/magazine/ff714581.aspx
Thank you -
Questions?