cisco live 2016 7/13/2016 -...

7/13/2016 Cisco Live 2016

Here is our agenda. We have a lot to fit in over the next hour and 45 minutes, so please hold questions until the end if you wouldn’t mind. We will try to leave about 15 minutes at the end for questions, and we will stay around after the session for any we missed. When we developed this content, we wanted to touch on both the technology that helps solve the problem and the business policies and processes that are required. Our solution demos are interspersed throughout the presentation. All of our data points will be included in the session download, which will be available for free at Cisco Live 365 in a couple of weeks. Everything we discuss is documented, and URLs are provided for all of the supplemental material, including the demo videos.

Why did we decide on data protection as a topic for Cisco Live? The big idea is that we’ve seen a lack of focus on this topic in general. There have been no Cisco Live sessions on data protection, and it’s not something we generally talk about with our customers. Much of this is changing as we continue to build relationships with data protection partners and now that we’ve developed the dense storage servers, the C3160 and C3260. Both Tim and I come from the customer side and have lots of experience developing and deploying DR plans. We’ve also both seen how disasters can affect these businesses and why there needs to be more focus on this topic. As we said during the intro, we both support service provider customers, and many of them have service offerings in this space. It’s something we wanted to cover so that everyone is aware of it as a method of off-siting your data even if you don’t have a secondary data center. Finally, we believe we have some interesting ideas about how to build the DR plan and choose the tools that will help you protect your business.

Over the next few slides we will provide some relevant data points on why companies need to come up with ways to protect data. Through our research we found that a shockingly low number of SMBs have a DR plan, even though a fair number of businesses simply cease to exist if they experience a disaster. Often, even when a plan is in place, all of the data is housed in a single site and there is no redundancy or replication of that data. The virtualization movement has definitely helped us with our backup and DR requirements. As many of us know, backing up a virtual machine is much easier than protecting a physical server, but many larger companies still have physical assets that need to be protected, and this can complicate tool selection and DR planning.

https://www.gartner.com/doc/reprints?id=1-2E6FC17&ct=150429&st=sb&submissionGuid=840913dc-d79e-44ad-9553-3a8af62a5263
https://www.sba.gov/managing-business/running-business/emergency-preparedness/disaster-planning

The fact of the matter is that disasters still happen, but disaster recovery isn’t typically a top-of-mind activity for a company. It’s considered insurance at best, and no one ever wants to have to use that insurance. Disasters don’t have to be the big tornado or hurricane; they come in all shapes and sizes. In fact, the majority of data loss comes from regular system and hardware failure along with human error and software corruption. Case in point: we actually had an array failure in our lab. Two drives went bad and we experienced significant downtime because of it. The following data point is a little dated, but the fact remains that hard drives fail, SSDs fail too, and when they do, companies need an assured method of protecting the data that lives on these devices. A quick question to our audience…

http://blog.icorps.com/it-disaster-recovery-facts
http://beyondtechnology.com.au/drstats
http://www.storagenewsletter.com/rubriques/market-reportsresearch/two-thirds-of-organizations-measure-dr-time-in-days-not-hours-twinstrata/

Google data is 2007…reliability has increased but number of hard drives has also increased
Downtime costs range from thousands to tens of thousands per hour…averages range greatly between reports

Finally, here is another way to look at data loss and the need for backup and recovery capabilities in the data center. We’re seeing a dramatic spike of ransomware in the wild. We won’t talk about other viruses in this session; we’re only pointing this one out because of its capabilities and the effect it’s having in the world today. Also, McAfee has been seeing ransomware-as-a-service purchasable via the Tor network using bitcoin.

“In 2015 we saw ransomware-as-a-service hosted on the Tor network and using virtual currencies for payments” McAfee Labs 2016 Threat Predictions
http://www.mcafee.com/us/resources/reports/rp-threats-predictions-2016.pdf
https://blog.fortinet.com/post/cryptowall-teslacrypt-and-locky-a-statistical-perspective
http://blog.talosintel.com/
http://www.talosintel.com/files/publications_and_presentations/papers/CryptoWall4_WhitePaper.042016.pdf

Vendors were independently chosen. We did work with them in the lab; however, the data given on the following "The Tools" slides is from the manufacturers. The goal for the session was to minimize the marketing aspect, and the vendors could not provide wording or verbiage for any other parts of the presentation.

*Indicates item where we have partial compliance
**Indicates item not included in the lab environment
Demos may support more but we have listed one bullet on each as an example
Use these as a starting point for your own DR/BC Strategy
Can be prioritized to allow for best use of funding and time (what you need immediately and what can be assigned to later phases)

Ability to adopt newer processes and find efficiencies is the key to staying relevant both to internal and external customers
Have to get away from the mentality of “we spent so much money on XYZ product over the years” and “we aren’t Facebook or Google”
Challenge the politics within your organization…the best employees effect change
Technology is constantly changing and your processes and tools should change with it
Depending on procurement model, leverage RFP to ensure response is limited to appropriate technology
Consider investing in a POC or leverage partners who can facilitate an evaluation of technology

It is normally the path of least resistance…the “do nothing” approach
Oracle DB, MS SQL/Exchange, and many applications have built-in tools that can be automated via scripting (but beware the mad scripter who comments for no man)
Oracle has a Point in Time Restore feature (requires DB restore) and Flashback (uses Flashback redo logs…more efficient than PITR)
MS SQL in Azure offers point-in-time restores (5 min RPO)
Exchange has lagged database copies (supposed to be easier to manage in 2016) and single item restore functionality
https://docs.oracle.com/database/121/BRADV/toc.htm
https://azure.microsoft.com/en-us/blog/azure-sql-database-point-in-time-restore/
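If you do script around the native tools, one small thing worth automating is a check that the newest restorable point is still inside the RPO you signed up for. The following is only a sketch with commented code (so nobody becomes the mad scripter): the function names are hypothetical, the timestamps are made up, and nothing here calls a real Oracle, SQL Server, or Exchange API. The last-backup time would have to come from whatever the native tool reports.

```python
from datetime import datetime, timedelta, timezone


def backup_age(last_backup: datetime, now: datetime) -> timedelta:
    """How far behind 'now' the newest restorable point is."""
    return now - last_backup


def rpo_met(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """True when the newest restorable point is within the RPO target."""
    return backup_age(last_backup, now) <= rpo


if __name__ == "__main__":
    # 5 minutes matches the RPO figure quoted above; the timestamps are
    # illustrative values, not measurements from the lab.
    rpo = timedelta(minutes=5)
    now = datetime(2016, 7, 13, 12, 0, tzinfo=timezone.utc)
    last_backup = datetime(2016, 7, 13, 11, 52, tzinfo=timezone.utc)

    status = "OK" if rpo_met(last_backup, now, rpo) else "RPO MISSED"
    print(f"age={backup_age(last_backup, now)} target={rpo} -> {status}")
```

A check like this is cheap to run on a schedule and gives you an early warning before the gap turns into real data loss.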

The Truck
Bus
Minivan
Limo
Snowplow
RV

Databases:
https://docs.mongodb.com/manual/administration/backup/
http://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsBackupRestore.html
https://community.hortonworks.com/questions/31874/how-to-backup-everything-in-hadoop-including-data.html
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_bdr_about.html

Docker:
http://linoxide.com/linux-how-to/backup-restore-migrate-containers-docker/
https://gist.github.com/rodw/3073987

Bimodal IT: http://www.gartner.com/it-glossary/bimodal/

Drew from experience as customers as well as from our own customers across a wide range of sizes and verticals
Understand that while we may want customers to use all Cisco we need to consider customer choice
Storage based software (with some exceptions) is not very granular…we don’t store all of our 5 minute RPO on 2 or 3 LUNs
Virtual First…practice in place and working at many large companies including Cisco

Normally would have 2 Nexus switches with vPC enabled for redundancy
C3260 would have been UCS Managed, but that was not supported at the time the lab was built (could have leveraged storage profiles)
Site 2 leveraged UCS “Mini” with embedded 6324 Fabric Interconnects

We would normally recommend using UCS Central to create and manage the backup policy for UCS, but it can be done easily without it

Gotcha: TCP adjust-mss and IP MTU should be set, otherwise you may experience connectivity problems (in our lab we could ping and even use PuTTY but could not open UCSM or vSphere across sites)
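That symptom is consistent with an MTU problem: small packets such as pings and an interactive SSH session fit across the overlay, while the full-size TCP segments a web UI sends do not. Here is a rough sketch of the arithmetic behind picking the two values. The numbers are assumptions, not figures from our lab: a 1500-byte transport MTU, 42 bytes of overlay encapsulation overhead (the figure commonly quoted for OTV), and 20-byte IPv4 and TCP headers with no options. Check the actual overhead for your platform before using the results.

```python
# Assumed header sizes: IPv4 and TCP without options.
IPV4_HEADER = 20
TCP_HEADER = 20


def overlay_ip_mtu(transport_mtu: int, overlay_overhead: int) -> int:
    """Largest inner IP packet that fits in one transport packet."""
    return transport_mtu - overlay_overhead


def tcp_mss(ip_mtu: int) -> int:
    """Largest TCP payload that fits in an IP packet of the given size."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    transport_mtu = 1500   # assumption: standard Ethernet MTU on the transport
    overlay_overhead = 42  # assumption: overlay encapsulation overhead in bytes

    ip_mtu = overlay_ip_mtu(transport_mtu, overlay_overhead)
    print(f"IP MTU candidate: {ip_mtu}")
    print(f"TCP MSS candidate: {tcp_mss(ip_mtu)}")
```

If the transport can carry jumbo frames end to end, raising the transport MTU is the other way out, and the same arithmetic tells you how much headroom you need.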

http://stayinginit.blogspot.com/2014/03/secure-copy-over-otv.html
Session on Thursday: BRKDCT-2049 Overlay Transport Virtualization

http://www.cisco.com/c/en/us/td/docs/solutions/Hybrid_Cloud/BaaS/CommVault/BaaS_CommVault.pdf

Commvault protecting all physical and select workloads at Site 1
Using server side deduplication
Aux Copy to Site 2 Media Agent
Can do VM restore, P-to-V restore, or V-to-Azure restore
Chose to have virtual CommServe for ease of DR
VSA = Virtual Server Agent Proxy

https://www.veeam.com/wp-veeam-availability-cisco-ucs-deployment-guide.html
https://www.veeam.com/wp-veeam-backup-replication-enterprise-plus-powered-cisco-ucs.html
https://www.veeam.com/wp-cisco-ucs-c240-deployment-guide.html

Veeam protecting most of the virtual workloads in both sites and demo lab which is all virtual
Backup Copy job is sending data to Site 2 and to iland
Tier 2 Replication is done on a 4 hour schedule from snapshots
Tier 3 Replication is done on a daily schedule from the backup copy data residing at Site 2
Have Veeam virtual lab built out which can be used to test backups and upgrades complete with Internet access (not meant to be static test lab)
Considerations for scaling include number of concurrent jobs, modifying defaults in software, IO Control in vSphere, and others
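A schedule is not the same thing as an RPO. As a rough sketch of why, the snippet below estimates the worst-case data-loss window for a schedule-driven job: if the site fails just before a running job finishes, the newest usable copy is the one whose job started a full interval earlier. The 4 hour and daily intervals come from the notes above; the job durations are made up for illustration, and the function is a generic model rather than anything Veeam exposes.

```python
from datetime import timedelta


def worst_case_rpo(interval: timedelta, job_duration: timedelta) -> timedelta:
    """Worst-case data-loss window for a job that runs on a fixed schedule.

    A failure just before the current job completes leaves only the copy
    from the previous run, which started one full interval earlier.
    """
    return interval + job_duration


# Intervals are from the notes; job durations are hypothetical.
TIERS = {
    "Tier 2": (timedelta(hours=4), timedelta(minutes=30)),
    "Tier 3": (timedelta(days=1), timedelta(hours=2)),
}

if __name__ == "__main__":
    for tier, (interval, duration) in TIERS.items():
        exposure = worst_case_rpo(interval, duration)
        print(f"{tier}: schedule={interval}, worst case={exposure}")
```

For Tier 3 the real exposure is likely larger still, since the replica is built from backup copy data that has its own age by the time replication starts.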

http://www.cisco.com/c/en/us/td/docs/solutions/Hybrid_Cloud/DRaaS/2-0/Collateral/Zerto_VReplication_VSAN/ZertoVrepVsan/ZertoVrepVsan1.html

Replicating all Tier 1 VMs to Site 2…Failover can be performed from either site (could do cross site replication)
Site 2 VMs are being replicated to iland to demonstrate the options available…many use cases apply
VPG Failover testing can be used to ensure functionality of the applications (using isolated network)
Can achieve very aggressive RPO with very low RTO
ZVRAs use DRS rules to stay sticky to the host. The important thing to remember is that you must shut down the ZVRA before the host will go into maintenance mode, and you need to coordinate the upgrade of the Zerto environment with the upgrade of vSphere

https://www2.wwt.com/wp-content/uploads/2015/03/Overview-Brochure-FlexVault.pdf

http://pages.tegile.com/rs/tegilesystems/images/tegile-white-paper-cisco-vmware-vdi-reference-architecture.pdf
https://www.tegile.com/wp-content/uploads/2014/10/Oracle-on-Tegile-Cisco-UCS-Reference-Architecture.pdf

Odds are, unless you are a very small shop, nobody knows all of the moving parts in an application
SMB – Bob is a single source of information but it is rarely documented; Bob also has his hands in many of the systems and frequently makes changes
IT – Silos are challenging to deal with on many levels and add layers to the process
App Dev should understand the inner workings of the application but frequently doesn’t have deep infrastructure knowledge
Business owners understand things like batch processing and what happens on certain days of the week/month…sometimes can be very knowledgeable
Remember that in a true disaster having a clear and detailed process document is critical

SLA creation can be frustrating and time consuming
You have to start somewhere so take inventory of the tools you have and come up with realistic SLA based on that…remember it’s a starting point
Remember the best way to eat an elephant…one bite at a time
Tip to remember is cushion your SLA…if you think you can get 5 minutes obligate to 10…just like sales…we can often bring the price down but almost never up
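The cushion tip is simple enough to write down as a starting-point calculation. This is only a sketch: the 2x factor mirrors the "5 minutes, obligate to 10" example above, the tier names and achievable figures are illustrative placeholders for whatever your own tool inventory shows, and the function name is hypothetical.

```python
def committed_sla_minutes(achievable_minutes: float, cushion: float = 2.0) -> float:
    """Return the SLA to commit to, given what the tooling can really achieve."""
    if achievable_minutes <= 0:
        raise ValueError("achievable_minutes must be positive")
    if cushion < 1:
        raise ValueError("a cushion below 1 would promise more than you can deliver")
    return achievable_minutes * cushion


# Illustrative inventory: what you believe each tier can achieve, in minutes.
ACHIEVABLE = {"Tier 1": 5, "Tier 2": 240, "Tier 3": 1440}

if __name__ == "__main__":
    for tier, minutes in ACHIEVABLE.items():
        print(f"{tier}: achievable={minutes} min, commit to={committed_sla_minutes(minutes):.0f} min")
```

Treat the output as the opening position. It is much easier to tighten a commitment later than to loosen one.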

Business Problems:
Getting rid of PST files
Deciding how to handle printed material
Local files on user PCs
Retraining users about the policy

Example of Industry Specific Requirements:
http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm495891.pdf

http://www.americanbar.org/groups/young_lawyers/publications/the_101_201_practice_series/e_discovery_shifting_the_costs_of_compliance.html
http://logikcull.com/blog/estimating-the-total-cost-of-u-s-ediscovery/
https://www.dlapiperdataprotection.com/#handbook/world-map-section

http://www.cisco.com/c/dam/en_us/partners/program/certifications/download/powered-cloud-and-managed-services-portfolio.pdf
http://www.cisco.com/c/en/us/td/docs/solutions/Hybrid_Cloud/DRaaS/2-0/DG/DRaaS_DG_2-0.pdf

http://www.cisco.com/c/en/us/td/docs/solutions/Hybrid_Cloud/BaaS/CommVault/BaaS_CommVault.pdf

For a regional SP it’s very important to understand capacity in the event of a region-wide disaster. Do they have enough capacity? How are multiple customer requests serviced during this time?

Understand that an SLA cannot always be met and often there are pre-defined refunds which may not equate to the revenue and time lost during an outage.
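To make the refund point concrete, here is a back-of-the-envelope comparison. Every number in it is hypothetical: the monthly fee, the credit percentage, the outage length, and the hourly revenue are placeholders to swap for the figures in your own contract and business.

```python
def sla_credit(monthly_fee: float, credit_percent: float) -> float:
    """Refund owed by the provider when the SLA is missed."""
    return monthly_fee * credit_percent / 100


def outage_loss(hours_down: float, revenue_per_hour: float) -> float:
    """Revenue lost by the business while the service is unavailable."""
    return hours_down * revenue_per_hour


if __name__ == "__main__":
    # Hypothetical contract and outage figures.
    credit = sla_credit(monthly_fee=5_000, credit_percent=10)
    loss = outage_loss(hours_down=8, revenue_per_hour=10_000)

    print(f"SLA credit:    ${credit:,.0f}")
    print(f"Revenue lost:  ${loss:,.0f}")
    print(f"Uncovered gap: ${loss - credit:,.0f}")
```

Running the same comparison with your own figures is a useful exercise before you sign, because the credit is usually tied to what you pay the provider rather than to what the outage costs you.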

Depending on the disaster, odds are very high that employees will need to take care of family first or, in some cases, may be incapacitated and unable to respond

Getting the applications up and functional is great, but not so great if employees have no way to access them…remember that if the disaster is regional, odds are you are competing for space/bandwidth…not just with the DR facility but potentially with the carrier as well (your business and your users)

Greater uptime, quicker recovery and the peace of mind you deserve
