Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Managing at Hyper-Scale: Oracle Enterprise Manager as the Nerve Center of Oracle Cloud
Akshai Duggal, Director Strategic Customer Programs
Rajiv K Maheshwari, VP Product Development
Jonathan Cohen, VP Cloud Operations
Oracle
October 26, 2015
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Coming together is a beginning; keeping together is progress; working together is success. --- Henry Ford
Oracle Cloud DevOps Team
• Akshai Duggal: PM
• Rajiv Maheshwari: Dev
• Jonathan Cohen: Ops
The Cloud: A New Era of Utility Computing
All Three Tiers of Computing Delivered as a Service via Global Network
• Applications: Software as a Service – SaaS
• Platform: Database, Middleware, Analytics, Integration… as a Service – PaaS
• Infrastructure: Storage, Compute, Network as a Service – IaaS
SaaS PaaS IaaS
Program Agenda
1. Management Needs for Cloud
2. Enterprise Manager Deployment for Hyper Scale
3. Cloud Security
4. Cloud Agility
5. Cloud Operations: A Day in the Life of Ops
Management Needs for Cloud
• Automation: for ITOps, DevOps
• Security & Compliance: based on industry and Oracle IT standards
• Monitoring & SLM: with 24x7 coverage, persona-specific dashboards
• Analytics & Reporting: for ongoing planning and optimization
Oracle Cloud is Very Very Large Scale
• 54,000 IT devices
• 19 Tier 4 data centers
• 70M+ users on the Oracle Cloud every day
• 33B+ transactions on the Oracle Cloud every day
Unprecedented Scale Warrants Unmatched Management
Oracle Cloud Typical Datacenter Assets
Enterprise Manager: Manage, Monitor, Analyze
Hardware: EXADATA, ZFS FILER, FIREWALL, STORAGE, INFINIBAND, VM, COMPUTE, X86 SERVER, SUN SERVER, EXALOGIC, EXADATA CELLS, ILOM, IB SWITCH
Software: SOA COMPOSITES, FUSION APPS, DATABASE, SOA INFRA, ORACLE IDENTITY MANAGER, J2EE APP, RAC DB, ORACLE ACCESS MANAGER, ORACLE INTERNET DIRECTORY, WLS CLUSTER, WLS, METADATA DB, FMW FARM, SERVICE, SOA, WEB APP, MULTITENANT, DATA GUARD
Total Cloud Control
Optimized, Efficient
Agile, Automated
Expanded Cloud Stack Management
Scalable, Secure
Superior Enterprise-Grade Management
Accelerated Automation for Broader Cloud Services
Tier4 EM deployment with HA (4 node RAC, 6 node OMS) and DR site
Resilient Scalable Architecture
• Secure lightweight agent communications
• Built on Industry and Open Standards
• EMCLI, REST API for scripting and integrations
• Public Database views for Extraction and Reporting
• Deployed with Growth & High Availability in mind
Typical EM Architecture in Oracle Cloud
• EM 12.1.0.5, RDBMS 11.2.0.4
• Exadata: X4-2 (2*16), 32 CPU threads, 240G
• EM Repository: 6.5TB, FRA 11T (3-way redundancy)
• OMS: SUN FIRE X4170 M2 (2*12), 24 cores, 140G, 700G disk
White Paper – Deploying a highly available Oracle Enterprise Manager 12c
EM Configuration, Backups and more… for Scale
Configuration Tuning*
Load Balancer for HA-OMS
LDAP with SSO for EM user access
Secure OMS with wallet holding custom certificate
Increased Java heap size
OMS properties
– Console timeout
– ADF timeout values
– Job worker connection threads
httpd.conf tuning
WLS socket timeouts
*Appendix slide has parameter details
MOS Note 1553342.1 - Oracle Enterprise Manager 12c Configuration Best Practices
EM Configuration, Backups and more… for Scale
• Repository Backups Following MAA Best Practices
– RMAN: daily incremental backups, with a level 0 (full) backup every Sunday, plus archive log backups every 2 hours to free up space in the flash recovery area (RECO disk group)
– Backups copied from ASM to NFS: one week of backups kept on ASM and on NFS, and 15 days of backups on tape
– No Flashback; FRA ~11T. Guaranteed restore points (GRP) used during upgrades/maintenance
• OMS Backups
– Binary install, Software Library, EMKey ($ORACLE_HOME/sysman/config/emkey.ora)
– OMS configuration from all OMSes, before and after every maintenance: $ORACLE_HOME/bin/emctl exportconfig oms
• Agent Backups
– Recoverable from the OMS, except for emd.properties changes, which need their own backup
MOS Note 1929586.1 – Patch Set and Critical patch Update
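The repository backup cadence on this slide (level 0 every Sunday, incremental on other days, archive log backups every 2 hours) can be sketched in Python as below. This is only an illustration of the schedule: the function names are hypothetical, the midnight start of the 2-hour cycle is an assumption, and it does not invoke RMAN.

```python
from datetime import date, datetime, time, timedelta


def rman_backup_level(day: date) -> str:
    """Return the backup type for a day: level 0 on Sundays, incremental otherwise."""
    # date.weekday() returns 6 for Sunday.
    return "level0" if day.weekday() == 6 else "incremental"


def archivelog_backup_times(day: date, every_hours: int = 2) -> list[datetime]:
    """List archive log backup start times for one day.

    Assumption: the cycle starts at midnight; the slide only states the interval.
    """
    start = datetime.combine(day, time(0, 0))
    return [start + timedelta(hours=h) for h in range(0, 24, every_hours)]
```

With a 2-hour interval this gives 12 archive log backups per day alongside the single daily RMAN backup.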
Managing the Millions
Group Targets with Common Attributes
Whitepaper: Strategies for Scalable, Smarter Monitoring using Oracle Enterprise Manager Cloud Control 12c
Management Tasks | Oracle Cloud Grouping
Monitoring, Notification | Apply monitoring templates to groups based on target type, or to groups based on target type and a Service
Problem Analysis | Dynamic groups. Ex: Target type=pod and Group=PaaS; Target type=PBCS Service, lifecycle status=prod
Compliance | Group of databases of SaaS Service
Reporting | Group of groups for executive reports
Dashboards | Group of all targets of a particular service
Service Maintenance | Maintenance window patching of multiple services and blackouts; patching selective group of hosts; patching selective group of Agents
Jobs | Selective group of hosts, group of databases
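The dynamic-group examples in the table (Target type=pod and Group=PaaS; Target type=PBCS Service, lifecycle status=prod) amount to filtering targets on their properties. A minimal Python sketch of that idea follows; the `Target` class, the property keys, and the sample fleet are all hypothetical, and this is not how Enterprise Manager stores or evaluates groups.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    name: str
    target_type: str
    properties: dict[str, str] = field(default_factory=dict)


def dynamic_group(targets, target_type, **criteria):
    """Return names of targets matching the type and every property criterion."""
    return [
        t.name
        for t in targets
        if t.target_type == target_type
        and all(t.properties.get(k) == v for k, v in criteria.items())
    ]


# Hypothetical fleet used only to exercise the filter.
fleet = [
    Target("pod-a", "pod", {"group": "PaaS", "lifecycle_status": "prod"}),
    Target("pod-b", "pod", {"group": "SaaS", "lifecycle_status": "prod"}),
    Target("pbcs-1", "PBCS Service", {"lifecycle_status": "prod"}),
    Target("pbcs-2", "PBCS Service", {"lifecycle_status": "test"}),
]

paas_pods = dynamic_group(fleet, "pod", group="PaaS")
prod_pbcs = dynamic_group(fleet, "PBCS Service", lifecycle_status="prod")
```

Because membership is computed from properties, a newly discovered target joins the right group as soon as its properties are set, with no manual group edits.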
• Standardize Metrics (Iterative!)
– Limited alerts (metrics with thresholds) to actionable metrics
– Used reports for non-urgent metrics and trending
– Frequency of metric check
– Occurrences to reduce noise
• Templates Used
– Oracle certified templates edited
• Metric Extensions
– Custom metrics added to templates
– Apply templates after version rev-up
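"Occurrences to reduce noise" means an alert fires only after a threshold is breached on several consecutive collections. A minimal Python sketch of that rule is shown here; the function name and sample values are hypothetical, and Enterprise Manager applies this setting internally rather than through code like this.

```python
def should_alert(samples, threshold, occurrences):
    """Alert only when the last `occurrences` samples all exceed the threshold."""
    if occurrences < 1 or len(samples) < occurrences:
        return False
    return all(value > threshold for value in samples[-occurrences:])
```

For example, with a threshold of 90 and 3 occurrences, a single spike to 95 stays silent, while three consecutive readings above 90 raise the alert.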
Cloud Standardization: Monitoring Templates
Metrics Template
Metric Extension
Change Management Committee
Cloud Standardization: Gold Agent Image
Create/Update Gold Agent Image
• Pick an Agent, Apply patches
• Create Gold Agent image in shared area
• Add to source control
Service Provisioning: Install using Gold Agent Image
• Copy from Gold Agent image
• Run deploy.sh to deploy Agent
Existing Services: Upgrade using Gold Agent Image
• Copy to new Agent Oracle Home
• Run upgrade
Patching Agent: Follow Blog: Simplified Agent and Plug-in Deployment
Cloud Standardization: Notification & Incident Rules
Notifications
• Notification Naming Standards
• Notification Rules Criteria
– Service Teams
– Target Type
– Event Severity
– Groups
• Notification Methods
– Dashboard Only (Eyes on the Dashboard)
– Email + Dashboard
– Ticket** + Pager + Dashboard
– Ticket** + Dashboard
**Tickets generated by custom notification method
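The rule criteria and notification methods listed above can be pictured as a first-match routing table. The Python sketch below illustrates that idea only: the `Rule` class, the rule names, the event fields, and the fallback to dashboard-only are assumptions, not the Enterprise Manager incident rules engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One notification rule; a criterion left as None matches any value."""
    name: str
    methods: tuple
    service_team: Optional[str] = None
    target_type: Optional[str] = None
    severity: Optional[str] = None
    group: Optional[str] = None

    def matches(self, event: dict) -> bool:
        criteria = {
            "service_team": self.service_team,
            "target_type": self.target_type,
            "severity": self.severity,
            "group": self.group,
        }
        return all(want is None or event.get(key) == want
                   for key, want in criteria.items())


def route(event: dict, rules: list) -> tuple:
    """Return the methods of the first matching rule, else dashboard only."""
    for rule in rules:
        if rule.matches(event):
            return rule.methods
    return ("dashboard",)


# Hypothetical rules, ordered from most to least specific.
RULES = [
    Rule("PAAS_DB_CRITICAL", ("ticket", "pager", "dashboard"),
         service_team="PaaS", target_type="database", severity="critical"),
    Rule("PAAS_ANY_WARNING", ("email", "dashboard"),
         service_team="PaaS", severity="warning"),
]
```

Ordering rules from most to least specific keeps the pager path reserved for the narrowest, most severe cases, which is one way a naming standard and a change management committee can keep rule sets reviewable.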
Change Management Committee
Cloud Security
Ingrained in every System and Service
Cloud Security Management
• Information Protection: Encryption, VLAN, IP Filter, TDE, Application Data Integrity, Data Masking, Redaction
• Access Control: Identity Management (Password and Wallet Management), Virtual Private Database, Data Vault
• Compliance & Reporting: Auditing, Entitlements Management, Governance, Risk, & Compliance Management
Ongoing Compliance Reporting
• Ongoing drift checks across tenant PODs
• Vigilance on security best practices for the entire stack
• 30K compliance evaluations/day
Patch compliance App to Disk
EM Users
• Cloud EM Admin: EM Super Users; Audit in EM
• Cloud Operations: Role, Named Credentials; Audit in EM, Audit of access in Tenant DB
• Cloud Support: Service based Roles, Named Credentials; Audit in EM, Audit of access in Tenant DB
EM Users Responsibilities
Cloud Operations
• Dashboards Management
• Service Jobs, Deployment Procedures
• Reports creation
• Metric thresholds
• Blackouts: Agent & Target level
• Target Perf Management: analyze performance degradation of target user interface
• Level 1 Agent Administration
Cloud EM Admin
• EM Health
• EM Patching / Agent patching
• Service deployment, SLA setup
• Notification Rules setup
• Named Credential creation
• Automate Service prov/de-prov
• Config compare jobs
• Compliance Rules enable
• Metric Extension creation
Cloud Support
• View only access
• Establish customer pod health
• Queries to establish customer Service data problems
• Diagnostic dumps
• Analyze customer SR
• Application job status
Cloud Agility
Extensibility and Automation for True Agility
Cloud Agility
Cloud Management Requirements | EM Feature
New measurements (infra or business), new errors | Metric Extensions
New services | Custom Plugins
Outside-in testing, service level measurement | Beacons
Reporting (Ops, Development, LOB, Executive) | BI Publisher Reports
Detecting issues, avoiding configuration drift | Compliance Rules
Task automation | Jobs, Deployment Procedures, EMCLI
EM Extensibility: Metric Extensions
• Important Features
– Rapidly developed by DevOps
– Easily undeployed when no longer needed
– Quickly apply and update across the fleet via monitoring templates
– Exportable from one EM to another (e.g., Test to Production)
• Cloud Use Case
– Metrics & alerts based on log filtering, tracking certificate expiry
• Usage in Oracle Cloud
– 100s of Metric Extensions deployed
– Most popular: OSLineToken, URLTiming, REST, JMX, JDBC fetchlets
EM Extensibility: Custom Plugins
• Important Features
– Easy to develop meta-data plugins
– Allow rich modeling of components/services/systems
– Support collection of custom service/business metrics, thresholds, compliance rules, …
• Cloud Use Case
– Each Cloud Service delivers a custom plugin with one or more target types for its service/business metrics
• Usage in Oracle Cloud
– Dozens of custom target plugins deployed
EM Extensibility: Beacons
• Important Features
– Test “outside-in” health of a service or component
– Run test from multiple locations to determine external network issues
– Alert based on reachability as well as latency/performance
• Cloud Use Case
– Service availability for each Oracle Cloud Service is based on Beacon tests
• Usage in Oracle Cloud
– 10s of thousands of Beacon tests deployed
– 10s of millions of Beacon tests executed daily
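Running the same test from multiple locations makes it possible to tell a service outage from a network problem local to one location, and to alert on latency as well as reachability. The Python sketch below shows one possible classification; the function name, the location names in the usage note, and the decision rules are assumptions rather than Beacon behavior.

```python
def classify(results, latency_limit_ms):
    """Classify one round of outside-in tests.

    results maps a location name to latency in ms, or to None when the
    test could not reach the service from that location.
    """
    failed = [loc for loc, ms in results.items() if ms is None]
    slow = [loc for loc, ms in results.items()
            if ms is not None and ms > latency_limit_ms]
    if failed and len(failed) == len(results):
        return "service down"  # unreachable from every location
    if failed:
        return "network issue at: " + ", ".join(sorted(failed))
    if slow:
        return "degraded at: " + ", ".join(sorted(slow))
    return "healthy"
```

For instance, `classify({"us-east": 120, "emea": None}, 500)` points at the network path from one location, while `None` from every location points at the service itself.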
EM Extensibility: BI Publisher Reports
• Important Features
– Quickly generate rich, professional quality reports from fleet-wide EM data
– Output in PDF, Excel, PowerPoint, Word, and HTML
– Schedule reports and deliver via e-mail and FTP
• Cloud Use Case
– Storage reports, host OS reports, network device reports, outage reports, capacity reports, per-Service LOB reports
• Usage in Oracle Cloud
– 100s of custom reports
– 10s of thousands of reports accessed daily
EM Extensibility: Compliance Rules
• Important Features
– Ensure compliance with business best practices in terms of configuration, security, …
– Automatically determine if targets and systems have valid configuration settings
– Real-time monitoring to detect configuration changes or unauthorized actions
• Cloud Use Case
– Alert on configuration drift, unexpected file permissions
• Usage in Oracle Cloud
– 10s of Compliance Frameworks deployed
– 10s of thousands of Compliance Rules executed daily
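Alerting on configuration drift or unexpected file permissions boils down to comparing what is found on a target with an approved baseline. A minimal Python sketch of that comparison follows; the function name and the setting keys in the usage note are hypothetical, and real compliance rules in Enterprise Manager are defined against collected configuration data rather than in code like this.

```python
def find_drift(baseline: dict, actual: dict) -> dict:
    """Return {setting: (expected, found)} for every setting that differs.

    A setting missing on one side is reported with None on that side.
    """
    keys = baseline.keys() | actual.keys()
    return {
        key: (baseline.get(key), actual.get(key))
        for key in sorted(keys)
        if baseline.get(key) != actual.get(key)
    }
```

For example, a baseline of `{"wallet_file_mode": "0600"}` against a collected value of `"0644"` yields `{"wallet_file_mode": ("0600", "0644")}`, which could then be raised as a compliance violation.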
EM Automation: Jobs, Deployment Procedures
• Important Features
– Create single-task or multi-task jobs to automate repetitive tasks
– Execute jobs against specific targets or groups
– Monitor and track results of executions
– Configure Job libraries to give controlled access to Administrators
• Cloud Use Case
– Production-to-test orchestration
• Usage in Oracle Cloud
– 10s of thousands of Jobs deployed
– 10s of millions of Jobs executed daily (system, user)
EM Automation: EMCLI
• Important Features
– EMCLI interactive or script mode
– Jython scripting, Web services
– String together a set of EM commands for automation
– Integrate external systems with EM using EMCLI
• Cloud Use Case
– Automation for template apply, blackouts
• Usage in Oracle Cloud
– Used from within jobs
– Used by multiple external systems to integrate with EM
Automation Example: Service Provisioning
1. Create VMs
2. Deploy Agents on VMs
3. Discover and Promote Targets
4. Set Global Target Properties
5. Set Monitoring Templates
6. Set Compliance Standards
7. Create Beacons
8. Register with External Systems
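The eight provisioning steps above run in a fixed order, and a failure partway through should stop the flow before later steps act on a half-built service. The Python sketch below shows that control flow only; the step names come from the slide, while `run_provisioning`, the `actions` mapping, and the stop-on-first-failure policy are assumptions, not the actual Oracle Cloud automation.

```python
PROVISIONING_STEPS = [
    "Create VMs",
    "Deploy Agents on VMs",
    "Discover and Promote Targets",
    "Set Global Target Properties",
    "Set Monitoring Templates",
    "Set Compliance Standards",
    "Create Beacons",
    "Register with External Systems",
]


def run_provisioning(actions: dict, context: dict):
    """Run each step in order and stop at the first failure.

    actions maps a step name to a callable taking the shared context.
    Returns (completed_step_names, error_message_or_None).
    """
    completed = []
    for step in PROVISIONING_STEPS:
        try:
            actions[step](context)
        except Exception as exc:  # later steps must not run on a half-built service
            return completed, f"{step}: {exc}"
        completed.append(step)
    return completed, None
```

In practice each callable would wrap whatever tool performs the step (for example a job, a deployment procedure, or an EMCLI call), and the returned list of completed steps shows where a retry should resume.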
Cloud Operations
A day in the life of Ops
24x7 Monitoring and Service Level Management
• Operational as well as Service level dashboards for monitoring 7.5M assets grouped into 2000 groups
• Monitoring of end user interactions, business KPIs, and IT metrics
• Integrated with Support (for Ticketing)
• 11M synthetic tests/day (20/hr per Service)
• 3.4M events processed/day
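The figures on this slide can be turned into per-service and per-second rates with simple arithmetic. The Python sketch below does so under one loud assumption: that "20/hr per Service" applies uniformly to every service, which the slide does not state. The derived values are rough inferences, not numbers reported by Oracle.

```python
# Figures taken from the slide.
SYNTHETIC_TESTS_PER_DAY = 11_000_000
TESTS_PER_HOUR_PER_SERVICE = 20
EVENTS_PER_DAY = 3_400_000
ASSETS = 7_500_000
GROUPS = 2_000

# Derived values (assumes a uniform test rate across services).
tests_per_service_per_day = TESTS_PER_HOUR_PER_SERVICE * 24
implied_services = SYNTHETIC_TESTS_PER_DAY / tests_per_service_per_day
events_per_second = EVENTS_PER_DAY / (24 * 60 * 60)
average_assets_per_group = ASSETS / GROUPS

print(f"tests per service per day: {tests_per_service_per_day}")
print(f"implied services under test: {implied_services:,.0f}")
print(f"events per second: {events_per_second:.1f}")
print(f"average assets per group: {average_assets_per_group:,.0f}")
```

Under that assumption the numbers imply roughly 23 thousand services under synthetic test, about 39 events processed per second, and an average of 3,750 assets per group.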
Monitoring Real End User Experience (RUEI)
70M+ User Requests Monitored/day
• All users, 24x7
– Experienced performance per user click
– Aggregation by functional area of the application
• Automatic detection of outliers
– Exceptional patterns of performance
– EM events triggered
• Application context aware
– Generic or specific app functions impacted?
– Application, Infrastructure or Network delivery?
Tenant User Geo Mapping
Rapid functional issue isolation
Cloud Services Availability: Proactive Monitoring
• Incident Management: proactive reaction to all warning and critical EM alerts, covering all aspects of the application/infrastructure layer, including all managed servers, MidTier, Database, Infrastructure, Exadata, and ZFS Storage.
• Service Availability Dashboard monitoring global health and uptime across 19 data centers.
• Global Consolidated Service Disruption Dashboard providing customer uptime and SLA metrics.
Eyes on the Dashboards
Cloud Services: Fleet Health Management
Problem Management : Dashboards to capture fleet wide frequent occurrences of WLS errors, J2EE deployment failures, failed Web services, review Synthetic test results
Cloud Services: Operational Management
• Execution of scripts on fleet of SaaS Exadata servers
• Validation of application-level certificates (expiry/renewal)
• Centralized scheduling of jobs, for example purge jobs for all Service Pods scheduled and monitored from a single location
• Identify and fix anomalies in services: custom script deployment through Enterprise Manager for auto-corrective actions, e.g. identifying read-only filesystems, MC Agent process monitoring, /tmp cleanup
Tracking Executions of Procedural Steps
Monitoring Service App Flows Using Synthetic Tests (Beacons)
11M+ Business Flow Tests Executed/day
Login, Business Transaction Flows
• Availability tests and Business Functional tests: executed at different frequencies
• Twice a day review of Dashboard to capture failed Web Services
Change Events Validation
• Pre & Post Upgrades, Patching, Outage
Proactive EM Incidents
• Infrastructure, Application metric alerts
• Proactively identify potential functional impact and address issue
Selenium based URL Tests
Cloud Issue: Analysis
• Search issues using business parameters
• Isolate customer issue
• Validate Service health, incidents
• Drill into Service details for analysis
Search
Analyze
Cloud Issue: Diagnosis
Root Cause Analysis for degraded performance
JVM thread is waiting on which query in the db?
Cloud Issue: Capture Diagnostic Dumps
Root Cause Analysis for degraded performance
Capture Diagnostic dumps for Incidents
Why has this SQL been running for 32 hours?
Which SQL has high wait events?
Cloud Portal: Tenant Resource Usage
Availability and usage data reported from EM
Cloud Portal: Tenant Resource Usage
Availability data coming from EM
EM Configuration, Backups and more… for Scale*
Configuration | Oracle Cloud Site Tuning
Load Balancer for HA-OMS | Reference: Enterprise Manager Cloud Control Advanced Installation guide
LDAP with SSO for EM user access | External Roles with LDAP setup; automated employee account creation
Secure OMS with wallet with custom certificate | Remember to secure all agents. New Agents secured at Service Provisioning. Secure EM CLI. Reference: Enterprise Manager Cloud Control Advanced Installation guide
Increased Java heap size | Recommended 4 GB; currently tuned to 16G
OMS properties: console timeout, ADF timeout values, job workers | oracle.sysman.eml.maxInactiveTime -value 240; oracle.sysman.emdrep.adminmsg.Adminmsglistener.healthmonitor_timeout 1800; oracle.sysman.core.conn.maxConnForJobWorkers 200
httpd.conf tuning | MaxClients 1024, MinSpareThreads 128, MaxSpareThreads 256, KeepAliveTimeout 16, StartServers 8
WLS socket timeouts | WLSocketTimeoutSecs 10, WLIOTimeoutSecs 2700
*Oracle Cloud tuned the parameters above to scale for site growth. Contact Oracle Support before making changes to your own site.
Cloud Standardization: Gold Agent Image
Create/Update Gold Agent Image
• Pick an Agent, Apply patches
• Create Gold Agent image in shared area, add to source control
Service Provisioning: Install using Gold Agent Image
• Copy from Gold Agent image
• Run deploy.sh to deploy Agent
Existing Services: Upgrade using Gold Agent Image
• Copy to new Agent Oracle Home
• Run upgrade
emcli create_gold_agent_image -source_agent=<source_agent>:<port> -config_properties="MaxThreads;MaxInComingConnections;_trustedOperationMessageTimeout;propComputeParallelization" -series_name=<imageseries> -gold_image_name="<GI name>" -gold_image_description=<desc> -working_directory=<path>
emcli update_agents -gold_image_name=<GI name> -is_staged=true -stage_location=<Stage loc name> -input_file="agents_file:<file loc>"
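When the upgrade command above is driven from automation, it helps to assemble the argument list in one place so that values containing spaces are quoted safely. The Python sketch below only builds the command string, using the flags exactly as they appear on this slide; the function name and the sample values in the usage note are hypothetical, and nothing here executes `emcli` or checks the flags against an EM release.

```python
import shlex


def update_agents_command(gold_image_name, stage_location, agents_file):
    """Build the `emcli update_agents` command line shown on the slide.

    Each flag=value pair is one argument; shlex.join quotes any argument
    that contains spaces or other shell-special characters.
    """
    args = [
        "emcli",
        "update_agents",
        f"-gold_image_name={gold_image_name}",
        "-is_staged=true",
        f"-stage_location={stage_location}",
        f"-input_file=agents_file:{agents_file}",
    ]
    return shlex.join(args)
```

For example, `update_agents_command("GI_1", "stage1", "/tmp/agents.txt")` returns a plain command line, while an image name with a space comes back with that whole argument wrapped in single quotes, which the shell treats the same as quoting only the value.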