ITimpulse NOC process
This is an interactive, detailed, step-by-step guide explaining how alerts are managed at our NOC.
This document contains information that is considered proprietary and confidential. No information contained in this document may be released, re-printed, or redistributed without prior permission from ITimpulse.
How to navigate this PPT
• View this presentation in Slide Show (fullscreen) mode.
• Do not navigate using the keyboard.
• Use your mouse and click the buttons; each one redirects you to the appropriate slide.
• The navigation buttons take you to the previous slide and to the next.
• A separate button redirects you to more information on the topic.
• Another button takes you back to the first slide.
• Press F5 to start Slide Show mode.
Alert detected
• An alert is generated by the RMM and an email is sent to the NOC.
• Our Service desk responds to the alert within minutes.
• Service desk checks if the alert is valid.
• Service desk sets the priority and directs the ticket to the correct resource in our NOC team.
Valid Invalid
Valid Alerts
• Valid alerts are categorized and assigned a priority depending on our SLA.
• They are then assigned to L1 techs.
Urgent High Low
Work request
Ticket Life Cycle
Server Outage
Urgent Priority
• All urgent requests are responded to within 10 minutes. In other words, an engineer is working toward problem resolution within 10 minutes.
• Urgent priority tickets (excluding server outages) are assigned directly to L2 engineers.
• An L3 engineer gets involved if the problem is not resolved within an hour.
Typical Urgent alerts
Handling Server Outage
A server outage is categorized as urgent. The NOC performs these steps to verify whether it is a network problem or a server crash.
1. Check for a scheduled outage.
2. Check if other devices at the same site are online.
3. Ping the site's public IP.
4. Try to access the device from another computer in the network.
Server Down Network Down
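The four verification checks can be sketched as a small decision function. This is only an illustration, not the NOC's actual tooling: the `SiteChecks` fields, the `classify_outage` name, and the exact rule for combining the checks are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Outage(Enum):
    SCHEDULED = "scheduled outage"
    NETWORK_DOWN = "network down"
    SERVER_DOWN = "server down"
    REACHABLE = "device reachable, outage not confirmed"


@dataclass
class SiteChecks:
    """Results of the four manual checks (field names are hypothetical)."""
    scheduled_outage: bool       # 1. a scheduled outage is in effect
    other_devices_online: bool   # 2. other devices at the same site are online
    public_ip_pings: bool        # 3. the site's public IP answers ping
    reachable_from_peer: bool    # 4. device reachable from another computer on the network


def classify_outage(c: SiteChecks) -> Outage:
    if c.scheduled_outage:
        return Outage.SCHEDULED
    # Assumption: the whole site is treated as down only when both
    # site-level checks fail.
    if not (c.other_devices_online or c.public_ip_pings):
        return Outage.NETWORK_DOWN
    if c.reachable_from_peer:
        return Outage.REACHABLE
    return Outage.SERVER_DOWN
```

For example, a site whose other devices are online and whose public IP answers ping, but whose server cannot be reached from a neighbouring computer, is classified as `Outage.SERVER_DOWN`.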
Server down
If a server is confirmed as offline, the NOC performs the following actions:
1. Check if the server reboots and comes back up.
2. Access the device using iLO/DRAC.
3. If the server is virtualized, check access from the host machine.
4. Inform the customer.
Yes No
Server reboots
Since we set all servers to reboot automatically in case of a BSOD, they mostly come back up. If they do...
1. Once the server is back online, our engineers perform a root cause analysis of the issue.
2. We implement a fix and monitor the server for another 7 days.
Server Stays offline
Since we set all servers to reboot automatically in case of a BSOD, they mostly come back up. If they don’t...
1. We inform you that the server has been offline and needs onsite attention.
2. We document the probable cause and everything we have tried in the ticket.
3. Our engineer is available to help when someone gets onsite.
Network down
The NOC checks whether other devices at the site are online (yes/no).
We try to ping the gateway to see if it is an internet connection issue (yes/no).
Inform Customer
Inform Customer
• We will call a number provided by you depending on the time of day.
• We will email you about the problem with our investigations.
• All troubleshooting will be documented in detail in your PSA.
High Priority
• All high priority requests are responded to within 30 minutes.
• An L2 engineer gets involved on the ticket within 60 minutes, and an L3 engineer if the problem is not resolved in 4 hours.
• We resolve most high priority tickets within 24 hours.
Typical High priority alerts
Low priority
• All low priority requests are responded to within 4 hours.
• An L2 engineer gets involved on the ticket if it cannot be resolved after 1 hour of troubleshooting.
• We resolve most low priority tickets within 48 hours.
Typical Low priority alerts
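The response and escalation targets stated for urgent, high, and low priority tickets can be collected into one lookup table. The sketch below is illustrative only: the `Sla` class, the `SLA` dictionary, and the function names are hypothetical, and targets this guide does not state are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Sla:
    respond_within: timedelta
    l3_after: Optional[timedelta]            # None: not stated in this guide
    typical_resolution: Optional[timedelta]  # None: not stated in this guide


# Values taken from the priority slides; the names are illustrative.
SLA = {
    "urgent": Sla(timedelta(minutes=10), timedelta(hours=1), None),
    "high": Sla(timedelta(minutes=30), timedelta(hours=4), timedelta(hours=24)),
    "low": Sla(timedelta(hours=4), None, timedelta(hours=48)),
}


def response_due(priority: str, opened_at: datetime) -> datetime:
    """Latest time by which the ticket must receive a first response."""
    return opened_at + SLA[priority].respond_within


def response_breached(priority: str, opened_at: datetime, now: datetime) -> bool:
    """True once the response target has passed without being met."""
    return now > response_due(priority, opened_at)
```

With this table, an urgent ticket opened at 09:00 has a response deadline of 09:10, and a high priority ticket opened at the same time counts as breached from 09:31 onward.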
Examples of urgent priority alerts
Server down alert
Critical Event viewer error
Critical RMM alert
Event viewer error that leads to critical error
Device failure caused by patch deployment
Database offline alert
Scheduled task failure
Exchange service outage alerts
Server Performance threshold alert
Examples of high priority alerts
Non-Critical Event viewer Error
Event viewer warning
Non-Critical RMM alert
Server Anti-virus scan or update alert
Server Malware infection alert
Server Backup failure alert
Scheduled task failure
Software or RMM agent deployment
Examples of Low priority alerts
Event viewer alerts from workstations
Performance issues on workstations
Workstation backup failure alert
RMM alert for workstations
Workstation Anti-virus scan or update alert
Workstations Malware infection alert
Software or RMM agent deployment
Workstation Patch installation failure alert
Patch approval
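The example lists above amount to a mapping from alert type to priority. A minimal sketch of such a lookup follows, using a handful of entries from the lists. The string keys, the `priority_for` name, and the "unclassified" fallback are assumptions for illustration; alert types that appear in more than one list are left out because this guide does not say how they are told apart.

```python
# A few entries from each example list; the keys are illustrative.
URGENT = {"server down", "critical rmm alert", "database offline",
          "exchange service outage"}
HIGH = {"server backup failure", "server malware infection",
        "non-critical rmm alert", "event viewer warning"}
LOW = {"workstation backup failure", "workstation malware infection",
       "workstation patch installation failure", "patch approval"}


def priority_for(alert_type: str) -> str:
    """Return the priority for a known alert type, else 'unclassified'."""
    key = alert_type.strip().lower()
    for name, members in (("urgent", URGENT), ("high", HIGH), ("low", LOW)):
        if key in members:
            return name
    # Assumption: anything unknown is left for the service desk to review.
    return "unclassified"
```

Calling `priority_for("Server down")` returns `"urgent"`, while an alert type that is not in any of the sets returns `"unclassified"`.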
Work request
• All work requests are responded to within 4 hours.
• All work requests are resolved within 24 hours.
• The time may vary depending on the scope of the request.
Invalid alerts
• Invalid alerts are closed and properly documented.
Ticket Life Cycle
The service desk is our front line of support. It performs the tasks below on every ticket. The service desk does not perform any troubleshooting.
• Acknowledge: An alert generated by your RMM creates a ticket in the PSA. For devices managed by our NOC, this alert is forwarded to the NOC’s board or queue.
• Validate: Our service desk team validates these alerts and removes the false positives. The validated alerts are then prioritized and categorized.
• Assign: Our service desk assigns the ticket to the right resource. If something needs to be done at a later time, they also schedule it.
Assigned to L1
L1 receives tickets assigned by SD. Our L1 team follows our internal knowledge base and documented resolutions to resolve a problem.
• Resolved: The majority of tickets are resolved by the L1 team.
• Unresolved: Tickets are escalated to L2.
• Monitor: Where resolution cannot be confirmed immediately.
• If input is needed, we contact you.
Escalated to L2
• When a ticket cannot be resolved with known procedures, it is escalated to L2.
• All our L2 engineers are MCITP certified and have over 3 years of experience.
• L2 engineers find the root cause and resolve the problem.
• Depending on priority, they get 30 minutes to 4 hours to research and resolve the problem.
• Any tickets that are not resolved are further escalated to L3.
Assigned to L2
L2 receives tickets assigned by SD. Depending on priority, L2 engineers get 30 minutes to 4 hours to research and resolve the problem.
• Resolved: Resolved tickets are documented and closed.
• Unresolved: Tickets are escalated to L3.
• Monitor: Where resolution cannot be confirmed immediately.
• If input is needed, we contact you.
Escalated to L3
• When a ticket cannot be resolved at L2, it is escalated to L3.
• L3 is our last tier of support. Our L3 engineers have over 6 years of experience in the field and are also subject matter experts in a field of their choice.
• In the rare circumstance that a ticket cannot be resolved by an L3 engineer, we will call you to discuss how to proceed.
Assigned to L3
L3 receives escalations from L2. L3 engineers form our final tier of support.
• Resolved: Resolved tickets are documented and closed.
• Unresolved: Tickets are escalated.
• Monitor: Where resolution cannot be confirmed immediately.
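The L1, L2, and L3 flows share the same set of outcomes, so the tiered escalation can be sketched as one transition function. The `Outcome` enum, the `next_step` name, and the returned strings are hypothetical labels chosen for this example, not fields from any PSA.

```python
from enum import Enum


class Outcome(Enum):
    RESOLVED = "resolved"
    UNRESOLVED = "unresolved"
    MONITOR = "monitor"
    NEEDS_INPUT = "needs input"


# Each tier escalates to the next; L3 is the last tier.
NEXT_TIER = {"L1": "L2", "L2": "L3"}


def next_step(tier: str, outcome: Outcome) -> str:
    """Describe what happens to a ticket after an engineer at `tier` works it."""
    if outcome is Outcome.RESOLVED:
        return "document and close"
    if outcome is Outcome.MONITOR:
        return "hold for monitoring"
    if outcome is Outcome.NEEDS_INPUT:
        return "contact customer"
    if tier in NEXT_TIER:
        return f"escalate to {NEXT_TIER[tier]}"
    # An unresolved ticket at L3 leads to a call with the customer.
    return "call customer to discuss options"
```

For instance, `next_step("L1", Outcome.UNRESOLVED)` gives `"escalate to L2"`, and the same outcome at L3 gives `"call customer to discuss options"`.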
Resolved tickets
• Resolved tickets are fully documented in the PSA.
• An appropriate time entry is added in the PSA.
• The ticket is marked closed.
• Our Quality team reviews whether the ticket was properly closed.
• If a ticket was closed by an L2 or L3 engineer, that engineer creates a new solution article for the problem in our internal KB.
Assigned to Customer
• Any tickets that need physical access to the site are assigned to customer.
• Tickets where more information is required for resolution are assigned to customer.
• Only 1 in every 50 tickets will require your attention.
Unresolved by L3
• This often means that we have reached a dead end and may need a workaround or replacement, as the problem cannot be resolved.
• Our L3 engineer will call you and discuss the available options, their downsides, and the time each will take to implement.
• Any changes will only be made after your approval.
Ticket on hold for monitoring
• Some tickets may be resolved but need confirmation before closure.
• Such tickets are assigned back to SD team and put on hold for a specified period of time.
• After the period of time has passed, the SD team checks if the issue is resolved.
• Resolved tickets are closed. Unresolved tickets are reassigned to engineers.
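The hold-and-recheck rule above can be written as a short function. This is a sketch under assumptions: the `review_held_ticket` name, its parameters, and the three status strings are made up for illustration, and the hold period is whatever the engineer specified for the ticket.

```python
from datetime import datetime, timedelta


def review_held_ticket(held_at: datetime, hold_for: timedelta,
                       now: datetime, issue_resolved: bool) -> str:
    """Decide what the SD team does with a ticket that was put on hold."""
    if now < held_at + hold_for:
        return "on hold"  # the specified period has not passed yet
    # After the hold period, SD checks whether the issue is resolved.
    return "closed" if issue_resolved else "reassigned to engineer"
```

A ticket held for 7 days and checked on day 3 stays `"on hold"`; checked on day 8 it is either `"closed"` or `"reassigned to engineer"`, depending on whether the issue has recurred.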
End of ticket Life Cycle
This brings us to the end of the Ticket Life cycle section. Press the back button below to go to previous section. Click home to get to beginning of the slide show.
To learn more about how to get started with NOC services, our NOC onboarding process, and how we integrate with your existing tools and deliver seamless NOC services, schedule a web demo with us.
Email [email protected] to schedule a live demo.
For further inquiries and information please feel free to contact us at: US: +1 646-351-8634 India: +91 020-6500-2328 Email: [email protected] Website: www.itimpulse.in Direct mail: ITimpulse, B112, Ganga Osian Square, Wakad, Pune – 411057
ITimpulse provides RMM agnostic, White label NOC services for MSPs