Field Issues to Avoid; Enhancements to Use
BroadSoft Technical Summit, June 2009
2 2009 BroadSoft®, Inc. Proprietary and confidential; do not copy, duplicate, or distribute
Agenda
• Issues to Avoid
– DNS Delays
– Call Looping & Fan-out
– Overload
• Enhancements to Use
– Platform Enhancements
– BroadWorks Hardware Support Policy
Avoidable Field Issue – Analysis/Recommendations
• Top three trouble ticket root causes that can generally be avoided
– DNS Delays
– Call Looping & Fan-out
– Overload
DNS Delays
DNS Delay Impacts
• When a route or contact is defined as a domain name, the server needs to resolve the name
– Call is in limbo until DNS resolution completes or times out
– The Call Processing thread that the call is running in is blocked
• Alarms = bwThreadDelayDetected
– Severe thread delays can trigger an overload condition
• Alarms = bwOverloadZoneTransition
– Severe delays across all Call Half Input Adapter threads can bring down the Execution Server process
• Alarms = bwForcedExitDueToHungThread, bwCallPThreadAutoRestart, bwPMExecutionServerRestarted
• DNS Delays = Trouble for You and Your Customers
What Requires DNS Resolution?
• Application Server access device addresses that are FQDNs
• Network Server 302 response returned routes that are FQDNs
• SIP headers used in response/request routing that are FQDNs
– Request-URI
– Via header
– Contact header
DNS Problem Types
• DNS delay issues can be caused by a number of factors
– Connectivity Issues: DNS server completely unreachable for a period of time
– Slow Response Time: DNS server is reachable, but the application is introducing delays
• BIND is not a carrier grade DNS
– Non-Authoritative Lookups: Delays incurred when going out to other DNS to resolve FQDNs not owned by provider
• URL dialing – user dialing non-existent FQDNs can take seconds to resolve
Is There Anything I Can Do?
• Release 14sp3 Application Server includes a number of DNS client enhancements that:
– Apply lookup time limits to mitigate the impact of DNS delays on the system
– Optionally disable DNS lookups in URL dialing
– Better control DNS querying and response caching
– Better monitor DNS performance
• Some of these require configuration to be activated
14SP3+ DNS Lookup Algorithm
• DNS information is pulled from /etc/resolv.conf on BroadWorks startup
– nameserver: DNS server list. Lookup starts with the first server in the list (unless the rotate option is used) and route advances to the next in the list on no-response conditions
– domain: Local domain name. Appended to the contact for additional lookups if a lookup returns a “No such name” response
– options: Optional parameters
• retrans: response wait time (Default = 1 sec)
• retry: number of query attempts to a nameserver before advancing to the next server (Default = 2)
• rotate: load balance across all listed nameservers
bwadmin@IHApp$ more /etc/resolv.conf
domain eng.broadsoft.com
nameserver 192.168.2.40
nameserver 10.2.1.1
options retrans:1 retry:2 rotate
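To get a feel for how retrans and retry add up, the following Python sketch estimates the worst-case wait before every listed nameserver has been tried. It is an illustration only: the helper names are hypothetical, and it assumes each attempt waits exactly retrans seconds with no back-off, which may not match the actual resolver behavior.

```python
def parse_resolv_conf(text):
    """Extract nameservers plus the retrans/retry options from resolv.conf text."""
    servers, retrans, retry = [], 1, 2  # defaults quoted on the slide
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "nameserver" and len(fields) > 1:
            servers.append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                name, _, value = opt.partition(":")
                if name == "retrans" and value:
                    retrans = int(value)
                elif name == "retry" and value:
                    retry = int(value)
    return servers, retrans, retry


def worst_case_wait_secs(text):
    """ASSUMPTION: every attempt waits exactly `retrans` seconds, no back-off."""
    servers, retrans, retry = parse_resolv_conf(text)
    return len(servers) * retry * retrans


SAMPLE = """domain eng.broadsoft.com
nameserver 192.168.2.40
nameserver 10.2.1.1
options retrans:1 retry:2 rotate
"""

print(worst_case_wait_secs(SAMPLE))  # 2 servers * 2 attempts * 1 sec = 4
```

Under these assumptions, a call processing thread without a lookup time limit could be blocked for several seconds when no server answers.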
14SP3+ DNS Lookup Algorithm
• Query type performed varies depending on configuration:
1. AS_CLI/Interface/SIP> supportDnsSrv parameter
2. Contact port provided or not
– e.g., device was configured with port = 5060, or no port (null)
3. Transport unspecified, UDP or TCP
Contact | supportDnsSrv = False | supportDnsSrv = True
Port = ANY, Transport = ANY | A record | A record
Port = Null, Transport = TCP | A record | _sip._tcp SRV; if no match, A record
Port = Null, Transport = UDP | A record | _sip._udp SRV; if no match, A record
Port = Null, Transport = Unspecified | A record | _sip._tcp SRV; if no match, _sip._udp SRV; if no match, A record
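The query-type selection above can be sketched as a small Python function. This is one reading of the table, not BroadWorks code: the function name is hypothetical, and it assumes the "Port = ANY" row means a port was explicitly configured.

```python
def dns_query_order(support_dns_srv, port, transport):
    """Return the ordered list of DNS queries tried for a contact.

    port: configured port number, or None when no port is provided.
    transport: "tcp", "udp", or None when unspecified.
    ASSUMPTION: any explicitly configured port forces a plain A lookup.
    """
    if not support_dns_srv or port is not None:
        return ["A"]
    if transport == "tcp":
        return ["_sip._tcp SRV", "A"]
    if transport == "udp":
        return ["_sip._udp SRV", "A"]
    # Transport unspecified: try TCP SRV, then UDP SRV, then fall back to A.
    return ["_sip._tcp SRV", "_sip._udp SRV", "A"]


print(dns_query_order(True, None, None))
```

Note that the unspecified-transport case can cost up to three lookups for a single contact, which is worth keeping in mind when DNS is slow.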
14SP3+ DNS Query Properties
• Additional DNS configuration properties are defined in /usr/local/broadworks/bw_base/conf/appserver.properties
– Configurable via AS_CLI/System/StartupParam>
Property | Description
bw.nameservice.cachePolicy | Enumeration {NEVER, CONFIGURED, HONOR_DNS}. When set to HONOR_DNS, the DNS client uses the response’s TTL value. Default is “CONFIGURED”.
bw.nameservice.cacheTtlSecs | Amount of time (in seconds) a successfully looked up record is cached if cachePolicy = CONFIGURED. Default is “86400”.
bw.nameservice.nCachePolicy | Enumeration {NEVER, CONFIGURED, HONOR_DNS}. When set to HONOR_DNS, the DNS client uses the “minimum” value of the response’s SOA record. Default is “CONFIGURED”.
bw.nameservice.nCacheTtlSecs | Amount of time (in seconds) a looked up record with a negative response is kept in the negative cache if nCachePolicy = CONFIGURED. Default is “600”.
14SP3+ DNS Query Properties
Property | Description
bw.nameservice.unreachableServerLingerSecs | Minimum time interval (in seconds) for which no DNS request is sent to a server detected as unreachable. Default is “60”.
bw.nameservice.useAdditionalSrvRrs | Boolean indicating that A lookups resulting from SRV lookups should use, when populated, the pre-resolved A resource records from the additional RR section of the SRV lookup response. Local caching must be enabled for this to have effect. Default is “true”.
bw.nameservice.denyTimeBoundedDuplicateLookups | Boolean indicating whether duplicate lookups with the same name and type from two time-bounded threads are allowed or not. Default is “true”.
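As a rough illustration of how the cachePolicy and cacheTtlSecs properties described above interact, here is a minimal Python sketch. The function name is hypothetical, and treating NEVER as "do not cache" is an assumption based on the enumeration name only.

```python
def cache_ttl_secs(policy, configured_ttl_secs, dns_ttl_secs):
    """Pick how long a successful lookup stays cached.

    policy: "NEVER", "CONFIGURED" or "HONOR_DNS".
    configured_ttl_secs: value of cacheTtlSecs (default 86400).
    dns_ttl_secs: TTL carried in the DNS response.
    """
    if policy == "NEVER":
        return 0  # ASSUMPTION: NEVER means the record is not cached at all
    if policy == "CONFIGURED":
        return configured_ttl_secs
    if policy == "HONOR_DNS":
        return dns_ttl_secs
    raise ValueError("unknown cache policy: %r" % policy)


print(cache_ttl_secs("CONFIGURED", 86400, 300))
print(cache_ttl_secs("HONOR_DNS", 86400, 300))
```

With the default CONFIGURED policy, a record whose DNS TTL is 300 seconds is still cached for a full day, so address changes in DNS may need a manual cache clear to take effect quickly.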
14SP3+ DNS Configuration Options
• Local Name File: Can populate local records directly on the server that will be loaded into cache on startup
– /usr/local/broadworks/bw_base/conf/namedefs
– DNS client lookup order is local cache, /etc/hosts, DNS
bwadmin@IHApp$ more /usr/local/broadworks/bw_base/conf/namedefs
#
_sip._udp.ns.lab.broadsoft.com SRV 1 99 5060 ns1.lab.broadsoft.com
_sip._udp.ns.lab.broadsoft.com SRV 2 99 5060 ns2.lab.broadsoft.com
ns1.lab.broadsoft.com IN A 192.168.1.91
ns2.lab.broadsoft.com IN A 192.168.2.61
vm1.lab.broadsoft.com IN A 192.168.1.107
vm2.lab.broadsoft.com IN A 192.168.2.119
_pop3._tcp.lab.broadsoft.com SRV 0 0 110 vm1.lab.broadsoft.com
_pop3._tcp.lab.broadsoft.com SRV 1 0 110 vm2.lab.broadsoft.com
14SP3+ DNS Configuration Options
AS_CLI/System/CallP/DNS> get
enableNameLookupForURLDialing = false
enableNameLookupTimeout = true
nameLookupTimeoutMilliseconds = 500
• URL Dialing Lookup: Ability to disable DNS lookups for user-dialed URLs (e.g., [email protected] )
– Generally, a URL call goes to the NS; if the UrlDialing policy is hit, the NS returns MADDR=Domain and the AS looks up the domain
• UrlDialing can be disabled on the NS
– Bad domains can take seconds to resolve
– Lookup is controlled on the AS by the enableNameLookupForURLDialing parameter (Default = true)
– If disabled, a URL call that requires a lookup will get treatment
14SP3+ DNS Configuration Options
AS_CLI/System/CallP/DNS> get
enableNameLookupForURLDialing = false
enableNameLookupTimeout = true
nameLookupTimeoutMilliseconds = 500
• Time Limit on CallP Lookup: Ability to configure a time bound on DNS lookups within the CallP thread
– If enabled, any CallP thread DNS lookup that takes longer than nameLookupTimeoutMilliseconds will result in that call being sent to treatment
– Lookup will still continue in the background and any result (positive or negative) will be cached for further use
– Default setting is false (no time limit)
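The time-bounded lookup behavior can be pictured with the following Python sketch. It is not the BroadWorks implementation: the names bounded_lookup and LookupTimedOut, the pool size, and the plain dictionary cache are all assumptions made for illustration.

```python
import concurrent.futures
import socket

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)  # hypothetical size
_cache = {}  # name -> address, or None for a cached negative result


class LookupTimedOut(Exception):
    """Raised when the caller's time bound expires before DNS answers."""


def _resolve_and_cache(name, resolver):
    try:
        result = resolver(name)
    except OSError:
        result = None  # negative result is cached too
    _cache[name] = result
    return result


def bounded_lookup(name, timeout_ms=500, resolver=socket.gethostbyname):
    """Resolve `name`, but never block the caller longer than `timeout_ms`."""
    if name in _cache:
        return _cache[name]
    future = _pool.submit(_resolve_and_cache, name, resolver)
    try:
        return future.result(timeout=timeout_ms / 1000.0)
    except concurrent.futures.TimeoutError:
        # The worker keeps running and fills the cache for later calls;
        # the caller would send this call to treatment.
        raise LookupTimedOut(name)
```

The point of the design is that the calling thread gives up after the bound while the lookup itself finishes in the background, so a later call to the same name can be answered from cache.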
Additional 14SP3+ DNS Tool
AS_CLI/ASDiagnostic/DNS> ?
0) clearAllCache : Clear DNS cache
1) clearCache : Clear a single entry from BroadWorks DNS cache
2) lookup : Lookup name using DNS client
3) reload : Reload BroadWorks DNS client static entries from configuration files
• clearAllCache/clearCache: Flush all entries or a single entry from the AS DNS cache without requiring a server stop
• reload: Dynamically read the local namedefs file to update the cache
• lookup: Perform the same lookup the application will and see where it is pulling the result from: local file, /etc/hosts, or DNS
– Need to specify the query type and the proper prepend for SRV
e.g., AS_CLI/ASDiagnostic/DNS> lookup _sip._udp.ns.eng.broadsoft.com SRV
Identifying DNS Delays
• DNS Specific SNMP Traps
– bwSipUnrecognisedDomainName: Severity = Medium
• Generated when a lookup returns a “No such Name” response to a query
• Trap will include the unresolved domain
• Unresolved domain will be added to negative cache
– bwDnsServerUnreachable: Severity = Low
• Generated when a DNS server does not return a response within the /etc/resolv.conf retrans & retry parameters
• DNS server is considered out-of-service for the bw.nameservice.unreachableServerLingerSecs time period
• DNS client will route advance and use the next DNS server
– bwDnsAllServersUnreachable: Severity = High
• The last available DNS server is unreachable and all others are still in the “unreachable linger” state
Identifying DNS Delays
• DNS Specific SNMP PMs
– broadworks/executionServer/dnsModule/dnsStats/
AS_CLI/Monitoring/PM/ApplicationServer> get -r
------------------------------------------------------------------------------
broadworks/executionServer/dnsModule/dnsStats/
------------------------------------------------------------------------------
*bwDnsQueryTimeMax 501
*bwDnsQueryTimeMaxTimestampMSB 289
*bwDnsQueryTimeMaxTimestampLSB 310639013
*bwDnsQueryTimeAvg 23
bwDnsStatsQueriesTable:
(1) bwDnsStatsQueryIndex
(2) bwDnsStatsQueryType
(3) bwDnsStatsQueries
(4) bwDnsStatsQueryTimeouts
(1) (2) (3) (4)
1 A 10 0
2 PTR 0 0
3 SRV 34 8
4 NAPTR 0 0
Identifying DNS Delays
• DNS Specific SNMP PMs
– bwDnsQueryTimeMaxTimestampMSB/LSB: Time stamp of the longest query time (excluding timeouts).
• Needs to be decoded
» [(MSB * 2^32) + LSB] = Unix time from 1970, in milliseconds
» To get local server time, drop the rightmost 3 digits and use the following
» $ perl -e 'require "ctime.pl"; print &ctime(RESULT);'
• Example:
» *bwDnsQueryTimeMaxTimestampMSB 289
» *bwDnsQueryTimeMaxTimestampLSB 310639013
[(289 * 2^32) + 310639013] = 1241556187557
IHApp$ perl -e 'require "ctime.pl"; print &ctime(1241556187);'
Tue May 5 16:43:07 US/Eastern 2009
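The same decoding can be done off-box in a few lines of Python. This is a convenience sketch with a hypothetical function name; it reports the time in UTC rather than the server's local zone.

```python
from datetime import datetime, timezone


def decode_pm_timestamp(msb, lsb):
    """Combine the MSB/LSB counters into Unix milliseconds and a UTC datetime."""
    millis = (msb << 32) + lsb  # same as MSB * 2^32 + LSB
    when = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    return millis, when


millis, when = decode_pm_timestamp(289, 310639013)
print(millis)            # 1241556187557
print(when.isoformat())  # 2009-05-05T20:43:07+00:00
```

20:43:07 UTC corresponds to the 16:43:07 US/Eastern result shown in the slide example.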
Identifying DNS Delays
• bwThreadDelayDetected SNMP trap
– If a Call Half Input Adapter thread is delayed for more than 2.5 seconds, the XSOutputXX.log file in /var/broadworks/logs/appserver will capture a thread dump
• You can quickly verify whether this thread delay was related to DNS by searching for the following strings
» $ grep "Inet4AddressImpl" XSOutput*.log
» $ grep "DNS.ExtendedResolver" XSOutput*.log
• You should not see this occur if the recommended DNS configuration is implemented
DNS Design Considerations
• Ideally DNS should not be used for call processing within the core
– Network core elements tend to be static
• Should be defined as IP addresses, or if an FQDN is absolutely required, define it locally on the AS in the namedefs file
– DNS does have a place on the access side
• Big difference between a phone having a problem resolving an address and an AS running 60 calls/sec being unable to resolve an address
• If DNS is required within the core for Call Processing
– Should use the local namedefs file if possible
– If external DNS is required, it should be on dedicated DNS infrastructure, not overlaid on data DNS
– If “wide-open” URL dialing is not required, the DNS server should not forward to the root server for domains not “owned” by the server
Identifying DNS Delays
• DNS Specific SNMP PMs
– bwDnsQueryTimeAvg: Average response time from the DNS servers
• Average latency added to each call requiring DNS resolution
– bwDnsStatsQueryTimeouts: Per record-type DNS server timeout count
• Means the server did not respond within the /etc/resolv.conf retry/retrans parameters
– bwDnsQueryTimeMax: Longest query time since last reset
DNS Recommended Configuration
1. Use the local namedefs file whenever possible
• If you just have a small number of non-IP contacts to resolve, define them locally to ensure instant response time
2. If URL dialing outside the group is not required, set enableNameLookupForURLDialing to false
• Protects against users URL dialing to “bad” domains
• Alternatively, do not assign the UrlDialing policy on the NS
3. The name lookup timeout should be enabled (enableNameLookupTimeout = true) and nameLookupTimeoutMilliseconds left at the default 500 msec
• Ensures that the call processing thread will not be delayed more than 500 msec waiting on DNS
4. /etc/resolv.conf retrans and retry parameters should be increased to retrans:1 retry:5
• Since call processing is protected via the DNS time limit, the DNS server timeout should be adjusted upwards to avoid flagging a DNS server as “unreachable” due to a long response
DNS on the Network Server
• Network Server has default DNS functionality that most people neither know about nor use
– For every INVITE received by the NS, the NS does a forward (if FQDN) or reverse (if IP) lookup on the @host portion of the originator’s URI (From:, RPID, PAI) to see if it matches a known network element
– Most customers do not need that, since they define NS routingNEs and hostingNEs for strict string match
• This unnecessary DNS lookup can be disabled via CLI
NS_CLI/Interface/SIP> get
useDNSLookup = false
Call Looping & Fan-out
Call Looping & Fan-out Types
• Excessive calling
– Large number of originations or terminations associated with a given user
• May be malicious or not (e.g., autodialer)
• Redirection Looping
– A calls B, who is FWD to C, who is FWD back to A
• Redirection Chaining
– A calls B, who is FWD to C, who is FWD to D, who is FWD to ….
• Excessive Call Fan-out
– Incoming call hits user with SIMRING to 10 numbers, and each of those numbers has SIMRING to 10 other numbers, and so on…..
• Depending on the fan-out depth, can result in 100s of new calls starting almost instantaneously, impacting server performance
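The arithmetic behind the fan-out example is simple geometric growth. The Python sketch below (hypothetical function name, assuming every target forks to the same number of destinations) shows how quickly the call count climbs.

```python
def fanout_calls(branches, depth):
    """Total new call legs when each ringing target forks to `branches` more,
    repeated for `depth` levels."""
    return sum(branches ** level for level in range(1, depth + 1))


print(fanout_calls(10, 1))  # 10
print(fanout_calls(10, 2))  # 110
print(fanout_calls(10, 3))  # 1110
```

Two levels of 10-way SIMRING already produce more than a hundred call legs from a single incoming call.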
Redirection Information
• In general, redirection information should be passed in SIP INVITEs using either the Diversion or History-Info header
– Header will include a list of redirecting parties and reason
• In practice, Diversion/History-Info header information can be lost, especially when traversing network boundaries (VoIP →TDM →Wireless →TDM →VoIP)
– Can’t be relied on to make protection decisions
Is Looping/Fan-out Protection Available?
• BroadWorks provides a full range of Call Policy call limits protection that can be applied at the system level, and customized at any of the sub levels (Enterprise/Group/User)
– Functionality introduced in 14SP1 under feature ID 33339 (with subset patched back in 13.0)
– 33339 needs to be activated and configured even if protection was enabled in 13.0 patch back
Simultaneous Call Protection
• defaultMaxNumberSimultaneousCalls
– Protects against excessive calling situations
• Count triggered on INVITE
• Does not require 33339 activation
• Separate parameter for video call control
– When maximum simultaneous calls hit:
• Redirect to VM on termination
• 403 Forbidden returned on originations
– bwUserExceededMaxSimultaneousCalls informational severity trap generated
AS_CLI/SubscriberMgmt/Policy/CallProcessing/CallLimits> get
defaultMaxNumberSimultaneousCalls = 10
defaultUseMaxNumberSimultaneousCalls = true
defaultMaxNumberSimultaneousVideoCalls = 5
defaultUseMaxNumberSimultaneousVideoCalls = true
SIP Redirection Header Based Protection
• Automatic Loop Detection
– Automatically detect redirection loops using SIP Diversion/History-Info header information
• On terminating INVITE to user, if Diversion/History-Info header contains user’s number, block any redirection and “short-circuit” terminating to the user
• On redirection based on a user’s service, if the redirect-to number is present in the received Diversion/History-Info header then deny the redirection
– Functionality enabled by default (cannot be disabled)
– bwForwardDestinationLoop informational severity trap generated identifying calling and called party
– One issue identified (EV87594): Loop Detection triggers on user configured CLID (recommend you apply patch)
SIP Redirection Header Based Protection
• defaultMaxRedirectionDepth
– Protects against redirection chaining by counting number of redirections in Diversion/History-Info header and “short-circuits” call to the user if number > MaxRedirectionDepth
– Functionality enabled by default (cannot be disabled), replaced old MaxHops system parameter
– bwUserExceededMaxRedirectionDepth informational severity trap generated identifying calling and called party
– AS_CLI/Interface/SIP maxForwardingHops: has nothing to do with forwarding loops; controls max SIP message forwarding through proxies
AS_CLI/SubscriberMgmt/Policy/CallProcessing/CallLimits> get
defaultMaxRedirectionDepth = 10
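The two header-based checks (loop detection and maximum redirection depth) can be sketched as follows in Python. This is a simplified model, not BroadWorks logic: the function name and return strings are hypothetical, and the redirecting parties are assumed to have already been parsed out of the Diversion/History-Info header into a list.

```python
def check_redirection(redirecting_parties, redirect_to, max_depth=10):
    """Decide whether a service may redirect the call to `redirect_to`.

    redirecting_parties: numbers already listed in Diversion/History-Info.
    max_depth: stands in for defaultMaxRedirectionDepth.
    """
    if redirect_to in redirecting_parties:
        return "deny: loop"  # redirect-to number already redirected this call
    if len(redirecting_parties) > max_depth:
        return "deny: max redirection depth"
    return "allow"


print(check_redirection(["A", "B"], "A"))  # deny: loop
print(check_redirection(["B"], "C"))       # allow
```

Both checks depend entirely on the header surviving the path, which is why the count-based protections exist as a second line of defense.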
SIP Redirection Header Based Protection
• defaultMaxFindMeFollowMeDepth
– Protects against redirection chaining of forking services like SIMRING by counting number of reason=follow-me redirections in Diversion/History-Info header and “short-circuits” call to the user if number > MaxFindMeFollowMeDepth
– Controlled by feature 33339, disabled by default
– bwUserExceededMaxFindMeFollowMeDepth informational severity trap generated identifying calling and called party
AS_CLI/SubscriberMgmt/Policy/CallProcessing/CallLimits> get
defaultUseMaxFindMeFollowMeDepth = true
defaultMaxFindMeFollowMeDepth = 3
Service Call Count Based Protection
• defaultMaxNumberConcurrentRedirectedCalls
– Protects redirection services from forwarding loops when Redirection information is not being preserved
• Diversion/History-Info header lost
– Counts number of concurrent redirections from all redirecting services and “short-circuits” call to the user if number > MaxNumberConcurrentRedirectedCalls
– Controlled by feature 33339, disabled by default
– bwUserExceededMaxConcurrentRedirectedCalls informational severity trap generated identifying calling and called party
AS_CLI/SubscriberMgmt/Policy/CallProcessing/CallLimits> get
defaultUseMaxNumberConcurrentRedirectedCalls = true
defaultMaxNumberConcurrentRedirectedCalls = 3
Service Call Count Based Protection
• defaultMaxNumberConcurrentFindMeFollowMeInvocations
– Protects redirection services (e.g., SIMRING, Sequential Ringing, Remote Office) from forwarding loops when Redirection information is not being preserved
• Diversion/History-Info header lost
– Counts number of concurrent redirections from all “follow-me” redirection services and will block the redirection if number > MaxNumberConcurrentFindMeFollowMeInvocations
– Controlled by feature 33339, disabled by default
– bwUserExceededMaxFindMeFollowMeInvocations informational severity trap generated identifying calling and called party
AS_CLI/SubscriberMgmt/Policy/CallProcessing/CallLimits> get
defaultUseMaxNumberConcurrentFindMeFollowMeInvocations = true
defaultMaxNumberConcurrentFindMeFollowMeInvocations = 3
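Count-based protection works without any header information: it only tracks how many redirections a user has in progress. A minimal Python sketch of the idea follows; the class and method names are hypothetical, and it assumes the new redirection is included in the count that is compared against the limit.

```python
class ConcurrentRedirectLimiter:
    """Per-user counter of in-progress redirections (illustrative only)."""

    def __init__(self, max_concurrent=3):
        self.max_concurrent = max_concurrent
        self.active = {}  # user -> number of redirections in progress

    def try_start(self, user):
        """Return True if the redirection may proceed, False if blocked."""
        new_count = self.active.get(user, 0) + 1
        if new_count > self.max_concurrent:  # ASSUMPTION: count includes this one
            return False
        self.active[user] = new_count
        return True

    def finish(self, user):
        """Call when a redirected call ends."""
        if self.active.get(user, 0) > 0:
            self.active[user] -= 1


limiter = ConcurrentRedirectLimiter(max_concurrent=3)
print([limiter.try_start("userA") for _ in range(4)])  # [True, True, True, False]
```

A loop that has lost its Diversion/History-Info header keeps re-entering the same user, so the counter trips after a few iterations instead of the loop running unbounded.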
Virtual Subscribers
• Currently, the looping and excessive calling protections outlined do not apply to virtual subscribers (e.g., call centers, voice portal)
– Exception: Auto Attendant nesting looping protection added in 14SP9 (65983)
• Can configure maximum number of re-entries into Auto Attendants to have a maximum number of loops/nestings
AS_CLI/Service/AutoAttendant> get
maxReentryForSameCall = 50
Recommended System Settings
AS_CLI/SubscriberMgmt/Policy/CallProcessing/CallLimits> get
defaultMaxNumberSimultaneousCalls = 10
defaultUseMaxNumberSimultaneousCalls = true
defaultMaxNumberSimultaneousVideoCalls = 5
defaultUseMaxNumberSimultaneousVideoCalls = true
defaultMaxCallTimeForAnsweredCallsInMinutes = 600
defaultUseMaxCallTimeForAnsweredCalls = false
defaultMaxCallTimeForUnansweredCallsInMinutes = 2
defaultUseMaxCallTimeForUnansweredCalls = false
defaultUseMaxNumberConcurrentRedirectedCalls = true
defaultMaxNumberConcurrentRedirectedCalls = 10
defaultUseMaxFindMeFollowMeDepth = true
defaultMaxFindMeFollowMeDepth = 3
defaultMaxRedirectionDepth = 10
defaultUseMaxNumberConcurrentFindMeFollowMeInvocations = true
defaultMaxNumberConcurrentFindMeFollowMeInvocations = 3
Overload
Overload Controls Key Points
• BroadWorks provides extensive overload protection on the Application Server and Network Server
• Intelligent throttling for conventional overload
– Based on processing delays for the main queues and memory consumption
– Goal: maximize traffic while protecting the system
• Priority given to existing calls over new calls
• Emergency Calls have configurable priority
• Call throttling more aggressive as overload increases
• Aggressive throttling in extreme overload
– Based on maximum queue size and encoder/decoder queue delay
Conventional Overload Controls
• Overload Controls specific to traffic type
– Call Processing related traffic
• SIP: INVITE, SUBSCRIBE, NOTIFY, etc.
• MGCP: All
– Non-Call Processing related traffic
• SIP: REGISTER, MESSAGE, OPTIONS
• MGCP: None
• Triggers and actions based on traffic type
– e.g., a REGISTER storm would throttle Non-Call Processing traffic without triggering a Call Processing overload condition
– Configurable resulting actions
• SIP: Ignore, 302 Moved Temporarily, or 503 Service Unavailable
• MGCP: Ignore, 409 Processing Overload
• Normal (green) zone plus 2 overload zones (yellow and red)
– Increasing level of traffic throttling if condition deteriorates
Extreme Overload Controls
• Protection invoked at the low-level queues
– SIP and MGCP, Decoder and Encoder queues
– Limit placed on the overall size of each queue
– Discard based on configurable time in queue
• Age or queue size based message discard
– Stale messages discarded from the queue
– Newer messages added to the queue
– System protection is key
BroadWorks Queue Architecture
[Figure: queue architecture diagram. Elements shown: SIP (Port 5060) and MGCP (Port 2427) inputs; MGCP Decode Queue; SIP CallP Decode Queue; SIP Non-CallP Decode Queue; Call Half Input Adapter; Non-CallP Input Adapter; CallP Threaded DB Access Queue; Voice Mail Input Adapter; Accounting Output Adapter; MGCP Encode Queue; SIP Encode Queue. Legend: worker thread, queue; primary call processing queues and threads versus background activity queues and threads.]
Overload Controls Invocation Points
[Figure: the queue architecture diagram annotated with the overload control invocation points]
• Extreme Overload Controls invoked on Decoder and Encoder queues
– MGCP Decode Queue, SIP CallP Decode Queue, SIP Non-CallP Decode Queue, MGCP Encode Queue, SIP Encode Queue
• Conventional Overload Controls invoked on Input Adapters
– Call Half Input Adapter, Non-CallP Input Adapter
Conventional Overload State Transition
• Transition from Green to Yellow to Red based on configurable criteria
• Orderly, configurable back-off to eliminate ping-pong effect between zones
• Separate overload controls for Call Processing related traffic and Non-Call Processing related traffic
[Figure: two zone diagrams, one for Engineered CallP Capacity and one for Engineered Non-CallP Capacity, each marked with Criteria to Enter Yellow, Criteria to Enter Red, Criteria to Leave Yellow, and Criteria to Leave Red]
Conventional Overload Actions
Call Processing overload:
Action | Yellow | Red
Call Related | 50% new calls actioned; 0% existing calls actioned | 100% new calls actioned; 0% existing calls actioned
Non-Call Related | Forced to Yellow | Forced to Red
Stale Messages | All queue timers halved | All queue timers halved
Subscriber Rollbacks | Suspended | Suspended
SIP Retries | T1 is doubled; 5 second session completion | T1 is quadrupled; 5 second session completion
Logs | Protocol level debug only | All debug logs suspended
Misc. maintenance actions | PM Reporting suspended; Maintenance Scripts suspended; Access Device Monitoring suspended; IP device reset suspended | PM Reporting suspended; Maintenance Scripts suspended; Access Device Monitoring suspended; IP device reset suspended
Non-Call Processing overload:
Action | Yellow | Red
Call Related | No Actions | No Actions
Non-Call Related | 100% OPTIONS discarded; 50% new REGISTER actioned | 100% OPTIONS discarded; 100% REGISTER actioned
REGISTER expiration | Expiration 2x the time since green zone; new sessions completed in 5 secs | Expiration 2x the time since green zone; no new sessions will be created
Stale Messages | Non-CallP queue timers halved | Non-CallP queue timers halved
Overload Control Configuration
AS_CLI/System/OverloadControls> get
enabled = false
mgcpOverloadAction = error
sipOverloadAction = error
percentMemoryInUseToEnterYellow = 85
percentMemoryInUseToEnterRed = 90
percentMemoryInUseToLeaveYellow = 85
percentMemoryInUseToLeaveRed = 90
allowEmergencyCallsInOverload = true
maxPacketAgeInMsecs = 3000
maxPacketAgeDuringOverloadInMsecs = 1500
AS_CLI/System/OverloadControls/CallP> get
sampleSize = 100
minTimeInZoneInMsecs = 30000
delayInMsecsToEnterYellow = 1150
delayInMsecsToEnterRed = 1350
delayInMsecsToLeaveYellow = 1050
delayInMsecsToLeaveRed = 1250
AS_CLI/System/OverloadControls/NonCallP> get
sampleSize = 100
minTimeInZoneInMsecs = 120000
delayInMsecsToEnterYellow = 1001
delayInMsecsToEnterRed = 1201
delayInMsecsToLeaveYellow = 1000
delayInMsecsToLeaveRed = 1200
callpQDelayInMsecsToEnterYellow = 700
callpQDelayInMsecsToEnterRed = 750
callpQDelayInMsecsToLeaveYellow = 600
callpQDelayInMsecsToLeaveRed = 701
• Overload Controls require configuration
– See BroadWorks System Configuration Document
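The separate enter and leave thresholds are what prevent ping-ponging between zones. The Python sketch below models that hysteresis using the CallP delay values from the listing above. It is an illustration under assumptions: the function name is hypothetical, the comparison operators are guesses, zones are stepped one at a time, and sampleSize averaging, minTimeInZoneInMsecs, and the memory criteria are left out.

```python
# CallP delay thresholds (msec) copied from the OverloadControls/CallP listing.
ENTER_YELLOW, ENTER_RED = 1150, 1350
LEAVE_YELLOW, LEAVE_RED = 1050, 1250


def next_zone(zone, avg_delay_ms):
    """Return the zone after observing the averaged queue delay.

    ASSUMPTIONS: transitions move one zone at a time; ">=" enters a zone
    and "<" leaves it.
    """
    if zone == "green":
        return "yellow" if avg_delay_ms >= ENTER_YELLOW else "green"
    if zone == "yellow":
        if avg_delay_ms >= ENTER_RED:
            return "red"
        if avg_delay_ms < LEAVE_YELLOW:
            return "green"
        return "yellow"
    # zone == "red"
    return "yellow" if avg_delay_ms < LEAVE_RED else "red"


zone = "green"
for delay in (900, 1200, 1100, 1400, 1300, 1200, 1000):
    zone = next_zone(zone, delay)
    print(delay, zone)
```

In the sample run, a delay of 1100 msec keeps the server in yellow even though it is below the 1150 msec entry threshold, because it has not yet dropped under the 1050 msec leave threshold.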
So What Is The Problem?
Sounds like turning on Overload Controls on the AS/NS is a good thing and I should do it ASAP
• On a well running system, Overload Controls will provide you protection against externally driven events
– Traffic spikes, Registration storms
• On poorly performing systems, Overload Controls will potentially trigger as a result of the poor performance
– Primary queue delays that are the result of things like DNS delays or lack of CPU resources are not distinguished from delays that are the result of excessive external traffic
Gauging Server Performance
How do I know if my server is performing poorly?
• Performance problems are going to express themselves in delays, which can be monitored
– bwThreadDelayDetected traps: this trap is generated whenever a Call Half Input Adapter thread is delayed by >= 2.5 sec
• If you are getting these traps frequently, you have problems
– bwSipStatsMaxSetupSignalDelay gauge: this gauge tracks the max call setup time and encompasses all delays
• Good indicator of the general performance of the system
• EMS tracks and resets every 15 min
• If all readings are < 1 sec, there should be no issue with enabling Overload Controls with default threshold settings
48 2009 BroadSoft®, Inc. Proprietary and confidential; do not copy, duplicate, or distribute
Should I Turn On Overload Controls?Should I Turn On Overload Controls?
My server has some pretty bad queue/call delays, I guess I can’t turn on Overload Controls
•Overload Controls can be safely enabled on poorly performing systems through proper tuning
– General OC configuration requires tuning sample sizes based on server maximum traffic levels
• Recommendation is to go with the default threshold delay settings
– CallP and Non-CallP yellow and red delay thresholds can be increased to allow for a certain level of delay
• For example, if you are consistently seeing primary queue delay maximums between 1 and 2 sec, you could double all OC delay thresholds
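The doubling rule above amounts to simple arithmetic over the configured thresholds. The sketch below is a hypothetical helper, not part of BroadWorks. Only the name callpQDelayInMsecsToLeaveRed and the value 701 appear in this deck; the second entry is a placeholder added for illustration.

```python
def scale_thresholds(thresholds_ms, factor=2):
    """Return a copy of the delay thresholds multiplied by factor."""
    return {name: int(value * factor) for name, value in thresholds_ms.items()}


# Only the first name and value come from the slides; the second entry
# is a made-up placeholder to show that every threshold is scaled.
current = {
    "callpQDelayInMsecsToLeaveRed": 701,
    "exampleYellowThresholdMsecs": 400,  # hypothetical name and value
}

if __name__ == "__main__":
    for name, value in scale_thresholds(current).items():
        print(name, "=", value)
```

Scaling every threshold by the same factor keeps the relative spacing between the yellow and red zones unchanged.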
Determining The Trigger
• Alarms will be generated whenever a server transitions from one zone to another
– bwCallOverloadZoneTransition
– bwNonCallOverloadZoneTransition
• Identifying the trigger that caused overload is more art than science
– Internal Driver: DNS delay, lack of resource (CPU), bad DB query
• Generally requires BroadSoft support in analyzing the XSOutputXX.log file (thread dump analysis)
– External Driver: Call Looping scenario, traffic flood
• Generally need to parse through the log files to see what occurred before/during/after the event
• New bwTrafficParser script can help
bwTrafficParser Script
• The bwTrafficParser script extracts traffic information from the XS log files and provides an analysis of traffic flow
– Will identify traffic based on SIP method, source/destination IP/port, internal session ID, DN/userid
– Script will be sourced in release 16, patched in 14SP9, and available for download on BroadSoft Xchange for earlier releases
bwadmin@IHApp$ ./bwTrafficParser
Must specify one or more log files.
Usage:
bwTrafficParser -f -p -r -t <number> <log file(s)>
List of possible option:
-f : prints full output (instead of top 10)
-m <regexp> : prints out log file entries matching regular expression
-p : narrows in address information by port
-r : displays rate-based information
-t <number> : prints top <number> items instead of top 10
The script parses up to 5 log files.
When using the -r option the files should be specified in time order.
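To make the idea of a "top 10" traffic breakdown concrete, here is a minimal Python sketch that counts SIP request methods in a set of log lines. It is not the bwTrafficParser implementation and does not know the XS log format: it assumes only that SIP request lines (for example `INVITE sip:user@host SIP/2.0`) appear on their own lines, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches a SIP request line such as "INVITE sip:alice@example.com SIP/2.0".
# Only sip: and sips: Request-URIs are recognized in this sketch.
REQUEST_LINE = re.compile(r"^\s*([A-Z]+)\s+sips?:\S+\s+SIP/2\.0\s*$")


def top_methods(lines, top=10):
    """Count SIP request methods and return the most frequent ones."""
    counts = Counter()
    for line in lines:
        match = REQUEST_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    sample = [
        "INVITE sip:alice@example.com SIP/2.0",
        "Via: SIP/2.0/UDP 192.0.2.10:5060",
        "REGISTER sip:example.com SIP/2.0",
        "INVITE sip:bob@example.com SIP/2.0",
        "SIP/2.0 200 OK",
    ]
    for method, count in top_methods(sample):
        print(method, count)
```

The same counting approach extends to source or destination addresses once the relevant fields of the real log format are known.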
Enhancements to Use
• Platform Enhancements
• BroadWorks Hardware Support Policy
Platform Related Enhancements
• New BroadWorks platform related enhancements added since Release 14SP6
– Pre-Upgrade Validation Tool
– Installation Patch Bundle (14SP7)
– Configurable MS interface (14SP8)
– Configurable TimesTen Replication Port (14SP9)
– New OS Support
• RHEL 5.x support (14SP9)
• Solaris 10 x86_64 support (15.0)
– EMS Threshold Rework (16.0)
– Improved TimesTen DB Migration Time (16.0)
Pre-Upgrade Validation Tool
• New tool that should be run prior to any upgrade; it will
– Validate supported upgrade paths
– Check system configuration attributes
• Disk space, system variables, ssh configuration, etc.
• Release/OS independent tool
– Download the latest version for various target releases from BroadSoft Xchange
• e.g.: bw-preUpgradeCheck-Rel_15-95158.bin is for any source release looking to upgrade to any R15 release
Installation Patch Bundle (14SP7)
• Installing a new BroadWorks version now requires a valid Installation Patch (IP) bundle
– IP bundle is updated regularly whenever a new patch-worthy install/upgrade issue is identified
– Customer should always get the latest IP
– IP bundle is installed at the same time the release .bin file is run
• # ./AS_Rel_15.0_1.285.Linux-x86_64.bin -patch /bw/install/IP.as.15.0.285.ip20080616.Linux-x86_64.tar.gz
• IP bundles can also include (and automatically install) critical Application Patches (AP)
– Ensure that patches deemed critical are present at upgrade
Configurable MS Interface (14SP8)
• AS uses the publicIPAddress in the HTTP URL when signaling the MS for playing media files, and not the address defined for media via the config-network script
– New appserver.properties parameter (bw.http.mediaif) that allows the MS apache interface to be bound to any of the available AS interfaces
• Defaults to the AS public address
• Controlled via MS interface settings as part of the config-network script
Configurable TimesTen Replication Port (14SP9)
• By default, TimesTen will select a random TCP port (>32K) to use for replication
– A number of customers have expressed that this is a security concern
• New functionality allows for static port allocation
– New installs will prompt the user for the replication port setting
– Existing installs can move to a fixed port via CLI config
AS_CLI/System/Peering> get
portNumber = random
New OS Support
• RHEL 5.x Linux Support
– Available release 14SP9+
• Solaris 10 x86_64 Support
– Available release 15.0+
– Means Solaris can be used on any Intel x64 based server (e.g. IBM)
EMS Threshold Rework (16.0)
• EMS incorporates monitored object thresholds that can produce alarms or health summary changes when surpassed
– Based on field data, thresholds have been revisited
– Thresholds automatically selected based on automatic server size discovery
• Based on server resources (e.g., CPUs, Memory), correct thresholds selected
– Manual threshold modification simplified
– A number of monitored objects dropped from the health summary check
• Focus on the core basics
• New monitored objects added to better match performance/growth monitoring recommendation
– XS JAVA Heap
– Database Size
– True AS user count (not including virtual subscribers)
Improved TimesTen DB Migration Time (16.0)
• As part of a BroadWorks upgrade, a number of database backups/restores are performed; for a large database, these restores can take an extremely long time
– e.g.: 60+ minutes for an 84K DB on a SUN T2000
• Upgrade enhanced to optimize the number of DB restores
– Eliminate unnecessary restores when the TimesTen database version has not changed
– Significant upgrade time savings (hours)
BroadWorks Hardware Support Policy
Hardware/OS Support Policy: History
• Pre-Release 13: SUN SPARC-based servers only
– Required Solaris SPARC operating system
– Limited subset of servers supported (e.g., V24x, V44x, T2000)
• Release 13: IBM Intel Xeon-based server support added
– Required Linux operating system
– Limited subset of servers supported (x336, x3550, HS20, HS21)
• Release 15: SUN Intel Xeon-based servers added
– Solaris x64 operating system
– Limited subset of servers supported (e.g., x4150, x4250, x6250)
Hardware/OS Support Policy: Evolution
• Restricted server/vendor support a frustration to customers
– Slow turnaround in getting new servers added to the “supported” list
– Some customer preferred vendors completely shut out (e.g., HP)
• Supported server list continuously growing over time
– Since BroadSoft rarely deprecates servers, number of servers requiring periodic performance testing increased year-by-year
• Ubiquitous OS support increases deployment combinations
– Intel-based x64 servers can run either Solaris or Linux
• Above issues have led BroadSoft to loosen up its hardware and OS support policies
New Hardware Support Policy
• BroadSoft platform support focuses on 2 processor types
– Intel-based Xeon family
– UltraSPARC family
• Hardware still categorized by server size (e.g., Small, Medium, Large, Large – High Performance), which maps back to capacity numbers
– Server sizes are CPU specific (Intel Small and UltraSPARC Small have different capacity numbers)
– Server sizes map to minimum resource requirements (CPU, Memory, HDD)
• Introduction of new hardware classification with different support connotations
– Preferred, Supported, Compatible, Legacy, Lab
Preferred Server Category
• List of Sun and IBM platforms using the preferred Intel Xeon-based CPU which architecturally is best suited for BroadWorks Call Processing applications
– Currently focused on the Xeon 5400 family of CPUs, but the list will continue to evolve
• Xeon 5500 family coming soon
– BroadSoft validates these platforms in the lab and continues to do so from release to release
– BroadSoft performs performance benchmarking on these servers to ensure that they perform to rated numbers provided in the BroadWorks System Capacity Planner
• These are the servers that BroadSoft recommends the customer use
Supported Server Category
• List of Sun platforms using the “supported” UltraSPARC-based CPU
– Currently no plan to validate any next-gen UltraSPARCs
– BroadSoft validates these platforms in the lab and continues to do so from release to release.
– BroadSoft performs performance benchmarking on these servers to ensure that they perform to rated numbers provided in the BroadWorks System Capacity Planner
• The UltraSPARC processor family (i.e., T1 and T2) does not fit well with the BroadWorks application and, as such, is considered a supported, but not preferred, CPU by BroadSoft
Compatible Server Category
• The Compatible category applies to any Intel Xeon-based platform that is not on the Preferred platforms list.
– Compatible servers can be used in lieu of Preferred servers
• Must meet or surpass the Preferred platform minimum CPU speed and the minimum hard disk drive (HDD) requirement, and be equipped with the required amount of memory for a given server size
– BroadSoft does not provide any guarantee that there will be no platform interaction between our application and the Compatible category server.
• Although unlikely, there is the possibility that certain platform-level operations (e.g., installation, upgrades, patching, licensing) might experience issues
• BroadSoft can provide a BroadWorks Platform Compatibility Test Plan that can be used to validate basic functionality on the Compatible server
– BroadSoft does not provide any guidance on capacity
• The Compatible server is not part of the regular performance benchmarking/validation process.
» In general, since the Compatible server has the same hardware footprint as the Preferred server for a given server size, the Compatible server capacity should track to the Preferred server
• BroadSoft’s performance validation scripts and databases are available to any customer wanting to perform a performance benchmarking of their Compatible server
• Customers can engage BroadSoft Professional Service to perform functional and performance validation of Compatible servers
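The "meet or surpass" requirement for a Compatible server can be expressed as a simple comparison against the minimums for a given server size. The Python sketch below is illustrative only: the Spec type, the function name, and every number in it are hypothetical, and the real minimums come from the BroadWorks Recommended Hardware Guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hardware resources compared for a given server size."""
    cpu_ghz: float
    memory_gb: int
    hdd_gb: int


def meets_minimums(candidate: Spec, minimum: Spec) -> bool:
    """True when the candidate meets or surpasses every minimum."""
    return (
        candidate.cpu_ghz >= minimum.cpu_ghz
        and candidate.memory_gb >= minimum.memory_gb
        and candidate.hdd_gb >= minimum.hdd_gb
    )


if __name__ == "__main__":
    # Placeholder figures, not BroadSoft requirements.
    minimum_for_size = Spec(cpu_ghz=2.5, memory_gb=8, hdd_gb=146)
    candidate = Spec(cpu_ghz=3.0, memory_gb=16, hdd_gb=146)
    print(meets_minimums(candidate, minimum_for_size))
```

Passing this check says nothing about capacity or platform interaction; as the slides note, those still require validation on the actual server.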
Legacy Server Category
• Any server that was once considered Preferred or Supported but is no longer documented, and has not been officially deprecated
– Examples of valid Legacy Servers: IBM x336, IBM HS20
• Legacy category servers can still be used with BroadWorks and are basically equivalent to a Compatible server type from a support perspective
• BroadSoft does not provide any guidance on capacity for Legacy category servers
– BroadSoft no longer actively performs performance validation on Legacy servers
Lab Server Category
• Any UltraSPARC-based or Intel Xeon 5000-based platform can be considered a Lab category server as long as it meets the lab server minimum requirements
– Ideally, lab servers should be of the same platform type used in production (that is, a server from the Preferred, Supported, or Compatible category)
• Lab category servers are supported with the following caveats:
– BroadSoft does not provide any guarantee that there will be no platform interaction between BroadSoft’s application and the Lab category server
• Although unlikely, there is the possibility that certain platform-level operations (for example, installation, upgrades, patching, licensing) might experience issues
– BroadSoft does not provide any guidance on capacity for Lab category servers.
Which OS can I use?
• We now support Solaris SPARC, Solaris x64 and Linux
– Solaris SPARC is only applicable to SUN SPARC-based servers
– Solaris x64 and Linux can run on ANY x64 server (SUN, IBM, or Compatible)
• From a BroadSoft perspective, our general preferred OS is Solaris, but it really is a customer choice as to which to use based on support
– Linux RHEL 4 WS: Rel 13, 14, 15
– Linux RHEL 5 Server: Rel 14SP9+
– Solaris 10 x64: Rel 15.0+
– Solaris 9 SPARC: Rel 13, 14, 15
– Solaris 10 SPARC: Rel 14, 15
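The OS/release list above can be read as a lookup table. The Python sketch below encodes it under two stated assumptions: a release is represented as a hypothetical (major, service pack) tuple, and a trailing "+" is treated as having no upper bound. The names are illustrative and not part of any BroadSoft tool.

```python
# OS name -> predicate over a (major, service_pack) release tuple.
# Entries mirror the slide; "+" is interpreted as "this release or later".
SUPPORT = {
    "Linux RHEL 4 WS": lambda rel: rel[0] in (13, 14, 15),
    "Linux RHEL 5 Server": lambda rel: rel >= (14, 9),
    "Solaris 10 x64": lambda rel: rel >= (15, 0),
    "Solaris 9 SPARC": lambda rel: rel[0] in (13, 14, 15),
    "Solaris 10 SPARC": lambda rel: rel[0] in (14, 15),
}


def supported_os(release):
    """Return the sorted OS names listed for a (major, sp) release."""
    return sorted(name for name, ok in SUPPORT.items() if ok(release))


if __name__ == "__main__":
    for release in [(13, 0), (14, 8), (14, 9), (15, 0)]:
        print(release, supported_os(release))
```

Tuple comparison keeps the service-pack boundary explicit: (14, 8) falls below the RHEL 5 cutoff while (14, 9) meets it.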
Hardware Deprecation Policy
• In general, BroadSoft does not deprecate supported hardware unless there is a functional reason
– Only the SunFire V12x and Netra 12x have been officially deprecated
– Any hardware deprecations would be accompanied by a BroadSoft Support Alert
• BroadSoft’s approach is natural evolution
– As hardware ages, customers will naturally replace it with newer hardware and port the BroadWorks instance
– Hardware platforms evolve within BroadSoft by moving from the Preferred or Supported category to the Legacy category
Documentation Changes
• BroadWorks Recommended Hardware Guide has been updated to reflect the policy change
– Document slimmed down to under 20 pages
– All references to part numbers removed
• Impossible to keep up with current and valid info
• Vendor/Distributor is the proper place for BoM creation based on our minimum requirements
Corporate Headquarters
220 Perry Parkway
Gaithersburg, Maryland 20877
Tel. +1 301.944.9770
www.broadsoft.com