SURFnet6 Network Monitoring and Reporting
Hans Trompert, SURFnet
Information needs
Connected organizations
NOC / SURFnet / research
Annual report
Information detail
Monitoring versus Reporting
- Monitoring
  - real-time
  - status
  - alarms
- Reporting
  - afterwards
  - over a specific time period (day, week, month, year)
Information source and destination
Avici SSR
Nortel ERS8600
Nortel OM5200
Nortel OME6500
Nortel OME1060
SURFnet6 operations
Real-time customer reporting
Security
Equipment and interface
Optical devices   CPL          TL1
                  OM5200       TL1 (+ SNMP)
                  OME6500      TL1 (+ SNMP)
                  OME1060      SNMP
Data devices      ERS8600      SNMP
                  Avici SSR    SNMP + Netflow
Reporting: SNMP metrics
SNMP metrics:
- Interface in/out octet counters
- Interface in/out packet counters (unicast/broadcast/multicast)
- Interface input/output errors
- Interface availability
- Temperature
- Memory
- CPU
- Device uptime
- and more …
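Volume reporting from in/out octet counters comes down to taking the difference between two successive readings and dividing by the polling interval. The following is a minimal sketch of that arithmetic, not the SURFnet implementation; the function names are hypothetical, and it assumes the counter wraps at most once between polls (for example the 32-bit `ifInOctets` or the 64-bit `ifHCInOctets` from the IF-MIB).

```python
def counter_delta(previous: int, current: int, bits: int = 64) -> int:
    """Difference between two counter readings, allowing for one wrap-around.

    Assumption: the counter wrapped at most once between the two polls.
    """
    if current >= previous:
        return current - previous
    # The counter passed its maximum and restarted from zero.
    return current + (1 << bits) - previous


def bits_per_second(previous: int, current: int,
                    interval_s: float, bits: int = 64) -> float:
    """Average rate in bit/s over one polling interval of octet counters."""
    return counter_delta(previous, current, bits) * 8 / interval_s


if __name__ == "__main__":
    # 7,500,000 octets in a 60 second interval is 1 Mbit/s on average.
    print(bits_per_second(0, 7_500_000, 60))
    # A 32-bit counter that wrapped between two polls.
    print(counter_delta(2**32 - 5, 10, bits=32))
```

A device reboot also resets the counters, so a real collector would typically check device uptime before trusting a delta computed this way.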
Reporting: TL1 metrics
TL1 metrics:
- Input/Output Frames
- Errored frames
- Discarded frames
- Transmit and receive power levels
- Errored Seconds (ES): number of seconds that have had CRC errors
- Severely Errored Seconds (SES): after 10 seconds of ES we start counting SES
- UnAvailable Seconds (UAS): seconds where we had no sync
- and more …
Monitoring: SNMP traps
SNMP traps
- Fan
- Temperature
- Voltage
- Link Up/Down
- Bay Controller
- Module
- PIM + MSDP
- BGP
- VRRP
- ISIS
- and more …
Monitoring: TL1 events
TL1 Events
- Equipment
  - Circuit pack missing/mismatch/failed
  - Fan failed/missing
  - Power failure A or B
  - High temperature
- Shelf
  - Software upgrade failed/mismatch/….
  - Database integrity fail/restore in progress/…
- Amplifier
  - input/output loss of signal
  - automatic shutoff
- and many, many more
SNMP based volume reporting
Internet
Connected organizations
Border router Amsterdam1 (SARA)
Border router Amsterdam2 (TeleCity II)
Core router Amsterdam2 (TeleCity II)
Core router Amsterdam1 (SARA)
- Total external traffic
- Per traffic class (AMS-IX, Global, private peers)
- Per provider/peer
- Total SURFnet internal traffic
- Per connected organization
SURFnet external traffic volume
- SURFnet external traffic volume
  - Ams-IX
  - Private peers (via Ams-IX), including:
    - Chello, Tiscali, @Home, Planet, XS4all
    - Garnier Projects, Abovenet, UUnet, Cogent
  - NREN
    - Geant2
    - SINET
    - Abilene
  - Global
    - Global Crossing
    - Cable & Wireless
SURFnet external traffic volume
[Figure: SURFnet external traffic, January 1999 through December 2006. Y-axis: TiB (0 to 2,500); series: TiB in, TiB out.]
SURFstat: Real-time connected organization traffic volume reporting
- Software
  - Net-SNMP
  - Python
  - RRDtool
- Features
  - Easy administration by labeling connections with keywords in interface description on router
  - Different graph resolutions: day, week, month, year, decade
  - 1 minute measurement interval
- Reports on
  - volume (bits in/out)
  - packets (unicast/multicast/broadcast)
SURFstat: UvA (many users)
SURFstat: CWI (few users)
Netflow – flow information
- Netflow uses the common 5-tuple definition, where a flow is defined as a unidirectional sequence of packets sharing all of the following 5 values:
  1. Source IP address
  2. Destination IP address
  3. Source port (TCP or UDP)
  4. Destination port (TCP or UDP)
  5. IP protocol
- Most common fields in a Netflow record:
  - 5-tuple information
  - Input and output SNMP interface index
  - Timestamps for the flow start and finish time
  - Number of bytes and packets observed in the flow
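To make the flow definition concrete, the sketch below groups packets into unidirectional flows keyed on the 5-tuple and keeps the per-flow start time, finish time, packet count, and byte count. It is only an illustration of the definition: the `FlowKey` type, the `aggregate` function, and the packet tuple layout are hypothetical, and a real exporter also expires flows on timeouts, which is omitted here.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class FlowKey(NamedTuple):
    """The 5 values that every packet in one flow shares."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 = TCP, 17 = UDP


# Hypothetical packet layout:
# (timestamp, src_ip, dst_ip, src_port, dst_port, protocol, size_in_bytes)
Packet = Tuple[float, str, str, int, int, int, int]


def aggregate(packets: Iterable[Packet]) -> Dict[FlowKey, dict]:
    """Group packets into unidirectional flows keyed on the 5-tuple."""
    flows: Dict[FlowKey, dict] = {}
    for ts, src, dst, sport, dport, proto, size in packets:
        key = FlowKey(src, dst, sport, dport, proto)
        record = flows.get(key)
        if record is None:
            flows[key] = {"start": ts, "end": ts, "packets": 1, "bytes": size}
        else:
            record["start"] = min(record["start"], ts)
            record["end"] = max(record["end"], ts)
            record["packets"] += 1
            record["bytes"] += size
    return flows


if __name__ == "__main__":
    sample = [
        (0.0, "192.0.2.1", "198.51.100.7", 40000, 80, 6, 60),
        (0.1, "198.51.100.7", "192.0.2.1", 80, 40000, 6, 1500),  # reverse direction
        (0.2, "192.0.2.1", "198.51.100.7", 40000, 80, 6, 52),
    ]
    for key, record in aggregate(sample).items():
        print(key, record)
```

Because flows are unidirectional, the reply packets in the sample form a second flow with source and destination swapped.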
Netflow – versions
v1     First try
v5     Most used version
v6     Encapsulation information
v7     Switch information
v8     Several aggregation forms
v9     Template based, allowing many combinations; supports IPv6
IPFIX  aka v10; IETF-standardized NetFlow 9 with Enterprise fields and other community input
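A collector that receives several export versions can tell them apart by the first two bytes of each datagram, which carry the version number in network byte order. The sketch below reads that field and, for the widely used v5, unpacks the fixed 24-byte header. The function names are hypothetical, and the flow records that follow the header are not parsed here.

```python
import struct

# NetFlow v5 header: version, count, sys_uptime, unix_secs, unix_nsecs,
# flow_sequence, engine_type, engine_id, sampling_interval (24 bytes in total).
V5_HEADER = struct.Struct("!HHIIIIBBH")
V5_FIELDS = (
    "version", "count", "sys_uptime", "unix_secs", "unix_nsecs",
    "flow_sequence", "engine_type", "engine_id", "sampling_interval",
)


def export_version(datagram: bytes) -> int:
    """Return the version number carried in the first two bytes."""
    (version,) = struct.unpack_from("!H", datagram, 0)
    return version


def parse_v5_header(datagram: bytes) -> dict:
    """Unpack the fixed v5 header; raises ValueError for other versions."""
    if export_version(datagram) != 5:
        raise ValueError("not a NetFlow v5 datagram")
    return dict(zip(V5_FIELDS, V5_HEADER.unpack_from(datagram, 0)))


if __name__ == "__main__":
    # A hand-built header announcing 2 flow records (illustrative values only).
    header = V5_HEADER.pack(5, 2, 1000, 1167609600, 0, 42, 0, 0, 0)
    print(parse_v5_header(header))
```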
Netflow setup
Internet
Connected organizations
Border router Amsterdam1 (SARA)
Border router Amsterdam2 (TeleCity II)
Core router Amsterdam2 (TeleCity II)
Core router Amsterdam1 (SARA)
FLOWmon
perfSONAR
test
NFSEN
PeakFlow
Fan out
Netflow applications
- connected organizations:
  - FLOWmon: detailed traffic reporting
  - SURFflow (Arbor Peakflow / NFSEN): suspicious traffic pattern reporting
- SURFnet-CERT:
  - NFSEN
    - suspicious traffic pattern reporting
    - historical flow data queries
    - profiles for custom reports
- Geant2 JRA1 perfSONAR probes
  - Flow Subscription Measurement Point
  - Flow Selection and Aggregation Measurement Archive
FLOWmon
Detailed traffic reporting:
- total traffic
- prefix-based flow grouping
- reports on:
  - IP version (v4/v6)
  - IP protocol (TCP, UDP, ICMP, GRE, …)
  - TCP port (HTTP, SMTP, NNTP, FTP, SSH, …)
  - UDP port (domain, RTSP, VPN, …)
- top N connected organizations
- destination AS traffic
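Prefix-based flow grouping can be pictured as matching each flow's address against the prefixes assigned to every connected organization and summing the byte counts per organization. The sketch below shows one way to do that with Python's standard `ipaddress` module. It is not FLOWmon's code: the organization names, prefixes (taken from the documentation address ranges), and the `group_by_prefix` function are assumptions made for illustration.

```python
import ipaddress
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def group_by_prefix(flows: Iterable[Tuple[str, int]],
                    prefixes: Dict[str, List[str]]) -> Counter:
    """Sum flow bytes per organization, matched on source address prefix.

    flows: (source IP address, byte count) pairs.
    prefixes: organization name -> list of IPv4/IPv6 prefixes.
    Addresses that match no prefix are counted under "unknown".
    """
    networks = [
        (name, ipaddress.ip_network(prefix))
        for name, plist in prefixes.items()
        for prefix in plist
    ]
    totals: Counter = Counter()
    for address, size in flows:
        ip = ipaddress.ip_address(address)
        # An address never matches a network of the other IP version.
        owner = next((name for name, net in networks if ip in net), "unknown")
        totals[owner] += size
    return totals


if __name__ == "__main__":
    # Hypothetical organizations and prefixes, for illustration only.
    orgs = {
        "org-a": ["192.0.2.0/24", "2001:db8:a::/48"],
        "org-b": ["198.51.100.0/24"],
    }
    flows = [
        ("192.0.2.10", 1000),
        ("2001:db8:a::1", 500),
        ("198.51.100.3", 200),
        ("203.0.113.9", 50),
    ]
    print(group_by_prefix(flows, orgs).most_common())
```

`Counter.most_common(n)` then gives a top N list directly, in the spirit of the "top N connected organizations" report.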
UvA traffic by IP protocol
Connected organization to world traffic by TCP destination port
SURFflow
Reports on suspicious traffic patterns like:
- Unusual number of flows: DoS attack
- Flows from one host to many ports on another host: port scan
- From one host to the same port on many hosts: break-in attempt making use of a known bug
- From many hosts to a specific (set of) port(s) on many other hosts: virus/worm
- etc …
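Two of these patterns can be sketched as simple counting over flow records: a port scan shows up as one source reaching many distinct ports on one destination, and a break-in sweep as one source reaching the same port on many distinct destinations. The code below is a simplified heuristic written for illustration, not how Arbor Peakflow or NFSEN detect anomalies; the function names and thresholds are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical flow layout: (source IP, destination IP, destination port)
Flow = Tuple[str, str, int]


def find_port_scans(flows: Iterable[Flow],
                    min_ports: int = 100) -> Dict[Tuple[str, str], int]:
    """(source, destination) pairs where the source touched many ports."""
    ports = defaultdict(set)
    for src, dst, dport in flows:
        ports[(src, dst)].add(dport)
    return {pair: len(seen) for pair, seen in ports.items()
            if len(seen) >= min_ports}


def find_host_sweeps(flows: Iterable[Flow],
                     min_hosts: int = 100) -> Dict[Tuple[str, int], int]:
    """(source, port) pairs where the source hit that port on many hosts."""
    hosts = defaultdict(set)
    for src, dst, dport in flows:
        hosts[(src, dport)].add(dst)
    return {pair: len(seen) for pair, seen in hosts.items()
            if len(seen) >= min_hosts}


if __name__ == "__main__":
    scan = [("203.0.113.5", "192.0.2.1", port) for port in range(1, 201)]
    sweep = [("203.0.113.9", "192.0.2.%d" % host, 22) for host in range(1, 151)]
    normal = [("198.51.100.2", "192.0.2.1", 80)]
    flows = scan + sweep + normal
    print(find_port_scans(flows))
    print(find_host_sweeps(flows))
```

The thresholds of 100 are arbitrary; in practice they would be tuned per network and evaluated over a time window.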
Active measurements: RTTPL
Round Trip Time and Packet Loss monitoring
- measurement probes throughout the network
- central storage of results
- active measurements by injecting ICMP echo request packets
- measuring min/max/avg RTT and jitter
  - both IPv4 and IPv6
  - both unicast and multicast (under development)
- measuring packet loss
- 20 pings per minute
- report matrices per minute/hour/day/month
- results between two probes in graphs
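The per-minute figures can be derived from one batch of ping results as sketched below. This is an illustration under stated assumptions rather than the RTTPL code: a lost ping is represented as `None`, the `summarize` function is hypothetical, and jitter is taken to be the mean absolute difference between consecutive received RTTs, which is only one of several definitions in use.

```python
from typing import List, Optional


def summarize(rtts_ms: List[Optional[float]]) -> dict:
    """Summarize one measurement interval of ping results.

    rtts_ms holds one entry per echo request sent: the RTT in milliseconds,
    or None when no reply arrived (assumed representation of a lost packet).
    """
    sent = len(rtts_ms)
    received = [rtt for rtt in rtts_ms if rtt is not None]
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0
    if not received:
        return {"sent": sent, "loss_pct": loss_pct,
                "min": None, "max": None, "avg": None, "jitter": None}
    # Assumed jitter definition: mean absolute difference between
    # consecutive received RTT samples.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = sum(diffs) / len(diffs) if diffs else 0.0
    return {
        "sent": sent,
        "loss_pct": loss_pct,
        "min": min(received),
        "max": max(received),
        "avg": sum(received) / len(received),
        "jitter": jitter,
    }


if __name__ == "__main__":
    # Four pings, one of them lost.
    print(summarize([10.0, 14.0, None, 12.0]))
```

With 20 pings per minute, a single lost packet moves the per-minute loss figure by 5 percentage points, so coarser intervals (hour, day, month) give a smoother picture.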
RTTPL report matrices
RTTPL Nijmegen - Amsterdam
Active measurements: Connected organization availability
- measuring availability by sending ICMP Echo Requests to connected organization router
- measurement includes last mile to connected organization plus connected organization router port (unlike commercial providers)
- Cisco routers with Service Assurance Agent software on both Amsterdam1 and Amsterdam2
- results stored in database and reported monthly
- redundancy in measurements by ORing results from Amsterdam1 and Amsterdam2
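ORing the results means a measurement slot counts as available when at least one of the two probes received a reply, so a problem on the path from a single probe is not charged to the connected organization. A minimal sketch of that combination follows; the `combined_availability` function and the list-of-booleans representation are assumptions for illustration, not the stored database format.

```python
from typing import List


def combined_availability(from_amsterdam1: List[bool],
                          from_amsterdam2: List[bool]) -> float:
    """Availability percentage after ORing two probes' results per slot.

    Each list holds one boolean per measurement slot: True when the
    connected organization's router answered that probe.
    """
    if len(from_amsterdam1) != len(from_amsterdam2):
        raise ValueError("both probes must cover the same measurement slots")
    if not from_amsterdam1:
        raise ValueError("no measurements to combine")
    combined = [a or b for a, b in zip(from_amsterdam1, from_amsterdam2)]
    return 100.0 * sum(combined) / len(combined)


if __name__ == "__main__":
    ams1 = [True, False, True, False]
    ams2 = [True, True, False, False]
    # Only the last slot failed from both probes.
    print(combined_availability(ams1, ams2))
```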
Thank you