MIRnet Administrative Data Analysis System (MADAS)
Greg Cole, Natasha Bulashova
Friends & Partners
NCSA
Description
System converts netflow data into structured data stored in a series of relational database tables
System provides means of browsing summary statistics in graphic and table format
A work in progress since 1998; first version in summer of 1999, second in fall of 2000 (for HPIIS review), third in February 2001
For more info: http://www.friends-partners.org/madasd/
Description
141.142.121.5|193.233.46.21|3130|3130|UDP-Other|55|6349|2|979067306|979067523
193.233.46.21|141.142.121.5|3130|3130|UDP-Other|55|6569|2|979067306|979067523
198.32.1.116|193.233.82.3|53|3271|UDP-DNS|1|482|1|979067480|979067480
195.208.55.40|194.81.150.167|63499|80|TCP-WWW|2|96|1|979067547|979067550
194.226.45.8|193.0.72.16|53|35432|UDP-DNS|2|634|1|979067717|979067721
195.208.55.40|194.81.150.168|63500|80|TCP-WWW|2|96|1|979067547|979067550
195.208.55.40|194.81.158.128|61492|80|TCP-WWW|2|96|1|979067677|979067680
194.226.65.17|128.61.81.129|51270|21|TCP-FTP|6|360|3|979067720|979067781
194.226.65.17|128.61.81.129|51271|21|TCP-FTP|6|360|3|979067720|979067781
195.19.10.238|18.72.1.2|0|2048|ICMP|1|1500|1|979067753|979067753
195.208.55.40|194.81.150.169|63501|80|TCP-WWW|2|96|1|979067547|979067550
193.233.46.21|141.142.121.5|3143|3128|TCP-Other|5|1486|1|979067620|979067620
141.142.121.5|193.233.46.21|3128|3143|TCP-Other|5|1043|1|979067620|979067620
195.208.55.40|194.81.158.129|61493|80|TCP-WWW|2|96|1|979067677|979067680
195.208.55.40|194.81.150.170|63502|80|TCP-WWW|2|96|1|979067547|979067550
212.192.244.68|193.0.0.193|1024|53|UDP-DNS|1|71|1|979067714|979067714
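The sample records above appear to be pipe-delimited with ten fields each. A minimal Python sketch of parsing one record follows (Python is used here for illustration; MADAS itself is written in Perl). The field names are borrowed from the IPheaders table columns shown later in the deck. Mapping them onto the ten positions, and reading the last two fields as Unix epoch seconds, are assumptions rather than anything the slides state.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Flow:
    # Field order is assumed from the sample records; names follow the IPheaders columns.
    ip_source: str
    ip_destination: str
    port_source: int
    port_destination: int
    protocol: str
    packets: int
    octets: int
    flows: int
    timestart: datetime
    timeend: datetime


def parse_flow(line: str) -> Flow:
    """Parse one pipe-delimited flow record into a Flow."""
    parts = line.strip().split("|")
    if len(parts) != 10:
        raise ValueError(f"expected 10 fields, got {len(parts)}")
    src, dst, sport, dport, proto, packets, octets, flows, start, end = parts
    return Flow(
        src, dst, int(sport), int(dport), proto,
        int(packets), int(octets), int(flows),
        # Assumption: the last two fields are Unix epoch seconds.
        datetime.fromtimestamp(int(start), tz=timezone.utc),
        datetime.fromtimestamp(int(end), tz=timezone.utc),
    )


flow = parse_flow("195.208.55.40|194.81.150.167|63499|80|TCP-WWW|2|96|1|979067547|979067550")
print(flow.protocol, flow.octets, flow.timestart.isoformat())
```

Read this way, the timestamps in the sample fall in January 2001, which matches the dates quoted elsewhere in the deck.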
Process
Aggregate netflow data from router
Load into primary database tables
Update summary tables
Update “heap” tables
Wait 10 minutes (and do it again)
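One cycle of the process above can be sketched as a simple loop, shown here in Python for illustration. The stage functions are hypothetical placeholders supplied by the caller, not MADAS routines. Only the order of the stages and the 10-minute wait come from the slide.

```python
import time
from typing import Callable, Iterable

CYCLE_SECONDS = 600  # "Wait 10 minutes" from the slide


def run_cycles(steps: Iterable[Callable[[], None]], cycles: int,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Run every step in order, then wait, for a fixed number of cycles."""
    steps = list(steps)
    completed = 0
    for _ in range(cycles):
        for step in steps:
            step()
        completed += 1
        sleep(CYCLE_SECONDS)
    return completed


# Hypothetical stand-ins for the four stages named on the slide.
log = []
stages = [
    lambda: log.append("aggregate netflow data from router"),
    lambda: log.append("load into primary database tables"),
    lambda: log.append("update summary tables"),
    lambda: log.append("update heap tables"),
]

# The sleep function is injected so the demo does not really wait 10 minutes.
run_cycles(stages, cycles=1, sleep=lambda seconds: None)
print(log)
```

Passing the sleep function in as a parameter keeps the loop testable; a production version would presumably run until stopped rather than for a fixed count.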
Primary IPheaders table
*************************** 1. row ***************************
ip_source: 193.233.46.3
ip_destination: 152.3.233.71
port_source: 40C-45C
port_destination: 25
protocol: TCP-SMTP
packets: 199
octets: 285413
flows: 1
timestart: 2000-08-28 22:50:21
timeend: 1999-09-08 06:18:09
channel: BE
periodbegin: 1999-09-08 06:11:49
periodduration: 600
keyid: 2
domain_source: 42
domain_dest: 28
*************************** 2. row ***************************
ip_source: 195.208.220.5
ip_destination: 128.148.55.233
port_source: 80
port_destination: 1K-2K
protocol: TCP-WWW
packets: 11
octets: 11128
flows: 1
timestart: 2000-08-29 18:39:41
timeend: 1999-09-08 06:20:52
channel: BE
periodbegin: 1999-09-08 06:11:49
periodduration: 600
keyid: 3
domain_source: 9
domain_dest: 125
All network flows must meet a minimum traffic threshold to be included in the live database (for MIRnet, this is set to 10K)
The threshold loses 3% of total traffic volume but removes 95% of records
All data kept in archives
Currently maintains 17,000,000+ network flow records (June 1, 2001)
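The trade-off behind a threshold like this can be measured in a few lines of Python. This is an illustrative sketch only. It assumes the threshold applies to each record's octet count and that "10K" means 10,000 octets, since the slides do not state the unit. The octet counts are borrowed from the sample records in this deck and are far too few to be representative, so the printed percentages are not MIRnet's figures.

```python
THRESHOLD_OCTETS = 10_000  # assumption: "10K" read as 10,000 octets per record


def apply_threshold(octet_counts, threshold=THRESHOLD_OCTETS):
    """Keep records at or above the threshold and report what the cut costs.

    Returns (kept, fraction_of_records_removed, fraction_of_volume_lost).
    """
    if not octet_counts:
        raise ValueError("need at least one record")
    kept = [octets for octets in octet_counts if octets >= threshold]
    records_removed = 1 - len(kept) / len(octet_counts)
    volume_lost = 1 - sum(kept) / sum(octet_counts)
    return kept, records_removed, volume_lost


# Octet counts taken from the sample records shown in this deck.
sample = [96, 482, 634, 71, 360, 1500, 1043, 1486, 285_413, 11_128]
kept, records_removed, volume_lost = apply_threshold(sample)
print(f"records removed: {records_removed:.0%}, volume lost: {volume_lost:.1%}")
```

The pattern the slide describes shows up even in this tiny sample: most records are small, so dropping them removes many rows while costing little volume.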
Primary DNSdata table
+----------------+---------------------------+----------------+----------------+-----------+
| ip_address     | ip_name                   | createtime     | modifytime     | ip_domain |
+----------------+---------------------------+----------------+----------------+-----------+
| 128.178.16.37  | icpmac12.epfl.ch          | 20010110104036 | 00000000000000 |      6203 |
| 156.17.180.31  | budm31.ar.wroc.pl         | 20010110104036 | 00000000000000 |      3232 |
| 62.32.36.134   | ip134-tpas-1.ti.net.ge    | 20010110104032 | 00000000000000 |      6131 |
| 194.82.81.146  | dyn081-146.stanmore.ac.uk | 20010110104029 | 00000000000000 |      9760 |
| 194.83.11.34   | gosh-atm.ex.ac.uk         | 20010110104026 | 00000000000000 |      9488 |
| 194.81.127.202 | 194.81.127.202            | 20010110104025 | 00000000000000 |         2 |
| 194.81.174.83  | 194.81.174.83             | 20010110104025 | 00000000000000 |         2 |
| 195.25.253.130 | 195.25.253.130            | 20010110104024 | 00000000000000 |         2 |
| 194.80.105.9   | paul.cvcp.ac.uk           | 20010110104024 | 00000000000000 |      9456 |
| 194.81.127.113 | 194.81.127.113            | 20010110104023 | 00000000000000 |         2 |
| 131.114.187.5  | endo1.endoc.med.unipi.it  | 20010110104023 | 00000000000000 |      6214 |
| 193.99.163.9   | 193.99.163.9              | 20010110104022 | 00000000000000 |         2 |
| 194.80.104.23  | 194.80.104.23             | 20010110104022 | 00000000000000 |         2 |
| 194.80.104.3   | 194.80.104.3              | 20010110104022 | 00000000000000 |         2 |
| 194.81.33.48   | imb.hope.ac.uk            | 20010110104021 | 00000000000000 |      9526 |
+----------------+---------------------------+----------------+----------------+-----------+
Currently maintains 806,431 DNSdata IP records (January 10, 2001)
Primary Domains table
*************************** 1. row ***************************
domainid: 715
domainname: anl.gov
latitude: 41.858
longitude: -88.017
domainlabel: Argonne Natl Lab
createtime: 20010103224037
modifytime: 20001227191828
origin: US
shortlabel: Argonne Natl Lab
location:
pdomainid: 715
rdomainid: 715
loccity: Chicago
locstate: IL
loccountry: United States
orgclass: US Government,US Govt DOE
worldclass: North America
regionclass: USA Great Lakes
*************************** 2. row ***************************
domainid: 948
domainname: doe.gov
latitude: 38.892
longitude: -77.017
domainlabel: US Department of Energy
createtime: 20001227170946
modifytime: 20001227170946
origin: US
shortlabel: US-DOE
location: Washington, DC
pdomainid: 948
rdomainid: 948
loccity: Washington
locstate: DC
loccountry: United States
orgclass: US Government,US Govt DOE
worldclass: North America
regionclass: USA Atlantic Central
Heart and soul of MADAS system
Adding new “intelligence” to this database enables entirely new classes of analysis
Currently maintains 11,771 domain records (January 10, 2001)
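To illustrate why the Domains table matters, the sketch below joins flow records to domain records and totals octets by the destination's origin country. It uses Python's built-in sqlite3 module as a stand-in for MySQL and keeps only a few columns from the IPheaders and Domains layouts shown above. That domain_dest refers to domainid is an assumption. The rows are invented for the example (including the id given to ras.ru), so the totals are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE domains (domainid INTEGER PRIMARY KEY, domainname TEXT, origin TEXT);
    CREATE TABLE ipheaders (domain_source INTEGER, domain_dest INTEGER, octets INTEGER);
""")

# Invented rows; only the column names come from the slides.
conn.executemany("INSERT INTO domains VALUES (?, ?, ?)", [
    (715, "anl.gov", "US"),
    (948, "doe.gov", "US"),
    (100, "ras.ru", "RU"),
])
conn.executemany("INSERT INTO ipheaders VALUES (?, ?, ?)", [
    (100, 715, 285413),
    (100, 948, 11128),
    (715, 100, 50000),
])


def octets_by_destination_origin(connection):
    """Total octets per destination country, largest first."""
    rows = connection.execute("""
        SELECT d.origin, SUM(f.octets) AS total
        FROM ipheaders AS f
        JOIN domains AS d ON d.domainid = f.domain_dest
        GROUP BY d.origin
        ORDER BY total DESC
    """)
    return rows.fetchall()


print(octets_by_destination_origin(conn))
```

Grouping by another Domains column, such as orgclass or regionclass, gives a different class of report from the same flow data without touching the flow records themselves.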
Other Primary Tables
IP Today (last 24 hours of ipheaders records)
Country Codes
Parent domains
Color mappings

Country Codes:
+------+--------------------------+---------------+
| code | country                  | worldclass    |
+------+--------------------------+---------------+
| ??   | Unknown                  | Unclassified  |
| AC   | Ascension Island         | Other         |
| AD   | Andorra                  | Europe        |
| AE   | United Arab Emirates     | Middle East   |
| AF   | Afghanistan(Islamic St.) | Middle East   |
| AG   | Antigua and Barbuda      | North America |
| AI   | Anguilla                 | Other         |
| AL   | Albania                  | Europe        |
| AM   | Armenia                  | Middle East   |
| AN   | Netherland Antilles      | Other         |
+------+--------------------------+---------------+

Parent domains:
+----------+-------------+
| parentid | parentname  |
+----------+-------------+
|     1308 | ac.jp       |
|        3 | ac.ru       |
|      959 | ac.uk       |
|      986 | edu.tw      |
|        6 | free.net    |
|      735 | nasa.gov    |
|       41 | nlanr.net   |
|     4762 | ircache.net |
|      100 | ras.ru      |
+----------+-------------+

Color mappings:
+-------+---------+
| code  | value   |
+-------+---------+
| ??    | pink    |
| CA    | lblue   |
| CH    | purple  |
| DE    | lbrown  |
| DK    | green   |
| EE    | dgray   |
| FI    | white   |
| FR    | cyan    |
| IL    | gold    |
| IT    | lred    |
| JP    | dpink   |
| NL    | lpurple |
| NO    | gray    |
| Other | lyellow |
| PL    | orange  |
| RU    | blue    |
| SE    | lgray   |
| TW    | yellow  |
| UK    | marine  |
| US    | lgreen  |
+-------+---------+
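One plausible use of the parent-domains table is deciding how many labels of a host name identify an institution: under a parent such as ac.uk, the institution sits one label further left. The slides do not describe the actual rule, so this Python sketch is an assumption, and the two-label default for all other names is a simplification. Host names that are really unresolved IP addresses (as in several DNSdata rows) are returned as None.

```python
import ipaddress

# Parent domains as listed on the slide.
PARENT_DOMAINS = {"ac.jp", "ac.ru", "ac.uk", "edu.tw", "free.net",
                  "nasa.gov", "nlanr.net", "ircache.net", "ras.ru"}


def is_ip_address(text):
    """True when the text is a literal IP address rather than a host name."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def institution_domain(hostname, parents=PARENT_DOMAINS):
    """Return the assumed institution-level domain for a host name."""
    if is_ip_address(hostname):
        return None  # unresolved address: no domain to extract
    labels = hostname.lower().rstrip(".").split(".")
    # Try the longest suffix first; a match means one extra label is needed.
    for size in range(len(labels) - 1, 0, -1):
        if ".".join(labels[-size:]) in parents:
            return ".".join(labels[-(size + 1):])
    return ".".join(labels[-2:])  # simplification: default to two labels


for name in ["paul.cvcp.ac.uk", "icpmac12.epfl.ch", "194.81.127.202"]:
    print(name, "->", institution_domain(name))
```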
Capabilities
With these tables (updated every 10 minutes), we can provide all sorts of live (and historical) traffic analysis between world regions, countries, country regions, cities, institutions, organizations, network protocols by year, month, day, hour, minute, ...
But ...
Need to use Indexed Summary Tables
Database “mirsum”
8 tables updated live every 10 minutes
2 “Heap” (RAM-based) tables used for most live queries
Pre-query “optimizer” selects best tables for current query

Summary tables:
Domain_date_proto
Domain_date_proto_mm
Domain_date
Domain_date_mm
Country_date_proto
Country_date_proto_mm
Country_date
Country_date_mm

Heap tables:
Heap_domain_date_proto
Heap_domain_date_proto_mm
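The slides do not say how the pre-query optimizer chooses a table. A minimal Python sketch of one possible rule: pick the table carrying the fewest dimensions that still covers what the query needs, and prefer a heap table for live queries. The idea that each table name lists its dimensions is an assumption. The meaning of the _mm suffix is not given in the slides, so those tables are left out. Table names are lowercased to match the Perl example in this deck.

```python
# Table name -> dimensions it is assumed to carry (read off the table names).
SUMMARY_TABLES = {
    "country_date": {"country", "date"},
    "country_date_proto": {"country", "date", "proto"},
    "domain_date": {"domain", "date"},
    "domain_date_proto": {"domain", "date", "proto"},
}
HEAP_TABLES = {
    "heap_domain_date_proto": {"domain", "date", "proto"},
}


def choose_table(needed, live=False):
    """Pick a table that covers the needed dimensions.

    Live queries try the RAM-based heap tables first; otherwise the
    covering summary table with the fewest dimensions wins.
    """
    needed = set(needed)
    if live:
        for name, dims in HEAP_TABLES.items():
            if needed <= dims:
                return name
    candidates = [(len(dims), name)
                  for name, dims in SUMMARY_TABLES.items() if needed <= dims]
    if not candidates:
        raise LookupError(f"no summary table covers {sorted(needed)}")
    return min(candidates)[1]


print(choose_table({"country", "date"}))
print(choose_table({"domain", "proto"}))
print(choose_table({"domain", "date"}, live=True))
```

Fewer dimensions generally means fewer rows to scan, which is the reason for preferring the narrowest table that still answers the query.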
A word about technologies
No proprietary software
MySQL for database
PHP for query interface
Web/CGI for stats interface
Perl for code/CGI base
– DBI for interaction with MySQL
– GD::Graph graphics libraries
Perl Code (object-oriented)
Analysis that took 400-500 lines of Perl code in the original MADAS system now looks like:

#### 2 ##########
# chart showing total volume with breakdown by top countries
my $self = MADAS::Country->new(
    database    => "mirsum",
    table       => "domain_date",
    variable    => "origin_dest",
    imagemapcgi => "/cgi-bin/madas/printtable.pl",
    imagemap    => 0,
    percent     => 1,
    graphtype   => "bars",
    title1      => "Total MIRnet Traffic Flow by Destination Country",
    rh_input    => \%in);
$self->set_title2("Period: <b>" . $self->get_timebegin . "</b> - <b>" . $self->get_timeend . "</b>");
$self->doit();
Demonstration
World Regions (by country)
Countries (by domain)
US Regions
Russian Regions
US Government
– DOE
– NASA
– DOD
Advantages
Higher-level analysis of network usage (“not just for engineers”)
System encourages “exploration”
Better understanding of ‘users’ and their applications
Immediate feedback on traffic problems/issues
Future Plans
Evaluate shared use of Domains and DNSdata tables (perhaps via LDAP)
Standard monthly and quarterly reports of traffic utilization
“Monster” query
“Project” level accounting/analysis
more ...
Future Plans (continued)
Create always-running “server” to maintain data, provide “instant stats”, manage web site/interface
Provide statistical analysis routines
Create database to maintain all “global” settings
Port-level analysis (looking for “napster”, etc.)
more ...
Future Plans (continued)
Explore integration/sharing with HPIIS projects (others?)
Develop data maintenance applications for Domains database
Develop ‘world-map’ graphics applications
more ...
Future Plans (continued)
Develop “partnerships” analyses (looking at domain-domain and machine-machine partnerships)
Add additional “organizational” classes (e.g., “US Govt DOE”, “University”)
Add state-level analyses
Clean up/refine Domains database
more ...
Future Plans (continued)
Add “science” classifiers and “project” identifiers to regular traffic flows
Integrate this with database describing high performance network science applications
Integrate back-end reporting with front-end reservation system
Future Plans (continued)
Authentication system for machine-level inquiry/analysis
Device independent display of usage (for text-only, email, WAP devices)
Handle IP address cache expiration problem
Etc.