I recently did a lab demo where I illustrated the dangers of sloppy redistribution policies between different routing protocols (BGP and OSPF). I pumped about 30,000 BGP NLRIs into OSPF, then formed a redistribution loop and got some really awesome route oscillation going. This got me wondering just how many routes I could pump into OSPF, and what would happen when the LSDB overflowed.
I was planning on pumping in a dump of a semi-current Internet routing table (about 400,000 prefixes), converted into static routes, but then I found an easier and "more scalable" solution when I came across a really cool program called exaBGP. In brief, exaBGP is a BGP route announcer. The really cool thing about it is that you can use scripts to announce and withdraw routes dynamically. I coded up a Python script to generate a bogus IPv4 prefix, and used this in conjunction with a shell script and exaBGP to advertise prefixes to a BGP router, which then sloppily redistributes them into OSPF. The Python script doesn't take any care to avoid martians or bogons -- it just spits out a random network with a random CIDR mask.
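A minimal sketch of what such a generator could look like is below. This is not the original script: the function names are made up for illustration, the next-hop value is an assumption borrowed from the exaBGP neighbor address that appears in the Junos config later in this post, and the output line assumes exaBGP's text API, where a helper process prints "announce route ..." lines on stdout.

```python
import ipaddress
import random


def random_prefix(rng: random.Random) -> ipaddress.IPv4Network:
    """Return a random IPv4 network with a random prefix length (1-32).

    No attempt is made to avoid martians or bogons.
    """
    prefix_len = rng.randint(1, 32)
    address = ipaddress.IPv4Address(rng.getrandbits(32))
    # strict=False masks off the host bits so the result is a valid network.
    return ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)


def announce_line(prefix, next_hop="172.20.10.117"):
    """Format one announcement line (next_hop is an assumed value)."""
    return f"announce route {prefix} next-hop {next_hop}"


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(10):
        # flush so the consuming process sees each line immediately
        print(announce_line(random_prefix(rng)), flush=True)
```

Seeding `random.Random` with a fixed value would make a run repeatable, which is handy when comparing how different implementations react to the same set of prefixes.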
I was also planning on just pumping my original test devices, a couple of Olives, full of OSPF routes to see what happened -- but then I got it in my head to see what all kinds of implementations and platforms would do with an LSDB stuffed full of routes. So, I cobbled together a collection of hardware platforms I have and built up some virtual machines to take care of some software implementations to see what happens.
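For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes each redistributed prefix becomes one AS-external (type 5) LSA, which on the wire is a 20-byte LSA header plus a 16-byte body (mask, metric word, forwarding address, route tag). It only counts raw LSA bytes; what each implementation actually allocates per LSA in memory is unknown and certainly larger.

```python
LSA_HEADER_BYTES = 20
# network mask + E-bit/metric word + forwarding address + external route tag
AS_EXTERNAL_BODY_BYTES = 16


def external_lsa_bytes(prefix_count: int) -> int:
    """Raw wire-format size of one type 5 LSA per prefix."""
    return prefix_count * (LSA_HEADER_BYTES + AS_EXTERNAL_BODY_BYTES)


if __name__ == "__main__":
    for count in (30_000, 400_000):
        mib = external_lsa_bytes(count) / 2**20
        print(f"{count:>7} prefixes -> {mib:.1f} MiB of raw LSAs")
```

Even a full table is only a few megabytes of raw LSAs, so any trouble is more likely to come from per-LSA overhead, flooding and SPF/RIB processing than from the raw byte count.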
I managed to dig up and/or construct the following routing platforms:
I figured it would be the most fun to let all of the boxes neighbor on up with each other and subject them all to more or less the same treatment. The diagram below represents the topology:
I tried to "exercise" each implementation a bit, so I created three clusters. Each router in a cluster peers with all of the other routers in a shared broadcast domain, then with two other peers via Non-Broadcast Multi-Access (NBMA) links, and then with two other routers in the other clusters via a point-to-multipoint link. When an implementation didn't support a link type, or I ran into an RFC interpretation issue, a broadcast network was substituted instead. Every implementation wound up supporting broadcast links nicely, while anything besides broadcast links wound up being a problem with most of the open platforms. MTUs were adjusted on a link-by-link basis as needed.
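To put a number on the shared broadcast domain: OSPF elects a DR and BDR on broadcast (and NBMA) segments, so routers become fully adjacent only with those two rather than with every neighbor. The sketch below compares the two cases; the router count of seven is an assumption based on the 1.0.0.1 through 1.0.0.7 addresses in the configs in this post, and the function names are made up.

```python
def full_mesh_adjacencies(n: int) -> int:
    """Adjacencies if every router were fully adjacent with every other."""
    return n * (n - 1) // 2


def dr_bdr_adjacencies(n: int) -> int:
    """Full adjacencies on a segment with an elected DR and BDR."""
    if n < 2:
        return 0
    if n == 2:
        return 1
    # DR pairs with the other n-1 routers; BDR adds the n-2 non-DR routers.
    return (n - 1) + (n - 2)


if __name__ == "__main__":
    routers = 7  # assumed size of one cluster's broadcast segment
    print("full mesh:", full_mesh_adjacencies(routers))
    print("with DR/BDR:", dr_bdr_adjacencies(routers))
```

Point-to-multipoint links do not elect a DR, which is one reason the different link types exercise an implementation differently.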
Topology Details
Description: This is an install of the Vyatta Community Edition 32-bit Virtualization ISO V6.5 running on a KVM with 1 GB RAM, installed on a 2 GB virtual disk.
vyatta@vyatta:~$ show version
Version:      VC6.5R1
Description:  Vyatta Core 6.5 R1
Copyright:    2006-2012 Vyatta, Inc.
Built by:     autobuild@vyatta.com
Built on:     Fri Nov 16 16:39:16 UTC 2012
Build ID:     1211161701-334fb58
System type:  Intel 32bit Virtual
Boot via:     disk
Hypervisor:   KVM
HW model:     Bochs
HW S/N:       Not Specified
HW UUID:      C7CEB5F4-FBF9-475E-3FA3-C9136AF3141B
Uptime:       20:40:21 up 4 min, 1 user, load average: 0.00, 0.03, 0.02
Config:
interfaces {
    ethernet eth1 {
        duplex auto
        hw-id 52:54:00:60:c3:d4
        smp_affinity auto
        speed auto
        vif 1 {
            address 1.0.0.1/24
            ip {
                ospf {
                    cost 1000
                    dead-interval 40
                    hello-interval 10
                    priority 128
                    retransmit-interval 5
                    transmit-delay 1
                }
            }
            mtu 1496
        }
        vif 11 {
            address 11.1.1.11/24
            ip {
                ospf {
                    cost 100
                    dead-interval 40
                    hello-interval 10
                    priority 128
                    retransmit-interval 5
                    transmit-delay 1
                }
            }
        }
        vif 112 {
            address 1.1.2.1/24
            ip {
                ospf {
                    cost 10
                    dead-interval 40
                    hello-interval 10
                    priority 128
                    retransmit-interval 5
                    transmit-delay 1
                }
            }
        }
        vif 117 {
            address 1.1.7.1/24
            ip {
                ospf {
                    cost 10
                    dead-interval 40
                    hello-interval 10
                    priority 128
                    retransmit-interval 5
                    transmit-delay 1
                }
            }
            mtu 1496
        }
    }
    loopback lo {
        address 1.1.1.1/32
    }
}
protocols {
    ospf {
        area 0.0.0.0 {
            network 1.0.0.0/24
            network 11.1.1.0/24
            network 1.1.2.0/24
            network 1.1.7.0/24
        }
        passive-interface lo
    }
}
Impressions: I had played with one in a VM by its lonesome before, but this was my first time actually trying to get a Vyatta box to talk to another box. It feels like a weird mix of Quagga and XORP -- which are in turn IOS-like and Junos-like knockoffs. The config was like Junos, but with a strange meld of Cisco and Junos hierarchy. It has a Junos-like commit and rollback process. The operational commands were very IOS-like. I had a lot of problems getting Vyatta to work right. I routinely had to kick any non-broadcast type interfaces by removing and replacing the config, and I had a lot of problems with it not consistently remembering my commits from one reboot to the next. At first I liked this platform, but after working with it a bit I came to hate it almost as much as XORP. I would never use this on a real network.
Issues:
Where to get it: Go here if you want to experience the pain and frustration yourself -- www.vyatta.org
Description: This is an install of the stock OpenOSPFd daemon (ospfd) that is packaged with OpenBSD 5.2. The 64-bit version of OpenBSD 5.2 was installed on a KVM with 1 GB RAM and a 2 GB virtual disk.
# uname -a
OpenBSD openospfd 5.2 GENERIC#309 amd64
# md5 /usr/sbin/ospfd
MD5 (/usr/sbin/ospfd) = dba2cdcb812566de71c7b43e804e8434
Config: /etc/ospfd.conf
# $OpenBSD: ospfd.conf,v 1.4 2007/06/19 16:49:56 reyk Exp $

# global configuration
router-id 1.1.1.2
metric 10
router-priority 128

# areas
area 0.0.0.0 {
    interface vlan1 {
        metric 1000
    }
    interface vlan22
    interface vlan112 {
        router-priority 250
    }
    interface vlan123
    interface lo1 {
        passive
    }
}
Impressions: Back in the early 2000s I used OpenBSD as my firewall device because I liked its packet filter a lot; it seemed like a really clean, simple, and pretty pure BSD. However, I was a bit turned off by the attitudes of the OpenBSD developers -- "we need to take everything and clean it up because everything else is a messy, poorly written security nightmare." Meanwhile, I had constant stability problems with OpenBSD on two different boxes. I tried FreeBSD in its place when the pf packet filter was ported to it and never looked back. FreeBSD was rock solid on the same hardware that OpenBSD had fits on. Anyway, I had the same impression of the developers of all of the OpenBSD-related projects -- especially OpenBGPd and OpenOSPFd -- so I was a bit skeptical going in. I installed OpenBSD for the first time in about 10 years, and it hasn't changed a bit. In some ways this was really good, as the binary base is really tiny in comparison to most other modern OSes, and I didn't have to spend much time learning anything new. The config is a standard UNIX file which is parsed by OpenOSPFd at startup. It was gated/Junos-like and wasn't very difficult. However, there was a lot of stuff missing from the implementation -- like anything besides broadcast links. I had to change all of the interfaces that the OpenOSPFd implementation participated in to broadcast links. I also wasn't very impressed with the feedback I was given by the operational tools or by the ospfd daemon itself -- especially when it encountered an error. The implementation was also clearly not checking things it was supposed to be checking -- like agreeing on link type. Running this is a very UNIX daemon-like experience: it configs, runs and debugs just like a standard UNIX daemon. What really mystified me, though, was that it was nearly impossible to get a version out of the daemon itself. For the version I had to post an MD5 sum and the OS version!
Overall though, it wasn't too bad an experience, and the support in it really seemed more tailored to running as part of a firewall cluster than to anything else.
Issues:
Where to get it: At www.openbsd.org
Description: This is a KVM with 2 GB RAM running Junos 10.0R4.7.
juniper@Olive2GB> show version brief Hostname: Olive2GB Model: olive JUNOS Base OS boot [10.0R4.7] JUNOS Base OS Software Suite [10.0R4.7] JUNOS Kernel Software Suite [10.0R4.7] JUNOS Crypto Software Suite [10.0R4.7] JUNOS Packet Forwarding Engine Support (M/T Common) [10.0R4.7] JUNOS Packet Forwarding Engine Support (M20/M40) [10.0R4.7] JUNOS Online Documentation [10.0R4.7] JUNOS Voice Services Container package [10.0R4.7] JUNOS Border Gateway Function package [10.0R4.7] JUNOS Services AACL Container package [10.0R4.7] JUNOS Services LL-PDF Container package [10.0R4.7] JUNOS Services Stateful Firewall [10.0R4.7] JUNOS AppId Services [10.0R4.7] JUNOS IDP Services [10.0R4.7] JUNOS Routing Software Suite [10.0R4.7] juniper@Olive2GB>show system boot-messages Copyright (c) 1996-2010, Juniper Networks, Inc. All rights reserved. Copyright (c) 1992-2006 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. JUNOS 10.0R4.7 #0: 2010-08-22 03:07:19 UTC builder@ormonth.juniper.net:/volume/build/junos/10.0/release/10.0R4.7/obj-i386/bsd/sys/compile/JUNIPER MPTable:Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: QEMU Virtual CPU version 1.0.1 (3092.96-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x623 Stepping = 3 Features=0x78bfbfd Features2=0x80802001 , > AMD Features=0x20100800 AMD Features2=0x1 real memory = 2147475456 (2047 MB) avail memory = 2093432832 (1996 MB)
Config: This box has some extra config, as it's the first box I picked to do the evil deed of exporting BGP routes into OSPF.
interfaces {
    fxp0 {
        unit 0 {
            family inet {
                address 172.20.1.66/24;
            }
        }
    }
    fxp1 {
        vlan-tagging;
        unit 1 {
            vlan-id 1;
            family inet {
                address 1.0.0.3/24;
            }
        }
        unit 33 {
            vlan-id 33;
            family inet {
                address 33.3.3.31/24;
            }
        }
        unit 123 {
            vlan-id 123;
            family inet {
                address 1.2.3.3/24;
            }
        }
        unit 134 {
            vlan-id 134;
            family inet {
                address 1.3.4.3/24;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 1.1.1.3/32;
            }
        }
    }
}
routing-options {
    static {
        route 172.20.10.0/24 {
            next-hop 172.20.1.1;
            no-readvertise;
        }
    }
    router-id 1.1.1.3;
    autonomous-system 65066;
}
protocols {
    bgp {
        group exaBGP {
            type external;
            multihop {
                ttl 10;
            }
            local-address 172.20.1.66;
            import FIX-NH;
            peer-as 65069;
            neighbor 172.20.10.117;
        }
    }
    ospf {
        traceoptions {
            file ospf.log;
            flag error;
            flag state;
        }
        export EXPORT-BGP-TO-OSPF;
        area 0.0.0.0 {
            interface lo0.0 {
                passive;
            }
            interface fxp1.1 {
                metric 1000;
            }
            interface fxp1.123 {
                metric 10;
            }
            interface fxp1.33 {
                interface-type p2mp;
                metric 100;
                hello-interval 30;
                dead-interval 120;
                neighbor 33.3.3.32;
            }
            interface fxp1.134 {
                metric 10;
            }
        }
    }
}
policy-options {
    policy-statement EXPORT-BGP-TO-OSPF {
        term STUPID {
            from protocol bgp;
            then accept;
        }
    }
    policy-statement FIX-NH {
        then {
            next-hop 172.20.1.1;
        }
    }
}
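The STUPID term above accepts every BGP route without looking at it. For contrast, here is a rough Python sketch of the kind of check a less sloppy policy would apply before redistributing. It is illustrative only and not part of the lab: the bogon list is a small, deliberately incomplete sample, and the prefix-length bounds and function name are assumptions.

```python
import ipaddress

# Abbreviated, illustrative list of ranges that should never be redistributed.
BOGONS = [
    ipaddress.ip_network(net)
    for net in (
        "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
        "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/4", "240.0.0.0/4",
    )
]


def acceptable(prefix: str, min_len: int = 8, max_len: int = 24) -> bool:
    """Return True if the prefix passes the (assumed) sanity checks."""
    net = ipaddress.ip_network(prefix)
    if not min_len <= net.prefixlen <= max_len:
        return False
    # Reject anything that overlaps a bogon range in either direction.
    return not any(net.overlaps(bogon) for bogon in BOGONS)


if __name__ == "__main__":
    for candidate in ("8.8.8.0/24", "10.1.0.0/16", "1.2.3.4/32", "224.1.0.0/16"):
        print(candidate, acceptable(candidate))
```

The lab deliberately does none of this, since the goal is to stuff the LSDB with whatever the random generator produces.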
Impressions: I love Junos! This is definitely the cleanest, easiest, and safest config method of the bunch. There are a few imitators such as XORP and Vyatta, but they don't come close to the refinement of Junos.
Issues:
Where to get it: At www.juniper.net
Description: This is an install of the BIRD Internet Routing Daemon on a KVM running Ubuntu 12.04 x86_64 with 1 GB of RAM. BIRD was installed from the Ubuntu repositories with the command apt-get install bird.
router@bird:~$ sudo birdc
BIRD 1.3.4 ready.
bird>
Interface Config: /etc/network/interfaces
All of the IP configuration was left up to the OS.
# The loopback network interface
auto lo
iface lo inet loopback

auto lo:0
iface lo:0 inet static
    address 1.1.1.3
    netmask 255.255.255.255

# The primary network interface
auto eth0
iface eth0 inet manual

auto eth0.0001
iface eth0.0001 inet static
    address 1.0.0.4
    netmask 255.255.255.0
    mtu 1496
    vlan_raw_device eth0

auto eth0.0044
iface eth0.0044 inet static
    address 44.4.4.41
    netmask 255.255.255.0
    vlan_raw_device eth0

auto eth0.0134
iface eth0.0134 inet static
    address 1.3.4.4
    netmask 255.255.255.0
    mtu 1496
    vlan_raw_device eth0

auto eth0.0145
iface eth0.0145 inet static
    address 1.4.5.4
    netmask 255.255.255.0
    vlan_raw_device eth0
BIRD Config: /etc/bird.conf
# Override router ID
router id 1.1.1.4;

protocol kernel {
#   learn;          # Learn all alien routes from the kernel
    persist;        # Don't remove routes on bird shutdown
    scan time 20;   # Scan kernel routing table every 20 seconds
    export all;     # Default is export none
}

protocol device {
    scan time 10;   # Scan interfaces every 10 seconds
}

protocol ospf OSPFol {
    tick 2;
    rfc1583compat no;
    area 0.0.0.0 {
        stub no;
        interface "eth0.0001" {
            cost 1000;
            priority 128;
        };
        interface "eth0.0044" {
            cost 100;
            priority 128;
        };
        interface "eth0.0134" {
            cost 10;
        };
        interface "eth0.0145" {
            cost 10;
            type ptp;
        };
        interface "lo:0" {
        };
    };
}
Impressions: This was actually my first time using BIRD. Like OpenOSPFd, it is very UNIX daemon-like in its configuration and operation. However, it has a pretty nice interactive CLI for querying the status of the daemon that seemed to work pretty well and be very intuitive. Overall BIRD was very easy to configure and debug, and it supported everything I was trying to do (simplistic OSPF) very easily. They have a good sense of humor too: in the style of GNU, BIRD stands for the BIRD Internet Routing Daemon.
Issues:
Where to get it: At BIRD Internet Routing Daemon
Description: This is an install of the stock Quagga set of daemons that are packaged with Ubuntu 12.04, installed with apt-get install quagga. This is running on the same base install as the VM running BIRD -- also on a KVM with 1 GB RAM.
quagga-router# sh ver
Quagga 0.99.20.1 (quagga-router).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
quagga-router#
Config: /etc/quagga/Quagga.conf
interface eth0
 ipv6 nd suppress-ra
!
interface eth0.0001
 ip address 1.0.0.5/24
 ip ospf cost 1000
 ip ospf priority 128
 ipv6 nd suppress-ra
!
interface eth0.0055
 ip ospf cost 100
 ip ospf priority 128
 ipv6 nd suppress-ra
!
interface eth0.0145
 ip address 1.4.5.5/24
 ip ospf cost 10
 ip ospf network point-to-point
 ip ospf priority 128
 ipv6 nd suppress-ra
!
interface eth0.0156
 ip address 1.5.6.5/24
 ip ospf cost 10
 ip ospf priority 128
 ipv6 nd suppress-ra
!
interface lo
 description "Loopback Interface"
!
interface lo:0
 ip address 1.1.1.5/32
 ipv6 nd suppress-ra
!
router ospf
 ospf router-id 1.1.1.5
 passive-interface lo:0
 network 1.0.0.0/24 area 0.0.0.0
 network 1.1.1.5/32 area 0.0.0.0
 network 1.4.5.0/24 area 0.0.0.0
 network 1.5.6.0/24 area 0.0.0.0
 network 55.5.5.0/24 area 0.0.0.0
!
Impressions: I have a soft spot in my heart for Quagga, as it's what I used to learn Cisco IOS and BGP (before I could afford a pair of used 3620 routers off eBay). I've run it on a ton of servers and VMs, including NetBSD, FreeBSD and all sorts of Linux distros. The config is very Cisco IOS-like, with a bit of sanity thrown in as far as network masks. However, I've also had a lot of problems with it. Quagga is a set of daemons -- a control daemon, and then one for each protocol you run -- and each has its own config and its own interface, which is very fractured. It's possible to set Quagga up to use an integrated config, which is what I always do, but it always turns out to be a battle to get it working right and it's almost never smooth. I've also had some problems in my years of running it -- from OSPF stability issues, to memory leaks, to crashes, to weird multicasting problems. On later versions of FreeBSD I seemed to always have issues getting it to use any multicasting -- so things like OSPF never worked right. Quagga also always seems to like to get in a battle with the underlying OS in terms of interface configurations. Even with my complaints, I still run Quagga on many a Linux box.
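The "sanity as far as network masks" point is easy to see by comparing the network statements in the Quagga config above with the ones in the Cisco 3640 config later in this post: IOS wants an inverted wildcard mask, Quagga takes a plain prefix length. A small sketch of the conversion, using Python's ipaddress module (the helper names are made up for illustration):

```python
import ipaddress


def ios_network_statement(prefix: str, area: str = "0") -> str:
    """Render an IOS-style OSPF network statement with a wildcard mask."""
    net = ipaddress.ip_network(prefix)
    # hostmask is the inverse of the netmask, e.g. 0.0.0.255 for a /24
    return f"network {net.network_address} {net.hostmask} area {area}"


def quagga_network_statement(prefix: str, area: str = "0.0.0.0") -> str:
    """Render a Quagga-style OSPF network statement with a prefix length."""
    net = ipaddress.ip_network(prefix)
    return f"network {net} area {area}"


if __name__ == "__main__":
    print(ios_network_statement("2.0.0.0/24"))
    print(quagga_network_statement("1.0.0.0/24"))
```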
Issues:
Where to get it: At http://www.nongnu.org/quagga/
Description: This is an install of the stock XORP suite available from the Ubuntu 12.04 repositories, installed with apt-get install xorp. This is the same base KVM and OS install that was used for BIRD and Quagga: 64-bit Ubuntu 12.04 on a 2 GB VM disk with 1 GB RAM.
router@xorp:~$ xorpsh
Welcome to XORP on xorp
router@xorp> show version
Version 1.8.3
router@xorp>
Config: /etc/xorp/config.boot
/* XORP configuration file * * Configuration format: 1.1 * XORP version: 1.8.3 * Date: 2013/01/05 23:09:19.188435 * Host: xorp * User: router */ protocols { ospf4 { router-id: 1.1.1.6 rfc1583-compatibility: false ip-router-alert: false area 0.0.0.0 { area-type: "normal" interface lo { link-type: "broadcast" vif lo { address 1.1.1.6 { priority: 128 hello-interval: 10 router-dead-interval: 40 interface-cost: 1 retransmit-interval: 5 transit-delay: 1 passive { disable: false host: false } disable: false } } } interface eth0 { link-type: "broadcast" vif eth0 { address 1.0.0.6 { priority: 128 hello-interval: 10 router-dead-interval: 40 interface-cost: 1000 retransmit-interval: 5 transit-delay: 1 disable: false } } } interface eth1 { link-type: "broadcast" vif eth1 { address 66.6.6.61 { priority: 128 hello-interval: 10 router-dead-interval: 40 interface-cost: 100 retransmit-interval: 5 transit-delay: 1 disable: false } } } interface eth2 { vif eth2 { address 1.5.6.6 { priority: 128 hello-interval: 10 router-dead-interval: 40 interface-cost: 10 retransmit-interval: 5 transit-delay: 1 disable: false } } } interface eth3 { link-type: "broadcast" vif eth3 { address 1.6.7.6 { priority: 128 hello-interval: 10 router-dead-interval: 40 interface-cost: 10 retransmit-interval: 5 transit-delay: 1 disable: false } } } } } } fea { unicast-forwarding4 { disable: false forwarding-entries { retain-on-startup: false retain-on-shutdown: false } } } interfaces { restore-original-config-on-shutdown: false interface eth0 { description: "" disable: false discard: false unreachable: false management: false parent-ifname: "" iface-type: "" vid: "" default-system-config { } } interface eth1 { description: "" disable: false discard: false unreachable: false management: false parent-ifname: "" iface-type: "" vid: "" default-system-config { } } interface eth2 { description: "" disable: false discard: false unreachable: false management: false parent-ifname: "" iface-type: "" vid: "" 
default-system-config { } } interface eth3 { description: "" disable: false discard: false unreachable: false management: false parent-ifname: "" iface-type: "" vid: "" default-system-config { } } interface lo { description: "" disable: false discard: false unreachable: false management: false parent-ifname: "" iface-type: "" vid: "" vif lo { disable: false address 1.1.1.6 { prefix-length: 32 disable: false } } } }
Impressions: I actually used XORP on a couple of FreeBSD boxes some time back as an alternative to Quagga. Quagga was having some serious issues announcing OSPF hellos to 224.0.0.5 on a couple of FreeBSD boxes I had -- so I searched for something else. Somehow I wound up compiling XORP from source by hand, as none of the packaged versions seemed to work right. After some fiddling it filled the gap that Quagga had left in my FreeBSD world. Everything in XORP is done through a pretty poor knockoff of the Junos CLI. XORP and the OS had some real battles when it came to who was the one to set up VLAN interfaces on Linux. In the end I was so frustrated that on the XORP box I set up three additional interfaces on the VM and let OpenvSwitch add the VLAN tags to the traffic coming off the interfaces. XORP also couldn't form any point-to-point adjacencies with any other router, so I had to change all of these to broadcast links. XORP also has the same annoying mannerisms as Vyatta when it comes to saving the config. You have to commit it, then save it to a file by hand, and then copy it to be the boot config. There are probably easier ways to do this, but I found it a real annoyance. I kept forgetting to do the manual save-file routine after I got things working. Junos has spoiled me. In the end, I actually hated XORP more than Vyatta. In retrospect, when I was tinkering with this back on my FreeBSD boxen, I have no idea why I didn't try BIRD instead.
Issues:
Where to get it: At XORP if you dare.
Description: This is a clone of the 2GB Olive, except with half the RAM: an Olive running Junos 10.0R4.7.
juniper@Olive1GB> show version Hostname: Olive1GB Model: olive JUNOS Base OS boot [10.0R4.7] JUNOS Base OS Software Suite [10.0R4.7] JUNOS Kernel Software Suite [10.0R4.7] JUNOS Crypto Software Suite [10.0R4.7] JUNOS Packet Forwarding Engine Support (M/T Common) [10.0R4.7] JUNOS Packet Forwarding Engine Support (M20/M40) [10.0R4.7] JUNOS Online Documentation [10.0R4.7] JUNOS Voice Services Container package [10.0R4.7] JUNOS Border Gateway Function package [10.0R4.7] JUNOS Services AACL Container package [10.0R4.7] JUNOS Services LL-PDF Container package [10.0R4.7] JUNOS Services Stateful Firewall [10.0R4.7] JUNOS AppId Services [10.0R4.7] JUNOS IDP Services [10.0R4.7] JUNOS Routing Software Suite [10.0R4.7] juniper@Olive1GB> show system boot-messages Copyright (c) 1996-2010, Juniper Networks, Inc. All rights reserved. Copyright (c) 1992-2006 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. JUNOS 10.0R4.7 #0: 2010-08-22 03:07:19 UTC builder@ormonth.juniper.net:/volume/build/junos/10.0/release/10.0R4.7/obj-i386/bsd/sys/compile/JUNIPER MPTable:Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: QEMU Virtual CPU version 1.0.1 (3092.96-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x623 Stepping = 3 Features=0x78bfbfd Features2=0x80802001 , > AMD Features=0x20100800 AMD Features2=0x1 real memory = 1073733632 (1023 MB) avail memory = 1038938112 (990 MB)
Config:
interfaces {
    fxp1 {
        vlan-tagging;
        unit 1 {
            vlan-id 1;
            family inet {
                address 1.0.0.7/24;
            }
        }
        unit 77 {
            vlan-id 77;
            family inet {
                address 77.7.7.71/24;
            }
        }
        unit 117 {
            vlan-id 117;
            family inet {
                address 1.1.7.7/24;
            }
        }
        unit 167 {
            vlan-id 167;
            family inet {
                address 1.6.7.7/24;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 1.1.1.7/32;
            }
        }
    }
}
routing-options {
    router-id 1.1.1.7;
}
protocols {
    ospf {
        traceoptions {
            file ospf.log;
            flag error;
        }
        area 0.0.0.0 {
            interface lo0.0 {
                passive;
            }
            interface fxp1.1 {
                metric 1000;
            }
            interface fxp1.77 {
                interface-type nbma;
                metric 100;
                neighbor 77.7.7.72;
                neighbor 77.7.7.73;
            }
            interface fxp1.117 {
                metric 10;
            }
            interface fxp1.167 {
                metric 10;
            }
        }
    }
}
Impressions: I love Junos -- especially after dealing with XORP and Vyatta!
Issues:
Where to get it: At Juniper Networks
Description: This is a Juniper Networks EX3200-24T switch. It's running Junos 12.2R2.4 and has 512 MB of RAM.
juniper@EX3200-2_OSPF> show version Hostname: EX3200-2_OSPF Model: ex3200-24t JUNOS Base OS boot [12.2R2.4] JUNOS Base OS Software Suite [12.2R2.4] JUNOS Kernel Software Suite [12.2R2.4] JUNOS Crypto Software Suite [12.2R2.4] JUNOS Online Documentation [12.2R2.4] JUNOS Enterprise Software Suite [12.2R2.4] JUNOS Packet Forwarding Engine Enterprise Software Suite [12.2R2.4] JUNOS Routing Software Suite [12.2R2.4] JUNOS Web Management [12.2R2.4] JUNOS FIPS mode utilities [12.2R2.4] juniper@EX3200-2_OSPF> show chassis hardware Hardware inventory: Item Version Part number Serial number Description Chassis XX0000000000 EX3200-24T Routing Engine 0 REV 11 750-021261 XX0000000000 EX3200-24T, 8 POE FPC 0 REV 11 750-021261 XX0000000000 EX3200-24T, 8 POE CPU BUILTIN BUILTIN FPC CPU PIC 0 BUILTIN BUILTIN 24x 10/100/1000 Base-T Power Supply 0 REV 03 740-020957 XX0000000000 PS 320W AC Fan Tray Fan Tray
Config: The EX3200 does not allow for specification of neighbors on a point-to-multipoint interface.
interfaces {
    ge-0/0/1 {
        vlan-tagging;
        unit 2 {
            vlan-id 2;
            family inet {
                address 2.0.0.1/24;
            }
        }
        unit 11 {
            vlan-id 11;
            family inet {
                address 11.1.1.21/24;
            }
        }
        unit 212 {
            vlan-id 212;
            family inet {
                address 2.1.2.1/24;
            }
        }
        unit 217 {
            vlan-id 217;
            family inet {
                address 2.1.7.1/24;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 2.2.2.1/32;
            }
        }
    }
}
routing-options {
    router-id 2.2.2.1;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface ge-0/0/1.2 {
                metric 1000;
            }
            interface ge-0/0/1.212 {
                interface-type p2p;
                metric 10;
            }
            interface ge-0/0/1.217 {
                interface-type p2p;
                metric 10;
            }
            interface ge-0/0/1.11 {
                metric 100;
            }
            interface lo0.0 {
                passive;
            }
        }
    }
}
Impressions: This switch is set up as a full-blown OSPF router; there is no need to do everything off of routed VLAN interfaces (RVI, SVI, bridge interface, etc.). It is a very fast switch, but the fans make it very noisy.
Issues:
Where to get it: At Juniper Networks EX3200
Description: This is a Juniper Networks SRX100H with 1 GB RAM running Junos 12.1R4.7, configured to run in packet mode rather than flow mode.
juniper@SRX100-6_OSPF> show version Hostname: SRX100-6_OSPF Model: srx100h JUNOS Software Release [12.1R4.7] juniper@SRX100-6_OSPF> show chassis hardware Hardware inventory: Item Version Part number Serial number Description Chassis XX0000000000 SRX100H Routing Engine REV 19 750-021773 XX0000000000 RE-SRX100H FPC 0 FPC PIC 0 8x FE Base PIC Power Supply 0 juniper@SRX100-6_OSPF> show system boot-messages | match memory real memory = 1073741824 (1024MB) avail memory = 526499840 (502MB)
Config: The command set security forwarding-options family mpls mode packet-based makes the SRX forward packets individually, rather than based on stateful flows, after a reboot. This can be seen with the operational command below:
juniper@SRX100-6_OSPF> show security flow status
  Flow forwarding mode:
    Inet forwarding mode: packet based
    Inet6 forwarding mode: drop
    MPLS forwarding mode: packet based
    ISO forwarding mode: drop
  Flow trace status
    Flow tracing status: off

And the config:
interfaces {
    fe-0/0/0 {
        vlan-tagging;
        unit 2 {
            vlan-id 2;
            family inet {
                address 2.0.0.2/24;
            }
        }
        unit 22 {
            vlan-id 22;
            family inet {
                address 22.2.2.22/24;
            }
        }
        unit 212 {
            vlan-id 212;
            family inet {
                address 2.1.2.2/24;
            }
        }
        unit 223 {
            vlan-id 223;
            family inet {
                address 2.2.3.2/24;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 2.2.2.2/32;
            }
        }
    }
}
routing-options {
    router-id 2.2.2.2;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0 {
                passive;
            }
            interface fe-0/0/0.2 {
                metric 1000;
            }
            interface fe-0/0/0.212 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/0.223 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/0.22 {
                metric 100;
            }
        }
    }
}
security {
    forwarding-options {
        family {
            mpls {
                mode packet-based;
            }
        }
    }
}
Impressions: The SRX100 is an awesome little platform. Quick, cheap and it does everything imaginable. These are great platforms for learning and testing! I bought a pair of these when they first came out to play with the High Availability features. Yes, these little boxes can form a nice little HA pair.
Issues:
Where to get it: At Juniper Networks SRX100
Description: This is a Cisco 3640 with 128 MB of RAM running IOS 12.4(25b), Telco train.
C3640-1>sh ver Cisco IOS Software, 3600 Software (C3640-TELCO-M), Version 12.4(25b), RELEASE SOFTWARE (fc1) Technical Support: http://www.cisco.com/techsupport Copyright (c) 1986-2009 by Cisco Systems, Inc. Compiled Wed 12-Aug-09 12:52 by prod_rel_team ROM: System Bootstrap, Version 11.1(20)AA2, EARLY DEPLOYMENT RELEASE SOFTWARE (fc1) C3640-1 uptime is 1 hour, 10 minutes System returned to ROM by power-on System restarted at 17:32:39 UTC Fri Jan 4 2013 System image file is "flash:c3640-telco-mz.124-25b.bin" Cisco 3640 (R4700) processor (revision 0x00) with 92160K/38912K bytes of memory. Processor board ID 21961615 R4700 CPU at 100MHz, Implementation 33, Rev 1.0 2 FastEthernet interfaces 2 Token Ring interfaces 3 Serial interfaces DRAM configuration is 64 bits wide with parity disabled. 125K bytes of NVRAM. 32768K bytes of processor board System flash (Read/Write) Configuration register is 0x2102
Config:
interface Loopback0
 ip address 2.2.2.3 255.255.255.255
!
interface FastEthernet0/0
 no ip address
 duplex auto
 speed auto
!
interface FastEthernet0/0.2
 encapsulation dot1Q 2
 ip address 2.0.0.3 255.255.255.0
 ip ospf cost 10
 ip ospf priority 128
!
interface FastEthernet0/0.33
 encapsulation dot1Q 33
 ip address 33.3.3.32 255.255.255.0
 ip mtu 1496
 ip ospf network point-to-multipoint
 ip ospf cost 100
 ip ospf priority 255
 ip ospf mtu-ignore
!
interface FastEthernet0/0.223
 encapsulation dot1Q 223
 ip address 2.2.3.3 255.255.255.0
 ip ospf network point-to-point
 ip ospf cost 10
 ip ospf priority 128
!
interface FastEthernet0/0.234
 encapsulation dot1Q 234
 ip address 2.3.4.3 255.255.255.0
 ip ospf network point-to-point
 ip ospf priority 128
!
router ospf 1
 router-id 2.2.2.3
 log-adjacency-changes
 passive-interface Loopback0
 network 2.0.0.0 0.0.0.255 area 0
 network 2.2.2.3 0.0.0.0 area 0
 network 2.2.3.0 0.0.0.255 area 0
 network 2.3.4.0 0.0.0.255 area 0
 network 33.3.3.0 0.0.0.255 area 0
 neighbor 33.3.3.33 cost 100
 neighbor 33.3.3.31 cost 100
!
Impressions: I collected five of these in total from eBay over the years, and these are what I would consider to be my first "real" routers. I had a couple of Cisco 3620s as well, but those were too limited to do anything fun with. These are pretty versatile little boxes, if a tad slow, and they support a ton of different interfaces. I used these to learn IS-IS and BGP, voice, ATM, as well as MPLS on Cisco. Until I came across a couple of 2811s for cheap, one of these functioned as my terminal server with an ASYNC-32A card.
Issues:
Where to get it: Since these have been EOS and EOL for a long time now, your only option is eBay.
Description: This is a Juniper Networks SRX210HE running Junos 12.1R4.7 with 1 GB of RAM that supports Power over Ethernet (PoE).
juniper@SRX210HE_OSPF> show version Hostname: SRX210HE_OSPF Model: srx210he-poe JUNOS Software Release [12.1R4.7] juniper@SRX210HE_OSPF> show chassis hardware Hardware inventory: Item Version Part number Serial number Description Chassis XX0000000000 SRX210he-poe Routing Engine REV 05 750-034596 XX000000 RE-SRX210HE-POE FPC 0 FPC PIC 0 2x GE, 6x FE, 1x 3G Power Supply 0 juniper@SRX210HE_OSPF> show system boot-messages | match memory real memory = 1073741824 (1024MB) avail memory = 527036416 (502MB)
Config:
interfaces {
    ge-0/0/0 {
        vlan-tagging;
        unit 2 {
            vlan-id 2;
            family inet {
                address 2.0.0.4/24;
            }
        }
        unit 44 {
            vlan-id 44;
            family inet {
                address 44.4.4.42/24;
            }
        }
        unit 234 {
            vlan-id 234;
            family inet {
                address 2.3.4.4/24;
            }
        }
        unit 245 {
            vlan-id 245;
            family inet {
                address 2.4.5.4/24;
            }
        }
    }
}
routing-options {
    router-id 2.2.2.4;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface ge-0/0/0.2 {
                metric 1000;
            }
            interface ge-0/0/0.234 {
                interface-type p2p;
                metric 10;
            }
            interface ge-0/0/0.245 {
                interface-type p2p;
                metric 10;
            }
            interface ge-0/0/0.44 {
                metric 100;
            }
        }
    }
}
security {
    zones {
        security-zone OSPF {
            host-inbound-traffic {
                system-services {
                    ping;
                }
                protocols {
                    ospf;
                }
            }
            interfaces {
                ge-0/0/0.2;
                lo0.0;
                ge-0/0/0.234;
                ge-0/0/0.245;
                ge-0/0/0.44;
            }
        }
    }
}

This one is functioning in flow (stateful) mode:

juniper@SRX210HE_OSPF> show security flow status
  Flow forwarding mode:
    Inet forwarding mode: flow based
    Inet6 forwarding mode: drop
    MPLS forwarding mode: drop
    ISO forwarding mode: drop
  Flow trace status
    Flow tracing status: off
Impressions: This is an updated version of the SRX210H: the same thing, but with a slightly faster processor, which is ironically the same one found in the SRX100 series. I bought a pair of SRX210Hs right when they came out several years ago, and one has been functioning as my main firewall/router at home ever since. Although there have been some growing pains, as with any new product, this has matured into a really nice and very capable little box and I am very happy with it. I used my original SRX210s to learn flow mode Junos.
Issues:
Where to get it: At Juniper Networks
Description: This is a Netscreen NS-208 running ScreenOS 5.4.0r18.0 with 128 MB of RAM.
ns208-> get system Product Name: NetScreen-208 Serial Number: 0000000000000000, Control Number: 00000000 Hardware Version: 0110(0)-(12), FPGA checksum: 00000000, VLAN1 IP (0.0.0.0) Software Version: 5.4.0r18.0, Type: Firewall+VPN Compiled by build_master at: Tue Aug 17 08:51:27 PDT 2010 Base Mac: 0012.1ea3.8af0 File Name: screenos_image, Checksum: 6c2f30ed
Config:
set vrouter name "OSPF" id 1025
set vrouter "OSPF"
unset auto-route-export
set protocol ospf
set enable
exit
set zone id 1002 "OSPF"
set zone "OSPF" vrouter "OSPF"
set zone "OSPF" tcp-rst
set interface "ethernet2.2" tag 2 zone "OSPF"
set interface "ethernet2.4" tag 245 zone "OSPF"
set interface "ethernet2.5" tag 256 zone "OSPF"
set interface "ethernet2.7" tag 55 zone "OSPF"
set interface "loopback.2" zone "OSPF"
set interface ethernet2.2 ip 2.0.0.5/24
set interface ethernet2.2 route
set interface ethernet2.4 ip 2.4.5.5/24
set interface ethernet2.4 route
set interface ethernet2.5 ip 2.5.6.5/24
set interface ethernet2.5 route
set interface ethernet2.7 ip 55.5.5.52/24
set interface ethernet2.7 route
set interface loopback.2 ip 2.2.2.5/32
set interface loopback.2 route
set interface ethernet2.2 ip manageable
set interface ethernet2.4 ip manageable
set interface ethernet2.5 ip manageable
set interface ethernet2.7 ip manageable
set interface loopback.2 ip manageable
set interface ethernet2.2 manage ping
set interface ethernet2.4 manage ping
set interface ethernet2.5 manage ping
set interface ethernet2.7 manage ping
set interface ethernet2.7 manage ssh
set interface ethernet2.7 manage telnet
set interface ethernet2.7 manage snmp
set interface ethernet2.7 manage ssl
set interface ethernet2.7 manage web
set interface ethernet2.7 manage mtrace
set interface loopback.2 manage ping
set vrouter "OSPF"
set router-id 2.2.2.5
exit
set interface ethernet2.2 protocol ospf area 0.0.0.0
set interface ethernet2.2 protocol ospf enable
set interface ethernet2.2 protocol ospf priority 128
set interface ethernet2.2 protocol ospf cost 1000
set interface ethernet2.4 protocol ospf area 0.0.0.0
set interface ethernet2.4 protocol ospf link-type p2p
set interface ethernet2.4 protocol ospf enable
set interface ethernet2.4 protocol ospf priority 128
set interface ethernet2.4 protocol ospf cost 10
set interface ethernet2.5 protocol ospf area 0.0.0.0
set interface ethernet2.5 protocol ospf link-type p2p
set interface ethernet2.5 protocol ospf enable
set interface ethernet2.5 protocol ospf priority 128
set interface ethernet2.5 protocol ospf cost 10
set interface loopback.2 protocol ospf area 0.0.0.0
set interface loopback.2 protocol ospf passive
set interface ethernet2.7 protocol ospf area 0.0.0.0
set interface ethernet2.7 protocol ospf enable
set interface ethernet2.7 protocol ospf priority 128
set interface ethernet2.7 protocol ospf cost 100
set vrouter "OSPF"
exit
Impressions: I don't have to touch Netscreens too often, and for that I'm glad. I'm not a big fan of ScreenOS, but they are very reliable little boxes and a lot of people still use them. Additionally, flow mode Junos inherited a lot of the concepts and security modes of operation from the Netscreens. I find ScreenOS a really clunky take on Cisco's cli, except you use set, get and unset instead of show and no. I don't like working with it. It's not really intuitive, and for some reason it has the backspace key mapped to ^H, so you have to remember to set up your terminal ahead of time (or change your profile) before connecting to one of these. To really do anything with these, I think you need to use NSM.
Issues:
Where to get it: These have been EOS and EOL for quite a while, so your only hope is eBay.
Description: This is a Cisco 2811 running the Advanced Enterprise train of IOS 15.1 with 256 MB of RAM.
C2811-1>sh ver
Cisco IOS Software, 2800 Software (C2800NM-ADVENTERPRISEK9-M), Version 15.1(1)XB, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2009 by Cisco Systems, Inc.
Compiled Mon 21-Dec-09 01:14 by prod_rel_team
ROM: System Bootstrap, Version 12.4(1r) [hqluong 1r], RELEASE SOFTWARE (fc1)
C2811-1 uptime is 6 minutes
System returned to ROM by power-on
System image file is "flash:c2800nm-adventerprisek9-mz.151-1.XB.bin"
Cisco 2811 (revision 53.51) with 247808K/14336K bytes of memory.
Processor board ID FTX1045A5UA
1 DSL controller
18 FastEthernet interfaces
2 Serial(sync/async) interfaces
1 ATM interface
1 Virtual Private Network (VPN) Module
DRAM configuration is 64 bits wide with parity enabled.
239K bytes of non-volatile configuration memory.
125440K bytes of ATA CompactFlash (Read/Write)
Config:
!
interface Loopback0
 ip address 2.2.2.6 255.255.255.255
!
interface FastEthernet0/0
 no ip address
 duplex auto
 speed auto
!
interface FastEthernet0/0.2
 encapsulation dot1Q 2
 ip address 2.0.0.6 255.255.255.0
 ip ospf cost 1000
 ip ospf priority 128
 ip ospf 1 area 0.0.0.0
!
interface FastEthernet0/0.66
 encapsulation dot1Q 66
 ip address 66.6.6.62 255.255.255.0
 ip ospf cost 100
 ip ospf priority 128
 ip ospf 1 area 0.0.0.0
!
interface FastEthernet0/0.256
 encapsulation dot1Q 256
 ip address 2.5.6.6 255.255.255.0
 ip ospf network point-to-point
 ip ospf cost 10
 ip ospf priority 128
 ip ospf 1 area 0.0.0.0
!
interface FastEthernet0/0.267
 encapsulation dot1Q 267
 ip address 2.6.7.6 255.255.255.0
 ip ospf network point-to-point
 ip ospf cost 10
 ip ospf priority 128
 ip ospf 1 area 0.0.0.0
!
router ospf 1
 router-id 2.2.2.6
 log-adjacency-changes
 passive-interface Loopback0
 network 2.0.0.6 0.0.0.0 area 0.0.0.0
 network 2.2.2.6 0.0.0.0 area 0.0.0.0
!
Impressions: These were very nice, capable little all-around boxes. I'm currently using one with an NM-32A module as the terminal server for my lab.
Issues:
Where to get it: These are EOS now, so you need to search eBay if you want one.
Description: This is a RouterBoard 750G running MikroTik RouterOS 5.22 with 32 MB of RAM.
[admin@RB750G] > system routerboard print
  routerboard: yes
  model: 750G
  serial-number: 000000000000
  current-firmware: 2.41
  upgrade-firmware: 2.41
[admin@RB750G] > system resource print
  uptime: 3d52m55s
  version: 5.22
  free-memory: 17956KiB
  total-memory: 29696KiB
  cpu: MIPS 24Kc V7.4
  cpu-count: 1
  cpu-frequency: 680MHz
  cpu-load: 1%
  free-hdd-space: 33308KiB
  total-hdd-space: 61440KiB
  write-sect-since-reboot: 210579
  write-sect-total: 296987
  bad-blocks: 0%
  architecture-name: mipsbe
  board-name: RB750G
  platform: MikroTik
[admin@RB750G] >
Config: This is obtained with the /export command at the RouterOS cli. As with the other routers, parts I considered irrelevant to this exercise are not shown.
/interface bridge
add admin-mac=00:00:00:00:00:00 ageing-time=5m arp=enabled auto-mac=yes \
    disabled=no forward-delay=15s l2mtu=65535 max-message-age=20s mtu=1500 \
    name=Loopback1 priority=0x8000 protocol-mode=none transmit-hold-count=6
/interface ethernet
set 0 arp=enabled auto-negotiation=yes bandwidth=unlimited/unlimited \
    disabled=no full-duplex=yes l2mtu=1520 mac-address=00:0C:42:A5:C5:7F \
    master-port=none mtu=1500 name=OSPF speed=100Mbps
/interface vlan
add arp=enabled disabled=no interface=OSPF l2mtu=1516 mtu=1500 name=CLUSTER2 \
    use-service-tag=no vlan-id=2
add arp=enabled disabled=no interface=OSPF l2mtu=1516 mtu=1500 name=VLAN217 \
    use-service-tag=no vlan-id=217
add arp=enabled disabled=no interface=OSPF l2mtu=1516 mtu=1500 name=VLAN267 \
    use-service-tag=no vlan-id=267
add arp=enabled disabled=no interface=OSPF l2mtu=1516 mtu=1496 name=P2MP7 \
    use-service-tag=no vlan-id=77
/routing ospf instance
set [ find default=yes ] disabled=no distribute-default=never in-filter=\
    ospf-in metric-bgp=auto metric-connected=20 metric-default=1 \
    metric-other-ospf=auto metric-rip=20 metric-static=20 name=default \
    out-filter=ospf-out redistribute-bgp=no redistribute-connected=no \
    redistribute-other-ospf=no redistribute-rip=no redistribute-static=no \
    router-id=2.2.2.7
/routing ospf area
set [ find default=yes ] area-id=0.0.0.0 disabled=no instance=default name=\
    backbone type=default
/routing ospf-v3 area
set [ find default=yes ] area-id=0.0.0.0 disabled=no instance=default name=\
    backbone type=default
/ip address
add address=2.2.2.7/32 disabled=no interface=Loopback1 network=2.2.2.7
add address=2.0.0.7/24 disabled=no interface=CLUSTER2 network=2.0.0.0
add address=2.1.7.7/24 disabled=no interface=VLAN217 network=2.1.7.0
add address=2.6.7.7/24 disabled=no interface=VLAN267 network=2.6.7.0
add address=77.7.7.72/24 disabled=no interface=P2MP7 network=77.7.7.0
/routing ospf interface
add authentication=none authentication-key="" authentication-key-id=1 cost=10 \
    dead-interval=40s disabled=no hello-interval=10s instance-id=0 interface=\
    Loopback1 network-type=broadcast passive=yes priority=1 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=none authentication-key="" authentication-key-id=1 cost=\
    1000 dead-interval=40s disabled=no hello-interval=10s instance-id=0 \
    interface=CLUSTER2 network-type=broadcast passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=none authentication-key="" authentication-key-id=1 cost=10 \
    dead-interval=40s disabled=no hello-interval=10s instance-id=0 interface=\
    VLAN217 network-type=point-to-point passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=none authentication-key="" authentication-key-id=1 cost=10 \
    dead-interval=40s disabled=no hello-interval=10s instance-id=0 interface=\
    VLAN267 network-type=point-to-point passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=none authentication-key="" authentication-key-id=1 cost=\
    100 dead-interval=2m disabled=no hello-interval=30s instance-id=0 \
    interface=P2MP7 network-type=nbma passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
/routing ospf nbma-neighbor
add address=77.7.7.71 disabled=no instance=default poll-interval=2m priority=\
    128
add address=77.7.7.73 disabled=no instance=default poll-interval=2m priority=\
    0
/routing ospf network
add area=backbone disabled=no network=2.2.2.7/32
add area=backbone disabled=no network=2.0.0.0/24
add area=backbone disabled=no network=2.1.7.0/24
add area=backbone disabled=no network=2.6.7.0/24
add area=backbone disabled=no network=77.7.7.0/24
Impressions: I first picked up a RouterBoard about 5 or 6 years ago just to play with, and because it was so damn cheap -- I mean REALLY cheap! These things have evolved over the years to be very capable little boxes. I've been using an RB600 as my Wireless Access Point (with 3 radios in it) for several years. I have an RB750 or RB750G scattered throughout the house as well. The cli is kind of unique, but a bit like some of the old Marconi ATM switches I've used a couple of times. The cli takes a bit of getting used to, but you can wade your way through it without too many problems. However, MikroTik has a pretty capable GUI client called winbox. Unfortunately, it's Windows only and I'm a Linux/FreeBSD user, but it runs fine under wine. In the later versions of RouterOS, the built-in WebFig interface (http and/or https) has really become very slick and extremely capable. It is easily the best web based router/switch config manager I've used. It's intuitive, fast, doesn't use a lot of router or client resources and is quite full featured. I'm a big cli user, but for RouterBoards I wind up using WebFig more than anything else. As far as the RB750G goes, it is amazingly cheap and capable -- doing MPLS, MPLS-TE, VPLS, L3VPNs, BGP, IPv6, etc. The only thing missing is IS-IS, and one thing that bugs me a bit -- an actual interface called "loopback". If you want to use a loopback interface, you have to create a bridge interface and not bind any ports to it. The RB750s are kind of a pain to set up as well, as they have no serial ports, so you have to connect via the first ethernet port. If you forget how one was set up, or the password, after it's been lying around awhile, you have to factory reset the whole box. Kind of a pain, but it keeps the cost down.
Issues:
Where to get it: RouterBoards are available at http://routerboard.com/ and RouterOS is available at Mikrotik. The RB750G isn't available any more; it's been replaced with the RB750GL, which looks to have twice the memory.
Description: This is a Juniper Networks SRX100B running Junos 12.1R3.5 with 512 MB of RAM.
juniper@SRX100-5_OSPF> show version
Hostname: SRX100-5_OSPF
Model: srx100b
JUNOS Software Release [12.1R4.7]

juniper@SRX100-5_OSPF> show chassis hardware
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                XX0000000000      SRX100B
Routing Engine   REV 02   750-021773   XX0000000000      RE-SRX100B
FPC 0                                                    FPC
  PIC 0                                                  8x FE Base PIC
Power Supply 0

juniper@SRX100-5_OSPF> show system boot-messages | match memory
real memory  = 536870912 (512MB)
avail memory = 304193536 (290MB)

juniper@SRX100-5_OSPF>
Config:
interfaces {
    fe-0/0/1 {
        vlan-tagging;
        unit 3 {
            vlan-id 3;
            family inet {
                address 3.0.0.1/24;
            }
        }
        unit 11 {
            vlan-id 11;
            family inet {
                address 11.1.1.33/24;
            }
        }
        unit 312 {
            vlan-id 312;
            family inet {
                address 3.1.2.1/24;
            }
        }
        unit 317 {
            vlan-id 317;
            family inet {
                address 3.1.7.1/24;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 3.3.3.1/32;
            }
        }
    }
}
routing-options {
    router-id 3.3.3.1;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0 {
                passive;
            }
            interface fe-0/0/1.3 {
                metric 1000;
            }
            interface fe-0/0/1.312 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/1.317 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/1.11 {
                metric 100;
            }
        }
    }
}
security {
    zones {
        security-zone OSPF {
            host-inbound-traffic {
                system-services {
                    ping;
                }
                protocols {
                    ospf;
                }
            }
            interfaces {
                fe-0/0/1.3;
                lo0.0;
                fe-0/0/1.312;
                fe-0/0/1.317;
                fe-0/0/1.11;
            }
        }
    }
}

And this one is running in flow mode:
juniper@SRX100-5_OSPF> show security flow status
  Flow forwarding mode:
    Inet forwarding mode: flow based
    Inet6 forwarding mode: drop
    MPLS forwarding mode: drop
    ISO forwarding mode: drop
  Flow trace status
    Flow tracing status: off
Impressions: This is the same thing as a SRX100H, but only has half of the RAM enabled. This is upgradable to a full SRX100H via a software license which "turns on" the remaining RAM. This is one of the first batch of SRX100s I bought.
Where to get it: At Juniper Networks SRX100
Issues:
Description: This is a Juniper Networks J2300 running Junos 9.3R4.4 with 1 GB of RAM.
juniper@J2300-7> show version
Hostname: J2300-7
Model: j2300
JUNOS Software Release [9.3R4.4]

juniper@J2300-7> show chassis hardware
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                XX00000000        J2300
Routing Engine   REV 07   750-009992   XX00000000        RE-J.1
FPC 0            REV 05   750-011147   XX00000000        FPC
  PIC 0                                                  2x FE, 2x E1
Power Supply 0

juniper@J2300-7> show system boot-messages | match memory
real memory  = 1073741824 (1024 MB)
avail memory = 703148032 (670 MB)
pcib0:pcibus 0 on motherboard

juniper@J2300-7>
Config:
interfaces {
    fe-0/0/0 {
        vlan-tagging;
        unit 3 {
            vlan-id 3;
            family inet {
                address 3.0.0.2/24;
            }
        }
        unit 22 {
            vlan-id 22;
            family inet {
                address 22.2.2.23/24;
            }
        }
        unit 312 {
            vlan-id 312;
            family inet {
                address 3.1.2.2/24;
            }
        }
        unit 323 {
            vlan-id 323;
            family inet {
                address 3.2.3.2/24;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 3.3.3.2/32;
            }
        }
    }
}
routing-options {
    router-id 3.3.3.2;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0 {
                passive;
            }
            interface fe-0/0/0.3 {
                metric 1000;
            }
            interface fe-0/0/0.312 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/0.323 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/0.22 {
                metric 100;
            }
        }
    }
}
Impressions: This is one of a lot of J2300s that I got off of eBay. I used this one to study for the Class of Service, Multicast, and L2VPN sections of my JNCIE back a few years ago. These are actually really nice little boxes that run in packet mode only. I don't use them too much anymore since I got my SRXs, but I still find room and time to play with them.
Issues:
Where to get it: These are EOL and EOS, so you have to cruise eBay if you want one.
Description: This is a Cisco 3750-24P running IOS 12.2(50)SE3 using the IP services image. It has 128 MB of RAM.
C3750-1>sh ver
Cisco IOS Software, C3750 Software (C3750-IPSERVICESK9-M), Version 12.2(50)SE3, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2009 by Cisco Systems, Inc.
Compiled Wed 22-Jul-09 06:19 by prod_rel_team
Image text-base: 0x01000000, data-base: 0x02D00000
ROM: Bootstrap program is C3750 boot loader
BOOTLDR: C3750 Boot Loader (C3750-HBOOT-M) Version 12.2(44)SE5, RELEASE SOFTWARE (fc1)
C3750-1 uptime is 1 minute
System returned to ROM by power-on
System image file is "flash:/c3750-ipservicesk9-mz.122-50.SE3.bin"
cisco WS-C3750-24P (PowerPC405) processor (revision C0) with 131072K bytes of memory.
Processor board ID CAT0834X15V
Last reset from power-on
5 Virtual Ethernet interfaces
24 FastEthernet interfaces
2 Gigabit Ethernet interfaces
The password-recovery mechanism is enabled.
512K bytes of flash-simulated non-volatile configuration memory.
Base ethernet MAC Address       : 00:11:BB:DA:46:80
Motherboard assembly number     : 73-8312-04
Power supply part number        : 341-0029-03
Motherboard serial number       : XX000000000
Power supply serial number      : XX000000000
Model revision number           : C0
Motherboard revision number     : B0
Model number                    : WS-C3750-24PS-S
System serial number            : CAT0834X15V
Top Assembly Part Number        : 800-21982-01
Top Assembly Revision Number    : F0
Version ID                      : N/A
Hardware Board Revision Number  : 0x09
Switch Ports Model              SW Version            SW Image
------ ----- -----              ----------            ----------
*    1 26    WS-C3750-24P       12.2(50)SE3           C3750-IPSERVICESK9-M
Configuration register is 0xF
Config:
!
vlan 3,33,323,334
!
interface Loopback0
 ip address 3.3.3.3 255.255.255.255
!
interface FastEthernet1/0/1
 description "OSPF TEST"
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport nonegotiate
!
interface Vlan3
 ip address 3.0.0.3 255.255.255.0
 ip ospf cost 1000
 ip ospf priority 128
 ip ospf 1 area 0.0.0.0
!
interface Vlan33
 ip ospf network point-to-multipoint
 ip ospf cost 100
 ip ospf priority 0
 ip ospf mtu-ignore
 ip ospf 1 area 0.0.0.0
!
interface Vlan323
 ip address 3.2.3.3 255.255.255.0
 ip ospf network point-to-point
 ip ospf cost 10
 ip ospf priority 128
 ip ospf 1 area 0.0.0.0
!
interface Vlan334
 ip address 3.3.4.3 255.255.255.0
 ip ospf network point-to-point
 ip ospf cost 10
 ip ospf priority 128
 ip ospf 1 area 0.0.0.0
!
router ospf 1
 router-id 3.3.3.3
 log-adjacency-changes
 passive-interface Loopback0
 neighbor 33.3.3.32 cost 100
!
Impressions: This is definitely a switch; all of the router-like functions had to be performed on routed VLAN (SVI) pseudo-interfaces. It still supported all of the OSPF interface types though.
Issues:
Where to get it: These are EOL, EOS as well => eBay.
Description: This is a SRX100H running Junos 11.4R5.5 with 1 GB of RAM.
juniper@SRX100-7_OSPF> show version
Hostname: SRX100-7_OSPF
Model: srx100h
JUNOS Software Release [12.1R4.7]

juniper@SRX100-7_OSPF> show chassis hardware
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                XX0000000000      SRX100H
Routing Engine   REV 19   750-021773   XX0000000000      RE-SRX100H
FPC 0                                                    FPC
Power Supply 0

juniper@SRX100-7_OSPF> show system boot-messages | match memory
real memory  = 1073741824 (1024MB)
avail memory = 526438400 (502MB)
Config:
interfaces {
    fe-0/0/0 {
        vlan-tagging;
        unit 3 {
            vlan-id 3;
            family inet {
                address 3.0.0.4/24;
            }
        }
        unit 44 {
            vlan-id 44;
            family inet {
                address 44.4.4.43/24;
            }
        }
        unit 334 {
            vlan-id 334;
            family inet {
                address 3.3.4.4/24;
            }
        }
        unit 345 {
            vlan-id 345;
            family inet {
                address 3.4.5.4/24;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 3.3.3.4/32;
            }
        }
    }
}
routing-options {
    router-id 3.3.3.4;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0 {
                passive;
            }
            interface fe-0/0/0.3 {
                metric 1000;
            }
            interface fe-0/0/0.334 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/0.345 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/0.44 {
                metric 100;
            }
        }
    }
}
security {
    zones {
        security-zone OSPF {
            host-inbound-traffic {
                system-services {
                    ping;
                }
                protocols {
                    ospf;
                }
            }
            interfaces {
                fe-0/0/0.3;
                fe-0/0/0.334;
                fe-0/0/0.345;
                fe-0/0/0.44;
                lo0.0;
            }
        }
    }
}

And this one is also running in flow mode:
juniper@SRX100-7_OSPF> show security flow status
  Flow forwarding mode:
    Inet forwarding mode: flow based
    Inet6 forwarding mode: drop
    MPLS forwarding mode: drop
    ISO forwarding mode: drop
  Advanced services data-plane memory mode: Default
  Flow trace status
    Flow tracing status: off
Impressions: This is a SRX100H running in flow mode to contrast it with the one running in packet mode.
Issues:
Where to get it: At Juniper Networks
Description: This is a Juniper Networks EX2200C running Junos 12.2R2.4 with PoE support and 512 MB of RAM.
{master:0}
juniper@EX2200C-3> show version
fpc0:
--------------------------------------------------------------------------
Hostname: EX2200C-3
Model: ex2200-c-12p-2g
JUNOS Base OS boot [12.2R2.4]
JUNOS Base OS Software Suite [12.2R2.4]
JUNOS Kernel Software Suite [12.2R2.4]
JUNOS Crypto Software Suite [12.2R2.4]
JUNOS Online Documentation [12.2R2.4]
JUNOS Enterprise Software Suite [12.2R2.4]
JUNOS Packet Forwarding Engine Enterprise Software Suite [12.2R2.4]
JUNOS Routing Software Suite [12.2R2.4]
JUNOS Web Management [12.2R2.4]
JUNOS FIPS mode utilities [12.2R2.4]

{master:0}
juniper@EX2200C-3> show chassis hardware
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                XX0000000000      EX2200-C-12P-2G, POE+
Routing Engine 0 REV 05   650-036547   XX0000000000      EX2200-C-12P-2G, POE+
FPC 0            REV 05   650-036547   XX0000000000      EX2200-C-12P-2G, POE+
  CPU                     BUILTIN      BUILTIN           FPC CPU
  PIC 0                   BUILTIN      BUILTIN           12x 10/100/1000 Base-T
  PIC 1          REV 05   650-036547   XX0000000000      2x (10/100/1000 Base-T or GE SFP)
Power Supply 0                                           PS 180W AC

{master:0}
juniper@EX2200C-3> show system boot-messages | match memory
real memory  = 536870912 (512 MB)
avail memory = 503500800 (480 MB)
Config:
interfaces {
    ge-0/0/8 {
        vlan-tagging;
        unit 3 {
            vlan-id 3;
            family inet {
                address 3.0.0.5/24;
            }
        }
        unit 55 {
            vlan-id 55;
            family inet {
                address 55.5.5.53/24;
            }
        }
        unit 345 {
            vlan-id 345;
            family inet {
                address 3.4.5.5/24;
            }
        }
        unit 356 {
            vlan-id 356;
            family inet {
                address 3.5.6.5/24;
            }
        }
    }
}
routing-options {
    router-id 3.3.3.5;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/8.3 {
                metric 1000;
            }
            interface ge-0/0/8.345 {
                interface-type p2p;
                metric 10;
            }
            interface ge-0/0/8.356 {
                interface-type p2p;
                metric 10;
            }
            interface ge-0/0/8.55 {
                metric 100;
            }
        }
    }
}
Impressions: This is a very capable little switch that is pretty inexpensive. I have one of these sitting below my SRX210H serving my switching needs for my home datacenter (basement consisting of a NAS, a backup server, VOIP PBX, XEN VM server, and lab connection). The coolest thing about these is that they have a USB serial console port with a USB-serial converter built right in. All you need to do is plug a USB micro cable into your laptop's USB port, and voila -- console port access! Since I run Linux and FreeBSD on my laptops, I just get a new serial port just like that! Since Junos 12.1, you can also run up to four of these little things in a virtual chassis! Just like its bigger brother, the EX3200, this thing can operate a port just like it was a router with no need to use pseudo-routed-VLAN interfaces.
Issues:
Where to get it: At Juniper Networks
Description: This is a RouterBoard RB133 running MikroTik RouterOS 5.22.
[admin@RB133] > system routerboard print
  routerboard: yes
  model: 133
  serial-number: 000000000000
  current-firmware: 2.18
  upgrade-firmware: 2.18
[admin@RB133] >
Config:
/interface ethernet
set 0 arp=enabled auto-negotiation=yes bandwidth=unlimited/unlimited \
    disabled=no full-duplex=yes l2mtu=1518 mac-address=00:0C:42:25:18:36 \
    master-port=none mtu=1500 name=OSPF speed=100Mbps
/interface bridge
add admin-mac=00:00:00:00:00:00 ageing-time=5m arp=enabled auto-mac=yes \
    disabled=yes forward-delay=15s max-message-age=20s mtu=1500 name=\
    loopback0 priority=0x8000 protocol-mode=none transmit-hold-count=6
/interface vlan
add arp=enabled disabled=no interface=OSPF l2mtu=1514 mtu=1500 name=CLUSTER3 \
    use-service-tag=no vlan-id=3
add arp=enabled disabled=no interface=OSPF l2mtu=1514 mtu=1500 name=VLAN356 \
    use-service-tag=no vlan-id=356
add arp=enabled disabled=no interface=OSPF l2mtu=1514 mtu=1500 name=VLAN367 \
    use-service-tag=no vlan-id=367
add arp=enabled disabled=no interface=OSPF l2mtu=1514 mtu=1500 name=P2MP6 \
    use-service-tag=no vlan-id=66
/interface bridge port
add disabled=no edge=auto external-fdb=auto horizon=none interface=OSPF \
    path-cost=10 point-to-point=auto priority=0x80
/ip address
add address=3.3.3.6/32 disabled=no interface=loopback0 network=3.3.3.6
add address=3.5.6.6/24 disabled=no interface=VLAN356 network=3.5.6.0
add address=3.6.7.6/24 disabled=no interface=VLAN367 network=3.6.7.0
add address=66.6.6.63/24 disabled=no interface=P2MP6 network=66.6.6.0
add address=3.0.0.6/24 disabled=no interface=CLUSTER3 network=3.0.0.0
set OSPF disabled=no
set loopback0 disabled=no
set CLUSTER3 disabled=yes
set VLAN356 disabled=yes
set VLAN367 disabled=yes
set P2MP6 disabled=yes
/routing ospf area
set [ find default=yes ] area-id=0.0.0.0 disabled=no instance=default name=\
    backbone type=default
/routing ospf instance
set [ find default=yes ] disabled=no distribute-default=never in-filter=\
    ospf-in metric-bgp=200000 metric-connected=2000 metric-default=1000 \
    metric-other-ospf=auto metric-rip=20000 metric-static=2000 mpls-te-area=\
    backbone mpls-te-router-id=loopback0 name=default out-filter=ospf-out \
    redistribute-bgp=no redistribute-connected=no redistribute-other-ospf=no \
    redistribute-rip=no redistribute-static=no router-id=3.3.3.6
/routing ospf interface
add authentication=none authentication-key="" authentication-key-id=1 cost=\
    1000 dead-interval=40s disabled=no hello-interval=10s instance-id=0 \
    interface=CLUSTER3 network-type=broadcast passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=none authentication-key="" authentication-key-id=1 cost=\
    100 dead-interval=40s disabled=no hello-interval=10s instance-id=0 \
    interface=P2MP6 network-type=broadcast passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=none authentication-key="" authentication-key-id=1 cost=10 \
    dead-interval=40s disabled=no hello-interval=10s instance-id=0 interface=\
    VLAN356 network-type=point-to-point passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=none authentication-key="" authentication-key-id=1 cost=10 \
    dead-interval=40s disabled=no hello-interval=10s instance-id=0 interface=\
    VLAN367 network-type=point-to-point passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
/routing ospf network
add area=backbone disabled=no network=3.3.3.6/32
add area=backbone disabled=no network=3.0.0.0/24
add area=backbone disabled=no network=3.5.6.0/24
add area=backbone disabled=no network=3.6.7.0/24
add area=backbone disabled=no network=66.6.6.0/24
Impressions: Another really cheap, but really fun and capable little router. Plus, this one has a serial port for easy first time and emergency access!
Issues:
Where to get it: These aren't made any more. You may be able to still find them online or at eBay.
Description: This is a Cisco 1760 running IOS 12.3(4) on the Advanced Enterprise train with 96 MB of RAM.
C1760-1>sh ver
Cisco IOS Software, C1700 Software (C1700-ADVENTERPRISEK9-M), Version 12.3(4)XG5, RELEASE SOFTWARE (fc1)
Synched to technology version 12.3(5.7)T
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2005 by Cisco Systems, Inc.
Compiled Tue 04-Oct-05 00:08 by ealyon
ROM: System Bootstrap, Version 12.2(4r)XL, RELEASE SOFTWARE (fc1)
ROM:
C1760-1 uptime is 34 minutes
System returned to ROM by power-on
System image file is "flash:c1700-adventerprisek9-mz.123-4.XG5.bin"
Cisco 1760 (MPC860P) processor (revision 0x200) with 83559K/14745K bytes of memory.
Processor board ID XX000000000 (526361396), with hardware revision BB67
MPC860P processor: part number 5, mask 2
1 FastEthernet interface
1 Virtual Private Network (VPN) Module
32K bytes of NVRAM.
32768K bytes of processor board System flash (Read/Write)
Configuration register is 0x2102
Config:
!
interface Loopback0
 ip address 3.3.3.7 255.255.255.255
!
interface FastEthernet0/0
 no ip address
 speed auto
!
interface FastEthernet0/0.3
 encapsulation dot1Q 3
 ip address 3.0.0.7 255.255.255.0
 ip ospf cost 1000
 ip ospf priority 128
!
interface FastEthernet0/0.77
 encapsulation dot1Q 77
 ip address 77.7.7.73 255.255.255.0
 ip mtu 1496
 ip ospf network non-broadcast
 ip ospf cost 100
 ip ospf priority 128
!
interface FastEthernet0/0.317
 encapsulation dot1Q 317
 ip address 3.1.7.7 255.255.255.0
 ip ospf network point-to-point
 ip ospf cost 10
 ip ospf priority 128
!
interface FastEthernet0/0.367
 encapsulation dot1Q 367
 ip address 3.6.7.7 255.255.255.0
 ip ospf network point-to-point
 ip ospf cost 10
 ip ospf priority 128
!
router ospf 1
 router-id 3.3.3.7
 log-adjacency-changes
 passive-interface Loopback0
 network 3.0.0.7 0.0.0.0 area 0.0.0.0
 network 3.1.7.7 0.0.0.0 area 0.0.0.0
 network 3.3.3.7 0.0.0.0 area 0.0.0.0
 network 3.6.7.7 0.0.0.0 area 0.0.0.0
 network 77.7.7.73 0.0.0.0 area 0.0.0.0
 neighbor 77.7.7.71 cost 100
 neighbor 77.7.7.72 cost 100
!
Impressions: I originally picked up this box off eBay to play with some voice stuff. My first impression, given the whopping 1 built-in Fast Ethernet port and only WIC slots usable for network interfaces (the others are for voice modules), was that this router wouldn't get much lab use outside of playing with VoIP. However, for some reason, this thing became one of my favorite boxes to use as a CE router. For a Cisco branch router, it boots pretty quickly, it's quiet, and it has a pretty quick processor, so there isn't much delay in anything.
Issues:
Where to get it: Again, at eBay because this one is EoL, EoS.
exaBGP is initially used to announce routes via a BGP session to the Olive box that has 2GB of RAM, which then redistributes them into OSPF.
exaBGP configuration is very Junos-like, and has a cool feature: it can run a script from within the config file that generates more config. I used this feature to announce routes at a rate of 10 per second. This is done with the process service-1 stanza, which calls the dyn.sh script. That script loops continuously, calling a Python script that generates a random IPv4 prefix in CIDR style. Junos has a built-in default martian filter that will simply refuse to install any martian routes, so when peering with the Olive we don't need to worry about odd routes in our setup. You can view the martian table in Junos with the show route martians command. The default martian table for IPv4 routes in Junos is:
juniper@SRX210> show route martians table inet.0

inet.0:
             0.0.0.0/0 exact -- allowed
             0.0.0.0/8 orlonger -- disallowed
             127.0.0.0/8 orlonger -- disallowed
             192.0.0.0/24 orlonger -- disallowed
             240.0.0.0/4 orlonger -- disallowed
             224.0.0.0/4 exact -- disallowed
             224.0.0.0/24 exact -- disallowed

juniper@SRX210>

Here is the exaBGP config file that is used to peer with the Olive above:
neighbor 172.20.1.66 {
    description "Olive2GB on kvm1";
    router-id 66.66.66.66;
    local-address 172.20.10.117;
    local-as 65069;
    peer-as 65066;

    # advertise a bunch of bogus prefixes
    process service-1 {
        run /etc/exabgp/processes/dyn.sh;
    }
}

The dyn.sh script:
#!/bin/sh

# ignore Control C
# if the user ^C exabgp we will get that signal too, ignore it and let exabgp send us a SIGTERM
trap '' INT

while true; do
    echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
    echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
    echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
    echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
    echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
    echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
    echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
    echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
    echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
    echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
    sleep 1
done

And the Python script, bs-route.py:
#!/usr/bin/python
#
# BS Route Generator - CIDR style
#
# This just prints out a random network from the IPv4 address space and makes no attempt to filter bogons or martians
#
# Blackhole Networks
#
# version = 0.1

import random, sys

# Get a random 32 bit number for our IP address and a random length for our netmask
ip = random.randint(0,4294967295)
mask = random.randint(0,31)

# Convert our mask length into a 32 bit number
submask = 0
for i in range (mask):
    submask=submask+2**(31-i)

# Get the network ID by performing a bitwise and with our IP address and our subnet mask
network = ip & submask

# print the network ID out in dotted quad notation
# do this by masking and bit shifting
oct1 = (network & 0b11111111000000000000000000000000) >> 24
oct2 = (network & 0b00000000111111110000000000000000) >> 16
oct3 = (network & 0b00000000000000001111111100000000) >> 8
oct4 = (network & 0b00000000000000000000000011111111) >> 0

# Print out the network in CIDR notation
sys.stdout.write("%s.%s.%s.%s/%s\n" % ( oct1, oct2, oct3, oct4, mask))
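Starting a fresh Python interpreter ten times a second works, but the shell loop and the prefix generator could also be folded into one long-running process. The following is only a minimal sketch of that idea, not the script used in this test. It assumes Python 3.5 or later and the standard ipaddress module; the names random_prefix, announce_line and main are made up for illustration, and the next-hop and batch size are simply copied from dyn.sh.

```python
import ipaddress
import random
import signal
import sys
import time

NEXT_HOP = "10.0.0.0"  # same next-hop that dyn.sh announces
BATCH = 10             # routes per one-second batch, as in dyn.sh


def random_prefix(rng=random):
    """Return a random IPv4 network with a prefix length of 0..31."""
    addr = rng.randint(0, 2**32 - 1)
    length = rng.randint(0, 31)
    # strict=False clears the host bits, like the bitwise AND in bs-route.py
    return ipaddress.ip_network((addr, length), strict=False)


def announce_line(net, next_hop=NEXT_HOP):
    """Format one announcement in the same text form dyn.sh echoes."""
    return "announce route %s next-hop %s" % (net, next_hop)


def main():
    # Like dyn.sh, ignore ^C and wait for exaBGP to send SIGTERM instead.
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    while True:
        for _ in range(BATCH):
            sys.stdout.write(announce_line(random_prefix()) + "\n")
        sys.stdout.flush()  # exaBGP reads this process's stdout via a pipe
        time.sleep(1)

# Call main() at the end of the file when installing this as the
# script named in the "run" line of the process stanza.
```

Passing a seeded random.Random instance as rng makes the prefix stream repeatable, which could be handy when comparing how different implementations cope with exactly the same set of routes.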
exaBGP is then run simply with: exabgp /etc/exabgp/exabgp.conf. exaBGP does not listen for any connections on port 179, so you don't need to be root to run it.
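Since the generator makes no attempt to avoid martians, it can be useful to know ahead of time which announced prefixes the Olive will quietly refuse. Here is a rough sketch, assuming Python 3.7 or later (for subnet_of), that checks a prefix against the default Junos martian table shown earlier. The is_martian helper is hypothetical, and the match semantics are an assumption: "exact" is treated as matching only that prefix, and "orlonger" as matching that prefix or anything more specific inside it.

```python
import ipaddress

# Disallowed entries copied from the "show route martians table inet.0"
# output earlier in this post: (prefix, match type).
DISALLOWED = [
    ("0.0.0.0/8", "orlonger"),
    ("127.0.0.0/8", "orlonger"),
    ("192.0.0.0/24", "orlonger"),
    ("240.0.0.0/4", "orlonger"),
    ("224.0.0.0/4", "exact"),
    ("224.0.0.0/24", "exact"),
]


def is_martian(prefix):
    """Return True if prefix matches a disallowed entry (assumed semantics)."""
    net = ipaddress.ip_network(prefix)
    for entry, match in DISALLOWED:
        martian = ipaddress.ip_network(entry)
        # "exact": only the listed prefix itself
        if match == "exact" and net == martian:
            return True
        # "orlonger": the listed prefix or anything more specific within it
        if match == "orlonger" and net.subnet_of(martian):
            return True
    return False
```

For example, is_martian("127.1.0.0/16") is True under these assumptions, while is_martian("10.0.0.0/8") is False, so RFC 1918 space and other bogons outside this short list would still make it into the routing table.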
vyatta@vyatta:~$ show ip ospf neighbor
Neighbor ID  Pri  State          Dead Time  Address    Interface          RXmtL  RqstL  DBsmL
1.1.1.2      128  2-Way/DROther  30.516s    1.0.0.2    eth1.1:1.0.0.1     0      0      0
1.1.1.3      128  2-Way/DROther  32.628s    1.0.0.3    eth1.1:1.0.0.1     0      0      0
1.1.1.4      128  Full/Backup    39.920s    1.0.0.4    eth1.1:1.0.0.1     0      0      0
1.1.1.5      128  Full/DR        35.919s    1.0.0.5    eth1.1:1.0.0.1     0      0      0
1.1.1.6      128  2-Way/DROther  36.072s    1.0.0.6    eth1.1:1.0.0.1     0      0      0
1.1.1.7      128  2-Way/DROther  38.611s    1.0.0.7    eth1.1:1.0.0.1     0      0      0
2.2.2.1      128  Full/Backup    38.818s    11.1.1.21  eth1.11:11.1.1.11  0      0      0
3.3.3.1      128  Full/DR        38.101s    11.1.1.31  eth1.11:11.1.1.11  0      0      0
1.1.1.2      250  Full/Backup    35.665s    1.1.2.2    eth1.112:1.1.2.1   0      0      0
1.1.1.7      128  Full/Backup    38.357s    1.1.7.7    eth1.117:1.1.7.1   5630   0      0
# ospfctl show neighbor
ID       Pri  State        DeadTime  Address    Iface    Uptime
1.1.1.3  128  FULL/BCKUP   00:00:35  1.2.3.3    vlan123  00:02:21
1.1.1.1  128  FULL/DR      00:00:30  1.1.2.1    vlan112  00:46:09
2.2.2.2  128  FULL/DR      00:00:34  22.2.2.22  vlan22   00:45:31
3.3.3.2  128  FULL/BCKUP   00:00:37  22.2.2.23  vlan22   00:45:32
1.1.1.6  128  2-WAY/OTHER  00:00:37  1.0.0.6    vlan1    -
1.1.1.3  128  2-WAY/OTHER  00:00:32  1.0.0.3    vlan1    -
1.1.1.7  128  2-WAY/OTHER  00:00:35  1.0.0.7    vlan1    -
1.1.1.5  128  FULL/DR      00:00:31  1.0.0.5    vlan1    00:45:59
1.1.1.1  128  2-WAY/OTHER  00:00:30  1.0.0.1    vlan1    -
1.1.1.4  128  FULL/BCKUP   00:00:37  1.0.0.4    vlan1    00:14:15
    juniper@Olive2GB> show ospf neighbor
    Address     Interface      State    ID       Pri  Dead
    1.0.0.5     fxp1.1         Full     1.1.1.5  128  38
    1.0.0.1     fxp1.1         2Way     1.1.1.1  128  38
    1.0.0.2     fxp1.1         2Way     1.1.1.2  128  36
    1.0.0.4     fxp1.1         Full     1.1.1.4  128  35
    1.0.0.6     fxp1.1         2Way     1.1.1.6  128  35
    1.0.0.7     fxp1.1         2Way     1.1.1.7  128  33
    1.2.3.2     fxp1.123       Full     1.1.1.2  128  37
    1.3.4.4     fxp1.134       Full     1.1.1.4  1    35
    33.3.3.33   fxp1.33        ExStart  3.3.3.3  0    116
    33.3.3.32   fxp1.33        Full     2.2.2.3  255  100
    bird> show ospf neighbors
    OSPFol:
    Router ID  Pri  State       DTime  Interface  Router IP
    1.1.1.5    128  full/dr     00:37  eth0.0001  1.0.0.5
    1.1.1.1    128  full/other  00:36  eth0.0001  1.0.0.1
    1.1.1.2    128  full/other  00:35  eth0.0001  1.0.0.2
    1.1.1.7    128  full/other  00:40  eth0.0001  1.0.0.7
    1.1.1.6    128  full/other  00:33  eth0.0001  1.0.0.6
    1.1.1.3    128  full/other  00:40  eth0.0001  1.0.0.3
    2.2.2.4    128  full/bdr    00:39  eth0.0044  44.4.4.42
    3.3.3.4    128  full/dr     00:38  eth0.0044  44.4.4.43
    1.1.1.3    128  full/bdr    00:39  eth0.0134  1.3.4.3
    1.1.1.5    128  full/ptp    00:37  eth0.0145  1.4.5.5
    quagga-router# sh ip ospf neighbor
    Neighbor ID  Pri  State          Dead Time  Address    Interface             RXmtL  RqstL  DBsmL
    1.1.1.1      128  Full/DROther   36.174s    1.0.0.1    eth0.0001:1.0.0.5     0      0      0
    1.1.1.2      128  Full/DROther   34.744s    1.0.0.2    eth0.0001:1.0.0.5     0      0      0
    1.1.1.3      128  Full/DROther   36.529s    1.0.0.3    eth0.0001:1.0.0.5     0      0      0
    1.1.1.4      128  Full/Backup    32.899s    1.0.0.4    eth0.0001:1.0.0.5     0      0      0
    1.1.1.6      128  Full/DROther   33.112s    1.0.0.6    eth0.0001:1.0.0.5     0      0      0
    1.1.1.7      128  Full/DROther   32.252s    1.0.0.7    eth0.0001:1.0.0.5     0      0      0
    1.1.1.4      1    Full/DROther   32.899s    1.4.5.4    eth0.0145:1.4.5.5     0      0      0
    1.1.1.6      128  Full/Backup    33.116s    1.5.6.6    eth0.0156:1.5.6.5     0      0      0
    2.2.2.5      128  Full/DROther   32.607s    55.5.5.52  eth0.0055:55.5.5.5 1  0      0      0
    3.3.3.5      128  Full/Backup    33.022s    55.5.5.53  eth0.0055:55.5.5.5 1  0      0      0
    router@xorp> show ospf4 neighbor
    Address     Interface  State   ID       Pri  Dead
    1.0.0.4     eth0/eth0  Full    1.1.1.4  128  31
    1.0.0.7     eth0/eth0  TwoWay  1.1.1.7  128  36
    1.0.0.2     eth0/eth0  TwoWay  1.1.1.2  128  33
    1.0.0.1     eth0/eth0  TwoWay  1.1.1.1  128  34
    1.0.0.5     eth0/eth0  Full    1.1.1.5  128  35
    1.0.0.3     eth0/eth0  TwoWay  1.1.1.3  128  30
    66.6.6.62   eth1/eth1  Full    2.2.2.6  128  38
    66.6.6.63   eth1/eth1  Full    3.3.3.6  128  30
    1.5.6.5     eth2/eth2  Full    1.1.1.5  128  35
    1.6.7.7     eth3/eth3  Full    1.1.1.7  128  36
    juniper@Olive1GB> show ospf neighbor
    Address     Interface      State    ID       Pri  Dead
    1.0.0.3     fxp1.1         2Way     1.1.1.3  128  38
    1.0.0.6     fxp1.1         2Way     1.1.1.6  128  31
    1.0.0.4     fxp1.1         Full     1.1.1.4  128  31
    1.0.0.2     fxp1.1         2Way     1.1.1.2  128  32
    1.0.0.5     fxp1.1         Full     1.1.1.5  128  34
    1.0.0.1     fxp1.1         2Way     1.1.1.1  128  34
    1.1.7.1     fxp1.117       Full     1.1.1.1  128  34
    1.6.7.6     fxp1.167       Full     1.1.1.6  128  31
    77.7.7.72   fxp1.77        Full     2.2.2.7  128  94
    77.7.7.73   fxp1.77        Full     3.3.3.7  128  101
    juniper@EX3200-2_OSPF> show ospf neighbor
    Address     Interface      State    ID       Pri  Dead
    11.1.1.31   ge-0/0/1.11    Full     3.3.3.1  128  36
    11.1.1.11   ge-0/0/1.11    Full     1.1.1.1  128  30
    2.0.0.5     ge-0/0/1.2     Full     2.2.2.5  128  34
    2.0.0.6     ge-0/0/1.2     Full     2.2.2.6  128  33
    2.0.0.3     ge-0/0/1.2     2Way     2.2.2.3  128  34
    2.0.0.7     ge-0/0/1.2     2Way     2.2.2.7  128  37
    2.0.0.4     ge-0/0/1.2     2Way     2.2.2.4  128  35
    2.0.0.2     ge-0/0/1.2     2Way     2.2.2.2  128  35
    2.1.2.2     ge-0/0/1.212   Full     2.2.2.2  128  34
    2.1.7.7     ge-0/0/1.217   Full     2.2.2.7  128  37
    juniper@SRX100-6_OSPF> show ospf neighbor
    Address     Interface      State    ID       Pri  Dead
    2.0.0.5     fe-0/0/0.2     Full     2.2.2.5  128  39
    2.0.0.6     fe-0/0/0.2     Full     2.2.2.6  128  30
    2.0.0.3     fe-0/0/0.2     2Way     2.2.2.3  128  39
    2.0.0.7     fe-0/0/0.2     2Way     2.2.2.7  128  33
    2.0.0.4     fe-0/0/0.2     2Way     2.2.2.4  128  38
    2.0.0.1     fe-0/0/0.2     2Way     2.2.2.1  128  38
    2.1.2.1     fe-0/0/0.212   Full     2.2.2.1  128  31
    22.2.2.21   fe-0/0/0.22    Full     1.1.1.2  128  36
    22.2.2.23   fe-0/0/0.22    Full     3.3.3.2  128  31
    2.2.3.3     fe-0/0/0.223   Full     2.2.2.3  128  39
    C3640-1#sh ip ospf neighbor
    Neighbor ID  Pri  State         Dead Time  Address    Interface
    3.3.3.3      0    FULL/  -      00:01:36   33.3.3.33  FastEthernet0/0.33
    1.1.1.3      0    FULL/  -      00:01:59   33.3.3.31  FastEthernet0/0.33
    2.2.2.4      0    FULL/  -      00:00:35   2.3.4.4    FastEthernet0/0.234
    2.2.2.2      0    FULL/  -      00:00:37   2.2.3.2    FastEthernet0/0.223
    2.2.2.1      128  2WAY/DROTHER  00:00:33   2.0.0.1    FastEthernet0/0.2
    2.2.2.2      128  2WAY/DROTHER  00:00:35   2.0.0.2    FastEthernet0/0.2
    2.2.2.4      128  2WAY/DROTHER  00:00:34   2.0.0.4    FastEthernet0/0.2
    2.2.2.5      128  FULL/BDR      00:00:32   2.0.0.5    FastEthernet0/0.2
    2.2.2.6      128  FULL/DR       00:00:37   2.0.0.6    FastEthernet0/0.2
    2.2.2.7      128  2WAY/DROTHER  00:00:35   2.0.0.7    FastEthernet0/0.2
    juniper@SRX210HE_OSPF> show ospf neighbor
    Address     Interface      State    ID       Pri  Dead
    2.0.0.5     ge-0/0/0.2     Full     2.2.2.5  128  35
    2.0.0.7     ge-0/0/0.2     2Way     2.2.2.7  128  38
    2.0.0.6     ge-0/0/0.2     Full     2.2.2.6  128  33
    2.0.0.3     ge-0/0/0.2     2Way     2.2.2.3  128  34
    2.0.0.2     ge-0/0/0.2     2Way     2.2.2.2  128  36
    2.0.0.1     ge-0/0/0.2     2Way     2.2.2.1  128  33
    2.3.4.3     ge-0/0/0.234   Full     2.2.2.3  128  34
    2.4.5.5     ge-0/0/0.245   Full     2.2.2.5  128  34
    44.4.4.41   ge-0/0/0.44    Full     1.1.1.4  128  38
    44.4.4.43   ge-0/0/0.44    Full     3.3.3.4  128  37
    ns208-> get vrouter OSPF protocol ospf neighbor
    VR: OSPF RouterId: 2.2.2.5
    ----------------------------------
    Neighbor(s) on interface ethernet2.7 (Area 0.0.0.0)
    IpAddr/IfIndex  RouterId  Pri  State  Opt  Up        StateChg
    ------------------------------------------------------------------------------
    55.5.5.51       1.1.1.5   128  Full   E    11:53:12  (+7 -0)
    55.5.5.53       3.3.3.5   128  Full   E    11:53:12  (+7 -0)
    Neighbor(s) on interface loopback.2 (Area 0.0.0.0)
    Neighbor(s) on interface ethernet2.5 (Area 0.0.0.0)
    IpAddr/IfIndex  RouterId  Pri  State  Opt  Up        StateChg
    ------------------------------------------------------------------------------
    2.5.6.6         2.2.2.6   128  Full   E    12:27:21  (+8 -1)
    Neighbor(s) on interface ethernet2.4 (Area 0.0.0.0)
    IpAddr/IfIndex  RouterId  Pri  State  Opt  Up        StateChg
    ------------------------------------------------------------------------------
    2.4.5.4         2.2.2.4   128  Full   E    12:27:13  (+8 -1)
    Neighbor(s) on interface ethernet2.2 (Area 0.0.0.0)
    IpAddr/IfIndex  RouterId  Pri  State  Opt  Up        StateChg
    ------------------------------------------------------------------------------
    2.0.0.7         2.2.2.7   128  Full   E    01:47:03  (+11 -1)
    2.0.0.2         2.2.2.2   128  Full   E    01:47:23  (+7 -0)
    2.0.0.4         2.2.2.4   128  Full   E    01:47:23  (+7 -0)
    2.0.0.1         2.2.2.1   128  Full   E    01:47:23  (+7 -0)
    2.0.0.3         2.2.2.3   128  Full   E    01:47:23  (+7 -0)
    2.0.0.6         2.2.2.6   128  Full   E    12:27:22  (+7 -0)
    C2811-1#sh ip ospf neighbor
    Neighbor ID  Pri  State         Dead Time  Address    Interface
    2.2.2.7      0    FULL/  -      00:00:30   2.6.7.7    FastEthernet0/0.267
    2.2.2.5      0    FULL/  -      00:00:36   2.5.6.5    FastEthernet0/0.256
    1.1.1.6      128  FULL/DROTHER  00:00:39   66.6.6.61  FastEthernet0/0.66
    3.3.3.6      128  FULL/BDR      00:00:39   66.6.6.63  FastEthernet0/0.66
    2.2.2.1      128  FULL/DROTHER  00:00:39   2.0.0.1    FastEthernet0/0.2
    2.2.2.2      128  FULL/DROTHER  00:00:35   2.0.0.2    FastEthernet0/0.2
    2.2.2.3      128  FULL/DROTHER  00:00:36   2.0.0.3    FastEthernet0/0.2
    2.2.2.4      128  FULL/DROTHER  00:00:37   2.0.0.4    FastEthernet0/0.2
    2.2.2.5      128  FULL/BDR      00:00:36   2.0.0.5    FastEthernet0/0.2
    2.2.2.7      128  FULL/DROTHER  00:00:30   2.0.0.7    FastEthernet0/0.2
    [admin@RB750G] > /routing ospf neighbor print
     0 instance=default router-id=2.2.2.6 address=2.6.7.6 interface=VLAN267 priority=128
       dr-address=0.0.0.0 backup-dr-address=0.0.0.0 state="Full" state-changes=4
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=1h48m41s

     1 instance=default router-id=2.2.2.3 address=2.0.0.3 interface=CLUSTER2 priority=128
       dr-address=2.0.0.6 backup-dr-address=2.0.0.5 state="2-Way" state-changes=2
       ls-retransmits=0 ls-requests=0 db-summaries=0

     2 instance=default router-id=2.2.2.5 address=2.0.0.5 interface=CLUSTER2 priority=128
       dr-address=2.0.0.6 backup-dr-address=2.0.0.5 state="Full" state-changes=5
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=1h48m43s

     3 instance=default router-id=2.2.2.1 address=2.0.0.1 interface=CLUSTER2 priority=128
       dr-address=2.0.0.6 backup-dr-address=2.0.0.5 state="2-Way" state-changes=2
       ls-retransmits=0 ls-requests=0 db-summaries=0

     4 instance=default router-id=2.2.2.6 address=2.0.0.6 interface=CLUSTER2 priority=128
       dr-address=2.0.0.6 backup-dr-address=2.0.0.5 state="Full" state-changes=5
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=1h48m43s

     5 instance=default router-id=2.2.2.2 address=2.0.0.2 interface=CLUSTER2 priority=128
       dr-address=2.0.0.6 backup-dr-address=2.0.0.5 state="2-Way" state-changes=2
       ls-retransmits=0 ls-requests=0 db-summaries=0

     6 instance=default router-id=2.2.2.4 address=2.0.0.4 interface=CLUSTER2 priority=128
       dr-address=2.0.0.6 backup-dr-address=2.0.0.5 state="2-Way" state-changes=2
       ls-retransmits=0 ls-requests=0 db-summaries=0

     7 instance=default router-id=2.2.2.1 address=2.1.7.1 interface=VLAN217 priority=128
       dr-address=0.0.0.0 backup-dr-address=0.0.0.0 state="Full" state-changes=5
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=1h48m49s

     8 instance=default router-id=3.3.3.7 address=77.7.7.73 interface=P2MP7 priority=128
       dr-address=77.7.7.73 backup-dr-address=77.7.7.72 state="Full" state-changes=5
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=1h48m26s

     9 instance=default router-id=1.1.1.7 address=77.7.7.71 interface=P2MP7 priority=128
       dr-address=77.7.7.73 backup-dr-address=77.7.7.72 state="Full" state-changes=12
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=48m59s
    juniper@SRX100-5_OSPF> show ospf neighbor
    Address     Interface      State    ID       Pri  Dead
    11.1.1.11   fe-0/0/1.11    Full     1.1.1.1  128  31
    11.1.1.21   fe-0/0/1.11    Full     2.2.2.1  128  38
    3.0.0.3     fe-0/0/1.3     2Way     3.3.3.3  128  39
    3.0.0.5     fe-0/0/1.3     2Way     3.3.3.5  128  34
    3.0.0.7     fe-0/0/1.3     Full     3.3.3.7  128  38
    3.0.0.2     fe-0/0/1.3     2Way     3.3.3.2  128  34
    3.0.0.6     fe-0/0/1.3     Full     3.3.3.6  128  37
    3.0.0.4     fe-0/0/1.3     2Way     3.3.3.4  128  39
    3.1.2.2     fe-0/0/1.312   Full     3.3.3.2  128  35
    3.1.7.7     fe-0/0/1.317   Full     3.3.3.7  128  38
    juniper@J2300-7> show ospf neighbor
    Address     Interface      State    ID       Pri  Dead
    22.2.2.21   fe-0/0/0.22    Full     1.1.1.2  128  31
    22.2.2.22   fe-0/0/0.22    Full     2.2.2.2  128  33
    3.0.0.5     fe-0/0/0.3     2Way     3.3.3.5  128  31
    3.0.0.7     fe-0/0/0.3     Full     3.3.3.7  128  30
    3.0.0.3     fe-0/0/0.3     2Way     3.3.3.3  128  33
    3.0.0.1     fe-0/0/0.3     2Way     3.3.3.1  128  33
    3.0.0.4     fe-0/0/0.3     2Way     3.3.3.4  128  34
    3.0.0.6     fe-0/0/0.3     Full     3.3.3.6  128  30
    3.1.2.1     fe-0/0/0.312   Full     3.3.3.1  128  38
    3.2.3.3     fe-0/0/0.323   Full     3.3.3.3  128  32
The adjacency to 1.1.1.3 is down due to the MTU mismatch discussed earlier.
    C3750-1#show ip ospf nei
    Neighbor ID  Pri  State         Dead Time  Address    Interface
    3.3.3.4      0    FULL/  -      00:00:39   3.3.4.4    Vlan334
    3.3.3.2      0    FULL/  -      00:00:37   3.2.3.2    Vlan323
    1.1.1.3      0    DOWN/  -      -          33.3.3.31  Vlan33
    2.2.2.3      0    FULL/  -      00:01:45   33.3.3.32  Vlan33
    3.3.3.1      128  2WAY/DROTHER  00:00:32   3.0.0.1    Vlan3
    3.3.3.2      128  2WAY/DROTHER  00:00:37   3.0.0.2    Vlan3
    3.3.3.4      128  2WAY/DROTHER  00:00:35   3.0.0.4    Vlan3
    3.3.3.5      128  2WAY/DROTHER  00:00:38   3.0.0.5    Vlan3
    3.3.3.6      128  FULL/BDR      00:00:38   3.0.0.6    Vlan3
    3.3.3.7      128  FULL/DR       00:00:39   3.0.0.7    Vlan3
    juniper@SRX100-7_OSPF> show ospf neighbor
    Address     Interface      State    ID       Pri  Dead
    3.0.0.3     fe-0/0/0.3     2Way     3.3.3.3  128  39
    3.0.0.5     fe-0/0/0.3     2Way     3.3.3.5  128  39
    3.0.0.7     fe-0/0/0.3     Full     3.3.3.7  128  33
    3.0.0.2     fe-0/0/0.3     2Way     3.3.3.2  128  32
    3.0.0.6     fe-0/0/0.3     Full     3.3.3.6  128  31
    3.0.0.1     fe-0/0/0.3     2Way     3.3.3.1  128  37
    3.3.4.3     fe-0/0/0.334   Full     3.3.3.3  128  35
    3.4.5.5     fe-0/0/0.345   Full     3.3.3.5  128  38
    44.4.4.41   fe-0/0/0.44    Full     1.1.1.4  128  32
    44.4.4.42   fe-0/0/0.44    Full     2.2.2.4  128  35
    {master:0}
    copek@EX2200C-3> show ospf neighbor
    Address     Interface      State    ID       Pri  Dead
    3.0.0.3     ge-0/0/8.3     2Way     3.3.3.3  128  31
    3.0.0.6     ge-0/0/8.3     Full     3.3.3.6  128  39
    3.0.0.7     ge-0/0/8.3     Full     3.3.3.7  128  31
    3.0.0.2     ge-0/0/8.3     2Way     3.3.3.2  128  33
    3.0.0.1     ge-0/0/8.3     2Way     3.3.3.1  128  36
    3.0.0.4     ge-0/0/8.3     2Way     3.3.3.4  128  38
    3.4.5.4     ge-0/0/8.345   Full     3.3.3.4  128  39
    3.5.6.6     ge-0/0/8.356   Full     3.3.3.6  128  39
    55.5.5.52   ge-0/0/8.55    Full     2.2.2.5  128  39
    55.5.5.51   ge-0/0/8.55    Full     1.1.1.5  128  38
    [admin@RB133] > /routing ospf neighbor print
     0 instance=default router-id=3.3.3.3 address=3.0.0.3 interface=CLUSTER3 priority=128
       dr-address=3.0.0.7 backup-dr-address=3.0.0.6 state="Full" state-changes=4
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=41m10s

     1 instance=default router-id=1.1.1.6 address=66.6.6.61 interface=P2MP6 priority=128
       dr-address=66.6.6.62 backup-dr-address=66.6.6.63 state="Full" state-changes=4
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=58m15s

     2 instance=default router-id=3.3.3.7 address=3.0.0.7 interface=CLUSTER3 priority=128
       dr-address=3.0.0.7 backup-dr-address=3.0.0.6 state="Full" state-changes=5
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=1h57m7s

     3 instance=default router-id=3.3.3.7 address=3.6.7.7 interface=VLAN367 priority=128
       dr-address=0.0.0.0 backup-dr-address=0.0.0.0 state="Full" state-changes=4
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=1h57m7s

     4 instance=default router-id=3.3.3.5 address=3.5.6.5 interface=VLAN356 priority=128
       dr-address=0.0.0.0 backup-dr-address=0.0.0.0 state="Full" state-changes=4
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=1h57m8s

     5 instance=default router-id=3.3.3.1 address=3.0.0.1 interface=CLUSTER3 priority=128
       dr-address=3.0.0.7 backup-dr-address=3.0.0.6 state="Full" state-changes=5
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=1h56m58s

     6 instance=default router-id=3.3.3.4 address=3.0.0.4 interface=CLUSTER3 priority=128
       dr-address=3.0.0.7 backup-dr-address=3.0.0.6 state="Full" state-changes=8
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=1h56m58s

     7 instance=default router-id=3.3.3.5 address=3.0.0.5 interface=CLUSTER3 priority=128
       dr-address=3.0.0.7 backup-dr-address=3.0.0.6 state="Full" state-changes=5
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=1h56m58s

     8 instance=default router-id=3.3.3.2 address=3.0.0.2 interface=CLUSTER3 priority=128
       dr-address=3.0.0.7 backup-dr-address=3.0.0.6 state="Full" state-changes=5
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=1h56m58s

     9 instance=default router-id=2.2.2.6 address=66.6.6.62 interface=P2MP6 priority=128
       dr-address=66.6.6.62 backup-dr-address=66.6.6.63 state="Full" state-changes=5
       ls-retransmits=0 ls-requests=0 db-summaries=0 adjacency=1h57m16s
    C1760-1#sh ip ospf neighbor
    Neighbor ID  Pri  State         Dead Time  Address    Interface
    2.2.2.7      128  FULL/BDR      00:01:38   77.7.7.72  FastEthernet0/0.77
    1.1.1.7      128  FULL/DROTHER  00:01:53   77.7.7.71  FastEthernet0/0.77
    3.3.3.6      0    FULL/  -      00:00:31   3.6.7.6    FastEthernet0/0.367
    3.3.3.1      0    FULL/  -      00:00:34   3.1.7.1    FastEthernet0/0.317
    3.3.3.1      128  FULL/DROTHER  00:00:33   3.0.0.1    FastEthernet0/0.3
    3.3.3.2      128  FULL/DROTHER  00:00:34   3.0.0.2    FastEthernet0/0.3
    3.3.3.3      128  FULL/DROTHER  00:00:38   3.0.0.3    FastEthernet0/0.3
    3.3.3.4      128  FULL/DROTHER  00:00:34   3.0.0.4    FastEthernet0/0.3
    3.3.3.5      128  FULL/DROTHER  00:00:39   3.0.0.5    FastEthernet0/0.3
    3.3.3.6      128  FULL/BDR      00:00:31   3.0.0.6    FastEthernet0/0.3
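With this many boxes, eyeballing adjacency states gets tedious. On a broadcast segment a router that is neither DR nor BDR normally goes Full only with the DR and BDR and stays in 2-Way with everyone else, so a quick tally of states per router is a handy sanity check. The following is a minimal sketch for the Junos-style output only; `tally_states` is a hypothetical helper, and the sample rows are copied from the Olive1GB output above.

```python
from collections import Counter

# Rows copied from the Olive1GB "show ospf neighbor" output.
SAMPLE = """\
Address     Interface  State  ID       Pri  Dead
1.0.0.3     fxp1.1     2Way   1.1.1.3  128  38
1.0.0.6     fxp1.1     2Way   1.1.1.6  128  31
1.0.0.4     fxp1.1     Full   1.1.1.4  128  31
1.0.0.2     fxp1.1     2Way   1.1.1.2  128  32
1.0.0.5     fxp1.1     Full   1.1.1.5  128  34
1.0.0.1     fxp1.1     2Way   1.1.1.1  128  34
1.1.7.1     fxp1.117   Full   1.1.1.1  128  34
1.6.7.6     fxp1.167   Full   1.1.1.6  128  31
77.7.7.72   fxp1.77    Full   2.2.2.7  128  94
77.7.7.73   fxp1.77    Full   3.3.3.7  128  101
"""


def tally_states(output):
    """Count neighbors per adjacency state in Junos-style output."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a six-column row.
        if len(fields) != 6 or fields[0] == "Address":
            continue
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    for state, count in sorted(tally_states(SAMPLE).items()):
        print(state, count)
```

The other platforms use different column orders and state names (`2-Way/DROther`, `TwoWay`, `full/other`), so each would need its own small parser.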
exaBGP was started from an "off-net" Linux box and set up to peer with the Olive with the most memory. The script sent 10 prefixes to the Olive every second as IPv4 unicast BGP routes. The Olive had to parse them and then re-advertise each one as an OSPF external route, which it did as soon as the route was determined to be valid and deemed suitable for readvertisement. The exaBGP run is shown below:
    user@linux-box:~$ exabgp /etc/exabgp/exabgp.conf
    Sun, 06 Jan 2013 00:22:17 INFO 11392 configuration Performing reload of exabgp 1.3.4
    Sun, 06 Jan 2013 00:22:17 INFO 11392 supervisor New Peer 172.20.1.66
    Sun, 06 Jan 2013 00:22:17 INFO 11392 configuration Loaded new configuration successfully
    Sun, 06 Jan 2013 00:22:17 INFO 11392 processes Forked process service-1
    trap: SIGINT: bad trap
    Sun, 06 Jan 2013 00:22:17 INFO 11392 processes Command from process service-1 : announce route 113.252.0.0/17 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:17 INFO 11392 processes Command from process service-1 : announce route 223.124.11.116/30 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:17 INFO 11392 processes Command from process service-1 : announce route 148.217.8.0/21 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:17 INFO 11392 processes Command from process service-1 : announce route 245.229.64.0/19 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:17 INFO 11392 processes Command from process service-1 : announce route 128.0.0.0/6 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:17 INFO 11392 supervisor Performing dynamic route update
    Sun, 06 Jan 2013 00:22:17 INFO 11392 supervisor Updated peers dynamic routes successfully
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> OPEN version=4 asn=65069 hold_time=180 router_id=66.66.66.66 capabilities=[Multiprotocol for IPv4 unicast IPv6 unicast IPv4 flow-ipv4, 4Bytes AS 65069]
    Sun, 06 Jan 2013 00:22:17 INFO 11392 processes Command from process service-1 : announce route 99.67.40.0/21 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:17 INFO 11392 processes Command from process service-1 : announce route 147.20.72.0/22 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:17 INFO 11392 processes Command from process service-1 : announce route 82.156.0.0/17 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:17 INFO 11392 processes Command from process service-1 : announce route 224.0.0.0/9 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:17 INFO 11392 processes Command from process service-1 : announce route 85.241.220.0/22 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:17 INFO 11392 supervisor Performing dynamic route update
    Sun, 06 Jan 2013 00:22:17 INFO 11392 supervisor Updated peers dynamic routes successfully
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 << OPEN version=4 asn=65066 hold_time=90 router_id=1.1.1.3 capabilities=[Cisco Route Refresh, Multiprotocol for IPv4 unicast, Route Refresh, Graceful Restart, 4Bytes AS 65066]
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> KEEPALIVE (OPENCONFIRM)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 << KEEPALIVE (ESTABLISHED)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> UPDATE (update)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> UPDATE (update)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> UPDATE (update)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> UPDATE (update)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> UPDATE (update)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> UPDATE (update)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> UPDATE (update)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> UPDATE (update)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> UPDATE (update)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> UPDATE (update)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> 10 UPDATE(s)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> KEEPALIVE (no more UPDATE and no EOR)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> UPDATE (0)
    Sun, 06 Jan 2013 00:22:17 INFO 11392 message Peer 172.20.1.66 ASN 65066 << KEEPALIVE
    Sun, 06 Jan 2013 00:22:18 INFO 11392 processes Command from process service-1 : announce route 220.222.228.0/23 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:18 INFO 11392 processes Command from process service-1 : announce route 65.88.32.0/19 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:18 INFO 11392 processes Command from process service-1 : announce route 152.0.0.0/6 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:18 INFO 11392 processes Command from process service-1 : announce route 37.200.117.128/30 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:18 INFO 11392 processes Command from process service-1 : announce route 156.0.0.0/7 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:18 INFO 11392 processes Command from process service-1 : announce route 207.149.60.16/31 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:18 INFO 11392 processes Command from process service-1 : announce route 75.128.0.0/9 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:18 INFO 11392 processes Command from process service-1 : announce route 154.45.172.0/22 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:18 INFO 11392 processes Command from process service-1 : announce route 172.0.0.0/8 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:18 INFO 11392 supervisor Performing dynamic route update
    Sun, 06 Jan 2013 00:22:18 INFO 11392 supervisor Updated peers dynamic routes successfully
    Sun, 06 Jan 2013 00:22:18 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> UPDATE (9)
    Sun, 06 Jan 2013 00:22:18 INFO 11392 processes Command from process service-1 : announce route 96.252.114.0/26 next-hop 10.0.0.0
    Sun, 06 Jan 2013 00:22:18 INFO 11392 supervisor Performing dynamic route update
    Sun, 06 Jan 2013 00:22:18 INFO 11392 supervisor Updated peers dynamic routes successfully
    Sun, 06 Jan 2013 00:22:18 INFO 11392 message Peer 172.20.1.66 ASN 65066 >> UPDATE (1)
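The `announce route ... next-hop 10.0.0.0` lines in the log are what the helper process writes to its standard output for exaBGP to pick up. My setup used a shell script wrapped around the generator; the following is only a single-file Python stand-in for that pair, a sketch rather than the original code. The names `announce_line` and `run`, the batching of 10 lines followed by a one-second sleep, and the injectable `out` and `sleep` parameters are assumptions made for illustration.

```python
import random
import sys
import time


def random_prefix(rng=random):
    """Random IPv4 network in CIDR form; no bogon or martian filtering."""
    ip = rng.randint(0, 2**32 - 1)
    length = rng.randint(0, 31)
    mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
    net = ip & mask
    octets = [(net >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return "%d.%d.%d.%d/%d" % (octets[0], octets[1], octets[2], octets[3], length)


def announce_line(prefix, next_hop="10.0.0.0"):
    """One command in the form seen in the exaBGP log."""
    return "announce route %s next-hop %s" % (prefix, next_hop)


def run(per_second=10, seconds=None, out=sys.stdout, sleep=time.sleep):
    """Write per_second announcements, pause a second, repeat.

    seconds=None loops until the process is killed.
    Returns the number of announcements written.
    """
    sent = 0
    elapsed = 0
    while seconds is None or elapsed < seconds:
        for _ in range(per_second):
            out.write(announce_line(random_prefix()) + "\n")
            sent += 1
        # Flush so the parent process sees the batch right away.
        out.flush()
        sleep(1)
        elapsed += 1
    return sent
```

Calling `run()` with no arguments keeps announcing until the process is stopped; `run(seconds=60)` would stop after roughly 600 prefixes.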
Everything was going well, but after a while (anywhere from 4k to 40k prefixes), the Olives would lose all of their neighbors:
    juniper@Olive1GB> show ospf neighbor
    Address     Interface      State    ID       Pri  Dead
    77.7.7.72   fxp1.77        Down     0.0.0.0  0    0
    77.7.7.73   fxp1.77        Down     3.3.3.7  128  0
I had to restart the Olive, or even the whole VM instance, to get the Olives to re-form any adjacencies. After a while, I narrowed this down to the emulated network card, an Intel 10/100 NIC, which would eventually just stop processing traffic. I tried three different models supported by QEMU-kvm, and although the i82557b model seemed to work the best, it would still go to sleep after about 40k routes. So I swung the BGP-to-OSPF export duties over to the J2300, and it seemed to perform very well.
At first everything was fine, until the network had about 2,000 external LSAs floating around. Then the Cisco 3750 started to complain:
    *Mar 1 10:39:41.477: %PLATFORM_UCAST-6-PREFIX: One or more, more specific prefixes could not be programmed into TCAM and are being covered by a less specific prefix

Then it began to complain a lot more:
    *Mar 1 10:41:00.799: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x295E598, alignment 0
    Pool: Processor Free: 56772 Cause: Not enough free memory
    Alternate Pool: None Free: 0 Cause: No Alternate pool
    -Process= "IP RIB Update", ipl= 0, pid= 251
    -Traceback= 1EA47C0 1EA4F0C 294A8A0 294CDC0 294D0D0 295E59C 295F1D8 149B6D0 13E0CAC 1FF7A48 1F9F724 1FB98D4 1FB9A5C 1FA4310 1FA4438 1FA4870
    *Mar 1 10:41:00.799: %COMMON_FIB-3-HW_API: HW API failure for IPv4 CEF [0x0149B6E8]: Platform IPv4 Fib malloc failed (fatal) (0 subsequent failures).
    *Mar 1 10:41:00.799: %COMMON_FIB-4-DISABLING: IPv4 CEF is being disabled due to a fatal error.
    *Mar 1 10:41:00.908: %COMMON_FIB-3-HW_API: HW API failure for IPv4 CEF [0x0149B6E8]: Platform IPv4 Fib malloc failed (fatal) (0 subsequent failures).
    *Mar 1 10:41:00.908: %COMMON_FIB-4-DISABLING: IPv4 CEF is being disabled due to a fatal error.
    *Mar 1 10:41:00.950: %COMMON_FIB-3-HW_API: HW API failure for IPv4 CEF [0x0149B6E8]: Platform IPv4 Fib malloc failed (fatal) (5 subsequent failures).
    *Mar 1 10:41:00.950: %COMMON_FIB-4-DISABLING: IPv4 CEF is being disabled due to a fatal error.
    *Mar 1 10:41:01.026: %COMMON_FIB-3-HW_API: HW API failure for IPv4 CEF [0x0149B6E8]: Platform IPv4 Fib malloc failed (fatal) (0 subsequent failures).
    *Mar 1 10:41:01.034: %COMMON_FIB-4-DISABLING: IPv4 CEF is being disabled due to a fatal error.
    *Mar 1 10:41:22.492: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]"

At that point, the C3750 just complained about having to disable CEF for a while as it ignored LSAs. Then it decided that, due to too many requests from its neighbors for LSA acknowledgments, it would just drop off the network:
    *Mar 1 10:41:25.889: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from EXSTART to DOWN, Neighbor Down: Too many retransmissions
    *Mar 1 10:42:22.521: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]"
    *Mar 1 10:42:25.893: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from DOWN to DOWN, Neighbor Down: Ignore timer expired
    *Mar 1 10:43:22.549: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]"
    *Mar 1 10:44:22.578: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]"
    *Mar 1 10:44:38.541: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from EXSTART to DOWN, Neighbor Down: Too many retransmissions
    *Mar 1 10:45:22.606: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]"
    *Mar 1 10:45:38.544: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from DOWN to DOWN, Neighbor Down: Ignore timer expired
    *Mar 1 10:46:22.634: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]"
    *Mar 1 10:47:22.663: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]"
    *Mar 1 10:48:01.217: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from EXSTART to DOWN, Neighbor Down: Too many retransmissions
    *Mar 1 10:48:22.691: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]"
    *Mar 1 10:49:01.220: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from DOWN to DOWN, Neighbor Down: Ignore timer expired

This sequence repeated ad infinitum. However, the box was still responsive through the console, so measures could have been taken to stop the onslaught and protect the switch.
Not long thereafter, at 4K external LSAs, the two Mikrotik routers seemed to start working a bit harder. The CPU on the RB750G was running at about 5%, and the poor little RB133 was running at about 20%. Just shy of 8,000 external LSAs, the EX2200C made a complaint:

    Jan 6 13:04:10 EX2200C-3 rpd[1075]: RPD_RT_PREFIX_LIMIT_REACHED: Number of prefixes (8000) in table inet.0 still exceeds or equals configured maximum (8000)

Despite its bitching, it continued to process LSAs just fine. Then the first real death happened as I lost the telnet connection to the RB750G. The RB750G was totally unresponsive and I could not even ping it. I power cycled it, and it came back for a while with an ICMP echo-reply, but it never let me log in again. It would reply to my pings for a few seconds, and then it was gone again.
At just shy of 2^14 (16K) LSAs, the EX3200 voiced a complaint:
Jan 6 12:11:13 EX3200-2_OSPF rpd[1081]: RPD_RT_PREFIX_LIMIT_REACHED: Number of prefixes (16384) in table inet.0 still exceeds or equals configured maximum (16384)
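Both Junos complaints have the same shape, so pulling the numbers out of a syslog capture is straightforward. This is a minimal sketch, assuming the message text stays exactly as captured here; `LIMIT_RE` and `parse_prefix_limit` are names invented for this example.

```python
import re

# Pattern built from the two RPD_RT_PREFIX_LIMIT_REACHED messages shown
# in this post; other Junos releases may word the message differently.
LIMIT_RE = re.compile(
    r"(?P<host>\S+) rpd\[\d+\]: RPD_RT_PREFIX_LIMIT_REACHED: "
    r"Number of prefixes \((?P<count>\d+)\) in table (?P<table>\S+) "
    r"still exceeds or equals configured maximum \((?P<maximum>\d+)\)"
)

LINES = [
    "Jan 6 13:04:10 EX2200C-3 rpd[1075]: RPD_RT_PREFIX_LIMIT_REACHED: "
    "Number of prefixes (8000) in table inet.0 still exceeds or equals "
    "configured maximum (8000)",
    "Jan 6 12:11:13 EX3200-2_OSPF rpd[1081]: RPD_RT_PREFIX_LIMIT_REACHED: "
    "Number of prefixes (16384) in table inet.0 still exceeds or equals "
    "configured maximum (16384)",
]


def parse_prefix_limit(line):
    """Return (host, table, count, maximum), or None if the line differs."""
    match = LIMIT_RE.search(line)
    if match is None:
        return None
    return (
        match.group("host"),
        match.group("table"),
        int(match.group("count")),
        int(match.group("maximum")),
    )


if __name__ == "__main__":
    for line in LINES:
        print(parse_prefix_limit(line))
```

The EX3200 figure is exactly 2^14 = 16384, which matches the "16K" mentioned above.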
At 20k LSAs, the Cisco 1760 entered an infinite reboot cycle with the following cry for help:
*Mar 5 12:27:37.998: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x8001FD58, alignment 0 Pool: Processor Free: 357580 Cause: Memory fragmentation Alternate Pool: I/O Free: 108940 Cause: Memory fragmentation -Process= "OSPF Router 1", ipl= 0, pid= 157 -Traceback= 8000C1CC 8000F868 8001FD5C 811A5B38 811A5D94 811A7610 81434CF8 8143CB18 8143CE78 8143CBB0 8143CC54 8143F950 8143FBB8 81400064 80561688 80565EFC %Software-forced reload Unexpected exception to CPUvector 700, PC = 8055FCCC -Traceback= 8055FCCC 80020D78 8141356C 8142DA8C 813FFFDC 80561688 80565EFC Writing crashinfo to flash:crashinfo_20020305-122750 === Flushing messages (12:27:50 UTC Tue Mar 5 2002) === Queued messages: *Mar 5 12:27:50.911: %SYS-3-LOGGER_FLUSHING: System pausing to ensure console debugging output. *Mar 5 12:27:50.879: %SYS-2-CHUNKNOROOT: Root chunk need to be specified for 8380FC48 -Process= "OSPF Router 1", ipl= 0, pid= 157 -Traceback= 80020D70 8141356C 8142DA8C 813FFFDC 80561688 80565EFC *** System received a Software forced crash *** signal= 0x17, code= 0x700, context= 0x8328be48 PC = 0x8055fccc, Vector = 0x700, SP = 0x837f4370 System Bootstrap, Version 12.2(4r)XL, RELEASE SOFTWARE (fc1) TAC Support: http://www.cisco.com/tac Copyright (c) 2001 by cisco Systems, Inc. 
C1700 platform with 98304 Kbytes of main memory with 115K -Process= "OSPF Router 1", ipl= 0, pid= 157 -Traceback= 8000C1CC 8000F868 8001FD5C 814134C8 8142DA8C 813FFFDC 80561688 80565EFC %% Low on memory; try again later %% Low on memory; try again later *Mar 5 14:56:15.041: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.1 on FastEthernet0/0.317 from EXSTART to DOWN, Neighbor Down: Too many retransmissions *Mar 5 14:56:17.036: %SYS-2-MALLOCFAIL: Memory allocation of 20000 bytes failed from 0x8001FD58, alignment 0 Pool: Processor Free: 294980 Cause: Memory fragmentation Alternate Pool: I/O Free: 52 Cause: Not enough free memory -Process= "OSPF Router 1", ipl= 0, pid= 157 -Traceback= 8000C1CC 8000F868 8001FD5C 814134C8 8142DA8C 813FFFDC 80561688 80565EFC
The Cisco 1760 would come back to life, proceed to restuff its LSDB over the top, and shriek the same cry of pain. If you were quick, you could have saved it from its agony from the console port.
At 22K routes, the RB133's CPU pegged at 100%. The CLI through the serial port was completely unresponsive.

    [admin@RB133] > /system resource monitor
    cpu-used: 99
    cpu-used-per-cpu: 99%
    free-memory: 6624
    -- [Q quit|D dump|C-z pause]

I'd hit enter, and minutes later I might get another carriage return. After a long while the message:
action timed out - try again, if error continues contact MikroTik support and send a supout file (13)
appeared in the midst of my repeated bashing of the "Enter" key. There was no way to save the RB133; it was too slow to do anything.
Then the Netscreen 208 went, displaying by far the bloodiest message. I deleted about three quarters of the contents for brevity. As soon as it came back on the network, it would repeat its cycle of agony.
ns208-> timer is NULL during creation timer is NULL during creation timer handler memory allocation failed timer handler memory allocation failed frag 7128230: bad pointer 07128230, task unknown (517) =0000000007128230: 00 00 00 00 00 04 22 05 15 a0 00 00 03 03 03 02 ......". ........ 07128240: 80 00 00 01 95 db 00 24 ff ff 80 00 80 00 00 00 .......$ ........ 07128280: 00 04 22 05 95 df ff ff 03 03 03 02 80 00 00 01 .."..... ........ **** overwriting suspect: 7128130 -- start 0x07128130 0x07127500 0x07128230 -- used block, allocation trace: 004d5954 004d8c10 004dd230 -- size 224, handler 0, task 65, used mask 15 07128130: 00 00 00 ef 19 13 08 41 00 4d 59 54 00 4d 8c 10 .......A .MYT.M.. 07128320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........ **** end overwrting suspect ###Frag Check, Trace: 0021e288 0021e33c 0021e4cc 0021f074 0021d5b0 004eeee8 004eee0c 004dcac4 004dd404 004e0440 004ed0a4 004ed460 004fc59c 004ecb10 03257eec ###severe problem,can't free memory, Trace: 0021f08c 0021d5b0 004eeee8 004eee0c 004dcac4 004dd404 004e0440 004ed0a4 004ed460 004fc59c 004ecb10 03257eec *********************** ScreenOS Crash Context: *********************** bad:80000008 cnt:982192f6 cmp:9821ca2d sts:b4009013 ------------- ASIC Context: ------------- config1 11490093, config2 002b0082 md5 control 00000000 md5 pak base 00000000 des status 00000000 des dma adr 00000000 des dma cnt 00000000 ----------- OS Context: ----------- Died Flow/bootup Module Cur Task Context: ospf ------------ Memory Check: ------------ frag 7128130: bad next pointer 07128230, task ospf (65) =0000000007128230: 00 00 00 00 00 04 22 05 15 a0 00 00 03 03 03 02 ......". ........ 07128280: b1 80 00 00 03 03 03 02 80 00 00 02 a0 53 00 24 ........ 
.....S.$ **** overwriting suspect: 7128130 -- start 0x07128130 0x07127500 0x07128230 -- used block, allocation trace: 004d5954 004d8c10 004dd230 -- size 224, handler 0, task 65, used mask 15 07128130: 00 00 00 ef 19 13 08 41 00 4d 59 54 00 4d 8c 10 .......A .MYT.M.. 07128320: a0 53 00 24 ff 80 00 00 80 00 00 00 00 00 00 00 .S.$.... ........ **** end overwrting suspect ###Frag Check, Trace: 0021e288 0021e36c 0021e4cc 0021f630 0021f54c 0008371c 00081790 frag 7128230: bad pointer 07128230, task unknown (517) =0000000007128230: 00 00 00 00 00 04 22 05 15 a0 00 00 03 03 03 02 ......". ........ 07128280: b1 80 00 00 03 03 03 02 80 00 00 02 a0 53 00 24 ........ .....S.$ **** overwriting suspect: 7128130 -- start 0x07128130 0x07127500 0x07128230 -- used block, allocation trace: 004d5954 004d8c10 004dd230 -- size 224, handler 0, task 65, used mask 15 07128320: a0 53 00 24 ff 80 00 00 80 00 00 00 00 00 00 00 .S.$.... ........ **** end overwrting suspect ###Frag Check, Trace: 0021e288 0021e33c 0021e4cc 0021f630 0021f54c 0008371c 00081790 frag 80000000: bad pointer 80000000, task unknown (1536) =0000000380000000: 3c 1a 01 b3 67 5a de 00 af 41 00 00 3c 01 a0 00 <...gZ.. .A..<... 80000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........ Can not find overwrite suspect for 0x80000000 Address out of heap range ###Frag Check, Trace: 0021e288 0021e33c 0021e4cc 0021f630 0021f54c 0008371c 00081790 80000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........ 80000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........ Can not find overwrite suspect for 0x80000000 Address out of heap range ###Frag Check, Trace: 0021e288 0021e33c 0021e4cc 0021f630 0021f54c 0008371c 00081790 NetScreen NS-200 Boot Loader Version 3.0.0 (Checksum: B48FB1B8) Copyright (c) 1997-2003 NetScreen Technologies, Inc. 
Total physical memory: 128MB Test - Pass Initialization - Done Model Number: NS-208 Hit any key to run loader Hit any key to run loader Hit any key to run loader Hit any key to run loader Loading default system image from on-board flash disk... Image authenticated! Start loading... .................................................................................................... Done. Juniper Networks, Inc NS-200 System Software Copyright, 1997-2006 Version 5.4.0r18.0 Init Heap (1ebd010/5342bf0, 00000000/00000000) GT64120 revision id: 0x12 Load NVRAM Information ... (5.4.0)Done GT64120 revision id: 0x12 Memory Test: b7800000,40000 ....... Done Install module init vectors Verify ACL register default value (at hw reset) ... Done Verify ACL register read/write ... Done Verify ACL rule read/write ... Done Verify ACL rule search ... Done MD5("a") = 0cc175b9 c0f1b6a8 31c399e2 69772661 MD5("abc") = 90015098 3cd24fb0 d6963f7d 28e17f72 MD5("message digest") = f96b697d 7cb7938d 525a2f31 aaf161d0 Verify DES register read/write ... Done Install modules (00db4000,0197fb38) ... Initializing DI 1.1.0-ns load dns table : dns table file does not exist. System config (1387 bytes) loaded . Done. Load System Configuration ...............................................................................................................................Enabled licensekey auto update .....................................Done system init done.. login: ethernet1 interface change physical state to Up ethernet2 interface change physical state to Up System change state to Active(1)
Once we had the Cisco 3750 causing retransmissions, and the C1760 and NS208 constantly rebooting, the tiny 17-node (21 minus 2x Olive and 2x Mikrotik) OSPF network really seemed to drop into chaos. Despite all of this, the J2300 continued to pump more and more external LSAs into the backbone area. Shortly thereafter, at about 25K routes, the 3640 seemed to give up on life, displaying the following message before it rebooted:
rPool: Processor Free: 109816 Cause: Memory fragmentation Alternate Pool: None Free: 0 Cause: No Alternate pool -Process= "IP RIB Update", ipl= 3, pid= 76 -Traceback= 0x60A9A5A8 0x61074344 0x61079100 0x6107A530 0x61084698 0x61085728 0x6042E188 0x60400780 0x60405CD8 0x6040D42C 0x6040EAA0 0x604161C0 0x60EA9A84 0x60E7AB74 0x61023964 0x61023948
Jan 6 12:08:58.107: %FIB-3-NOMEM: Malloc Failure, disabling CEF -Traceback= 0x60A9A5A8 0x604007DC 0x60405CD8 0x6040D42C 0x6040EAA0 0x604161C0 0x60EA9A84 0x60E7AB74 0x61023964 0x61023948
Jan 6 12:09:03.399: %FIB-3-NOMEM: Malloc Failure, disabling CEF -Traceback= 0x60A9A5A8 0x604007DC 0x60405CD8 0x6040D42C 0x6040EAA0 0x604161C0 0x60EA9A84 0x60E7AB74 0x61023964 0x61023948
Jan 6 12:09:06.475: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet0/0.33 from FULL to DOWN, Neighbor Down: Dead timer expired
Jan 6 12:09:13.855: %FIB-3-NOMEM: Malloc Failure, disabling CEF -Traceback= 0x60A9A5A8 0x604007DC 0x60405CD8 0x6040D42C 0x6040EAA0 0x604161C0 0x60EA9A84 0x60E7AB74 0x61023964 0x61023948
Jan 6 12:09:19.527: %FIB-3-NOMEM: Malloc Failure, disabling CEF -Traceback= 0x60A9A5A8 0x604007DC 0x60405CD8 0x6040D42C 0x6040EAA0 0x604161C0 0x60EA9A84 0x60E7AB74 0x61023964 0x61023948
Jan 6 12:09:28.239: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x61084834, alignment 0 Pool: Processor Free: 1014468 Cause: Memory fragmentation Alternate Pool: None Free: 0 Cause: No Alternate pool -Process= "OSPF-1 Router", ipl= 0, pid= 30 -Traceback= 0x60A9A5A8 0x61074344 0x61079100 0x610795FC 0x6108483C 0x61085728 0x60FDE378 0x60FDFABC 0x60FFA4A4 0x60FC8AEC 0x61023964 0x61023948
Jan 6 12:10:44.551: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.5 on FastEthernet0/0.2 from EXCHANGE to DOWN, Neighbor Down: Dead timer expired
Jan 6 12:11:22.116: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.4 on FastEthernet0/0.2 from LOADING to FULL, Loading Done
C3640-1#
After its first reboot, the 3640 just added to the network churn by blowing its memory bounds time after time after time: dropping all of its neighbors, clearing up its RAM, and starting the process all over again:
Alternate Pool: None Free: 0 Cause: No Alternate pool -Process= "OSPF-1 Router", ipl= 0, pid= 30 -Traceback= 0x60A9A5A8 0x61074344 0x61079100 0x610795FC 0x6108483C 0x61085728 0x60FF5B84 0x60FF803C 0x60FF8684 0x60FC931C 0x61023964 0x61023948
Jan 6 18:16:15.525: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x61084834, alignment 0 Pool: Processor Free: 53296 Cause: Memory fragmentation Alternate Pool: None Free: 0 Cause: No Alternate pool -Process= "OSPF-1 Router", ipl= 0, pid= 30 -Traceback= 0x60A9A5A8 0x61074344 0x61079100 0x610795FC 0x6108483C 0x61085728 0x60FF5B84 0x60FF803C 0x60FF8684 0x60FC931C 0x61023964 0x61023948
Jan 6 18:16:46.389: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x61084834, alignment 0 Pool: Processor Free: 53296 Cause: Memory fragmentation Alternate Pool: None Free: 0 Cause: No Alternate pool -Process= "OSPF-1 Router", ipl= 0, pid= 30 -Traceback= 0x60A9A5A8 0x61074344 0x61079100 0x610795FC 0x6108483C 0x61085728 0x60FF5B84 0x60FF803C 0x60FF8684 0x60FC931C 0x61023964 0x61023948
Jan 6 18:16:55.553: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.5 on FastEthernet0/0.2 from FULL to DOWN, Neighbor Down: Too many retransmissions
Jan 6 18:17:56.430: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.5 on FastEthernet0/0.2 from DOWN to DOWN, Neighbor Down: Ignore timer expired
Jan 6 18:18:07.722: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.5 on FastEthernet0/0.2 from LOADING to FULL, Loading Done
At about 30K external LSAs, the EX2200C stepped up its complaints by oversubscribing its FIB:
Jan 6 13:21:58 EX2200C-3 rpd[1075]: RPD_SCHED_SLIP: 7 sec scheduler slip, user: 5 sec 289760 usec, system: 0 sec, 0 usec
Jan 6 13:22:07 EX2200C-3 fpc0 Failed to Add the IPv4 Uc prefix: lpm-idx 0, vrf-idx 0, prefix/len 31.176.0.0/16 (cstatus: 65565)
Jan 6 13:22:07 EX2200C-3 /kernel: RT_PFE: RT msg op 3 (PREFIX CHANGE) failed, err 5 (Invalid)
Jan 6 13:22:07 EX2200C-3 fpc0 Failed to h/w update ip uc route entry (status: 22)
Jan 6 13:22:07 EX2200C-3 fpc0 Failed to install the RT entry (status: 22)
Jan 6 13:22:07 EX2200C-3 fpc0 RT-HAL,rt_entry_add_msg_proc,2882: rt_halp_vectors->rt_create failed
Jan 6 13:22:07 EX2200C-3 fpc0 RT-HAL,rt_entry_add_msg_proc,2942: proto ipv4,len 16 prefix 31.176/16 nh 1329
Jan 6 13:22:07 EX2200C-3 fpc0 RT-HAL,rt_msg_handler,601: route process failed
Jan 6 13:22:07 EX2200C-3 fpc0 Failed to Add the IPv4 Uc prefix: lpm-idx 0, vrf-idx 0, prefix/len 36.246.0.0/15 (cstatus: 65565)
Jan 6 13:22:07 EX2200C-3 fpc0 Failed to h/w update ip uc route entry (status: 22)
Jan 6 13:22:07 EX2200C-3 fpc0 Failed to install the RT entry (status: 22)
Jan 6 13:22:07 EX2200C-3 fpc0 RT-HAL,rt_entry_add_msg_proc,2882: rt_halp_vectors->rt_create failed
Jan 6 13:22:07 EX2200C-3 fpc0 RT-HAL,rt_entry_add_msg_proc,2942: proto ipv4,len 15 prefix 36.246/15 nh 1329
Jan 6 13:22:07 EX2200C-3 fpc0 RT-HAL,rt_msg_handler,601: route process failed
Jan 6 13:22:08 EX2200C-3 fpc0 Failed to Add the IPv4 Uc prefix: lpm-idx 0, vrf-idx 0, prefix/len 46.144.64.0/18 (cstatus: 65565)
Jan 6 13:22:08 EX2200C-3 fpc0 Failed to h/w update ip uc route entry (status: 22)
Jan 6 13:22:08 EX2200C-3 /kernel: RT_PFE: RT msg op 3 (PREFIX CHANGE) failed, err 5 (Invalid)
Jan 6 13:22:08 EX2200C-3 /kernel: RT_PFE: RT msg op 3 (PREFIX CHANGE) failed, err 5 (Invalid)
Jan 6 13:22:08 EX2200C-3 fpc0 Failed to install the RT entry (status: 22)
Jan 6 13:22:08 EX2200C-3 fpc0 RT-HAL,rt_entry_add_msg_proc,2882: rt_halp_vectors->rt_create failed
Jan 6 13:22:08 EX2200C-3 fpc0 RT-HAL,rt_entry_add_msg_proc,2942: proto ipv4,len 18 prefix 46.144.64/18 nh 1329
Jan 6 13:22:08 EX2200C-3 fpc0 RT-HAL,rt_msg_handler,601: route process failed
Jan 6 13:22:08 EX2200C-3 fpc0 Failed to Add the IPv4 Uc prefix: lpm-idx 0, vrf-idx 0, prefix/len 47.41.15.240/29 (cstatus: 65565)
To its credit, the little EX2200C kept running just fine. It was easily accessible the whole time.
Somewhere in this range the first open-source implementation displayed the first serious problems, with (surprise, surprise) XORP giving up:
[ 2013/01/06 13:21:12.16138 INFO xorp_rib RIB ] Received death event for protocol ospfv2 shutting down ------- OriginTable: ospf IGP next table = Redist:ospf
[ 2013/01/06 13:21:12.16211 INFO xorp_rib RIB ] Received death event for protocol ospfv2 shutting down ------- OriginTable: ospf IGP next table = Redist:ospf
[ 2013/01/06 13:21:12.16289 INFO xorp_rib RIB ] Received death event for protocol ospfv2 shutting down ------- OriginTable: ospf IGP next table = Redist:ospf
[ 2013/01/06 13:21:12.16384 INFO xorp_rib RIB ] Received death event for protocol ospfv2 shutting down ------- OriginTable: ospf IGP next table = Redist:ospf
At 40K, the SRX100B started to bitch about FIB space:
Jan 6 13:37:34 SRX100-5_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan 6 13:37:34 SRX100-5_OSPF /kernel: RT_PFE: RT msg op 1 (PREFIX ADD) failed, err 5 (Invalid)
Jan 6 13:37:34 SRX100-5_OSPF last message repeated 13 times
Jan 6 13:39:29 SRX100-5_OSPF last message repeated 1098 times
Jan 6 13:45:37 SRX100-5_OSPF last message repeated 3391 times
Jan 6 13:45:39 SRX100-5_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan 6 13:45:39 SRX100-5_OSPF /kernel: RT_PFE: RT msg op 1 (PREFIX ADD) failed, err 5 (Invalid)
Jan 6 13:46:05 SRX100-5_OSPF last message repeated 313 times
Jan 6 13:46:46 SRX100-5_OSPF last message repeated 335 times
Jan 6 13:46:48 SRX100-5_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan 6 13:46:48 SRX100-5_OSPF /kernel: RT_PFE: RT msg op 1 (PREFIX ADD) failed, err 5 (Invalid)
Jan 6 13:47:03 SRX100-5_OSPF last message repeated 210 times
Jan 6 13:47:49 SRX100-5_OSPF last message repeated 424 times
Jan 6 13:47:55 SRX100-5_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan 6 13:47:55 SRX100-5_OSPF /kernel: RT_PFE: RT msg op 1 (PREFIX ADD) failed, err 5 (Invalid)
Jan 6 13:48:01 SRX100-5_OSPF last message repeated 107 times
The Cisco 2811, feeling left out, decided to get in on the action at about 75K routes. It started to show signs of trouble by flapping adjacencies and reporting corrupted LSAs:
*Jan 6 12:43:24.690: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.6 on FastEthernet0/0.66 from INIT to DOWN, Neighbor Down: Dead timer expired *Jan 6 12:43:43.990: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256 *Jan 6 12:43:50.558: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256 *Jan 6 12:43:57.226: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256 *Jan 6 12:44:03.486: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256 *Jan 6 12:44:09.890: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256 *Jan 6 12:44:16.114: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.6 on FastEthernet0/0.66 from INIT to DOWN, Neighbor Down: Dead timer expired *Jan 6 12:44:16.590: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256 *Jan 6 12:44:23.354: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256 *Jan 6 12:44:30.014: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256 *Jan 6 12:44:36.594: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256 *Jan 6 12:44:42.918: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256 *Jan 6 12:44:49.610: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256 *Jan 6 12:45:03.410: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.0.0.5, FastEthernet0/0.2 *Jan 6 12:45:10.130: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 
2.0.0.5, FastEthernet0/0.2 *Jan 6 12:45:16.602: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.0.0.5, FastEthernet0/0.2 *Jan 6 12:45:24.426: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.0.0.5, FastEthernet0/0.2 *Jan 6 12:45:25.810: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.6 on FastEthernet0/0.66 from INIT to DOWN, Neighbor Down: Dead timer expired *Jan 6 12:45:31.406: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.0.0.5, FastEthernet0/0.2 *Jan 6 12:46:17.910: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.6 on FastEthernet0/0.66 from INIT to DOWN, Neighbor Down: Dead timer expired *Jan 6 12:47:27.910: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.5 on FastEthernet0/0.256 from EXCHANGE to DOWN, Neighbor Down: Dead timer expired *Jan 6 12:47:27.966: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.5 on FastEthernet0/0.2 from EXCHANGE to DOWN, Neighbor Down: Dead timer expired *Jan 6 12:47:29.466: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.7 on FastEthernet0/0.267 from INIT to DOWN, Neighbor Down: Dead timer expired *Jan 6 12:47:32.530: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.7 on FastEthernet0/0.2 from INIT to DOWN, Neighbor Down: Dead timer expired
Then it blew its memory bounds:
*Jan 6 12:48:25.446: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x43984DE0, alignment 0 Pool: Processor Free: 398876 Cause: Memory fragmentation Alternate Pool: None Free: 0 Cause: No Alternate pool -Process= "OSPF-1 Router", ipl= 0, pid= 316 -Traceback= 0x439643A8z 0x43982348z 0x42F905ECz 0x42F932E4z 0x42F98E34z 0x42F8A2F0z 0x4393A798z 0x4393A77Cz
*Jan 6 12:48:55.230: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.5 on FastEthernet0/0.256 from LOADING to FULL, Loading Done
*Jan 6 12:49:27.978: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.7 on FastEthernet0/0.267 from INIT to DOWN, Neighbor Down: Dead timer expired
*Jan 6 12:49:35.278: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x43984B44, alignment 16 Pool: Processor Free: 1717756 Cause: Memory fragmentation Alternate Pool: None Free: 0 Cause: No Alternate pool -Process= "IP RIB Update", ipl= 3, pid= 184 -Traceback= 0x43968B0Cz 0x439820ACz 0x41FD132Cz 0x41FD3104z 0x41FD3758z 0x41FCC360z 0x41F8FECCz 0x41F51254z 0x41F93B90z 0x41F93DA8z 0x41F97E68z 0x41F9808Cz 0x4203CE70z 0x4203CFC4z 0x42695898z 0x4268B364z
*Jan 6 12:49:35.798: %COMMON_FIB-3-NOMEM: Memory allocation failure for validating prefix in IPv4 CEF [0x41F37C70] (fatal) (2436 subsequent failures).
*Jan 6 12:49:35.798: %COMMON_FIB-4-DISABLING: IPv4 CEF is being disabled due to a fatal error.
And it did this for the rest of the time:
*Jan 6 18:12:47.856: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x43984DE0, alignment 0 Pool: Processor Free: 1004388 Cause: Memory fragmentation Alternate Pool: None Free: 0 Cause: No Alternate pool -Process= "OSPF-1 Router", ipl= 0, pid= 316 -Traceback= 0x439643A8z 0x43982348z 0x42F95958z 0x42F99918z 0x42F9A118z 0x42F850A0z 0x42F8A870z 0x4393A798z 0x4393A77Cz *Jan 6 18:13:17.864: %SYS-2-MALLOCFAIL: Memory allocation of 5000 bytes failed from 0x43984DE0, alignment 0 Pool: Processor Free: 1001384 Cause: Memory fragmentation Alternate Pool: None Free: 0 Cause: No Alternate pool -Process= "OSPF-1 Hello", ipl= 0, pid= 12 -Traceback= 0x439643A8z 0x43982348z 0x42F950C8z 0x42F95B74z 0x42F8CA50z 0x42F8CC2Cz 0x42FB9478z 0x42F857C0z 0x42F861B8z 0x4393A798z 0x4393A77Cz *Jan 6 18:13:22.644: %LICENSE-2-VLS_ERROR: 'VLSnotifyBirthAndExpiryEvents' failed with an error - rc = 13 - 'Error[13]: Severe internal error in licensing or accessing feature UNKNOWN. ' -Traceback= 0x4393A798z 0x4393A77Cz *Jan 6 18:13:47.876: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x43984DE0, alignment 0 Pool: Processor Free: 1004116 Cause: Memory fragmentation Alternate Pool: None Free: 0 Cause: No Alternate pool -Process= "OSPF-1 Router", ipl= 0, pid= 316 -Traceback= 0x439643A8z 0x43982348z 0x42F95958z 0x42F99918z 0x42F9A118z 0x42F850A0z 0x42F8A870z 0x4393A798z 0x4393A77Cz *Jan 6 18:14:17.876: %SYS-2-MALLOCFAIL: Memory allocation of 5000 bytes failed from 0x43984DE0, alignment 0 Pool: Processor Free: 968736 Cause: Memory fragmentation Alternate Pool: None Free: 0 Cause: No Alternate pool -Process= "OSPF-1 Router", ipl= 0, pid= 316 -Traceback= 0x439643A8z 0x43982348z 0x42FA4E38z 0x42FA57BCz 0x42FA5D44z 0x42F9A15Cz 0x42F850A0z 0x42F8A870z 0x4393A798z 0x4393A77Cz *Jan 6 18:14:22.644: %LICENSE-2-VLS_ERROR: 'VLSnotifyBirthAndExpiryEvents' failed with an error - rc = 13 - 'Error[13]: Severe internal error in licensing or accessing feature UNKNOWN. 
' -Traceback= 0x4393A798z 0x4393A77Cz *Jan 6 18:14:47.904: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x43984DE0, alignment 0 Pool: Processor Free: 968736 Cause: Memory fragmentation Alternate Pool: None Free: 0 Cause: No Alternate pool -Process= "OSPF-1 Router", ipl= 0, pid= 316 -Traceback= 0x439643A8z 0x43982348z 0x42F95958z 0x42F99918z 0x42F9A118z 0x42F850A0z 0x42F8A870z 0x4393A798z 0x4393A77Cz *Jan 6 18:15:17.944: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x43984DE0, alignment 0 Pool: Processor Free: 1010156 Cause: Memory fragmentation Alternate Pool: None Free: 0 Cause: No Alternate pool -Process= "OSPF-1 Router", ipl= 0, pid= 316 -Traceback= 0x439643A8z 0x43982348z 0x42F95958z 0x42F99918z 0x42F9A118z 0x42F850A0z 0x42F8A870z 0x4393A798z 0x4393A77Cz *Jan 6 18:15:22.648: %LICENSE-2-VLS_ERROR: 'VLSnotifyBirthAndExpiryEvents' failed with an error - rc = 13 - 'Error[13]: Severe internal error in licensing or accessing feature UNKNOWN. ' -Traceback= 0x4393A798z 0x4393A77Cz
XORP, for the record, dropped all of its neighbors when it gave up, and never re-formed any adjacencies.
Then the SRX100B joined the cycling reboot club, thanks to the hardware watchdog resetting the box:
Jan 6 15:50:21 SRX100-5_OSPF init: ipmi (PID 0) started
panic: Hardware watchdog timeout
cpuid = 0
KDB: stack backtrace:
SP 0x0: not in kernel
uart_z8530_class+0x0 (0,0,0,0) ra 0 sz 0
pid 22, process: idle: cpu0
Uptime: 1d3h32m33s
Cannot dump. No dump device defined.
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset: Stopping other CPUs
U-Boot 1.1.6 (Build time: Dec 12 2009 - 17:17:55)
SRX_100_LOWMEM board revision major:0, minor:0, serial #: AT3809AF0822
OCTEON CN5020-SCP pass 1.1, Core clock: 500 MHz, DDR clock: 266 MHz (532 Mhz data rate)
DRAM: 512 MB
Starting Memory POST...
Checking datalines... OK
Somewhere in the 80K to 115K range, the EX3200 started to choke, gagged, and then dropped a core:
Jan 6 13:40:02 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:02 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:03 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:03 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:03 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:03 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:04 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:04 EX3200-2_OSPF /kernel: last message repeated 4 times Jan 6 13:40:04 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:04 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:05 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:05 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:06 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:06 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:06 EX3200-2_OSPF /kernel: last message repeated 1 times Jan 6 13:40:06 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:07 EX3200-2_OSPF last message repeated 2 times Jan 6 13:40:07 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:08 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:08 EX3200-2_OSPF /kernel: last message repeated 2 times Jan 6 13:40:08 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:08 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:09 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 
13:40:09 EX3200-2_OSPF /kernel: last message repeated 1 times Jan 6 13:40:09 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:09 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:10 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:10 EX3200-2_OSPF /kernel: last message repeated 2 times Jan 6 13:40:10 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:10 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:11 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:11 EX3200-2_OSPF /kernel: last message repeated 4 times Jan 6 13:40:11 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:11 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:12 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:12 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:12 EX3200-2_OSPF /kernel: last message repeated 8 times Jan 6 13:40:12 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:12 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:12 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:13 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:13 EX3200-2_OSPF /kernel: last message repeated 3 times Jan 6 13:40:13 EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid) Jan 6 13:40:13 EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 Jan 6 13:40:14 EX3200-2_OSPF last message repeated 34 times Jan 6 13:41:04 EX3200-2_OSPF /kernel: Percentage memory available(19)less than threshold(20 %)- 3 Jan 6 13:41:16 EX3200-2_OSPF rpd[12424]: 
RPD_OSPF_NBRDOWN: OSPF neighbor 2.0.0.3 (realm ospf-v2 ge-0/0/1.2 area 0.0.0.0) state changed from ExStart to Init due to 1WayRcvd (event reason: neighbor is in one-way mode) Jan 6 13:42:33 EX3200-2_OSPF rpd[12424]: RPD_OSPF_NBRDOWN: OSPF neighbor 2.0.0.6 (realm ospf-v2 ge-0/0/1.2 area 0.0.0.0) state changed from ExStart to Down due to InActiveTimer (event reason: neighbor was inactive and declared dead) Jan 6 13:43:15 EX3200-2_OSPF /kernel: Process (12424,rpd) has exceeded 85% of RLIMIT_DATA: used 57936 KB Max 65536 KB Jan 6 13:43:16 EX3200-2_OSPF rpd[12424]: RPD_RT_PREFIX_LIMIT_REACHED: Number of prefixes (16384) in table inet.0 reached configured maximum (16384) Jan 6 13:43:18 EX3200-2_OSPF /kernel: Process (12424,rpd) attempted to exceed RLIMIT_DATA: attempted 66128 KB Max 65536 KB Jan 6 13:43:36 EX3200-2_OSPF init: routing (PID 12424) terminated by signal number 6. Core dumped! Jan 6 13:43:36 EX3200-2_OSPF init: routing (PID 12572) started Jan 6 13:43:38 EX3200-2_OSPF rpd[12572]: L2CKT acquiring mastership for primary Jan 6 13:43:38 EX3200-2_OSPF rpd[12572]: L2VPN acquiring mastership for primary Jan 6 13:43:38 EX3200-2_OSPF rpd[12572]: RPD_KRT_KERNEL_BAD_ROUTE: KRT: lost ifl 0 for route 192.168.0.0 Jan 6 13:43:38 EX3200-2_OSPF rpd[12572]: RPD_TASK_BEGIN: Commencing routing updates, version 12.2R2.4, built 2012-11-15 17:42:45 UTC by builder Jan 6 13:44:05 EX3200-2_OSPF dumpd: Core and context for rpd saved in /var/tmp/rpd.core-tarball.1.tgz
This also proved to be a cycle: overload the LSDB, drop a core, restart rpd, repeat.
The rest of the routers seemed to keep chugging along despite all of the chaos. The J2300 was doing double duty just fine, the SRXs with 1GB of RAM chugged along, and BIRD, Quagga, Vyatta and OpenOSPFd seemed to be keeping up fine. However, BIRD was taking from 20% to 40% of the CPU and OpenOSPFd was edging closer to 60%.
When there were about 180K external LSAs floating around, it didn't look like anything exciting was going to happen for quite a while. exaBGP had been feeding prefixes to the J2300 for about 5 hours. The 1GB routers (except XORP) just seemed to be cruising along -- despite the chaos introduced by the recycling group. At this point there was about 1 Mbps of traffic on every link -- and this was pure OSPF traffic! There was no other traffic on the network besides the link-state routing protocol.
At this point I decided to kill the BGP connection to the J2300. exaBGP sent a Cease notification to the J2300, causing it to drop all of its BGP routes and start pulling back all of the external Type-5 LSAs it had originated by aging them out:
Sun, 06 Jan 2013 20:39:18 INFO 6481 message Peer 172.20.1.23 ASN 65066 >> KEEPALIVE (no more UPDATE and no EOR)
Sun, 06 Jan 2013 20:39:18 INFO 6481 message Peer 172.20.1.23 ASN 65066 Sending Notification (6,3) [Cease: Peer De-configured]
Sun, 06 Jan 2013 20:39:18 INFO 6481 supervisor Performing dynamic route update
Sun, 06 Jan 2013 20:39:18 INFO 6481 supervisor Updated peers dynamic routes successfully
Sun, 06 Jan 2013 20:39:18 INFO 6481 processes Terminating process service-1
This introduced a lot more LSA updates as the J2300 sent out updates for all of the external LSAs. Traffic went up from about 1 Mbps to somewhere in the neighborhood of 3 Mbps. The hardware-based routers all seemed to be able to delete LSAs at about the same pace, but the open-source guys seemed to have a bit harder time keeping up. Quagga and Vyatta (which uses Quagga's daemons) typically lagged about 10K behind the number that the J2300 still had left. BIRD was a bit further behind, but was definitely working harder than the Quaggites -- and OpenOSPFd was really lagging, as ospfd was really sucking up the CPU time.
load averages: 0.58, 0.63, 0.52 openospfd 17:26:35
26 processes: 25 idle, 1 on processor
CPU states: 8.4% user, 0.0% nice, 20.0% system, 2.0% interrupt, 69.7% idle
Memory: Real: 163M/260M act/tot Free: 723M Cache: 22M Swap: 0K/17M
PID USERNAME PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND
22718 _ospfd 2 0 145M 124M sleep kqread 30:53 17.92% ospfd
2618 root 2 0 21M 22M sleep kqread 11:07 7.42% ospfd
9004 _ospfd 2 0 2664K 2324K sleep kqread 4:49 1.27% ospfd
9106 _pflogd 4 0 656K 376K sleep bpf 0:01 0.00% pflogd
1 root 10 0 480K 420K idle wait 0:01 0.00% init
12734 root 2 0 1732K 1740K sleep select 0:01 0.00% sendmail
14388 _syslogd 2 0 672K 856K idle poll 0:00 0.00% syslogd
17673 named 2 0 7632K 8424K idle select 0:00 0.00% named
31083 root 18 0 720K 612K sleep pause 0:00 0.00% ksh
26649 root 2 0 548K 1016K idle select 0:00 0.00% cron
12361 root 28 0 636K 1724K onproc - 0:00 0.00% top
9911 _iked 2 0 1624K 1076K idle kqread 0:00 0.00% iked
15138 root 2 0 1912K 1208K idle kqread 0:00 0.00% iked
25883 root 2 0 592K 532K idle netio 0:00 0.00% pflogd
4978 root 2 0 2124K 996K idle netio 0:00 0.00% named
24305 root 3 0 356K 912K idle ttyin 0:00 0.00% getty
31626 root 2 0 652K 848K idle netio 0:00 0.00% syslogd
When the rest of the "live" routers had between 80K and 90K LSAs left, ospfd on OpenBSD still had over 144,000. Compare the OSPF summary on the OpenOSPFd box to the one on the J2300 at the same time:
# ospfctl show
Router ID: 1.1.1.2
Uptime: 08:41:45
RFC1583 compatibility flag is disabled
SPF delay is 1000 msec(s), hold time between two SPFs is 5000 msec(s)
Number of external LSA(s) 144785 (Checksum sum 0x1bd1c442)
Number of areas attached to this router: 1
Area ID: 0.0.0.0
Number of interfaces in this area: 5
Number of fully adjacent neighbors in this area: 5
SPF algorithm executed 798 time(s)
Number LSA(s) 22 (Checksum sum 0x94514)
juniper@J2300-7> show ospf database summary
Area 0.0.0.0:
   15 Router LSAs
   10 Network LSAs
Externals:
   81534 Extern LSAs
Interface fe-0/0/0.22:
Area 0.0.0.0:
Interface fe-0/0/0.3:
Area 0.0.0.0:
Interface fe-0/0/0.312:
Area 0.0.0.0:
Interface fe-0/0/0.323:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:
[edit]
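To put a number on that gap, here is a minimal Python sketch that pulls the external LSA counts out of the two summaries and subtracts them. The two sample strings are copied from the outputs above; the function name and the regular expressions are my own and only cover these two output styles.

```python
import re

# One line from each summary above (OpenOSPFd and Junos respectively).
OPENOSPFD_SUMMARY = "Number of external LSA(s) 144785 (Checksum sum 0x1bd1c442)"
JUNOS_SUMMARY = "Externals: 81534 Extern LSAs"


def external_lsa_count(summary):
    """Return the external LSA count found in an OSPF summary string.

    Only the two formats shown above are recognised; anything else raises.
    """
    match = (re.search(r"external LSA\(s\)\s+(\d+)", summary)
             or re.search(r"(\d+)\s+Extern LSAs", summary))
    if match is None:
        raise ValueError("no external LSA count found")
    return int(match.group(1))


openospfd = external_lsa_count(OPENOSPFD_SUMMARY)
junos = external_lsa_count(JUNOS_SUMMARY)
lag = openospfd - junos
print(f"OpenOSPFd still holds {lag} more external LSAs than the J2300")
```

With the figures above, OpenOSPFd is a little over 63,000 LSAs behind the box that originated them.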
At the same time, OpenOSPFd was hogging all of the CPU time on the KVM host. OpenOSPFd is running within PID 7655, BIRD is 2066, followed by Vyatta and Quagga:
top - 18:57:37 up 1 day, 7:44, 9 users, load average: 0.91, 1.02, 1.14
Tasks: 213 total, 2 running, 211 sleeping, 0 stopped, 0 zombie
Cpu(s): 15.1%us, 0.4%sy, 0.0%ni, 84.0%id, 0.3%wa, 0.2%hi, 0.1%si, 0.0%st
Mem: 16252540k total, 11151688k used, 5100852k free, 19904k buffers
Swap: 14352380k total, 109464k used, 14242916k free, 5687716k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7655 qemu 20 0 1362m 1.0g 9m S 112.1 6.5 186:24.09 qemu-kvm
2066 qemu 20 0 4247m 911m 10m S 15.6 5.7 128:10.85 qemu-kvm
2379 qemu 20 0 5346m 431m 9m S 5.0 2.7 33:07.99 qemu-kvm
18945 qemu 20 0 1365m 953m 9m S 3.0 6.0 28:58.08 qemu-kvm
1056 root 20 0 1011m 24m 7028 S 0.3 0.2 5:27.83 libvirtd
2333 qemu 20 0 4782m 860m 9m S 0.3 5.4 26:31.96 qemu-kvm
2381 root 20 0 0 0 0 S 0.3 0.0 1:51.12 vhost-2379
1 root 20 0 67116 25m 2084 S 0.0 0.2 0:01.08 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.48 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/u:0H
8 root RT 0 0 0 0 S 0.0 0.0 0:00.11 migration/0
9 root RT 0 0 0 0 S 0.0 0.0 0:00.13 watchdog/0
10 root RT 0 0 0 0 S 0.0 0.0 0:00.06 migration/1
12 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/1:0H
13 root 20 0 0 0 0 S 0.0 0.0 0:00.40 ksoftirqd/1
It took about 35 minutes for the LSAs to get all cleared out, and everyone except the XORP box and the Olives came back into the network eventually. I was really quite surprised at how long it took to clear all of them out of the LSDB. Working on real networks that keep external routes to a bare minimum, OSPF always seemed very, very quick to me. With tens of thousands of external LSAs, it really brought a rather small network, with no traffic, to its knees; it took forever to respond and could never really converge fully. BGP is really damn quick in comparison on this scale!
Juniper is also pretty kind with the reflooding of LSAs that a Junos box originates, only refreshing them every 50 minutes. Had this been a Cisco, the flooding would have happened even more often -- at 30 minutes after an LSA's birthday. I'm not sure which is better on a network with a boatload of external advertisements to support: on one hand, refreshing more often means more flooding traffic; on the other hand, with the longer interval, and especially with some of the slower boxes on the network **cough, cough, OpenOSPFd, cough**, you're more likely to have an LSA actually die of old age when it hits MaxAge at 60 minutes. Sounds like a fun place to do some tuning! Either way, this was in no way a converged network and it probably doesn't make much difference. This also was with only one router injecting external routes, so there really wasn't a lot of comparison that needed to be done by the non-ASBRs -- adding another redistributing peer or two would really make the workload go up.
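To put rough numbers on the refresh question, here is a back-of-the-envelope Python sketch. It assumes the roughly 180K external LSAs seen at the peak of this run and, unrealistically, that refreshes are spread perfectly evenly across the interval; the function names are made up for illustration.

```python
MAX_AGE_S = 60 * 60          # OSPF MaxAge: an LSA dies at one hour old
EXTERNAL_LSAS = 180_000      # approximate peak from this run


def refreshes_per_second(lsa_count, refresh_interval_s):
    """Average LSAs re-originated per second, assuming an even spread."""
    return lsa_count / refresh_interval_s


def slack_before_max_age_s(refresh_interval_s):
    """Seconds between a scheduled refresh and the LSA hitting MaxAge."""
    return MAX_AGE_S - refresh_interval_s


for label, interval_s in (("50 minute refresh", 50 * 60),
                          ("30 minute refresh", 30 * 60)):
    rate = refreshes_per_second(EXTERNAL_LSAS, interval_s)
    slack = slack_before_max_age_s(interval_s)
    print(f"{label}: {rate:.0f} LSAs/s to reflood, "
          f"{slack // 60} min of slack before MaxAge")
```

Under those assumptions the 30-minute timer works out to about 100 refreshed LSAs per second against 60 for the 50-minute timer, while the 50-minute timer leaves only 10 minutes for a refresh to reach a slow neighbor before the old copy ages out.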
To prepare for the next OSPF Database Overload, I made some adjustments which should make it a bit easier to keep track of everything, and also to simulate an actual network a little better.
I set up another virtual machine to act as an NTP server to serve time for those that need it (KVMs just get time from the bare metal server). Generally, running an NTP server off of a VM is a pretty bad idea, but I'm not really looking for much precision, so a few seconds of drift here and there isn't going to make too much of a difference. The same KVM is accepting remote syslog messages, so each router can spew complaints across to somebody who will listen, and hopefully all of the errors will wind up in one big file -- if they make it. And to keep an eye on things, the KVM is also running Nagios, which is keeping tabs on all of the test participants with some simple ping probes. Generally ICMP isn't the best protocol to use for this, but it's quick and easy and supported by every IP implementation, so it should suffice.
Generally speaking, a real network has more than just routers in it. So to simulate what a real enterprise-like network might be doing, each OSPF router participant is now responsible for announcing two user networks. One user network will be advertised as part of the Router LSA the router originates. The other will be announced through some sort of redistribution, so the advertisement is an External LSA of Type 5. The idea here is that one network will be innate to the OSPF router, while the other will have the potential to get lost in the shuffle once we start carelessly dumping thousands of routes into our LSDB. To test whether these announcements are accurate, there is a host on each network to simulate an actual user or a network device like a stupid printer. Nagios will be keeping an eye on these "hosts" as well with a set of ping probes. The IP addressing for each host network contained in the Router LSA follows the format of <Cluster #>.<Router #>.0.0/25
where the router receives the .1 address and the host is .2. For the host network announced in the External LSA, the network address is <Cluster #>.<Router #>.0.128/25
where the router owns the .129 address and the host is .130. The VM running Nagios was given an interface in each one of the cluster broadcast networks (1.0.0.10/24, 2.0.0.0/24 and 3.0.0.0/24) so it need not "route" to reach each OSPF router. In order to make sure that it is reaching each host network via an OSPF network advertisement (no cheating with static routes), the Nagios box is listening to OSPF LSAs by running an instance of Quagga. Quagga was chosen for ease of configuration, and if it doesn't work out BIRD will be used in its place. The Nagios box received an extra gigabyte of RAM to protect itself from any implosions due to lack of memory. The OSPF costs of the links on the Nagios box were also maxed out at 65535 to make it look very unattractive for any transit traffic; this is the same methodology behind the "OSPF Overload" command found on many implementations. The OSPF priority on all of the links was set to 0 to keep the Nagios box from ever becoming a DR.
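The addressing scheme above is regular enough to compute. Here is a minimal sketch that derives both host networks for a given cluster and router number; the function name and the dictionary layout are my own invention for illustration, and only the address pattern comes from the text.

```python
import ipaddress
from itertools import islice


def host_networks(cluster, router):
    """Return the LSA1 (Router LSA) and LSA5 (External LSA) host networks
    for one router, with the router and host addresses in each /25."""
    plan = {}
    for name, last_octet in (("LSA1", 0), ("LSA5", 128)):
        net = ipaddress.ip_network(f"{cluster}.{router}.0.{last_octet}/25")
        # The first usable address goes to the router, the second to the host.
        router_ip, host_ip = islice(net.hosts(), 2)
        plan[name] = {
            "network": str(net),
            "router": str(router_ip),
            "host": str(host_ip),
        }
    return plan


if __name__ == "__main__":
    for name, entry in host_networks(1, 1).items():
        print(name, entry)
```

For cluster 1, router 1 this gives 1.1.0.0/25 (router .1, host .2) and 1.1.0.128/25 (router .129, host .130), which matches the interface addresses in the configurations that follow.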
To simulate the hosts, I set up a minimal installation of Microcore Linux in a KVM and cloned it 41 times. The install is on a 24 MB disk, and uses 34 MB of RAM. The Microcore KVM for C1R1 is available for download here: vyatta_LSA1.img.bz2 (7.9 MB). It's a standard Microcore install, but it's been modified to autologin as root on a serial console. If you want to use this for your own purposes, remember to edit the /opt/bootlocal.sh and /opt/bootsync.sh files to change the IP address, default gateway and hostname of the image. Don't forget to run the command filetool.sh -b if you want to make your changes permanent. This image boots up in about 2 seconds, but you may need to add a second or two of delay in one of the boot*.sh files if the IP address doesn't seem to be taking. Note also that the root user in the image has no password, and that the VM uses virtio drivers for the NIC, memory and hard disk. An XML dump of the virtual machine definition for libvirt is here: vyatta_LSA1.xml.
The last adjustment was to change all of the Olive interfaces from emulated Intel 10/100 NICs to emulated Intel 10/100/1000 NICs. I did some quick tests and the new interfaces for the Olives seemed to be holding up perfectly. Note that all that was needed to change the config from the old fxp interfaces to the new em interfaces was one command in config mode, followed by a commit: replace pattern fxp with em
In case you're curious about the configuration behind the new Type 1 and Type 5 LSAs for the host networks, here is the configuration that was added to each router. In general, to announce a route as part of the Router LSA (Type 1), the network and/or interface is added to the router's OSPF configuration. To announce a route as a separate External LSA (Type 5), the network prefix or interface is redistributed into OSPF. Care was taken to use a policy or a filter so that only the single /25 network was redistributed, just to keep things clean and consistent.
interfaces {
    ethernet eth0 {
        address 1.1.0.1/25
        description LSA1
        duplex auto
        hw-id 52:54:00:94:13:84
        smp_affinity auto
        speed auto
    }
    ethernet eth2 {
        address 1.1.0.129/25
        description LSA5
        duplex auto
        hw-id 52:54:00:08:2d:6f
        smp_affinity auto
        speed auto
    }
}
policy {
    prefix-list HOST {
        rule 1 {
            action permit
            prefix 1.1.0.128/25
        }
    }
    route-map HOST {
        rule 1 {
            action permit
            match {
                ip {
                    address {
                        prefix-list HOST
                    }
                }
            }
        }
        rule 2 {
            action deny
        }
    }
}
protocols {
    ospf {
        area 0.0.0.0 {
            network 1.1.0.0/25
        }
        redistribute {
            connected {
                metric-type 2
                route-map HOST
            }
        }
    }
}
redistribute 1.2.0.128/25
area 0.0.0.0 {
    interface em1
}
interfaces {
    em2 {
        unit 0 {
            family inet {
                address 1.3.0.1/25;
            }
        }
    }
    em3 {
        unit 0 {
            family inet {
                address 1.3.0.129/25;
            }
        }
    }
}
protocols {
    ospf {
        export EXPORT-LSA5;
        area 0.0.0.0 {
            interface em2.0 {
                metric 100;
            }
        }
    }
}
policy-options {
    policy-statement EXPORT-LSA5 {
        term LSA5-HOST {
            from {
                protocol direct;
                route-filter 1.3.0.128/25 exact;
            }
            then accept;
        }
        then {
            metric 100;
        }
    }
}
dummy0
in the OS, and use it in lieu of the actual loopback interface, lo. Limiting the redistribution also proved to be pretty complex. BIRD proved to be the most difficult implementation in which to get both networks advertised properly without announcing anything unintended. However, I did learn that BIRD seems to be really powerful in the policy aspect, able to do a lot of things that no other implementation seems to be able to. eth1 is the LSA1 interface with IP address 1.4.0.1/25.
filter OSPF_export {
    if ( source = RTS_DEVICE && net ~ [ 1.4.0.128/25 ] ) then accept;
    reject;
}

protocol direct {
    interface "*";
}

protocol ospf OSPFol {
    export filter OSPF_export;
    area 0.0.0.0 {
        stub no;
        interface "dummy0" {
            stub yes;
        };
        interface "eth1" {
            cost 100;
        };
    };
}
!
interface eth1
 description LSA1
 ip address 1.5.0.1/25
 ip ospf cost 100
 ipv6 nd suppress-ra
!
interface eth2
 description LSA5
 ip address 1.5.0.129/25
 ipv6 nd suppress-ra
!
router ospf
 redistribute connected route-map LSA5
 network 1.5.0.0/25 area 0.0.0.0
!
ip prefix-list LSA5 seq 5 permit 1.5.0.128/25
!
route-map LSA5 permit 1
 match ip address prefix-list LSA5
 set metric 100
!
protocols {
    ospf4 {
        area 0.0.0.0 {
            area-type: "normal"
            interface eth4 {
                link-type: "broadcast"
                vif eth4 {
                    address 1.6.0.1 {
                        priority: 128
                        hello-interval: 10
                        router-dead-interval: 40
                        interface-cost: 100
                        retransmit-interval: 5
                        transit-delay: 1
                        disable: false
                    }
                }
            }
        }
        export: "EXPORT-OSPF"
    }
}
policy {
    policy-statement "EXPORT-OSPF" {
        term LSA5 {
            from {
                protocol: "connected"
                network4: 1.6.0.128/25
            }
            then {
                metric: 100
                accept {
                }
            }
        }
    }
}
interfaces {
    interface eth4 {
        description: ""
        disable: false
        discard: false
        unreachable: false
        management: false
        parent-ifname: ""
        iface-type: ""
        vid: ""
        default-system-config {
        }
    }
    interface eth5 {
        description: ""
        disable: false
        discard: false
        unreachable: false
        management: false
        parent-ifname: ""
        iface-type: ""
        vid: ""
        default-system-config {
        }
    }
}
interfaces {
    em2 {
        unit 0 {
            family inet {
                address 1.7.0.1/25;
            }
        }
    }
    em3 {
        unit 0 {
            family inet {
                address 1.7.0.129/25;
            }
        }
    }
}
protocols {
    ospf {
        export LSA5;
        area 0.0.0.0 {
            interface em2.0 {
                metric 100;
            }
        }
    }
}
policy-options {
    policy-statement LSA5 {
        term LSA5 {
            from {
                protocol direct;
                route-filter 1.7.0.128/25 exact;
            }
            then {
                metric 100;
                accept;
            }
        }
    }
}
interfaces {
    ge-0/0/1 {
        unit 2101 {
            vlan-id 2101;
            family inet {
                address 2.1.0.1/25;
            }
        }
        unit 2105 {
            vlan-id 2105;
            family inet {
                address 2.1.0.129/25;
            }
        }
    }
}
protocols {
    ospf {
        export LSA5;
        area 0.0.0.0 {
            interface ge-0/0/1.2101 {
                metric 100;
            }
        }
    }
}
policy-options {
    policy-statement LSA5 {
        from {
            protocol direct;
            route-filter 2.1.0.128/25 exact;
        }
        then accept;
    }
}
interfaces {
    fe-0/0/0 {
        unit 2201 {
            vlan-id 2201;
            family inet {
                address 2.2.0.1/25;
            }
        }
        unit 2205 {
            vlan-id 2205;
            family inet {
                address 2.2.0.129/25;
            }
        }
    }
}
protocols {
    ospf {
        export LSA5;
        area 0.0.0.0 {
            interface fe-0/0/0.2201 {
                metric 100;
            }
        }
    }
}
policy-options {
    policy-statement LSA5 {
        term LSA5 {
            from {
                protocol direct;
                route-filter 2.2.0.128/25 exact;
            }
            then {
                metric 100;
                accept;
            }
        }
    }
}
interface FastEthernet0/0.2301
 encapsulation dot1Q 2301
 ip address 2.3.0.1 255.255.255.128
 ip ospf cost 100
!
interface FastEthernet0/0.2305
 encapsulation dot1Q 2305
 ip address 2.3.0.129 255.255.255.128
!
router ospf 1
 redistribute connected metric 100 subnets route-map LSA5
 network 2.3.0.1 0.0.0.0 area 0
!
ip prefix-list LSA5 seq 5 permit 2.3.0.128/25
route-map LSA5 permit 10
 match ip address prefix-list LSA5
 set metric 100
!
interfaces {
    ge-0/0/0 {
        unit 2401 {
            vlan-id 2401;
            family inet {
                address 2.4.0.1/25;
            }
        }
        unit 2405 {
            vlan-id 2405;
            family inet {
                address 2.4.0.129/25;
            }
        }
    }
}
protocols {
    ospf {
        export LSA5;
        area 0.0.0.0 {
            interface ge-0/0/0.2401 {
                metric 100;
            }
        }
    }
}
policy-options {
    policy-statement LSA5 {
        term LSA5 {
            from {
                protocol direct;
                route-filter 2.4.0.128/25 exact;
            }
            then {
                metric 100;
                accept;
            }
        }
    }
}
security {
    policies {
        from-zone OSPF to-zone OSPF {
            policy ACCEPT-ALL {
                match {
                    source-address any;
                    destination-address any;
                    application any;
                }
                then {
                    permit;
                }
            }
        }
    }
    zones {
        security-zone OSPF {
            interfaces {
                ge-0/0/0.2401;
                ge-0/0/0.2405;
            }
        }
    }
}
set interface "ethernet2.11" tag 2501 zone "OSPF"
set interface "ethernet2.15" tag 2505 zone "OSPF"
set interface ethernet2.11 ip 2.5.0.1/25
set interface ethernet2.11 route
set interface ethernet2.15 ip 2.5.0.129/25
set interface ethernet2.15 route
set interface ethernet2.11 ip manageable
set interface ethernet2.15 ip manageable
set interface ethernet2.11 manage ping
set interface ethernet2.15 manage ping
set access-list 1
set access-list 1 permit ip 2.5.0.128/25 1
set route-map name "LSA5" permit 1
set match ip 1
set metric 100
exit
set protocol ospf
set redistribute route-map "LSA5" protocol connected
exit
exit
set interface ethernet2.11 protocol ospf area 0.0.0.0
set interface ethernet2.11 protocol ospf enable
set interface ethernet2.11 protocol ospf cost 100
interface FastEthernet0/0.2601
 encapsulation dot1Q 2601
 ip address 2.6.0.1 255.255.255.128
 ip ospf cost 100
 ip ospf 1 area 0.0.0.0
!
interface FastEthernet0/0.2605
 encapsulation dot1Q 2605
 ip address 2.6.0.129 255.255.255.128
!
router ospf 1
 redistribute connected subnets route-map LSA5
!
ip prefix-list LSA5 seq 5 permit 2.6.0.128/25
!
route-map LSA5 permit 10
 match ip address prefix-list LSA5
!
/interface vlan
add arp=enabled disabled=no interface=OSPF l2mtu=1516 mtu=1500 name=LSA1 \
    use-service-tag=no vlan-id=2701
add arp=enabled disabled=no interface=OSPF l2mtu=1516 mtu=1500 name=LSA5 \
    use-service-tag=no vlan-id=2705
/routing ospf instance
set [ find default=yes ] disabled=no distribute-default=never in-filter=\
    ospf-in metric-bgp=auto metric-connected=100 metric-default=1 \
    metric-other-ospf=auto metric-rip=20 metric-static=20 name=default \
    out-filter=ospf-out redistribute-bgp=no redistribute-connected=as-type-2 \
    redistribute-other-ospf=no redistribute-rip=no redistribute-static=no \
    router-id=2.2.2.7
/routing ospf area
set [ find default=yes ] area-id=0.0.0.0 disabled=no instance=default name=\
    backbone type=default
/ip address
add address=2.7.0.1/25 disabled=no interface=LSA1 network=2.7.0.0
add address=2.7.0.129/25 disabled=no interface=LSA5 network=2.7.0.128
/routing ospf interface
add authentication=none authentication-key="" authentication-key-id=1 cost=\
    100 dead-interval=40s disabled=no hello-interval=10s instance-id=0 \
    interface=LSA1 network-type=broadcast passive=no priority=1 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
/routing ospf network
add area=backbone disabled=no network=2.7.0.0/25
/routing filter
add action=accept chain=ospf-out disabled=no invert-match=no prefix=\
    2.7.0.128/25 set-bgp-prepend-path=""
add action=discard chain=ospf-out disabled=no invert-match=no \
    set-bgp-prepend-path=""
interfaces {
    fe-0/0/0 {
        unit 3201 {
            vlan-id 3201;
            family inet {
                address 3.2.0.1/25;
            }
        }
        unit 3205 {
            vlan-id 3205;
            family inet {
                address 3.2.0.129/25;
            }
        }
    }
}
protocols {
    ospf {
        export LSA5;
        area 0.0.0.0 {
            interface fe-0/0/0.3201 {
                metric 100;
            }
        }
    }
}
policy-options {
    policy-statement LSA5 {
        term LSA5 {
            from {
                protocol direct;
                route-filter 3.2.0.128/25 exact;
            }
            then {
                metric 100;
                accept;
            }
        }
    }
}
vlan 3,33,323,334,3301,3305
!
interface Vlan3301
 ip address 3.3.0.1 255.255.255.128
 ip ospf cost 100
 ip ospf 1 area 0.0.0.0
!
interface Vlan3305
 ip address 3.3.0.129 255.255.255.128
!
router ospf 1
 redistribute connected subnets route-map LSA5
!
ip prefix-list LSA5 seq 5 permit 3.3.0.128/25
!
route-map LSA5 permit 1
 match ip address prefix-list LSA5
 set metric 100
!
interfaces {
    fe-0/0/0 {
        unit 3401 {
            vlan-id 3401;
            family inet {
                address 3.4.0.1/25;
            }
        }
        unit 3405 {
            vlan-id 3405;
            family inet {
                address 3.4.0.129/25;
            }
        }
    }
}
protocols {
    ospf {
        export LSA5;
        area 0.0.0.0 {
            interface fe-0/0/0.3401 {
                metric 100;
            }
        }
    }
}
policy-options {
    policy-statement LSA5 {
        term LSA5 {
            from {
                protocol direct;
                route-filter 3.4.0.128/25 exact;
            }
            then {
                metric 100;
                accept;
            }
        }
    }
}
security {
    policies {
        from-zone OSPF to-zone OSPF {
            policy ACCEPT-ALL {
                match {
                    source-address any;
                    destination-address any;
                    application any;
                }
                then {
                    permit;
                }
            }
        }
    }
    zones {
        security-zone OSPF {
            interfaces {
                fe-0/0/0.3401;
                fe-0/0/0.3405;
            }
        }
    }
}
interfaces {
    ge-0/0/8 {
        vlan-tagging;
        unit 3501 {
            vlan-id 3501;
            family inet {
                address 3.5.0.1/25;
            }
        }
        unit 3505 {
            vlan-id 3505;
            family inet {
                address 3.5.0.129/25;
            }
        }
    }
}
protocols {
    ospf {
        export LSA5;
        area 0.0.0.0 {
            interface ge-0/0/8.3501 {
                passive;
                metric 100;
            }
        }
    }
}
policy-options {
    policy-statement LSA5 {
        term LSA5 {
            from {
                protocol direct;
                route-filter 3.5.0.128/25 exact;
            }
            then {
                metric 100;
                accept;
            }
        }
    }
}
/interface vlan
add arp=enabled disabled=no interface=OSPF l2mtu=1514 mtu=1500 name=LSA1 \
    use-service-tag=no vlan-id=3601
add arp=enabled disabled=no interface=OSPF l2mtu=1514 mtu=1500 name=LSA5 \
    use-service-tag=no vlan-id=3605
/ip address
add address=3.6.0.1/25 disabled=no interface=LSA1 network=3.6.0.0
add address=3.6.0.129/25 disabled=no interface=LSA5 network=3.6.0.128
/routing filter
add action=accept chain=ospf-out disabled=no invert-match=no prefix=\
    3.6.0.128/25 set-bgp-prepend-path=""
add action=discard chain=ospf-out disabled=no invert-match=no \
    set-bgp-prepend-path=""
/routing ospf instance
set [ find default=yes ] disabled=no distribute-default=never in-filter=\
    ospf-in metric-bgp=200000 metric-connected=100 metric-default=1000 \
    metric-other-ospf=auto metric-rip=20000 metric-static=2000 mpls-te-area=\
    backbone mpls-te-router-id=loopback0 name=default out-filter=ospf-out \
    redistribute-bgp=no redistribute-connected=as-type-2 \
    redistribute-other-ospf=no redistribute-rip=no redistribute-static=no \
    router-id=3.3.3.6
/routing ospf interface
add authentication=none authentication-key="" authentication-key-id=1 cost=\
    100 dead-interval=40s disabled=no hello-interval=10s instance-id=0 \
    interface=LSA1 network-type=broadcast passive=no priority=1 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
/routing ospf network
add area=backbone disabled=no network=3.6.0.0/25
interface FastEthernet0/0.3701
 encapsulation dot1Q 3701
 ip address 3.7.0.1 255.255.255.128
 ip ospf cost 100
!
interface FastEthernet0/0.3705
 encapsulation dot1Q 3705
 ip address 3.7.0.129 255.255.255.128
!
router ospf 1
 redistribute connected subnets route-map LSA5
 network 3.7.0.1 0.0.0.0 area 0
!
ip prefix-list LSA5 seq 5 permit 3.7.0.128/25
!
route-map LSA5 permit 10
 match ip address prefix-list LSA5
 set metric 100
!
And here is our LSDB, to show that everything is working as intended: Note that the 3.0.0.10 router is the Nagios monitoring station, NTP and syslog server.
The Link State Database

vyatta@vyatta:~$ show ip ospf database

       OSPF Router with ID (1.1.1.1)

                Router Link States (Area 0.0.0.0)

Link ID         ADV Router      Age  Seq#       CkSum  Link count
1.1.1.1         1.1.1.1          380 0x80000116 0x751a 6
1.1.1.2         1.1.1.2         1568 0x800000c1 0x35d8 6
1.1.1.3         1.1.1.3          695 0x800000fd 0x5d52 7
1.1.1.4         1.1.1.4          147 0x800000d3 0x674f 7
1.1.1.5         1.1.1.5          403 0x8000010d 0xa68d 7
1.1.1.6         1.1.1.6          205 0x8000016a 0x2c8d 6
1.1.1.7         1.1.1.7           21 0x80000167 0x78fb 6
2.2.2.1         2.2.2.1         1964 0x8000001c 0x506c 8
2.2.2.2         2.2.2.2         1456 0x80000027 0xbfd0 8
2.2.2.3         2.2.2.3          344 0x80000020 0x3005 9
2.2.2.4         2.2.2.4         1991 0x80000014 0x63c1 8
2.2.2.5         2.2.2.5         1389 0x8000001b 0xddff 8
2.2.2.6         2.2.2.6         1718 0x80000219 0x3f5d 8
2.2.2.7         2.2.2.7          814 0x80000095 0x3ad6 8
3.0.0.10        3.0.0.10        1287 0x80000055 0x8cff 3
3.3.3.1         3.3.3.1         1454 0x80000015 0x5b48 8
3.3.3.2         3.3.3.2          864 0x80000012 0x5e6e 10
3.3.3.3         3.3.3.3          358 0x80000013 0x36c0 8
3.3.3.4         3.3.3.4         2502 0x80000014 0x69a4 8
3.3.3.5         3.3.3.5          394 0x80000027 0xc8f2 8
3.3.3.6         3.3.3.6         1790 0x80000032 0xacd8 8
3.3.3.7         3.3.3.7          189 0x80000014 0x8bd8 8

                Net Link States (Area 0.0.0.0)

Link ID         ADV Router      Age  Seq#       CkSum
1.0.0.7         1.1.1.7          521 0x80000041 0xcbc5
1.1.2.1         1.1.1.1         1570 0x8000009e 0x5159
1.1.7.1         1.1.1.1         1490 0x8000009b 0x663d
1.2.3.3         1.1.1.3          874 0x80000069 0xb601
1.3.4.3         1.1.1.3          285 0x80000063 0xc7f1
1.5.6.5         1.1.1.5         1282 0x80000038 0xe117
1.6.7.7         1.1.1.7         1021 0x80000025 0x03e0
2.0.0.5         2.2.2.5         1354 0x80000013 0x6941
3.0.0.5         3.3.3.5         2394 0x80000021 0x0a79
11.1.1.11       1.1.1.1          200 0x80000021 0x31d3
22.2.2.21       1.1.1.2         1440 0x8000002f 0xfddd
44.4.4.41       1.1.1.4          147 0x80000016 0x8638
55.5.5.51       1.1.1.5           82 0x8000002f 0x7911
66.6.6.63       3.3.3.6           10 0x80000014 0x6022
77.7.7.71       1.1.1.7         2021 0x8000005e 0x6b99

                AS External Link States

Link ID         ADV Router      Age  Seq#       CkSum  Route
1.1.0.128       1.1.1.1          280 0x800000d4 0xca19 E2 1.1.0.128/25 [0x0]
1.2.0.128       1.1.1.2           73 0x80000099 0xec62 E1 1.2.0.128/25 [0x0]
1.3.0.128       1.1.1.3         1473 0x8000007a 0xb07d E2 1.3.0.128/25 [0x0]
1.4.0.128       1.1.1.4          441 0x800000a5 0x2bbe E2 1.4.0.128/25 [0x0]
1.5.0.128       1.1.1.5          112 0x800000be 0xd1cf E2 1.5.0.128/25 [0x0]
1.6.0.128       1.1.1.6         1515 0x80000032 0xd853 E2 1.6.0.128/25 [0x0]
1.7.0.128       1.1.1.7         1521 0x80000074 0x6067 E2 1.7.0.128/25 [0x0]
2.1.0.128       2.2.2.1          555 0x8000000f 0x8613 E2 2.1.0.128/25 [0x0]
2.2.0.128       2.2.2.2         1580 0x8000001c 0x46df E2 2.2.0.128/25 [0x0]
2.3.0.128       2.2.2.3          344 0x80000015 0x60cc E2 2.3.0.128/25 [0x0]
2.4.0.128       2.2.2.4         2597 0x8000000b 0x44ee E2 2.4.0.128/25 [0x0]
2.5.0.128       2.2.2.5          314 0x80000014 0x9c0c E1 2.5.0.128/25 [0x0]
2.6.0.128       2.2.2.6          214 0x80000084 0x28df E2 2.6.0.128/25 [0x0]
2.7.0.128       2.2.2.7          813 0x80000011 0xe363 E2 2.7.0.128/25 [0x0]
3.1.0.128       3.3.3.1          447 0x8000000d 0x51e1 E2 3.1.0.128/25 [0x0]
3.2.0.128       3.3.3.2         1459 0x8000000a 0x45ee E2 3.2.0.128/25 [0x0]
3.3.0.128       3.3.3.3          358 0x80000011 0x43e9 E2 3.3.0.128/25 [0x0]
3.4.0.128       3.3.3.4          421 0x8000000d 0x1b12 E2 3.4.0.128/25 [0x0]
3.5.0.128       3.3.3.5         1394 0x8000001e 0xe633 E2 3.5.0.128/25 [0x0]
3.6.0.128       3.3.3.6         1584 0x8000002e 0x9691 E2 3.6.0.128/25 [0x0]
3.7.0.128       3.3.3.7          189 0x80000011 0xfa2a E2 3.7.0.128/25 [0x0]

vyatta@vyatta:~$
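Eyeballing 21 external LSAs is fine once, but a quick script makes the check repeatable. The sketch below is not part of the lab tooling; it simply scans the text of a Quagga/Vyatta-style "show ip ospf database" dump for the "E1"/"E2" route column and reports any host /25 that is missing. The three clusters of seven routers are an assumption read off the router IDs in the LSDB above, and the function names are made up for illustration.

```python
import re

# Matches the route column of an AS External row, e.g. "E2 1.1.0.128/25".
EXTERNAL_ROUTE = re.compile(r"\bE[12]\s+(\d+\.\d+\.\d+\.\d+/\d+)")


def expected_external_prefixes(clusters=3, routers_per_cluster=7):
    """Every <cluster>.<router>.0.128/25 host network that should be announced."""
    return {
        f"{c}.{r}.0.128/25"
        for c in range(1, clusters + 1)
        for r in range(1, routers_per_cluster + 1)
    }


def missing_external_prefixes(lsdb_text, clusters=3, routers_per_cluster=7):
    """Return the expected host prefixes that do not appear in the LSDB dump."""
    seen = set(EXTERNAL_ROUTE.findall(lsdb_text))
    return sorted(expected_external_prefixes(clusters, routers_per_cluster) - seen)


if __name__ == "__main__":
    sample = (
        "1.1.0.128 1.1.1.1 280 0x800000d4 0xca19 E2 1.1.0.128/25 [0x0]\n"
        "1.2.0.128 1.1.1.2 73 0x80000099 0xec62 E1 1.2.0.128/25 [0x0]\n"
    )
    print(missing_external_prefixes(sample, clusters=1, routers_per_cluster=3))
```

Because it only looks for the route column, it does not care whether the dump kept its line breaks.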
What happens when you have a full BGP feed and you accidentally dump it into your IGP? To find out, I modified my Python script that blurts out random bogus prefixes to spew out a specified number of prefixes instead. To do this I chose a Class A network (a /8 for you CIDR people) and had the Python script break the entire block into as many equal-sized subnets as fit within the requested count, and then iterate through with less and less specific subnet masks until the proper number of networks is reached. This Python script is bs-prefixes.py, and it has a hardcoded variable ClassA that decides which /8 network to split up. It takes one argument, the number of prefixes. Note that you cannot have more than 16777216 prefixes unless you move to a different plane of geometry. Besides, if you want that many prefixes anyway you're even crazier than I am. I chose to use the 13/8 network, as 13 is a nice unlucky number and perfect to explode a network with, and the block belongs to Xerox and they probably wouldn't mind me copying their address space, HAR, HAR, HAR.
Example: bs-prefixes creating 7 prefixes. Perfect for feeding into exaBGP!

user@Linux-box:~$ bs-prefixes.py 7
13.0.0.0/10
13.64.0.0/10
13.128.0.0/10
13.192.0.0/10
13.0.0.0/9
13.128.0.0/9
13.0.0.0/8
user@Linux-box:~$ bs-prefixes.py 7
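For anyone who wants to play along without the original script, here is a minimal reconstruction of the idea. This is not the actual bs-prefixes.py; it is a sketch of one algorithm that reproduces the 7-prefix example above (start at the longest mask whose subnet count still fits in the request, then work back toward the /8). The function name and the class_a parameter are my own.

```python
import ipaddress


def bs_prefixes(count, class_a=13):
    """Return `count` prefixes carved out of class_a/8, most specific first."""
    if count < 1:
        return []
    base = class_a << 24
    # Longest mask whose number of subnets (2 ** (plen - 8)) is <= count,
    # capped at /32.
    start_len = min(32, 8 + count.bit_length() - 1)
    prefixes = []
    for plen in range(start_len, 7, -1):
        remaining = count - len(prefixes)
        if remaining <= 0:
            break
        step = 1 << (32 - plen)        # addresses per subnet at this mask
        subnets = 1 << (plen - 8)      # subnets of this size inside the /8
        for i in range(min(subnets, remaining)):
            addr = ipaddress.IPv4Address(base + i * step)
            prefixes.append(f"{addr}/{plen}")
    return prefixes


if __name__ == "__main__":
    print("\n".join(bs_prefixes(7)))
```

Asked for 500,000 prefixes, this sketch starts at /26, which lines up with the first announcements exaBGP logs when the big run kicks off.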
This Python script is then used in conjunction with a shell script, announce.sh, which exaBGP runs as a dynamic process to announce the specified number of prefixes via BGP to our unlucky network. The shell script has a bit of logic built in to throttle the announcements down to 128 per second. I found that if I just let it fly unbounded, I had a good chance of blowing up a buffer somewhere and having my BGP session go Active. I also turned on graceful restart to make for quicker recovery in case of a dropped BGP TCP session, and pointed exaBGP back at my 2GB Olive box and away from the J2300. As of 12 Jan 2013, the full Internet routing table is about 460,000 routes. So to simulate this we'll set up our test to generate 500,000 routes -- just to be on the safe side. The new exaBGP config file is exabgp-500k.conf.
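The throttling logic is simple enough to sketch. The real announce.sh is a shell script; the Python below is only an illustration of the same idea, assuming exaBGP reads one "announce route ... next-hop ..." command per line from the helper's standard output (the command format and the 10.0.0.0 next hop are taken from the exaBGP log in this post). The function name and its out/sleep parameters are hypothetical and exist mainly so the pacing can be tested.

```python
import sys
import time


def announce_throttled(prefixes, next_hop="10.0.0.0", per_second=128,
                       out=sys.stdout, sleep=time.sleep):
    """Write announce commands, pausing one second after every batch.

    This is a crude throttle: it caps the rate at `per_second` commands per
    second rather than pacing each command individually.
    """
    sent = 0
    for prefix in prefixes:
        out.write(f"announce route {prefix} next-hop {next_hop}\n")
        sent += 1
        if sent % per_second == 0:
            out.flush()   # hand the finished batch to the reader
            sleep(1)      # then wait before starting the next batch
    out.flush()
    return sent


if __name__ == "__main__":
    announce_throttled(["13.0.0.0/26", "13.0.0.64/26", "13.0.0.128/26"])
```

At 128 per second, 500,000 prefixes take a little over an hour to announce, which is worth knowing before you sit and stare at the session.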
To simulate an accidental table dump, we will let BGP load up the RIB completely on the Olive2GB box. Once it is steady, we'll add an export policy to dump all of the BGP routes into OSPF. This would be akin to an enterprise network that has a couple of Internet access points with a full BGP feed, but which are only supposed to advertise a default route into the rest of the enterprise network. Our export policy will emulate a novice network operator (or more likely just a stupid one) who made a major screw up.
Letting exaBGP pump 500,000 routes into the 2GB Olive

copek@sheddap:/etc/exabgp$ exabgp exabgp-500k.conf
Sat, 12 Jan 2013 23:09:48 INFO 8326 configuration Performing reload of exabgp 1.3.4
Sat, 12 Jan 2013 23:09:48 INFO 8326 supervisor New Peer 172.20.1.66
Sat, 12 Jan 2013 23:09:48 INFO 8326 configuration Loaded new configuration successfully
Sat, 12 Jan 2013 23:09:48 INFO 8326 processes Forked process service-1
trap: SIGINT: bad trap
Sat, 12 Jan 2013 23:09:48 INFO 8326 message Peer 172.20.1.66 ASN 65066 >> OPEN version=4 asn=65069 hold_time=600 router_id=66.66.66.66 capabilities=[Graceful Restart, Multiprotocol for IPv4 unicast IPv6 unicast IPv4 flow-ipv4, 4Bytes AS 65069]
Sat, 12 Jan 2013 23:09:48 INFO 8326 message Peer 172.20.1.66 ASN 65066 << OPEN version=4 asn=65066 hold_time=600 router_id=1.1.1.3 capabilities=[Cisco Route Refresh, Multiprotocol for IPv4 unicast, Route Refresh, Graceful Restart, 4Bytes AS 65066]
Sat, 12 Jan 2013 23:09:48 INFO 8326 message Peer 172.20.1.66 ASN 65066 >> KEEPALIVE (OPENCONFIRM)
Sat, 12 Jan 2013 23:09:48 INFO 8326 message Peer 172.20.1.66 ASN 65066 << KEEPALIVE (ESTABLISHED)
Sat, 12 Jan 2013 23:09:48 INFO 8326 message Peer 172.20.1.66 ASN 65066 >> UPDATE (eors)
Sat, 12 Jan 2013 23:09:48 INFO 8326 message Peer 172.20.1.66 ASN 65066 >> KEEPALIVE (no more UPDATE and no EOR)
Sat, 12 Jan 2013 23:09:48 INFO 8326 message Peer 172.20.1.66 ASN 65066 << KEEPALIVE
Sat, 12 Jan 2013 23:09:48 INFO 8326 message Peer 172.20.1.66 ASN 65066 << UPDATE (not parsed)
Sat, 12 Jan 2013 23:09:58 INFO 8326 processes Command from process service-1 : announce route 13.0.0.0/26 next-hop 10.0.0.0
Sat, 12 Jan 2013 23:09:58 INFO 8326 processes Command from process service-1 : announce route 13.0.0.64/26 next-hop 10.0.0.0
Sat, 12 Jan 2013 23:09:58 INFO 8326 processes Command from process service-1 : announce route 13.0.0.128/26 next-hop 10.0.0.0
Sat, 12 Jan 2013 23:09:58 INFO 8326 processes Command from process service-1 : announce route 13.0.0.192/26 next-hop 10.0.0.0
Sat, 12 Jan 2013 23:09:58 INFO 8326 processes Command from process service-1 : announce route 13.0.1.0/26 next-hop 10.0.0.0
Sat, 12 Jan 2013 23:09:58 INFO 8326 processes Command from process service-1 : announce route 13.0.1.64/26 next-hop 10.0.0.0
Sat, 12 Jan 2013 23:09:58 INFO 8326 processes Command from process service-1 : announce route 13.0.1.128/26 next-hop 10.0.0.0
Sat, 12 Jan 2013 23:09:58 INFO 8326 processes Command from process service-1 : announce route 13.0.1.192/26 next-hop 10.0.0.0
Sat, 12 Jan 2013 23:09:58 INFO 8326 processes Command from process service-1 : announce route 13.0.2.0/26 next-hop 10.0.0.0
Sat, 12 Jan 2013 23:09:58 INFO 8326 processes Command from process service-1 : announce route 13.0.2.64/26 next-hop 10.0.0.0
Sat, 12 Jan 2013 23:09:58 INFO 8326 processes Command from process service-1 : announce route 13.0.2.128/26 next-hop 10.0.0.0
Sat, 12 Jan 2013 23:09:58 INFO 8326 processes Command from process service-1 : announce route 13.0.2.192/26 next-hop 10.0.0.0
Sat, 12 Jan 2013 23:09:58 INFO 8326 processes Command from process service-1 : announce route 13.0.3.0/26 next-hop 10.0.0.0

The Olive with a RIB full of routes!
juniper@Olive2GB> show bgp summary
Groups: 1 Peers: 1 Down peers: 0
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0            500000     500000          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
172.20.10.117         65069     500004         49       0       7     2:20:06 500000/500000/500000/0 0/0/0/0

juniper@Olive2GB> show route summary
Autonomous system number: 65066
Router ID: 1.1.1.3

inet.0: 500104 destinations, 500104 routes (500104 active, 0 holddown, 0 hidden)
Restart Complete
              Direct:      8 routes,      8 active
               Local:      7 routes,      7 active
                OSPF:     88 routes,     88 active
                 BGP: 500000 routes, 500000 active
              Static:      1 routes,      1 active

juniper@Olive2GB>

A quick check of our Nagios network monitoring station reveals that our network is currently all green and happy!
To dump this into OSPF, we'll use the following policy.
policy-statement EXPORT-BGP-TO-OSPF {
    term STUPID {
        from protocol bgp;
        then accept;
    }
}

And apply it with the following...
[edit]
juniper@Olive2GB# set protocols ospf export EXPORT-BGP-TO-OSPF

[edit]
juniper@Olive2GB# commit
commit complete

[edit]
juniper@Olive2GB#
The 2GB Olive loaded up its LSDB pretty quickly, and started flooding the network immediately.
juniper@Olive2GB# run show ospf database summary
Area 0.0.0.0:
   21 Router LSAs
   16 Network LSAs
   1 OpaqArea LSAs
Externals:
   500020 Extern LSAs
Interface em1.1:
Area 0.0.0.0:
Interface em1.123:
Area 0.0.0.0:
Interface em1.134:
Area 0.0.0.0:
Interface em1.33:
Area 0.0.0.0:
Interface em2.0:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:

[edit]
juniper@Olive2GB#
Just keeping an eye on the Nagios monitoring page, the first router that had anything turn red was in Cluster 1, Router number 6 -- XORP.
A few minutes passed, and then a lot of things started to turn red. The NS208 went off the map, the Vyatta box's LSA Type 5 host network advertisement was lost, and the EX2200C lost its host routes first, followed shortly thereafter by all of its OSPF routes. Then the SRX210HE went completely red. The RB133 went completely red and stayed that way forever, while its RouterBoard companion, the RB750, reddened up all of its OSPF routes on the Nagios display. The lower-memory Cisco boxes all joined in the fray as well, with the 3750 leading the charge followed by the 1760, the 3640 and finally the 2811. The EX3200 didn't want to be left out, and went red for OSPF as well. All of the reactions up until this point were pretty much the same as what happened with the slow LSA buildup. Nothing too new here.
Then we had some new action... the OpenOSPFd box was running full tilt on the CPU for quite some time, and then blew its memory bounds with the cry of:
OpenOSPFd process dies
# UVM: pid 16359 (ospfd), uid 85 killed: out of swap
The sudden dump of hundreds of thousands of LSAs into the network produced a much different wave of destruction than I thought it would. I had assumed beforehand that the entire network would just explode pretty much all at once. However, the initial wave of chaos was mostly isolated to the boxes directly connected to the router responsible for flooding the LSAs into the network. These unfortunate neighbors acted as a temporary safety buffer for the rest of the network. Their loaded CPUs, flapping adjacencies, crashes, reboots and even deaths insulated routers further away from the 2GB Olive box. The network failed so badly that the flooding process was severely impeded, and wound up being drawn out over a very long time. The further a box was in the network from the nasty LSA injector, the longer it seemed to take to load up its LSDB. However, the overload of LSAs did eventually propagate towards everyone, just a bit slower and more chaotically. In time, the half a million LSAs made it to every corner of the network.
The remaining participants in Cluster 1 were the first to actually slot all of the new external LSAs into memory - Vyatta, BIRD, Quagga and the 1GB Olive all managed the feat.
Vyatta with 500k+ of external LSAs

vyatta@vyatta:~$ show ip ospf
 OSPF Routing Process, Router ID: 1.1.1.1
 Supports only single TOS (TOS0) routes
 This implementation conforms to RFC2328
 RFC1583Compatibility flag is disabled
 OpaqueCapability flag is disabled
 Initial SPF scheduling delay 200 millisec(s)
 Minimum hold time between consecutive SPFs 1000 millisec(s)
 Maximum hold time between consecutive SPFs 10000 millisec(s)
 Hold time multiplier is currently 1
 SPF algorithm last executed 5.024s ago
 SPF timer is inactive
 Refresh timer 10 secs
 This router is an ASBR (injecting external routing information)
 Number of external LSA 500020. Checksum Sum 0xd4687780
 Number of opaque AS LSA 0. Checksum Sum 0x00000000
 Number of areas attached to this router: 1

 Area ID: 0.0.0.0 (Backbone)
   Number of interfaces in this area: Total: 6, Active: 6
   Number of fully adjacent neighbors in this area: 3
   Area has no authentication
   SPF algorithm executed 1983 times
   Number of LSA 38
   Number of router LSA 21. Checksum Sum 0x000a5a14
   Number of network LSA 17. Checksum Sum 0x0008311c
   Number of summary LSA 0. Checksum Sum 0x00000000
   Number of ASBR summary LSA 0. Checksum Sum 0x00000000
   Number of NSSA LSA 0. Checksum Sum 0x00000000
   Number of opaque link LSA 0. Checksum Sum 0x00000000
   Number of opaque area LSA 0. Checksum Sum 0x00000000

vyatta@vyatta:~$

BIRD Stuffed full of LSAs

bird> show ospf
OSPFol:
RFC1583 compatibility: disabled
RT scheduler tick: 2
Number of areas: 1
Number of LSAs in DB: 500058
    Area: 0.0.0.0 (0) [BACKBONE]
        Stub: No
        NSSA: No
        Transit: No
        Number of interfaces: 6
        Number of neighbors: 9
        Number of adjacent neighbors: 7
bird>
Curiously, Vyatta was sucking up a lot more CPU time than the Quagga instance was. However, as I was watching the LSA totals build up in Clusters 2 and 3, Quagga let out a death cry:
quagga-router#
[156054.409876] Out of memory: Kill process 1584 (ospfd) score 695 or sacrifice child
[156054.410789] Killed process 1584 (ospfd) total-vm:782616kB, anon-rss:707220kB, file-rss:0kB
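That kernel message gives a rough feel for what each LSA cost Quagga in memory. The sketch below is only a back-of-the-envelope estimate: it assumes ospfd was holding roughly the full set of 500,020 external LSAs when it was killed, that the kernel's kB means 1024 bytes, and it charges all of the resident memory to the LSDB, so it is an upper bound rather than a measurement. The function name is made up.

```python
def bytes_per_lsa(rss_kb, lsa_count):
    """Resident memory divided evenly across the LSAs, in bytes."""
    return rss_kb * 1024 / lsa_count


if __name__ == "__main__":
    # anon-rss from the OOM killer message, external LSA count from the LSDB
    estimate = bytes_per_lsa(707220, 500020)
    print(f"roughly {estimate:.0f} bytes of resident memory per external LSA")
```

That works out to somewhere around 1.4 KB per LSA, which is a handy number if you want to guess how much RAM a Quagga box needs to survive a given table size.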
Clusters 2 and 3 were well behind in loading up LSAs. The boxes still alive and kicking without complaint were up at about 250K external LSAs.
A representative from Cluster 2, which only has a little more than half of the LSAs at this point...

juniper@SRX100-6_OSPF> show ospf database summary
Area 0.0.0.0:
   21 Router LSAs
   8 Network LSAs
   1 OpaqArea LSAs
Externals:
   252992 Extern LSAs
Interface fe-0/0/0.2:
Area 0.0.0.0:
Interface fe-0/0/0.212:
Area 0.0.0.0:
Interface fe-0/0/0.22:
Area 0.0.0.0:
Interface fe-0/0/0.2201:
Area 0.0.0.0:
Interface fe-0/0/0.223:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:

juniper@SRX100-6_OSPF>
After about 2 hours, things seemed to be as converged as they were going to get. The Nagios map had settled down without change to look like this:
So when things had more or less gotten as bad as it looked like they were going to get, here is each participant's status:
Fully loaded LSDB; however, over time it appeared to be very, very slowly consuming all of its memory.
The ospfd process terminated. The box still responded to pings, but did not attempt to participate in OSPF again.
The route injector was happily keeping the 500K routes it injected up to date.
Fully loaded LSDB. The CPU was running a fair amount, about 20% to 40%, but it seemed steady.
The ospfd process died. The OS still allowed for pings, but it was done routing.
No xorp processes were left running, all had terminated. The OS still allowed the box to be pinged.
The 1GB Olive was running steady.
The routing process, rpd, was in a perpetual restart loop.
Was still working on loading LSAs; it was up to 305469 external LSAs after 2 hours.
The OSPF process was in a perpetual restart loop.
The entire system was in a perpetual reboot loop.
This one was also in a perpetual system reboot loop.
The OSPF process was in a perpetual restart loop.
The system responded to pings, but was otherwise completely dead. It did not participate in routing, and would not offer a login prompt. It had closed the telnet connection I had open to it.
Was in a perpetual system restart loop, like all of the other Junos boxes in flow mode.
Still loading LSAs and holding steady, up to 385259.
OSPF was in a perpetual restart loop, and the system had disabled CEF forwarding mode.
Perpetual system restart loop.
RPD was in a perpetual restart loop.
This box was completely dead. No response to pings, no action on the console port.
Perpetual system restart loop.
If you care to go through the syslog messages, see the cries for help, and draw your own conclusions about what happened, the syslog output is available for download here: run2_results.log.bz2. This file is compressed with bzip2 down to 143K, and expands to 2.3M. It contains all of the syslog messages up until this point.
After two hours, at 15:20, I decided to stop C1R3 from keeping its 500K external LSAs fresh, and let it prematurely expire all of them by stopping the redistribution of BGP into OSPF:
[edit]
juniper@Olive2GB# delete protocols ospf export EXPORT-BGP-TO-OSPF
[edit]
juniper@Olive2GB# commit
commit complete
[edit]
juniper@Olive2GB#
You can see that all of the external LSAs it originated for networks in the 13.0.0.0/8 space were now aged out (Age 3600).
juniper@Olive2GB> show ospf database
OSPF database, Area 0.0.0.0
Type ID Adv Rtr Seq Age Opt Cksum Len
Router 1.1.1.1 1.1.1.1 0x80000140 391 0x2 0x4844 96
Router *1.1.1.3 1.1.1.3 0x80000120 43 0x22 0xe89 108
Router 1.1.1.4 1.1.1.4 0x800000f0 1281 0x2 0x95b7 96
Router 1.1.1.7 1.1.1.7 0x800001a2 206 0x22 0xcc70 96
Router 2.2.2.1 2.2.2.1 0x8000013b 252 0x22 0x4ed7 96
Router 2.2.2.2 2.2.2.2 0x80000129 1567 0x22 0x578d 96
Router 2.2.2.3 2.2.2.3 0x80000076 17 0x22 0x50e6 108
Router 2.2.2.5 2.2.2.5 0x8000014e 113 0x22 0x279d 96
Router 2.2.2.6 2.2.2.6 0x80000255 1161 0x22 0x5a2 96
Router 2.2.2.7 2.2.2.7 0x800000c7 586 0x2 0x2a16 96
Router 3.0.0.10 3.0.0.10 0x80000096 310 0x2 0x64e3 60
Router 3.3.3.1 3.3.3.1 0x80000075 553 0x22 0x2dd 96
Router 3.3.3.2 3.3.3.2 0x800000d7 2787 0x22 0xfde8 120
Router 3.3.3.3 3.3.3.3 0x80000052 2742 0x22 0xee5 108
Router 3.3.3.4 3.3.3.4 0x800000ad 2531 0x22 0x1758 96
Router 3.3.3.5 3.3.3.5 0x80000169 160 0x22 0x4657 96
Router 3.3.3.7 3.3.3.7 0x80000050 500 0x22 0xdaa8 96
Network 1.0.0.7 1.1.1.7 0x8000005b 1935 0x22 0xd2c6 44
Network 1.1.7.1 1.1.1.1 0x800000bb 987 0x2 0x265d 32
Network *1.3.4.3 1.1.1.3 0x8000007b 552 0x22 0x970a 32
Network 3.0.0.7 3.3.3.7 0x80000034 108 0x22 0xb4e9 44
Network 22.2.2.22 2.2.2.2 0x80000009 2424 0x22 0xf5f0 32
Network 44.4.4.41 1.1.1.4 0x80000002 3600 0x2 0xfde2 32
Network 55.5.5.53 3.3.3.5 0x80000014 3600 0x22 0xc6c2 32
OSPF AS SCOPE link state database
Type ID Adv Rtr Seq Age Opt Cksum Len
Extern 1.1.0.128 1.1.1.1 0x800000f3 1048 0x2 0x8c38 36
Extern *1.3.0.128 1.1.1.3 0x80000093 552 0x22 0x7e96 36
Extern 1.4.0.128 1.1.1.4 0x800000c5 137 0x2 0xeade 36
Extern 1.7.0.128 1.1.1.7 0x8000008e 876 0x22 0x2c81 36
Extern 2.1.0.128 2.2.2.1 0x8000005c 261 0x22 0xeb60 36
Extern 2.2.0.128 2.2.2.2 0x80000031 3199 0x22 0x1cf4 36
Extern 2.3.0.128 2.2.2.3 0x80000032 1297 0x20 0x26e9 36
Extern 2.5.0.128 2.2.2.5 0x8000006c 153 0x22 0xeb64 36
Extern 2.6.0.128 2.2.2.6 0x800000a0 1324 0x20 0xeffb 36
Extern 2.7.0.128 2.2.2.7 0x8000003d 586 0x2 0x8b8f 36
Extern 3.1.0.128 3.3.3.1 0x80000025 1280 0x22 0x21f9 36
Extern 3.2.0.128 3.3.3.2 0x8000001e 1064 0x22 0x1d03 36
Extern 3.3.0.128 3.3.3.3 0x8000002e 1492 0x20 0x907 36
Extern 3.4.0.128 3.3.3.4 0x80000022 2926 0x22 0xf027 36
Extern 3.5.0.128 3.3.3.5 0x8000005c 684 0x22 0x6a71 36
Extern 3.7.0.128 3.3.3.7 0x80000030 500 0x20 0xbc49 36
Extern *13.1.122.0 1.1.1.3 0x80000008 3600 0x22 0xb603 36
Extern *13.1.122.63 1.1.1.3 0x80000008 3600 0x22 0xc7f0 36
Extern *13.1.122.64 1.1.1.3 0x80000008 3600 0x22 0xbdf9 36
Extern *13.1.122.127 1.1.1.3 0x80000008 3600 0x22 0xc3f4 36
Extern *13.1.122.128 1.1.1.3 0x80000008 3600 0x22 0xb9fd 36
Extern *13.1.122.191 1.1.1.3 0x80000008 3600 0x22 0xc275 36
Extern *13.1.122.192 1.1.1.3 0x80000008 3600 0x22 0xb87e 36
Extern *13.1.122.255 1.1.1.3 0x80000008 3600 0x22 0xbbfc 36
Extern *13.1.123.0 1.1.1.3 0x80000008 3600 0x22 0xb007 36
Extern *13.1.123.63 1.1.1.3 0x80000008 3600 0x22 0xbcfa 36
Extern *13.1.123.64 1.1.1.3 0x80000008 3600 0x22 0xb204 36
Extern *13.1.123.127 1.1.1.3 0x80000008 3600 0x22 0xb8fe 36
Extern *13.1.123.128 1.1.1.3 0x80000008 3600 0x22 0xae08 36
Extern *13.1.123.191 1.1.1.3 0x80000008 3600 0x22 0xb77f 36
Extern *13.1.123.192 1.1.1.3 0x80000008 3600 0x22 0xad88 36
Extern *13.1.124.0 1.1.1.3 0x80000008 3600 0x22 0xa017 36
Extern *13.1.124.63 1.1.1.3 0x80000008 3600 0x22 0xb105 36
Extern *13.1.124.64 1.1.1.3 0x80000008 3600 0x22 0xa70e 36
Extern *13.1.124.127 1.1.1.3 0x80000008 3600 0x22 0xad09 36
Extern *13.1.124.128 1.1.1.3 0x80000008 3600 0x22 0xa312 36
... .. .
The task of removing all of these nasty advertisements from the network proved to be just as destructive as introducing them, if not more so.
The Vyatta CPU immediately pegged at 100%. The entire network went back into utter chaos again, with once fairly steady routers blinking between red and green. Then the Quagga ospfd process that was running on the Nagios machine imploded with the same out-of-memory condition as C1R5. Out of mercy, and the desire to keep an eye on the chaos, I restarted the Quagga daemons on the Nagios box. As there were still a ton of advertisements swamping the network, the router LSAs and host network LSAs were lost in the churn. It took about 50 minutes before things calmed down enough that Quagga on the Nagios box started populating the operating system's routing table with some of the router LSAs and the host routes.
I was also surprised at how long some of the boxes kept the aged-out LSAs in memory. For instance, the 2GB Olive box, although it had promptly removed all of the routes from its main routing table, still had 400,000+ LSAs stuffed in its LSDB. It purged about 100,000 of them in about 5 minutes; the remainder took about 1 hour and 30 minutes before there was no sign of any of them left anywhere in its DB structures. It took almost an hour until I even began to see LSA counts start to drop in any of the other routers. BIRD was the first implementation to clear out the old LSAs, cleansing itself well before the 2GB Olive box.
BIRD Purged Well Before Anyone Else
bird> show ospf
OSPFol:
RFC1583 compatibility: disabled
RT scheduler tick: 2
Number of areas: 1
Number of LSAs in DB: 37
  Area: 0.0.0.0 (0) [BACKBONE]
    Stub: No
    NSSA: No
    Transit: No
    Number of interfaces: 6
    Number of neighbors: 4
    Number of adjacent neighbors: 4
bird>
The Originator still had 294k Externals in its LSDB
juniper@Olive2GB> show ospf database summary
Area 0.0.0.0:
   17 Router LSAs
   12 Network LSAs
Externals:
   294494 Extern LSAs
Interface em1.1:
Area 0.0.0.0:
Interface em1.123:
Area 0.0.0.0:
Interface em1.134:
Area 0.0.0.0:
Interface em1.33:
Area 0.0.0.0:
Interface em2.0:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:
juniper@Olive2GB>
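Watching these counts drain by hand gets old quickly. As a minimal sketch, assuming the summary output keeps the "Externals: N Extern LSAs" shape shown above, a few lines of Python can pull the external LSA count out of a captured summary. The helper name external_lsa_count is made up for illustration.

```python
import re

# Pattern for the "Externals:" section of the summary output shown above.
# \s+ also tolerates a line break between "Externals:" and the count.
EXTERN_RE = re.compile(r"Externals:\s+(\d+)\s+Extern LSAs")

def external_lsa_count(summary_text):
    """Return the external LSA count from 'show ospf database summary' text,
    or None if no such line is present (hypothetical helper)."""
    match = EXTERN_RE.search(summary_text)
    return int(match.group(1)) if match else None

sample = "Area 0.0.0.0: 17 Router LSAs 12 Network LSAs Externals: 294494 Extern LSAs"
print(external_lsa_count(sample))  # prints 294494
```

Run against periodic captures, this gives a rough picture of how fast each box is purging.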
At 16:18, the Vyatta box started going completely bonkers with syslog messages complaining about Link State Updates with an LS age equal to MaxAge. On the syslog server there were 1,384,807 lines of these messages, plus a ton of messages stating that the Vyatta box had been throttled due to excessive message rate.
Jan 13 15:20:02 vyatta ospfd[1600]: Link State Update[Type5,id(13.6.155.63),ar(1.1.1.3)]: LS age is equal to MaxAge.
Jan 13 15:20:02 vyatta ospfd[1600]: Link State Update[Type5,id(13.6.155.64),ar(1.1.1.3)]: LS age is equal to MaxAge.
Jan 13 15:20:02 vyatta ospfd[1600]: Link State Update[Type5,id(13.6.155.127),ar(1.1.1.3)]: LS age is equal to MaxAge.
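With well over a million of these lines in the log, counting them by eye is hopeless. Here is a rough sketch of how they could be tallied per advertising router in Python. It assumes the ospfd message format stays exactly as shown above; the function name count_maxage and the sample list are only for illustration.

```python
import re
from collections import Counter

# Matches the ospfd MaxAge messages in the format shown above (assumed stable).
MAXAGE_RE = re.compile(
    r"Link State Update\[Type(?P<type>\d+),id\((?P<id>[\d.]+)\),ar\((?P<ar>[\d.]+)\)\]: "
    r"LS age is equal to MaxAge"
)

def count_maxage(lines):
    """Count MaxAge messages per advertising router (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = MAXAGE_RE.search(line)
        if match:
            counts[match.group("ar")] += 1
    return counts

sample = [
    "Jan 13 15:20:02 vyatta ospfd[1600]: Link State Update[Type5,id(13.6.155.63),ar(1.1.1.3)]: LS age is equal to MaxAge.",
    "Jan 13 15:20:02 vyatta ospfd[1600]: Link State Update[Type5,id(13.6.155.64),ar(1.1.1.3)]: LS age is equal to MaxAge.",
    "Jan 13 15:20:02 vyatta ospfd[1600]: Link State Update[Type5,id(13.6.155.127),ar(1.1.1.3)]: LS age is equal to MaxAge.",
]
print(count_maxage(sample))  # all three lines are attributed to 1.1.1.3
```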
At just shy of two hours after the redistribution problem was fixed, the network seemed to slow its surging in and out and start to recover. There were still a lot of routers that were chock full of external LSAs...and still loading them!!! The J2300, which never had the full 500K value showing up in its LSDB, finally maxed out at about 16:30:
juniper@J2300-7> show ospf database summary
Area 0.0.0.0:
   21 Router LSAs
   23 Network LSAs
   1 OpaqArea LSAs
Externals:
   500020 Extern LSAs
Interface fe-0/0/0.22:
Area 0.0.0.0:
Interface fe-0/0/0.3:
Area 0.0.0.0:
Interface fe-0/0/0.312:
Area 0.0.0.0:
Interface fe-0/0/0.3201:
Area 0.0.0.0:
Interface fe-0/0/0.323:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:
juniper@J2300-7>
Two hours after the LSAs were expired, the RB750G login prompt reappeared. The box was VERY sluggish for a while, but eventually started to respond to commands again. A few minutes later, the Vyatta VM had purged all of the old LSAs from memory. A lot of the host network advertisements began to reappear on the network. Some of them surged in and out a lot, due to processes restarting and boxes rebooting (NS208, and all the Cisco devices).
At 16:50, the originator of all of the nasty LSAs had finally purged them all:
juniper@Olive2GB> show ospf database summary
Area 0.0.0.0:
   16 Router LSAs
   11 Network LSAs
Externals:
   15 Extern LSAs
Interface em1.1:
Area 0.0.0.0:
Interface em1.123:
Area 0.0.0.0:
Interface em1.134:
Area 0.0.0.0:
Interface em1.33:
Area 0.0.0.0:
Interface em2.0:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:
juniper@Olive2GB>
Three hours after recovery, old LSAs were still floating around, as evidenced by these messages on the NS208 console:
ospf: receive self-orginated newer lsa with same seq -2147483604, but bigger checksum
And finally, a full six hours after the event started, and four hours after the madness was stopped, the network had seemed to once again reach a steady state.
And speaking of keeping LSAs for a long time, at 22:33, a full 8 hours after the LSAs were all expired, the J2300 still had them all in its LSDB! And just for fun, here is a copy of the LSDB from the J2300: J2300_LSDB_500K.txt.bz2. This is a 2.1M bzip2 compressed file that expands to 37M of Link State Fun! And this is just the overview!
Let's take a look at which routers recovered and which ones didn't, as well as what it took to get them talking to the rest of the network again.
Survived, but with some heavy CPU usage and massive syslogging.
Had to restart the ospfd process.
Survived!
Survived, some medium CPU usage.
Had to restart zebra and ospfd. Note: This host wasn't running daemon tools. If it had been, Quagga may have been in an endless process restart cycle.
Had to restart the xorp processes.
Survived.
Survived.
Survived, but had to wait a very long time for the LSDB to clear out enough that it had enough CPU power to successfully announce itself to the network again.
Came back on its own without intervention.
Came back on its own without intervention.
Came back on its own without intervention.
Came back on its own without intervention.
Had to reboot the system, as many of its former adjacencies were now stuck in ExStart or Exchange. I was able to do this remotely from the CLI.
Came back on its own without intervention.
Survived.
Attempted to re-enable CEF like the switch was telling me to do, but this created another error on the box:
Jan 13 22:46:28.369: %COMMON_FIB-3-HW_API: HW API failure for IPv4 CEF [0x0149B6E8]: Platform IPv4 Fib malloc failed (fatal) (26 subsequent failures).
Attempted to reload the box from the CLI through the console port, but the command failed due to lack of memory. Then the CLI locked up completely, and the box finally quit spewing errors about low memory and disabling CEF (among other complaints). When it rebooted, the memory overflowed again and the cycle repeated.
Came back on its own without intervention.
Came back on its own without intervention.
Had to power cycle the box, but to no avail. Stale LSAs killed this box repeatedly.
After 12 hours, there were still old stale LSAs floating around, and the box was still rebooting spontaneously.
Fortunately, there are a lot of different ways to protect a router from being overloaded with LSAs to the point that it crashes and burns.
set protocols ospf prefix-export-limit <integer from 0 to 4294967295>
This basically stops any redistribution once the router hits the limit specified. Of course, if you set this to a silly-high number you're not really protecting anything.
Example: Protecting our Network with the prefix-export-limit command.
[edit protocols ospf]
juniper@Olive2GB# set prefix-export-limit 25
[edit protocols ospf]
juniper@Olive2GB# commit
commit complete
And our OSPF database:
juniper@Olive2GB# run show ospf database summary
Area 0.0.0.0:
   8 Router LSAs
   8 Network LSAs
Externals:
   7 Extern LSAs
Interface em1.1:
Area 0.0.0.0:
Interface em1.123:
Area 0.0.0.0:
Interface em1.134:
Area 0.0.0.0:
Interface em1.33:
Area 0.0.0.0:
Interface em2.0:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:
[edit protocols ospf]
juniper@Olive2GB#
And check that our box is still bursting at the seams with routes learned via BGP:
juniper@Olive2GB# run show route summary
Autonomous system number: 65066
Router ID: 1.1.1.3
inet.0: 500048 destinations, 500048 routes (500048 active, 0 holddown, 0 hidden)
Restart Complete
              Direct: 8 routes, 8 active
               Local: 7 routes, 7 active
                OSPF: 32 routes, 32 active
                 BGP: 500000 routes, 500000 active
              Static: 1 routes, 1 active
[edit protocols ospf]
juniper@Olive2GB#
And repeat the same redistribution nightmare - 500,000 BGP routes into OSPF!
juniper@Olive2GB# set export EXPORT-BGP-TO-OSPF
[edit protocols ospf]
juniper@Olive2GB# commit
commit complete
[edit protocols ospf]
juniper@Olive2GB#
And we check our LSDB again a few minutes after we repeated our stupid mistake (but this time with a routing condom on):
juniper@Olive2GB# run show ospf database summary
Area 0.0.0.0:
   8 Router LSAs
   8 Network LSAs
Externals:
   6 Extern LSAs
Interface em1.1:
Area 0.0.0.0:
Interface em1.123:
Area 0.0.0.0:
Interface em1.134:
Area 0.0.0.0:
Interface em1.33:
Area 0.0.0.0:
Interface em2.0:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:
[edit protocols ospf]
juniper@Olive2GB# start
And we still have more or less the same number of external LSAs; in fact, now we're missing one! Checking the logs on the Junos box reveals the following message:
Jan 14 21:59:46 Olive2GB rpd[1206]: RPD_OSPF_OVERLOAD: OSPF instance master topology default is going into overload state: number of export prefixes (26) exceeded maximum allowed (25)
So basically it looks like the box protected the network by refusing to export anything once the prefix limit was reached. We'll verify this by checking the LSAs the router is originating, deleting our stupid export policy, and then checking the self-originated LSAs again.
[edit protocols ospf]
juniper@Olive2GB# run show ospf database advertising-router self
OSPF database, Area 0.0.0.0
Type ID Adv Rtr Seq Age Opt Cksum Len
Router *1.1.1.3 1.1.1.3 0x80000153 271 0x22 0xf525 96
Network *1.2.3.3 1.1.1.3 0x8000009f 1729 0x22 0x4a37 32
Network *1.3.4.3 1.1.1.3 0x800000ac 494 0x22 0x353b 32
[edit protocols ospf]
juniper@Olive2GB# delete export EXPORT-BGP-TO-OSPF
[edit protocols ospf]
juniper@Olive2GB# commit
commit complete
[edit protocols ospf]
juniper@Olive2GB# run show ospf database advertising-router self
OSPF database, Area 0.0.0.0
Type ID Adv Rtr Seq Age Opt Cksum Len
Router *1.1.1.3 1.1.1.3 0x80000154 5 0x22 0x5abf 96
Network *1.2.3.3 1.1.1.3 0x8000009f 1751 0x22 0x4a37 32
Network *1.3.4.3 1.1.1.3 0x800000ac 516 0x22 0x353b 32
OSPF AS SCOPE link state database
Type ID Adv Rtr Seq Age Opt Cksum Len
Extern *1.3.0.128 1.1.1.3 0x80000001 5 0x22 0xa304 36
[edit protocols ospf]
juniper@Olive2GB#
Sure enough, after deleting the policy that blew the limit, the router is now originating an external LSA again.
This method will protect your network from the killer flood of LSAs, but any LSAs that the router should have been injecting, like maybe a default route, will be sacrificed to pay homage to the router gods in the process. You may lose some valuable routes this way, but you're not likely to take out your entire network in a chaotic outage that lasts for hours on end.
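The all-or-nothing behavior seen in this test can be summed up in a few lines of Python. This is only a toy model of what was observed on the Olive (26 export prefixes against a limit of 25 meant zero externals originated), not a description of how rpd is implemented. The function name and the sample prefixes are invented for illustration.

```python
def externals_to_originate(export_prefixes, prefix_export_limit):
    """Toy model of the observed behavior: once the number of exported
    prefixes exceeds the limit, nothing is exported at all."""
    if len(export_prefixes) > prefix_export_limit:
        return []  # overload: even the legitimate externals are sacrificed
    return list(export_prefixes)

# Hypothetical prefixes: one legitimate external plus a sloppy BGP export.
legit = ["1.3.0.128/25"]
sloppy = [f"13.1.{n}.0/26" for n in range(25)]

print(len(externals_to_originate(legit + sloppy, 25)))  # prints 0
print(externals_to_originate(legit, 25))  # prints ['1.3.0.128/25']
```

The design trade-off is the one described above: the router gives up its own useful externals in order to keep the flood out of the network.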
Does not appear to support this type of feature.
Does not appear to support this type of feature.
Junos doesn't support this until version 10.2.
Does not appear to support this type of feature.
Does not appear to support this type of feature.
Does not appear to support this type of feature.
Junos doesn't support this until version 10.2.
maximum-lsa
You can also set a warning threshold to fire off an alarm and send a trap to an NMS when you're approaching the LSA limit that's been set, by specifying a percentage with warning-threshold. When the maximum-lsa limit itself is exceeded, the router moves into the ignore state. While the OSPF process is in the ignore state, it essentially stops listening to all LSAs. Of course, this means that the router drops all of its OSPF neighbors during that time.
When the OSPF process first starts ignoring LSAs, it also starts a timer that determines how long the router will cast away all of the LSAs that happen to collide with one of its interfaces, and it increments a counter that keeps track of how many times it's been in the ignore state. When the ignore timer expires, it starts listening to advertisements again, and another timer starts: the retry timer. The retry timer is a short test period to see if the oversized mass of LSAs that blew the threshold is still floating around or not. If the timer expires, the network has passed the safety check and the database protection state is reset back to the beginning with the ignore counter set to zero. However, if the LSA nasties swamp the router again when its OSPF adjacencies come back up, it moves back into the ignore state once more and increments the counter.
Whenever the ignore counter is incremented, there is one more check that is done. The ignore counter has a maximum value it can reach before the router moves into the dreaded isolation state. In the isolation state, the router has decided that the network is too crappy to even bother with, and would rather isolate itself than deal with all of the LSAs again. When the router is isolated, it has basically given up and will need operator intervention to bring it back up.
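The ignore, retry and isolate logic is easier to follow as a tiny state machine. The sketch below is a toy Python model of the behavior as described above, not Junos code. The class and method names are made up, the timers are represented only by their expiry events, and isolating once the counter goes past the allowed value is an assumption about the exact comparison.

```python
from enum import Enum

class State(Enum):
    NORMAL = "Normal"
    IGNORE = "Ignore"
    RETRY = "Retry"
    ISOLATE = "Isolate"

class DatabaseProtection:
    """Toy model of the ignore/retry/isolate logic described above."""

    def __init__(self, maximum_lsa, ignore_count_allowed):
        self.maximum_lsa = maximum_lsa
        self.ignore_count_allowed = ignore_count_allowed
        self.ignore_count = 0
        self.state = State.NORMAL

    def lsa_count_changed(self, non_self_lsas):
        """Called whenever the number of non self-generated LSAs changes."""
        if self.state in (State.IGNORE, State.ISOLATE):
            return  # not listening to anyone right now
        if non_self_lsas > self.maximum_lsa:
            self.ignore_count += 1
            # Assumption: isolation happens once the counter passes the limit.
            if self.ignore_count > self.ignore_count_allowed:
                self.state = State.ISOLATE
            else:
                self.state = State.IGNORE

    def ignore_timer_expired(self):
        if self.state is State.IGNORE:
            self.state = State.RETRY  # start listening again, on probation

    def retry_timer_expired(self):
        if self.state is State.RETRY:
            self.state = State.NORMAL
            self.ignore_count = 0

    def operator_reset(self):
        """Manual intervention: back to the beginning from any state."""
        self.state = State.NORMAL
        self.ignore_count = 0

# Sample values only: a limit of 640 LSAs, at most 3 trips into Ignore.
dp = DatabaseProtection(maximum_lsa=640, ignore_count_allowed=3)
for _ in range(4):
    dp.lsa_count_changed(5000)  # the flood is still out there
    dp.ignore_timer_expired()
print(dp.state.value, dp.ignore_count)  # prints: Isolate 4
```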
The timers can be tweaked: the ignore timer is set with ignore-time <seconds>, and the retry timer is set with reset-time <seconds>. The maximum number of times the OSPF process will move into the ignore state before isolation is set with ignore-count <integer>. The Junos defaults, if database protection is enabled, are 600 seconds for the reset-time, 300 seconds for the ignore-time, and an ignore-count of 5.
Let's do a quick run and see how all of this works in practice. We'll set the maximum number of LSAs to 640. We'll configure a warning alarm to fire off at 80% of the threshold value, and we'll set all the timers to fairly low values so we're not waiting around all day for things to happen.
Database-Protection Values for Quick Test
protocols {
    ospf {
        database-protection {
            maximum-lsa 640;
            warning-threshold 80;
            ignore-count 3;
            ignore-time 60;
            reset-time 300;
        }
    }
}
You can see all of the pertinent values with the command show ospf overview. The timers and counters are all shown under the line "Database protection state:".
juniper@EX3200-2_OSPF> show ospf overview
Instance: master
  Router ID: 2.2.2.1
  Route table index: 0
  AS boundary router
  LSA refresh time: 50 minutes
  DoNotAge uncapable
    AS scope LSAs received with no DC bit: 5
    Area scope LSAs received with no DC bit: 14
  Database protection state: Normal
  Warning threshold: 80 percent
  Non self-generated LSAs: Current 25, Warning 512, Allowed 640
  Ignore time: 60, Reset time: 300
  Ignore count: Current 0, Allowed 3
  Area: 0.0.0.0
    Stub type: Not Stub
    Authentication Type: None
    Area border routers: 0, AS boundary routers: 8
    Neighbors
      Up (in full state): 4
    DoNotAge uncapable
      Area scope LSAs received with no DC bit: 14
  Topology: default (ID 0)
    Prefix export count: 1
    Full SPF runs: 87
    SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
    Backup SPF: Not Needed
juniper@EX3200-2_OSPF>
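The "Warning 512" figure is simply 80% of the 640 allowed LSAs. As a small sketch, assuming the "Non self-generated LSAs" line keeps the format shown in this output, the numbers can be pulled out and sanity-checked in Python. The helper name lsa_headroom is hypothetical.

```python
import re

# Format of the database protection line in 'show ospf overview', as shown above.
LSA_LINE_RE = re.compile(
    r"Non self-generated LSAs: Current (\d+), Warning (\d+), Allowed (\d+)"
)

def lsa_headroom(overview_text, warning_percent):
    """Extract the LSA counters and check them against the warning percentage."""
    match = LSA_LINE_RE.search(overview_text)
    if match is None:
        return None
    current, warning, allowed = (int(g) for g in match.groups())
    return {
        "current": current,
        "warning": warning,
        "allowed": allowed,
        # 640 * 80 // 100 == 512, matching the output above
        "warning_matches_percent": warning == allowed * warning_percent // 100,
        "remaining_before_warning": warning - current,
    }

line = "Non self-generated LSAs: Current 25, Warning 512, Allowed 640"
print(lsa_headroom(line, 80))
```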
We'll do a quick neighbor check:
juniper@EX3200-2_OSPF> show ospf neighbor
Address          Interface              State     ID               Pri  Dead
11.1.1.11        ge-0/0/1.11            Full      1.1.1.1          128    39
2.0.0.6          ge-0/0/1.2             Full      2.2.2.6          128    31
2.0.0.10         ge-0/0/1.2             2Way      3.0.0.10           0    31
2.0.0.7          ge-0/0/1.2             Full      2.2.2.7          128    37
2.1.7.7          ge-0/0/1.217           Full      2.2.2.7          128    37
juniper@EX3200-2_OSPF>
And then we'll have the nasty operator of our Olive box start flooding an additional 5000 external LSAs. In an instant, our neighbors are gone:
juniper@EX3200-2_OSPF> show ospf neighbor
juniper@EX3200-2_OSPF>
Inspecting our logs, we find a bunch of nasty messages: a warning, followed by an error and notification that all of our neighbors were murdered.
Jan 16 21:19:14 EX3200-2_OSPF rpd[1081]: RPD_OSPF_LSA_WARNING_EXCEEDED: OSPF realm ospf-v2 number of non-local LSAs exceeded warning limit
Jan 16 21:19:15 EX3200-2_OSPF rpd[1081]: RPD_OSPF_LSA_MAXIMUM_EXCEEDED: OSPF realm ospf-v2 number of non-local LSAs exceeded maximum limit
Jan 16 21:19:15 EX3200-2_OSPF rpd[1081]: RPD_OSPF_NBRDOWN: OSPF neighbor 11.1.1.11 (realm ospf-v2 ge-0/0/1.11 area 0.0.0.0) state changed from Full to Down due to KillNbr (event reason: exceeded database protection maximum)
Jan 16 21:19:15 EX3200-2_OSPF rpd[1081]: RPD_OSPF_NBRDOWN: OSPF neighbor 2.0.0.6 (realm ospf-v2 ge-0/0/1.2 area 0.0.0.0) state changed from Full to Down due to KillNbr (event reason: exceeded database protection maximum)
Jan 16 21:19:15 EX3200-2_OSPF rpd[1081]: RPD_OSPF_NBRDOWN: OSPF neighbor 2.0.0.10 (realm ospf-v2 ge-0/0/1.2 area 0.0.0.0) state changed from 2Way to Down due to KillNbr (event reason: exceeded database protection maximum)
Jan 16 21:19:15 EX3200-2_OSPF rpd[1081]: RPD_OSPF_NBRDOWN: OSPF neighbor 2.0.0.7 (realm ospf-v2 ge-0/0/1.2 area 0.0.0.0) state changed from Full to Down due to KillNbr (event reason: exceeded database protection maximum)
Jan 16 21:19:15 EX3200-2_OSPF rpd[1081]: RPD_OSPF_NBRDOWN: OSPF neighbor 2.1.7.7 (realm ospf-v2 ge-0/0/1.217 area 0.0.0.0) state changed from Full to Down due to KillNbr (event reason: exceeded database protection maximum)
Checking our OSPF process, we find that the router is currently ignoring everybody, and has incremented the ignore counter to 1.
juniper@EX3200-2_OSPF> show ospf overview
Instance: master
  Router ID: 2.2.2.1
  Route table index: 0
  AS boundary router
  LSA refresh time: 50 minutes
  Database protection state: Ignore (34 seconds remaining)
  Warning threshold: 80 percent
  Non self-generated LSAs: Current 0, Warning 512, Allowed 640
  Ignore time: 60, Reset time: 300
  Ignore count: Current 1, Allowed 3
  Area: 0.0.0.0
    Stub type: Not Stub
    Authentication Type: None
    Area border routers: 0, AS boundary routers: 0
    Neighbors
      Up (in full state): 0
  Topology: default (ID 0)
    Prefix export count: 1
    Full SPF runs: 89
    SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
    Backup SPF: Not Needed
juniper@EX3200-2_OSPF>
We'll kill the nasty behavior of our big LSA injector, and let the Ignore timer run its course to zero.
juniper@EX3200-2_OSPF> show ospf overview
Instance: master
  Router ID: 2.2.2.1
  Route table index: 0
  AS boundary router
  LSA refresh time: 50 minutes
  DoNotAge uncapable
    AS scope LSAs received with no DC bit: 5
    Area scope LSAs received with no DC bit: 13
  Database protection state: Retry (288 seconds remaining)
  Warning threshold: 80 percent
  Non self-generated LSAs: Current 24, Warning 512, Allowed 640
  Ignore time: 60, Reset time: 300
  Ignore count: Current 1, Allowed 3
  Area: 0.0.0.0
    Stub type: Not Stub
    Authentication Type: None
    Area border routers: 0, AS boundary routers: 8
    Neighbors
      Up (in full state): 4
    DoNotAge uncapable
      Area scope LSAs received with no DC bit: 13
  Topology: default (ID 0)
    Prefix export count: 1
    Full SPF runs: 96
    SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
    Backup SPF: Not Needed
juniper@EX3200-2_OSPF>
We can see now the OSPF process is in the "Retry" state and has another timer going. Note that we reformed our adjacencies, and have 24 LSAs in our LSDB, which is within our limits. Letting this timer expire, we see that our OSPF process's bursting LSDB wounds have all healed, and our "Ignore count" is back to zero.
juniper@EX3200-2_OSPF> show ospf overview
Instance: master
  Router ID: 2.2.2.1
  Route table index: 0
  AS boundary router
  LSA refresh time: 50 minutes
  DoNotAge uncapable
    AS scope LSAs received with no DC bit: 5
    Area scope LSAs received with no DC bit: 13
  Database protection state: Normal
  Warning threshold: 80 percent
  Non self-generated LSAs: Current 24, Warning 512, Allowed 640
  Ignore time: 60, Reset time: 300
  Ignore count: Current 0, Allowed 3
  Area: 0.0.0.0
    Stub type: Not Stub
    Authentication Type: None
    Area border routers: 0, AS boundary routers: 8
    Neighbors
      Up (in full state): 4
    DoNotAge uncapable
      Area scope LSAs received with no DC bit: 13
  Topology: default (ID 0)
    Prefix export count: 1
    Full SPF runs: 97
    SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
    Backup SPF: Not Needed
juniper@EX3200-2_OSPF>
Now let's push the Ignore counter past the allowed limit and see what happens.
juniper@EX3200-2_OSPF> show ospf overview
Instance: master
  Router ID: 2.2.2.1
  Route table index: 0
  AS boundary router
  LSA refresh time: 50 minutes
  Database protection state: Isolate
  Warning threshold: 80 percent
  Non self-generated LSAs: Current 0, Warning 512, Allowed 640
  Ignore time: 60, Reset time: 300
  Ignore count: Current 4, Allowed 3
  Area: 0.0.0.0
    Stub type: Not Stub
    Authentication Type: None
    Area border routers: 0, AS boundary routers: 0
    Neighbors
      Up (in full state): 0
  Topology: default (ID 0)
    Prefix export count: 1
    Full SPF runs: 101
    SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
    Backup SPF: Not Needed
juniper@EX3200-2_OSPF>
We've moved into the Isolation state. The router has given up, and will need to be revived by manual intervention. Note there are no counters happily counting down, just the dreaded word Isolate. To revive the OSPF process, you need to enter the command clear ospf database-protection. This command can be run at any time, and will set any counters and timers back to the beginning even if the router hasn't quite given up and gone into the isolation state.
For the next run, we'll try some timers that will hopefully let the router come back to life on its own, but not contribute to the churn so much.
OSPF Database Protection Settings to be Used for the Next Test Run
protocols {
    ospf {
        database-protection {
            maximum-lsa 2560;
            warning-threshold 75;
            ignore-count 10;
            ignore-time 1800;
            reset-time 3600;
        }
    }
}
Same as all the other Junos boxes that support this. 1GB of RAM (1024 MB) * 5 = 5120 LSA max.
protocols {
    ospf {
        database-protection {
            maximum-lsa 5120;
            warning-threshold 75;
            ignore-count 10;
            ignore-time 1800;
            reset-time 3600;
        }
    }
}
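The sizing used here appears to be a simple rule of thumb of 5 LSAs per megabyte of RAM. A trivial Python sketch of that arithmetic follows. The function name and the default factor are only a reading of the "1GB of RAM * 5 = 5120" figure, not a vendor recommendation.

```python
def max_lsa_for_ram(ram_mb, lsas_per_mb=5):
    """Rule-of-thumb LSA ceiling for this test: RAM in MB times a fixed factor."""
    return ram_mb * lsas_per_mb

print(max_lsa_for_ram(1024))  # prints 5120, the value configured above
```

Under the same rule, the 2560 figure used earlier would correspond to 512 MB of RAM, though that is inferred from the arithmetic rather than stated.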
Most versions of IOS 12.0 or greater support a similar command and logic for LSA suppression as Junos does. (I think IOS came first, actually.) On IOS, LSDB protection is enabled under the OSPF process with:
max-lsa <maximum number of non self-generated LSAs> <warning threshold> ignore-time <minutes> reset-time <minutes> ignore-count <integer>
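One thing that is easy to trip over when configuring both platforms is the units: the Junos timers are given in seconds, while the IOS max-lsa timers are given in minutes. The Python sketch below renders both configs from one set of values, converting as needed. The function names are hypothetical, and the keyword order simply follows the syntax given in this post.

```python
def junos_database_protection(maximum_lsa, warning_percent, ignore_count,
                              ignore_minutes, reset_minutes):
    """Render the Junos stanza; Junos takes the timers in seconds."""
    lines = [
        "protocols {",
        "    ospf {",
        "        database-protection {",
        f"            maximum-lsa {maximum_lsa};",
        f"            warning-threshold {warning_percent};",
        f"            ignore-count {ignore_count};",
        f"            ignore-time {ignore_minutes * 60};",
        f"            reset-time {reset_minutes * 60};",
        "        }",
        "    }",
        "}",
    ]
    return "\n".join(lines)

def ios_max_lsa(maximum_lsa, warning_percent, ignore_count,
                ignore_minutes, reset_minutes):
    """Render the IOS command; IOS takes the timers in minutes."""
    return (
        f"max-lsa {maximum_lsa} {warning_percent} "
        f"ignore-time {ignore_minutes} reset-time {reset_minutes} "
        f"ignore-count {ignore_count}"
    )

print(ios_max_lsa(640, 75, 3, 1, 3))
# prints: max-lsa 640 75 ignore-time 1 reset-time 3 ignore-count 3
print(junos_database_protection(640, 75, 3, 1, 3))
```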
Let's do another quick test to see how IOS behaves with the following config applied to our 3640:
router ospf 1
 max-lsa 640 ignore-time 1 reset-time 3 ignore-count 3
Something to watch out for: applying this command caused our OSPF process to restart! This did not happen with Junos.
C3640-1(config-router)#$ 75 ignore-count 3 ignore-time 1 reset-time 3
C3640-1(config-router)#
Jan 16 22:11:49.628: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on FastEthernet0/0.33 from FULL to DOWN, Neighbor Down: Interface down or detached
Jan 16 22:11:49.628: %OSPF-5-ADJCHG: Process 1, Nbr 0.0.0.0 on FastEthernet0/0.33 from DOWN to DOWN, Neighbor Down: Interface down or detached
Jan 16 22:11:49.628: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.1 on FastEthernet0/0.2 from 2WAY to DOWN, Neighbor Down: Interface down or detached
Jan 16 22:11:49.628: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.6 on FastEthernet0/0.2 from FULL to DOWN, Neighbor Down: Interface down or detachede
Jan 16 22:11:49.628: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.7 on FastEthernet0/0.2 from FULL to DOWN, Neighbor Down: Interface down or detached
Jan 16 22:11:49.632: %OSPF-5-ADJCHG: Process 1, Nbr 3.0.0.10 on FastEthernet0/0.2 from 2WAY to DOWN, Neighbor Down: Interface down or detached
Jan 16 22:11:49.740: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.6 on FastEthernet0/0.2 from LOADING to FULL, Loading Done
Jan 16 22:11:49.740: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on FastEthernet0/0.33 from LOADING to FULL, Loading Donexit
C3640-1(config)#exit
C3640-1#
Jan 16 22:11:55.016: %SYS-5-CONFIG_I: Configured from console by console
Jan 16 22:11:57.836: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.7 on FastEthernet0/0.2 from LOADING to FULL, Loading Done
And we can see that our protection has been enabled by examining the OSPF routing process: look for the line "Maximum number of non self-generated LSA allowed".
C3640-1#sh ip ospf 1
 Routing Process "ospf 1" with ID 2.2.2.3
 Start time: 00:00:23.396, Time elapsed: 00:18:55.816
 Supports only single TOS(TOS0) routes
 Supports opaque LSA
 Supports Link-local Signaling (LLS)
 Supports area transit capability
 Maximum number of non self-generated LSA allowed 640
    Threshold for warning message 75%
    Ignore-time 1 minutes, reset-time 3 minutes
    Ignore-count allowed 3, current ignore-count 0
 It is an autonomous system boundary router
 Redistributing External Routes from,
    connected with metric mapped to 100, includes subnets in redistribution
 Router is not originating router-LSAs with maximum metric
 Initial SPF schedule delay 5000 msecs
 Minimum hold time between two consecutive SPFs 10000 msecs
 Maximum wait time between two consecutive SPFs 10000 msecs
 Incremental-SPF disabled
 Minimum LSA interval 5 secs
 Minimum LSA arrival 1000 msecs
 LSA group pacing timer 240 secs
 Interface flood pacing timer 33 msecs
 Retransmission pacing timer 66 msecs
 Number of external LSA 16. Checksum Sum 0x17ABDA
 Number of opaque AS LSA 0. Checksum Sum 0x000000
 Number of DCbitless external and opaque AS LSA 6
 Number of DoNotAge external and opaque AS LSA 0
 Number of areas in this router is 1. 1 normal 0 stub 0 nssa
 Number of areas transit capable is 0
 External flood list length 0
    Area BACKBONE(0)
        Number of interfaces in this area is 6 (1 loopback)
        Area has no authentication
        SPF algorithm last executed 00:11:10.608 ago
        SPF algorithm executed 2 times
        Area ranges are
        Number of LSA 28. Checksum Sum 0x3D99A7
        Number of opaque link LSA 0. Checksum Sum 0x000000
        Number of DCbitless LSA 15
        Number of indication LSA 0
        Number of DoNotAge LSA 0
        Flood list length 0
C3640-1#
After flooding another 5000 LSAs into our network, the 3640 complained to the console with the following message:
Jan 16 22:27:26.388: %OSPF-4-OSPF_MAX_LSA_THR: Threshold for maximum number of non self-generated LSA has been reached "ospf 1" - 480 LSAs Jan 16 22:27:26.484: %OSPF-4-OSPF_MAX_LSA: Maximum number of non self-generated LSA has been exceeded "ospf 1" - 641 LSAs
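If you want to catch these two messages on a syslog server rather than on the console, something like the following could work. This is only a minimal sketch: the regular expression is built from the exact wording of the two lines above, so it assumes other IOS releases phrase them the same way, and the function name `parse_max_lsa_events` is made up for illustration.

```python
import re

# Pattern derived from the two console messages shown above; the wording on
# other IOS releases may differ (assumption).
MAX_LSA_RE = re.compile(
    r'%OSPF-4-(?P<kind>OSPF_MAX_LSA_THR|OSPF_MAX_LSA): .*?'
    r'"ospf (?P<process>\d+)" - (?P<count>\d+) LSAs'
)


def parse_max_lsa_events(text):
    """Return (kind, process_id, lsa_count) for each max-lsa message in text."""
    events = []
    for match in MAX_LSA_RE.finditer(text):
        # The _THR variant is the warning threshold; the other is the hard limit.
        kind = "warning" if match.group("kind").endswith("_THR") else "exceeded"
        events.append((kind, int(match.group("process")), int(match.group("count"))))
    return events


# The two messages from the 3640, copied from the console output above.
SAMPLE = (
    'Jan 16 22:27:26.388: %OSPF-4-OSPF_MAX_LSA_THR: Threshold for maximum number '
    'of non self-generated LSA has been reached "ospf 1" - 480 LSAs\n'
    'Jan 16 22:27:26.484: %OSPF-4-OSPF_MAX_LSA: Maximum number of non '
    'self-generated LSA has been exceeded "ospf 1" - 641 LSAs\n'
)

for event in parse_max_lsa_events(SAMPLE):
    print(event)
```

Note that the warning fired at 480 LSAs, which is 75% of the configured 640.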
Checking the OSPF process status, we see the router is ignoring all neighbors:
C3640-1#sh ip ospf 1 Routing Process "ospf 1" with ID 2.2.2.3 Start time: 00:00:23.396, Time elapsed: 00:25:04.932 Supports only single TOS(TOS0) routes Supports opaque LSA Supports Link-local Signaling (LLS) Supports area transit capability Maximum number of non self-generated LSA allowed 640 Threshold for warning message 75% Ignore-time 1 minutes, reset-time 3 minutes Ignore-count allowed 3, current ignore-count 1 Ignoring all neighbors due to max-lsa limit, time remaining: 00:00:03 It is an autonomous system boundary router Redistributing External Routes from, connected with metric mapped to 100, includes subnets in redistribution Router is not originating router-LSAs with maximum metric Initial SPF schedule delay 5000 msecs Minimum hold time between two consecutive SPFs 10000 msecs Maximum wait time between two consecutive SPFs 10000 msecs Incremental-SPF disabled Minimum LSA interval 5 secs Minimum LSA arrival 1000 msecs LSA group pacing timer 240 secs Interface flood pacing timer 33 msecs Retransmission pacing timer 66 msecs Number of external LSA 1. Checksum Sum 0x26C0532 Number of opaque AS LSA 0. Checksum Sum 0x000000 Number of DCbitless external and opaque AS LSA 0 Number of DoNotAge external and opaque AS LSA 0 Number of areas in this router is 1. 1 normal 0 stub 0 nssa Number of areas transit capable is 0 External flood list length 0 Area BACKBONE(0) (Inactive) Number of interfaces in this area is 6 (1 loopback) Area has no authentication SPF algorithm last executed 00:00:52.276 ago SPF algorithm executed 1 times Area ranges are Number of LSA 1. Checksum Sum 0x3D2054 Number of opaque link LSA 0. Checksum Sum 0x000000 Number of DCbitless LSA 0 Number of indication LSA 0 Number of DoNotAge LSA 0 Flood list length 0 C3640-1#
It actually took a minute for the 3640 to drop its neighbors after the threshold was reached; Junos was instantaneous. Not shown, but before the neighbors actually dropped, the 3640 had loaded more LSAs than the configured maximum. It reported "Number of external LSA 921. Checksum Sum 0x42D1AF9" in its LSDB.
Jan 16 22:28:26.565: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on FastEthernet0/0.33 from FULL to DOWN, Neighbor Down: Interface down or detached
Jan 16 22:28:26.565: %OSPF-5-ADJCHG: Process 1, Nbr 0.0.0.0 on FastEthernet0/0.33 from DOWN to DOWN, Neighbor Down: Interface down or detached
Jan 16 22:28:26.565: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.1 on FastEthernet0/0.2 from INIT to DOWN, Neighbor Down: Interface down or detached
Jan 16 22:28:26.565: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.6 on FastEthernet0/0.2 from FULL to DOWN, Neighbor Down: Interface down or detached
Jan 16 22:28:26.569: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.7 on FastEthernet0/0.2 from FULL to DOWN, Neighbor Down: Interface down or detached
Jan 16 22:28:26.621: %OSPF-5-ADJCHG: Process 1, Nbr 3.0.0.10 on FastEthernet0/0.2 from 2WAY to DOWN, Neighbor Down: Interface down or detached
Letting the reset timer expire puts us back at the initial state: our neighbors come back up, and we're back at the beginning. Let's blow through the ignore counter and see what happens:
C3640-1#sh ip ospf 1 Routing Process "ospf 1" with ID 2.2.2.3 Start time: 00:00:23.396, Time elapsed: 00:50:41.748 Supports only single TOS(TOS0) routes Supports opaque LSA Supports Link-local Signaling (LLS) Supports area transit capability Maximum number of non self-generated LSA allowed 640 Threshold for warning message 75% Ignore-time 1 minutes, reset-time 3 minutes Ignore-count allowed 3, current ignore-count 4 Permanently ignoring all neighbors due to max-lsa limit It is an autonomous system boundary router Redistributing External Routes from, connected with metric mapped to 100, includes subnets in redistribution Router is not originating router-LSAs with maximum metric Initial SPF schedule delay 5000 msecs Minimum hold time between two consecutive SPFs 10000 msecs Maximum wait time between two consecutive SPFs 10000 msecs Incremental-SPF disabled Minimum LSA interval 5 secs Minimum LSA arrival 1000 msecs LSA group pacing timer 240 secs Interface flood pacing timer 33 msecs Retransmission pacing timer 66 msecs Number of external LSA 1. Checksum Sum 0x2A5CDFC5 Number of opaque AS LSA 0. Checksum Sum 0x000000 Number of DCbitless external and opaque AS LSA 0 Number of DoNotAge external and opaque AS LSA 0 Number of areas in this router is 1. 1 normal 0 stub 0 nssa Number of areas transit capable is 0 External flood list length 0 Area BACKBONE(0) (Inactive) Number of interfaces in this area is 6 (1 loopback) Area has no authentication SPF algorithm last executed 00:02:26.144 ago SPF algorithm executed 1 times Area ranges are Number of LSA 1. Checksum Sum 0xA0651E Number of opaque link LSA 0. Checksum Sum 0x000000 Number of DCbitless LSA 0 Number of indication LSA 0 Number of DoNotAge LSA 0 Flood list length 0 C3640-1#
Note the word "Permanently"! This router has given up on our LSA-stuffed network. To bring IOS back from isolation, the OSPF process needs to be restarted:
C3640-1#clear ip ospf 1 process Reset OSPF process? [no]: yes C3640-1# Jan 16 22:56:55.269: %OSPF-5-ADJCHG: Process 1, Nbr 0.0.0.0 on FastEthernet0/0.33 from DOWN to DOWN, Neighbor Down: Interface down or detached Jan 16 22:56:55.269: %OSPF-5-ADJCHG: Process 1, Nbr 0.0.0.0 on FastEthernet0/0.33 from DOWN to DOWN, Neighbor Down: Interface down or detached Jan 16 22:56:55.357: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on FastEthernet0/0.33 from LOADING to FULL, Loading Done Jan 16 22:56:59.681: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.7 on FastEthernet0/0.2 from LOADING to FULL, Loading Done Jan 16 22:57:03.025: %OSPF-5-ADJCHG: Process 1, Nbr 3.0.0.10 on FastEthernet0/0.2 from LOADING to FULL, Loading Done
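To keep the three knobs straight, here is a toy model of the behavior observed above. It is a simplified sketch of what this lab run showed, not Cisco's actual implementation: the class name `MaxLsaGuard`, the one-minute tick, and the exact moment the counters change are all my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class MaxLsaGuard:
    """Toy model of max-lsa protection; timers are in minutes as on IOS."""
    max_lsa: int = 640
    ignore_time: int = 1
    reset_time: int = 3
    ignore_count_allowed: int = 3
    ignore_count: int = 0
    state: str = "normal"   # "normal", "ignoring" or "permanent"
    timer: int = 0          # minutes spent in the current state

    def tick(self, non_self_lsas: int) -> str:
        """Advance one minute given the current non self-generated LSA count."""
        if self.state == "permanent":
            # Only restarting the process (a fresh object here) gets us out.
            return self.state
        if self.state == "ignoring":
            self.timer += 1
            if self.timer >= self.ignore_time:
                self.state, self.timer = "normal", 0
            return self.state
        if non_self_lsas > self.max_lsa:
            # Limit exceeded: drop the neighbors and count the offence.
            self.ignore_count += 1
            self.timer = 0
            if self.ignore_count > self.ignore_count_allowed:
                self.state = "permanent"
            else:
                self.state = "ignoring"
            return self.state
        # Quiet minute: after reset_time of good behavior the count clears.
        self.timer += 1
        if self.timer >= self.reset_time:
            self.ignore_count = 0
        return self.state


guard = MaxLsaGuard()
for minute in range(1, 8):
    print(minute, guard.tick(1000), guard.ignore_count)
```

With a constant overload, the model bounces between ignoring and normal until the fourth offence, where it lands in the permanent state with an ignore-count of 4 against an allowed 3, matching the output above. Creating a new `MaxLsaGuard` stands in for `clear ip ospf 1 process`.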
So for the next run, we'll use the following parameters on the 3640:
OSPF Database Protection Settings to be Used for the Next Test Run
router ospf 1 max-lsa 640 75 ignore-time 30 reset-time 60 ignore-count 10
Note: The warning threshold of 75% and the reset-time of 60 are the defaults, so they aren't actually visible in the config
Exactly the same as C2R2
protocols { ospf { database-protection { maximum-lsa 2560; warning-threshold 75; ignore-count 10; ignore-time 1800; reset-time 3600; } } }
ScreenOS uses a command under the OSPF protocol of lsa-threshold <time duration> <max number of LSAs>
set vrouter OSPF protocol ospf lsa-threshold 3600 640
Note that this only rate-limits the flooding; it doesn't seem to limit the database size. This may delay the NS208's implosion, but I don't think it will prevent it.
ns208-> get vrouter OSPF protocol ospf VR: OSPF RouterId: 2.2.2.5 ---------------------------------- Status: enabled State: autonomous system boundary router Auto-Vlink creation: disabled Number of areas: 1 Number of external LSA(s): 5042 External LSAs with DNA: 0 Advertising default-route lsa: disabled Default-route learnt by ospf: will be added to the routing table RFC 1583 compatibility: disabled Hello packet flooding protection: disabled LSA flooding protection: enabled (threshold 640 packets per 3600 second(s)) Maximum Retransmit limit: For nbrs on demand-circuits 12 For nbrs on non-demand-circuits 24 Area 0.0.0.0 Total number of interfaces is 6, Active number of interfaces is 6 Intra-SPF algorithm executed 22 times Last Intra-SPF executed before 00:00:32 Number of LSA(s) is 22 Inter-SPF algorithm executed: 22 times Last Inter-SPF executed before 00:02:26 Extern-SPF algorithm executed: 13 times Last Extern-SPF executed before 00:02:26 SPF Aborted: 2 times ns208->
Same command as the Cisco 3640, but adjusted to account for having twice the amount of memory. Applying this command didn't seem to cause the OSPF process to restart, but removing it did.
router ospf 1 max-lsa 1280 75 ignore-time 30 reset-time 60 ignore-count 10
It does not appear that RouterOS supports this functionality in software.
Same as all the other Junos boxes that support this. 512MB of RAM *5 = 2560 LSA max.
protocols { ospf { database-protection { maximum-lsa 2560; warning-threshold 75; ignore-count 10; ignore-time 1800; reset-time 3600; } } }
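The sizing rule used throughout this section (5 LSAs per MB of RAM, warning at 75%) and the unit difference between the two vendors can be sketched in a few lines. The function names are hypothetical, and the rule is just the rule of thumb used for this test, not a vendor recommendation. The output strings mirror the configs shown in this post: IOS takes its timers in minutes, while the Junos stanza takes seconds.

```python
def max_lsa_for_ram(ram_mb, lsas_per_mb=5):
    """Rule of thumb from this test: 5 non self-generated LSAs per MB of RAM."""
    return ram_mb * lsas_per_mb


def warning_count(max_lsa, threshold_pct=75):
    """LSA count at which the warning threshold is reached."""
    return max_lsa * threshold_pct // 100


def ios_max_lsa(max_lsa, threshold_pct=75, ignore_min=30, reset_min=60,
                ignore_count=10):
    """IOS form; ignore-time and reset-time are in minutes."""
    return (f"max-lsa {max_lsa} {threshold_pct} ignore-time {ignore_min} "
            f"reset-time {reset_min} ignore-count {ignore_count}")


def junos_database_protection(max_lsa, threshold_pct=75, ignore_min=30,
                              reset_min=60, ignore_count=10):
    """Junos form; the same timers converted to seconds."""
    return ("protocols { ospf { database-protection { "
            f"maximum-lsa {max_lsa}; warning-threshold {threshold_pct}; "
            f"ignore-count {ignore_count}; ignore-time {ignore_min * 60}; "
            f"reset-time {reset_min * 60}; }} }} }}")


limit = max_lsa_for_ram(512)
print(limit, warning_count(limit))
print(ios_max_lsa(640))
print(junos_database_protection(limit))
```

For a 512MB box this gives a limit of 2560 and a warning at 1920, which is what the EX3200 reports later in its `show ospf overview` output.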
Junos doesn't support this until version 10.2
Same command as the C3640, except applying it didn't seem to restart our OSPF process.
router ospf 1 max-lsa 640 75 ignore-time 30 reset-time 60 ignore-count 10
Exactly the same as C2R2
protocols { ospf { database-protection { maximum-lsa 5120; warning-threshold 75; ignore-count 10; ignore-time 1800; reset-time 3600; } } }
Exactly the same as C2R2
protocols { ospf { database-protection { maximum-lsa 5120; warning-threshold 75; ignore-count 10; ignore-time 1800; reset-time 3600; } } }
Same as the other Junos devices with 512MB.
protocols { ospf { database-protection { maximum-lsa 2560; warning-threshold 75; ignore-count 10; ignore-time 1800; reset-time 3600; } } }
Same as the other RouterBoard. Doesn't appear to support this functionality.
Same command as the other IOS boxes. Applying it does cause the OSPF process to pop on this platform.
router ospf 1 max-lsa 480 75 ignore-time 30 reset-time 60 ignore-count 10
This is a repeat of test 2, but with LSA protection applied to the routers above. The protection definitely hasn't been applied to every router on the list; notably, Cluster 1 doesn't have a single router that supports any of it. However, all of the switches, Cisco boxes and SRXs have some sort of database protection applied. In addition, the Netscreen has LSA rate-limiting applied.
First, we'll load up our 2GB Olive with 500K BGP prefixes:
juniper@Olive2GB# run show bgp summary Groups: 1 Peers: 1 Down peers: 0 Table Tot Paths Act Paths Suppressed History Damp State Pending inet.0 500000 500000 0 0 0 0 Peer AS InPkt OutPkt OutQ Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped... 172.20.10.117 65069 500188 232 0 245 11:25:14 500000/500000/500000/0 0/0/0/0 [edit] juniper@Olive2GB#
A quick check on our network with Nagios reveals everything is once again nice and green.
Nagios: Green, before the export of 500k BGP routes into OSPF.
And off we go...
juniper@Olive2GB# set protocols ospf export EXPORT-BGP-TO-OSPF [edit] juniper@Olive2GB# commit
Almost immediately, every router that had some sort of database protection applied starts to complain about its LSA count: the EX3200, EX2200C, SRX100B, SRX100H in packet mode, SRX100H in flow mode, SRX210HE, 3640, 1760, 2811 and finally the 3750.
The NS208, which doesn't have database protection but does throttle LSA advertisements, sort of froze when it reached a certain level. Checking the OSPF process status repeatedly revealed the same number of LSAs in its database. Note that it has also aborted SPF calculations several times already.
ns208-> get vrouter OSPF protocol ospf VR: OSPF RouterId: 2.2.2.5 ---------------------------------- Status: enabled State: autonomous system boundary router Auto-Vlink creation: disabled Number of areas: 1 Number of external LSA(s): 13266 External LSAs with DNA: 0 Advertising default-route lsa: disabled Default-route learnt by ospf: will be added to the routing table RFC 1583 compatibility: disabled Hello packet flooding protection: disabled LSA flooding protection: enabled (threshold 640 packets per 3600 second(s)) Maximum Retransmit limit: For nbrs on demand-circuits 12 For nbrs on non-demand-circuits 24 Area 0.0.0.0 Total number of interfaces is 6, Active number of interfaces is 6 Intra-SPF algorithm executed 179 times Last Intra-SPF executed before 00:02:21 Number of LSA(s) is 29 Inter-SPF algorithm executed: 179 times Last Inter-SPF executed before 00:02:21 Extern-SPF algorithm executed: 125 times Last Extern-SPF executed before 00:02:21 SPF Aborted: 18 times ns208->
OpenOSPFd and Quagga both blew their memory bounds, deciding not to participate in the test any longer. XORP gave up and quit before anyone could even say "Too many LSAs", and both of the RouterBoard boxes looked like they quit too, with the RB133 console port not budging, and the RB750G closing my telnet connection.
The 1GB boxes that survived the initial flood loaded up their LSDBs significantly faster this time, with Vyatta, BIRD, the 1GB Olive and the J2300 stuffed full of 500,000+ LSAs in a few minutes. On test run 2, with a lot of boxes thrashing about and rebooting, it took hours for the J2300 to learn all of the external advertisements. After only 15 minutes, it looked like the network was at a more or less converged state.
After about 15 minutes, things seemed to be already at a steady state. The Nagios map had settled down without change to look like this:
Each participant's status:
Fully loaded LSDB.
The ospfd process terminated. The box still responded to pings, but did not attempt to participate in OSPF again.
The route injector was happily keeping the 500K routes it injected up to date.
Fully loaded LSDB.
The ospfd process for quagga was dead. The OS still responded to pings.
No xorp processes were left running, all had terminated. The OS still allowed the box to be pinged.
The 1GB Olive was running steady.
The EX3200 was ignoring LSAs.
Ignoring LSAs.
Ignoring LSAs.
Ignoring LSAs.
After surviving for a longer period of time, the NS208 started to go into a reboot loop. It took much longer than before for it to overload and burst its memory seams.
Ignoring LSAs.
The system responded to pings, but was otherwise completely dead. It did not participate in routing, and would not offer a login prompt. It had closed the telnet connection I had open to it.
The SRX100B was ignoring LSAs.
Fully loaded LSDB.
Ignoring LSAs.
Ignoring LSAs.
Ignoring LSAs.
This box appeared to be completely dead. No response to pings, no action on the console port.
Ignoring LSAs. It also started to lodge a complaint about the OSPF process hogging all of the CPU's clock cycles:
Jan 20 09:28:41.466: %SYS-3-CPUHOG: Task is running for (2003)msecs, more than (2000)msecs (4/1),process = OSPF Router 1. -Traceback= 8142DAD0 813FFFDC 80561688 80565EFC
Since things already seemed pretty stable, and every full OSPF participant was stuffed with all of the half a million LSAs, I decided to let the network cook for another 15 minutes and stop the redistribution only half an hour after it began. I programmed a timed commit on the 2GB Olive to withdraw its LSAs precisely at 10:30.
juniper@Olive2GB# delete protocols ospf export EXPORT-BGP-TO-OSPF [edit] juniper@Olive2GB# commit at 10:30 configuration check succeeds commit at will be executed at 2013-01-20 10:30:00 UTC The configuration has been changed but not committed Exiting configuration mode juniper@Olive2GB>
Right on cue, the 2GB Olive expired all of its LSAs that fell in the 13/8 range.
juniper@Olive2GB> show ospf database OSPF database, Area 0.0.0.0 Type ID Adv Rtr Seq Age Opt Cksum Len Router 1.1.1.1 1.1.1.1 0x8000016b 341 0x2 0xcd96 96 Router 1.1.1.2 1.1.1.2 0x800000e7 1984 0x2 0x380c 96 Router *1.1.1.3 1.1.1.3 0x800000d7 82 0x22 0x3075 96 Router 1.1.1.4 1.1.1.4 0x800000d9 875 0x2 0x5d0a 96 Router 1.1.1.5 1.1.1.5 0x80000108 1889 0x2 0xf45b 108 Router 1.1.1.6 1.1.1.6 0x80000028 86 0x2 0x1ce7 96 Router 1.1.1.7 1.1.1.7 0x800000c1 65 0x22 0x73ae 96 Router 2.2.2.4 2.2.2.4 0x80000026 330 0x22 0x9862 96 Router 2.2.2.5 2.2.2.5 0x80000045 173 0x22 0x3c92 96 Router 2.2.2.7 2.2.2.7 0x80000271 469 0x2 0xd0c3 96 Router 3.0.0.10 3.0.0.10 0x8000014a 1587 0x2 0x902a 60 Router 3.3.3.1 3.3.3.1 0x8000002e 347 0x22 0xfedd 96 Router 3.3.3.2 3.3.3.2 0x80000020 212 0x22 0x4d21 96 Router 3.3.3.4 3.3.3.4 0x8000002d 340 0x22 0x18d7 96 Router 3.3.3.6 3.3.3.6 0x8000019d 262 0x2 0x3f2b 96 Network 1.0.0.4 1.1.1.4 0x800000f3 13 0x2 0x21f8 48 Network 1.1.2.2 1.1.1.2 0x800000dc 931 0x2 0xc4a5 32 Network 1.1.7.1 1.1.1.1 0x800000b2 433 0x2 0x3854 32 Network 1.2.3.2 1.1.1.2 0x800000b2 1478 0x2 0x167a 32 Network *1.3.4.3 1.1.1.3 0x80000087 2021 0x22 0x7f16 32 Network 2.0.0.4 2.2.2.4 0x80000001 336 0x22 0xd146 32 Network 2.0.0.5 2.2.2.5 0x80000003 215 0x22 0xff0f 32 Network 3.0.0.2 3.3.3.2 0x80000001 212 0x22 0x1af7 32 Network 11.1.1.31 3.3.3.1 0x80000011 343 0x22 0xfbe6 32 Network 22.2.2.21 1.1.1.2 0x800000a8 2135 0x2 0x8ae3 32 OpaqArea 1.0.0.0 3.3.3.6 0x8000017b 3600 0x42 0x7402 28 OSPF AS SCOPE link state database Type ID Adv Rtr Seq Age Opt Cksum Len Extern 1.1.0.128 1.1.1.1 0x800000f9 1566 0x2 0x803e 36 Extern 1.2.0.128 1.1.1.2 0x800000dc 2259 0x0 0x66a5 36 Extern *1.3.0.128 1.1.1.3 0x80000095 2021 0x22 0x7a98 36 Extern 1.4.0.128 1.1.1.4 0x800000dc 169 0x2 0xbcf5 36 Extern 1.5.0.128 1.1.1.5 0x800000e6 2085 0x2 0x81f7 36 Extern 1.6.0.128 1.1.1.6 0x80000017 464 0x2 0xf38 36 Extern 1.7.0.128 1.1.1.7 0x80000070 1014 0x22 0x6863 36 Extern 2.1.0.128 2.2.2.1 0x80000010 849 0x22 
0x8414 36 Extern 2.2.0.128 2.2.2.2 0x80000010 3102 0x22 0x5ed3 36 Extern 2.4.0.128 2.2.2.4 0x80000010 826 0x22 0x3af3 36 Extern 2.5.0.128 2.2.2.5 0x8000001c 343 0x22 0x8c14 36 Extern 2.7.0.128 2.2.2.7 0x80000189 468 0x2 0xefdd 36 Extern 3.1.0.128 3.3.3.1 0x80000010 2867 0x22 0x4be4 36 Extern 3.2.0.128 3.3.3.2 0x8000000f 1106 0x22 0x3bf3 36 Extern 3.4.0.128 3.3.3.4 0x80000010 2208 0x22 0x1515 36 Extern 3.5.0.128 3.3.3.5 0x80000010 3205 0x22 0x325 36 Extern 3.6.0.128 3.3.3.6 0x8000017c 265 0x2 0xf6e1 36 Extern 3.7.0.128 3.3.3.7 0x80000018 276 0x20 0xec31 36 Extern *13.0.14.192 1.1.1.3 0x80000001 3600 0x22 0x7b30 36 Extern *13.0.14.255 1.1.1.3 0x80000001 3600 0x22 0x7eae 36 Extern *13.0.15.0 1.1.1.3 0x80000001 3600 0x22 0x73b8 36 Extern *13.0.15.63 1.1.1.3 0x80000001 3600 0x22 0x7fac 36 Extern *13.0.15.64 1.1.1.3 0x80000001 3600 0x22 0x75b5 36 Extern *13.0.15.127 1.1.1.3 0x80000001 3600 0x22 0x7bb0 36 Extern *13.0.15.128 1.1.1.3 0x80000001 3600 0x22 0x71b9 36 Extern *13.0.15.191 1.1.1.3 0x80000001 3600 0x22 0x7a31 36 Extern *13.0.15.192 1.1.1.3 0x80000001 3600 0x22 0x703a 36 Extern *13.0.16.0 1.1.1.3 0x80000001 3600 0x22 0x45ec 36 Extern *13.0.16.127 1.1.1.3 0x80000001 3600 0x22 0x70ba 36 Extern *13.0.16.255 1.1.1.3 0x80000001 3600 0x22 0x68c2 36 Extern *13.0.17.191 1.1.1.3 0x80000001 3600 0x22 0x6445 36 Extern *13.0.17.192 1.1.1.3 0x80000001 3600 0x22 0x5a4e 36 juniper@Olive2GB>
The other routers began to expire and pull LSAs from their LSDBs at a massive rate, much quicker than in the previous run, when the low-memory routers were stirring up network chaos. A mere 5 minutes later, hundreds of thousands of LSAs were already missing from the other full participants. Then at 10:36, the Quagga OSPF daemon on the Nagios box blew its memory bounds, turning the hosts page into a sea of red.
At 10:39, the 2GB Olive had most of the LSAs cleared from its LSDB. However, there was still some good network churn going on, as there were anywhere from 400 to 800 external LSAs in the LSDB. Below are three checks at about 1 second intervals. Note the varied number of external LSAs.
juniper@Olive2GB> show ospf database summary Area 0.0.0.0: 15 Router LSAs 10 Network LSAs Externals: 394 Extern LSAs Interface em1.1: Area 0.0.0.0: Interface em1.123: Area 0.0.0.0: Interface em1.134: Area 0.0.0.0: Interface em1.33: Area 0.0.0.0: Interface em2.0: Area 0.0.0.0: Interface lo0.0: Area 0.0.0.0: juniper@Olive2GB> show ospf database summary Area 0.0.0.0: 15 Router LSAs 10 Network LSAs Externals: 426 Extern LSAs Interface em1.1: Area 0.0.0.0: Interface em1.123: Area 0.0.0.0: Interface em1.134: Area 0.0.0.0: Interface em1.33: Area 0.0.0.0: Interface em2.0: Area 0.0.0.0: Interface lo0.0: Area 0.0.0.0: juniper@Olive2GB> show ospf database summary Area 0.0.0.0: 15 Router LSAs 10 Network LSAs Externals: 449 Extern LSAs Interface em1.1: Area 0.0.0.0: Interface em1.123: Area 0.0.0.0: Interface em1.134: Area 0.0.0.0: Interface em1.33: Area 0.0.0.0: Interface em2.0: Area 0.0.0.0: Interface lo0.0: Area 0.0.0.0: juniper@Olive2GB>
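Eyeballing three summaries works, but if you capture the output over a longer period, a small script can pull out the external counts and show the churn between samples. This is a rough sketch: it assumes the summary keeps the "Externals: N Extern LSAs" wording seen above, and the helper names are made up.

```python
import re

# Matches the external count in "show ospf database summary" output as it
# appears in this post (the format on other Junos releases is an assumption).
EXTERN_RE = re.compile(r"Externals:\s+(\d+)\s+Extern LSAs")


def external_counts(text):
    """Return every external LSA count found in the captured output, in order."""
    return [int(n) for n in EXTERN_RE.findall(text)]


def churn(counts):
    """Return the change in external LSA count between consecutive samples."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


# Abbreviated copies of the three checks shown above.
SAMPLE = """
juniper@Olive2GB> show ospf database summary
Area 0.0.0.0: 15 Router LSAs 10 Network LSAs Externals: 394 Extern LSAs
juniper@Olive2GB> show ospf database summary
Area 0.0.0.0: 15 Router LSAs 10 Network LSAs Externals: 426 Extern LSAs
juniper@Olive2GB> show ospf database summary
Area 0.0.0.0: 15 Router LSAs 10 Network LSAs Externals: 449 Extern LSAs
"""

counts = external_counts(SAMPLE)
print(counts)
print(churn(counts))
```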
This external LSA variance went on until 32 minutes after the 2GB Olive started to withdraw its external LSAs.
At 11:02, the Vyatta box was already down to 122894 externals:
vyatta@vyatta:~$ sh ip ospf OSPF Routing Process, Router ID: 1.1.1.1 Supports only single TOS (TOS0) routes This implementation conforms to RFC2328 RFC1583Compatibility flag is disabled OpaqueCapability flag is disabled Initial SPF scheduling delay 200 millisec(s) Minimum hold time between consecutive SPFs 1000 millisec(s) Maximum hold time between consecutive SPFs 10000 millisec(s) Hold time multiplier is currently 1 SPF algorithm last executed 13.519s ago SPF timer is inactive Refresh timer 10 secs This router is an ASBR (injecting external routing information) Number of external LSA 122894. Checksum Sum 0xf1bdda12 Number of opaque AS LSA 0. Checksum Sum 0x00000000 Number of areas attached to this router: 1 Area ID: 0.0.0.0 (Backbone) Number of interfaces in this area: Total: 6, Active: 6 Number of fully adjacent neighbors in this area: 3 Area has no authentication SPF algorithm executed 904 times Number of LSA 30 :
And BIRD was totally clear.
bird> show ospf OSPFol: RFC1583 compatibility: disabled RT scheduler tick: 2 Number of areas: 1 Number of LSAs in DB: 46 Area: 0.0.0.0 (0) [BACKBONE] Stub: No NSSA: No Transit: No Number of interfaces: 6 Number of neighbors: 7 Number of adjacent neighbors: 4 bird>
The 1GB Olive still had more than 300K LSAs in its memory, but it had already learned that they were all past their lifetime:
juniper@Olive1GB> show ospf database | except " 3600 0x22 " | match ^External | count Count: 0 lines juniper@Olive1GB>
And the J2300 had already expired more than half of them:
juniper@J2300-7> show ospf dataabase | except " 3600 0x22 " | match ^Extern | count Count: 237801 lines
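The pipe filter above works because an LSA that has reached MaxAge (3600 seconds, per RFC 2328) shows an age of 3600 in the database listing. The same idea can be sketched offline against captured output. This is a minimal sketch, assuming the eight-column row layout shown in the Olive's `show ospf database` output earlier; unlike the CLI filter, it looks only at the age column and ignores the options field. The function name is made up.

```python
MAX_AGE = 3600  # OSPF MaxAge in seconds (RFC 2328)


def live_externals(rows):
    """Return the link-state IDs of external LSAs that have not hit MaxAge.

    Each row is expected to look like:
    Type ID AdvRtr Seq Age Opt Cksum Len
    """
    live = []
    for row in rows:
        fields = row.split()
        if len(fields) != 8 or fields[0] != "Extern":
            continue  # skip headers, other LSA types and truncated rows
        age = int(fields[4])
        if age < MAX_AGE:
            # A leading "*" marks a self-originated LSA in Junos output.
            live.append(fields[1].lstrip("*"))
    return live


# A few rows copied from the 2GB Olive's database listing earlier in the post.
SAMPLE_ROWS = [
    "Router 1.1.1.1 1.1.1.1 0x8000016b 341 0x2 0xcd96 96",
    "Extern 3.6.0.128 3.3.3.6 0x8000017c 265 0x2 0xf6e1 36",
    "Extern *13.0.14.192 1.1.1.3 0x80000001 3600 0x22 0x7b30 36",
    "Extern *13.0.15.0 1.1.1.3 0x80000001 3600 0x22 0x73b8 36",
]

print(live_externals(SAMPLE_ROWS))
print(len(live_externals(SAMPLE_ROWS)))
```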
At 11:20, the RB750G let me back into it with a telnet session, and was already neighbored up with everyone who was still playing the OSPF game. Vyatta began its massive spew of MaxLSA messages to the syslog server. It continued this for another six minutes, then decided to take a break. Once it was done, it had a free and clean LSDB:
vyatta@vyatta:~$ sh ip ospf OSPF Routing Process, Router ID: 1.1.1.1 Supports only single TOS (TOS0) routes This implementation conforms to RFC2328 RFC1583Compatibility flag is disabled OpaqueCapability flag is disabled Initial SPF scheduling delay 200 millisec(s) Minimum hold time between consecutive SPFs 1000 millisec(s) Maximum hold time between consecutive SPFs 10000 millisec(s) Hold time multiplier is currently 1 SPF algorithm last executed 3.534s ago SPF timer is inactive Refresh timer 10 secs This router is an ASBR (injecting external routing information) Number of external LSA 16. Checksum Sum 0x0006cd4d Number of opaque AS LSA 0. Checksum Sum 0x00000000 Number of areas attached to this router: 1 Area ID: 0.0.0.0 (Backbone) Number of interfaces in this area: Total: 6, Active: 6 Number of fully adjacent neighbors in this area: 3 Area has no authentication SPF algorithm executed 966 times Number of LSA 28 Number of router LSA 19. Checksum Sum 0x000862be Number of network LSA 9. Checksum Sum 0x0005a665 Number of summary LSA 0. Checksum Sum 0x00000000 Number of ASBR summary LSA 0. Checksum Sum 0x00000000 Number of NSSA LSA 0. Checksum Sum 0x00000000 Number of opaque link LSA 0. Checksum Sum 0x00000000 Number of opaque area LSA 0. Checksum Sum 0x00000000 vyatta@vyatta:~$
All of the boxes with OSPF database protection were still ignoring the network, as shown by the SRX100H:
juniper@SRX100-6_OSPF> show ospf overview Instance: master Router ID: 2.2.2.2 Route table index: 0 AS boundary router LSA refresh time: 50 minutes Database protection state: Ignore (719 seconds remaining) Warning threshold: 75 percent Non self-generated LSAs: Current 0, Warning 3840, Allowed 5120 Ignore time: 1800, Reset time: 3600 Ignore count: Current 3, Allowed 10 Area: 0.0.0.0 Stub type: Not Stub Authentication Type: None Area border routers: 0, AS boundary routers: 0 Neighbors Up (in full state): 0 Topology: default (ID 0) Prefix export count: 1 Full SPF runs: 178 SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3 Backup SPF: Not Needed juniper@SRX100-6_OSPF>
Right before the hour mark, by which time all of the withdrawn LSAs should have expired whether or not they had been explicitly withdrawn, I restarted the Quagga daemons on the Nagios box so it could see what was going on in the network.
A full hour after the slew of prefixes in the 13/8 range were pulled from the network by the 2GB Olive, I checked the state of each router again:
The Vyatta box had purged itself of all of the LSAs from the "accident." However, it started spewing another round of MaxAge LSAs to the syslog server.
Still dead.
Had expired and purged all of the exported LSAs.
juniper@Olive2GB> show ospf database summary Area 0.0.0.0: 17 Router LSAs 11 Network LSAs 1 OpaqArea LSAs Externals: 19 Extern LSAs Interface em1.1: Area 0.0.0.0: Interface em1.123: Area 0.0.0.0: Interface em1.134: Area 0.0.0.0: Interface em1.33: Area 0.0.0.0: Interface em2.0: Area 0.0.0.0: Interface lo0.0: Area 0.0.0.0: juniper@Olive2GB>
Still happy!
Still a dead parrot.
The box was RIP, but this RIP had nothing to do with Bellman-Ford algorithms.
Purged all expired LSAs.
juniper@Olive1GB> show ospf database summary Area 0.0.0.0: 17 Router LSAs 11 Network LSAs 1 OpaqArea LSAs Externals: 19 Extern LSAs Interface em1.1: Area 0.0.0.0: Interface em1.117: Area 0.0.0.0: Interface em1.167: Area 0.0.0.0: Interface em1.77: Area 0.0.0.0: Interface em2.0: Area 0.0.0.0: Interface lo0.0: Area 0.0.0.0: juniper@Olive1GB>
Ignoring LSAs.
juniper@EX3200-2_OSPF> show ospf overview Instance: master Router ID: 2.2.2.1 Route table index: 0 AS boundary router LSA refresh time: 50 minutes Database protection state: Ignore (1625 seconds remaining) Warning threshold: 75 percent Non self-generated LSAs: Current 0, Warning 1920, Allowed 2560 Ignore time: 1800, Reset time: 3600 Ignore count: Current 4, Allowed 10 Area: 0.0.0.0 Stub type: Not Stub Authentication Type: None Area border routers: 0, AS boundary routers: 0 Neighbors Up (in full state): 0 Topology: default (ID 0) Prefix export count: 1 Full SPF runs: 172 SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3 Backup SPF: Not Needed juniper@EX3200-2_OSPF>
Ignoring LSAs.
juniper@SRX100-6_OSPF> show ospf overview Instance: master Router ID: 2.2.2.2 Route table index: 0 AS boundary router LSA refresh time: 50 minutes Database protection state: Ignore (1570 seconds remaining) Warning threshold: 75 percent Non self-generated LSAs: Current 0, Warning 3840, Allowed 5120 Ignore time: 1800, Reset time: 3600 Ignore count: Current 4, Allowed 10 Area: 0.0.0.0 Stub type: Not Stub Authentication Type: None Area border routers: 0, AS boundary routers: 0 Neighbors Up (in full state): 0 Topology: default (ID 0) Prefix export count: 1 Full SPF runs: 193 SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3 Backup SPF: Not Needed juniper@SRX100-6_OSPF>
Ignoring LSAs.
C3640-1#show ip ospf 1 Routing Process "ospf 1" with ID 2.2.2.3 Start time: 00:00:23.404, Time elapsed: 14:00:16.232 Supports only single TOS(TOS0) routes Supports opaque LSA Supports Link-local Signaling (LLS) Supports area transit capability Maximum number of non self-generated LSA allowed 640 Threshold for warning message 75% Ignore-time 30 minutes, reset-time 60 minutes Ignore-count allowed 10, current ignore-count 2 - time remaining: 00:24:40 It is an autonomous system boundary router Redistributing External Routes from, connected with metric mapped to 100, includes subnets in redistribution Router is not originating router-LSAs with maximum metric Initial SPF schedule delay 5000 msecs Minimum hold time between two consecutive SPFs 10000 msecs Maximum wait time between two consecutive SPFs 10000 msecs Incremental-SPF disabled Minimum LSA interval 5 secs Minimum LSA arrival 1000 msecs LSA group pacing timer 240 secs Interface flood pacing timer 33 msecs Retransmission pacing timer 66 msecs Number of external LSA 19. Checksum Sum 0x3ACC949 Number of opaque AS LSA 0. Checksum Sum 0x000000 Number of DCbitless external and opaque AS LSA 5 Number of DoNotAge external and opaque AS LSA 0 Number of areas in this router is 1. 1 normal 0 stub 0 nssa Number of areas transit capable is 0 External flood list length 0 Area BACKBONE(0) Number of interfaces in this area is 6 (1 loopback) Area has no authentication SPF algorithm last executed 00:01:20.192 ago SPF algorithm executed 68 times Area ranges are Number of LSA 29. Checksum Sum 0x3C9020 Number of opaque link LSA 0. Checksum Sum 0x000000 Number of DCbitless LSA 10 Number of indication LSA 0 Number of DoNotAge LSA 0 Flood list length 0 C3640-1#
Ignoring LSAs.
juniper@SRX210HE_OSPF> show ospf overview Instance: master Router ID: 2.2.2.4 Route table index: 0 AS boundary router LSA refresh time: 50 minutes DoNotAge uncapable AS scope LSAs received with no DC bit: 5 Area scope LSAs received with no DC bit: 10 Database protection state: Retry (3145 seconds remaining) Warning threshold: 75 percent Non self-generated LSAs: Current 45, Warning 3840, Allowed 5120 Ignore time: 1800, Reset time: 3600 Ignore count: Current 3, Allowed 10 Area: 0.0.0.0 Stub type: Not Stub Authentication Type: None Area border routers: 0, AS boundary routers: 11 Neighbors Up (in full state): 5 DoNotAge uncapable Area scope LSAs received with no DC bit: 10 Topology: default (ID 0) Prefix export count: 1 Full SPF runs: 177 SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3 Backup SPF: Not Needed juniper@SRX210HE_OSPF>
Hadn't rebooted in more than 30 minutes -- a new record! And it was back in the network.
ns208-> get vrouter OSPF protocol ospf VR: OSPF RouterId: 2.2.2.5 ---------------------------------- Status: enabled State: autonomous system boundary router Auto-Vlink creation: disabled Number of areas: 1 Number of external LSA(s): 19 External LSAs with DNA: 0 Advertising default-route lsa: disabled Default-route learnt by ospf: will be added to the routing table RFC 1583 compatibility: disabled Hello packet flooding protection: disabled LSA flooding protection: enabled (threshold 640 packets per 3600 second(s)) Maximum Retransmit limit: For nbrs on demand-circuits 12 For nbrs on non-demand-circuits 24 Area 0.0.0.0 Total number of interfaces is 6, Active number of interfaces is 6 Intra-SPF algorithm executed 153 times Last Intra-SPF executed before 00:01:04 Number of LSA(s) is 28 Inter-SPF algorithm executed: 153 times Last Inter-SPF executed before 00:01:04 Extern-SPF algorithm executed: 201 times Last Extern-SPF executed before 00:01:04 SPF Aborted: 22 times ns208->
Note the number of times the SPF algorithm was aborted on the NS208.
Ignoring LSAs.
C2811-1#sh ip ospf 1 Routing Process "ospf 1" with ID 2.2.2.6 Start time: 00:00:41.716, Time elapsed: 13:44:54.176 Supports only single TOS(TOS0) routes Supports opaque LSA Supports Link-local Signaling (LLS) Supports area transit capability Maximum number of non self-generated LSA allowed 1280 Current number of non self-generated LSA 46 Threshold for warning message 75% Ignore-time 30 minutes, reset-time 60 minutes Ignore-count allowed 10, current ignore-count 2 - time remaining: 00:22:08 Event-log enabled, Maximum number of events: 1000, Mode: cyclic It is an autonomous system boundary router Redistributing External Routes from, connected, includes subnets in redistribution Router is not originating router-LSAs with maximum metric Initial SPF schedule delay 5000 msecs Minimum hold time between two consecutive SPFs 10000 msecs Maximum wait time between two consecutive SPFs 10000 msecs Incremental-SPF disabled Minimum LSA interval 5 secs Minimum LSA arrival 1000 msecs LSA group pacing timer 240 secs Interface flood pacing timer 33 msecs Retransmission pacing timer 66 msecs Number of external LSA 19. Checksum Sum 0x78607D1 Number of opaque AS LSA 0. Checksum Sum 0x000000 Number of DCbitless external and opaque AS LSA 5 Number of DoNotAge external and opaque AS LSA 0 Number of areas in this router is 1. 1 normal 0 stub 0 nssa Number of areas transit capable is 0 External flood list length 0 IETF NSF helper support enabled Cisco NSF helper support enabled Reference bandwidth unit is 100 mbps Area BACKBONE(0.0.0.0) Number of interfaces in this area is 6 (1 loopback) Area has no authentication SPF algorithm last executed 00:01:35.676 ago SPF algorithm executed 73 times Area ranges are Number of LSA 29. Checksum Sum 0x38E832 Number of opaque link LSA 0. Checksum Sum 0x000000 Number of DCbitless LSA 10 Number of indication LSA 0 Number of DoNotAge LSA 0 Flood list length 0 C2811-1#
Was back in the network!
[admin@RB750G] /routing ospf lsa> print AREA TYPE ID ORIGINATOR SEQUENCE-NU... AGE backbone router 1.1.1.1 1.1.1.1 0x80000174 433 backbone router 1.1.1.3 1.1.1.3 0x800000DA 2237 backbone router 1.1.1.4 1.1.1.4 0x800000DC 440 backbone router 1.1.1.6 1.1.1.6 0x80000029 2140 backbone router 1.1.1.7 1.1.1.7 0x800000DA 415 backbone router 2.2.2.3 2.2.2.3 0x80000009 443 backbone router 2.2.2.4 2.2.2.4 0x80000032 440 backbone router 2.2.2.5 2.2.2.5 0x8000005C 273 backbone router 2.2.2.6 2.2.2.6 0x8000001D 272 backbone router 2.2.2.7 2.2.2.7 0x8000027E 294 backbone router 3.0.0.10 3.0.0.10 0x8000000D 152 backbone router 3.3.3.1 3.3.3.1 0x80000035 614 backbone router 3.3.3.2 3.3.3.2 0x8000003B 154 backbone router 3.3.3.3 3.3.3.3 0x80000005 206 backbone router 3.3.3.4 3.3.3.4 0x8000003E 506 backbone router 3.3.3.6 3.3.3.6 0x800001A3 279 backbone router 3.3.3.7 3.3.3.7 0x80000005 279 backbone network 1.0.0.4 1.1.1.4 0x800000F9 698 backbone network 1.1.7.1 1.1.1.1 0x800000B4 701 backbone network 1.3.4.3 1.1.1.3 0x80000089 1189 backbone network 2.0.0.7 2.2.2.7 0x8000000F 404 backbone network 3.0.0.7 3.3.3.7 0x80000003 349 backbone network 11.1.1.21 2.2.2.1 0x80000001 447 backbone network 11.1.1.31 3.3.3.1 0x80000012 621 backbone network 44.4.4.42 2.2.2.4 0x80000002 440 backbone network 44.4.4.43 3.3.3.4 0x80000003 509 backbone network 55.5.5.53 3.3.3.5 0x80000001 587 backbone network 77.7.7.73 3.3.3.7 0x80000002 300 backbone opaque-area 1.0.0.0 3.3.3.6 0x8000017E 1802 external as-external 1.1.0.128 1.1.1.1 0x800000FD 620 external as-external 1.3.0.128 1.1.1.3 0x80000097 487 external as-external 1.4.0.128 1.1.1.4 0x800000DE 455 external as-external 1.6.0.128 1.1.1.6 0x80000018 2506 external as-external 1.7.0.128 1.1.1.7 0x80000072 1239 external as-external 2.1.0.128 2.2.2.1 0x80000011 1736 external as-external 2.2.0.128 2.2.2.2 0x80000012 991 external as-external 2.3.0.128 2.2.2.3 0x80000003 261 external as-external 2.4.0.128 2.2.2.4 0x80000011 1712 external as-external 
2.5.0.128 2.2.2.5 0x8000001E 629 external as-external 2.6.0.128 2.2.2.6 0x80000003 165 external as-external 2.7.0.128 2.2.2.7 0x8000018C 272 external as-external 3.1.0.128 3.3.3.1 0x80000012 758 external as-external 3.2.0.128 3.3.3.2 0x80000010 2871 external as-external 3.3.0.128 3.3.3.3 0x80000004 207 external as-external 3.4.0.128 3.3.3.4 0x80000011 3051 external as-external 3.5.0.128 3.3.3.5 0x80000012 1092 external as-external 3.6.0.128 3.3.3.6 0x8000017D 2219 external as-external 3.7.0.128 3.3.3.7 0x8000001B 408 [admin@RB750G] /routing ospf lsa>
Ignoring LSAs.
juniper@SRX100-5_OSPF> show ospf overview Instance: master Router ID: 3.3.3.1 Route table index: 0 AS boundary router LSA refresh time: 50 minutes Database protection state: Ignore (1198 seconds remaining) Warning threshold: 75 percent Non self-generated LSAs: Current 0, Warning 1920, Allowed 2560 Ignore time: 1800, Reset time: 3600 Ignore count: Current 4, Allowed 10 Area: 0.0.0.0 Stub type: Not Stub Authentication Type: None Area border routers: 0, AS boundary routers: 0 Neighbors Up (in full state): 0 Topology: default (ID 0) Prefix export count: 1 Full SPF runs: 175 SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3 Backup SPF: Not Needed juniper@SRX100-5_OSPF>
Had completely cleaned itself of more than half of the nasty LSAs.
copek@J2300-7> show ospf database summary Area 0.0.0.0: 20 Router LSAs 15 Network LSAs 1 OpaqArea LSAs Externals: 213082 Extern LSAs Interface fe-0/0/0.22: Area 0.0.0.0: Interface fe-0/0/0.3: Area 0.0.0.0: Interface fe-0/0/0.312: Area 0.0.0.0: Interface fe-0/0/0.3201: Area 0.0.0.0: Interface fe-0/0/0.323: Area 0.0.0.0: Interface lo0.0: Area 0.0.0.0: copek@J2300-7>
And it had expired all of them, keeping only the "good" externals in its LSDB:
    juniper@J2300-7> show ospf database | except " 3600 0x22 " | match ^Extern | count
    Count: 19 lines
    juniper@J2300-7>
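The pipeline above drops every line containing the literal ` 3600 0x22 ` (an LSA sitting at MaxAge, 3600 seconds), keeps only lines starting with `Extern`, and counts what is left. The same filter can be sketched in Python against saved output. The sample rows and their column layout below are made up for illustration; they are not copied from the router.

```python
import re

def count_live_externals(lines):
    """Count Extern LSA lines that are not sitting at MaxAge (3600 s).

    Mirrors: show ospf database | except " 3600 0x22 " | match ^Extern | count
    """
    live = 0
    for line in lines:
        if " 3600 0x22 " in line:        # same literal the 'except' stage drops
            continue
        if re.match(r"^Extern", line):   # same anchor as 'match ^Extern'
            live += 1
    return live

# Hypothetical rows: type, LSA ID, advertising router, sequence, age, options.
sample = [
    "Extern   1.1.0.128   1.1.1.1   0x800000fd   620 0x22 ",
    "Extern   9.9.9.0     1.1.1.3   0x80000001  3600 0x22 ",
    "Router   1.1.1.1     1.1.1.1   0x80000174   433 0x22 ",
]
print(count_live_externals(sample))
```

Matching on a literal string is crude, just like the CLI version: it only works because the age and options columns happen to sit next to each other.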
The box had blown its memory stack again and the OSPF process restarted.
    Jan 20 10:41:54.449: %SYS-2-MALLOCFAIL: Memory allocation of 38128 bytes failed from 0x18ECE58, alignment 0
    Pool: Processor Free: 29940 Cause: Not enough free memory
    Alternate Pool: None Free: 0 Cause: No Alternate pool
    -Process= "HQM Stack Process", ipl= 0, pid= 142
    -Traceback= 1EA47C0 1EA4F0C 294A8A0 294CDC0 294D024 2BC1BDC 18ECE5C 18C453C 1A72C88 1A69758
    C3750-1#
    Jan 20 10:42:36.434: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.7 on Vlan3 from LOADING to FULL, Loading Done
    C3750-1#
However, the box was back in the network, fully participating in OSPF. The OSPF process restart seems to have prematurely cut off the ignore state. But by the time this happened, the number of external LSAs floating around in area 0.0.0.0 was at a safe level.
Ignoring LSAs.
juniper@SRX100-7_OSPF> show ospf overview Instance: master Router ID: 3.3.3.4 Route table index: 0 AS boundary router LSA refresh time: 50 minutes Database protection state: Ignore (1050 seconds remaining) Warning threshold: 75 percent Non self-generated LSAs: Current 0, Warning 3840, Allowed 5120 Ignore time: 1800, Reset time: 3600 Ignore count: Current 4, Allowed 10 Area: 0.0.0.0 Stub type: Not Stub Authentication Type: None Area border routers: 0, AS boundary routers: 0 Neighbors Up (in full state): 0 Topology: default (ID 0) Prefix export count: 1 Full SPF runs: 160 SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3 Backup SPF: Not Needed juniper@SRX100-7_OSPF>
Ignoring LSAs.
juniper@EX2200C-3> show ospf overview Instance: master Router ID: 3.3.3.5 Route table index: 0 AS boundary router LSA refresh time: 50 minutes Database protection state: Ignore (1033 seconds remaining) Warning threshold: 75 percent Non self-generated LSAs: Current 0, Warning 1920, Allowed 2560 Ignore time: 1800, Reset time: 3600 Ignore count: Current 4, Allowed 10 Area: 0.0.0.0 Stub type: Not Stub Authentication Type: None Area border routers: 0, AS boundary routers: 0 Neighbors Up (in full state): 0 Topology: default (ID 0) Prefix export count: 1 Full SPF runs: 126 SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3 Backup SPF: Not Needed {master:0} juniper@EX2200C-3>
Miraculously, this box was back from the dead, fully neighbored up with the other active participants.
Like the 3750, this box had blown its memory bounds despite the LSDB protection.
    -Process= "OSPF Router 1", ipl= 0, pid= 157
    -Traceback= 8000C1CC 8000F868 8001FD5C 8142BDA4 8142C27C 8142EAF8 8142EC00 8141483C 8142D6B4 813FFFDC 80561688 80565EFC
    Jan 20 10:34:31.112: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x8001FD58, alignment 0
    Pool: Processor Free: 80900 Cause: Memory fragmentation
    Alternate Pool: I/O Free: 651604 Cause: Memory fragmentation
    -Process= "OSPF Router 1", ipl= 0, pid= 157
    -Traceback= 8000C1CC 8000F868 8001FD5C 8142C05C 8142F1C4 8142F7A8 8142FFA4 813FFC08 81400194 80561688 80565EFC
    Jan 20 10:35:01.121: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x8001FD58, alignment 0
    Pool: Processor Free: 88488 Cause: Memory fragmentation
    Alternate Pool: I/O Free: 655364 Cause: Memory fragmentation
    -Process= "OSPF Router 1", ipl= 0, pid= 157
    -Traceback= 8000C1CC 8000F868 8001FD5C 8142C05C 8142F1C4 8142F7A8 8142FFA4 813FFC08 81400194 80561688 80565EFC
    Jan 20 10:36:04.012: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x8001FD58, alignment 0
    Pool: Processor Free: 127292 Cause: Memory fragmentation
    Alternate Pool: I/O Free: 661224 Cause: Memory fragmentation
    -Process= "OSPF Router 1", ipl= 0, pid= 157
    -Traceback= 8000C1CC 8000F868 8001FD5C 8142C05C 8142F1C4 8142F7A8 8142FFA4 813FFC08 81400194 80561688 80565EFC
    Jan 20 10:37:27.025: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x8001FD58, alignment 0
    Pool: Processor Free: 209132 Cause: Memory fragmentation
    Alternate Pool: I/O Free: 684212 Cause: Memory fragmentation
    -Process= "OSPF Router 1", ipl= 0, pid= 157
    -Traceback= 8000C1CC 8000F868 8001FD5C 8142C05C 8142F1C4 8142F7A8 8142FFA4 813FFC08 81400194 80561688 80565EFC
    Jan 20 10:41:35.398: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet0/0.3 from FULL to DOWN, Neighbor Down: Too many retransmissions
    Jan 20 10:42:35.401: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet0/0.3 from DOWN to DOWN, Neighbor Down: Ignore timer expired
    Jan 20 10:42:36.438: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet0/0.3 from LOADING to FULL, Loading Done
After an hour, though, it was ignoring the network LSAs:
C1760-1#sh ip ospf 1 Routing Process "ospf 1" with ID 3.3.3.7 Supports only single TOS(TOS0) routes Supports opaque LSA Supports Link-local Signaling (LLS) Supports area transit capability Maximum number of non self-generated LSA allowed 480 Threshold for warning message 75% Ignore-time 30 minutes, reset-time 60 minutes Ignore-count allowed 10, current ignore-count 3 - time remaining: 00:46:56 It is an autonomous system boundary router Redistributing External Routes from, connected, includes subnets in redistribution Initial SPF schedule delay 5000 msecs Minimum hold time between two consecutive SPFs 10000 msecs Maximum wait time between two consecutive SPFs 10000 msecs Incremental-SPF disabled Minimum LSA interval 5 secs Minimum LSA arrival 1000 msecs LSA group pacing timer 240 secs Interface flood pacing timer 33 msecs Retransmission pacing timer 66 msecs Number of external LSA 19. Checksum Sum 0x3AE19A0 Number of opaque AS LSA 0. Checksum Sum 0x000000 Number of DCbitless external and opaque AS LSA 5 Number of DoNotAge external and opaque AS LSA 0 Number of areas in this router is 1. 1 normal 0 stub 0 nssa Number of areas transit capable is 0 External flood list length 24365 Area BACKBONE(0) Number of interfaces in this area is 6 (1 loopback) Area has no authentication SPF algorithm last executed 00:01:22.415 ago SPF algorithm executed 16 times Area ranges are Number of LSA 29. Checksum Sum 0x46971C Number of opaque link LSA 0. Checksum Sum 0x000000 Number of DCbitless LSA 10 Number of indication LSA 0 Number of DoNotAge LSA 0 Flood list length 0 C1760-1#
I waited another hour to let all of the ignore timers expire, and then checked to see what the final damage was two hours after the "event". Most everything had recovered by itself, as evidenced by Nagios's monitoring page: a field of green with only a couple of bloody red gashes where OpenOSPFd, Quagga and XORP lived.
To bring the network fully back to life only took three actions:
It's very obvious that having routers remove themselves from OSPF had a drastic effect on how the network behaved. The flooding of 500K external LSAs still had the same end effect (most of the network was inaccessible), but the time it took to reach a converged state was greatly reduced. The damage done was less severe as well. Most boxes were able to recover on their own, no box had to be rebooted by hand, and segments of the network didn't have to be shut down. The recovery, although it still took nearly 2 hours to come back as far as it could (due mostly to ignore timers needing to expire), was remarkably quick compared to the network without any protection. The overall network churn was orders of magnitude less.
If all of the routers on the network had supported LSA database protection like the Junos > 10.2 boxes or the Ciscos, I have no doubt the convergence and recovery times would have been even quicker. With a mix of boxes like here, the ignore timers need to be paced out enough that any stray LSAs that are floating around have a chance to expire by themselves, so that the protected boxes, especially the low-memory ones that tend to develop serious problems **cough cough** 3750 **cough**, don't try to participate in OSPF too early and implode. Some of this will also depend on how vigilant the network operators are. If somebody leaks too many routes into the network on Friday afternoon, and nobody does anything until Monday, a lot of the boxes would have gone into isolation and yet others may have exploded during the periodic refreshes and refloods. On a carefully monitored network, the operators would have been alerted to problems pretty quickly and would have a much easier time intervening and bringing things back to normal.
With careful tuning, and study of the network devices to learn what their limitations are, database protection can be a powerful tool in the arsenal against catastrophic mistakes. One would have to identify the weaker devices, like the Cisco 3750, and protect these to a higher degree than the other routers. Setting warning levels on database protection can also serve as an early warning against creeping levels of external LSAs that over time might start to cause problems.
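To get a feel for where the warning levels land, here is a small Python sketch. The limits are the "allowed" values visible in the outputs above (480, 1280, 2560 and 5120) and the 75 percent warning threshold comes from the same outputs. The function name, the test count of 213082 (the external count seen on the J2300) and the exact comparison operators are my own assumptions; real implementations may trip at slightly different boundaries.

```python
def protection_state(non_self_lsas, max_lsa, warning_pct=75):
    """Classify an LSA count against a database-protection limit.

    Returns (state, warning_at). The '>' and '>=' boundaries are an
    assumption for illustration, not taken from any vendor documentation.
    """
    warning_at = max_lsa * warning_pct // 100
    if non_self_lsas > max_lsa:
        return "exceeded", warning_at
    if non_self_lsas >= warning_at:
        return "warning", warning_at
    return "ok", warning_at

# Limits as they appear in the command outputs above.
limits = {
    "C1760-1": 480,
    "C2811-1": 1280,
    "SRX100-5": 2560,
    "EX2200C-3": 2560,
    "SRX100-7": 5120,
}

for box, max_lsa in limits.items():
    state, warning_at = protection_state(213082, max_lsa)
    print(f"{box}: warn at {warning_at}, allowed {max_lsa}, state {state}")
```

With the limits used here, a leak of a few hundred thousand externals blows past every threshold almost immediately, so the warning level is mostly useful for catching slow growth rather than a sudden flood.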
So we've seen that in great numbers, OSPF, which is generally regarded as quite quick, can actually be pretty slow at delivering large amounts of routing information. On the other hand, BGP, which was designed to handle routing information at large scales, is (in my opinion falsely) regarded as being fairly slow. So why not have a race!
Right away, I had to disqualify any router that didn't have enough memory to hold either a big BGP table or a big OSPF LSDB -- so anybody who is under 1GB is out! I also tried loading up the SRX100Hs and the SRX210HE with a big full BGP table, but their poor little MIPS processors were soon pretty overwhelmed and started to drop BGP connections. I also tried like hell to get BIRD to listen to a BGP port, but I couldn't get it to work at all. I booted XORP out of the race due to the persistent stability issues I've had with it. So the remaining players are: Vyatta, the Olives, Quagga, and the J2300. Since the topology now isolated the J2300, it was also given leave to be a member of Cluster 1. To try to make things fair, the Olive that is originating the routes was forced to be the DR for the segment.
So which is faster at blasting 500,000 prefixes from the Olive2GB to the rest of the routers, OSPF or BGP?
To test BGP, each router was set up in an internal BGP full mesh, with the 2GB Olive waiting to blast 500,000 BGP prefixes to all of the unsuspecting neighbors. OSPF is used to do what it should at this point: just provide topological information. We set up a new BGP group and use two new policies on our iBGP neighbors: NHS to fix the next hop, and REJECT. REJECT is used simply to hold back all of the routes learned from exaBGP.
Additional Config on the 2GB Olive for iBGP

    routing-options {
        router-id 1.1.1.3;
        autonomous-system 65066;
    }
    protocols {
        bgp {
            group iBGP {
                type internal;
                local-address 1.1.1.3;
                export [ REJECT NHS ];
                neighbor 1.1.1.1;
                neighbor 1.1.1.5;
                neighbor 1.1.1.7;
                neighbor 3.3.3.2;
            }
        }
    }
    policy-options {
        policy-statement FIX-NH {
            then {
                next-hop 172.20.1.1;
            }
        }
        policy-statement NHS {
            then {
                next-hop self;
            }
        }
        policy-statement REJECT {
            then reject;
        }
    }
With the 2GB Olive loaded up with 500,000 BGP routes, we delete the REJECT export policy and unleash the deluge.
    [edit protocols bgp group iBGP]
    juniper@Olive2GB# delete export REJECT
    [edit protocols bgp group iBGP]
    juniper@Olive2GB# commit
    commit complete
I tried several ways of putting a stopwatch on how long it took each router to learn all the BGP routes. None of my scientific approaches, like setting off maximum prefix warnings, worked really well or provided a common comparison across all of the different platforms. So I resorted to alternately executing a command to check the time on the router and a command to check the overall status of the routing table.
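That poor man's stopwatch is basically a polling loop. A minimal Python sketch of the idea follows. `get_route_count` and `is_done` are hypothetical callables standing in for whatever command reports the table size on a given platform; nothing here talks to a real router, and the dry run uses a fake clock and made-up counts.

```python
import time

def time_until(get_route_count, is_done, poll_interval=1.0, timeout=3600.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll the route count until is_done(count) is true.

    Returns elapsed seconds, or None if the timeout passes first.
    """
    start = clock()
    while True:
        count = get_route_count()
        elapsed = clock() - start
        if is_done(count):
            return elapsed
        if elapsed >= timeout:
            return None
        sleep(poll_interval)

# Dry run with a fake router and a fake clock so the sketch runs anywhere.
counts = iter([0, 120_000, 380_000, 500_000])
now = [0.0]

def fake_sleep(seconds):
    now[0] += seconds

elapsed = time_until(lambda: next(counts), lambda n: n >= 500_000,
                     clock=lambda: now[0], sleep=fake_sleep)
print(f"table full after {elapsed:.0f} s")
```

The resolution is only as good as the poll interval plus however long the count command takes to return, which is worth keeping in mind when comparing single-digit-second results.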
BGP proved to be fairly quick, especially for the routers that are all VMs and just speaking across an Open vSwitch to one another.
Advertising 500,000 BGP prefixes:

Vyatta | Quagga | Olive | J2300 |
---|---|---|---|
5 seconds | 6 seconds | 12 seconds | 83 seconds |
To withdraw the routes, the REJECT policy was simply reapplied. BGP proved to be even quicker at retracting prefixes.
Withdrawing 500,000 BGP prefixes:

Vyatta | Quagga | Olive | J2300 |
---|---|---|---|
5 seconds | 4 seconds | 9 seconds | 38 seconds |
Next up was OSPF, using the same 500k routes and the same export policy that readvertised BGP routes into OSPF in the previous test runs. Not surprisingly, OSPF was a lot slower. Once again the machines talking directly to the Open vSwitch learned all of the new LSAs first, while the J2300 with its old FastEthernet interfaces took almost twice as long. The three VM-based routers all converged at more or less the same time.
Advertising 500,000 OSPF prefixes:

Vyatta | Quagga | Olive | J2300 |
---|---|---|---|
549 seconds | 549 seconds | 549 seconds | 941 seconds |
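To put the two advertising tables side by side, a few lines of Python can work out how much longer OSPF took than BGP on each box. The numbers are the ones from the tables; the dictionary and function names are just for illustration.

```python
# Seconds to learn 500,000 prefixes, taken from the two advertising tables.
bgp_advertise = {"Vyatta": 5, "Quagga": 6, "Olive": 12, "J2300": 83}
ospf_advertise = {"Vyatta": 549, "Quagga": 549, "Olive": 549, "J2300": 941}

def slowdown(ospf, bgp):
    """Return how many times longer OSPF took than BGP, per router."""
    return {name: ospf[name] / bgp[name] for name in bgp}

ratios = slowdown(ospf_advertise, bgp_advertise)
for name, ratio in ratios.items():
    print(f"{name}: OSPF took {ratio:.1f}x as long as BGP")
```

Keep in mind the timings were taken by hand, so the ratios are ballpark figures rather than benchmarks.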
Withdrawing OSPF was another story entirely. The Olive was a bit quicker pulling expired routes from its inet.0 table, while Vyatta took an inordinate amount of time to yank the routes from its RIB. Quagga couldn't finish as it blew its memory bounds. The J2300 was working so hard that the mgd daemon quit responding:
    juniper@J2300-7> show route summary
    error: the routing subsystem is not responding to management requests
I'm not sure if it dropped a core or it's still working on getting rid of the old LSAs. I can still ping the box, but the CLI is basically frozen.
Withdrawing 500,000 OSPF prefixes:

Vyatta | Quagga | Olive | J2300 |
---|---|---|---|
2119 seconds | Empty set. | 420 seconds | ? seconds |
BGP was definitely a ton faster at this scale. The other thing that doesn't really show up well is the reliability and CPU time involved. While BGP prefixes were being learned or withdrawn, there were short CPU spikes. While OSPF LSAs were changing, there was a very slow logarithmic kind of buildup in CPU usage over time -- especially in the Vyatta box. I also exercised BGP quite a bit, advertising and withdrawing all of the routes a few dozen times. If there were any route dampening policies in place they surely would have had a lot of fun. Anyway, all of the BGP implementations stood up very well and acted in a consistent and predictable manner. I only pushed and pulled OSPF once, and had 1 box implode, 1 box drag down to the point I can't really see what is going on, and 1 box push its CPU usage up and up and up -- taking what I considered an inordinate amount of time to reconverge.
Definitely, at scale BGP is demonstrably a far superior protocol.
Through all of this chaos, only two boxes weathered all of the tests. The 1GB Olive stood very strong through everything (once the NICs were corrected), taking everything that was thrown at it. The Vyatta box showed a lot of strain at times, but came through relatively unscathed. So do they get a reward? YES! 500,000 prefixes via BGP and OSPF simultaneously!
A quick check to make sure we're ready to go:
juniper@Olive2GB# run show bgp summary Groups: 2 Peers: 5 Down peers: 2 Table Tot Paths Act Paths Suppressed History Damp State Pending inet.0 500000 500000 0 0 0 0 Peer AS InPkt OutPkt OutQ Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped... 1.1.1.1 65066 424 289541 0 0 3:31:28 0/0/0/0 0/0/0/0 1.1.1.5 65066 2488 461405 0 1 2:23:27 Active 1.1.1.7 65066 3055 78531 0 1 6:38:00 0/0/0/0 0/0/0/0 3.3.3.2 65066 4 10 0 15 1:28:16 Active 172.20.10.117 65069 500354 416 0 2 20:36:37 500000/500000/500000/0 0/0/0/0 [edit] juniper@Olive2GB# run show route advertising-protocol bgp 1.1.1.1 [edit] juniper@Olive2GB# run show route advertising-protocol bgp 1.1.1.7 [edit] juniper@Olive2GB# run show ospf database summary Area 0.0.0.0: 4 Router LSAs 2 Network LSAs Externals: 3 Extern LSAs Interface em1.1: Area 0.0.0.0: Interface em1.123: Area 0.0.0.0: Interface em1.134: Area 0.0.0.0: Interface em1.33: Area 0.0.0.0: Interface em2.0: Area 0.0.0.0: Interface lo0.0: Area 0.0.0.0: [edit] juniper@Olive2GB#
    juniper@Olive2GB# delete protocols bgp group iBGP export REJECT
    [edit]
    juniper@Olive2GB# set protocols ospf export EXPORT-BGP-TO-OSPF
    [edit]
    juniper@Olive2GB# commit
    commit complete
    [edit]
    juniper@Olive2GB#
    juniper@Olive1GB> show route summary
    Autonomous system number: 65066
    Router ID: 1.1.1.7
    inet.0: 501818 destinations, 1003354 routes (501818 active, 0 holddown, 0 hidden)
        Direct: 7 routes, 7 active
        Local: 6 routes, 6 active
        OSPF: 501548 routes, 501548 active
        BGP: 501792 routes, 256 active
        Static: 1 routes, 1 active
    juniper@Olive1GB>
Jan 21 21:34:50 vyatta kernel: [522722.103536] Out of memory: Kill process 1544 (ospfd) score 406 or sacrifice child Jan 21 21:34:50 vyatta kernel: [522722.107877] Killed process 1544 (ospfd) total-vm:424928kB, anon-rss:419876kB, file-rss:0kB Jan 21 21:34:50 vyatta kernel: [522722.190039] ospfd: page allocation failure: order:0, mode:0x201da Jan 21 21:34:50 vyatta kernel: [522722.190045] Pid: 1544, comm: ospfd Not tainted 3.3.8-1-586-vyatta-virt #1 Jan 21 21:34:50 vyatta kernel: [522722.190048] Call Trace: Jan 21 21:34:50 vyatta kernel: [522722.190057] [] ? warn_alloc_failed+0xc0/0xd1 Jan 21 21:34:50 vyatta kernel: [522722.190063] [ ] ? __alloc_pages_nodemask+0x577/0x5ce Jan 21 21:34:50 vyatta kernel: [522722.190067] [ ] ? filemap_fault+0x26a/0x32f Jan 21 21:34:50 vyatta kernel: [522722.190073] [ ] ? __do_fault+0x97/0x403 Jan 21 21:34:50 vyatta kernel: [522722.190078] [ ] ? handle_pte_fault+0x389/0x93a Jan 21 21:34:50 vyatta kernel: [522722.190084] [ ] ? common_interrupt+0x29/0x30 Jan 21 21:34:50 vyatta kernel: [522722.190088] [ ] ? handle_mm_fault+0x1e0/0x1f6 Jan 21 21:34:50 vyatta kernel: [522722.190094] [ ] ? do_page_fault+0x2cd/0x2e9 Jan 21 21:34:50 vyatta kernel: [522722.190099] [ ] ? tick_program_event+0x1c/0x1f Jan 21 21:34:50 vyatta kernel: [522722.190104] [ ] ? hrtimer_interrupt+0x143/0x1f5 Jan 21 21:34:50 vyatta kernel: [522722.190108] [ ] ? kvm_async_pf_task_wait+0x167/0x167 Jan 21 21:34:50 vyatta kernel: [522722.190112] [ ] ? error_code+0x5a/0x60 Jan 21 21:34:50 vyatta kernel: [522722.190117] [ ] ? detect_ht+0xc4/0x169 Jan 21 21:34:50 vyatta kernel: [522722.190120] [ ] ? 
kvm_async_pf_task_wait+0x167/0x167 Jan 21 21:34:50 vyatta kernel: [522722.190123] Mem-Info: Jan 21 21:34:50 vyatta kernel: [522722.190124] DMA per-cpu: Jan 21 21:34:50 vyatta kernel: [522722.190127] CPU 0: hi: 0, btch: 1 usd: 0 Jan 21 21:34:50 vyatta kernel: [522722.190129] Normal per-cpu: Jan 21 21:34:50 vyatta kernel: [522722.190131] CPU 0: hi: 186, btch: 31 usd: 0 Jan 21 21:34:50 vyatta kernel: [522722.190132] HighMem per-cpu: Jan 21 21:34:50 vyatta kernel: [522722.190134] CPU 0: hi: 42, btch: 7 usd: 0 Jan 21 21:34:50 vyatta kernel: [522722.190140] active_anon:225514 inactive_anon:31 isolated_anon:0 Jan 21 21:34:50 vyatta kernel: [522722.190141] active_file:8 inactive_file:11 isolated_file:0 Jan 21 21:34:50 vyatta kernel: [522722.190142] unevictable:0 dirty:0 writeback:0 unstable:0 Jan 21 21:34:50 vyatta kernel: [522722.190143] free:13235 slab_reclaimable:685 slab_unreclaimable:15401 Jan 21 21:34:50 vyatta kernel: [522722.190145] mapped:3 shmem:112 pagetables:572 bounce:0 Jan 21 21:34:50 vyatta kernel: [522722.190153] DMA free:4448kB min:784kB low:980kB high:1176kB active_anon:9676kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:1292kB kernel_stack:0kB pagetables:12kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes Jan 21 21:34:50 vyatta kernel: [522722.190159] lowmem_reserve[]: 0 869 1000 1000 Jan 21 21:34:50 vyatta kernel: [522722.190168] Normal free:48364kB min:44216kB low:55268kB high:66324kB active_anon:762424kB inactive_anon:0kB active_file:32kB inactive_file:44kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:890008kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:2740kB slab_unreclaimable:60312kB kernel_stack:488kB pagetables:960kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:28213 all_unreclaimable? 
yes Jan 21 21:34:50 vyatta kernel: [522722.190174] lowmem_reserve[]: 0 0 1047 1047 Jan 21 21:34:50 vyatta kernel: [522722.190182] HighMem free:128kB min:128kB low:1792kB high:3456kB active_anon:129956kB inactive_anon:124kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:134112kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:448kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:1316kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Jan 21 21:34:50 vyatta kernel: [522722.190188] lowmem_reserve[]: 0 0 0 0 Jan 21 21:34:50 vyatta kernel: [522722.190192] DMA: 6*4kB 59*8kB 9*16kB 7*32kB 4*64kB 4*128kB 5*256kB 3*512kB 0*1024kB 0*2048kB 0*4096kB = 4448kB Jan 21 21:34:50 vyatta kernel: [522722.190200] Normal: 179*4kB 135*8kB 66*16kB 27*32kB 16*64kB 27*128kB 29*256kB 16*512kB 10*1024kB 1*2048kB 3*4096kB = 48388kB Jan 21 21:34:50 vyatta kernel: [522722.190209] HighMem: 18*4kB 5*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 128kB Jan 21 21:34:50 vyatta kernel: [522722.190217] 125 total pagecache pages Jan 21 21:34:50 vyatta kernel: [522722.190219] 0 pages in swap cache Jan 21 21:34:50 vyatta kernel: [522722.190221] Swap cache stats: add 0, delete 0, find 0/0 Jan 21 21:34:50 vyatta kernel: [522722.190223] Free swap = 0kB Jan 21 21:34:50 vyatta kernel: [522722.190224] Total swap = 0kB Jan 21 21:34:50 vyatta kernel: [522722.192911] 262126 pages RAM Jan 21 21:34:50 vyatta kernel: [522722.192914] 33792 pages HighMem Jan 21 21:34:50 vyatta kernel: [522722.192915] 3545 pages reserved Jan 21 21:34:50 vyatta kernel: [522722.192917] 160 pages shared Jan 21 21:34:50 vyatta kernel: [522722.192918] 245105 pages non-shared
The last one standing is the 1GB Olive, but the 2GB Olive is still feeding it routes!
juniper@Olive2GB# run show bgp summary Groups: 2 Peers: 7 Down peers: 2 Table Tot Paths Act Paths Suppressed History Damp State Pending inet.0 727097 725300 0 0 0 0 Peer AS InPkt OutPkt OutQ Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped... 1.1.1.1 65066 43 13290 0 1 20:12 0/0/0/0 0/0/0/0 1.1.1.5 65066 2488 461405 0 1 3:01:45 Active 1.1.1.7 65066 3139 95008 0 1 7:16:18 0/0/0/0 0/0/0/0 3.3.3.2 65066 4 10 0 15 2:06:34 Connect 172.20.10.117 65069 500366 428 0 2 21:14:55 500000/500000/500000/0 0/0/0/0 172.20.10.118 65069 203010 10 0 0 26:48 203008/203008/203008/0 0/0/0/0 172.20.10.119 65069 24091 10 0 0 7:16 22292/24089/24089/0 0/0/0/0 [edit protocols bgp group exaBGP] juniper@Olive2GB#
And Vyatta's BGP connection times out, letting it restart OSPF!
But now we have our first complaints from the Olive. It's getting a bit unresponsive and logging some of its concerns at 830K routes!
Jan 21 21:45:19 Olive1GB rpd[4952]: RPD_OS_MEMHIGH: Using 998900 KB of memory, 123 percent of available
At almost 1.1 million routes (in both BGP and OSPF), the Olive is inching up on its available swap space:
Jan 21 22:07:18 Olive1GB rpd[4952]: RPD_OS_MEMHIGH: Using 1308533 KB of memory, 171 percent of available
And now we're getting some pretty good scheduler slips
Jan 21 22:12:19 Olive1GB rpd[4952]: RPD_SCHED_SLIP: 8 sec scheduler slip, user: 1 sec 249321 usec, system: 0 sec, 0 usec
Finally, at just under 1.25 Million routes (*2, OSPF + BGP), some real problems:
    Jan 21 22:35:19 Olive1GB rpd[4952]: RPD_OS_MEMHIGH: Using 1437839 KB of memory, 194 percent of available
    Jan 21 22:38:31 Olive1GB rpd[4952]: RPD_SCHED_SLIP: 183 sec scheduler slip, user: 1 sec 767385 usec, system: 0 sec, 138129 usec
    Jan 21 22:38:31 Olive1GB rpd[4952]: RPD_BFD_WRITE_ERROR: bfd_send: write error on pipe to bfdd (Broken pipe)
    Jan 21 22:38:31 Olive1GB rpd[4952]: bgp_pp_recv: rejecting connection from 1.1.1.3 (Internal AS 65066), peer in state Established
    Jan 21 22:38:31 Olive1GB rpd[4952]: bgp_pp_recv:2939: NOTIFICATION sent to 1.1.1.3+51359 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
    Jan 21 22:38:31 Olive1GB rpd[4952]: RPD_PPM_WRITE_ERROR: ppm_send: write error on pipe to ppmd (Broken pipe)
    Jan 21 22:38:31 Olive1GB rpd[4952]: RPD_OS_MEMHIGH: Using 1438843 KB of memory, 194 percent of available
BGP is down. The router is unresponsive. I'd call it pretty much unusable at this point. And we got a massive scheduler slip of more than 4 minutes!
Jan 21 22:42:36 Olive1GB rpd[4952]: RPD_SCHED_SLIP: 242 sec scheduler slip, user: 3 sec 535128 usec, system: 1 sec, 683421 usec
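If you'd rather not eyeball the syslog for these, the two rpd messages quoted in this section are easy to pull apart with a couple of regular expressions. This is just a sketch: the patterns are written against the exact message text shown above, and I'm assuming other Junos versions may word these messages differently.

```python
import re

SLIP = re.compile(r"RPD_SCHED_SLIP: (\d+) sec scheduler slip")
MEM = re.compile(
    r"RPD_OS_MEMHIGH: Using (\d+) KB of memory, (\d+) percent of available"
)

def summarize(lines):
    """Return (slip_seconds, [(kb_used, percent_of_available), ...])."""
    slips, mem = [], []
    for line in lines:
        slip_match = SLIP.search(line)
        mem_match = MEM.search(line)
        if slip_match:
            slips.append(int(slip_match.group(1)))
        elif mem_match:
            mem.append((int(mem_match.group(1)), int(mem_match.group(2))))
    return slips, mem

# Three of the log lines quoted in this section.
log = [
    "Jan 21 21:45:19 Olive1GB rpd[4952]: RPD_OS_MEMHIGH: Using 998900 KB of memory, 123 percent of available",
    "Jan 21 22:12:19 Olive1GB rpd[4952]: RPD_SCHED_SLIP: 8 sec scheduler slip, user: 1 sec 249321 usec, system: 0 sec, 0 usec",
    "Jan 21 22:42:36 Olive1GB rpd[4952]: RPD_SCHED_SLIP: 242 sec scheduler slip, user: 3 sec 535128 usec, system: 1 sec, 683421 usec",
]

slips, mem = summarize(log)
print(f"longest slip: {max(slips) / 60:.1f} minutes")
print(f"peak memory: {max(kb for kb, _ in mem)} KB")
```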
So the ironman of routing was the Olive running Junos 10.0R4.7.
First, a few caveats and regrets