Overloading the OSPF LSDB

I recently did a lab demo where I illustrated the dangers of sloppy redistribution policies between different routing protocols (BGP and OSPF). I pumped about 30,000 BGP NLRIs into OSPF, then formed a redistribution loop and got some really awesome route oscillation going. This got me wondering just how many routes I could pump into OSPF, and what would happen when the LSDB overflowed.

I was planning on converting a dump of a semi-current Internet routing table of about 400,000 prefixes into static routes and pumping those into OSPF, but then I found an easier and "more scalable" solution when I came across a really cool program called exaBGP. In brief, exaBGP is a BGP route announcer. The really cool thing about it is that you can use scripts to announce and withdraw routes dynamically. I coded up a Python script to generate a bogus IPv4 prefix, and used it in conjunction with a shell script and exaBGP to advertise prefixes to a BGP router, which then sloppily redistributes them into OSPF. The Python script doesn't take any care to avoid martians or bogons -- it just spits out a random network with a random CIDR mask.
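For the curious, a generator along those lines can be sketched in a few lines of Python. This is not the original script, just a minimal sketch of the same idea: the function names, the prefix-length range of /8 to /32, and the next-hop value are all assumptions made for illustration, and it relies on exaBGP's text API, where a helper process prints `announce route ... next-hop ...` lines on stdout.

```python
import ipaddress
import random


def random_prefix(rng):
    """Return a random IPv4 network with a random mask (no martian/bogon filtering)."""
    prefix_len = rng.randint(8, 32)  # assumed range; the original script's range is not stated
    address = rng.getrandbits(32)    # any 32-bit value, so junk prefixes are expected
    # strict=False masks off the host bits so the result is a valid network.
    return ipaddress.ip_network((address, prefix_len), strict=False)


def announce_line(network, next_hop="172.20.10.117"):
    """Format one announcement in exaBGP's text API style (next hop is a placeholder)."""
    return f"announce route {network} next-hop {next_hop}"


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(10):
        # exaBGP reads commands from the helper's stdout, so flush every line.
        print(announce_line(random_prefix(rng)), flush=True)
```

A withdraw would use the same shape with `withdraw route` in place of `announce route`. The next hop hardly matters in this lab, since the receiving router's import policy rewrites it anyway.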

I was also planning on just pumping my original test devices, a couple of Olives, full of OSPF routes to see what happened -- but then I got it in my head to see what various implementations and platforms would do with an LSDB stuffed full of routes. So, I cobbled together a collection of hardware platforms I have and built some virtual machines to cover the software implementations.
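To get a rough feel for the scale involved, here is a back-of-the-envelope sketch. It assumes every redistributed prefix becomes exactly one Type 5 (AS-external) LSA carrying a single metric entry, which is 36 bytes on the wire per RFC 2328: a 20-byte LSA header plus a 16-byte body. The function name is made up for illustration, and the numbers are only a lower bound, since each implementation wraps an LSA in its own in-memory structures and also has to install the resulting routes.

```python
LSA_HEADER_BYTES = 20        # common LSA header (RFC 2328, A.4.1)
AS_EXTERNAL_BODY_BYTES = 16  # netmask, E-bit/metric, forwarding address, route tag (A.4.5)


def external_lsa_bytes(prefix_count):
    """Raw size of the Type 5 LSAs for prefix_count redistributed routes."""
    return prefix_count * (LSA_HEADER_BYTES + AS_EXTERNAL_BODY_BYTES)


if __name__ == "__main__":
    # 30,000 is the earlier demo load; 400,000 is roughly a full Internet table.
    for count in (30_000, 400_000):
        mib = external_lsa_bytes(count) / 2**20
        print(f"{count:>7} prefixes -> about {mib:.1f} MiB of raw LSA data")
```

The raw LSA data is small next to 1 GB of RAM, so if a box falls over, the per-LSA overhead, flooding, and SPF/route installation work are the more likely culprits than the bytes themselves.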

The Boxes

I managed to dig up and/or construct the following routing platforms:

I figured it would be the most fun to let all of the boxes neighbor on up with each other and subject them all to more or less the same treatment. The diagram below represents the topology:

Test Topology

Diagram: OSPF-Overload_topology.svg

I tried to "exercise" each implementation a bit, so I created three clusters. Each router in a cluster peers with all of the other routers in a shared broadcast domain, then with two other peers via Non-Broadcast Multi-Access (NBMA) links, and then with two routers in the other clusters via a point-to-multipoint link. When an implementation didn't support a link type, or I ran into an RFC interpretation issue, a broadcast network was substituted instead. Every implementation wound up supporting broadcast links nicely, while anything besides broadcast links wound up being a problem with most of the open platforms. MTUs were adjusted on a link-by-link basis as needed.
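As a side note on what the shared broadcast domain costs each router: on a broadcast segment, every router sees every other router as a neighbor, but only the DR and BDR form full adjacencies with everyone; the rest stay in 2-Way with each other. The sketch below counts both for a segment of n routers. The function names are made up for illustration, and it assumes all routers are up and a DR and BDR have been elected (no priority-0 routers).

```python
def neighbor_pairs(n):
    """Every pair of routers on the segment exchanges hellos."""
    return n * (n - 1) // 2


def full_adjacencies(n):
    """Full adjacencies when a DR and BDR exist on a broadcast segment."""
    if n < 2:
        return 0
    # The DR is adjacent to the other n-1 routers; the BDR adds n-2 more,
    # because its adjacency with the DR is already counted.
    return (n - 1) + (n - 2)


if __name__ == "__main__":
    n = 7  # the seven routers of Cluster 1
    print(f"{n} routers: {neighbor_pairs(n)} neighbor pairs, "
          f"{full_adjacencies(n)} full adjacencies")
```

So only the DR and BDR synchronize databases with everyone, which is also why they are the boxes most likely to feel the pain first when the LSDB gets stuffed.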

Topology Details

Router Details

Vyatta

Router 1 in Cluster 1

Description: This is an install of the Vyatta Community Edition 32-bit Virtualization ISO V6.5 running on a KVM with 1 GB RAM, installed on a 2GB virtual disk.

			vyatta@vyatta:~$ show version 
			Version:      VC6.5R1
			Description:  Vyatta Core 6.5 R1
			Copyright:    2006-2012 Vyatta, Inc.
			Built by:     autobuild@vyatta.com
			Built on:     Fri Nov 16 16:39:16 UTC 2012
			Build ID:     1211161701-334fb58
			System type:  Intel 32bit Virtual
			Boot via:     disk
			Hypervisor:   KVM
			HW model:     Bochs
			HW S/N:       Not Specified
			HW UUID:      C7CEB5F4-FBF9-475E-3FA3-C9136AF3141B
			Uptime:       20:40:21 up 4 min,  1 user,  load average: 0.00, 0.03, 0.02
			

Config:

			 interfaces {
			     ethernet eth1 {
			         duplex auto
			         hw-id 52:54:00:60:c3:d4
			         smp_affinity auto
			         speed auto
			         vif 1 {
			             address 1.0.0.1/24
			             ip {
			                 ospf {
			                     cost 1000
			                     dead-interval 40
			                     hello-interval 10
			                     priority 128
			                     retransmit-interval 5
			                     transmit-delay 1
			                 }
			             }
			             mtu 1496
			         }
			         vif 11 {
			             address 11.1.1.11/24
			             ip {
			                 ospf {
			                     cost 100
			                     dead-interval 40
			                     hello-interval 10
			                     priority 128
			                     retransmit-interval 5
			                     transmit-delay 1
			                 }
			             }
			         }
			         vif 112 {
			             address 1.1.2.1/24
			             ip {
			                 ospf {
			                     cost 10
			                     dead-interval 40
			                     hello-interval 10
			                     priority 128
			                     retransmit-interval 5
			                     transmit-delay 1
			                 }
			             }
			         }
			         vif 117 {
			             address 1.1.7.1/24
			             ip {
			                 ospf {
			                     cost 10
			                     dead-interval 40
			                     hello-interval 10
			                     priority 128
			                     retransmit-interval 5
			                     transmit-delay 1
			                 }
			             }
			             mtu 1496
			         }
			     }
			     loopback lo {
			         address 1.1.1.1/32
			     }
			 }
			 protocols {
			     ospf {
			         area 0.0.0.0 {
			             network 1.0.0.0/24
			             network 11.1.1.0/24
			             network 1.1.2.0/24
			             network 1.1.7.0/24
			         }
			         passive-interface lo
			     }
			 }
			

Impressions: I had played with one in a VM by its lonesome before, but this was my first time actually trying to get a Vyatta box to talk to another box. It feels like a weird mix of Quagga and XORP -- which are in turn IOS-like and Junos-like knockoffs. The config was like Junos, but with a strange meld of Cisco and Junos hierarchy. It has a Junos-like commit and rollback process. The operational commands were very IOS-like. I had a lot of problems getting Vyatta to work right. I routinely had to kick any non-broadcast type interfaces by removing and replacing the config, and I had a lot of problems with it not consistently remembering my commits from one reboot to the next. At first I liked this platform, but after working with it a bit I came to hate it almost as much as XORP. I would never use this on a real network.

Issues:

Where to get it: Go here if you want to experience the pain and frustration yourself -- www.vyatta.org

OpenOSPFd

Router 2 in Cluster 1

Description: This is an install of the stock ospfd (OpenOSPFd) daemon that is packaged with OpenBSD 5.2. The 64-bit version of OpenBSD 5.2 was installed on a KVM with 1 GB RAM and a 2GB virtual disk.

			# uname -a
			OpenBSD openospfd 5.2 GENERIC#309 amd64
			# md5 /usr/sbin/ospfd
			MD5 (/usr/sbin/ospfd) = dba2cdcb812566de71c7b43e804e8434
			

Config: /etc/ospfd.conf

			# $OpenBSD: ospfd.conf,v 1.4 2007/06/19 16:49:56 reyk Exp $
			
			# global configuration
			router-id 1.1.1.2
			metric 10
			router-priority 128
			
			# areas
			area 0.0.0.0 {
			        interface vlan1 { metric 1000 }
			        interface vlan22 
			        interface vlan112 { router-priority 250 }
			        interface vlan123 
			        interface lo1 { passive }
			}
			

Impressions: Back in the early 90's I used OpenBSD as my firewall device because I liked its packet filter a lot, and it seemed like a really clean, simple and pretty pure BSD. However, I was a bit turned off by the attitudes of the OpenBSD developers -- "we need to take everything and clean it up because everything else is a messy, poorly written security nightmare." Meanwhile, I had constant stability problems with OpenBSD on two different boxes. I tried FreeBSD in its place when the pf packet filter was ported to it and never looked back. FreeBSD was rock solid on the same hardware that OpenBSD had fits on. Anyway, I had the same impression of the developers of all of the OpenBSD-related projects -- especially OpenBGPd and OpenOSPFd -- so I was a bit skeptical going in. I installed OpenBSD again for the first time in about 10 years, and it hasn't changed a bit. In some ways this was really good, as the binary base is really tiny in comparison to most other modern OSes, and I didn't have to spend much time learning anything new. The config is a standard UNIX file which is parsed by OpenOSPFd at boot. It was gated/Junos-like and wasn't very difficult. However, there was a lot of stuff missing from the implementation -- like anything besides broadcast links. I had to change all of the interfaces that the OpenOSPFd implementation participated in to broadcast links. I also wasn't very impressed with the feedback I was given by the operational tools or by the ospfd daemon itself -- especially when it encountered an error. The implementation was also clearly not checking things it was supposed to be checking -- like agreeing on link type. Running this is a very UNIX daemon-like experience: it configs, runs and debugs just like a standard UNIX daemon. What really mystified me, though, was that it was nearly impossible to get a version number out of the daemon itself. For the version, I had to post an MD5 sum and the OS version!
Overall though, it wasn't too bad an experience, and the support in it really seemed more tailored to running as part of a firewall cluster than anything else.

Issues:

Where to get it: At www.openbsd.org

Olive 2GB

Router 3 in Cluster 1

Description: This is a KVM with 2GB RAM running Junos 10.0R4.7.

			juniper@Olive2GB> show version brief 
			Hostname: Olive2GB
			Model: olive
			JUNOS Base OS boot [10.0R4.7]
			JUNOS Base OS Software Suite [10.0R4.7]
			JUNOS Kernel Software Suite [10.0R4.7]
			JUNOS Crypto Software Suite [10.0R4.7]
			JUNOS Packet Forwarding Engine Support (M/T Common) [10.0R4.7]
			JUNOS Packet Forwarding Engine Support (M20/M40) [10.0R4.7]
			JUNOS Online Documentation [10.0R4.7]
			JUNOS Voice Services Container package [10.0R4.7]
			JUNOS Border Gateway Function package [10.0R4.7]
			JUNOS Services AACL Container package [10.0R4.7]
			JUNOS Services LL-PDF Container package [10.0R4.7]
			JUNOS Services Stateful Firewall [10.0R4.7]
			JUNOS AppId Services [10.0R4.7]
			JUNOS IDP Services [10.0R4.7]
			JUNOS Routing Software Suite [10.0R4.7]
			
			juniper@Olive2GB>show system boot-messages 
			Copyright (c) 1996-2010, Juniper Networks, Inc.
			All rights reserved.
			Copyright (c) 1992-2006 The FreeBSD Project.
			Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
			        The Regents of the University of California. All rights reserved.
			JUNOS 10.0R4.7 #0: 2010-08-22 03:07:19 UTC
			    builder@ormonth.juniper.net:/volume/build/junos/10.0/release/10.0R4.7/obj-i386/bsd/sys/compile/JUNIPER
			MPTable: 
			Timecounter "i8254" frequency 1193182 Hz quality 0
			CPU: QEMU Virtual CPU version 1.0.1 (3092.96-MHz 686-class CPU)
			  Origin = "GenuineIntel"  Id = 0x623  Stepping = 3
			  Features=0x78bfbfd
			  Features2=0x80802001,>
			  AMD Features=0x20100800
			  AMD Features2=0x1
			real memory  = 2147475456 (2047 MB)
			avail memory = 2093432832 (1996 MB)
			

Config: This box has some extra config, as it's the first box I picked to do the evil deed of exporting BGP routes into OSPF.

			interfaces {
			    fxp0 {
			        unit 0 {
			            family inet {
			                address 172.20.1.66/24;
			            }
			        }
			    }
			    fxp1 {
			        vlan-tagging;
			        unit 1 {
			            vlan-id 1;
			            family inet {
			                address 1.0.0.3/24;
			            }
			        }
			        unit 33 {
			            vlan-id 33;
			            family inet {
			                address 33.3.3.31/24;
			            }
			        }
			        unit 123 {
			            vlan-id 123;
			            family inet {
			                address 1.2.3.3/24;
			            }
			        }
			        unit 134 {
			            vlan-id 134;
			            family inet {
			                address 1.3.4.3/24;
			            }
			        }
			    }
			    lo0 {
			        unit 0 {
			            family inet {
			                address 1.1.1.3/32;
			            }
			        }
			    }
			}
			routing-options {
			    static {
			        route 172.20.10.0/24 {
			            next-hop 172.20.1.1;
			            no-readvertise;
			        }
			    }
			    router-id 1.1.1.3;
			    autonomous-system 65066;
			}
			protocols {
			    bgp {
			        group exaBGP {
			            type external;
			            multihop {
			                ttl 10;
			            }
			            local-address 172.20.1.66;
			            import FIX-NH;
			            peer-as 65069;
			            neighbor 172.20.10.117;
			        }
			    }
			    ospf {
			        traceoptions {
			            file ospf.log;
			            flag error;
			            flag state;
			        }
			        export EXPORT-BGP-TO-OSPF;
			        area 0.0.0.0 {
			            interface lo0.0 {
			                passive;
			            }
			            interface fxp1.1 {
			                metric 1000;
			            }
			            interface fxp1.123 {
			                metric 10;
			            }
			            interface fxp1.33 {
			                interface-type p2mp;
			                metric 100;
			                hello-interval 30;
			                dead-interval 120;
			                neighbor 33.3.3.32;
			            }
			            interface fxp1.134 {
			                metric 10;
			            }
			        }
			    }
			}
			policy-options {
			    policy-statement EXPORT-BGP-TO-OSPF {
			        term STUPID {
			            from protocol bgp;
			            then accept;
			        }
			    }
			    policy-statement FIX-NH {
			        then {
			            next-hop 172.20.1.1;
			        }
			    }
			}
			

Impressions: I love Junos! This is definitely the cleanest, easiest, and safest config method of the bunch. There are a few imitators such as XORP and Vyatta, but they don't come close to the refinement of Junos.

Issues:

Where to get it: At www.juniper.net

BIRD

Router 4 in Cluster 1

Description: This is an install of the BIRD Internet Routing Daemon on a KVM running Ubuntu 12.04 x86_64 with 1 GB of RAM. BIRD was installed from the Ubuntu repositories with the command apt-get install bird.

			router@bird:~$ sudo birdc
			BIRD 1.3.4 ready.
			bird> 
			

Interface Config: /etc/network/interfaces All of the IP configuration was left up to the OS.

			# The loopback network interface
			auto lo
			iface lo inet loopback
			
			auto lo:0
			iface lo:0 inet static
			   address 1.1.1.3
			   netmask 255.255.255.255
			
			# The primary network interface
			auto eth0
			iface eth0 inet manual
			
			auto eth0.0001
			iface eth0.0001 inet static
			   address 1.0.0.4
			   netmask 255.255.255.0
			   mtu 1496
			   vlan_raw_device eth0
			
			auto eth0.0044
			iface eth0.0044 inet static
			   address 44.4.4.41
			   netmask 255.255.255.0
			   vlan_raw_device eth0
			
			auto eth0.0134
			iface eth0.0134 inet static
			   address 1.3.4.4
			   netmask 255.255.255.0
			   mtu 1496
			   vlan_raw_device eth0
			
			auto eth0.0145
			iface eth0.0145 inet static
			   address 1.4.5.4
			   netmask 255.255.255.0
			   vlan_raw_device eth0
			

BIRD Config: /etc/bird.conf All of the IP configuration was left up to the OS.

			# Override router ID
			router id 1.1.1.4;
			
			protocol kernel {
			#	learn;			# Learn all alien routes from the kernel
				persist;		# Don't remove routes on bird shutdown
				scan time 20;		# Scan kernel routing table every 20 seconds
				export all;		# Default is export none
			}
			
			protocol device {
				scan time 10;		# Scan interfaces every 10 seconds
			}
			
			protocol ospf OSPFol {
			       tick 2;
				rfc1583compat no;
				area 0.0.0.0 {
					stub no;
					interface "eth0.0001" {
						cost 1000;
						priority 128;
					};
					interface "eth0.0044" {
						cost 100;
						priority 128;
					};
					interface "eth0.0134" {
						cost 10;
					};
					interface "eth0.0145" {
						cost 10;
						type ptp;
					};
					interface "lo:0" {
					};
				};
			}
			
			

Impressions: This was actually my first time using BIRD. Like OpenOSPFd, it is very UNIX daemon-like in its configuration and operation. However, it has a pretty nice interactive CLI for querying the status of the daemon that seemed to work pretty well and be very intuitive. Overall, BIRD was very easy to configure and debug, and supported everything I was trying to do (simplistic OSPF) very easily. They have a good sense of humor too. In the style of GNU, BIRD stands for the BIRD Internet Routing Daemon.

Issues:

Where to get it: At BIRD Internet Routing Daemon

Quagga

Router 5 in Cluster 1

Description: This is an install of the stock Quagga set of daemons that are packaged with Ubuntu 12.04, installed with apt-get install quagga. This is running on the same base install as the VM running BIRD -- also on a KVM with 1 GB RAM.

			quagga-router# sh ver
			Quagga 0.99.20.1 (quagga-router).
			Copyright 1996-2005 Kunihiro Ishiguro, et al.
			quagga-router# 
			

Config: /etc/quagga/Quagga.conf

			interface eth0
			 ipv6 nd suppress-ra
			!
			interface eth0.0001
			 ip address 1.0.0.5/24
			 ip ospf cost 1000
			 ip ospf priority 128
			 ipv6 nd suppress-ra
			!
			interface eth0.0055
			 ip ospf cost 100
			 ip ospf priority 128
			 ipv6 nd suppress-ra
			!
			interface eth0.0145
			 ip address 1.4.5.5/24
			 ip ospf cost 10
			 ip ospf network point-to-point
			 ip ospf priority 128
			 ipv6 nd suppress-ra
			!
			interface eth0.0156
			 ip address 1.5.6.5/24
			 ip ospf cost 10
			 ip ospf priority 128
			 ipv6 nd suppress-ra
			!
			interface lo
			 description "Loopback Interface"
			!
			interface lo:0
			 ip address 1.1.1.5/32
			 ipv6 nd suppress-ra
			!
			router ospf
			 ospf router-id 1.1.1.5
			 passive-interface lo:0
			 network 1.0.0.0/24 area 0.0.0.0
			 network 1.1.1.5/32 area 0.0.0.0
			 network 1.4.5.0/24 area 0.0.0.0
			 network 1.5.6.0/24 area 0.0.0.0
			 network 55.5.5.0/24 area 0.0.0.0
			!
			

Impressions: I have a soft spot in my heart for Quagga as it's what I used to learn Cisco IOS and BGP (before I could afford a pair of used 3620 routers off eBay). I've run it on a ton of servers and VMs, including NetBSD, FreeBSD and all sorts of Linux distros. The config is very Cisco IOS-like, with a bit of sanity thrown in as far as network masks. However, I've also had a lot of problems with it. Quagga is a set of daemons -- a control daemon, and then one for each protocol you run -- and each has its own config and its own interface, which is very fractured. It's possible to set Quagga up to use an integrated config, which is what I always do, but it always turns out to be a battle to get it working right and it's almost never smooth. I've also had some problems in my years of running it -- from OSPF stability issues, to memory leaks, to crashes, to weird multicasting problems. On later versions of FreeBSD I seemed to always have issues getting it to use any multicasting -- so things like OSPF never worked right. Quagga also always seems to like to get into a battle with the underlying OS over interface configurations. Even with my complaints, I still run Quagga on many a Linux box.

Issues:

Where to get it: At http://www.nongnu.org/quagga/

XORP

Router 6 in Cluster 1

Description: This is an install of the stock XORP suite available from the Ubuntu 12.04 repositories, installed with apt-get install xorp. This is the same base KVM and OS install that was used for BIRD and Quagga: 64-bit Ubuntu 12.04 on a 2GB VM disk with 1 GB RAM.

		router@xorp:~$ xorpsh
		Welcome to XORP on xorp
		router@xorp> show version 
		Version 1.8.3
		router@xorp> 
		

Config: /etc/xorp/config.boot

		/* XORP configuration file
		 *
		 * Configuration format: 1.1
		 * XORP version: 1.8.3
		 * Date: 2013/01/05 23:09:19.188435
		 * Host: xorp
		 * User: router
		 */
		
		protocols {
		    ospf4 {
		        router-id: 1.1.1.6
		        rfc1583-compatibility: false
		        ip-router-alert: false
		        area 0.0.0.0 {
		            area-type: "normal"
		            interface lo {
		                link-type: "broadcast"
		                vif lo {
		                    address 1.1.1.6 {
		                        priority: 128
		                        hello-interval: 10
		                        router-dead-interval: 40
		                        interface-cost: 1
		                        retransmit-interval: 5
		                        transit-delay: 1
		                        passive {
		                            disable: false
		                            host: false
		                        }
		                        disable: false
		                    }
		                }
		            }
		            interface eth0 {
		                link-type: "broadcast"
		                vif eth0 {
		                    address 1.0.0.6 {
		                        priority: 128
		                        hello-interval: 10
		                        router-dead-interval: 40
		                        interface-cost: 1000
		                        retransmit-interval: 5
		                        transit-delay: 1
		                        disable: false
		                    }
		                }
		            }
		            interface eth1 {
		                link-type: "broadcast"
		                vif eth1 {
		                    address 66.6.6.61 {
		                        priority: 128
		                        hello-interval: 10
		                        router-dead-interval: 40
		                        interface-cost: 100
		                        retransmit-interval: 5
		                        transit-delay: 1
		                        disable: false
		                    }
		                }
		            }
		            interface eth2 {
		                vif eth2 {
		                    address 1.5.6.6 {
		                        priority: 128
		                        hello-interval: 10
		                        router-dead-interval: 40
		                        interface-cost: 10
		                        retransmit-interval: 5
		                        transit-delay: 1
		                        disable: false
		                    }
		                }
		            }
		            interface eth3 {
		                link-type: "broadcast"
		                vif eth3 {
		                    address 1.6.7.6 {
		                        priority: 128
		                        hello-interval: 10
		                        router-dead-interval: 40
		                        interface-cost: 10
		                        retransmit-interval: 5
		                        transit-delay: 1
		                        disable: false
		                    }
		                }
		            }
		        }
		    }
		}
		fea {
		    unicast-forwarding4 {
		        disable: false
		        forwarding-entries {
		            retain-on-startup: false
		            retain-on-shutdown: false
		        }
		    }
		}
		interfaces {
		    restore-original-config-on-shutdown: false
		    interface eth0 {
		        description: ""
		        disable: false
		        discard: false
		        unreachable: false
		        management: false
		        parent-ifname: ""
		        iface-type: ""
		        vid: ""
		        default-system-config {
		        }
		    }
		    interface eth1 {
		        description: ""
		        disable: false
		        discard: false
		        unreachable: false
		        management: false
		        parent-ifname: ""
		        iface-type: ""
		        vid: ""
		        default-system-config {
		        }
		    }
		    interface eth2 {
		        description: ""
		        disable: false
		        discard: false
		        unreachable: false
		        management: false
		        parent-ifname: ""
		        iface-type: ""
		        vid: ""
		        default-system-config {
		        }
		    }
		    interface eth3 {
		        description: ""
		        disable: false
		        discard: false
		        unreachable: false
		        management: false
		        parent-ifname: ""
		        iface-type: ""
		        vid: ""
		        default-system-config {
		        }
		    }
		    interface lo {
		        description: ""
		        disable: false
		        discard: false
		        unreachable: false
		        management: false
		        parent-ifname: ""
		        iface-type: ""
		        vid: ""
		        vif lo {
		            disable: false
		            address 1.1.1.6 {
		                prefix-length: 32
		                disable: false
		            }
		        }
		    }
		}
		

Impressions: I actually used XORP on a couple of FreeBSD boxes some time back as an alternative to Quagga. Quagga was having some serious issues announcing OSPF hellos to 224.0.0.5 on a couple of FreeBSD boxes I had -- so I searched for something else. Somehow I wound up compiling XORP from source by hand, as none of the packaged versions seemed to work right. After some fiddling it filled the gap that Quagga had left in my FreeBSD world. Everything in XORP is done through a pretty poor knockoff of the Junos CLI. XORP and the OS had some real battles over who got to set up VLAN interfaces on Linux. In the end I was so frustrated that on the XORP box I set up three additional interfaces on the VM and let OpenvSwitch add the VLAN tags to the traffic coming off the interfaces. XORP also couldn't form any point-to-point adjacencies with any other router, so I had to change all of these to broadcast links. XORP also has the same annoying mannerisms as Vyatta when it comes to saving the config. You have to commit it, then save it to a file by hand, and then copy it to be the boot config. There are probably easier ways to do this, but I found it a real annoyance. I kept forgetting to do the manual save-file routine after I got things working. Junos has spoiled me. In the end, I actually hated XORP more than Vyatta. In retrospect, when I was tinkering with this back on my FreeBSD boxen, I have no idea why I didn't try BIRD instead.

Issues:

Where to get it: At XORP if you dare.

Olive 1GB

Router 7 in Cluster 1

Description: This is a clone of the 2GB Olive, except with half the RAM: an Olive running Junos 10.0R4.7.

			juniper@Olive1GB> show version 
			Hostname: Olive1GB
			Model: olive
			JUNOS Base OS boot [10.0R4.7]
			JUNOS Base OS Software Suite [10.0R4.7]
			JUNOS Kernel Software Suite [10.0R4.7]
			JUNOS Crypto Software Suite [10.0R4.7]
			JUNOS Packet Forwarding Engine Support (M/T Common) [10.0R4.7]
			JUNOS Packet Forwarding Engine Support (M20/M40) [10.0R4.7]
			JUNOS Online Documentation [10.0R4.7]
			JUNOS Voice Services Container package [10.0R4.7]
			JUNOS Border Gateway Function package [10.0R4.7]
			JUNOS Services AACL Container package [10.0R4.7]
			JUNOS Services LL-PDF Container package [10.0R4.7]
			JUNOS Services Stateful Firewall [10.0R4.7]
			JUNOS AppId Services [10.0R4.7]
			JUNOS IDP Services [10.0R4.7]
			JUNOS Routing Software Suite [10.0R4.7]
			
			juniper@Olive1GB> show system boot-messages 
			Copyright (c) 1996-2010, Juniper Networks, Inc.
			All rights reserved.
			Copyright (c) 1992-2006 The FreeBSD Project.
			Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
			        The Regents of the University of California. All rights reserved.
			JUNOS 10.0R4.7 #0: 2010-08-22 03:07:19 UTC
			    builder@ormonth.juniper.net:/volume/build/junos/10.0/release/10.0R4.7/obj-i386/bsd/sys/compile/JUNIPER
			MPTable: 
			Timecounter "i8254" frequency 1193182 Hz quality 0
			CPU: QEMU Virtual CPU version 1.0.1 (3092.96-MHz 686-class CPU)
			  Origin = "GenuineIntel"  Id = 0x623  Stepping = 3
			  Features=0x78bfbfd
			  Features2=0x80802001,>
			  AMD Features=0x20100800
			  AMD Features2=0x1
			real memory  = 1073733632 (1023 MB)
			avail memory = 1038938112 (990 MB)
			
			

Config:

			interfaces {
			    fxp1 {
			        vlan-tagging;
			        unit 1 {
			            vlan-id 1;
			            family inet {
			                address 1.0.0.7/24;
			            }
			        }
			        unit 77 {
			            vlan-id 77;
			            family inet {
			                address 77.7.7.71/24;
			            }
			        }
			        unit 117 {
			            vlan-id 117;
			            family inet {
			                address 1.1.7.7/24;
			            }
			        }
			        unit 167 {
			            vlan-id 167;
			            family inet {
			                address 1.6.7.7/24;
			            }
			        }
			    }
			    lo0 {
			        unit 0 {
			            family inet {
			                address 1.1.1.7/32;
			            }
			        }
			    }
			}
			routing-options {
			    router-id 1.1.1.7;
			}
			protocols {
			    ospf {
			        traceoptions {
			            file ospf.log;
			            flag error;
			        }
			        area 0.0.0.0 {
			            interface lo0.0 {
			                passive;
			            }
			            interface fxp1.1 {
			                metric 1000;
			            }
			            interface fxp1.77 {
			                interface-type nbma;
			                metric 100;
			                neighbor 77.7.7.72;
			                neighbor 77.7.7.73;
			            }
			            interface fxp1.117 {
			                metric 10;
			            }
			            interface fxp1.167 {
			                metric 10;
			            }
			        }
			    }
			}
			

Impressions: I love Junos -- especially after dealing with XORP and Vyatta!

Issues:

Where to get it: At Juniper Networks

Juniper Networks EX3200

Router 1 in Cluster 2

Description: This is a Juniper Networks EX3200-24T switch. It's running Junos 12.2R2.4 and has 512MB of RAM.

			juniper@EX3200-2_OSPF> show version 
			Hostname: EX3200-2_OSPF
			Model: ex3200-24t
			JUNOS Base OS boot [12.2R2.4]
			JUNOS Base OS Software Suite [12.2R2.4]
			JUNOS Kernel Software Suite [12.2R2.4]
			JUNOS Crypto Software Suite [12.2R2.4]
			JUNOS Online Documentation [12.2R2.4]
			JUNOS Enterprise Software Suite [12.2R2.4]
			JUNOS Packet Forwarding Engine Enterprise Software Suite [12.2R2.4]
			JUNOS Routing Software Suite [12.2R2.4]
			JUNOS Web Management [12.2R2.4]
			JUNOS FIPS mode utilities [12.2R2.4]
			
			juniper@EX3200-2_OSPF> show chassis hardware 
			Hardware inventory:
			Item             Version  Part number  Serial number     Description
			Chassis                                XX0000000000      EX3200-24T
			Routing Engine 0 REV 11   750-021261   XX0000000000      EX3200-24T, 8 POE
			FPC 0            REV 11   750-021261   XX0000000000      EX3200-24T, 8 POE
			  CPU                     BUILTIN      BUILTIN           FPC CPU
			  PIC 0                   BUILTIN      BUILTIN           24x 10/100/1000 Base-T
			Power Supply 0   REV 03   740-020957   XX0000000000      PS 320W AC
			Fan Tray                                                 Fan Tray
			

Config: The EX3200 does not allow for specification of neighbors on a point-to-multipoint interface.

			interfaces {
			    ge-0/0/1 {
			        vlan-tagging;
			        unit 2 {
			            vlan-id 2;
			            family inet {
			                address 2.0.0.1/24;
			            }
			        }
			        unit 11 {
			            vlan-id 11;
			            family inet {
			                address 11.1.1.21/24;
			            }
			        }
			        unit 212 {
			            vlan-id 212;
			            family inet {
			                address 2.1.2.1/24;
			            }
			        }
			        unit 217 {
			            vlan-id 217;
			            family inet {
			                address 2.1.7.1/24;
			            }
			        }
			    }
			    lo0 {
			        unit 0 {
			            family inet {
			                address 2.2.2.1/32;
			            }
			        }
			    }
			}
			routing-options {
			    router-id 2.2.2.1;
			}
			protocols {
			    ospf {
			        area 0.0.0.0 {
			            interface ge-0/0/1.2 {
			                metric 1000;
			            }
			            interface ge-0/0/1.212 {
			                interface-type p2p;
			                metric 10;
			            }
			            interface ge-0/0/1.217 {
			                interface-type p2p;
			                metric 10;
			            }
			            interface ge-0/0/1.11 {
			                metric 100;
			            }
			            interface lo0.0 {
			                passive;
			            }
			        }
			    }
			}
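The addressing in these configs appears to follow a simple convention, which can be sketched in a few lines of Python. This is only an inference from the configs shown in this post, not a documented plan: the router ID looks like C.C.C.R, the cluster LAN (VLAN C) is C.0.0.R/24, and each point-to-point link between ring neighbors A and B uses VLAN "CAB" with subnet C.A.B.0/24. The ring size of 7 and the helper names (ring_neighbors, expected_plan) are assumptions made for illustration. The inter-cluster links are left out because their addressing is not as regular.

```python
import ipaddress

ROUTERS_PER_CLUSTER = 7  # assumption: inferred from VLANs such as 217 and 267


def ring_neighbors(router, size=ROUTERS_PER_CLUSTER):
    """Return the previous and next router numbers on a 1-based ring."""
    prev_router = (router - 2) % size + 1
    next_router = router % size + 1
    return prev_router, next_router


def expected_plan(cluster, router):
    """Build the addresses the inferred convention predicts for one router."""
    plan = {
        "router-id": f"{cluster}.{cluster}.{cluster}.{router}",
        f"vlan {cluster}": f"{cluster}.0.0.{router}/24",
    }
    for peer in ring_neighbors(router):
        low, high = sorted((router, peer))
        # VLAN id and subnet both encode cluster, lower router, higher router.
        plan[f"vlan {cluster}{low}{high}"] = f"{cluster}.{low}.{high}.{router}/24"
    # Sanity check: every generated string must parse as an IPv4 interface.
    for value in plan.values():
        ipaddress.ip_interface(value)
    return plan


if __name__ == "__main__":
    for name, address in expected_plan(2, 1).items():
        print(f"{name:>10}: {address}")
```

Comparing the output for cluster 2, router 1 against the EX3200 config above is a quick way to spot a mistyped address before the adjacencies refuse to come up.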
			

Impressions: This switch is set up as a full-blown OSPF router, so there is no need to do everything through routed VLAN interfaces (RVI, SVI, bridge interface, etc.). It is a very fast switch, but the fans make it very noisy.

Issues:

Where to get it: At Juniper Networks EX3200

SRX100H in Packet Mode

Router 2 in Cluster 2

Description: This is a Juniper Networks SRX100H with 1 GB of RAM running Junos 11.1R6.4, configured to run in packet mode rather than flow mode.

juniper@SRX100-6_OSPF> show version 
Hostname: SRX100-6_OSPF
Model: srx100h
JUNOS Software Release [12.1R4.7]

juniper@SRX100-6_OSPF> show chassis hardware 
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                XX0000000000      SRX100H
Routing Engine   REV 19   750-021773   XX0000000000      RE-SRX100H
FPC 0                                                    FPC
  PIC 0                                                  8x FE Base PIC
Power Supply 0  

juniper@SRX100-6_OSPF> show system boot-messages | match memory 
real memory  = 1073741824 (1024MB)
avail memory = 526499840 (502MB)
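Since the point of the exercise is to see how each box copes with a stuffed LSDB, it helps to put the memory figures into a comparable form. The following is a rough sketch that assumes the boot-message format shown above; memory_summary is a hypothetical helper name, not part of any Junos tooling.

```python
import re

# Matches lines such as "real memory  = 1073741824 (1024MB)".
MEMORY_LINE = re.compile(r"^(real|avail) memory\s*=\s*(\d+)")

SAMPLE = """\
real memory  = 1073741824 (1024MB)
avail memory = 526499840 (502MB)
"""


def memory_summary(boot_messages):
    """Return byte counts for 'real' and 'avail' memory plus their ratio."""
    found = {}
    for line in boot_messages.splitlines():
        match = MEMORY_LINE.match(line.strip())
        if match:
            found[match.group(1)] = int(match.group(2))
    found["avail_fraction"] = found["avail"] / found["real"]
    return found


if __name__ == "__main__":
    summary = memory_summary(SAMPLE)
    print(f"available: {summary['avail_fraction']:.1%} of real memory")
```

On this box a bit less than half of the physical memory is reported as available at boot, which is the ceiling everything else, including the routing process, has to fit under.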

Config: The command "set security forwarding-options family mpls mode packet-based" makes the SRX forward packets individually, rather than based on stateful flows, after a reboot. This can be seen with the operational command below:

juniper@SRX100-6_OSPF> show security flow status 
  Flow forwarding mode:
    Inet forwarding mode: packet based
    Inet6 forwarding mode: drop
    MPLS forwarding mode: packet based
    ISO forwarding mode: drop
  Flow trace status
    Flow tracing status: off
And the config:
interfaces {
    fe-0/0/0 {
        vlan-tagging;
        unit 2 {
            vlan-id 2;
            family inet {
                address 2.0.0.2/24;
            }
        }
        unit 22 {
            vlan-id 22;
            family inet {
                address 22.2.2.22/24;
            }
        }
        unit 212 {
            vlan-id 212;
            family inet {
                address 2.1.2.2/24;
            }
        }
        unit 223 {
            vlan-id 223;
            family inet {
                address 2.2.3.2/24;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 2.2.2.2/32;
            }
        }
    }
}
routing-options {
    router-id 2.2.2.2;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0 {
                passive;
            }
            interface fe-0/0/0.2 {
                metric 1000;
            }
            interface fe-0/0/0.212 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/0.223 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/0.22 {
                metric 100;
            }
        }
    }
}
security {
    forwarding-options {
        family {
            mpls {
                mode packet-based;
            }
        }
    }
}
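With a mix of packet-mode and flow-mode SRXs in the lab, checking the forwarding mode by eye gets old quickly. Here is a minimal sketch that parses the text of "show security flow status" as it appears above. It assumes that output format stays the same across releases, and the function names (forwarding_modes, is_packet_mode) are made up for this example.

```python
import re

SAMPLE = """\
  Flow forwarding mode:
    Inet forwarding mode: packet based
    Inet6 forwarding mode: drop
    MPLS forwarding mode: packet based
    ISO forwarding mode: drop
  Flow trace status
    Flow tracing status: off
"""

# Captures the family name and the mode from "<Family> forwarding mode: <mode>".
MODE_LINE = re.compile(r"^\s*(\S+) forwarding mode:\s*(.+?)\s*$")


def forwarding_modes(output):
    """Map each address family (lower-cased) to its forwarding mode string."""
    modes = {}
    for line in output.splitlines():
        match = MODE_LINE.match(line)
        # Skip the "Flow forwarding mode:" heading, which carries no value.
        if match and match.group(1) != "Flow":
            modes[match.group(1).lower()] = match.group(2)
    return modes


def is_packet_mode(output):
    """True when IPv4 (inet) traffic is forwarded packet by packet."""
    return forwarding_modes(output).get("inet") == "packet based"


if __name__ == "__main__":
    print(forwarding_modes(SAMPLE))
    print("packet mode:", is_packet_mode(SAMPLE))
```

Feeding it the output captured from each SRX gives a quick yes/no on whether the box is really in packet mode after the reboot.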

Impressions: The SRX100 is an awesome little platform. Quick, cheap and it does everything imaginable. These are great platforms for learning and testing! I bought a pair of these when they first came out to play with the High Availability features. Yes, these little boxes can form a nice little HA pair.

Issues:

Where to get it: At Juniper Networks SRX100

c3640

Router 3 in Cluster 2

Description: This is a Cisco 3640 with 128 MB of RAM running IOS 12.4(25b) Telco train.

C3640-1>sh ver
Cisco IOS Software, 3600 Software (C3640-TELCO-M), Version 12.4(25b), RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2009 by Cisco Systems, Inc.
Compiled Wed 12-Aug-09 12:52 by prod_rel_team

ROM: System Bootstrap, Version 11.1(20)AA2, EARLY DEPLOYMENT RELEASE SOFTWARE (fc1)

C3640-1 uptime is 1 hour, 10 minutes
System returned to ROM by power-on
System restarted at 17:32:39 UTC Fri Jan 4 2013
System image file is "flash:c3640-telco-mz.124-25b.bin"

Cisco 3640 (R4700) processor (revision 0x00) with 92160K/38912K bytes of memory.
Processor board ID 21961615
R4700 CPU at 100MHz, Implementation 33, Rev 1.0
2 FastEthernet interfaces
2 Token Ring interfaces
3 Serial interfaces
DRAM configuration is 64 bits wide with parity disabled.
125K bytes of NVRAM.
32768K bytes of processor board System flash (Read/Write)
          
Configuration register is 0x2102

Config:

interface Loopback0
 ip address 2.2.2.3 255.255.255.255
!
interface FastEthernet0/0
 no ip address
 duplex auto
 speed auto
!
interface FastEthernet0/0.2
 encapsulation dot1Q 2
 ip address 2.0.0.3 255.255.255.0
 ip ospf cost 10
 ip ospf priority 128
!
interface FastEthernet0/0.33
 encapsulation dot1Q 33
 ip address 33.3.3.32 255.255.255.0
 ip mtu 1496
 ip ospf network point-to-multipoint
 ip ospf cost 100
 ip ospf priority 255
 ip ospf mtu-ignore
!
interface FastEthernet0/0.223
 encapsulation dot1Q 223
 ip address 2.2.3.3 255.255.255.0
 ip ospf network point-to-point
 ip ospf cost 10
 ip ospf priority 128
!
interface FastEthernet0/0.234
 encapsulation dot1Q 234
 ip address 2.3.4.3 255.255.255.0
 ip ospf network point-to-point
 ip ospf priority 128
!
router ospf 1
 router-id 2.2.2.3
 log-adjacency-changes
 passive-interface Loopback0
 network 2.0.0.0 0.0.0.255 area 0
 network 2.2.2.3 0.0.0.0 area 0
 network 2.2.3.0 0.0.0.255 area 0
 network 2.3.4.0 0.0.0.255 area 0
 network 33.3.3.0 0.0.0.255 area 0
 neighbor 33.3.3.33 cost 100
 neighbor 33.3.3.31 cost 100
!
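The network statements under "router ospf 1" use wildcard masks, where a 1 bit means "don't care" and a 0 bit means "must match". A small sketch of that matching logic, using only Python's standard ipaddress module, is shown here. The function name network_statement_matches is hypothetical, and this models only the address comparison, not anything else IOS does when enabling OSPF on an interface.

```python
import ipaddress


def network_statement_matches(address, base, wildcard):
    """Return True if `address` is covered by `network <base> <wildcard>`.

    Bits set in the wildcard are "don't care"; all other bits must match.
    """
    addr = int(ipaddress.IPv4Address(address))
    net = int(ipaddress.IPv4Address(base))
    # Invert the wildcard to get the bits that must be compared.
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (addr & care) == (net & care)


if __name__ == "__main__":
    statements = [
        ("2.0.0.0", "0.0.0.255"),
        ("2.2.2.3", "0.0.0.0"),
        ("33.3.3.0", "0.0.0.255"),
    ]
    for interface_ip in ("2.0.0.3", "2.2.2.3", "33.3.3.32"):
        hits = [s for s in statements if network_statement_matches(interface_ip, *s)]
        print(interface_ip, "->", hits)
```

A wildcard of 0.0.0.0, as used for the loopback, matches exactly one address, while 0.0.0.255 covers the whole /24.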

Impressions: I collected five of these in total from eBay over the years, and these are what I would consider to be my first "real" routers. I had a couple of Cisco 3620s as well, but those were too limited to do anything fun with. These are pretty versatile little boxes, if a tad slow, and they support a ton of different interfaces. I used these to learn IS-IS and BGP, voice, ATM, as well as MPLS on Cisco. Until I came across a couple of 2811s for cheap, one of these functioned as my terminal server with an ASYNC-32A card.

Issues:

Where to get it: Since these have been EOS and EOL for a long time now, your only option is eBay.

Juniper Networks SRX210HE

Router 4 in Cluster 2

Description: This is a Juniper Networks SRX210HE running Junos 10.4R6.5 with 1 GB of RAM that supports Power over Ethernet (PoE).

juniper@SRX210HE_OSPF> show version 
Hostname: SRX210HE_OSPF
Model: srx210he-poe
JUNOS Software Release [12.1R4.7]

juniper@SRX210HE_OSPF> show chassis hardware 
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                XX0000000000      SRX210he-poe
Routing Engine   REV 05   750-034596   XX000000          RE-SRX210HE-POE
FPC 0                                                    FPC
  PIC 0                                                  2x GE, 6x FE, 1x 3G
Power Supply 0  

juniper@SRX210HE_OSPF> show system boot-messages | match memory 
real memory  = 1073741824 (1024MB)
avail memory = 527036416 (502MB)

Config:

interfaces {
    ge-0/0/0 {
        vlan-tagging;
        unit 2 {
            vlan-id 2;
            family inet {
                address 2.0.0.4/24;
            }
        }
        unit 44 {
            vlan-id 44;
            family inet {
                address 44.4.4.42/24;
            }
        }
        unit 234 {
            vlan-id 234;
            family inet {
                address 2.3.4.4/24;
            }
        }
        unit 245 {
            vlan-id 245;
            family inet {
                address 2.4.5.4/24;
            }
        }
    }
}
routing-options {
    router-id 2.2.2.4;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface ge-0/0/0.2 {
                metric 1000;
            }
            interface ge-0/0/0.234 {
                interface-type p2p;
                metric 10;
            }
            interface ge-0/0/0.245 {
                interface-type p2p;
                metric 10;
            }
            interface ge-0/0/0.44 {
                metric 100;
            }
        }
    }
}
security {
    zones {
        security-zone OSPF {
            host-inbound-traffic {
                system-services {
                    ping;
                }
                protocols {
                    ospf;
                }
            }
            interfaces {
                ge-0/0/0.2;
                lo0.0;
                ge-0/0/0.234;
                ge-0/0/0.245;
                ge-0/0/0.44;
            }
        }
    }
}
This one is functioning in flow (stateful) mode:
juniper@SRX210HE_OSPF> show security flow status 
  Flow forwarding mode:
    Inet forwarding mode: flow based
    Inet6 forwarding mode: drop
    MPLS forwarding mode: drop
    ISO forwarding mode: drop
  Flow trace status
    Flow tracing status: off

Impressions: This is an updated version of the SRX210H. It is the same thing, but with a slightly faster processor which is, ironically, the same one found in the SRX100 series. I bought a pair of SRX210Hs right when they came out several years ago, and one has been functioning as my main firewall/router at home ever since. Although there have been some growing pains, as with any new product, this has matured into a very nice and very capable little box and I am very happy with it. I used my original SRX210s to learn flow mode Junos.

Issues:

Where to get it: At Juniper Networks

Juniper Networks Netscreen 208

Router 5 in Cluster 2

Description: This is a Netscreen NS-208 running ScreenOS 5.4.0r18.0 with 128 MB of RAM.

ns208-> get system
Product Name: NetScreen-208
Serial Number: 0000000000000000, Control Number: 00000000
Hardware Version: 0110(0)-(12), FPGA checksum: 00000000, VLAN1 IP (0.0.0.0)
Software Version: 5.4.0r18.0, Type: Firewall+VPN
Compiled by build_master at: Tue Aug 17 08:51:27 PDT 2010
Base Mac: 0012.1ea3.8af0
File Name: screenos_image, Checksum: 6c2f30ed

Config:

set vrouter name "OSPF" id 1025 
set vrouter "OSPF"
unset auto-route-export
set protocol ospf
set enable
exit
set zone id 1002 "OSPF"
set zone "OSPF" vrouter "OSPF"
set zone "OSPF" tcp-rst 
set interface "ethernet2.2" tag 2 zone "OSPF"
set interface "ethernet2.4" tag 245 zone "OSPF"
set interface "ethernet2.5" tag 256 zone "OSPF"
set interface "ethernet2.7" tag 55 zone "OSPF"
set interface "loopback.2" zone "OSPF"
set interface ethernet2.2 ip 2.0.0.5/24
set interface ethernet2.2 route
set interface ethernet2.4 ip 2.4.5.5/24
set interface ethernet2.4 route
set interface ethernet2.5 ip 2.5.6.5/24
set interface ethernet2.5 route
set interface ethernet2.7 ip 55.5.5.52/24
set interface ethernet2.7 route
set interface loopback.2 ip 2.2.2.5/32
set interface loopback.2 route
set interface ethernet2.2 ip manageable
set interface ethernet2.4 ip manageable
set interface ethernet2.5 ip manageable
set interface ethernet2.7 ip manageable
set interface loopback.2 ip manageable
set interface ethernet2.2 manage ping
set interface ethernet2.4 manage ping
set interface ethernet2.5 manage ping
set interface ethernet2.7 manage ping
set interface ethernet2.7 manage ssh
set interface ethernet2.7 manage telnet
set interface ethernet2.7 manage snmp
set interface ethernet2.7 manage ssl
set interface ethernet2.7 manage web
set interface ethernet2.7 manage mtrace
set interface loopback.2 manage ping
set vrouter "OSPF"
set router-id 2.2.2.5
exit
set interface ethernet2.2 protocol ospf area 0.0.0.0
set interface ethernet2.2 protocol ospf enable
set interface ethernet2.2 protocol ospf priority 128
set interface ethernet2.2 protocol ospf cost 1000
set interface ethernet2.4 protocol ospf area 0.0.0.0
set interface ethernet2.4 protocol ospf link-type p2p
set interface ethernet2.4 protocol ospf enable
set interface ethernet2.4 protocol ospf priority 128
set interface ethernet2.4 protocol ospf cost 10
set interface ethernet2.5 protocol ospf area 0.0.0.0
set interface ethernet2.5 protocol ospf link-type p2p
set interface ethernet2.5 protocol ospf enable
set interface ethernet2.5 protocol ospf priority 128
set interface ethernet2.5 protocol ospf cost 10
set interface loopback.2 protocol ospf area 0.0.0.0
set interface loopback.2 protocol ospf passive
set interface ethernet2.7 protocol ospf area 0.0.0.0
set interface ethernet2.7 protocol ospf enable
set interface ethernet2.7 protocol ospf priority 128
set interface ethernet2.7 protocol ospf cost 100
set vrouter "OSPF"
exit

Impressions: I don't have to touch Netscreens too often, and for that I'm glad. I'm not a big fan of ScreenOS, but they are very reliable little boxes and a lot of people still use them. Additionally, flow mode Junos inherited a lot of the concepts and security modes of operation from the Netscreens. I find ScreenOS a really clunky take on Cisco's cli, except you use set, get and unset instead of show and no. I don't like working with it. It's not really intuitive, and for some reason it has the backspace key mapped to ^H, so you have to remember to set up your terminal ahead of time (or change your profile) before connecting to one of these. To really do anything with these, I think you need to use NSM.

Issues:

Where to get it: These have been EOS and EOL for quite a while, so your only hope is eBay.

Cisco 2811

Router 6 in Cluster 2

Description: This is a Cisco 2811 running the Advanced Enterprise train of IOS 15.1 with 256 MB of RAM.

C2811-1>sh ver
Cisco IOS Software, 2800 Software (C2800NM-ADVENTERPRISEK9-M), Version 15.1(1)XB, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2009 by Cisco Systems, Inc.
Compiled Mon 21-Dec-09 01:14 by prod_rel_team

ROM: System Bootstrap, Version 12.4(1r) [hqluong 1r], RELEASE SOFTWARE (fc1)

C2811-1 uptime is 6 minutes
System returned to ROM by power-on
System image file is "flash:c2800nm-adventerprisek9-mz.151-1.XB.bin"

Cisco 2811 (revision 53.51) with 247808K/14336K bytes of memory.
Processor board ID FTX1045A5UA
1 DSL controller
18 FastEthernet interfaces
2 Serial(sync/async) interfaces
1 ATM interface
1 Virtual Private Network (VPN) Module
DRAM configuration is 64 bits wide with parity enabled.
239K bytes of non-volatile configuration memory.
125440K bytes of ATA CompactFlash (Read/Write)

Config:

!
interface Loopback0
 ip address 2.2.2.6 255.255.255.255
!
interface FastEthernet0/0
 no ip address
 duplex auto
 speed auto
!
interface FastEthernet0/0.2
 encapsulation dot1Q 2
 ip address 2.0.0.6 255.255.255.0
 ip ospf cost 1000
 ip ospf priority 128
 ip ospf 1 area 0.0.0.0
!
interface FastEthernet0/0.66
 encapsulation dot1Q 66
 ip address 66.6.6.62 255.255.255.0
 ip ospf cost 100
 ip ospf priority 128
 ip ospf 1 area 0.0.0.0
!
interface FastEthernet0/0.256
 encapsulation dot1Q 256
 ip address 2.5.6.6 255.255.255.0
 ip ospf network point-to-point
 ip ospf cost 10
 ip ospf priority 128
 ip ospf 1 area 0.0.0.0
!
interface FastEthernet0/0.267
 encapsulation dot1Q 267
 ip address 2.6.7.6 255.255.255.0
 ip ospf network point-to-point
 ip ospf cost 10
 ip ospf priority 128
 ip ospf 1 area 0.0.0.0
!
router ospf 1
 router-id 2.2.2.6
 log-adjacency-changes
 passive-interface Loopback0
 network 2.0.0.6 0.0.0.0 area 0.0.0.0
 network 2.2.2.6 0.0.0.0 area 0.0.0.0
!

Impressions: These were very nice, capable all-around little boxes. I'm currently using one with an NM-32A module as the terminal server for my lab.

Issues:

Where to get it: These are EOS now, so you need to search eBay if you want one.

Mikrotik 750g

Router 7 in Cluster 2

Description: This is a RouterBoard 750G running MikroTik RouterOS 5.22 with 32 MB of RAM.

[admin@RB750G] > system routerboard print 
       routerboard: yes
             model: 750G
     serial-number: 000000000000
  current-firmware: 2.41
  upgrade-firmware: 2.41
[admin@RB750G] > system resource print
                   uptime: 3d52m55s
                  version: 5.22
              free-memory: 17956KiB
             total-memory: 29696KiB
                      cpu: MIPS 24Kc V7.4
                cpu-count: 1
            cpu-frequency: 680MHz
                 cpu-load: 1%
           free-hdd-space: 33308KiB
          total-hdd-space: 61440KiB
  write-sect-since-reboot: 210579
         write-sect-total: 296987
               bad-blocks: 0%
        architecture-name: mipsbe
               board-name: RB750G
                 platform: MikroTik
[admin@RB750G] > 

Config: This is obtained with the /export command at the RouterOS cli. As with the other routers, parts I considered irrelevant to this exercise are not shown.

/interface bridge
add admin-mac=00:00:00:00:00:00 ageing-time=5m arp=enabled auto-mac=yes \
    disabled=no forward-delay=15s l2mtu=65535 max-message-age=20s mtu=1500 \
    name=Loopback1 priority=0x8000 protocol-mode=none transmit-hold-count=6
/interface ethernet
set 0 arp=enabled auto-negotiation=yes bandwidth=unlimited/unlimited \
    disabled=no full-duplex=yes l2mtu=1520 mac-address=00:0C:42:A5:C5:7F \
    master-port=none mtu=1500 name=OSPF speed=100Mbps
/interface vlan
add arp=enabled disabled=no interface=OSPF l2mtu=1516 mtu=1500 name=CLUSTER2 \
    use-service-tag=no vlan-id=2
add arp=enabled disabled=no interface=OSPF l2mtu=1516 mtu=1500 name=VLAN217 \
    use-service-tag=no vlan-id=217
add arp=enabled disabled=no interface=OSPF l2mtu=1516 mtu=1500 name=VLAN267 \
    use-service-tag=no vlan-id=267
add arp=enabled disabled=no interface=OSPF l2mtu=1516 mtu=1496 name=P2MP7 \
    use-service-tag=no vlan-id=77
/routing ospf instance
set [ find default=yes ] disabled=no distribute-default=never in-filter=\
    ospf-in metric-bgp=auto metric-connected=20 metric-default=1 \
    metric-other-ospf=auto metric-rip=20 metric-static=20 name=default \
    out-filter=ospf-out redistribute-bgp=no redistribute-connected=no \
    redistribute-other-ospf=no redistribute-rip=no redistribute-static=no \
    router-id=2.2.2.7
/routing ospf area
set [ find default=yes ] area-id=0.0.0.0 disabled=no instance=default name=\
    backbone type=default
/routing ospf-v3 area
set [ find default=yes ] area-id=0.0.0.0 disabled=no instance=default name=\
    backbone type=default
/ip address
add address=2.2.2.7/32 disabled=no interface=Loopback1 network=2.2.2.7
add address=2.0.0.7/24 disabled=no interface=CLUSTER2 network=2.0.0.0
add address=2.1.7.7/24 disabled=no interface=VLAN217 network=2.1.7.0
add address=2.6.7.7/24 disabled=no interface=VLAN267 network=2.6.7.0
add address=77.7.7.72/24 disabled=no interface=P2MP7 network=77.7.7.0
/routing ospf interface
add authentication=none authentication-key="" authentication-key-id=1 cost=10 \
    dead-interval=40s disabled=no hello-interval=10s instance-id=0 interface=\
    Loopback1 network-type=broadcast passive=yes priority=1 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=none authentication-key="" authentication-key-id=1 cost=\
    1000 dead-interval=40s disabled=no hello-interval=10s instance-id=0 \
    interface=CLUSTER2 network-type=broadcast passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=none authentication-key="" authentication-key-id=1 cost=10 \
    dead-interval=40s disabled=no hello-interval=10s instance-id=0 interface=\
    VLAN217 network-type=point-to-point passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=none authentication-key="" authentication-key-id=1 cost=10 \
    dead-interval=40s disabled=no hello-interval=10s instance-id=0 interface=\
    VLAN267 network-type=point-to-point passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=none authentication-key="" authentication-key-id=1 cost=\
    100 dead-interval=2m disabled=no hello-interval=30s instance-id=0 \
    interface=P2MP7 network-type=nbma passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
/routing ospf nbma-neighbor
add address=77.7.7.71 disabled=no instance=default poll-interval=2m priority=\
    128
add address=77.7.7.73 disabled=no instance=default poll-interval=2m priority=\
    0
/routing ospf network
add area=backbone disabled=no network=2.2.2.7/32
add area=backbone disabled=no network=2.0.0.0/24
add area=backbone disabled=no network=2.1.7.0/24
add area=backbone disabled=no network=2.6.7.0/24
add area=backbone disabled=no network=77.7.7.0/24
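The /export output wraps long commands with a trailing backslash, which makes it awkward to grep or diff. Below is a minimal sketch for stitching those lines back together and pulling out the key=value pairs. It assumes the leading whitespace on a continuation line is just indentation and not part of the value, which is how the export above reads. The function names (join_continuations, parse_export) are invented for this example and are not part of any MikroTik tool.

```python
import re
import shlex

SAMPLE = r"""/routing ospf area
set [ find default=yes ] area-id=0.0.0.0 disabled=no instance=default name=\
    backbone type=default
/routing ospf nbma-neighbor
add address=77.7.7.71 disabled=no instance=default poll-interval=2m priority=\
    128
"""


def join_continuations(export_text):
    """Merge each line ending in a backslash with the line that follows."""
    return re.sub(r"\\\n[ \t]*", "", export_text)


def parse_export(export_text):
    """Return a list of (menu, command, params) tuples from an export."""
    entries = []
    menu = None
    for line in join_continuations(export_text).splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("/"):
            menu = line  # e.g. "/routing ospf area"
            continue
        # Drop "[ find ... ]" selectors so they are not mistaken for settings.
        line = re.sub(r"\[.*?\]", "", line)
        tokens = shlex.split(line)
        params = dict(t.split("=", 1) for t in tokens[1:] if "=" in t)
        entries.append((menu, tokens[0], params))
    return entries


if __name__ == "__main__":
    for menu, command, params in parse_export(SAMPLE):
        print(menu, command, params)
```

With the export flattened like this, it is easy to check, for example, that every OSPF interface in the backbone has the priority and cost you meant to give it.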

Impressions: I first picked up a RouterBoard about 5 or 6 years ago just to play with, and because it was so damn cheap -- I mean REALLY cheap! These things have evolved over the years to be very capable little boxes. I've been using a RB600 as my Wireless Access Point (with 3 radios in it) for several years. I have RB750s and RB750Gs scattered throughout the house as well. The cli is kind of unique, but a bit like some of the old Marconi ATM switches I've used a couple of times. It takes a bit of getting used to, but you can wade your way through it without too many problems. However, MikroTik has a pretty capable GUI client called winbox. Unfortunately, it's Windows only and I'm a Linux/FreeBSD user, but it runs fine under wine. In the later versions of RouterOS, the built-in WebFig interface (http and/or https) has become very slick and extremely capable. It is easily the best web-based router/switch config manager I've used. It's intuitive, fast, doesn't use a lot of router or client resources and is quite full featured. I'm a big cli user, but for RouterBoards I wind up using WebFig more than anything else. As far as the RB750G goes, it is amazingly cheap and capable -- doing MPLS, MPLS-TE, VPLS, L3VPNs, BGP, IPv6, etc. The only thing missing is IS-IS, and one thing that bugs me a bit is the lack of an actual interface called "loopback". If you want to use a loopback interface, you have to create a bridge interface and not bind any ports to it. The RB750s are kind of a pain to set up as well, as they have no serial ports, so you have to connect via the first ethernet port. If you forget how one was set up or the password after it's been lying around awhile, you have to factory reset the whole box. Kind of a pain, but it keeps the cost down.

Issues:

Where to get it: RouterBoards are available at http://routerboard.com/ and RouterOS is available at Mikrotik. The RB750G isn't available any more; it's been replaced with the RB750GL, which looks to have twice the memory.

Juniper Networks SRX100B

Router 1 in Cluster 3

Description: This is a Juniper Networks SRX100B running Junos 12.1R3.5 with 512 MB of RAM.

juniper@SRX100-5_OSPF> show version 
Hostname: SRX100-5_OSPF
Model: srx100b
JUNOS Software Release [12.1R4.7]

juniper@SRX100-5_OSPF> show chassis hardware 
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                XX0000000000      SRX100B
Routing Engine   REV 02   750-021773   XX0000000000      RE-SRX100B
FPC 0                                                    FPC
  PIC 0                                                  8x FE Base PIC
Power Supply 0  

juniper@SRX100-5_OSPF> show system boot-messages | match memory 
real memory  = 536870912 (512MB)
avail memory = 304193536 (290MB)

juniper@SRX100-5_OSPF> 

Config:

interfaces {
    fe-0/0/1 {
        vlan-tagging;
        unit 3 {
            vlan-id 3;
            family inet {
                address 3.0.0.1/24;
            }
        }
        unit 11 {
            vlan-id 11;
            family inet {
                address 11.1.1.33/24;
            }
        }
        unit 312 {
            vlan-id 312;
            family inet {
                address 3.1.2.1/24;
            }
        }
        unit 317 {
            vlan-id 317;
            family inet {
                address 3.1.7.1/24;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 3.3.3.1/32;
            }
        }
    }
}
routing-options {
    router-id 3.3.3.1;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0 {
                passive;
            }
            interface fe-0/0/1.3 {
                metric 1000;
            }
            interface fe-0/0/1.312 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/1.317 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/1.11 {
                metric 100;
            }
        }
    }
}
security {
    zones {
        security-zone OSPF {
            host-inbound-traffic {
                system-services {
                    ping;
                }
                protocols {
                    ospf;
                }
            }
            interfaces {
                fe-0/0/1.3;
                lo0.0;
                fe-0/0/1.312;
                fe-0/0/1.317;
                fe-0/0/1.11;
            }
        }
    }
}
And this one is running in flow mode:
juniper@SRX100-5_OSPF> show security flow status 
  Flow forwarding mode:
    Inet forwarding mode: flow based
    Inet6 forwarding mode: drop
    MPLS forwarding mode: drop
    ISO forwarding mode: drop
  Flow trace status
    Flow tracing status: off

Impressions: This is the same thing as a SRX100H, but only has half of the RAM enabled. This is upgradable to a full SRX100H via a software license which "turns on" the remaining RAM. This is one of the first batch of SRX100s I bought.

Where to get it: At Juniper Networks SRX100

Issues:

Juniper Networks J2300

Router 2 in Cluster 3

Description: This is a Juniper Networks J2300 running Junos 9.3R4.4 with 1 GB of RAM.

juniper@J2300-7> show version 
Hostname: J2300-7
Model: j2300
JUNOS Software Release [9.3R4.4]

juniper@J2300-7> show chassis hardware 
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                XX00000000        J2300
Routing Engine   REV 07   750-009992   XX00000000        RE-J.1
FPC 0            REV 05   750-011147   XX00000000        FPC
  PIC 0                                                  2x FE, 2x E1
Power Supply 0  

juniper@J2300-7> show system boot-messages | match memory 
real memory  = 1073741824 (1024 MB)
avail memory = 703148032 (670 MB)
pcib0:  pcibus 0 on motherboard

juniper@J2300-7> 

Config:

interfaces {
    fe-0/0/0 {
        vlan-tagging;
        unit 3 {
            vlan-id 3;
            family inet {
                address 3.0.0.2/24;
            }
        }
        unit 22 {
            vlan-id 22;
            family inet {
                address 22.2.2.23/24;
            }
        }
        unit 312 {
            vlan-id 312;
            family inet {
                address 3.1.2.2/24;
            }
        }
        unit 323 {
            vlan-id 323;
            family inet {
                address 3.2.3.2/24;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 3.3.3.2/32;
            }
        }
    }
}
routing-options {
    router-id 3.3.3.2;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0 {
                passive;
            }
            interface fe-0/0/0.3 {
                metric 1000;
            }
            interface fe-0/0/0.312 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/0.323 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/0.22 {
                metric 100;
            }
        }
    }
}

Impressions: This is one of a lot of J2300s that I got off of eBay. I used this one to study for the Class of Service, Multicast, and L2VPN sections of my JNCIE back a few years ago. These are actually really nice little boxes that run in packet mode only. I don't use them too much anymore since I got my SRXs, but I still find room and time to play with them.

Issues:

Where to get it: These are EOL and EOS, so you have to cruise eBay if you want one.

Cisco 3750

Router 3 in Cluster 3

Description: This is a Cisco 3750-24P running IOS 12.2(50)SE3 using the IP services image. It has 128 MB of RAM.

C3750-1>sh ver
Cisco IOS Software, C3750 Software (C3750-IPSERVICESK9-M), Version 12.2(50)SE3, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2009 by Cisco Systems, Inc.
Compiled Wed 22-Jul-09 06:19 by prod_rel_team
Image text-base: 0x01000000, data-base: 0x02D00000

ROM: Bootstrap program is C3750 boot loader
BOOTLDR: C3750 Boot Loader (C3750-HBOOT-M) Version 12.2(44)SE5, RELEASE SOFTWARE (fc1)

C3750-1 uptime is 1 minute
System returned to ROM by power-on
System image file is "flash:/c3750-ipservicesk9-mz.122-50.SE3.bin"

cisco WS-C3750-24P (PowerPC405) processor (revision C0) with 131072K bytes of memory.
Processor board ID CAT0834X15V
Last reset from power-on
5 Virtual Ethernet interfaces
24 FastEthernet interfaces
2 Gigabit Ethernet interfaces
The password-recovery mechanism is enabled.

512K bytes of flash-simulated non-volatile configuration memory.
Base ethernet MAC Address       : 00:11:BB:DA:46:80
Motherboard assembly number     : 73-8312-04
Power supply part number        : 341-0029-03
Motherboard serial number       : XX000000000
Power supply serial number      : XX000000000
Model revision number           : C0
Motherboard revision number     : B0
Model number                    : WS-C3750-24PS-S
System serial number            : CAT0834X15V
Top Assembly Part Number        : 800-21982-01
Top Assembly Revision Number    : F0
Version ID                      : N/A
Hardware Board Revision Number  : 0x09


Switch Ports Model              SW Version            SW Image                 
------ ----- -----              ----------            ----------               
*    1 26    WS-C3750-24P       12.2(50)SE3           C3750-IPSERVICESK9-M     


Configuration register is 0xF

Config:

!
vlan 3,33,323,334 
!
interface Loopback0
 ip address 3.3.3.3 255.255.255.255
!
interface FastEthernet1/0/1
 description "OSPF TEST"
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport nonegotiate
!
interface Vlan3
 ip address 3.0.0.3 255.255.255.0
 ip ospf cost 1000
 ip ospf priority 128
 ip ospf 1 area 0.0.0.0
!
interface Vlan33
 ip ospf network point-to-multipoint
 ip ospf cost 100
 ip ospf priority 0
 ip ospf mtu-ignore
 ip ospf 1 area 0.0.0.0
!
interface Vlan323
 ip address 3.2.3.3 255.255.255.0
 ip ospf network point-to-point
 ip ospf cost 10
 ip ospf priority 128
 ip ospf 1 area 0.0.0.0
!
interface Vlan334
 ip address 3.3.4.3 255.255.255.0
 ip ospf network point-to-point
 ip ospf cost 10
 ip ospf priority 128
 ip ospf 1 area 0.0.0.0
!
router ospf 1
 router-id 3.3.3.3
 log-adjacency-changes
 passive-interface Loopback0
 neighbor 33.3.3.32 cost 100
!

Impressions: This is definitely a switch; all of the router-like functions had to be performed on RVI pseudo-interfaces. It still supported all of the OSPF interface types, though.

Issues:

Where to get it: These are EOL, EOS as well => eBay.

Juniper Networks SRX100H

Router 4 in Cluster 3

Description: This is a SRX100H running Junos 11.4R5.5 with 1 GB of RAM.

juniper@SRX100-7_OSPF> show version 
Hostname: SRX100-7_OSPF
Model: srx100h
JUNOS Software Release [12.1R4.7]

juniper@SRX100-7_OSPF> show chassis hardware 
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                XX0000000000      SRX100H
Routing Engine   REV 19   750-021773   XX0000000000      RE-SRX100H
FPC 0                                                    FPC
Power Supply 0  

juniper@SRX100-7_OSPF> show system boot-messages | match memory 
real memory  = 1073741824 (1024MB)
avail memory = 526438400 (502MB)

Config:

interfaces {
    fe-0/0/0 {
        vlan-tagging;
        unit 3 {
            vlan-id 3;
            family inet {
                address 3.0.0.4/24;
            }
        }
        unit 44 {
            vlan-id 44;
            family inet {
                address 44.4.4.43/24;
            }
        }
        unit 334 {
            vlan-id 334;
            family inet {
                address 3.3.4.4/24;
            }
        }
        unit 345 {
            vlan-id 345;
            family inet {
                address 3.4.5.4/24;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 3.3.3.4/32;
            }
        }
    }
}
routing-options {
    router-id 3.3.3.4;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0 {
                passive;
            }
            interface fe-0/0/0.3 {
                metric 1000;
            }
            interface fe-0/0/0.334 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/0.345 {
                interface-type p2p;
                metric 10;
            }
            interface fe-0/0/0.44 {
                metric 100;
            }
        }
    }
}
security {
    zones {
        security-zone OSPF {
            host-inbound-traffic {
                system-services {
                    ping;
                }
                protocols {
                    ospf;
                }
            }
            interfaces {
                fe-0/0/0.3;
                fe-0/0/0.334;
                fe-0/0/0.345;
                fe-0/0/0.44;
                lo0.0;
            }
        }
    }
}
Unlike the other SRX100H, this one is running in flow mode:
juniper@SRX100-7_OSPF> show security flow status 
  Flow forwarding mode:
    Inet forwarding mode: flow based
    Inet6 forwarding mode: drop
    MPLS forwarding mode: drop
    ISO forwarding mode: drop
    Advanced services data-plane memory mode: Default
  Flow trace status
    Flow tracing status: off

Impressions: This is an SRX100H running in flow mode, included to contrast it with the one running in packet mode.

Issues:

Where to get it: At Juniper Networks

Juniper Networks EX2200C

Router 5 in Cluster 3

Description: This is a Juniper Networks EX2200C running Junos 12.2R2.4 with PoE support and 512 MB of RAM.

{master:0}
juniper@EX2200C-3> show version 
fpc0:
--------------------------------------------------------------------------
Hostname: EX2200C-3
Model: ex2200-c-12p-2g
JUNOS Base OS boot [12.2R2.4]
JUNOS Base OS Software Suite [12.2R2.4]
JUNOS Kernel Software Suite [12.2R2.4]
JUNOS Crypto Software Suite [12.2R2.4]
JUNOS Online Documentation [12.2R2.4]
JUNOS Enterprise Software Suite [12.2R2.4]
JUNOS Packet Forwarding Engine Enterprise Software Suite [12.2R2.4]
JUNOS Routing Software Suite [12.2R2.4]
JUNOS Web Management [12.2R2.4]
JUNOS FIPS mode utilities [12.2R2.4]

{master:0}
juniper@EX2200C-3> show chassis hardware 
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                XX0000000000      EX2200-C-12P-2G, POE+
Routing Engine 0 REV 05   650-036547   XX0000000000      EX2200-C-12P-2G, POE+
FPC 0            REV 05   650-036547   XX0000000000      EX2200-C-12P-2G, POE+
  CPU                     BUILTIN      BUILTIN           FPC CPU
  PIC 0                   BUILTIN      BUILTIN           12x 10/100/1000 Base-T
  PIC 1          REV 05   650-036547   XX0000000000      2x (10/100/1000 Base-T or GE SFP)
Power Supply 0                                           PS 180W AC

{master:0}
juniper@EX2200C-3> show system boot-messages | match memory 
real memory  = 536870912 (512 MB)
avail memory = 503500800 (480 MB)

Config:

interfaces {
    ge-0/0/8 {
        vlan-tagging;
        unit 3 {
            vlan-id 3;
            family inet {
                address 3.0.0.5/24;
            }
        }
        unit 55 {
            vlan-id 55;
            family inet {
                address 55.5.5.53/24;
            }
        }
        unit 345 {
            vlan-id 345;
            family inet {
                address 3.4.5.5/24;
            }
        }
        unit 356 {
            vlan-id 356;
            family inet {
                address 3.5.6.5/24;
            }
        }
    }
}
routing-options {
    router-id 3.3.3.5;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface lo0.0;
            interface ge-0/0/8.3 {
                metric 1000;
            }
            interface ge-0/0/8.345 {
                interface-type p2p;
                metric 10;
            }
            interface ge-0/0/8.356 {
                interface-type p2p;
                metric 10;
            }
            interface ge-0/0/8.55 {
                metric 100;
            }
        }
    }
}

Impressions: This is a very capable little switch that is pretty inexpensive. I have one of these sitting below my SRX210H serving the switching needs of my home datacenter (a basement consisting of a NAS, a backup server, a VoIP PBX, a Xen VM server, and a lab connection). The coolest thing about these is that they have a USB serial console port with a USB-serial converter built right in. All you need to do is plug a USB micro cable into your laptop's USB port, and voilà: console port access! Since I run Linux and FreeBSD on my laptops, I get a new serial port just like that! Since Junos 12.1, you can also run up to four of these little things in a virtual chassis! Just like its bigger brother, the EX3200, this thing can operate a port as a routed interface, with no need to use pseudo-routed-VLAN interfaces.

Issues:

Where to get it: At Juniper Networks

RouterBoard RB133

Router 6 in Cluster 3

Description: This is a RouterBoard RB133 running MikroTik RouterOS 5.22.

[admin@RB133] > system routerboard print
       routerboard: yes
             model: 133
     serial-number: 000000000000
  current-firmware: 2.18
  upgrade-firmware: 2.18
[admin@RB133] > 

Config:

/interface ethernet
set 0 arp=enabled auto-negotiation=yes bandwidth=unlimited/unlimited \
    disabled=no full-duplex=yes l2mtu=1518 mac-address=00:0C:42:25:18:36 \
    master-port=none mtu=1500 name=OSPF speed=100Mbps
/interface bridge
add admin-mac=00:00:00:00:00:00 ageing-time=5m arp=enabled auto-mac=yes \
    disabled=yes forward-delay=15s max-message-age=20s mtu=1500 name=\
    loopback0 priority=0x8000 protocol-mode=none transmit-hold-count=6
/interface vlan
add arp=enabled disabled=no interface=OSPF l2mtu=1514 mtu=1500 name=CLUSTER3 \
    use-service-tag=no vlan-id=3
add arp=enabled disabled=no interface=OSPF l2mtu=1514 mtu=1500 name=VLAN356 \
    use-service-tag=no vlan-id=356
add arp=enabled disabled=no interface=OSPF l2mtu=1514 mtu=1500 name=VLAN367 \
    use-service-tag=no vlan-id=367
add arp=enabled disabled=no interface=OSPF l2mtu=1514 mtu=1500 name=P2MP6 \
    use-service-tag=no vlan-id=66
/interface bridge port
add disabled=no edge=auto external-fdb=auto horizon=none interface=OSPF \
    path-cost=10 point-to-point=auto priority=0x80
/ip address
add address=3.3.3.6/32 disabled=no interface=loopback0 network=3.3.3.6
add address=3.5.6.6/24 disabled=no interface=VLAN356 network=3.5.6.0
add address=3.6.7.6/24 disabled=no interface=VLAN367 network=3.6.7.0
add address=66.6.6.63/24 disabled=no interface=P2MP6 network=66.6.6.0
add address=3.0.0.6/24 disabled=no interface=CLUSTER3 network=3.0.0.0
set OSPF disabled=no
set loopback0 disabled=no
set CLUSTER3 disabled=yes
set VLAN356 disabled=yes
set VLAN367 disabled=yes
set P2MP6 disabled=yes
/routing ospf area
set [ find default=yes ] area-id=0.0.0.0 disabled=no instance=default name=\
    backbone type=default
/routing ospf instance
set [ find default=yes ] disabled=no distribute-default=never in-filter=\
    ospf-in metric-bgp=200000 metric-connected=2000 metric-default=1000 \
    metric-other-ospf=auto metric-rip=20000 metric-static=2000 mpls-te-area=\
    backbone mpls-te-router-id=loopback0 name=default out-filter=ospf-out \
    redistribute-bgp=no redistribute-connected=no redistribute-other-ospf=no \
    redistribute-rip=no redistribute-static=no router-id=3.3.3.6
/routing ospf interface
add authentication=none authentication-key="" authentication-key-id=1 cost=\
    1000 dead-interval=40s disabled=no hello-interval=10s instance-id=0 \
    interface=CLUSTER3 network-type=broadcast passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=none authentication-key="" authentication-key-id=1 cost=\
    100 dead-interval=40s disabled=no hello-interval=10s instance-id=0 \
    interface=P2MP6 network-type=broadcast passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=none authentication-key="" authentication-key-id=1 cost=10 \
    dead-interval=40s disabled=no hello-interval=10s instance-id=0 interface=\
    VLAN356 network-type=point-to-point passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=none authentication-key="" authentication-key-id=1 cost=10 \
    dead-interval=40s disabled=no hello-interval=10s instance-id=0 interface=\
    VLAN367 network-type=point-to-point passive=no priority=128 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
/routing ospf network
add area=backbone disabled=no network=3.3.3.6/32
add area=backbone disabled=no network=3.0.0.0/24
add area=backbone disabled=no network=3.5.6.0/24
add area=backbone disabled=no network=3.6.7.0/24
add area=backbone disabled=no network=66.6.6.0/24

Impressions: Another really cheap, but really fun and capable little router. Plus, this one has a serial port for easy first-time and emergency access!

Issues:

Where to get it: These aren't made any more. You may still be able to find them online or on eBay.

Cisco 1760

Router 7 in Cluster 3

Description: This is a Cisco 1760 running IOS 12.3(4)XG5 on the Advanced Enterprise train with 96 MB of RAM.

C1760-1>sh ver 
Cisco IOS Software, C1700 Software (C1700-ADVENTERPRISEK9-M), Version 12.3(4)XG5, RELEASE SOFTWARE (fc1)
Synched to technology version 12.3(5.7)T
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2005 by Cisco Systems, Inc.
Compiled Tue 04-Oct-05 00:08 by ealyon

ROM: System Bootstrap, Version 12.2(4r)XL, RELEASE SOFTWARE (fc1)
ROM: 

C1760-1 uptime is 34 minutes
System returned to ROM by power-on
System image file is "flash:c1700-adventerprisek9-mz.123-4.XG5.bin"

Cisco 1760 (MPC860P) processor (revision 0x200) with 83559K/14745K bytes of memory.
Processor board ID XX000000000 (526361396), with hardware revision BB67
MPC860P processor: part number 5, mask 2
1 FastEthernet interface
1 Virtual Private Network (VPN) Module
32K bytes of NVRAM.
32768K bytes of processor board System flash (Read/Write)

Configuration register is 0x2102

Config:

!
interface Loopback0
 ip address 3.3.3.7 255.255.255.255
!
interface FastEthernet0/0
 no ip address
 speed auto
!
interface FastEthernet0/0.3
 encapsulation dot1Q 3
 ip address 3.0.0.7 255.255.255.0
 ip ospf cost 1000
 ip ospf priority 128
!
interface FastEthernet0/0.77
 encapsulation dot1Q 77
 ip address 77.7.7.73 255.255.255.0
 ip mtu 1496
 ip ospf network non-broadcast
 ip ospf cost 100
 ip ospf priority 128
!
interface FastEthernet0/0.317
 encapsulation dot1Q 317
 ip address 3.1.7.7 255.255.255.0
 ip ospf network point-to-point
 ip ospf cost 10
 ip ospf priority 128
!
interface FastEthernet0/0.367
 encapsulation dot1Q 367
 ip address 3.6.7.7 255.255.255.0
 ip ospf network point-to-point
 ip ospf cost 10
 ip ospf priority 128
!
router ospf 1
 router-id 3.3.3.7
 log-adjacency-changes
 passive-interface Loopback0
 network 3.0.0.7 0.0.0.0 area 0.0.0.0
 network 3.1.7.7 0.0.0.0 area 0.0.0.0
 network 3.3.3.7 0.0.0.0 area 0.0.0.0
 network 3.6.7.7 0.0.0.0 area 0.0.0.0
 network 77.7.7.73 0.0.0.0 area 0.0.0.0
 neighbor 77.7.7.71 cost 100
 neighbor 77.7.7.72 cost 100
!

Impressions: I originally picked up this box off eBay to play with some voice stuff. My first impression, given the whopping one built-in Fast Ethernet port and only the WIC slots usable for network interfaces (the others are for voice modules), was that this router wouldn't get much lab use outside of playing with VoIP. However, for some reason, this thing became one of my favorite boxes to use as a CE router. For a Cisco branch router, it boots pretty quickly, it's quiet, and it has a pretty quick processor, so there isn't much delay in anything.

Issues:

Where to get it: Again, on eBay, because this one is EoL, EoS.

Route Injection with exaBGP

exaBGP is initially used to announce routes via a BGP session to the Olive box that has 2 GB of RAM, which then redistributes them into OSPF.

exaBGP configuration is very Junos-like, and it has a cool feature: it can run a script from within the config file that generates more config. I used this feature to announce routes at a rate of 10 per second. This is done with the process service-1 stanza, which calls the dyn.sh script; that script loops continuously, calling a Python script that generates a random IPv4 prefix in CIDR notation. Junos has a built-in default martian filter that will simply refuse to install any martian routes, so when peering with the Olive we don't need to worry about martian routes in our setup. You can view the martian table in Junos with the show route martians command. The default martian table for IPv4 routes in Junos is:

juniper@SRX210> show route martians table inet.0 

inet.0:
             0.0.0.0/0 exact -- allowed
             0.0.0.0/8 orlonger -- disallowed
             127.0.0.0/8 orlonger -- disallowed
             192.0.0.0/24 orlonger -- disallowed
             240.0.0.0/4 orlonger -- disallowed
             224.0.0.0/4 exact -- disallowed
             224.0.0.0/24 exact -- disallowed

juniper@SRX210> 
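To make the match types in that table concrete, here is a small Python sketch of how a prefix could be checked against it. This is only a model of the table as printed above, not how Junos implements its martian filter. The names MARTIANS and is_martian are made up for illustration, and the sketch assumes Python 3.7 or newer (for subnet_of). "orlonger" is treated as "this prefix or anything more specific inside it", and "exact" as "this prefix and length only".

```python
import ipaddress

# Default inet.0 martian entries, copied from the table above.
# Each entry is (prefix, match type, allowed?).
MARTIANS = [
    ("0.0.0.0/0", "exact", True),
    ("0.0.0.0/8", "orlonger", False),
    ("127.0.0.0/8", "orlonger", False),
    ("192.0.0.0/24", "orlonger", False),
    ("240.0.0.0/4", "orlonger", False),
    ("224.0.0.0/4", "exact", False),
    ("224.0.0.0/24", "exact", False),
]


def is_martian(prefix):
    """Return True if prefix hits a 'disallowed' entry in MARTIANS.

    No prefix can hit two entries of this particular table with
    different outcomes, so the order of evaluation does not matter here.
    """
    net = ipaddress.ip_network(prefix, strict=False)
    for entry, match, allowed in MARTIANS:
        ref = ipaddress.ip_network(entry)
        if match == "exact":
            hit = net == ref
        else:  # "orlonger": the entry itself or any more-specific prefix
            hit = net.subnet_of(ref)
        if hit:
            return not allowed
    # Anything not listed is an ordinary route.
    return False


if __name__ == "__main__":
    for candidate in ("10.0.0.0/8", "127.1.0.0/16", "224.0.0.0/4", "224.0.0.0/8"):
        print(candidate, "martian" if is_martian(candidate) else "ok")
```

Note that RFC 1918 space such as 10.0.0.0/8 is not in the default table, so plenty of bogons generated by the script still get through; only the true martians are dropped.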
Here is the exaBGP config file that is used to peer with the Olive above:
neighbor 172.20.1.66 {
  description "Olive2GB on kvm1";
  router-id 66.66.66.66;
  local-address 172.20.10.117;
  local-as 65069;
  peer-as 65066;

   # advertise a bunch of bogus prefixes
   process service-1 {
	run /etc/exabgp/processes/dyn.sh;
   }
}
The dyn.sh script:
#!/bin/sh

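# ignore Control C
# if the user ^C exabgp we will get that signal too, ignore it and let exabgp send us a SIGTERM
# (POSIX sh expects the signal name without the SIG prefix)
trap '' INT


# loop forever: announce 10 random prefixes, then sleep for a second
while true;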
do
echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
echo "announce route `/usr/local/bin/bs-route.py` next-hop 10.0.0.0"
sleep 1
done
And the Python script, bs-route.py:
#!/usr/bin/python
# 
# BS Route Generator - CIDR style
# 
# This just prints out a random network from the IPv4 address space and makes no attempt to filter bogons or martians
#
# Blackhole Networks
#
# version = 0.1

import random, sys

# Get a random 32 bit number for our IP address and a random length for our netmask
ip = random.randint(0,4294967295)
mask = random.randint(0,31)

# Convert our mask length into a 32 bit number
submask = 0
for i in range (mask):
   submask=submask+2**(31-i)

# Get the network ID by performing a bitwise and with our IP address and our subnet mask
network = ip & submask

# print the network ID out in dotted quad notation 
# do this by masking and bit shifting
oct1 = (network & 0b11111111000000000000000000000000) >> 24
oct2 = (network & 0b00000000111111110000000000000000) >> 16
oct3 = (network & 0b00000000000000001111111100000000) >> 8
oct4 = (network & 0b00000000000000000000000011111111) >> 0

# Print out the network in CIDR notation
sys.stdout.write("%s.%s.%s.%s/%s\n" % ( oct1, oct2, oct3, oct4, mask))

Since exaBGP does not listen for any connections on port 179, you don't need to be root to run it. The whole thing is started simply with: exabgp /etc/exabgp/exabgp.conf
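For comparison, the same prefix generation can be sketched with the ipaddress module from the Python 3 standard library, which does the masking that bs-route.py does by hand. This is an alternative sketch, not the script used in the test. The function names random_prefix and announcements are hypothetical, and the announce line simply mirrors the format that dyn.sh echoes, with the same 10.0.0.0 next-hop.

```python
import ipaddress
import random


def random_prefix(rng=random):
    """Return a random IPv4 network in CIDR notation with host bits cleared."""
    addr = rng.randint(0, 2**32 - 1)   # random 32-bit address
    length = rng.randint(0, 31)        # same mask range as bs-route.py
    # strict=False clears the host bits, like the bitwise AND in bs-route.py
    net = ipaddress.ip_network((addr, length), strict=False)
    return str(net)


def announcements(count, next_hop="10.0.0.0", rng=random):
    """Yield 'count' announce lines in the format dyn.sh echoes."""
    for _ in range(count):
        yield "announce route %s next-hop %s" % (random_prefix(rng), next_hop)


if __name__ == "__main__":
    # One batch of ten, the same amount dyn.sh emits per loop iteration.
    for line in announcements(10):
        print(line, flush=True)
```

Generating a whole batch inside one Python process avoids starting a new interpreter for every prefix, which may matter if you want to push the rate well beyond 10 per second.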

Checking that Everyone is Fully Adjacent

  1. Cluster 1

    1. Router 1 - Vyatta
    2. vyatta@vyatta:~$ show ip ospf neighbor 
      
          Neighbor ID Pri State           Dead Time Address         Interface            RXmtL RqstL DBsmL
      1.1.1.2         128 2-Way/DROther     30.516s 1.0.0.2         eth1.1:1.0.0.1           0     0     0
      1.1.1.3         128 2-Way/DROther     32.628s 1.0.0.3         eth1.1:1.0.0.1           0     0     0
      1.1.1.4         128 Full/Backup       39.920s 1.0.0.4         eth1.1:1.0.0.1           0     0     0
      1.1.1.5         128 Full/DR           35.919s 1.0.0.5         eth1.1:1.0.0.1           0     0     0
      1.1.1.6         128 2-Way/DROther     36.072s 1.0.0.6         eth1.1:1.0.0.1           0     0     0
      1.1.1.7         128 2-Way/DROther     38.611s 1.0.0.7         eth1.1:1.0.0.1           0     0     0
      2.2.2.1         128 Full/Backup       38.818s 11.1.1.21       eth1.11:11.1.1.11        0     0     0
      3.3.3.1         128 Full/DR           38.101s 11.1.1.31       eth1.11:11.1.1.11        0     0     0
      1.1.1.2         250 Full/Backup       35.665s 1.1.2.2         eth1.112:1.1.2.1         0     0     0
      1.1.1.7         128 Full/Backup       38.357s 1.1.7.7         eth1.117:1.1.7.1      5630     0     0
      
    3. Router 2 - OpenOSPFd
    4. # ospfctl show neighbor 
      ID              Pri State        DeadTime Address         Iface     Uptime
      1.1.1.3         128 FULL/BCKUP   00:00:35 1.2.3.3         vlan123   00:02:21
      1.1.1.1         128 FULL/DR      00:00:30 1.1.2.1         vlan112   00:46:09
      2.2.2.2         128 FULL/DR      00:00:34 22.2.2.22       vlan22    00:45:31
      3.3.3.2         128 FULL/BCKUP   00:00:37 22.2.2.23       vlan22    00:45:32
      1.1.1.6         128 2-WAY/OTHER  00:00:37 1.0.0.6         vlan1     -
      1.1.1.3         128 2-WAY/OTHER  00:00:32 1.0.0.3         vlan1     -
      1.1.1.7         128 2-WAY/OTHER  00:00:35 1.0.0.7         vlan1     -
      1.1.1.5         128 FULL/DR      00:00:31 1.0.0.5         vlan1     00:45:59
      1.1.1.1         128 2-WAY/OTHER  00:00:30 1.0.0.1         vlan1     -
      1.1.1.4         128 FULL/BCKUP   00:00:37 1.0.0.4         vlan1     00:14:15
      
    5. Router 3 - 2GB Olive
    6. Note: the adjacency with 33.3.3.33 is stuck in ExStart due to the C3750-to-Olive MTU mismatch issue discussed above
      juniper@Olive2GB> show ospf neighbor                      
      Address          Interface              State     ID               Pri  Dead
      1.0.0.5          fxp1.1                 Full      1.1.1.5          128    38
      1.0.0.1          fxp1.1                 2Way      1.1.1.1          128    38
      1.0.0.2          fxp1.1                 2Way      1.1.1.2          128    36
      1.0.0.4          fxp1.1                 Full      1.1.1.4          128    35
      1.0.0.6          fxp1.1                 2Way      1.1.1.6          128    35
      1.0.0.7          fxp1.1                 2Way      1.1.1.7          128    33
      1.2.3.2          fxp1.123               Full      1.1.1.2          128    37
      1.3.4.4          fxp1.134               Full      1.1.1.4            1    35
      33.3.3.33        fxp1.33                ExStart   3.3.3.3            0   116
      33.3.3.32        fxp1.33                Full      2.2.2.3          255   100
      
    7. Router 4 - BIRD
    8. bird> show ospf neighbors
      OSPFol:
      Router ID   	Pri	     State     	DTime	Interface  Router IP   
      1.1.1.5	128	    full/dr   	00:37	eth0.0001  1.0.0.5        
      1.1.1.1	128	    full/other	00:36	eth0.0001  1.0.0.1        
      1.1.1.2	128	    full/other	00:35	eth0.0001  1.0.0.2        
      1.1.1.7	128	    full/other	00:40	eth0.0001  1.0.0.7        
      1.1.1.6	128	    full/other	00:33	eth0.0001  1.0.0.6        
      1.1.1.3	128	    full/other	00:40	eth0.0001  1.0.0.3        
      2.2.2.4	128	    full/bdr  	00:39	eth0.0044  44.4.4.42      
      3.3.3.4	128	    full/dr   	00:38	eth0.0044  44.4.4.43      
      1.1.1.3	128	    full/bdr  	00:39	eth0.0134  1.3.4.3        
      1.1.1.5	128	    full/ptp  	00:37	eth0.0145  1.4.5.5        
      
    9. Router 5 - Quagga
    10. quagga-router# sh ip ospf neighbor  
      
          Neighbor ID Pri State           Dead Time Address         Interface            RXmtL RqstL DBsmL
      1.1.1.1         128 Full/DROther      36.174s 1.0.0.1         eth0.0001:1.0.0.5        0     0     0
      1.1.1.2         128 Full/DROther      34.744s 1.0.0.2         eth0.0001:1.0.0.5        0     0     0
      1.1.1.3         128 Full/DROther      36.529s 1.0.0.3         eth0.0001:1.0.0.5        0     0     0
      1.1.1.4         128 Full/Backup       32.899s 1.0.0.4         eth0.0001:1.0.0.5        0     0     0
      1.1.1.6         128 Full/DROther      33.112s 1.0.0.6         eth0.0001:1.0.0.5        0     0     0
      1.1.1.7         128 Full/DROther      32.252s 1.0.0.7         eth0.0001:1.0.0.5        0     0     0
      1.1.1.4           1 Full/DROther      32.899s 1.4.5.4         eth0.0145:1.4.5.5        0     0     0
      1.1.1.6         128 Full/Backup       33.116s 1.5.6.6         eth0.0156:1.5.6.5        0     0     0
      2.2.2.5         128 Full/DROther      32.607s 55.5.5.52       eth0.0055:55.5.5.51      0     0     0
      3.3.3.5         128 Full/Backup       33.022s 55.5.5.53       eth0.0055:55.5.5.51      0     0     0
      
    11. Router 6 - XORP
    12. router@xorp> show ospf4 neighbor 
        Address         Interface             State      ID              Pri  Dead
      1.0.0.4          eth0/eth0              Full      1.1.1.4          128    31
      1.0.0.7          eth0/eth0              TwoWay    1.1.1.7          128    36
      1.0.0.2          eth0/eth0              TwoWay    1.1.1.2          128    33
      1.0.0.1          eth0/eth0              TwoWay    1.1.1.1          128    34
      1.0.0.5          eth0/eth0              Full      1.1.1.5          128    35
      1.0.0.3          eth0/eth0              TwoWay    1.1.1.3          128    30
      66.6.6.62        eth1/eth1              Full      2.2.2.6          128    38
      66.6.6.63        eth1/eth1              Full      3.3.3.6          128    30
      1.5.6.5          eth2/eth2              Full      1.1.1.5          128    35
      1.6.7.7          eth3/eth3              Full      1.1.1.7          128    36
      
    13. Router 7 - 1GB Olive
    14. juniper@Olive1GB> show ospf neighbor 
      Address          Interface              State     ID               Pri  Dead
      1.0.0.3          fxp1.1                 2Way      1.1.1.3          128    38
      1.0.0.6          fxp1.1                 2Way      1.1.1.6          128    31
      1.0.0.4          fxp1.1                 Full      1.1.1.4          128    31
      1.0.0.2          fxp1.1                 2Way      1.1.1.2          128    32
      1.0.0.5          fxp1.1                 Full      1.1.1.5          128    34
      1.0.0.1          fxp1.1                 2Way      1.1.1.1          128    34
      1.1.7.1          fxp1.117               Full      1.1.1.1          128    34
      1.6.7.6          fxp1.167               Full      1.1.1.6          128    31
      77.7.7.72        fxp1.77                Full      2.2.2.7          128    94
      77.7.7.73        fxp1.77                Full      3.3.3.7          128   101
      
  2. Cluster 2

    1. Router 1 - EX3200
    2. juniper@EX3200-2_OSPF> show ospf neighbor 
      Address          Interface              State     ID               Pri  Dead
      11.1.1.31        ge-0/0/1.11            Full      3.3.3.1          128    36
      11.1.1.11        ge-0/0/1.11            Full      1.1.1.1          128    30
      2.0.0.5          ge-0/0/1.2             Full      2.2.2.5          128    34
      2.0.0.6          ge-0/0/1.2             Full      2.2.2.6          128    33
      2.0.0.3          ge-0/0/1.2             2Way      2.2.2.3          128    34
      2.0.0.7          ge-0/0/1.2             2Way      2.2.2.7          128    37
      2.0.0.4          ge-0/0/1.2             2Way      2.2.2.4          128    35
      2.0.0.2          ge-0/0/1.2             2Way      2.2.2.2          128    35
      2.1.2.2          ge-0/0/1.212           Full      2.2.2.2          128    34
      2.1.7.7          ge-0/0/1.217           Full      2.2.2.7          128    37
      
    3. Router 2 - SRX100H in Packet Mode
    4. juniper@SRX100-6_OSPF> show ospf neighbor              
      Address          Interface              State     ID               Pri  Dead
      2.0.0.5          fe-0/0/0.2             Full      2.2.2.5          128    39
      2.0.0.6          fe-0/0/0.2             Full      2.2.2.6          128    30
      2.0.0.3          fe-0/0/0.2             2Way      2.2.2.3          128    39
      2.0.0.7          fe-0/0/0.2             2Way      2.2.2.7          128    33
      2.0.0.4          fe-0/0/0.2             2Way      2.2.2.4          128    38
      2.0.0.1          fe-0/0/0.2             2Way      2.2.2.1          128    38
      2.1.2.1          fe-0/0/0.212           Full      2.2.2.1          128    31
      22.2.2.21        fe-0/0/0.22            Full      1.1.1.2          128    36
      22.2.2.23        fe-0/0/0.22            Full      3.3.3.2          128    31
      2.2.3.3          fe-0/0/0.223           Full      2.2.2.3          128    39
      
    5. Router 3 - Cisco 3640
    6. C3640-1#sh ip ospf neighbor 
      
      Neighbor ID     Pri   State           Dead Time   Address         Interface
      3.3.3.3           0   FULL/  -        00:01:36    33.3.3.33       FastEthernet0/0.33
      1.1.1.3           0   FULL/  -        00:01:59    33.3.3.31       FastEthernet0/0.33
      2.2.2.4           0   FULL/  -        00:00:35    2.3.4.4         FastEthernet0/0.234
      2.2.2.2           0   FULL/  -        00:00:37    2.2.3.2         FastEthernet0/0.223
      2.2.2.1         128   2WAY/DROTHER    00:00:33    2.0.0.1         FastEthernet0/0.2
      2.2.2.2         128   2WAY/DROTHER    00:00:35    2.0.0.2         FastEthernet0/0.2
      2.2.2.4         128   2WAY/DROTHER    00:00:34    2.0.0.4         FastEthernet0/0.2
      2.2.2.5         128   FULL/BDR        00:00:32    2.0.0.5         FastEthernet0/0.2
      2.2.2.6         128   FULL/DR         00:00:37    2.0.0.6         FastEthernet0/0.2
      2.2.2.7         128   2WAY/DROTHER    00:00:35    2.0.0.7         FastEthernet0/0.2
      
    7. Router 4 - SRX210HE
    8. juniper@SRX210HE_OSPF> show ospf neighbor 
      Address          Interface              State     ID               Pri  Dead
      2.0.0.5          ge-0/0/0.2             Full      2.2.2.5          128    35
      2.0.0.7          ge-0/0/0.2             2Way      2.2.2.7          128    38
      2.0.0.6          ge-0/0/0.2             Full      2.2.2.6          128    33
      2.0.0.3          ge-0/0/0.2             2Way      2.2.2.3          128    34
      2.0.0.2          ge-0/0/0.2             2Way      2.2.2.2          128    36
      2.0.0.1          ge-0/0/0.2             2Way      2.2.2.1          128    33
      2.3.4.3          ge-0/0/0.234           Full      2.2.2.3          128    34
      2.4.5.5          ge-0/0/0.245           Full      2.2.2.5          128    34
      44.4.4.41        ge-0/0/0.44            Full      1.1.1.4          128    38
      44.4.4.43        ge-0/0/0.44            Full      3.3.3.4          128    37
      
    9. Router 5 - NS208
    10. ns208-> get vrouter OSPF protocol ospf neighbor 
      VR: OSPF RouterId: 2.2.2.5
      ----------------------------------
      		Neighbor(s) on interface ethernet2.7 (Area 0.0.0.0)
      IpAddr/IfIndex  RouterId        Pri State    Opt  Up           StateChg     
      ------------------------------------------------------------------------------
      55.5.5.51       1.1.1.5         128 Full     E    11:53:12     (+7 -0)
      55.5.5.53       3.3.3.5         128 Full     E    11:53:12     (+7 -0)
      
      		Neighbor(s) on interface loopback.2 (Area 0.0.0.0)
      
      		Neighbor(s) on interface ethernet2.5 (Area 0.0.0.0)
      IpAddr/IfIndex  RouterId        Pri State    Opt  Up           StateChg     
      ------------------------------------------------------------------------------
      2.5.6.6         2.2.2.6         128 Full     E    12:27:21     (+8 -1)
      
      		Neighbor(s) on interface ethernet2.4 (Area 0.0.0.0)
      IpAddr/IfIndex  RouterId        Pri State    Opt  Up           StateChg     
      ------------------------------------------------------------------------------
      2.4.5.4         2.2.2.4         128 Full     E    12:27:13     (+8 -1)
      
      		Neighbor(s) on interface ethernet2.2 (Area 0.0.0.0)
      IpAddr/IfIndex  RouterId        Pri State    Opt  Up           StateChg     
      ------------------------------------------------------------------------------
      2.0.0.7         2.2.2.7         128 Full     E    01:47:03     (+11 -1)
      2.0.0.2         2.2.2.2         128 Full     E    01:47:23     (+7 -0)
      2.0.0.4         2.2.2.4         128 Full     E    01:47:23     (+7 -0)
      2.0.0.1         2.2.2.1         128 Full     E    01:47:23     (+7 -0)
      2.0.0.3         2.2.2.3         128 Full     E    01:47:23     (+7 -0)
      2.0.0.6         2.2.2.6         128 Full     E    12:27:22     (+7 -0)
      
      
    11. Router 6 - Cisco 2811
    12. C2811-1#sh ip ospf neighbor 
      
      Neighbor ID     Pri   State           Dead Time   Address         Interface
      2.2.2.7           0   FULL/  -        00:00:30    2.6.7.7         FastEthernet0/0.267
      2.2.2.5           0   FULL/  -        00:00:36    2.5.6.5         FastEthernet0/0.256
      1.1.1.6         128   FULL/DROTHER    00:00:39    66.6.6.61       FastEthernet0/0.66
      3.3.3.6         128   FULL/BDR        00:00:39    66.6.6.63       FastEthernet0/0.66
      2.2.2.1         128   FULL/DROTHER    00:00:39    2.0.0.1         FastEthernet0/0.2
      2.2.2.2         128   FULL/DROTHER    00:00:35    2.0.0.2         FastEthernet0/0.2
      2.2.2.3         128   FULL/DROTHER    00:00:36    2.0.0.3         FastEthernet0/0.2
      2.2.2.4         128   FULL/DROTHER    00:00:37    2.0.0.4         FastEthernet0/0.2
      2.2.2.5         128   FULL/BDR        00:00:36    2.0.0.5         FastEthernet0/0.2
      2.2.2.7         128   FULL/DROTHER    00:00:30    2.0.0.7         FastEthernet0/0.2
      
    13. Router 7 - RB750G
    14. [admin@RB750G] > /routing ospf neighbor print
       0 instance=default router-id=2.2.2.6 address=2.6.7.6 interface=VLAN267 
         priority=128 dr-address=0.0.0.0 backup-dr-address=0.0.0.0 state="Full" 
         state-changes=4 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=1h48m41s 
      
       1 instance=default router-id=2.2.2.3 address=2.0.0.3 interface=CLUSTER2 
         priority=128 dr-address=2.0.0.6 backup-dr-address=2.0.0.5 state="2-Way" 
         state-changes=2 ls-retransmits=0 ls-requests=0 db-summaries=0 
      
       2 instance=default router-id=2.2.2.5 address=2.0.0.5 interface=CLUSTER2 
         priority=128 dr-address=2.0.0.6 backup-dr-address=2.0.0.5 state="Full" 
         state-changes=5 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=1h48m43s 
      
       3 instance=default router-id=2.2.2.1 address=2.0.0.1 interface=CLUSTER2 
         priority=128 dr-address=2.0.0.6 backup-dr-address=2.0.0.5 state="2-Way" 
         state-changes=2 ls-retransmits=0 ls-requests=0 db-summaries=0 
      
       4 instance=default router-id=2.2.2.6 address=2.0.0.6 interface=CLUSTER2 
         priority=128 dr-address=2.0.0.6 backup-dr-address=2.0.0.5 state="Full" 
         state-changes=5 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=1h48m43s 
      
       5 instance=default router-id=2.2.2.2 address=2.0.0.2 interface=CLUSTER2 
         priority=128 dr-address=2.0.0.6 backup-dr-address=2.0.0.5 state="2-Way" 
         state-changes=2 ls-retransmits=0 ls-requests=0 db-summaries=0 
      
       6 instance=default router-id=2.2.2.4 address=2.0.0.4 interface=CLUSTER2 
         priority=128 dr-address=2.0.0.6 backup-dr-address=2.0.0.5 state="2-Way" 
         state-changes=2 ls-retransmits=0 ls-requests=0 db-summaries=0 
      
       7 instance=default router-id=2.2.2.1 address=2.1.7.1 interface=VLAN217 
         priority=128 dr-address=0.0.0.0 backup-dr-address=0.0.0.0 state="Full" 
         state-changes=5 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=1h48m49s 
      
       8 instance=default router-id=3.3.3.7 address=77.7.7.73 interface=P2MP7 
         priority=128 dr-address=77.7.7.73 backup-dr-address=77.7.7.72 state="Full" 
         state-changes=5 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=1h48m26s 
      
       9 instance=default router-id=1.1.1.7 address=77.7.7.71 interface=P2MP7 
         priority=128 dr-address=77.7.7.73 backup-dr-address=77.7.7.72 state="Full" 
         state-changes=12 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=48m59s 
      
      
  3. Cluster 3

    1. Router 1 - SRX100B
    2. juniper@SRX100-5_OSPF> show ospf neighbor 
      Address          Interface              State     ID               Pri  Dead
      11.1.1.11        fe-0/0/1.11            Full      1.1.1.1          128    31
      11.1.1.21        fe-0/0/1.11            Full      2.2.2.1          128    38
      3.0.0.3          fe-0/0/1.3             2Way      3.3.3.3          128    39
      3.0.0.5          fe-0/0/1.3             2Way      3.3.3.5          128    34
      3.0.0.7          fe-0/0/1.3             Full      3.3.3.7          128    38
      3.0.0.2          fe-0/0/1.3             2Way      3.3.3.2          128    34
      3.0.0.6          fe-0/0/1.3             Full      3.3.3.6          128    37
      3.0.0.4          fe-0/0/1.3             2Way      3.3.3.4          128    39
      3.1.2.2          fe-0/0/1.312           Full      3.3.3.2          128    35
      3.1.7.7          fe-0/0/1.317           Full      3.3.3.7          128    38
      
      
    3. Router 2 - J2300
    4. juniper@J2300-7> show ospf neighbor    
      Address          Interface              State     ID               Pri  Dead
      22.2.2.21        fe-0/0/0.22            Full      1.1.1.2          128    31
      22.2.2.22        fe-0/0/0.22            Full      2.2.2.2          128    33
      3.0.0.5          fe-0/0/0.3             2Way      3.3.3.5          128    31
      3.0.0.7          fe-0/0/0.3             Full      3.3.3.7          128    30
      3.0.0.3          fe-0/0/0.3             2Way      3.3.3.3          128    33
      3.0.0.1          fe-0/0/0.3             2Way      3.3.3.1          128    33
      3.0.0.4          fe-0/0/0.3             2Way      3.3.3.4          128    34
      3.0.0.6          fe-0/0/0.3             Full      3.3.3.6          128    30
      3.1.2.1          fe-0/0/0.312           Full      3.3.3.1          128    38
      3.2.3.3          fe-0/0/0.323           Full      3.3.3.3          128    32
      
    5. Router 3 - Cisco 3750
    6. Adjacency to 1.1.1.3 is down due to the MTU mismatch discussed earlier

      C3750-1#show ip ospf nei
      
      Neighbor ID     Pri   State           Dead Time   Address         Interface
      3.3.3.4           0   FULL/  -        00:00:39    3.3.4.4         Vlan334
      3.3.3.2           0   FULL/  -        00:00:37    3.2.3.2         Vlan323
      1.1.1.3           0   DOWN/  -           -        33.3.3.31       Vlan33
      2.2.2.3           0   FULL/  -        00:01:45    33.3.3.32       Vlan33
      3.3.3.1         128   2WAY/DROTHER    00:00:32    3.0.0.1         Vlan3
      3.3.3.2         128   2WAY/DROTHER    00:00:37    3.0.0.2         Vlan3
      3.3.3.4         128   2WAY/DROTHER    00:00:35    3.0.0.4         Vlan3
      3.3.3.5         128   2WAY/DROTHER    00:00:38    3.0.0.5         Vlan3
      3.3.3.6         128   FULL/BDR        00:00:38    3.0.0.6         Vlan3
      3.3.3.7         128   FULL/DR         00:00:39    3.0.0.7         Vlan3
      
    7. Router 4 - SRX100H
    8. juniper@SRX100-7_OSPF> show ospf neighbor 
      Address          Interface              State     ID               Pri  Dead
      3.0.0.3          fe-0/0/0.3             2Way      3.3.3.3          128    39
      3.0.0.5          fe-0/0/0.3             2Way      3.3.3.5          128    39
      3.0.0.7          fe-0/0/0.3             Full      3.3.3.7          128    33
      3.0.0.2          fe-0/0/0.3             2Way      3.3.3.2          128    32
      3.0.0.6          fe-0/0/0.3             Full      3.3.3.6          128    31
      3.0.0.1          fe-0/0/0.3             2Way      3.3.3.1          128    37
      3.3.4.3          fe-0/0/0.334           Full      3.3.3.3          128    35
      3.4.5.5          fe-0/0/0.345           Full      3.3.3.5          128    38
      44.4.4.41        fe-0/0/0.44            Full      1.1.1.4          128    32
      44.4.4.42        fe-0/0/0.44            Full      2.2.2.4          128    35
      
    9. Router 5 - EX2200C
    10. {master:0}
      copek@EX2200C-3> show ospf neighbor 
      Address          Interface              State     ID               Pri  Dead
      3.0.0.3          ge-0/0/8.3             2Way      3.3.3.3          128    31
      3.0.0.6          ge-0/0/8.3             Full      3.3.3.6          128    39
      3.0.0.7          ge-0/0/8.3             Full      3.3.3.7          128    31
      3.0.0.2          ge-0/0/8.3             2Way      3.3.3.2          128    33
      3.0.0.1          ge-0/0/8.3             2Way      3.3.3.1          128    36
      3.0.0.4          ge-0/0/8.3             2Way      3.3.3.4          128    38
      3.4.5.4          ge-0/0/8.345           Full      3.3.3.4          128    39
      3.5.6.6          ge-0/0/8.356           Full      3.3.3.6          128    39
      55.5.5.52        ge-0/0/8.55            Full      2.2.2.5          128    39
      55.5.5.51        ge-0/0/8.55            Full      1.1.1.5          128    38
      
      
    11. Router 6 - RB133
    12. [admin@RB133] > /routing ospf neighbor print
       0 instance=default router-id=3.3.3.3 address=3.0.0.3 interface=CLUSTER3 
         priority=128 dr-address=3.0.0.7 backup-dr-address=3.0.0.6 state="Full" 
         state-changes=4 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=41m10s 
      
       1 instance=default router-id=1.1.1.6 address=66.6.6.61 interface=P2MP6 
         priority=128 dr-address=66.6.6.62 backup-dr-address=66.6.6.63 state="Full" 
         state-changes=4 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=58m15s 
      
       2 instance=default router-id=3.3.3.7 address=3.0.0.7 interface=CLUSTER3 
         priority=128 dr-address=3.0.0.7 backup-dr-address=3.0.0.6 state="Full" 
         state-changes=5 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=1h57m7s 
      
       3 instance=default router-id=3.3.3.7 address=3.6.7.7 interface=VLAN367 
         priority=128 dr-address=0.0.0.0 backup-dr-address=0.0.0.0 state="Full" 
         state-changes=4 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=1h57m7s 
      
       4 instance=default router-id=3.3.3.5 address=3.5.6.5 interface=VLAN356 
         priority=128 dr-address=0.0.0.0 backup-dr-address=0.0.0.0 state="Full" 
         state-changes=4 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=1h57m8s 
      
       5 instance=default router-id=3.3.3.1 address=3.0.0.1 interface=CLUSTER3 
         priority=128 dr-address=3.0.0.7 backup-dr-address=3.0.0.6 state="Full" 
         state-changes=5 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=1h56m58s 
      
       6 instance=default router-id=3.3.3.4 address=3.0.0.4 interface=CLUSTER3 
         priority=128 dr-address=3.0.0.7 backup-dr-address=3.0.0.6 state="Full" 
         state-changes=8 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=1h56m58s 
      
       7 instance=default router-id=3.3.3.5 address=3.0.0.5 interface=CLUSTER3 
         priority=128 dr-address=3.0.0.7 backup-dr-address=3.0.0.6 state="Full" 
         state-changes=5 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=1h56m58s 
      
       8 instance=default router-id=3.3.3.2 address=3.0.0.2 interface=CLUSTER3 
         priority=128 dr-address=3.0.0.7 backup-dr-address=3.0.0.6 state="Full" 
         state-changes=5 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=1h56m58s 
      
       9 instance=default router-id=2.2.2.6 address=66.6.6.62 interface=P2MP6 
         priority=128 dr-address=66.6.6.62 backup-dr-address=66.6.6.63 state="Full" 
         state-changes=5 ls-retransmits=0 ls-requests=0 db-summaries=0 
         adjacency=1h57m16s 
      
      
    13. Router 7 - Cisco 1760
    14. C1760-1#sh ip ospf neighbor 
      
      Neighbor ID     Pri   State           Dead Time   Address         Interface
      2.2.2.7         128   FULL/BDR        00:01:38    77.7.7.72       FastEthernet0/0.77
      1.1.1.7         128   FULL/DROTHER    00:01:53    77.7.7.71       FastEthernet0/0.77
      3.3.3.6           0   FULL/  -        00:00:31    3.6.7.6         FastEthernet0/0.367
      3.3.3.1           0   FULL/  -        00:00:34    3.1.7.1         FastEthernet0/0.317
      3.3.3.1         128   FULL/DROTHER    00:00:33    3.0.0.1         FastEthernet0/0.3
      3.3.3.2         128   FULL/DROTHER    00:00:34    3.0.0.2         FastEthernet0/0.3
      3.3.3.3         128   FULL/DROTHER    00:00:38    3.0.0.3         FastEthernet0/0.3
      3.3.3.4         128   FULL/DROTHER    00:00:34    3.0.0.4         FastEthernet0/0.3
      3.3.3.5         128   FULL/DROTHER    00:00:39    3.0.0.5         FastEthernet0/0.3
      3.3.3.6         128   FULL/BDR        00:00:31    3.0.0.6         FastEthernet0/0.3
      

Run 1: The slow buildup.

exaBGP was started from an "off-net" Linux box and set up to peer with the Olive with the most memory. The script sent 10 prefixes per second to the Olive as IPv4 unicast BGP routes. The Olive had to parse them and then re-advertise each one as an OSPF external route, which it did as soon as the route was determined to be valid and deemed suitable for re-advertisement. Here is exaBGP starting up:

user@linux-box:~$ exabgp /etc/exabgp/exabgp.conf
Sun, 06 Jan 2013 00:22:17 INFO     11392  configuration Performing reload of exabgp 1.3.4
Sun, 06 Jan 2013 00:22:17 INFO     11392  supervisor    New Peer 172.20.1.66
Sun, 06 Jan 2013 00:22:17 INFO     11392  configuration Loaded new configuration successfully
Sun, 06 Jan 2013 00:22:17 INFO     11392  processes     Forked process service-1
trap: SIGINT: bad trap
Sun, 06 Jan 2013 00:22:17 INFO     11392  processes     Command from process service-1 : announce route 113.252.0.0/17 next-hop 10.0.0.0
Sun, 06 Jan 2013 00:22:17 INFO     11392  processes     Command from process service-1 : announce route 223.124.11.116/30 next-hop 10.0.0.0
Sun, 06 Jan 2013 00:22:17 INFO     11392  processes     Command from process service-1 : announce route 148.217.8.0/21 next-hop 10.0.0.0
Sun, 06 Jan 2013 00:22:17 INFO     11392  processes     Command from process service-1 : announce route 245.229.64.0/19 next-hop 10.0.0.0
Sun, 06 Jan 2013 00:22:17 INFO     11392  processes     Command from process service-1 : announce route 128.0.0.0/6 next-hop 10.0.0.0
Sun, 06 Jan 2013 00:22:17 INFO     11392  supervisor    Performing dynamic route update
Sun, 06 Jan 2013 00:22:17 INFO     11392  supervisor    Updated peers dynamic routes successfully
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> OPEN version=4 asn=65069 hold_time=180 router_id=66.66.66.66 capabilities=[Multiprotocol for IPv4 unicast IPv6 unicast IPv4 flow-ipv4, 4Bytes AS 65069]
Sun, 06 Jan 2013 00:22:17 INFO     11392  processes     Command from process service-1 : announce route 99.67.40.0/21 next-hop 10.0.0.0
Sun, 06 Jan 2013 00:22:17 INFO     11392  processes     Command from process service-1 : announce route 147.20.72.0/22 next-hop 10.0.0.0
Sun, 06 Jan 2013 00:22:17 INFO     11392  processes     Command from process service-1 : announce route 82.156.0.0/17 next-hop 10.0.0.0
Sun, 06 Jan 2013 00:22:17 INFO     11392  processes     Command from process service-1 : announce route 224.0.0.0/9 next-hop 10.0.0.0
Sun, 06 Jan 2013 00:22:17 INFO     11392  processes     Command from process service-1 : announce route 85.241.220.0/22 next-hop 10.0.0.0
Sun, 06 Jan 2013 00:22:17 INFO     11392  supervisor    Performing dynamic route update
Sun, 06 Jan 2013 00:22:17 INFO     11392  supervisor    Updated peers dynamic routes successfully
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   << OPEN version=4 asn=65066 hold_time=90 router_id=1.1.1.3 capabilities=[Cisco Route Refresh, Multiprotocol for IPv4 unicast, Route Refresh, Graceful Restart, 4Bytes AS 65066]
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> KEEPALIVE (OPENCONFIRM)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   << KEEPALIVE (ESTABLISHED)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> UPDATE (update)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> UPDATE (update)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> UPDATE (update)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> UPDATE (update)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> UPDATE (update)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> UPDATE (update)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> UPDATE (update)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> UPDATE (update)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> UPDATE (update)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> UPDATE (update)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> 10 UPDATE(s)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> KEEPALIVE (no more UPDATE and no EOR)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> UPDATE (0)
Sun, 06 Jan 2013 00:22:17 INFO     11392  message       Peer     172.20.1.66 ASN 65066   << KEEPALIVE
Sun, 06 Jan 2013 00:22:18 INFO     11392  processes     Command from process service-1 : announce route 220.222.228.0/23 next-hop 10.0.0.0 
Sun, 06 Jan 2013 00:22:18 INFO     11392  processes     Command from process service-1 : announce route 65.88.32.0/19 next-hop 10.0.0.0 
Sun, 06 Jan 2013 00:22:18 INFO     11392  processes     Command from process service-1 : announce route 152.0.0.0/6 next-hop 10.0.0.0 
Sun, 06 Jan 2013 00:22:18 INFO     11392  processes     Command from process service-1 : announce route 37.200.117.128/30 next-hop 10.0.0.0 
Sun, 06 Jan 2013 00:22:18 INFO     11392  processes     Command from process service-1 : announce route 156.0.0.0/7 next-hop 10.0.0.0 
Sun, 06 Jan 2013 00:22:18 INFO     11392  processes     Command from process service-1 : announce route 207.149.60.16/31 next-hop 10.0.0.0 
Sun, 06 Jan 2013 00:22:18 INFO     11392  processes     Command from process service-1 : announce route 75.128.0.0/9 next-hop 10.0.0.0 
Sun, 06 Jan 2013 00:22:18 INFO     11392  processes     Command from process service-1 : announce route 154.45.172.0/22 next-hop 10.0.0.0 
Sun, 06 Jan 2013 00:22:18 INFO     11392  processes     Command from process service-1 : announce route 172.0.0.0/8 next-hop 10.0.0.0 
Sun, 06 Jan 2013 00:22:18 INFO     11392  supervisor    Performing dynamic route update
Sun, 06 Jan 2013 00:22:18 INFO     11392  supervisor    Updated peers dynamic routes successfully
Sun, 06 Jan 2013 00:22:18 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> UPDATE (9)
Sun, 06 Jan 2013 00:22:18 INFO     11392  processes     Command from process service-1 : announce route 96.252.114.0/26 next-hop 10.0.0.0
Sun, 06 Jan 2013 00:22:18 INFO     11392  supervisor    Performing dynamic route update
Sun, 06 Jan 2013 00:22:18 INFO     11392  supervisor    Updated peers dynamic routes successfully
Sun, 06 Jan 2013 00:22:18 INFO     11392  message       Peer     172.20.1.66 ASN 65066   >> UPDATE (1)
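
The generator script itself isn't shown here, so the following is only a minimal sketch of the idea, not the script that produced the log above. It assumes exaBGP picks up `announce route ...` commands from the standard output of the helper process (which is what the "Command from process service-1" lines suggest). The function names, the /6 to /31 mask range, and the batching logic are all made up for illustration; the next-hop is simply copied from the log.

```python
import ipaddress
import random
import sys
import time


def random_prefix(rng, min_len=6, max_len=31):
    """Return a random IPv4 network; no filtering of martians or bogons."""
    address = rng.getrandbits(32)
    length = rng.randint(min_len, max_len)  # mask range is an assumption
    # strict=False masks off host bits, so 113.252.77.9/17 becomes 113.252.0.0/17.
    return ipaddress.ip_network((address, length), strict=False)


def announce_line(prefix, next_hop="10.0.0.0"):
    """Format one command in the style seen in the exaBGP log."""
    return f"announce route {prefix} next-hop {next_hop}"


def emit(count, per_second=10, out=sys.stdout, rng=None):
    """Write `count` announcements, pausing one second between batches."""
    if rng is None:
        rng = random.Random()
    for i in range(count):
        if i and i % per_second == 0:
            time.sleep(1)
        out.write(announce_line(random_prefix(rng)) + "\n")
        out.flush()  # assumed necessary so the parent process sees each line promptly
```

Calling `emit(40000)` from a helper script would trickle out 40,000 bogus prefixes at roughly 10 per second. Duplicates are possible, so the number of distinct routes that land in the LSDB can be a bit lower than the number of announcements.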

Everything was going well, but after a while (anywhere from 4k to 40k prefixes), the Olives would lose all of their neighbors:

juniper@Olive1GB> show ospf neighbor 
Address          Interface              State     ID               Pri  Dead
77.7.7.72        fxp1.77                Down      0.0.0.0            0     0
77.7.7.73        fxp1.77                Down      3.3.3.7          128     0

I had to restart the Olive, or even the whole VM instance, to get the Olives to re-form any adjacencies. Eventually I narrowed this down to the emulated network card, an Intel 10/100 NIC, which would simply stop processing traffic. I tried three different models supported by QEMU-kvm, and although the i82557b model seemed to work the best, it would still go to sleep after about 40k routes. So I swung the BGP-into-OSPF export duties over to the J2300, and it seemed to perform very well.

The results

At first everything was fine, until the network had about 2,000 external LSAs floating around. Then the Cisco 3750 started to complain:

*Mar  1 10:39:41.477: %PLATFORM_UCAST-6-PREFIX:  One or more, more specific prefixes could not be programmed into TCAM and are being covered by a less specific prefix
Then it began to complain a lot more:
*Mar  1 10:41:00.799: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x295E598, alignment 0 
Pool: Processor  Free: 56772  Cause: Not enough free memory 
Alternate Pool: None  Free: 0  Cause: No Alternate pool 
 -Process= "IP RIB Update", ipl= 0, pid= 251
-Traceback= 1EA47C0 1EA4F0C 294A8A0 294CDC0 294D0D0 295E59C 295F1D8 149B6D0 13E0CAC 1FF7A48 1F9F724 1FB98D4 1FB9A5C 1FA4310 1FA4438 1FA4870
*Mar  1 10:41:00.799: %COMMON_FIB-3-HW_API: HW API failure for IPv4 CEF [0x0149B6E8]: Platform IPv4 Fib malloc failed (fatal) (0 subsequent failures).
*Mar  1 10:41:00.799: %COMMON_FIB-4-DISABLING: IPv4 CEF is being disabled due to a fatal error.
*Mar  1 10:41:00.908: %COMMON_FIB-3-HW_API: HW API failure for IPv4 CEF [0x0149B6E8]: Platform IPv4 Fib malloc failed (fatal) (0 subsequent failures).
*Mar  1 10:41:00.908: %COMMON_FIB-4-DISABLING: IPv4 CEF is being disabled due to a fatal error.
*Mar  1 10:41:00.950: %COMMON_FIB-3-HW_API: HW API failure for IPv4 CEF [0x0149B6E8]: Platform IPv4 Fib malloc failed (fatal) (5 subsequent failures).
*Mar  1 10:41:00.950: %COMMON_FIB-4-DISABLING: IPv4 CEF is being disabled due to a fatal error.
*Mar  1 10:41:01.026: %COMMON_FIB-3-HW_API: HW API failure for IPv4 CEF [0x0149B6E8]: Platform IPv4 Fib malloc failed (fatal) (0 subsequent failures).
*Mar  1 10:41:01.034: %COMMON_FIB-4-DISABLING: IPv4 CEF is being disabled due to a fatal error.
*Mar  1 10:41:22.492: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]" 
At that point, the C3750 just complained about having to disable CEF for a while as it ignored LSAs. Then it decided that, due to too many requests from its neighbors for LSA acknowledgments, it would just drop off the network:
*Mar  1 10:41:25.889: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from EXSTART to DOWN, Neighbor Down: Too many retransmissions
 --More-- 
*Mar  1 10:42:22.521: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]" 
 --More-- 
*Mar  1 10:42:25.893: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from DOWN to DOWN, Neighbor Down: Ignore timer expired
 --More-- 
*Mar  1 10:43:22.549: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]" 
 --More-- 
*Mar  1 10:44:22.578: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]" 
 --More-- 
*Mar  1 10:44:38.541: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from EXSTART to DOWN, Neighbor Down: Too many retransmissions
 --More-- 
*Mar  1 10:45:22.606: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]" 
 --More-- 
*Mar  1 10:45:38.544: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from DOWN to DOWN, Neighbor Down: Ignore timer expired
 --More-- 
*Mar  1 10:46:22.634: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]" 
 --More-- 
*Mar  1 10:47:22.663: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]" 
 --More-- 
*Mar  1 10:48:01.217: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from EXSTART to DOWN, Neighbor Down: Too many retransmissions
 --More-- 
*Mar  1 10:48:22.691: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]" 
 --More-- 
*Mar  1 10:49:01.220: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from DOWN to DOWN, Neighbor Down: Ignore timer expired
This sequence repeated ad infinitum. Through the console, however, the box was still responsive, so measures to stop the onslaught could have been taken to protect the switch.
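
To put a number on how often an adjacency is cycling, the `%OSPF-5-ADJCHG` lines can be tallied from a saved console capture. This is a rough sketch: the regular expression is only modeled on the messages shown above, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the ADJCHG lines in the capture above (an assumption,
# not an official description of the message format).
ADJCHG = re.compile(
    r"%OSPF-5-ADJCHG: Process (?P<process>\d+), Nbr (?P<nbr>\S+) on (?P<intf>\S+) "
    r"from (?P<old>\S+) to (?P<new>[^,\s]+), (?P<reason>.+)$"
)


def tally_adjacency_changes(lines):
    """Count (neighbor, interface, old state, new state, reason) tuples."""
    counts = Counter()
    for line in lines:
        match = ADJCHG.search(line.strip())
        if match:
            key = (match["nbr"], match["intf"], match["old"], match["new"], match["reason"])
            counts[key] += 1
    return counts


sample = [
    "*Mar  1 10:41:25.889: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from EXSTART to DOWN, Neighbor Down: Too many retransmissions",
    " --More-- ",
    "*Mar  1 10:42:25.893: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from DOWN to DOWN, Neighbor Down: Ignore timer expired",
    "*Mar  1 10:44:38.541: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on Vlan33 from EXSTART to DOWN, Neighbor Down: Too many retransmissions",
]

for key, count in tally_adjacency_changes(sample).items():
    print(count, *key)
```

Pager residue such as the `--More--` lines is skipped because it never matches the pattern.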

Not long thereafter, at 4K external LSAs, the two Mikrotik routers seemed to start working a bit harder. The CPU on the RB750G was running at about 5%, and the poor little RB133 was running at about 20%. Just shy of 8,000 external LSAs the EX2200C made a complaint: Jan 6 13:04:10 EX2200C-3 rpd[1075]: RPD_RT_PREFIX_LIMIT_REACHED: Number of prefixes (8000) in table inet.0 still exceeds or equals configured maximum (8000). Despite its bitching, it continued to process LSAs just fine. Then the first real death happened as I lost the telnet connection to the RB750G. The RB750G was totally unresponsive and I could not even ping it. I power cycled it, and it came back for a while with an ICMP echo-reply, but it never let me log in again. It would reply to my pings for a few seconds, and then it was gone again.

At just shy of 2^14 (16K) LSAs, the EX3200 voiced a complaint: Jan 6 12:11:13 EX3200-2_OSPF rpd[1081]: RPD_RT_PREFIX_LIMIT_REACHED: Number of prefixes (16384) in table inet.0 still exceeds or equals configured maximum (16384)

At 20k LSAs, the Cisco 1760 entered an infinite reboot cycle with the following cry for help:

*Mar  5 12:27:37.998: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x8001FD58, alignment 0 
Pool: Processor  Free: 357580  Cause: Memory fragmentation 
Alternate Pool: I/O  Free: 108940  Cause: Memory fragmentation 

-Process= "OSPF Router 1", ipl= 0, pid= 157
-Traceback= 8000C1CC 8000F868 8001FD5C 811A5B38 811A5D94 811A7610 81434CF8 8143CB18 8143CE78 8143CBB0 8143CC54 8143F950 8143FBB8 81400064 80561688 80565EFC

%Software-forced reload


Unexpected exception to CPUvector 700, PC = 8055FCCC
-Traceback= 8055FCCC 80020D78 8141356C 8142DA8C 813FFFDC 80561688 80565EFC

Writing crashinfo to flash:crashinfo_20020305-122750

=== Flushing messages (12:27:50 UTC Tue Mar 5 2002) ===

Queued messages:
*Mar  5 12:27:50.911: %SYS-3-LOGGER_FLUSHING: System pausing to ensure console debugging output.

*Mar  5 12:27:50.879: %SYS-2-CHUNKNOROOT: Root chunk need to be specified for 8380FC48
-Process= "OSPF Router 1", ipl= 0, pid= 157
-Traceback= 80020D70 8141356C 8142DA8C 813FFFDC 80561688 80565EFC
*** System received a Software forced crash ***
signal= 0x17, code= 0x700, context= 0x8328be48
PC = 0x8055fccc, Vector = 0x700, SP = 0x837f4370

System Bootstrap, Version 12.2(4r)XL, RELEASE SOFTWARE (fc1)
TAC Support: http://www.cisco.com/tac
Copyright (c) 2001 by cisco Systems, Inc.
C1700 platform with 98304 Kbytes of main memory


with 115K 

-Process= "OSPF Router 1", ipl= 0, pid= 157
-Traceback= 8000C1CC 8000F868 8001FD5C 814134C8 8142DA8C 813FFFDC 80561688 80565EFC
%% Low on memory; try again later

%% Low on memory; try again later

*Mar  5 14:56:15.041: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.1 on FastEthernet0/0.317 from EXSTART to DOWN, Neighbor Down: Too many retransmissions
*Mar  5 14:56:17.036: %SYS-2-MALLOCFAIL: Memory allocation of 20000 bytes failed from 0x8001FD58, alignment 0 
Pool: Processor  Free: 294980  Cause: Memory fragmentation 
Alternate Pool: I/O  Free: 52  Cause: Not enough free memory 

-Process= "OSPF Router 1", ipl= 0, pid= 157
-Traceback= 8000C1CC 8000F868 8001FD5C 814134C8 8142DA8C 813FFFDC 80561688 80565EFC

The Cisco 1760 would come back to life, proceed to restuff its LSDB over the top, and shriek the same cry of pain. If you were quick, you could have saved it from its agony from the console port.

At 22K routes, the RB133's CPU pegged at 100%. The CLI through the serial port was completely unresponsive.

[admin@RB133] > /system resource monitor 
          cpu-used: 99
  cpu-used-per-cpu: 99%
       free-memory: 6624
-- [Q quit|D dump|C-z pause]
I'd hit enter, and minutes later I might get another carriage return. After a long while the message "action timed out - try again, if error continues contact MikroTik support and send a supout file (13)" appeared in the midst of my repeated bashing of the "Enter" key. There was no way to save the RB133; it was too slow to do anything.

Then the Netscreen 208 went, displaying by far the bloodiest message. I deleted about three quarters of the contents for brevity. As soon as it came back on the network, it would repeat its cycle of agony.

ns208-> timer is NULL during creation
timer is NULL during creation
timer handler memory allocation failed
timer handler memory allocation failed
frag 7128230: bad pointer 07128230, task unknown (517) =0000000007128230: 00 00 00 00 00 04 22 05  15 a0 00 00 03 03 03 02   ......". ........
07128240: 80 00 00 01 95 db 00 24  ff ff 80 00 80 00 00 00   .......$ ........
07128280: 00 04 22 05 95 df ff ff  03 03 03 02 80 00 00 01   .."..... ........

**** overwriting suspect: 7128130
-- start 0x07128130 0x07127500 0x07128230
-- used block, allocation trace: 004d5954 004d8c10 004dd230
-- size 224, handler 0, task 65, used mask 15
07128130: 00 00 00 ef 19 13 08 41  00 4d 59 54 00 4d 8c 10   .......A .MYT.M..
07128320: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ........ ........
**** end overwrting suspect
###Frag Check, 
Trace:
0021e288 0021e33c 0021e4cc 0021f074 0021d5b0 004eeee8 
004eee0c 004dcac4 004dd404 004e0440 004ed0a4 004ed460 
004fc59c 004ecb10 03257eec 
###severe problem,can't free memory, 
Trace:
0021f08c 0021d5b0 004eeee8 004eee0c 004dcac4 004dd404 
004e0440 004ed0a4 004ed460 004fc59c 004ecb10 03257eec 


***********************
ScreenOS Crash Context:
***********************
  bad:80000008  cnt:982192f6  cmp:9821ca2d  sts:b4009013

-------------
ASIC Context:
-------------
config1 11490093, config2 002b0082
md5 control 00000000
md5 pak base 00000000
des status 00000000
des dma adr 00000000
des dma cnt 00000000
-----------
OS Context:
-----------
Died Flow/bootup Module
Cur Task Context: ospf
------------
Memory Check:
------------
frag 7128130: bad next pointer 07128230, task ospf (65) =0000000007128230: 00 00 00 00 00 04 22 05  15 a0 00 00 03 03 03 02   ......". ........
07128280: b1 80 00 00 03 03 03 02  80 00 00 02 a0 53 00 24   ........ .....S.$

**** overwriting suspect: 7128130
-- start 0x07128130 0x07127500 0x07128230
-- used block, allocation trace: 004d5954 004d8c10 004dd230
-- size 224, handler 0, task 65, used mask 15
07128130: 00 00 00 ef 19 13 08 41  00 4d 59 54 00 4d 8c 10   .......A .MYT.M..
07128320: a0 53 00 24 ff 80 00 00  80 00 00 00 00 00 00 00   .S.$.... ........
**** end overwrting suspect
###Frag Check, 
Trace:
0021e288 0021e36c 0021e4cc 0021f630 0021f54c 0008371c 
00081790 
frag 7128230: bad pointer 07128230, task unknown (517) =0000000007128230: 00 00 00 00 00 04 22 05  15 a0 00 00 03 03 03 02   ......". ........
07128280: b1 80 00 00 03 03 03 02  80 00 00 02 a0 53 00 24   ........ .....S.$

**** overwriting suspect: 7128130
-- start 0x07128130 0x07127500 0x07128230
-- used block, allocation trace: 004d5954 004d8c10 004dd230
-- size 224, handler 0, task 65, used mask 15
07128320: a0 53 00 24 ff 80 00 00  80 00 00 00 00 00 00 00   .S.$.... ........
**** end overwrting suspect
###Frag Check, 
Trace:
0021e288 0021e33c 0021e4cc 0021f630 0021f54c 0008371c 
00081790 
frag 80000000: bad pointer 80000000, task unknown (1536) =0000000380000000: 3c 1a 01 b3 67 5a de 00  af 41 00 00 3c 01 a0 00   <...gZ.. .A..<...
80000050: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ........ ........
Can not find overwrite suspect for 0x80000000
Address out of heap range
###Frag Check, 
Trace:
0021e288 0021e33c 0021e4cc 0021f630 0021f54c 0008371c 
00081790 

80000040: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ........ ........
80000050: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ........ ........
Can not find overwrite suspect for 0x80000000
Address out of heap range
###Frag Check, 
Trace:
0021e288 0021e33c 0021e4cc 0021f630 0021f54c 0008371c 
00081790 

NetScreen NS-200 Boot Loader Version 3.0.0 (Checksum: B48FB1B8)
Copyright (c) 1997-2003 NetScreen Technologies, Inc.

Total physical memory: 128MB
    Test - Pass
    Initialization - Done

Model Number: NS-208

Hit any key to run loader
Hit any key to run loader
Hit any key to run loader
Hit any key to run loader

Loading default system image from on-board flash disk...

Image authenticated!

 Start loading...
....................................................................................................
Done.



Juniper Networks, Inc
NS-200 System Software
Copyright, 1997-2006

Version 5.4.0r18.0
Init Heap (1ebd010/5342bf0, 00000000/00000000)
GT64120 revision id: 0x12
Load NVRAM Information ... (5.4.0)Done
GT64120 revision id: 0x12
Memory Test: b7800000,40000 ....... Done
Install module init vectors
Verify ACL register default value (at hw reset) ... Done
Verify ACL register read/write ... Done
Verify ACL rule read/write ... Done
Verify ACL rule search ... Done
MD5("a") = 0cc175b9 c0f1b6a8 31c399e2 69772661
MD5("abc") = 90015098 3cd24fb0 d6963f7d 28e17f72
MD5("message digest") = f96b697d 7cb7938d 525a2f31 aaf161d0
Verify DES register read/write ... Done
Install modules (00db4000,0197fb38) ... 

Initializing DI 1.1.0-ns
load dns table : dns table file does not exist.
System config (1387 bytes) loaded
.
Done.
Load System Configuration ...............................................................................................................................Enabled licensekey auto update
.....................................Done
system init done..
login: ethernet1 interface change physical state to Up
ethernet2 interface change physical state to Up
System change state to Active(1)

Once we had the Cisco 3750 causing retransmissions, and the C1760 and NS208 constantly rebooting, the tiny 17-node (21 minus 2x Olive and 2x Mikrotik) OSPF network really seemed to drop into chaos. Despite all of this, the J2300 continued to pump more and more external LSAs into the backbone area. Shortly thereafter, at about 25K routes, the 3640 seemed to give up on life, displaying the following message before it rebooted:

rPool: Processor  Free: 109816  Cause: Memory fragmentation 
Alternate Pool: None  Free: 0  Cause: No Alternate pool 
 -Process= "IP RIB Update", ipl= 3, pid= 76 -Traceback= 0x60A9A5A8 0x61074344 0x61079100 0x6107A530 0x61084698 0x61085728 0x6042E188 0x60400780 0x60405CD8 0x6040D42C 0x6040EAA0 0x604161C0 0x60EA9A84 0x60E7AB74 0x61023964 0x61023948
Jan  6 12:08:58.107: %FIB-3-NOMEM: Malloc Failure, disabling CEF -Traceback= 0x60A9A5A8 0x604007DC 0x60405CD8 0x6040D42C 0x6040EAA0 0x604161C0 0x60EA9A84 0x60E7AB74 0x61023964 0x61023948
Jan  6 12:09:03.399: %FIB-3-NOMEM: Malloc Failure, disabling CEF -Traceback= 0x60A9A5A8 0x604007DC 0x60405CD8 0x6040D42C 0x6040EAA0 0x604161C0 0x60EA9A84 0x60E7AB74 0x61023964 0x61023948
Jan  6 12:09:06.475: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet0/0.33 from FULL to DOWN, Neighbor Down: Dead timer expired
Jan  6 12:09:13.855: %FIB-3-NOMEM: Malloc Failure, disabling CEF -Traceback= 0x60A9A5A8 0x604007DC 0x60405CD8 0x6040D42C 0x6040EAA0 0x604161C0 0x60EA9A84 0x60E7AB74 0x61023964 0x61023948
Jan  6 12:09:19.527: %FIB-3-NOMEM: Malloc Failure, disabling CEF -Traceback= 0x60A9A5A8 0x604007DC 0x60405CD8 0x6040D42C 0x6040EAA0 0x604161C0 0x60EA9A84 0x60E7AB74 0x61023964 0x61023948
Jan  6 12:09:28.239: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x61084834, alignment 0 
Pool: Processor  Free: 1014468  Cause: Memory fragmentation 
Alternate Pool: None  Free: 0  Cause: No Alternate pool 
 -Process= "OSPF-1 Router", ipl= 0, pid= 30 -Traceback= 0x60A9A5A8 0x61074344 0x61079100 0x610795FC 0x6108483C 0x61085728 0x60FDE378 0x60FDFABC 0x60FFA4A4 0x60FC8AEC 0x61023964 0x61023948
Jan  6 12:10:44.551: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.5 on FastEthernet0/0.2 from EXCHANGE to DOWN, Neighbor Down: Dead timer expired
Jan  6 12:11:22.116: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.4 on FastEthernet0/0.2 from LOADING to FULL, Loading Done
C3640-1#

After its first reboot, the 3640 just added to the network churn by blowing its memory bounds time after time, dropping all of its neighbors, clearing out its RAM, and starting the process all over again:

Alternate Pool: None  Free: 0  Cause: No Alternate pool 
 -Process= "OSPF-1 Router", ipl= 0, pid= 30 -Traceback= 0x60A9A5A8 0x61074344 0x61079100 0x610795FC 0x6108483C 0x61085728 0x60FF5B84 0x60FF803C 0x60FF8684 0x60FC931C 0x61023964 0x61023948
Jan  6 18:16:15.525: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x61084834, alignment 0 
Pool: Processor  Free: 53296  Cause: Memory fragmentation 
Alternate Pool: None  Free: 0  Cause: No Alternate pool 
 -Process= "OSPF-1 Router", ipl= 0, pid= 30 -Traceback= 0x60A9A5A8 0x61074344 0x61079100 0x610795FC 0x6108483C 0x61085728 0x60FF5B84 0x60FF803C 0x60FF8684 0x60FC931C 0x61023964 0x61023948
Jan  6 18:16:46.389: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x61084834, alignment 0 
Pool: Processor  Free: 53296  Cause: Memory fragmentation 
Alternate Pool: None  Free: 0  Cause: No Alternate pool 
 -Process= "OSPF-1 Router", ipl= 0, pid= 30 -Traceback= 0x60A9A5A8 0x61074344 0x61079100 0x610795FC 0x6108483C 0x61085728 0x60FF5B84 0x60FF803C 0x60FF8684 0x60FC931C 0x61023964 0x61023948
Jan  6 18:16:55.553: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.5 on FastEthernet0/0.2 from FULL to DOWN, Neighbor Down: Too many retransmissions
Jan  6 18:17:56.430: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.5 on FastEthernet0/0.2 from DOWN to DOWN, Neighbor Down: Ignore timer expired
Jan  6 18:18:07.722: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.5 on FastEthernet0/0.2 from LOADING to FULL, Loading Done

At about 30k external LSAs, the EX2200C stepped up its complaints by oversubscribing its FIB:

Jan  6 13:21:58  EX2200C-3 rpd[1075]: RPD_SCHED_SLIP: 7 sec scheduler slip, user: 5 sec 289760 usec, system: 0 sec, 0 usec
Jan  6 13:22:07  EX2200C-3 fpc0 Failed to Add the IPv4 Uc prefix: lpm-idx 0, vrf-idx 0, prefix/len 31.176.0.0/16 (cstatus: 65565) 
Jan  6 13:22:07  EX2200C-3 /kernel: RT_PFE: RT msg op 3 (PREFIX CHANGE) failed, err 5 (Invalid)
Jan  6 13:22:07  EX2200C-3 fpc0 Failed to h/w update ip uc route entry (status: 22)
Jan  6 13:22:07  EX2200C-3 fpc0 Failed to install the RT entry (status: 22)
Jan  6 13:22:07  EX2200C-3 fpc0 RT-HAL,rt_entry_add_msg_proc,2882: rt_halp_vectors->rt_create failed 
Jan  6 13:22:07  EX2200C-3 fpc0 RT-HAL,rt_entry_add_msg_proc,2942: proto ipv4,len 16 prefix 31.176/16 nh 1329 
Jan  6 13:22:07  EX2200C-3 fpc0 RT-HAL,rt_msg_handler,601: route process failed 
Jan  6 13:22:07  EX2200C-3 fpc0 Failed to Add the IPv4 Uc prefix: lpm-idx 0, vrf-idx 0, prefix/len 36.246.0.0/15 (cstatus: 65565) 
Jan  6 13:22:07  EX2200C-3 fpc0 Failed to h/w update ip uc route entry (status: 22)
Jan  6 13:22:07  EX2200C-3 fpc0 Failed to install the RT entry (status: 22)
Jan  6 13:22:07  EX2200C-3 fpc0 RT-HAL,rt_entry_add_msg_proc,2882: rt_halp_vectors->rt_create failed 
Jan  6 13:22:07  EX2200C-3 fpc0 RT-HAL,rt_entry_add_msg_proc,2942: proto ipv4,len 15 prefix 36.246/15 nh 1329 
Jan  6 13:22:07  EX2200C-3 fpc0 RT-HAL,rt_msg_handler,601: route process failed 
Jan  6 13:22:08  EX2200C-3 fpc0 Failed to Add the IPv4 Uc prefix: lpm-idx 0, vrf-idx 0, prefix/len 46.144.64.0/18 (cstatus: 65565) 
Jan  6 13:22:08  EX2200C-3 fpc0 Failed to h/w update ip uc route entry (status: 22)
Jan  6 13:22:08  EX2200C-3 /kernel: RT_PFE: RT msg op 3 (PREFIX CHANGE) failed, err 5 (Invalid)
Jan  6 13:22:08  EX2200C-3 /kernel: RT_PFE: RT msg op 3 (PREFIX CHANGE) failed, err 5 (Invalid)
Jan  6 13:22:08  EX2200C-3 fpc0 Failed to install the RT entry (status: 22)
Jan  6 13:22:08  EX2200C-3 fpc0 RT-HAL,rt_entry_add_msg_proc,2882: rt_halp_vectors->rt_create failed 
Jan  6 13:22:08  EX2200C-3 fpc0 RT-HAL,rt_entry_add_msg_proc,2942: proto ipv4,len 18 prefix 46.144.64/18 nh 1329 
Jan  6 13:22:08  EX2200C-3 fpc0 RT-HAL,rt_msg_handler,601: route process failed 
Jan  6 13:22:08  EX2200C-3 fpc0 Failed to Add the IPv4 Uc prefix: lpm-idx 0, vrf-idx 0, prefix/len 47.41.15.240/29 (cstatus: 65565) 

To its credit, the little EX2200C kept running just fine. It was easily accessible the whole time.

At 40k, the SRX100B started to bitch about FIB space:

Jan  6 13:37:34  SRX100-5_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:37:34  SRX100-5_OSPF /kernel: RT_PFE: RT msg op 1 (PREFIX ADD) failed, err 5 (Invalid)
Jan  6 13:37:34  SRX100-5_OSPF last message repeated 13 times
Jan  6 13:39:29  SRX100-5_OSPF last message repeated 1098 times 
Jan  6 13:45:37  SRX100-5_OSPF last message repeated 3391 times
Jan  6 13:45:39  SRX100-5_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:45:39  SRX100-5_OSPF /kernel: RT_PFE: RT msg op 1 (PREFIX ADD) failed, err 5 (Invalid)
Jan  6 13:46:05  SRX100-5_OSPF last message repeated 313 times
Jan  6 13:46:46  SRX100-5_OSPF last message repeated 335 times
Jan  6 13:46:48  SRX100-5_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:46:48  SRX100-5_OSPF /kernel: RT_PFE: RT msg op 1 (PREFIX ADD) failed, err 5 (Invalid)
Jan  6 13:47:03  SRX100-5_OSPF last message repeated 210 times
Jan  6 13:47:49  SRX100-5_OSPF last message repeated 424 times
Jan  6 13:47:55  SRX100-5_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:47:55  SRX100-5_OSPF /kernel: RT_PFE: RT msg op 1 (PREFIX ADD) failed, err 5 (Invalid)
Jan  6 13:48:01  SRX100-5_OSPF last message repeated 107 times

The Cisco 2811, feeling left out, decided to get into the action at about 75k routes. It started to show signs of trouble by flapping adjacencies and reporting corrupted LSAs:

*Jan  6 12:43:24.690: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.6 on FastEthernet0/0.66 from INIT to DOWN, Neighbor Down: Dead timer expired
*Jan  6 12:43:43.990: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256
*Jan  6 12:43:50.558: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256
*Jan  6 12:43:57.226: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256
*Jan  6 12:44:03.486: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256
*Jan  6 12:44:09.890: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256
*Jan  6 12:44:16.114: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.6 on FastEthernet0/0.66 from INIT to DOWN, Neighbor Down: Dead timer expired
*Jan  6 12:44:16.590: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256
*Jan  6 12:44:23.354: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256
*Jan  6 12:44:30.014: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256
*Jan  6 12:44:36.594: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256
*Jan  6 12:44:42.918: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256
*Jan  6 12:44:49.610: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.5.6.5, FastEthernet0/0.256
*Jan  6 12:45:03.410: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.0.0.5, FastEthernet0/0.2
*Jan  6 12:45:10.130: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.0.0.5, FastEthernet0/0.2
*Jan  6 12:45:16.602: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.0.0.5, FastEthernet0/0.2
*Jan  6 12:45:24.426: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.0.0.5, FastEthernet0/0.2
*Jan  6 12:45:25.810: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.6 on FastEthernet0/0.66 from INIT to DOWN, Neighbor Down: Dead timer expired
*Jan  6 12:45:31.406: %OSPF-4-BADLSATYPE: Invalid lsa: dbd Type 0, Length 770, LSID 0.0.0.0 from 0.6.34.5, 2.0.0.5, FastEthernet0/0.2
*Jan  6 12:46:17.910: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.6 on FastEthernet0/0.66 from INIT to DOWN, Neighbor Down: Dead timer expired
*Jan  6 12:47:27.910: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.5 on FastEthernet0/0.256 from EXCHANGE to DOWN, Neighbor Down: Dead timer expired
*Jan  6 12:47:27.966: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.5 on FastEthernet0/0.2 from EXCHANGE to DOWN, Neighbor Down: Dead timer expired
*Jan  6 12:47:29.466: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.7 on FastEthernet0/0.267 from INIT to DOWN, Neighbor Down: Dead timer expired
*Jan  6 12:47:32.530: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.7 on FastEthernet0/0.2 from INIT to DOWN, Neighbor Down: Dead timer expired

Then it blew its memory bounds:

*Jan  6 12:48:25.446: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x43984DE0, alignment 0 
Pool: Processor  Free: 398876  Cause: Memory fragmentation 
Alternate Pool: None  Free: 0  Cause: No Alternate pool 
 -Process= "OSPF-1 Router", ipl= 0, pid= 316 -Traceback= 0x439643A8z 0x43982348z 0x42F905ECz 0x42F932E4z 0x42F98E34z 0x42F8A2F0z 0x4393A798z 0x4393A77Cz
*Jan  6 12:48:55.230: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.5 on FastEthernet0/0.256 from LOADING to FULL, Loading Done
*Jan  6 12:49:27.978: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.7 on FastEthernet0/0.267 from INIT to DOWN, Neighbor Down: Dead timer expired
*Jan  6 12:49:35.278: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x43984B44, alignment 16 
Pool: Processor  Free: 1717756  Cause: Memory fragmentation 
Alternate Pool: None  Free: 0  Cause: No Alternate pool 
 -Process= "IP RIB Update", ipl= 3, pid= 184 -Traceback= 0x43968B0Cz 0x439820ACz 0x41FD132Cz 0x41FD3104z 0x41FD3758z 0x41FCC360z 0x41F8FECCz 0x41F51254z 0x41F93B90z 0x41F93DA8z 0x41F97E68z 0x41F9808Cz 0x4203CE70z 0x4203CFC4z 0x42695898z 0x4268B364z
*Jan  6 12:49:35.798: %COMMON_FIB-3-NOMEM: Memory allocation failure for validating prefix in IPv4 CEF [0x41F37C70] (fatal) (2436 subsequent failures).
*Jan  6 12:49:35.798: %COMMON_FIB-4-DISABLING: IPv4 CEF is being disabled due to a fatal error.

And it kept doing this for the rest of the run:

*Jan  6 18:12:47.856: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x43984DE0, alignment 0 
Pool: Processor  Free: 1004388  Cause: Memory fragmentation 
Alternate Pool: None  Free: 0  Cause: No Alternate pool 
 -Process= "OSPF-1 Router", ipl= 0, pid= 316 -Traceback= 0x439643A8z 0x43982348z 0x42F95958z 0x42F99918z 0x42F9A118z 0x42F850A0z 0x42F8A870z 0x4393A798z 0x4393A77Cz
*Jan  6 18:13:17.864: %SYS-2-MALLOCFAIL: Memory allocation of 5000 bytes failed from 0x43984DE0, alignment 0 
Pool: Processor  Free: 1001384  Cause: Memory fragmentation 
Alternate Pool: None  Free: 0  Cause: No Alternate pool 
 -Process= "OSPF-1 Hello", ipl= 0, pid= 12 -Traceback= 0x439643A8z 0x43982348z 0x42F950C8z 0x42F95B74z 0x42F8CA50z 0x42F8CC2Cz 0x42FB9478z 0x42F857C0z 0x42F861B8z 0x4393A798z 0x4393A77Cz
*Jan  6 18:13:22.644: %LICENSE-2-VLS_ERROR: 'VLSnotifyBirthAndExpiryEvents' failed with an error - rc = 13 - 'Error[13]: Severe internal error in licensing or accessing feature UNKNOWN.
' -Traceback= 0x4393A798z 0x4393A77Cz
*Jan  6 18:13:47.876: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x43984DE0, alignment 0 
Pool: Processor  Free: 1004116  Cause: Memory fragmentation 
Alternate Pool: None  Free: 0  Cause: No Alternate pool 
 -Process= "OSPF-1 Router", ipl= 0, pid= 316 -Traceback= 0x439643A8z 0x43982348z 0x42F95958z 0x42F99918z 0x42F9A118z 0x42F850A0z 0x42F8A870z 0x4393A798z 0x4393A77Cz
*Jan  6 18:14:17.876: %SYS-2-MALLOCFAIL: Memory allocation of 5000 bytes failed from 0x43984DE0, alignment 0 
Pool: Processor  Free: 968736  Cause: Memory fragmentation 
Alternate Pool: None  Free: 0  Cause: No Alternate pool 
 -Process= "OSPF-1 Router", ipl= 0, pid= 316 -Traceback= 0x439643A8z 0x43982348z 0x42FA4E38z 0x42FA57BCz 0x42FA5D44z 0x42F9A15Cz 0x42F850A0z 0x42F8A870z 0x4393A798z 0x4393A77Cz
*Jan  6 18:14:22.644: %LICENSE-2-VLS_ERROR: 'VLSnotifyBirthAndExpiryEvents' failed with an error - rc = 13 - 'Error[13]: Severe internal error in licensing or accessing feature UNKNOWN.
' -Traceback= 0x4393A798z 0x4393A77Cz
*Jan  6 18:14:47.904: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x43984DE0, alignment 0 
Pool: Processor  Free: 968736  Cause: Memory fragmentation 
Alternate Pool: None  Free: 0  Cause: No Alternate pool 
 -Process= "OSPF-1 Router", ipl= 0, pid= 316 -Traceback= 0x439643A8z 0x43982348z 0x42F95958z 0x42F99918z 0x42F9A118z 0x42F850A0z 0x42F8A870z 0x4393A798z 0x4393A77Cz
*Jan  6 18:15:17.944: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x43984DE0, alignment 0 
Pool: Processor  Free: 1010156  Cause: Memory fragmentation 
Alternate Pool: None  Free: 0  Cause: No Alternate pool 
 -Process= "OSPF-1 Router", ipl= 0, pid= 316 -Traceback= 0x439643A8z 0x43982348z 0x42F95958z 0x42F99918z 0x42F9A118z 0x42F850A0z 0x42F8A870z 0x4393A798z 0x4393A77Cz
*Jan  6 18:15:22.648: %LICENSE-2-VLS_ERROR: 'VLSnotifyBirthAndExpiryEvents' failed with an error - rc = 13 - 'Error[13]: Severe internal error in licensing or accessing feature UNKNOWN.
' -Traceback= 0x4393A798z 0x4393A77Cz

Somewhere in this range, the first open source implementation showed its first serious problems, with (surprise, surprise) XORP giving up. It dropped all of its neighbors and never re-formed any adjacencies.

[ 2013/01/06 13:21:12.16138 INFO xorp_rib RIB ] Received death event for protocol ospfv2 shutting down -------
OriginTable: ospf
IGP
next table = Redist:ospf
[ 2013/01/06 13:21:12.16211 INFO xorp_rib RIB ] Received death event for protocol ospfv2 shutting down -------
OriginTable: ospf
IGP
next table = Redist:ospf
[ 2013/01/06 13:21:12.16289 INFO xorp_rib RIB ] Received death event for protocol ospfv2 shutting down -------
OriginTable: ospf
IGP
next table = Redist:ospf
[ 2013/01/06 13:21:12.16384 INFO xorp_rib RIB ] Received death event for protocol ospfv2 shutting down -------
OriginTable: ospf
IGP
next table = Redist:ospf

Then the SRX100B joined the cycling reboot club due to the hardware watchdog resetting the box.

Jan  6 15:50:21  SRX100-5_OSPF init: ipmi (PID 0) started
panic: Hardware watchdog timeout
cpuid = 0
KDB: stack backtrace:
SP 0x0: not in kernel
uart_z8530_class+0x0 (0,0,0,0) ra 0 sz 0
pid 22, process: idle: cpu0
Uptime: 1d3h32m33s
Cannot dump. No dump device defined.
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset: Stopping other CPUs


U-Boot 1.1.6 (Build time: Dec 12 2009 - 17:17:55)

SRX_100_LOWMEM board revision major:0, minor:0, serial #: AT3809AF0822
OCTEON CN5020-SCP pass 1.1, Core clock: 500 MHz, DDR clock: 266 MHz (532 Mhz data rate)
DRAM:  512 MB
Starting Memory POST... 
Checking datalines... OK

Somewhere in the 80K to 115K range, the EX3200 started to choke, gagged, and then dropped a core:

Jan  6 13:40:02  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:02  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:03  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:03  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:03  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:03  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:04  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:04  EX3200-2_OSPF /kernel: last message repeated 4 times
Jan  6 13:40:04  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:04  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:05  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:05  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:06  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:06  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:06  EX3200-2_OSPF /kernel: last message repeated 1 times
Jan  6 13:40:06  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:07  EX3200-2_OSPF last message repeated 2 times
Jan  6 13:40:07  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:08  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:08  EX3200-2_OSPF /kernel: last message repeated 2 times
Jan  6 13:40:08  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:08  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:09  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:09  EX3200-2_OSPF /kernel: last message repeated 1 times
Jan  6 13:40:09  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:09  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:10  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:10  EX3200-2_OSPF /kernel: last message repeated 2 times
Jan  6 13:40:10  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:10  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:11  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:11  EX3200-2_OSPF /kernel: last message repeated 4 times
Jan  6 13:40:11  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:11  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:12  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:12  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:12  EX3200-2_OSPF /kernel: last message repeated 8 times
Jan  6 13:40:12  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:12  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:12  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:13  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:13  EX3200-2_OSPF /kernel: last message repeated 3 times
Jan  6 13:40:13  EX3200-2_OSPF /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
Jan  6 13:40:13  EX3200-2_OSPF fpc0 RT-HAL,rt_msg_handler,586: route check failed 22 
Jan  6 13:40:14  EX3200-2_OSPF last message repeated 34 times
Jan  6 13:41:04  EX3200-2_OSPF /kernel: Percentage memory available(19)less than threshold(20 %)- 3
Jan  6 13:41:16  EX3200-2_OSPF rpd[12424]: RPD_OSPF_NBRDOWN: OSPF neighbor 2.0.0.3 (realm ospf-v2 ge-0/0/1.2 area 0.0.0.0) state changed from ExStart to Init due to 1WayRcvd (event reason: neighbor is in one-way mode)
Jan  6 13:42:33  EX3200-2_OSPF rpd[12424]: RPD_OSPF_NBRDOWN: OSPF neighbor 2.0.0.6 (realm ospf-v2 ge-0/0/1.2 area 0.0.0.0) state changed from ExStart to Down due to InActiveTimer (event reason: neighbor was inactive and declared dead)
Jan  6 13:43:15  EX3200-2_OSPF /kernel: Process (12424,rpd) has exceeded 85% of RLIMIT_DATA: used 57936 KB Max 65536 KB
Jan  6 13:43:16  EX3200-2_OSPF rpd[12424]: RPD_RT_PREFIX_LIMIT_REACHED: Number of prefixes (16384) in table inet.0 reached configured maximum (16384)
Jan  6 13:43:18  EX3200-2_OSPF /kernel: Process (12424,rpd) attempted to exceed RLIMIT_DATA: attempted 66128 KB Max 65536 KB
Jan  6 13:43:36  EX3200-2_OSPF init: routing (PID 12424) terminated by signal number 6. Core dumped!
Jan  6 13:43:36  EX3200-2_OSPF init: routing (PID 12572) started
Jan  6 13:43:38  EX3200-2_OSPF rpd[12572]: L2CKT acquiring mastership for primary
Jan  6 13:43:38  EX3200-2_OSPF rpd[12572]: L2VPN acquiring mastership for primary
Jan  6 13:43:38  EX3200-2_OSPF rpd[12572]: RPD_KRT_KERNEL_BAD_ROUTE: KRT: lost ifl 0 for route 192.168.0.0
Jan  6 13:43:38  EX3200-2_OSPF rpd[12572]: RPD_TASK_BEGIN: Commencing routing updates, version 12.2R2.4, built 2012-11-15 17:42:45 UTC by builder
Jan  6 13:44:05  EX3200-2_OSPF dumpd: Core and context for rpd saved in /var/tmp/rpd.core-tarball.1.tgz

This also proved to be a cycle: overload the LSDB, drop a core, restart rpd, repeat.

The rest of the routers seemed to keep chugging along despite all of the chaos. The J2300 was doing double duty just fine, the SRXs with 1GB of RAM chugged along, and BIRD, Quagga, Vyatta and OpenOSPFd seemed to be keeping up fine. However, BIRD was taking from 20% to 40% of the CPU and OpenOSPFd was edging closer to 60%.

When there were about 180K external LSAs floating around, it didn't look like anything exciting was going to happen for quite a while. exaBGP had been feeding prefixes to the J2300 for about 5 hours. The 1GB routers (except XORP) just seemed to be cruising along, despite the chaos introduced by the recycling group. At this point there was about 1 Mbps of traffic on every link, and this was pure OSPF traffic! There was no other traffic on the network besides the link-state routing protocol.

At this point I decided to kill the BGP connection to the J2300. exaBGP sent a cease message to the J2300, causing it to drop all of its BGP routes and start to pull back all of the external Type-5 LSAs it originated by aging them out.

Sun, 06 Jan 2013 20:39:18 INFO     6481   message       Peer     172.20.1.23 ASN 65066   >> KEEPALIVE (no more UPDATE and no EOR)
Sun, 06 Jan 2013 20:39:18 INFO     6481   message       Peer     172.20.1.23 ASN 65066   Sending Notification (6,3) [Cease: Peer De-configured]  
Sun, 06 Jan 2013 20:39:18 INFO     6481   supervisor    Performing dynamic route update
Sun, 06 Jan 2013 20:39:18 INFO     6481   supervisor    Updated peers dynamic routes successfully
Sun, 06 Jan 2013 20:39:18 INFO     6481   processes     Terminating process service-1
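
As a rough illustration of what "aging them out" means, here is a minimal Python sketch of premature aging as RFC 2328 describes it: the originator sets the LS age of its own LSAs to MaxAge (3600 seconds) and refloods them, so every neighbor drops them from its LSDB. This is not what Junos actually runs; the `ExternalLSA` class, the `flush_self_originated` helper and the router IDs are hypothetical names made up for the example, and the prefixes are simply borrowed from the EX2200C log above.

```python
from dataclasses import dataclass

MAX_AGE = 3600  # seconds; the MaxAge architectural constant from RFC 2328


@dataclass
class ExternalLSA:
    """Hypothetical, stripped-down stand-in for a Type-5 LSA."""
    prefix: str
    advertising_router: str
    age: int = 0


def flush_self_originated(lsdb, router_id):
    """Prematurely age every LSA originated by router_id.

    Returns the LSAs that now need to be reflooded so that the
    neighbors remove them from their own databases.
    """
    to_reflood = []
    for lsa in lsdb:
        if lsa.advertising_router == router_id and lsa.age < MAX_AGE:
            lsa.age = MAX_AGE
            to_reflood.append(lsa)
    return to_reflood


# Router IDs 7.7.7.7 and 9.9.9.9 are assumptions for the example only.
lsdb = [
    ExternalLSA("31.176.0.0/16", "7.7.7.7", age=120),
    ExternalLSA("36.246.0.0/15", "7.7.7.7", age=1500),
    ExternalLSA("46.144.64.0/18", "9.9.9.9", age=300),
]
flushed = flush_self_originated(lsdb, "7.7.7.7")
print(f"{len(flushed)} LSAs set to MaxAge and queued for reflooding")
```

The point of the sketch is that a withdrawal is not free: every flushed LSA is one more update that has to be flooded and acknowledged on every adjacency.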

This introduced a lot more LSA updates as the J2300 sent out updates for all of the external LSAs. Traffic went up from about 1 Mbps to somewhere around 3 Mbps. The hardware-based routers all seemed to be able to delete LSAs at about the same pace, but the open source guys had a harder time keeping up. Quagga and Vyatta (which uses Quagga's daemons) typically lagged about 10K LSAs behind the number that the J2300 still had left. BIRD was a bit further behind, but was definitely working harder than the Quaggites. OpenOSPFd was really lagging, as ospfd was really sucking up the CPU time.

load averages:  0.58,  0.63,  0.52                           openospfd 17:26:35
26 processes:  25 idle, 1 on processor
CPU states:  8.4% user,  0.0% nice, 20.0% system,  2.0% interrupt, 69.7% idle
Memory: Real: 163M/260M act/tot Free: 723M Cache: 22M Swap: 0K/17M

  PID USERNAME PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
22718 _ospfd     2    0  145M  124M sleep     kqread   30:53 17.92% ospfd
 2618 root       2    0   21M   22M sleep     kqread   11:07  7.42% ospfd
 9004 _ospfd     2    0 2664K 2324K sleep     kqread    4:49  1.27% ospfd
 9106 _pflogd    4    0  656K  376K sleep     bpf       0:01  0.00% pflogd
    1 root      10    0  480K  420K idle      wait      0:01  0.00% init
12734 root       2    0 1732K 1740K sleep     select    0:01  0.00% sendmail
14388 _syslogd   2    0  672K  856K idle      poll      0:00  0.00% syslogd
17673 named      2    0 7632K 8424K idle      select    0:00  0.00% named
31083 root      18    0  720K  612K sleep     pause     0:00  0.00% ksh
26649 root       2    0  548K 1016K idle      select    0:00  0.00% cron
12361 root      28    0  636K 1724K onproc    -         0:00  0.00% top
 9911 _iked      2    0 1624K 1076K idle      kqread    0:00  0.00% iked
15138 root       2    0 1912K 1208K idle      kqread    0:00  0.00% iked
25883 root       2    0  592K  532K idle      netio     0:00  0.00% pflogd
 4978 root       2    0 2124K  996K idle      netio     0:00  0.00% named
24305 root       3    0  356K  912K idle      ttyin     0:00  0.00% getty
31626 root       2    0  652K  848K idle      netio     0:00  0.00% syslogd

When the rest of the "live" routers had between 80K and 90K LSAs left, ospfd on OpenBSD still had over 144,000. Compare the OSPF summary on the OpenOSPFd box to the one on the J2300 at the same time:

# ospfctl show  
Router ID: 1.1.1.2
Uptime: 08:41:45
RFC1583 compatibility flag is disabled
SPF delay is 1000 msec(s), hold time between two SPFs is 5000 msec(s)
Number of external LSA(s) 144785 (Checksum sum 0x1bd1c442)
Number of areas attached to this router: 1

Area ID: 0.0.0.0
  Number of interfaces in this area: 5
  Number of fully adjacent neighbors in this area: 5
  SPF algorithm executed 798 time(s)
  Number LSA(s) 22 (Checksum sum 0x94514)

juniper@J2300-7> show ospf database summary    
Area 0.0.0.0:
   15 Router LSAs
   10 Network LSAs
Externals:
   81534 Extern LSAs
Interface fe-0/0/0.22:
Area 0.0.0.0:
Interface fe-0/0/0.3:
Area 0.0.0.0:
Interface fe-0/0/0.312:
Area 0.0.0.0:
Interface fe-0/0/0.323:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:

[edit]
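
To put a number on that gap, a small Python sketch like the one below can pull the external LSA counts out of the two summaries and subtract them. The regular expressions are only matched against the two lines shown above; I am assuming the output format stays the same across versions, and the function names are made up for the example.

```python
import re

# Lines copied from the two summaries above.
OPENOSPFD_SUMMARY = "Number of external LSA(s) 144785 (Checksum sum 0x1bd1c442)"
JUNOS_SUMMARY = "   81534 Extern LSAs"


def openospfd_external_count(text):
    """Extract the external LSA count from `ospfctl show` output."""
    match = re.search(r"Number of external LSA\(s\)\s+(\d+)", text)
    if match is None:
        raise ValueError("no external LSA count found in ospfctl output")
    return int(match.group(1))


def junos_external_count(text):
    """Extract the external LSA count from `show ospf database summary`."""
    match = re.search(r"(\d+)\s+Extern LSAs", text)
    if match is None:
        raise ValueError("no external LSA count found in Junos output")
    return int(match.group(1))


lsa_lag = openospfd_external_count(OPENOSPFD_SUMMARY) - junos_external_count(JUNOS_SUMMARY)
print(f"OpenOSPFd is holding {lsa_lag} more external LSAs than the J2300")
```

With the numbers above, OpenOSPFd was still carrying 63,251 external LSAs that the originating router had already withdrawn.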

At the same time, OpenOSPFd was hogging all of the CPU time on the KVM host. OpenOSPFd is running within PID 7655, BIRD is 2066, followed by Vyatta and Quagga:

top - 18:57:37 up 1 day,  7:44,  9 users,  load average: 0.91, 1.02, 1.14
Tasks: 213 total,   2 running, 211 sleeping,   0 stopped,   0 zombie
Cpu(s): 15.1%us,  0.4%sy,  0.0%ni, 84.0%id,  0.3%wa,  0.2%hi,  0.1%si,  0.0%st
Mem:  16252540k total, 11151688k used,  5100852k free,    19904k buffers
Swap: 14352380k total,   109464k used, 14242916k free,  5687716k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 7655 qemu      20   0 1362m 1.0g   9m S 112.1  6.5 186:24.09 qemu-kvm          
 2066 qemu      20   0 4247m 911m  10m S 15.6  5.7 128:10.85 qemu-kvm           
 2379 qemu      20   0 5346m 431m   9m S  5.0  2.7  33:07.99 qemu-kvm           
18945 qemu      20   0 1365m 953m   9m S  3.0  6.0  28:58.08 qemu-kvm           
 1056 root      20   0 1011m  24m 7028 S  0.3  0.2   5:27.83 libvirtd           
 2333 qemu      20   0 4782m 860m   9m S  0.3  5.4  26:31.96 qemu-kvm           
 2381 root      20   0     0    0    0 S  0.3  0.0   1:51.12 vhost-2379         
    1 root      20   0 67116  25m 2084 S  0.0  0.2   0:01.08 systemd            
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kthreadd           
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.48 ksoftirqd/0        
    5 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/0:0H       
    7 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/u:0H       
    8 root      RT   0     0    0    0 S  0.0  0.0   0:00.11 migration/0        
    9 root      RT   0     0    0    0 S  0.0  0.0   0:00.13 watchdog/0         
   10 root      RT   0     0    0    0 S  0.0  0.0   0:00.06 migration/1        
   12 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/1:0H       
   13 root      20   0     0    0    0 S  0.0  0.0   0:00.40 ksoftirqd/1   

It took about 35 minutes for all of the LSAs to get cleared out, and everyone except the XORP box and the Olives eventually came back into the network. I was really quite surprised at how long it took to clear everything out of the LSDB. Working on real networks that keep external routes to a bare minimum, OSPF always seemed very, very quick to me. With tens of thousands of external LSAs, it brought a rather small network with no traffic to its knees, took forever to respond, and could never really converge fully. BGP is really damn quick in comparison on this scale!

Juniper is also pretty kind with the reflooding of LSAs that a Junos box originates, only refreshing them every 50 minutes. Had this been a Cisco, the flooding would have happened even more often, at 30 minutes after an LSA's birthday. I'm not sure which is better on a network with a boatload of external advertisements to support. On one hand, the shorter interval means more flooding due to the more frequent refreshing. On the other hand, with the longer interval and some of the slower boxes on the network **cough, cough, OpenOSPFd, cough**, you're more likely to have an LSA actually die of old age, since MaxAge is 60 minutes. Sounds like a fun place to do some tuning! Either way, this was in no way a converged network, and it probably doesn't make much difference.

This also was with only one router injecting external routes, so there really wasn't a lot of comparison that needed to be done by the non-ASBRs. Adding another redistributing peer or two would really make the workload go up.
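
For a feel of what the refresh interval costs, here is a back-of-the-envelope Python sketch. It assumes every external LSA is the minimum 36 bytes (the 20-byte LSA header plus a 16-byte AS-external body with a single metric, per RFC 2328) and that refreshes are spread evenly over the interval. It ignores IP and OSPF packet headers, acknowledgements and retransmissions, so treat the results as a floor for a single link, not a measurement.

```python
EXTERNAL_LSA_BYTES = 36  # 20-byte LSA header + 16-byte AS-external body (assumed minimum size)


def refresh_load(lsa_count, refresh_minutes):
    """Average refresh load for lsa_count LSAs refreshed once per interval.

    Returns (LSAs per second, LSA payload bits per second).
    """
    seconds = refresh_minutes * 60
    lsas_per_second = lsa_count / seconds
    bits_per_second = lsas_per_second * EXTERNAL_LSA_BYTES * 8
    return lsas_per_second, bits_per_second


# 180K external LSAs, the peak seen in this test, at both refresh intervals.
for label, minutes in (("50-minute refresh", 50), ("30-minute refresh", 30)):
    rate, bps = refresh_load(180_000, minutes)
    print(f"{label}: {rate:.0f} LSAs/s, {bps / 1000:.1f} kbit/s of LSA payload")
```

Under those assumptions the originator has to reissue about 60 LSAs every second at a 50-minute interval and about 100 per second at 30 minutes, and every other router has to receive, acknowledge and reinstall each one.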

Takeaways

For next time

Making Some Adjustments

To prepare for the next OSPF Database Overload, I made some adjustments which should make it a bit easier to keep track of everything, and also to simulate an actual network a little better.

I set up another virtual machine to act as an NTP server to serve time for those that need it (KVMs just get time from the bare metal server). Generally, running an NTP server on a VM is a pretty bad idea, but I'm not really looking for precision, so a few seconds of drift here and there isn't going to make much of a difference. The same KVM is accepting remote syslog messages, so each router can spew complaints across to somebody who will listen, and hopefully all of the errors will wind up in one big file -- if they make it. And to keep an eye on things, the KVM is also running Nagios, which is keeping tabs on all of the test participants with some simple ping probes. Generally ICMP isn't the best protocol to use for this, but it's quick, easy, and supported by every IP implementation, so it should suffice.

Generally speaking, in a real network there are more than just routers. So to simulate what a real enterprise-like network might be doing, each OSPF router participant is now responsible for announcing two user networks. One user network will be advertised as part of the Router LSA the router originates. The other will be announced through some sort of redistribution, so the advertisement is an External LSA of Type 5. The idea here is that one network will be innate to the OSPF router, while the other will have the potential to get lost in the shuffle once we start carelessly dumping thousands of routes into our LSDB. To test if these announcements are accurate, there is a host on each network to simulate an actual user or network device like a stupid printer. Nagios will be keeping an eye on these "hosts" as well with a set of ping probes. The IP addressing for each host network contained in the Router LSA follows the format of <Cluster #>.<Router #>.0.0/25, where the router receives the .1 address and the host is .2. For the host network announced in the External LSA, the network address is <Cluster #>.<Router #>.0.128/25, where the router owns the .129 address and the host is .130. The VM running Nagios was given an interface in each one of the cluster broadcast networks (1.0.0.10/24, 2.0.0.0/24 and 3.0.0.0/24) so it need not "route" to reach each OSPF router. In order to make sure that it is reaching each host network via an OSPF network advertisement (no cheating with static routes), the Nagios box is listening to OSPF LSAs by running an instance of Quagga. Quagga was chosen for ease of configuration, and if it doesn't work out BIRD will be used in its place. The Nagios box received an extra gigabyte of RAM to protect itself from any implosions due to lack of memory. The OSPF cost on each of the Nagios box's links was also maxed out at 65535 to make it look very unattractive for any transit traffic; this is the same methodology behind the "OSPF Overload" command found on many implementations. The OSPF priority on all of the links was set to 0 to keep the Nagios box from ever becoming a DR.
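The addressing plan is regular enough to generate. The sketch below is only an illustration of the scheme described above, not something used in the lab; the function name host_plan and the LSA1/LSA5 labels (borrowed from the interface descriptions in the configs) are assumptions for this example.

```python
import ipaddress


def host_plan(cluster, router):
    """Return {label: (network, router address, host address)} for one router.

    LSA1 is the /25 carried in the Router LSA, LSA5 is the /25 that gets
    redistributed as a Type 5 External LSA.
    """
    plan = {}
    for label, offset in (("LSA1", 0), ("LSA5", 128)):
        net = ipaddress.ip_network(f"{cluster}.{router}.0.{offset}/25")
        router_ip = net.network_address + 1  # .1 or .129
        host_ip = net.network_address + 2    # .2 or .130
        plan[label] = (str(net), str(router_ip), str(host_ip))
    return plan


if __name__ == "__main__":
    # Three clusters of seven routers each, as in the test topology.
    for cluster in range(1, 4):
        for router in range(1, 8):
            for label, (net, rtr, host) in host_plan(cluster, router).items():
                print(f"C{cluster}R{router} {label}: {net} router={rtr} host={host}")
```

Running it lists 42 networks, one simulated host per network, which is handy as a checklist when defining the Nagios ping probes.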

To simulate the hosts, I set up a minimal installation of Microcore Linux in a KVM and cloned it 41 times. The install is on a 24 MB disk and uses 34 MB of RAM. The Microcore KVM for C1R1 is available for download here: vyatta_LSA1.img.bz2 (7.9 MB). It's a standard Microcore install, but it's been modified to autologin as root on a serial console. If you want to use this for your own purposes, remember to edit the /opt/bootlocal.sh and /opt/bootsync.sh files to change the IP address, default gateway and hostname of the image. Don't forget to run the command filetool.sh -b if you want to make your changes permanent. This image boots up in about 2 seconds, but you may need to add a second or two of delay in one of the boot*.sh files if the IP address doesn't seem to be taking. Note also that the root user in the image has no password, and that the virtual machine uses virtio drivers for the NIC, memory and hard disk. An XML dump of the virtual machine definition for libvirt is here: vyatta_LSA1.xml.

The last adjustment was to change all of the Olive interfaces from emulated Intel 10/100 NICs to emulated Intel 10/100/1000 NICs. I did some quick tests and the new interfaces for the Olives seemed to be holding up perfectly. Note that all that was needed to change the config from the old fxp interfaces to the new em interfaces was one command in config mode, followed by a commit: replace pattern fxp with em.

Additional Configuration for Host Network Advertisements

In case you're curious about the configuration behind the new Type 1 and Type 5 LSAs for the host networks, here is the configuration that was added to each router. In general, to announce a route as part of the Router LSA (Type 1), the network and/or interface is added into the router's OSPF configuration. To announce a route as a separate External LSA (Type 5), the network prefix or interface is redistributed into OSPF. Care was taken to use policy or a filter so that only the single /25 network was redistributed, just to keep things clean and consistent.

  1. Cluster 1

    1. Router 1 - Vyatta
    2. Note: Doing this configuration felt like a Junos box and an IOS box had a baby: Junos-like policy syntax used to build an IOS-like route-map.
      interfaces {
          ethernet eth0 {
              address 1.1.0.1/25
              description LSA1
              duplex auto
              hw-id 52:54:00:94:13:84
              smp_affinity auto
              speed auto
          }
          ethernet eth2 {
              address 1.1.0.129/25
              description LSA5
              duplex auto
              hw-id 52:54:00:08:2d:6f
              smp_affinity auto
              speed auto
          }
      }
      policy {
          prefix-list HOST {
              rule 1 {
                  action permit
                  prefix 1.1.0.128/25
              }
          }
          route-map HOST {
              rule 1 {
                  action permit
                  match {
                      ip {
                          address {
                              prefix-list HOST
                          }
                      }
                  }
              }
              rule 2 {
                  action deny
              }
          }
      }
      protocols {
          ospf {
              area 0.0.0.0 {
                  network 1.1.0.0/25
              }
              redistribute {
                  connected {
                      metric-type 2
                      route-map HOST
                  }
              }
          }
      }
      
    3. Router 2 - OpenOSPFd
    4. Note: em1 is the LSA1 interface with IP 1.2.0.1/25, and em2 is the LSA5 interface with IP 1.2.0.129/25. Really easy and minimal config to get this one working properly
      redistribute 1.2.0.128/25
      area 0.0.0.0 {
              interface em1
      }
      
    5. Router 3 - 2GB Olive
    6. Note: Nice, straightforward, and easy to control!
      interfaces {
          em2 {
              unit 0 {
                  family inet {
                      address 1.3.0.1/25;
                  }
              }
          }
          em3 {
              unit 0 {
                  family inet {
                      address 1.3.0.129/25;
                  }
              }
          }
      }
      protocols {
          ospf {
              export EXPORT-LSA5;
              area 0.0.0.0 {
                  interface em2.0 {
                      metric 100;
                  }
              }
          }
      }
      policy-options {
          policy-statement EXPORT-LSA5 {
              term LSA5-HOST {
                  from {
                      protocol direct;
                      route-filter 1.3.0.128/25 exact;
                  }
                  then accept;
              }
              then {
                  metric 100;
              }
          }
      }
      
    7. Router 4 - BIRD
    8. Note: While trying to get BIRD to limit its redistribution of connected routes, I noticed that the BIRD router was not announcing the /32 address for its loopback interface. Getting it to advertise this proved to be non-trivial. I had to bring up a dummy interface, dummy0, in the OS and use it in lieu of the actual loopback interface, lo. Limiting the redistribution also proved to be pretty complex. BIRD proved to be the most difficult implementation to get both networks advertised out properly without announcing anything unintended. However, I did learn that BIRD seems to be really powerful in the policy aspect, able to do a lot of things that no other implementation seems to be able to. eth1 is the LSA1 interface with IP address 1.4.0.1/25.
      filter OSPF_export {
      	if ( source = RTS_DEVICE && net ~ [ 1.4.0.128/25 ] ) then accept;
              reject;
      }
      
      protocol direct {
              interface "*";
      }
      
      protocol ospf OSPFol {
             export filter OSPF_export;
      	area 0.0.0.0 {
      		stub no;
      		interface "dummy0" {
      			stub yes;
      		};
      		interface "eth1" {
      			cost 100;
      		};
      	};
      }
      
      
    9. Router 5 - Quagga
    10. Note: Easy. Configured just like another flavor of IOS.
      !
      interface eth1
       description LSA1
       ip address 1.5.0.1/25
       ip ospf cost 100
       ipv6 nd suppress-ra
      !
      interface eth2
       description LSA5
       ip address 1.5.0.129/25
       ipv6 nd suppress-ra
      !
      router ospf
       redistribute connected route-map LSA5
       network 1.5.0.0/25 area 0.0.0.0
      !
      ip prefix-list LSA5 seq 5 permit 1.5.0.128/25
      !
      route-map LSA5 permit 1
       match ip address prefix-list LSA5
       set metric 100
      !
      
    11. Router 6 - XORP
    12. Note: eth4 has IP address 1.6.0.1/25 and eth5 has IP address 1.6.0.129/25. Although it looks fairly simple and straightforward, it seemed like a battle to get XORP to do what I wanted.
      protocols {
          ospf4 {
              area 0.0.0.0 {
                  area-type: "normal"
                  interface eth4 {
                      link-type: "broadcast"
                      vif eth4 {
                          address 1.6.0.1 {
                              priority: 128
                              hello-interval: 10
                              router-dead-interval: 40
                              interface-cost: 100
                              retransmit-interval: 5
                              transit-delay: 1
                              disable: false
                          }
                      }
                  }
              }
              export: "EXPORT-OSPF"
          }
      }
      policy {
          policy-statement "EXPORT-OSPF" {
              term LSA5 {
                  from {
                      protocol: "connected"
                      network4: 1.6.0.128/25
                  }
                  then {
                      metric: 100
                      accept {
                      }
                  }
              }
          }
      }
      interfaces {
          interface eth4 {
              description: ""
              disable: false
              discard: false
              unreachable: false
              management: false
              parent-ifname: ""
              iface-type: ""
              vid: ""
              default-system-config {
              }
          }
          interface eth5 {
              description: ""
              disable: false
              discard: false
              unreachable: false
              management: false
              parent-ifname: ""
              iface-type: ""
              vid: ""
              default-system-config {
              }
          }
      }
      
      
    13. Router 7 - 1GB Olive
    14. Note: Just like all the other Junos boxes.
      interfaces {
          em2 {
              unit 0 {
                  family inet {
                      address 1.7.0.1/25;
                  }
              }
          }
          em3 {
              unit 0 {
                  family inet {
                      address 1.7.0.129/25;
                  }
              }
          }
      }
      protocols {
          ospf {
              export LSA5;
              area 0.0.0.0 {
                  interface em2.0 {
                      metric 100;
                  }
              }
          }
      }
      policy-options {
          policy-statement LSA5 {
              term LSA5 {
                  from {
                      protocol direct;
                      route-filter 1.7.0.128/25 exact;
                  }
                  then {
                      metric 100;
                      accept;
                  }
              }
          }
      }
      
  2. Cluster 2

    1. Router 1 - EX3200
    2. Note: Just like all the other Junos boxes.
      interfaces {
          ge-0/0/1 {
              unit 2101 {
                  vlan-id 2101;
                  family inet {
                      address 2.1.0.1/25;
                  }
              }
              unit 2105 {
                  vlan-id 2105;
                  family inet {
                      address 2.1.0.129/25;
                  }
              }
          }
      }
      protocols {
          ospf {
              export LSA5;
              area 0.0.0.0 {
                  interface ge-0/0/1.2101 {
                      metric 100;
                  }
              }
          }
      }
      policy-options {
          policy-statement LSA5 {
              from {
                  protocol direct;
                  route-filter 2.1.0.128/25 exact;
              }
              then accept;
          }
      }
      
    3. Router 2 - SRX100H in Packet Mode
    4. Note: Just like all the other Junos boxes.
      interfaces {
          fe-0/0/0 {
              unit 2201 {
                  vlan-id 2201;
                  family inet {
                      address 2.2.0.1/25;
                  }
              }
              unit 2205 {
                  vlan-id 2205;
                  family inet {
                      address 2.2.0.129/25;
                  }
              }
          }
      }
      protocols {
          ospf {
              export LSA5;
              area 0.0.0.0 {
                  interface fe-0/0/0.2201 {
                      metric 100;
                  }
              }
          }
      }
      policy-options {
          policy-statement LSA5 {
              term LSA5 {
                  from {
                      protocol direct;
                      route-filter 2.2.0.128/25 exact;
                  }
                  then {
                      metric 100;
                      accept;
                  }
              }
          }
      }
      
    5. Router 3 - Cisco 3640
    6. Note: Almost like all the other IOS boxes (each box and IOS version has its own personality and wants to be configured slightly differently), and Quagga too.
      interface FastEthernet0/0.2301
       encapsulation dot1Q 2301
       ip address 2.3.0.1 255.255.255.128
       ip ospf cost 100
      !
      interface FastEthernet0/0.2305
       encapsulation dot1Q 2305
       ip address 2.3.0.129 255.255.255.128
      !
      router ospf 1
       redistribute connected metric 100 subnets route-map LSA5
       network 2.3.0.1 0.0.0.0 area 0
      !
      ip prefix-list LSA5 seq 5 permit 2.3.0.128/25
      route-map LSA5 permit 10
       match ip address prefix-list LSA5
       set metric 100
      !
      
    7. Router 4 - SRX210HE
    8. Note: Just like all the other Junos boxes but with a few extra lines for the flow mode security.
      interfaces {
          ge-0/0/0 {
              unit 2401 {
                  vlan-id 2401;
                  family inet {
                      address 2.4.0.1/25;
                  }
              }
              unit 2405 {
                  vlan-id 2405;
                  family inet {
                      address 2.4.0.129/25;
                  }
              }
          }
      }
      protocols {
          ospf {
              export LSA5;
              area 0.0.0.0 {
                  interface ge-0/0/0.2401 {
                      metric 100;
                  }
              }
          }
      }
      policy-options {
          policy-statement LSA5 {
              term LSA5 {
                  from {
                      protocol direct;
                      route-filter 2.4.0.128/25 exact;
                  }
                  then {
                      metric 100;
                      accept;
                  }
              }
          }
      }
      security {
          policies {
              from-zone OSPF to-zone OSPF {
                  policy ACCEPT-ALL {
                      match {
                          source-address any;
                          destination-address any;
                          application any;
                      }
                      then {
                          permit;
                      }
                  }
              }
          }
          zones {
              security-zone OSPF {
                  interfaces {
                      ge-0/0/0.2401;
                      ge-0/0/0.2405;
                  }
              }
          }
      }
      
      
    9. Router 5 - NS208
    10. Note: This one proved to be fairly difficult to get working properly as well, mainly due to the dependency on the order that commands need to be entered.
      set interface "ethernet2.11" tag 2501 zone "OSPF"
      set interface "ethernet2.15" tag 2505 zone "OSPF"
      set interface ethernet2.11 ip 2.5.0.1/25
      set interface ethernet2.11 route
      set interface ethernet2.15 ip 2.5.0.129/25
      set interface ethernet2.15 route
      set interface ethernet2.11 ip manageable
      set interface ethernet2.15 ip manageable
      set interface ethernet2.11 manage ping
      set interface ethernet2.15 manage ping
      set access-list 1
      set access-list 1 permit ip 2.5.0.128/25 1
      set route-map name "LSA5" permit 1
      set match ip 1
      set metric 100
      exit
      set protocol ospf
      set redistribute route-map "LSA5" protocol connected
      exit
      exit
      set interface ethernet2.11 protocol ospf area 0.0.0.0
      set interface ethernet2.11 protocol ospf enable
      set interface ethernet2.11 protocol ospf cost 100
      
    11. Router 6 - Cisco 2811
    12. Note: Almost like all the other IOS boxes.
      interface FastEthernet0/0.2601
       encapsulation dot1Q 2601
       ip address 2.6.0.1 255.255.255.128
       ip ospf cost 100
       ip ospf 1 area 0.0.0.0
      !
      interface FastEthernet0/0.2605
       encapsulation dot1Q 2605
       ip address 2.6.0.129 255.255.255.128
      !
      router ospf 1
       redistribute connected subnets route-map LSA5
      !
      ip prefix-list LSA5 seq 5 permit 2.6.0.128/25
      !
      route-map LSA5 permit 10
       match ip address prefix-list LSA5
      !
      
    13. Router 7 - RB750G
    14. Note: My beloved Webmin failed me here as it wouldn't let me create the ospf-out filter properly. I had to do it from the cli to get it to work.
      /interface vlan
      add arp=enabled disabled=no interface=OSPF l2mtu=1516 mtu=1500 name=LSA1 \
          use-service-tag=no vlan-id=2701
      add arp=enabled disabled=no interface=OSPF l2mtu=1516 mtu=1500 name=LSA5 \
          use-service-tag=no vlan-id=2705
      /routing ospf instance
      set [ find default=yes ] disabled=no distribute-default=never in-filter=\
          ospf-in metric-bgp=auto metric-connected=100 metric-default=1 \
          metric-other-ospf=auto metric-rip=20 metric-static=20 name=default \
          out-filter=ospf-out redistribute-bgp=no redistribute-connected=as-type-2 \
          redistribute-other-ospf=no redistribute-rip=no redistribute-static=no \
          router-id=2.2.2.7
      /routing ospf area
      set [ find default=yes ] area-id=0.0.0.0 disabled=no instance=default name=\
          backbone type=default
      /ip address
      add address=2.7.0.1/25 disabled=no interface=LSA1 network=2.7.0.0
      add address=2.7.0.129/25 disabled=no interface=LSA5 network=2.7.0.128
      /routing ospf interface
      add authentication=none authentication-key="" authentication-key-id=1 cost=\
          100 dead-interval=40s disabled=no hello-interval=10s instance-id=0 \
          interface=LSA1 network-type=broadcast passive=no priority=1 \
          retransmit-interval=5s transmit-delay=1s use-bfd=no
      /routing ospf network
      add area=backbone disabled=no network=2.7.0.0/25
      /routing filter
      add action=accept chain=ospf-out disabled=no invert-match=no prefix=\
          2.7.0.128/25 set-bgp-prepend-path=""
      add action=discard chain=ospf-out disabled=no invert-match=no \
          set-bgp-prepend-path=""
      
  3. Cluster 3

    1. Router 1 - SRX100B
    2. Note: Just like all the other Junos boxes but with a few extra lines for the flow mode security.
      interfaces {
          fe-0/0/1 {
              unit 3101 {
                  vlan-id 3101;
                  family inet {
                      address 3.1.0.1/25;
                  }
              }
              unit 3105 {
                  vlan-id 3105;
                  family inet {
                      address 3.1.0.129/25;
                  }
              }
          }
      }
      protocols {
          ospf {
              export LSA5;
              area 0.0.0.0 {
                  interface fe-0/0/1.3101 {
                      metric 100;
                  }
              }
          }
      }
      policy-options {
          policy-statement LSA5 {
              term LSA5 {
                  from {
                      protocol direct;
                      route-filter 3.1.0.128/25 exact;
                  }
                  then {
                      metric 100;
                      accept;
                  }
              }
          }
      }
      security {
          policies {
              from-zone OSPF to-zone OSPF {
                  policy ACCEPT-ALL {
                      match {
                          source-address any;
                          destination-address any;
                          application any;
                      }
                      then {
                          permit;
                      }
                  }
              }
          }
          zones {
              security-zone OSPF {
                  interfaces {
                      fe-0/0/1.3101;
                      fe-0/0/1.3105;
                  }
              }
          }
      }
      
      
    3. Router 2 - J2300
    4. Note: Just like all the other Junos boxes.
      interfaces {
          fe-0/0/0 {
              unit 3201 {
                  vlan-id 3201;
                  family inet {
                      address 3.2.0.1/25;
                  }
              }
              unit 3205 {
                  vlan-id 3205;
                  family inet {
                      address 3.2.0.129/25;
                  }
              }
          }
      }
      protocols {
          ospf {
              export LSA5;
              area 0.0.0.0 {
                  interface fe-0/0/0.3201 {
                      metric 100;
                  }
              }
          }
      }
      policy-options {
          policy-statement LSA5 {
              term LSA5 {
                  from {
                      protocol direct;
                      route-filter 3.2.0.128/25 exact;
                  }
                  then {
                      metric 100;
                      accept;
                  }
              }
          }
      }
      
    5. Router 3 - Cisco 3750
    6. Note: Almost like all the other IOS boxes.
      vlan 3,33,323,334,3301,3305 
      !
      interface Vlan3301
       ip address 3.3.0.1 255.255.255.128
       ip ospf cost 100
       ip ospf 1 area 0.0.0.0
      !
      interface Vlan3305
       ip address 3.3.0.129 255.255.255.128
      !
      router ospf 1
       redistribute connected subnets route-map LSA5
      !
      ip prefix-list LSA5 seq 5 permit 3.3.0.128/25
      !
      route-map LSA5 permit 1
       match ip address prefix-list LSA5
       set metric 100
      !
      
    7. Router 4 - SRX100H
    8. interfaces {
          fe-0/0/0 {
              unit 3401 {
                  vlan-id 3401;
                  family inet {
                      address 3.4.0.1/25;
                  }
              }
              unit 3405 {
                  vlan-id 3405;
                  family inet {
                      address 3.4.0.129/25;
                  }
              }
          }
      }
      protocols {
          ospf {
              export LSA5;
              area 0.0.0.0 {
                  interface fe-0/0/0.3401 {
                      metric 100;
                  }
              }
          }
      }
      policy-options {
          policy-statement LSA5 {
              term LSA5 {
                  from {
                      protocol direct;
                      route-filter 3.4.0.128/25 exact;
                  }
                  then {
                      metric 100;
                      accept;
                  }
              }
          }
      }
      security {
          policies {
              from-zone OSPF to-zone OSPF {
                  policy ACCEPT-ALL {
                      match {
                          source-address any;
                          destination-address any;
                          application any;
                      }
                      then {
                          permit;
                      }
                  }
              }
          }
          zones {
              security-zone OSPF {
                  interfaces {
                      fe-0/0/0.3401;
                      fe-0/0/0.3405;
                  }
              }
          }
      }
      
    9. Router 5 - EX2200C
    10. Note: Had to configure the interface as passive, as the EX2200s will only let you actively run OSPF on 4 interfaces. Otherwise just like all the other Junos boxes.
      interfaces {
          ge-0/0/8 {
              vlan-tagging;
              unit 3501 {
                  vlan-id 3501;
                  family inet {
                      address 3.5.0.1/25;
                  }
              }
              unit 3505 {
                  vlan-id 3505;
                  family inet {
                      address 3.5.0.129/25;
                  }
              }
          }
      }
      protocols {
          ospf {
              export LSA5;
              area 0.0.0.0 {
                  interface ge-0/0/8.3501 {
                      passive;
                      metric 100;
                  }
              }
          }
      }
      policy-options {
          policy-statement LSA5 {
              term LSA5 {
                  from {
                      protocol direct;
                      route-filter 3.5.0.128/25 exact;
                  }
                  then {
                      metric 100;
                      accept;
                  }
              }
          }
      }
      
    11. Router 6 - RB133
    12. Note: Just like the other RouterOS box.
      /interface vlan
      add arp=enabled disabled=no interface=OSPF l2mtu=1514 mtu=1500 name=LSA1 \
          use-service-tag=no vlan-id=3601
      add arp=enabled disabled=no interface=OSPF l2mtu=1514 mtu=1500 name=LSA5 \
          use-service-tag=no vlan-id=3605
      /ip address
      add address=3.6.0.1/25 disabled=no interface=LSA1 network=3.6.0.0
      add address=3.6.0.129/25 disabled=no interface=LSA5 network=3.6.0.128
      /routing filter
      add action=accept chain=ospf-out disabled=no invert-match=no prefix=\
          3.6.0.128/25 set-bgp-prepend-path=""
      add action=discard chain=ospf-out disabled=no invert-match=no \
          set-bgp-prepend-path=""
      /routing ospf instance
      set [ find default=yes ] disabled=no distribute-default=never in-filter=\
          ospf-in metric-bgp=200000 metric-connected=100 metric-default=1000 \
          metric-other-ospf=auto metric-rip=20000 metric-static=2000 mpls-te-area=\
          backbone mpls-te-router-id=loopback0 name=default out-filter=ospf-out \
          redistribute-bgp=no redistribute-connected=as-type-2 \
          redistribute-other-ospf=no redistribute-rip=no redistribute-static=no \
          router-id=3.3.3.6
      /routing ospf interface
      add authentication=none authentication-key="" authentication-key-id=1 cost=\
          100 dead-interval=40s disabled=no hello-interval=10s instance-id=0 \
          interface=LSA1 network-type=broadcast passive=no priority=1 \
          retransmit-interval=5s transmit-delay=1s use-bfd=no
      /routing ospf network
      add area=backbone disabled=no network=3.6.0.0/25
      
    13. Router 7 - Cisco 1760
    14. Note: Almost like all the other IOS boxes.
      interface FastEthernet0/0.3701
       encapsulation dot1Q 3701
       ip address 3.7.0.1 255.255.255.128
       ip ospf cost 100
      !
      interface FastEthernet0/0.3705
       encapsulation dot1Q 3705
       ip address 3.7.0.129 255.255.255.128
      !
      router ospf 1
       redistribute connected subnets route-map LSA5
       network 3.7.0.1 0.0.0.0 area 0
      !
      ip prefix-list LSA5 seq 5 permit 3.7.0.128/25
      !
      route-map LSA5 permit 10
       match ip address prefix-list LSA5
       set metric 100
      !
      

And here is our LSDB, to show that everything is working as intended. Note that the 3.0.0.10 router is the Nagios monitoring station, NTP and syslog server.

The Link State Database
vyatta@vyatta:~$ show ip ospf database 

       OSPF Router with ID (1.1.1.1)

                Router Link States (Area 0.0.0.0)

Link ID         ADV Router      Age  Seq#       CkSum  Link count
1.1.1.1         1.1.1.1          380 0x80000116 0x751a 6
1.1.1.2         1.1.1.2         1568 0x800000c1 0x35d8 6
1.1.1.3         1.1.1.3          695 0x800000fd 0x5d52 7
1.1.1.4         1.1.1.4          147 0x800000d3 0x674f 7
1.1.1.5         1.1.1.5          403 0x8000010d 0xa68d 7
1.1.1.6         1.1.1.6          205 0x8000016a 0x2c8d 6
1.1.1.7         1.1.1.7           21 0x80000167 0x78fb 6
2.2.2.1         2.2.2.1         1964 0x8000001c 0x506c 8
2.2.2.2         2.2.2.2         1456 0x80000027 0xbfd0 8
2.2.2.3         2.2.2.3          344 0x80000020 0x3005 9
2.2.2.4         2.2.2.4         1991 0x80000014 0x63c1 8
2.2.2.5         2.2.2.5         1389 0x8000001b 0xddff 8
2.2.2.6         2.2.2.6         1718 0x80000219 0x3f5d 8
2.2.2.7         2.2.2.7          814 0x80000095 0x3ad6 8
3.0.0.10        3.0.0.10        1287 0x80000055 0x8cff 3
3.3.3.1         3.3.3.1         1454 0x80000015 0x5b48 8
3.3.3.2         3.3.3.2          864 0x80000012 0x5e6e 10
3.3.3.3         3.3.3.3          358 0x80000013 0x36c0 8
3.3.3.4         3.3.3.4         2502 0x80000014 0x69a4 8
3.3.3.5         3.3.3.5          394 0x80000027 0xc8f2 8
3.3.3.6         3.3.3.6         1790 0x80000032 0xacd8 8
3.3.3.7         3.3.3.7          189 0x80000014 0x8bd8 8

                Net Link States (Area 0.0.0.0)

Link ID         ADV Router      Age  Seq#       CkSum
1.0.0.7         1.1.1.7          521 0x80000041 0xcbc5
1.1.2.1         1.1.1.1         1570 0x8000009e 0x5159
1.1.7.1         1.1.1.1         1490 0x8000009b 0x663d
1.2.3.3         1.1.1.3          874 0x80000069 0xb601
1.3.4.3         1.1.1.3          285 0x80000063 0xc7f1
1.5.6.5         1.1.1.5         1282 0x80000038 0xe117
1.6.7.7         1.1.1.7         1021 0x80000025 0x03e0
2.0.0.5         2.2.2.5         1354 0x80000013 0x6941
3.0.0.5         3.3.3.5         2394 0x80000021 0x0a79
11.1.1.11       1.1.1.1          200 0x80000021 0x31d3
22.2.2.21       1.1.1.2         1440 0x8000002f 0xfddd
44.4.4.41       1.1.1.4          147 0x80000016 0x8638
55.5.5.51       1.1.1.5           82 0x8000002f 0x7911
66.6.6.63       3.3.3.6           10 0x80000014 0x6022
77.7.7.71       1.1.1.7         2021 0x8000005e 0x6b99

                AS External Link States

Link ID         ADV Router      Age  Seq#       CkSum  Route
1.1.0.128       1.1.1.1          280 0x800000d4 0xca19 E2 1.1.0.128/25 [0x0]
1.2.0.128       1.1.1.2           73 0x80000099 0xec62 E1 1.2.0.128/25 [0x0]
1.3.0.128       1.1.1.3         1473 0x8000007a 0xb07d E2 1.3.0.128/25 [0x0]
1.4.0.128       1.1.1.4          441 0x800000a5 0x2bbe E2 1.4.0.128/25 [0x0]
1.5.0.128       1.1.1.5          112 0x800000be 0xd1cf E2 1.5.0.128/25 [0x0]
1.6.0.128       1.1.1.6         1515 0x80000032 0xd853 E2 1.6.0.128/25 [0x0]
1.7.0.128       1.1.1.7         1521 0x80000074 0x6067 E2 1.7.0.128/25 [0x0]
2.1.0.128       2.2.2.1          555 0x8000000f 0x8613 E2 2.1.0.128/25 [0x0]
2.2.0.128       2.2.2.2         1580 0x8000001c 0x46df E2 2.2.0.128/25 [0x0]
2.3.0.128       2.2.2.3          344 0x80000015 0x60cc E2 2.3.0.128/25 [0x0]
2.4.0.128       2.2.2.4         2597 0x8000000b 0x44ee E2 2.4.0.128/25 [0x0]
2.5.0.128       2.2.2.5          314 0x80000014 0x9c0c E1 2.5.0.128/25 [0x0]
2.6.0.128       2.2.2.6          214 0x80000084 0x28df E2 2.6.0.128/25 [0x0]
2.7.0.128       2.2.2.7          813 0x80000011 0xe363 E2 2.7.0.128/25 [0x0]
3.1.0.128       3.3.3.1          447 0x8000000d 0x51e1 E2 3.1.0.128/25 [0x0]
3.2.0.128       3.3.3.2         1459 0x8000000a 0x45ee E2 3.2.0.128/25 [0x0]
3.3.0.128       3.3.3.3          358 0x80000011 0x43e9 E2 3.3.0.128/25 [0x0]
3.4.0.128       3.3.3.4          421 0x8000000d 0x1b12 E2 3.4.0.128/25 [0x0]
3.5.0.128       3.3.3.5         1394 0x8000001e 0xe633 E2 3.5.0.128/25 [0x0]
3.6.0.128       3.3.3.6         1584 0x8000002e 0x9691 E2 3.6.0.128/25 [0x0]
3.7.0.128       3.3.3.7          189 0x80000011 0xfa2a E2 3.7.0.128/25 [0x0]

vyatta@vyatta:~$ 

Run 2: The Careless Internet Routing Table Dump

What happens when you have a full BGP feed and you accidentally dump it into your IGP? To find out, I modified my Python script that blurts out random bogus prefixes so that it spews out a specified number of prefixes instead. To do this I chose a Class A network (/8 for you CIDR people) and had the Python script break the entire block into the most subnets that it can fit, and then iterate through with less and less specific subnet masks until the proper number of networks is reached. This Python script is bs-prefixes.py, and it has a hardcoded variable ClassA that decides which /8 network to split up. It takes one argument, the number of prefixes. Note that you cannot have more than 16777216 prefixes unless you move to a different plane of geometry. Besides, if you want that many prefixes anyway you're even crazier than I am. I chose to use the 13/8 network, as 13 is a nice unlucky number and perfect to explode a network with, and the block belongs to Xerox and they probably wouldn't mind me copying their address space, HAR, HAR, HAR.

Example: bs-prefixes creating 7 prefixes. Perfect for feeding into exaBGP!
user@Linux-box:~$ bs-prefixes.py 7
13.0.0.0/10
13.64.0.0/10
13.128.0.0/10
13.192.0.0/10
13.0.0.0/9
13.128.0.0/9
13.0.0.0/8
user@Linux-box:~$ bs-prefixes.py 7
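
The script itself is not reproduced in this post, so what follows is only a minimal Python sketch of the behaviour described above, reconstructed from the sample output. The names `bs_prefixes` and `class_a` are placeholders chosen for illustration and are not taken from bs-prefixes.py. The sketch picks the shortest starting mask whose subnets, together with every shorter mask back up to the /8, add up to at least the requested count, and then stops as soon as enough prefixes have been emitted.

```python
import ipaddress
from itertools import islice


def bs_prefixes(count, class_a=13):
    """Return an iterator over `count` prefixes carved out of class_a.0.0.0/8."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # The /8 plus every mask down to /start holds 2**(start - 7) - 1 prefixes.
    # Pick the smallest `start` that provides at least `count` of them.
    start = 8
    while (1 << (start - 7)) - 1 < count:
        start += 1
    if start > 32:
        raise ValueError("too many prefixes for a single /8")
    base = class_a << 24  # integer form of class_a.0.0.0

    def walk():
        # Most specific mask first, then progressively shorter masks.
        for plen in range(start, 7, -1):
            step = 1 << (32 - plen)            # addresses per subnet
            for i in range(1 << (plen - 8)):   # subnets of this size in the /8
                yield f"{ipaddress.IPv4Address(base + i * step)}/{plen}"

    return islice(walk(), count)


for prefix in bs_prefixes(7):
    print(prefix)
```

Called with 7, this sketch prints the same seven prefixes as the example above. Called with 500000 it starts at 13.0.0.0/26, which matches the first announcements in the exaBGP log in this post, but that agreement is the only evidence that the real script works this way.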

This Python script is then used in conjunction with a shell script, announce.sh, which exaBGP runs as a dynamic process to announce the specified number of prefixes via BGP to our unlucky network. The shell script has a bit of logic built in to throttle the announcements down to 128 per second. I found that if I just let it fly unbounded, I had a good chance of blowing a buffer up somewhere and having my BGP session go Active. I also turned on graceful restart to make for quicker recovery in case of a dropped BGP TCP session, and pointed exaBGP back at my 2GB Olive box and away from the J2300. As of 12 Jan 2013, the full Internet routing table is about 460,000 routes. So to simulate this we'll set up our test to generate 500,000 routes -- just to be on the safe side. The new exaBGP config file is exabgp-500k.conf.
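
announce.sh is a shell script and is not shown here. Purely to illustrate the throttling idea, here is a minimal Python sketch of the same job. The `announce route ... next-hop ...` command text and the 10.0.0.0 next hop are copied from the exaBGP log output in this post; the function names and the one-second pause after each batch of 128 are assumptions made for the example, not the logic of the real script.

```python
import sys
import time


def announce_lines(prefixes, next_hop="10.0.0.0"):
    """Turn prefixes into the text commands exaBGP reads from a helper process."""
    for prefix in prefixes:
        yield f"announce route {prefix} next-hop {next_hop}"


def throttled(items, per_second=128, sleep=time.sleep):
    """Yield items, pausing one second after every `per_second` of them."""
    for n, item in enumerate(items, start=1):
        yield item
        if n % per_second == 0:
            sleep(1.0)


def main():
    # Read one prefix per line on stdin, e.g. piped from a prefix generator.
    prefixes = (line.strip() for line in sys.stdin if line.strip())
    for command in throttled(announce_lines(prefixes)):
        # Flush every line so the reader on the other end of the pipe
        # sees each command as soon as it is written.
        print(command, flush=True)
```

Passing `sleep` in as a parameter keeps the pacing logic easy to test without actually waiting. Because of the fixed pause, the real rate ends up slightly under 128 per second, which errs on the gentle side.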

To simulate an accidental table dump, we will let BGP load up the RIB completely on the Olive2GB box. Once it is steady, we'll add an export policy to dump all the BGP routes into OSPF. This would be akin to an enterprise network that has a couple of Internet access points with a full BGP feed, but is only supposed to advertise a default route into the rest of the enterprise network. Our export policy will emulate a novice network operator (or more likely just a stupid one) who made a major screw-up.

Letting exaBGP pump 500,000 routes into the 2GB Olive
copek@sheddap:/etc/exabgp$ exabgp exabgp-500k.conf 
Sat, 12 Jan 2013 23:09:48 INFO     8326   configuration Performing reload of exabgp 1.3.4
Sat, 12 Jan 2013 23:09:48 INFO     8326   supervisor    New Peer 172.20.1.66
Sat, 12 Jan 2013 23:09:48 INFO     8326   configuration Loaded new configuration successfully
Sat, 12 Jan 2013 23:09:48 INFO     8326   processes     Forked process service-1
trap: SIGINT: bad trap
Sat, 12 Jan 2013 23:09:48 INFO     8326   message       Peer     172.20.1.66 ASN 65066   >> OPEN version=4 asn=65069 hold_time=600 router_id=66.66.66.66 capabilities=[Graceful Restart, Multiprotocol for IPv4 unicast IPv6 unicast IPv4 flow-ipv4, 4Bytes AS 65069]
Sat, 12 Jan 2013 23:09:48 INFO     8326   message       Peer     172.20.1.66 ASN 65066   << OPEN version=4 asn=65066 hold_time=600 router_id=1.1.1.3 capabilities=[Cisco Route Refresh, Multiprotocol for IPv4 unicast, Route Refresh, Graceful Restart, 4Bytes AS 65066]
Sat, 12 Jan 2013 23:09:48 INFO     8326   message       Peer     172.20.1.66 ASN 65066   >> KEEPALIVE (OPENCONFIRM)
Sat, 12 Jan 2013 23:09:48 INFO     8326   message       Peer     172.20.1.66 ASN 65066   << KEEPALIVE (ESTABLISHED)
Sat, 12 Jan 2013 23:09:48 INFO     8326   message       Peer     172.20.1.66 ASN 65066   >> UPDATE (eors)
Sat, 12 Jan 2013 23:09:48 INFO     8326   message       Peer     172.20.1.66 ASN 65066   >> KEEPALIVE (no more UPDATE and no EOR)
Sat, 12 Jan 2013 23:09:48 INFO     8326   message       Peer     172.20.1.66 ASN 65066   << KEEPALIVE
Sat, 12 Jan 2013 23:09:48 INFO     8326   message       Peer     172.20.1.66 ASN 65066   << UPDATE (not parsed)
Sat, 12 Jan 2013 23:09:58 INFO     8326   processes     Command from process service-1 : announce route 13.0.0.0/26 next-hop 10.0.0.0 
Sat, 12 Jan 2013 23:09:58 INFO     8326   processes     Command from process service-1 : announce route 13.0.0.64/26 next-hop 10.0.0.0 
Sat, 12 Jan 2013 23:09:58 INFO     8326   processes     Command from process service-1 : announce route 13.0.0.128/26 next-hop 10.0.0.0 
Sat, 12 Jan 2013 23:09:58 INFO     8326   processes     Command from process service-1 : announce route 13.0.0.192/26 next-hop 10.0.0.0 
Sat, 12 Jan 2013 23:09:58 INFO     8326   processes     Command from process service-1 : announce route 13.0.1.0/26 next-hop 10.0.0.0 
Sat, 12 Jan 2013 23:09:58 INFO     8326   processes     Command from process service-1 : announce route 13.0.1.64/26 next-hop 10.0.0.0 
Sat, 12 Jan 2013 23:09:58 INFO     8326   processes     Command from process service-1 : announce route 13.0.1.128/26 next-hop 10.0.0.0 
Sat, 12 Jan 2013 23:09:58 INFO     8326   processes     Command from process service-1 : announce route 13.0.1.192/26 next-hop 10.0.0.0 
Sat, 12 Jan 2013 23:09:58 INFO     8326   processes     Command from process service-1 : announce route 13.0.2.0/26 next-hop 10.0.0.0 
Sat, 12 Jan 2013 23:09:58 INFO     8326   processes     Command from process service-1 : announce route 13.0.2.64/26 next-hop 10.0.0.0 
Sat, 12 Jan 2013 23:09:58 INFO     8326   processes     Command from process service-1 : announce route 13.0.2.128/26 next-hop 10.0.0.0 
Sat, 12 Jan 2013 23:09:58 INFO     8326   processes     Command from process service-1 : announce route 13.0.2.192/26 next-hop 10.0.0.0 
Sat, 12 Jan 2013 23:09:58 INFO     8326   processes     Command from process service-1 : announce route 13.0.3.0/26 next-hop 10.0.0.0 
The Olive with a RIB full of routes!
juniper@Olive2GB> show bgp summary    
Groups: 1 Peers: 1 Down peers: 0
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0            500000     500000          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
172.20.10.117         65069     500004         49       0       7     2:20:06 500000/500000/500000/0 0/0/0/0

juniper@Olive2GB> show route summary 
Autonomous system number: 65066
Router ID: 1.1.1.3

inet.0: 500104 destinations, 500104 routes (500104 active, 0 holddown, 0 hidden)
Restart Complete
              Direct:      8 routes,      8 active
               Local:      7 routes,      7 active
                OSPF:     88 routes,     88 active
                 BGP: 500000 routes, 500000 active
              Static:      1 routes,      1 active

juniper@Olive2GB> 
A quick check of our Nagios network monitoring station reveals that our network is currently all green and happy!

Nagios - Happy network!

To dump this into OSPF, we'll use the following policy.

policy-statement EXPORT-BGP-TO-OSPF {
    term STUPID {
        from protocol bgp;
        then accept;
    }
}
And apply it with the following....
[edit]
juniper@Olive2GB# set protocols ospf export EXPORT-BGP-TO-OSPF 

[edit]
juniper@Olive2GB# commit 
commit complete

[edit]
juniper@Olive2GB# 

WHOOPS!

The 2GB Olive loaded up its LSDB pretty quickly, and started flooding the network immediately.

juniper@Olive2GB# run show ospf database summary 
Area 0.0.0.0:
   21 Router LSAs
   16 Network LSAs
   1 OpaqArea LSAs
Externals:
   500020 Extern LSAs
Interface em1.1:
Area 0.0.0.0:
Interface em1.123:
Area 0.0.0.0:
Interface em1.134:
Area 0.0.0.0:
Interface em1.33:
Area 0.0.0.0:
Interface em2.0:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:

[edit]
juniper@Olive2GB# 
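
To put that number in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes each AS-external LSA is 36 bytes (a 20-byte LSA header plus a 16-byte body carrying a single metric), a 1500-byte IP MTU, no OSPF authentication trailer, and perfectly packed Link State Update packets with no retransmissions. Real flooding is far less tidy than this, so treat the result as a floor.

```python
import math

LSA_BYTES = 36        # 20-byte LSA header + 16-byte AS-external body
IP_HEADER = 20        # IPv4 header without options
OSPF_HEADER = 24      # common OSPF packet header
LSU_COUNT_FIELD = 4   # "# LSAs" field at the start of a Link State Update


def flood_estimate(lsa_count, mtu=1500):
    """Estimate raw LSA bytes and the minimum number of LS Update packets."""
    payload = mtu - IP_HEADER - OSPF_HEADER - LSU_COUNT_FIELD
    per_packet = payload // LSA_BYTES
    return {
        "raw_bytes": lsa_count * LSA_BYTES,
        "lsas_per_packet": per_packet,
        "min_packets": math.ceil(lsa_count / per_packet),
    }


print(flood_estimate(500_020))
```

Under these assumptions the 500,020 external LSAs come to about 18 MB of raw LSA data and at least 12,501 full-sized update packets each time the whole set crosses a link, before counting acknowledgements or retransmissions.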

Just keeping an eye on the Nagios monitoring page, the first router that had anything turn red was in Cluster 1, Router number 6 -- XORP.

A few minutes passed, and then a lot of things started to turn red. The NS208 went off the map, the Vyatta box's LSA Type 5 host network advertisement was lost, and the EX2200C lost its host routes first, followed shortly thereafter by all of its OSPF routes. Then the SRX210HE went completely red. The RB133 went completely red and stayed that way forever, while its Routerboard companion, the RB750, reddened up all of its OSPF routes on the Nagios display. The lower-memory Cisco boxes all joined in the fray as well, with the 3750 leading the charge, followed by the 1760, the 3640 and finally the 2811. The EX3200 didn't want to be left out, and went red for OSPF as well. All of the reactions up until this point were pretty much the same as what happened with the slow LSA buildup. Nothing too new here.

Then we had some new action...the OpenOSPFd box was running full tilt on the CPU for quite some time, and then blew its memory bounds with the cry of:

OpenOSPFd process dies
# UVM: pid 16359 (ospfd), uid 85 killed: out of swap.

The sudden dump of hundreds of thousands of LSAs into the network produced a much different wave of destruction than I thought it would. I had assumed beforehand that the entire network would just explode pretty much all at once. However, the initial wave of chaos was mostly isolated to the boxes directly connected to the router responsible for flooding the LSAs into the network. The unfortunate neighbors acted as a temporary safety buffer for routers further on in the network. Their loaded CPUs, flapping adjacencies, crashes, reboots and even deaths insulated routers further away from the 2GB Olive box. The network failed so badly that the flooding process was severely impeded, and wound up being drawn out over a very long time. The further a box was in the network from the nasty LSA injector, the longer it seemed to take to load up its LSDB. However, the overload of LSAs did eventually propagate towards everyone, just a bit slower and more chaotically. In turn, the half a million LSAs made it to every corner of the network.

The remaining participants in Cluster 1 were the first to actually slot all of the new external LSAs into memory - Vyatta, BIRD, Quagga and the 1GB Olive all managed the feat.

Vyatta with 500k+ of external LSAs
vyatta@vyatta:~$ show ip ospf 
 OSPF Routing Process, Router ID: 1.1.1.1
 Supports only single TOS (TOS0) routes
 This implementation conforms to RFC2328
 RFC1583Compatibility flag is disabled
 OpaqueCapability flag is disabled
 Initial SPF scheduling delay 200 millisec(s)
 Minimum hold time between consecutive SPFs 1000 millisec(s)
 Maximum hold time between consecutive SPFs 10000 millisec(s)
 Hold time multiplier is currently 1
 SPF algorithm last executed 5.024s ago
 SPF timer is inactive
 Refresh timer 10 secs
 This router is an ASBR (injecting external routing information)
 Number of external LSA 500020. Checksum Sum 0xd4687780
 Number of opaque AS LSA 0. Checksum Sum 0x00000000
 Number of areas attached to this router: 1

 Area ID: 0.0.0.0 (Backbone)
   Number of interfaces in this area: Total: 6, Active: 6
   Number of fully adjacent neighbors in this area: 3
   Area has no authentication
   SPF algorithm executed 1983 times
   Number of LSA 38
   Number of router LSA 21. Checksum Sum 0x000a5a14
   Number of network LSA 17. Checksum Sum 0x0008311c
   Number of summary LSA 0. Checksum Sum 0x00000000
   Number of ASBR summary LSA 0. Checksum Sum 0x00000000
   Number of NSSA LSA 0. Checksum Sum 0x00000000
   Number of opaque link LSA 0. Checksum Sum 0x00000000
   Number of opaque area LSA 0. Checksum Sum 0x00000000

vyatta@vyatta:~$ 
BIRD Stuffed full of LSAs
bird> show ospf 
OSPFol:
RFC1583 compatibility: disabled
RT scheduler tick: 2
Number of areas: 1
Number of LSAs in DB:	500058
	Area: 0.0.0.0 (0) [BACKBONE]
		Stub:	No
		NSSA:	No
		Transit:	No
		Number of interfaces:	6
		Number of neighbors:	9
		Number of adjacent neighbors:	7
bird> 

Curiously, Vyatta was sucking up a lot more CPU time than the Quagga instance was. However, as I was watching the LSA totals build up in Cluster 2 and 3, Quagga let out a death cry:

quagga-router# [156054.409876] Out of memory: Kill process 1584 (ospfd) score 695 or sacrifice child
[156054.410789] Killed process 1584 (ospfd) total-vm:782616kB, anon-rss:707220kB, file-rss:0kB
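
For a rough sense of scale, the memory figure in the kernel's kill message can be divided by the LSA count. This is only a sketch: it assumes the 500,020 external LSAs reported elsewhere in this test were all resident in ospfd when it was killed, and it lumps every other structure in the process (routes, neighbor state, buffers) into the per-LSA figure. Treat the result as a loose upper bound, not a measurement of how Quagga stores an LSA.

```python
import re

OOM_LINE = ("[156054.410789] Killed process 1584 (ospfd) "
            "total-vm:782616kB, anon-rss:707220kB, file-rss:0kB")


def anon_rss_kb(line):
    """Pull the anon-rss value (in kB) out of a kernel OOM-killer line."""
    match = re.search(r"anon-rss:(\d+)kB", line)
    if match is None:
        raise ValueError("no anon-rss field found")
    return int(match.group(1))


def bytes_per_lsa(line, lsa_count):
    """Average resident bytes per LSA, counting 1 kB as 1024 bytes."""
    return anon_rss_kb(line) * 1024 / lsa_count


print(round(bytes_per_lsa(OOM_LINE, 500_020)))
```

That works out to roughly 1.4 kB of resident memory per external LSA at the moment the process died, against 36 bytes for the LSA itself on the wire.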

Clusters 2 and 3 were well behind in loading up LSAs. The boxes still alive and kicking without complaint were up at about 250K external LSAs.

Representative from Cluster 2, only has a little more than half of the LSAs at this point...
juniper@SRX100-6_OSPF> show ospf database summary    
Area 0.0.0.0:
   21 Router LSAs
   8 Network LSAs
   1 OpaqArea LSAs
Externals:
   252992 Extern LSAs
Interface fe-0/0/0.2:
Area 0.0.0.0:
Interface fe-0/0/0.212:
Area 0.0.0.0:
Interface fe-0/0/0.22:
Area 0.0.0.0:
Interface fe-0/0/0.2201:
Area 0.0.0.0:
Interface fe-0/0/0.223:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:

juniper@SRX100-6_OSPF> 

Triage

After about 2 hours, things seemed to be as converged as they were going to get. The Nagios map had settled down without further change to look like this:

Nagios - Very sad network!

So when things had more or less gotten as bad as it looked like they were going to get, here is each participant's status:

  1. Cluster 1

    1. Router 1 - Vyatta
       Fully loaded LSDB; however, over time it appeared to be very, very slowly consuming all of its memory.

    2. Router 2 - OpenOSPFd
       The ospfd process terminated. The box still responded to pings, but did not attempt to participate in OSPF again.

    3. Router 3 - Olive 2GB
       The route injector was happily keeping the 500K routes it injected up to date.

    4. Router 4 - BIRD
       Fully loaded LSDB. The CPU was running a fair amount, about 20% to 40%, but it seemed steady.

    5. Router 5 - Quagga
       The ospfd process died. The OS still allowed for pings, but it was done routing.

    6. Router 6 - XORP
       No xorp processes were left running; all had terminated. The OS still allowed the box to be pinged.

    7. Router 7 - 1GB Olive
       The 1GB Olive was running steady.

  2. Cluster 2

    1. Router 1 - EX3200
       The routing process, rpd, was in a perpetual restart loop.

    2. Router 2 - SRX100H in Packet Mode
       Was still working on loading LSAs; it was up to 305469 external LSAs after 2 hours.

    3. Router 3 - Cisco 3640
       The OSPF process was in a perpetual restart loop.

    4. Router 4 - SRX210HE
       The entire system was in a perpetual reboot loop.

    5. Router 5 - NS208
       This one was also in a perpetual system reboot loop.

    6. Router 6 - Cisco 2811
       The OSPF process was in a perpetual restart loop.

    7. Router 7 - RB750G
       The system responded to pings, but was otherwise completely dead. It did not participate in routing, and would not offer a login prompt. It had closed the telnet connection I had open to it.

  3. Cluster 3

    1. Router 1 - SRX100B
       Was in a perpetual system restart loop, like all of the other Junos boxes in flow mode.

    2. Router 2 - J2300
       Still loading LSAs and holding steady, up to 385259.

    3. Router 3 - Cisco 3750
       OSPF was in a perpetual restart loop, and the system had disabled CEF forwarding mode.

    4. Router 4 - SRX100H
       Perpetual system restart loop.

    5. Router 5 - EX2200C
       RPD was in a perpetual restart loop.

    6. Router 6 - RB133
       This box was completely dead. No response to pings, no action on the console port.

    7. Router 7 - Cisco 1760
       Perpetual system restart loop.

If you care to go through the syslog messages, see the cries for help and draw your own conclusions about what happened, the syslog output is available for download here: run2_results.log.bz2. This file is compressed with bzip2 down to 143K, and expands to 2.3M. This is all of the syslog messages up until this point.
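
If you would rather count than read, a few lines of Python can summarize which boxes shouted the most. This is a sketch only: it assumes the lines in the archive follow the classic `Mon DD HH:MM:SS host program[pid]: message` layout seen in the Vyatta syslog excerpts in this post, which may not hold for every device's messages. `count_by_host` and `summarize` are hypothetical helper names.

```python
import bz2
import re
from collections import Counter

# Timestamp, then the reporting host, then everything else.
SYSLOG = re.compile(r"^\w{3}\s+\d+\s[\d:]{8}\s(?P<host>\S+)\s(?P<rest>.*)$")


def count_by_host(lines, contains=None):
    """Count syslog lines per host, optionally only those containing a substring."""
    totals = Counter()
    for line in lines:
        match = SYSLOG.match(line)
        if match is None:
            continue  # not in the assumed layout, skip it
        if contains is not None and contains not in match.group("rest"):
            continue
        totals[match.group("host")] += 1
    return totals


def summarize(path, contains=None):
    """Read a bzip2-compressed syslog file and count its lines per host."""
    with bz2.open(path, "rt", errors="replace") as handle:
        return count_by_host(handle, contains)
```

For example, `summarize("run2_results.log.bz2").most_common(10)` would list the ten noisiest hosts, and passing `contains="MaxAge"` would narrow the count to messages mentioning MaxAge.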

Stopping the madness...

After two hours, at 15:20, I decided to stop C1R3 from keeping its 500K external LSAs fresh, and let it prematurely expire all of them by stopping the redistribution of BGP into OSPF:

[edit]
juniper@Olive2GB# delete protocols ospf export EXPORT-BGP-TO-OSPF 

[edit]
juniper@Olive2GB# commit 
commit complete

[edit]
juniper@Olive2GB# 

You can see that all of the external LSAs it originated for networks in the 13.0.0.0/8 space have now been aged out (Age 3600).

juniper@Olive2GB> show ospf database    

    OSPF database, Area 0.0.0.0
 Type       ID               Adv Rtr           Seq      Age  Opt  Cksum  Len 
Router   1.1.1.1          1.1.1.1          0x80000140   391  0x2  0x4844  96
Router  *1.1.1.3          1.1.1.3          0x80000120    43  0x22 0xe89  108
Router   1.1.1.4          1.1.1.4          0x800000f0  1281  0x2  0x95b7  96
Router   1.1.1.7          1.1.1.7          0x800001a2   206  0x22 0xcc70  96
Router   2.2.2.1          2.2.2.1          0x8000013b   252  0x22 0x4ed7  96
Router   2.2.2.2          2.2.2.2          0x80000129  1567  0x22 0x578d  96
Router   2.2.2.3          2.2.2.3          0x80000076    17  0x22 0x50e6 108
Router   2.2.2.5          2.2.2.5          0x8000014e   113  0x22 0x279d  96
Router   2.2.2.6          2.2.2.6          0x80000255  1161  0x22 0x5a2   96
Router   2.2.2.7          2.2.2.7          0x800000c7   586  0x2  0x2a16  96
Router   3.0.0.10         3.0.0.10         0x80000096   310  0x2  0x64e3  60
Router   3.3.3.1          3.3.3.1          0x80000075   553  0x22 0x2dd   96
Router   3.3.3.2          3.3.3.2          0x800000d7  2787  0x22 0xfde8 120
Router   3.3.3.3          3.3.3.3          0x80000052  2742  0x22 0xee5  108
Router   3.3.3.4          3.3.3.4          0x800000ad  2531  0x22 0x1758  96
Router   3.3.3.5          3.3.3.5          0x80000169   160  0x22 0x4657  96
Router   3.3.3.7          3.3.3.7          0x80000050   500  0x22 0xdaa8  96
Network  1.0.0.7          1.1.1.7          0x8000005b  1935  0x22 0xd2c6  44
Network  1.1.7.1          1.1.1.1          0x800000bb   987  0x2  0x265d  32
Network *1.3.4.3          1.1.1.3          0x8000007b   552  0x22 0x970a  32
Network  3.0.0.7          3.3.3.7          0x80000034   108  0x22 0xb4e9  44
Network  22.2.2.22        2.2.2.2          0x80000009  2424  0x22 0xf5f0  32
Network  44.4.4.41        1.1.1.4          0x80000002  3600  0x2  0xfde2  32
Network  55.5.5.53        3.3.3.5          0x80000014  3600  0x22 0xc6c2  32
    OSPF AS SCOPE link state database
 Type       ID               Adv Rtr           Seq      Age  Opt  Cksum  Len 
Extern   1.1.0.128        1.1.1.1          0x800000f3  1048  0x2  0x8c38  36
Extern  *1.3.0.128        1.1.1.3          0x80000093   552  0x22 0x7e96  36
Extern   1.4.0.128        1.1.1.4          0x800000c5   137  0x2  0xeade  36
Extern   1.7.0.128        1.1.1.7          0x8000008e   876  0x22 0x2c81  36
Extern   2.1.0.128        2.2.2.1          0x8000005c   261  0x22 0xeb60  36
Extern   2.2.0.128        2.2.2.2          0x80000031  3199  0x22 0x1cf4  36
Extern   2.3.0.128        2.2.2.3          0x80000032  1297  0x20 0x26e9  36
Extern   2.5.0.128        2.2.2.5          0x8000006c   153  0x22 0xeb64  36
Extern   2.6.0.128        2.2.2.6          0x800000a0  1324  0x20 0xeffb  36
Extern   2.7.0.128        2.2.2.7          0x8000003d   586  0x2  0x8b8f  36
Extern   3.1.0.128        3.3.3.1          0x80000025  1280  0x22 0x21f9  36
Extern   3.2.0.128        3.3.3.2          0x8000001e  1064  0x22 0x1d03  36
Extern   3.3.0.128        3.3.3.3          0x8000002e  1492  0x20 0x907   36
Extern   3.4.0.128        3.3.3.4          0x80000022  2926  0x22 0xf027  36
Extern   3.5.0.128        3.3.3.5          0x8000005c   684  0x22 0x6a71  36
Extern   3.7.0.128        3.3.3.7          0x80000030   500  0x20 0xbc49  36
Extern  *13.1.122.0       1.1.1.3          0x80000008  3600  0x22 0xb603  36
Extern  *13.1.122.63      1.1.1.3          0x80000008  3600  0x22 0xc7f0  36
Extern  *13.1.122.64      1.1.1.3          0x80000008  3600  0x22 0xbdf9  36
Extern  *13.1.122.127     1.1.1.3          0x80000008  3600  0x22 0xc3f4  36
Extern  *13.1.122.128     1.1.1.3          0x80000008  3600  0x22 0xb9fd  36
Extern  *13.1.122.191     1.1.1.3          0x80000008  3600  0x22 0xc275  36
Extern  *13.1.122.192     1.1.1.3          0x80000008  3600  0x22 0xb87e  36
Extern  *13.1.122.255     1.1.1.3          0x80000008  3600  0x22 0xbbfc  36
Extern  *13.1.123.0       1.1.1.3          0x80000008  3600  0x22 0xb007  36
Extern  *13.1.123.63      1.1.1.3          0x80000008  3600  0x22 0xbcfa  36
Extern  *13.1.123.64      1.1.1.3          0x80000008  3600  0x22 0xb204  36
Extern  *13.1.123.127     1.1.1.3          0x80000008  3600  0x22 0xb8fe  36
Extern  *13.1.123.128     1.1.1.3          0x80000008  3600  0x22 0xae08  36
Extern  *13.1.123.191     1.1.1.3          0x80000008  3600  0x22 0xb77f  36
Extern  *13.1.123.192     1.1.1.3          0x80000008  3600  0x22 0xad88  36
Extern  *13.1.124.0       1.1.1.3          0x80000008  3600  0x22 0xa017  36
Extern  *13.1.124.63      1.1.1.3          0x80000008  3600  0x22 0xb105  36
Extern  *13.1.124.64      1.1.1.3          0x80000008  3600  0x22 0xa70e  36
Extern  *13.1.124.127     1.1.1.3          0x80000008  3600  0x22 0xad09  36
Extern  *13.1.124.128     1.1.1.3          0x80000008  3600  0x22 0xa312  36
...
..
.
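
An Age of 3600 seconds is OSPF's MaxAge, which is how these LSAs are flagged for removal. As a small illustration, the sketch below tallies MaxAge entries per LSA type and advertising router from saved `show ospf database` text. The eight-column layout is assumed from the output above, and `maxage_by_router` is a hypothetical helper, not a Junos tool.

```python
from collections import Counter

MAX_AGE = 3600  # seconds; the OSPF MaxAge constant

SAMPLE = """\
 Type       ID               Adv Rtr           Seq      Age  Opt  Cksum  Len
Network  44.4.4.41        1.1.1.4          0x80000002  3600  0x2  0xfde2  32
Extern   3.7.0.128        3.3.3.7          0x80000030   500  0x20 0xbc49  36
Extern  *13.1.122.0       1.1.1.3          0x80000008  3600  0x22 0xb603  36
Extern  *13.1.122.63      1.1.1.3          0x80000008  3600  0x22 0xc7f0  36
"""


def maxage_by_router(text):
    """Count LSAs at MaxAge, keyed by (LSA type, advertising router)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 columns: Type, ID, Adv Rtr, Seq, Age, Opt, Cksum, Len.
        # The header row also splits into 8 words, so check for a hex Seq too.
        if len(fields) != 8 or not fields[3].startswith("0x"):
            continue
        lsa_type, adv_router, age = fields[0], fields[2], fields[4]
        if age.isdigit() and int(age) >= MAX_AGE:
            counts[(lsa_type, adv_router)] += 1
    return counts


for key, total in maxage_by_router(SAMPLE).items():
    print(key, total)
```

Run against a full dump, a tally like this makes it easy to watch how quickly each box works through its backlog of expired LSAs.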

The task of removing all of these nasty advertisements from the network proved to be just as destructive as introducing them, if not more so.

The Vyatta CPU immediately pegged at 100%. The entire network went back into utter chaos again, with once fairly steady routers blinking between red and green. Then the Quagga ospfd process that was running on the Nagios machine imploded with the same out-of-memory condition as C1R5. Out of mercy, and the desire to keep an eye on the chaos, I restarted the Quagga daemons on the Nagios box. As there were still a ton of advertisements swamping the network, the router LSAs and host network LSAs were lost in the churn. It took about 50 minutes before things calmed down enough that Quagga on the Nagios box started populating the operating system's routing table with some of the router LSAs and the host routes.

I was also surprised at how long some of the boxes kept the aged-out LSAs in memory. For instance, the 2GB Olive box, although it had promptly removed all of the routes from its main routing table, still had 400,000+ LSAs stuffed in its LSDB. It purged about 100,000 of them in about 5 minutes; the remainder took about 1 hour and 30 minutes before there was no sign of any of them left anywhere in its DB structures. It took almost an hour until I even began to see LSA counts start to drop in any of the other routers. BIRD was the first implementation to clear out the old LSAs, cleansing itself well before the 2GB Olive box.

BIRD Purged Well Before Anyone Else
bird> show ospf
OSPFol:
RFC1583 compatibility: disabled
RT scheduler tick: 2
Number of areas: 1
Number of LSAs in DB:	37
	Area: 0.0.0.0 (0) [BACKBONE]
		Stub:	No
		NSSA:	No
		Transit:	No
		Number of interfaces:	6
		Number of neighbors:	4
		Number of adjacent neighbors:	4
bird> 
The Originator still had 294k Externals in its LSDB
juniper@Olive2GB> show ospf database summary    
Area 0.0.0.0:
   17 Router LSAs
   12 Network LSAs
Externals:
   294494 Extern LSAs
Interface em1.1:
Area 0.0.0.0:
Interface em1.123:
Area 0.0.0.0:
Interface em1.134:
Area 0.0.0.0:
Interface em1.33:
Area 0.0.0.0:
Interface em2.0:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:

juniper@Olive2GB> 

At 16:18, the Vyatta box started going completely bonkers with syslog messages complaining about Link State Updates with an LS age equal to MaxAge. On the syslog server there were 1,384,807 lines of these messages, plus a ton of messages stating that the Vyatta box had been throttled due to excessive message rate.

Jan 13 15:20:02 vyatta ospfd[1600]: Link State Update[Type5,id(13.6.155.63),ar(1.1.1.3)]: LS age is equal to MaxAge.
Jan 13 15:20:02 vyatta ospfd[1600]: Link State Update[Type5,id(13.6.155.64),ar(1.1.1.3)]: LS age is equal to MaxAge.
Jan 13 15:20:02 vyatta ospfd[1600]: Link State Update[Type5,id(13.6.155.127),ar(1.1.1.3)]: LS age is equal to MaxAge.

At just shy of two hours after the redistribution problem was fixed, the network seemed to slow its surging in and out and start to recover. There were still a lot of routers that were chock full of external LSAs...and still loading them!!! The J2300, which never had the full 500K value showing up in its LSDB, finally maxed out at about 16:30.

juniper@J2300-7> show ospf database summary    
Area 0.0.0.0:
   21 Router LSAs
   23 Network LSAs
   1 OpaqArea LSAs
Externals:
   500020 Extern LSAs
Interface fe-0/0/0.22:
Area 0.0.0.0:
Interface fe-0/0/0.3:
Area 0.0.0.0:
Interface fe-0/0/0.312:
Area 0.0.0.0:
Interface fe-0/0/0.3201:
Area 0.0.0.0:
Interface fe-0/0/0.323:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:

juniper@J2300-7> 

Two hours after the LSAs were expired, the RB750G login prompt reappeared. The box was VERY sluggish for a while, but eventually started to respond to commands again. A few minutes later, the Vyatta VM had purged all of the old LSAs from memory. A lot of the host network advertisements began to reappear on the network. Some of them surged in and out a lot, due to processes restarting and boxes rebooting (the NS208 and all the Cisco devices).

At 16:50, the originator of all of the nasty LSAs had finally purged them all:

juniper@Olive2GB> show ospf database summary    
Area 0.0.0.0:
   16 Router LSAs
   11 Network LSAs
Externals:
   15 Extern LSAs
Interface em1.1:
Area 0.0.0.0:
Interface em1.123:
Area 0.0.0.0:
Interface em1.134:
Area 0.0.0.0:
Interface em1.33:
Area 0.0.0.0:
Interface em2.0:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:

juniper@Olive2GB> 

Three hours after recovery, old LSAs were still floating around, as evidenced by these messages on the NS208 console:

ospf: receive self-orginated newer lsa with same seq -2147483604, but bigger checksum

And finally, a full six hours after the event started, and four hours after the madness was stopped, the network had seemed to once again reach a steady state.

Nagios - Network took some bruises!

And speaking of keeping LSAs for a long time, at 22:33, a full 8 hours after the LSAs were all expired, the J2300 still had them all in its LSDB! And just for fun, here is a copy of the LSDB from the J2300: J2300_LSDB_500K.txt.bz2. This is a 2.1M bzip2 compressed file that expands to 37M of Link State Fun! And this is just the overview!

Let's take a look at which routers recovered and which ones didn't, as well as what it took to get them talking to the rest of the network again.

  1. Cluster 1

    1. Router 1 - Vyatta
       Survived, but with some heavy CPU usage and massive syslogging.

    2. Router 2 - OpenOSPFd
       Had to restart the ospfd process.

    3. Router 3 - Olive 2GB
       Survived!

    4. Router 4 - BIRD
       Survived, some medium CPU usage.

    5. Router 5 - Quagga
       Had to restart zebra and ospfd. Note: This host wasn't running daemon tools. If it had been, Quagga may have been in an endless process restart cycle.

    6. Router 6 - XORP
       Had to restart the xorp processes.

    7. Router 7 - 1GB Olive
       Survived.

  2. Cluster 2

    1. Router 1 - EX3200
       Survived.

    2. Router 2 - SRX100H in Packet Mode
       Survived, but had to wait a very long time for the LSDB to clear out enough that it had enough CPU power to successfully announce itself to the network again.

    3. Router 3 - Cisco 3640
       Came back on its own without intervention.

    4. Router 4 - SRX210HE
       Came back on its own without intervention.

    5. Router 5 - NS208
       Came back on its own without intervention.

    6. Router 6 - Cisco 2811
       Came back on its own without intervention.

    7. Router 7 - RB750G
       Had to reboot the system, as many of its former adjacencies were now stuck in ExStart or Exchange. I was able to do this remotely from the CLI.

  3. Cluster 3

    1. Router 1 - SRX100B
       Came back on its own without intervention.

    2. Router 2 - J2300
       Survived.

    3. Router 3 - Cisco 3750
       Attempted to re-enable CEF like the switch was telling me to do, but this created another error on the box:

      Jan 13 22:46:28.369: %COMMON_FIB-3-HW_API: HW API failure for IPv4 CEF [0x0149B6E8]: Platform IPv4 Fib malloc failed (fatal) (26 subsequent failures).

       Attempted to reload the box from the CLI through the console port, but the command failed due to lack of memory. Then the CLI locked up completely, and the box finally quit spewing errors about low memory and disabling CEF (among other complaints). When it rebooted, the memory overflowed again and the cycle repeated.

    4. Router 4 - SRX100H
       Came back on its own without intervention.

    5. Router 5 - EX2200C
       Came back on its own without intervention.

    6. Router 6 - RB133
       Had to power cycle the box, but to no avail. Stale LSAs killed this box repeatedly.

    7. Router 7 - Cisco 1760
       After 12 hours, there were still old stale LSAs floating around, and the box was still rebooting spontaneously.

    The Cisco 3750, the RB133 and the Cisco 1760 forced me to more or less simultaneously reboot any router that still had some old LSAs in its memory -- so more or less I had to shut down and reboot everything in Cluster 3. That was some pain!

Protecting the Routers from LSA Overload

Fortunately, there are a lot of different ways to protect a router from being overloaded with LSAs to the point that it crashes and burns.

    LSA Overload Protection Methods

  1. If you have a very underpowered router with a very small amount of memory that needs to be on the network for some reason, protect it by placing it in a Stub area with no-summaries, AKA a Totally Stubby Area. If for some reason you still need to inject routes from an external source from the pathetic little piece of kit, you can place it in an NSSA with no-summaries, AKA a Totally Not-So-Stubby-Area. This may not be a viable option for protecting your whole network from an accidental full Internet table dump, but it can serve to protect older boxes from other accidents, sloppiness, and the inevitable LSA creep.

  2. On any boxes that are exporting other routing protocols into OSPF, ensure the box has a protection mechanism built in to stop it from introducing too many external LSAs into the network. On Junos, this feature is implemented as a command set protocols ospf prefix-export-limit <integer from 0 to 4294967295> . This basically will stop any redistribution once the router hits the limit specified. Of course if you set this to a silly-high number you're not really protecting anything.

    Example: Protecting our Network with the prefix-export-limit command.

    [edit protocols ospf]
    juniper@Olive2GB# set prefix-export-limit 25     
    
    [edit protocols ospf]
    juniper@Olive2GB# commit   
    commit complete
    
    And our OSPF database:
    juniper@Olive2GB# run show ospf database summary 
    Area 0.0.0.0:
       8 Router LSAs
       8 Network LSAs
    Externals:
       7 Extern LSAs
    Interface em1.1:
    Area 0.0.0.0:
    Interface em1.123:
    Area 0.0.0.0:
    Interface em1.134:
    Area 0.0.0.0:
    Interface em1.33:
    Area 0.0.0.0:
    Interface em2.0:
    Area 0.0.0.0:
    Interface lo0.0:
    Area 0.0.0.0:
    
    [edit protocols ospf]
    juniper@Olive2GB# 
    
    And check that our box is still bursting at the seams with routes learned via BGP:
    juniper@Olive2GB# run show route summary 
    Autonomous system number: 65066
    Router ID: 1.1.1.3
    
    inet.0: 500048 destinations, 500048 routes (500048 active, 0 holddown, 0 hidden)
    Restart Complete
                  Direct:      8 routes,      8 active
                   Local:      7 routes,      7 active
                    OSPF:     32 routes,     32 active
                     BGP: 500000 routes, 500000 active
                  Static:      1 routes,      1 active
    
    [edit protocols ospf]
    juniper@Olive2GB# 
    
    And repeat the same redistribution nightmare - 500,000 BGP routes into OSPF!
    juniper@Olive2GB# set export EXPORT-BGP-TO-OSPF 
    
    [edit protocols ospf]
    juniper@Olive2GB# commit 
    commit complete
    
    [edit protocols ospf]
    juniper@Olive2GB# 
    
    And we check our LSDB again a few minutes after we repeated our stupid mistake (but this time with a routing condom on):
    juniper@Olive2GB# run show ospf database summary    
    Area 0.0.0.0:
       8 Router LSAs
       8 Network LSAs
    Externals:
       6 Extern LSAs
    Interface em1.1:
    Area 0.0.0.0:
    Interface em1.123:
    Area 0.0.0.0:
    Interface em1.134:
    Area 0.0.0.0:
    Interface em1.33:
    Area 0.0.0.0:
    Interface em2.0:
    Area 0.0.0.0:
    Interface lo0.0:
    Area 0.0.0.0:
    
    [edit protocols ospf]
    juniper@Olive2GB# start
    
    And we still have more or less the same number of external LSAs, in fact now we're missing one! Checking the logs on the Junos box reveals the following message:
    Jan 14 21:59:46  Olive2GB rpd[1206]: RPD_OSPF_OVERLOAD: OSPF instance master topology default is going into overload state: number of export prefixes (26) exceeded maximum allowed (25)
    
    So basically it looks like the box protected the network by refusing to export anything once the prefix limit was exceeded. We'll verify this by checking the LSAs the router is originating, deleting our stupid export policy, and then checking the self-originated LSAs again.
    [edit protocols ospf]
    juniper@Olive2GB# run show ospf database advertising-router self 
    
        OSPF database, Area 0.0.0.0
     Type       ID               Adv Rtr           Seq      Age  Opt  Cksum  Len 
    Router  *1.1.1.3          1.1.1.3          0x80000153   271  0x22 0xf525  96
    Network *1.2.3.3          1.1.1.3          0x8000009f  1729  0x22 0x4a37  32
    Network *1.3.4.3          1.1.1.3          0x800000ac   494  0x22 0x353b  32
    
    [edit protocols ospf]
    juniper@Olive2GB# delete export EXPORT-BGP-TO-OSPF 
    
    [edit protocols ospf]
    juniper@Olive2GB# commit 
    commit complete
    
    [edit protocols ospf]
    juniper@Olive2GB# run show ospf database advertising-router self 
    
        OSPF database, Area 0.0.0.0
     Type       ID               Adv Rtr           Seq      Age  Opt  Cksum  Len 
    Router  *1.1.1.3          1.1.1.3          0x80000154     5  0x22 0x5abf  96
    Network *1.2.3.3          1.1.1.3          0x8000009f  1751  0x22 0x4a37  32
    Network *1.3.4.3          1.1.1.3          0x800000ac   516  0x22 0x353b  32
        OSPF AS SCOPE link state database
     Type       ID               Adv Rtr           Seq      Age  Opt  Cksum  Len 
    Extern  *1.3.0.128        1.1.1.3          0x80000001     5  0x22 0xa304  36
    
    [edit protocols ospf]
    juniper@Olive2GB# 
    
    Sure enough, after deleting the policy that blew the limit, the router is now originating some external LSAs.

    This method will protect your network from the killer flood of LSAs, but any LSAs that the router should have been injecting, like maybe a default route, will be sacrificed to pay homage to the router gods in the process. You may lose some valuable routes this way, but you're not likely to take out your entire network in a chaotic outage that lasts for hours on end.
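
    The all-or-nothing behaviour seen in this test can be modelled in a few lines of Python. This is only a sketch of what the log message and the LSDB output suggest, not how Junos is actually implemented; the function name originated_externals and the example prefixes are made up for illustration.

```python
def originated_externals(export_candidates, prefix_export_limit):
    """Return the prefixes that would be originated as external LSAs.

    Assumed model (inferred from the test run, not from Junos internals):
    once the number of exported prefixes exceeds the limit, the router
    goes into overload and originates no external LSAs at all.
    """
    if len(export_candidates) > prefix_export_limit:
        return []  # overload: everything is withheld, even the wanted routes
    return list(export_candidates)


# A single wanted route stays under a limit of 25 and is exported.
print(originated_externals(["192.0.2.0/24"], 25))

# 26 candidates exceed the limit of 25, so nothing is exported.
flood = ["198.51.100.%d/32" % i for i in range(26)]
print(originated_externals(flood, 25))
```

    The point of the sketch is the second call: the one route you actually wanted goes down with the other 25.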

  3. On every box in your OSPF domain, limit the number of external LSAs that the router will accept. Your network will still get the flood of LSAs, but it should keep your boxes from blowing up if the bounds are set properly. The good part about this one is that you can set each box differently based on its capabilities, and you'll still be able to advertise the routes you need in addition to the others. I'll explore this option a bit more in Test Run 3, but first we need to set up all of the boxes to support this. We'll set the number of external LSAs that each box is allowed to keep to 5 * the base amount of memory in MB. So for instance, our 32MB RouterBoards will be allowed to accept 5*32 = 160 external LSAs, while our boxes with 1 GB will be allowed to keep 5120.
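
    As a quick sanity check of that rule of thumb, here is a small Python sketch. The function name max_external_lsas is hypothetical, and the factor of 5 LSAs per MB is just the arbitrary value chosen for this test, not a vendor recommendation.

```python
def max_external_lsas(ram_mb, lsas_per_mb=5):
    """LSA ceiling for a box: 5 LSAs per MB of base memory (this test's rule of thumb)."""
    return ram_mb * lsas_per_mb


# Memory sizes mentioned in the text: 32 MB, 512 MB and 1 GB.
for label, ram_mb in [("32 MB", 32), ("512 MB", 512), ("1 GB", 1024)]:
    print(label, "->", max_external_lsas(ram_mb), "LSAs")
```
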
  4. The Extra Config to Limit the Number of External LSAs accepted
    1. Cluster 1

      1. Router 1 - Vyatta
      2. Does not appear to support this type of feature.

      3. Router 2 - OpenOSPFd
      4. Does not appear to support this type of feature.

      5. Router 3 - Olive 2GB
      6. Junos doesn't support this until version 10.2

      7. Router 4 - BIRD
      8. Does not appear to support this type of feature.

      9. Router 5 - Quagga
      10. Does not appear to support this type of feature.

      11. Router 6 - XORP
      12. Does not appear to support this type of feature.

      13. Router 7 - 1GB Olive
      14. Junos doesn't support this until version 10.2.

    2. Cluster 2

      1. Router 1 - EX3200
      2. Junos 10.2 and later have some pretty powerful ways to protect a router's LSDB, and its CPU as well. You can specify a maximum number of non-self-originated LSAs with the command maximum-lsa. You can also set a warning threshold to fire off an alarm and send a trap to an NMS when you're approaching the LSA limit, by specifying a percentage with warning-threshold. When the maximum itself is exceeded, the router moves into the ignore state. While the OSPF process is in the ignore state, it essentially stops listening to all LSAs. Of course, this means that the router drops all of its OSPF neighbors during that time.

        When the OSPF process first starts ignoring LSAs, it also starts a timer that determines how long the router will cast away all of the LSAs that happen to collide with one of its interfaces, and it increments a counter that keeps track of how many times it's been in the ignore state. When the ignore timer expires, it starts listening to advertisements again, and another timer starts: the retry timer. The retry timer is a short test period to see if the oversized mass of LSAs that blew the threshold is still floating around or not. If the retry timer expires without the limit being blown again, the network has passed the safety check and the database protection state is reset back to the beginning with the ignore counter set to zero. However, if the LSA nasties swamp the router again when its OSPF adjacencies come back up, it moves back into the ignore state once more and increments the counter.

        Whenever the ignore counter is incremented, there is one more check that is done. The ignore counter has a maximum value it can reach before the router moves into the dreaded isolation state. In the isolation state, the router has decided that the network is too crappy to even bother with, and would rather isolate itself than deal with all of the LSAs again. When the router is isolated, it has basically given up and will need operator intervention to bring it back up.

        The timers can be tweaked: the ignore timer is set with ignore-time <seconds>, and the retry timer is set with reset-time <seconds>. The maximum number of times the OSPF process will move into the ignore state before isolation is set with ignore-count <integer>. If database protection is enabled, the Junos defaults are 600 seconds for reset-time, 300 seconds for ignore-time, and 5 for ignore-count.
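
        The Normal / Ignore / Retry / Isolate cycle described above can be sketched as a tiny event-driven state machine in Python. This is a toy model of the behaviour as described in this article, not Junos code: the class and method names are hypothetical, and the timers are represented as explicit "timer expired" events rather than real clocks.

```python
class DatabaseProtection:
    """Toy model of the Normal -> Ignore -> Retry -> Normal/Isolate cycle."""

    def __init__(self, maximum_lsa, ignore_count=5):
        self.maximum_lsa = maximum_lsa
        self.ignore_count_allowed = ignore_count
        self.ignore_count = 0
        self.state = "Normal"

    def lsa_count_seen(self, non_self_lsas):
        """Feed in the current number of non-self-originated LSAs."""
        if self.state in ("Ignore", "Isolate"):
            return self.state  # not listening to anybody right now
        if non_self_lsas > self.maximum_lsa:
            self.ignore_count += 1
            if self.ignore_count > self.ignore_count_allowed:
                self.state = "Isolate"  # given up; needs an operator
            else:
                self.state = "Ignore"   # neighbors dropped, ignore timer runs
        return self.state

    def ignore_timer_expired(self):
        if self.state == "Ignore":
            self.state = "Retry"  # listening again, retry timer runs
        return self.state

    def retry_timer_expired(self):
        if self.state == "Retry":
            self.state = "Normal"  # passed the safety check
            self.ignore_count = 0
        return self.state

    def operator_clear(self):
        """Manual intervention: back to the beginning."""
        self.state = "Normal"
        self.ignore_count = 0


# Keep flooding a router that allows 3 trips into the ignore state.
dp = DatabaseProtection(maximum_lsa=640, ignore_count=3)
for _ in range(4):
    dp.lsa_count_seen(5000)
    print(dp.state, dp.ignore_count)
    dp.ignore_timer_expired()
```

        With an allowed ignore count of 3, the fourth overflow is the one that pushes the model into isolation.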

        Let's do a quick run and see how all of this works in practice. We'll set the maximum number of LSAs to 640. We'll configure a warning alarm to fire off at 80% of the maximum, and we'll set all the timers to fairly low values so we're not waiting around all day for things to happen.

        Database-Protection Values for Quick Test

        protocols {
            ospf {
                database-protection {
                    maximum-lsa 640;
                    warning-threshold 80;
                    ignore-count 3;
                    ignore-time 60;
                    reset-time 300;
                }
            }
        }
        

        You can see all of the pertinent values with the command show ospf overview. The timers and counters are all shown under the line "Database protection state:".

        juniper@EX3200-2_OSPF> show ospf overview   
        Instance: master
          Router ID: 2.2.2.1
          Route table index: 0
          AS boundary router
          LSA refresh time: 50 minutes
          DoNotAge uncapable
            AS scope LSAs received with no DC bit: 5
            Area scope LSAs received with no DC bit: 14
          Database protection state: Normal
            Warning threshold: 80 percent
            Non self-generated LSAs: Current 25, Warning 512, Allowed 640
            Ignore time: 60, Reset time: 300
            Ignore count: Current 0, Allowed 3
          Area: 0.0.0.0
            Stub type: Not Stub
            Authentication Type: None
            Area border routers: 0, AS boundary routers: 8
            Neighbors
              Up (in full state): 4
            DoNotAge uncapable
              Area scope LSAs received with no DC bit: 14
          Topology: default (ID 0)
            Prefix export count: 1
            Full SPF runs: 87                   
            SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
            Backup SPF: Not Needed
        
        juniper@EX3200-2_OSPF> 
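
        The "Warning 512" figure in that output is simply the warning threshold percentage applied to the maximum. A minimal Python sketch that pulls the counters out of the line and checks the arithmetic follows; the helper name parse_lsa_counts is hypothetical, and the use of integer division is an assumption about rounding, since 80% of 640 happens to be exact.

```python
import re

# Counter line copied from the show ospf overview output above.
LINE = "Non self-generated LSAs: Current 25, Warning 512, Allowed 640"


def parse_lsa_counts(line):
    """Extract the current, warning and allowed LSA counts from the line."""
    m = re.search(r"Current (\d+), Warning (\d+), Allowed (\d+)", line)
    if m is None:
        raise ValueError("not a database-protection counter line")
    current, warning, allowed = map(int, m.groups())
    return {"current": current, "warning": warning, "allowed": allowed}


counts = parse_lsa_counts(LINE)
expected_warning = counts["allowed"] * 80 // 100  # warning-threshold 80
print(counts)
print("warning level matches:", expected_warning == counts["warning"])
print("headroom before the maximum:", counts["allowed"] - counts["current"])
```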
        

        We'll do a quick neighbor check:

        juniper@EX3200-2_OSPF> show ospf neighbor 
        Address          Interface              State     ID               Pri  Dead
        11.1.1.11        ge-0/0/1.11            Full      1.1.1.1          128    39
        2.0.0.6          ge-0/0/1.2             Full      2.2.2.6          128    31
        2.0.0.10         ge-0/0/1.2             2Way      3.0.0.10           0    31
        2.0.0.7          ge-0/0/1.2             Full      2.2.2.7          128    37
        2.1.7.7          ge-0/0/1.217           Full      2.2.2.7          128    37
        
        juniper@EX3200-2_OSPF> 
        

        And then we'll have the nasty operator of our Olive box start flooding an additional 5000 external LSAs. In an instant, our neighbors are gone:

        juniper@EX3200-2_OSPF> show ospf neighbor    
        
        juniper@EX3200-2_OSPF> 
        

        Inspecting our logs, we find a bunch of nasty messages: a warning, followed by an error and notification that all of our neighbors were murdered.

        Jan 16 21:19:14  EX3200-2_OSPF rpd[1081]: RPD_OSPF_LSA_WARNING_EXCEEDED: OSPF realm ospf-v2 number of non-local LSAs exceeded warning limit
        Jan 16 21:19:15  EX3200-2_OSPF rpd[1081]: RPD_OSPF_LSA_MAXIMUM_EXCEEDED: OSPF realm ospf-v2 number of non-local LSAs exceeded maximum limit
        Jan 16 21:19:15  EX3200-2_OSPF rpd[1081]: RPD_OSPF_NBRDOWN: OSPF neighbor 11.1.1.11 (realm ospf-v2 ge-0/0/1.11 area 0.0.0.0) state changed from Full to Down due to KillNbr (event reason: exceeded database protection maximum)
        Jan 16 21:19:15  EX3200-2_OSPF rpd[1081]: RPD_OSPF_NBRDOWN: OSPF neighbor 2.0.0.6 (realm ospf-v2 ge-0/0/1.2 area 0.0.0.0) state changed from Full to Down due to KillNbr (event reason: exceeded database protection maximum)
        Jan 16 21:19:15  EX3200-2_OSPF rpd[1081]: RPD_OSPF_NBRDOWN: OSPF neighbor 2.0.0.10 (realm ospf-v2 ge-0/0/1.2 area 0.0.0.0) state changed from 2Way to Down due to KillNbr (event reason: exceeded database protection maximum)
        Jan 16 21:19:15  EX3200-2_OSPF rpd[1081]: RPD_OSPF_NBRDOWN: OSPF neighbor 2.0.0.7 (realm ospf-v2 ge-0/0/1.2 area 0.0.0.0) state changed from Full to Down due to KillNbr (event reason: exceeded database protection maximum)
        Jan 16 21:19:15  EX3200-2_OSPF rpd[1081]: RPD_OSPF_NBRDOWN: OSPF neighbor 2.1.7.7 (realm ospf-v2 ge-0/0/1.217 area 0.0.0.0) state changed from Full to Down due to KillNbr (event reason: exceeded database protection maximum)
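
        If you wanted to keep an eye on this from a syslog collector, the neighbor-down messages above are regular enough to parse. Here is a rough Python sketch; the regular expression is fitted only to the log lines shown in this article, so treat the pattern and the helper name parse_nbrdown as assumptions rather than a complete grammar for rpd messages.

```python
import re
from collections import Counter

# Pattern fitted to the RPD_OSPF_NBRDOWN lines shown above (an assumption,
# not an official message format definition).
NBRDOWN = re.compile(
    r"RPD_OSPF_NBRDOWN: OSPF neighbor (?P<neighbor>\S+) "
    r"\(realm \S+ (?P<interface>\S+) area (?P<area>[\d.]+)\) "
    r"state changed from (?P<old>\w+) to (?P<new>\w+) "
    r"due to (?P<event>\w+) \(event reason: (?P<reason>[^)]*)\)"
)


def parse_nbrdown(line):
    """Return the fields of a neighbor-down message, or None if no match."""
    m = NBRDOWN.search(line)
    return m.groupdict() if m else None


sample = [
    "Jan 16 21:19:15  EX3200-2_OSPF rpd[1081]: RPD_OSPF_NBRDOWN: OSPF neighbor "
    "11.1.1.11 (realm ospf-v2 ge-0/0/1.11 area 0.0.0.0) state changed from Full "
    "to Down due to KillNbr (event reason: exceeded database protection maximum)",
    "Jan 16 21:19:15  EX3200-2_OSPF rpd[1081]: RPD_OSPF_NBRDOWN: OSPF neighbor "
    "2.0.0.10 (realm ospf-v2 ge-0/0/1.2 area 0.0.0.0) state changed from 2Way "
    "to Down due to KillNbr (event reason: exceeded database protection maximum)",
]

events = [e for e in map(parse_nbrdown, sample) if e]
# Count how many adjacencies were lost per reason.
print(Counter(e["reason"] for e in events))
```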
        

        Checking our OSPF process, we find that the router is currently ignoring everybody, and has incremented the ignore counter to 1.

        juniper@EX3200-2_OSPF> show ospf overview    
        Instance: master
          Router ID: 2.2.2.1
          Route table index: 0
          AS boundary router
          LSA refresh time: 50 minutes
          Database protection state: Ignore (34 seconds remaining)
            Warning threshold: 80 percent
            Non self-generated LSAs: Current 0, Warning 512, Allowed 640
            Ignore time: 60, Reset time: 300
            Ignore count: Current 1, Allowed 3
          Area: 0.0.0.0
            Stub type: Not Stub
            Authentication Type: None
            Area border routers: 0, AS boundary routers: 0
            Neighbors
              Up (in full state): 0
          Topology: default (ID 0)
            Prefix export count: 1
            Full SPF runs: 89
            SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
            Backup SPF: Not Needed
        
        juniper@EX3200-2_OSPF>
        

        We'll kill the nasty behavior of our big LSA injector, and let the Ignore timer run its course to zero.

        juniper@EX3200-2_OSPF> show ospf overview
        Instance: master
          Router ID: 2.2.2.1
          Route table index: 0
          AS boundary router
          LSA refresh time: 50 minutes
          DoNotAge uncapable
            AS scope LSAs received with no DC bit: 5
            Area scope LSAs received with no DC bit: 13
          Database protection state: Retry (288 seconds remaining)
            Warning threshold: 80 percent
            Non self-generated LSAs: Current 24, Warning 512, Allowed 640
            Ignore time: 60, Reset time: 300
            Ignore count: Current 1, Allowed 3
          Area: 0.0.0.0
            Stub type: Not Stub
            Authentication Type: None
            Area border routers: 0, AS boundary routers: 8
            Neighbors
              Up (in full state): 4
            DoNotAge uncapable
              Area scope LSAs received with no DC bit: 13
          Topology: default (ID 0)
            Prefix export count: 1
            Full SPF runs: 96
            SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
            Backup SPF: Not Needed

        juniper@EX3200-2_OSPF>

        We can see now the OSPF process is in the "Retry" state and has another timer going. Note that we re-formed our adjacencies, and have 24 LSAs in our LSDB, which is within our limits. Letting this timer expire, we see that our OSPF process's bursting LSDB wounds have all healed, and our "Ignore counter" is back to zero.

        juniper@EX3200-2_OSPF> show ospf overview    
        Instance: master
          Router ID: 2.2.2.1
          Route table index: 0
          AS boundary router
          LSA refresh time: 50 minutes
          DoNotAge uncapable
            AS scope LSAs received with no DC bit: 5
            Area scope LSAs received with no DC bit: 13
          Database protection state: Normal
            Warning threshold: 80 percent
            Non self-generated LSAs: Current 24, Warning 512, Allowed 640
            Ignore time: 60, Reset time: 300
            Ignore count: Current 0, Allowed 3
          Area: 0.0.0.0
            Stub type: Not Stub
            Authentication Type: None
            Area border routers: 0, AS boundary routers: 8
            Neighbors
              Up (in full state): 4
            DoNotAge uncapable
              Area scope LSAs received with no DC bit: 13
          Topology: default (ID 0)
            Prefix export count: 1
            Full SPF runs: 97                   
            SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
            Backup SPF: Not Needed
        
        juniper@EX3200-2_OSPF> 
        

        Now let's push the Ignore counter past the allowed limit and see what happens.

        juniper@EX3200-2_OSPF> show ospf overview    
        Instance: master
          Router ID: 2.2.2.1
          Route table index: 0
          AS boundary router
          LSA refresh time: 50 minutes
          Database protection state: Isolate
            Warning threshold: 80 percent
            Non self-generated LSAs: Current 0, Warning 512, Allowed 640
            Ignore time: 60, Reset time: 300
            Ignore count: Current 4, Allowed 3
          Area: 0.0.0.0
            Stub type: Not Stub
            Authentication Type: None
            Area border routers: 0, AS boundary routers: 0
            Neighbors
              Up (in full state): 0
          Topology: default (ID 0)
            Prefix export count: 1
            Full SPF runs: 101
            SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
            Backup SPF: Not Needed
        
        juniper@EX3200-2_OSPF> 
        

        We've moved into the Isolation state. The router has given up, and will need to be revived by manual intervention. Note there are no counters happily counting down, just the dreaded word Isolate. To revive the OSPF process, you need to enter the command clear ospf database-protection. This command can be run at any time, and will set any counters and timers back to the beginning even if the router hasn't quite given up and gone into the isolation state.

        For the next run, we'll try some timers that will hopefully let the router come back to life on its own, but not contribute to the churn so much.

        OSPF Database Protection Settings to be Used for the Next Test Run
        protocols {
            ospf {
                database-protection {
                    maximum-lsa 2560;
                    warning-threshold 75;
                    ignore-count 10;
                    ignore-time 1800;
                    reset-time 3600;
                }
            }
        }
        
      3. Router 2 - SRX100H in Packet Mode
      4. Same as all the other Junos boxes that support this. 1GB of RAM *5 = 5120 LSA max.

        protocols {
            ospf {
                database-protection {
                    maximum-lsa 5120;
                    warning-threshold 75;
                    ignore-count 10;
                    ignore-time 1800;
                    reset-time 3600;
                }
            }
        }
        
      5. Router 3 - Cisco 3640
      6. Most versions of IOS 12.0 or greater support a similar command and logic for LSA suppression as Junos does. (I think IOS came first, actually.) On IOS, LSDB protection is enabled with the max-lsa command under the appropriate OSPF router process. The full command is laid out as follows: max-lsa <maximum number of non self-generated LSAs> <warning threshold> ignore-time <minutes> reset-time <minutes> ignore-count <integer>

        Let's do another quick test to see how IOS behaves with the following config applied to our 3640:

        router ospf 1
         max-lsa 640 ignore-time 1 reset-time 3 ignore-count 3
        

        Something to watch out for: applying this command caused our OSPF process to restart! This did not happen with Junos.

        C3640-1(config-router)#$ 75 ignore-count 3 ignore-time 1 reset-time 3
        C3640-1(config-router)#
        Jan 16 22:11:49.628: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on FastEthernet0/0.33 from FULL to DOWN, Neighbor Down: Interface down or detached
        Jan 16 22:11:49.628: %OSPF-5-ADJCHG: Process 1, Nbr 0.0.0.0 on FastEthernet0/0.33 from DOWN to DOWN, Neighbor Down: Interface down or detached
        Jan 16 22:11:49.628: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.1 on FastEthernet0/0.2 from 2WAY to DOWN, Neighbor Down: Interface down or detached
        Jan 16 22:11:49.628: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.6 on FastEthernet0/0.2 from FULL to DOWN, Neighbor Down: Interface down or detachede
        Jan 16 22:11:49.628: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.7 on FastEthernet0/0.2 from FULL to DOWN, Neighbor Down: Interface down or detached
        Jan 16 22:11:49.632: %OSPF-5-ADJCHG: Process 1, Nbr 3.0.0.10 on FastEthernet0/0.2 from 2WAY to DOWN, Neighbor Down: Interface down or detached
        Jan 16 22:11:49.740: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.6 on FastEthernet0/0.2 from LOADING to FULL, Loading Done
        Jan 16 22:11:49.740: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on FastEthernet0/0.33 from LOADING to FULL, Loading Donexit
        C3640-1(config)#exit
        C3640-1#
        Jan 16 22:11:55.016: %SYS-5-CONFIG_I: Configured from console by console
        Jan 16 22:11:57.836: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.7 on FastEthernet0/0.2 from LOADING to FULL, Loading Done
        

        And we can see that our protection has been enabled by examining the OSPF routing process, indicated by "Maximum number of non self-generated LSA allowed"

        C3640-1#sh ip ospf 1
         Routing Process "ospf 1" with ID 2.2.2.3
         Start time: 00:00:23.396, Time elapsed: 00:18:55.816
         Supports only single TOS(TOS0) routes
         Supports opaque LSA
         Supports Link-local Signaling (LLS)
         Supports area transit capability
         Maximum number of non self-generated LSA allowed 640
            Threshold for warning message 75%
            Ignore-time 1 minutes, reset-time 3 minutes
            Ignore-count allowed 3, current ignore-count 0
         It is an autonomous system boundary router
         Redistributing External Routes from,
            connected with metric mapped to 100, includes subnets in redistribution
         Router is not originating router-LSAs with maximum metric
         Initial SPF schedule delay 5000 msecs
         Minimum hold time between two consecutive SPFs 10000 msecs
         Maximum wait time between two consecutive SPFs 10000 msecs
         Incremental-SPF disabled
         Minimum LSA interval 5 secs
         Minimum LSA arrival 1000 msecs
         LSA group pacing timer 240 secs
         Interface flood pacing timer 33 msecs
         Retransmission pacing timer 66 msecs
         Number of external LSA 16. Checksum Sum 0x17ABDA
         Number of opaque AS LSA 0. Checksum Sum 0x000000
         Number of DCbitless external and opaque AS LSA 6
         Number of DoNotAge external and opaque AS LSA 0
         Number of areas in this router is 1. 1 normal 0 stub 0 nssa
         Number of areas transit capable is 0
         External flood list length 0
            Area BACKBONE(0)
        	Number of interfaces in this area is 6 (1 loopback)
        	Area has no authentication
        	SPF algorithm last executed 00:11:10.608 ago
        	SPF algorithm executed 2 times
        	Area ranges are
        	Number of LSA 28. Checksum Sum 0x3D99A7
        	Number of opaque link LSA 0. Checksum Sum 0x000000
        	Number of DCbitless LSA 15
        	Number of indication LSA 0
        	Number of DoNotAge LSA 0
        	Flood list length 0
        
        C3640-1#
        

        After flooding another 5000 LSAs into our network, the 3640 complained to the console with the following message:

        Jan 16 22:27:26.388: %OSPF-4-OSPF_MAX_LSA_THR: Threshold for maximum number of non self-generated LSA has been reached "ospf 1" - 480 LSAs
        Jan 16 22:27:26.484: %OSPF-4-OSPF_MAX_LSA: Maximum number of non self-generated LSA has been exceeded "ospf 1" - 641 LSAs
        

        And checking our OSPF process status we see the router is Ignoring all neighbors:

        C3640-1#sh ip ospf 1
         Routing Process "ospf 1" with ID 2.2.2.3
         Start time: 00:00:23.396, Time elapsed: 00:25:04.932
         Supports only single TOS(TOS0) routes
         Supports opaque LSA
         Supports Link-local Signaling (LLS)
         Supports area transit capability
         Maximum number of non self-generated LSA allowed 640
            Threshold for warning message 75%
            Ignore-time 1 minutes, reset-time 3 minutes
            Ignore-count allowed 3, current ignore-count 1
            Ignoring all neighbors due to max-lsa limit, time remaining: 00:00:03
         It is an autonomous system boundary router
         Redistributing External Routes from,
            connected with metric mapped to 100, includes subnets in redistribution
         Router is not originating router-LSAs with maximum metric
         Initial SPF schedule delay 5000 msecs
         Minimum hold time between two consecutive SPFs 10000 msecs
         Maximum wait time between two consecutive SPFs 10000 msecs
         Incremental-SPF disabled
         Minimum LSA interval 5 secs
         Minimum LSA arrival 1000 msecs
         LSA group pacing timer 240 secs
         Interface flood pacing timer 33 msecs
         Retransmission pacing timer 66 msecs
         Number of external LSA 1. Checksum Sum 0x26C0532
         Number of opaque AS LSA 0. Checksum Sum 0x000000
         Number of DCbitless external and opaque AS LSA 0
         Number of DoNotAge external and opaque AS LSA 0
         Number of areas in this router is 1. 1 normal 0 stub 0 nssa
         Number of areas transit capable is 0
         External flood list length 0
            Area BACKBONE(0) (Inactive)
        	Number of interfaces in this area is 6 (1 loopback)
        	Area has no authentication
        	SPF algorithm last executed 00:00:52.276 ago
        	SPF algorithm executed 1 times
        	Area ranges are
        	Number of LSA 1. Checksum Sum 0x3D2054
        	Number of opaque link LSA 0. Checksum Sum 0x000000
        	Number of DCbitless LSA 0
        	Number of indication LSA 0
        	Number of DoNotAge LSA 0
        	Flood list length 0

        C3640-1#

        It actually took a minute for the 3640 to drop its neighbors after the maximum was exceeded. Junos was instantaneous. Not shown, but before the neighbors actually dropped, the 3640 had loaded up more LSAs than the max specified. This one had Number of external LSA 921. Checksum Sum 0x42D1AF9 in its LSDB.

        Jan 16 22:28:26.565: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on FastEthernet0/0.33 from FULL to DOWN, Neighbor Down: Interface down or detached
        Jan 16 22:28:26.565: %OSPF-5-ADJCHG: Process 1, Nbr 0.0.0.0 on FastEthernet0/0.33 from DOWN to DOWN, Neighbor Down: Interface down or detached
        Jan 16 22:28:26.565: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.1 on FastEthernet0/0.2 from INIT to DOWN, Neighbor Down: Interface down or detached
        Jan 16 22:28:26.565: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.6 on FastEthernet0/0.2 from FULL to DOWN, Neighbor Down: Interface down or detached
        Jan 16 22:28:26.569: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.7 on FastEthernet0/0.2 from FULL to DOWN, Neighbor Down: Interface down or detached
        Jan 16 22:28:26.621: %OSPF-5-ADJCHG: Process 1, Nbr 3.0.0.10 on FastEthernet0/0.2 from 2WAY to DOWN, Neighbor Down: Interface down or detached
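
        The "took a minute" observation can be checked directly from the console timestamps: the max-LSA message was logged at 22:27:26.484 and the first adjacency dropped at 22:28:26.565. A small Python sketch of that arithmetic follows; the helper name seconds_between is hypothetical, and it assumes both timestamps fall in the same year, since the IOS log format shown here carries no year.

```python
from datetime import datetime

# IOS console timestamp format as it appears in the logs above.
FMT = "%b %d %H:%M:%S.%f"


def seconds_between(first, second):
    """Elapsed seconds between two log timestamps like 'Jan 16 22:27:26.484'."""
    t1 = datetime.strptime(first, FMT)
    t2 = datetime.strptime(second, FMT)
    return (t2 - t1).total_seconds()


exceeded = "Jan 16 22:27:26.484"    # %OSPF-4-OSPF_MAX_LSA
first_drop = "Jan 16 22:28:26.565"  # first %OSPF-5-ADJCHG to DOWN
print(round(seconds_between(exceeded, first_drop), 3), "seconds")
```

        That works out to just over 60 seconds, compared with the roughly one second between the warning and the neighbor kills in the Junos log.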
        

        Letting the reset timer expire puts us back at the initial state: our neighbors come back up, and we're back at the beginning again. Let's blow the ignore counter and see what happens:

        C3640-1#sh ip ospf 1
         Routing Process "ospf 1" with ID 2.2.2.3
         Start time: 00:00:23.396, Time elapsed: 00:50:41.748
         Supports only single TOS(TOS0) routes
         Supports opaque LSA
         Supports Link-local Signaling (LLS)
         Supports area transit capability
         Maximum number of non self-generated LSA allowed 640
            Threshold for warning message 75%
            Ignore-time 1 minutes, reset-time 3 minutes
            Ignore-count allowed 3, current ignore-count 4
            Permanently ignoring all neighbors due to max-lsa limit
         It is an autonomous system boundary router
         Redistributing External Routes from,
            connected with metric mapped to 100, includes subnets in redistribution
         Router is not originating router-LSAs with maximum metric
         Initial SPF schedule delay 5000 msecs
         Minimum hold time between two consecutive SPFs 10000 msecs
         Maximum wait time between two consecutive SPFs 10000 msecs
         Incremental-SPF disabled
         Minimum LSA interval 5 secs
         Minimum LSA arrival 1000 msecs
         LSA group pacing timer 240 secs
         Interface flood pacing timer 33 msecs
         Retransmission pacing timer 66 msecs
         Number of external LSA 1. Checksum Sum 0x2A5CDFC5
         Number of opaque AS LSA 0. Checksum Sum 0x000000
         Number of DCbitless external and opaque AS LSA 0
         Number of DoNotAge external and opaque AS LSA 0
         Number of areas in this router is 1. 1 normal 0 stub 0 nssa
         Number of areas transit capable is 0
         External flood list length 0
            Area BACKBONE(0) (Inactive)
        	Number of interfaces in this area is 6 (1 loopback)
        	Area has no authentication
        	SPF algorithm last executed 00:02:26.144 ago
        	SPF algorithm executed 1 times
        	Area ranges are
        	Number of LSA 1. Checksum Sum 0xA0651E
        	Number of opaque link LSA 0. Checksum Sum 0x000000
        	Number of DCbitless LSA 0
        	Number of indication LSA 0
        	Number of DoNotAge LSA 0
        	Flood list length 0
        
        C3640-1#
        

        Note the word "Permanently"! This router has given up on our LSA-stuffed network. In order to get IOS to come back from isolation, the OSPF process needs to be restarted:

        C3640-1#clear ip ospf 1 process 
        Reset OSPF process? [no]: yes
        C3640-1#
        Jan 16 22:56:55.269: %OSPF-5-ADJCHG: Process 1, Nbr 0.0.0.0 on FastEthernet0/0.33 from DOWN to DOWN, Neighbor Down: Interface down or detached
        Jan 16 22:56:55.269: %OSPF-5-ADJCHG: Process 1, Nbr 0.0.0.0 on FastEthernet0/0.33 from DOWN to DOWN, Neighbor Down: Interface down or detached
        Jan 16 22:56:55.357: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.3 on FastEthernet0/0.33 from LOADING to FULL, Loading Done
        Jan 16 22:56:59.681: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.7 on FastEthernet0/0.2 from LOADING to FULL, Loading Done
        Jan 16 22:57:03.025: %OSPF-5-ADJCHG: Process 1, Nbr 3.0.0.10 on FastEthernet0/0.2 from LOADING to FULL, Loading Done
        

        So for the next run, we'll use the following parameters on the 3640:

        OSPF Database Protection Settings to be Used for the Next Test Run
        router ospf 1
         max-lsa 640 75 ignore-time 30 reset-time 60 ignore-count 10
        
        Note: The warning threshold of 75% and the reset-time of 60 are the defaults, so they aren't actually visible in the config
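
        One thing that is easy to trip over when configuring both vendors: the Junos timers above are given in seconds, while the IOS timers are given in minutes. The following Python sketch renders both forms from one set of parameters. The function names are hypothetical and the output simply mirrors the config snippets used in this article; it is not a validated config generator.

```python
def junos_database_protection(maximum_lsa, warning_threshold, ignore_count,
                              ignore_time_s, reset_time_s):
    """Render the Junos database-protection stanza (timers in seconds)."""
    return "\n".join([
        "protocols {",
        "    ospf {",
        "        database-protection {",
        f"            maximum-lsa {maximum_lsa};",
        f"            warning-threshold {warning_threshold};",
        f"            ignore-count {ignore_count};",
        f"            ignore-time {ignore_time_s};",
        f"            reset-time {reset_time_s};",
        "        }",
        "    }",
        "}",
    ])


def ios_max_lsa(maximum_lsa, warning_threshold, ignore_count,
                ignore_time_s, reset_time_s):
    """Render the IOS max-lsa line (timers converted to whole minutes)."""
    if ignore_time_s % 60 or reset_time_s % 60:
        raise ValueError("IOS timers are configured in whole minutes")
    return (f"max-lsa {maximum_lsa} {warning_threshold} "
            f"ignore-time {ignore_time_s // 60} "
            f"reset-time {reset_time_s // 60} "
            f"ignore-count {ignore_count}")


# Same timers on both platforms: 1800 s ignore, 3600 s reset, 10 tries.
print(junos_database_protection(2560, 75, 10, 1800, 3600))
print(ios_max_lsa(640, 75, 10, 1800, 3600))
```
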
      7. Router 4 - SRX210HE
      8. Exactly the same as C2R2

        protocols {
            ospf {
                database-protection {
                    maximum-lsa 2560;
                    warning-threshold 75;
                    ignore-count 10;
                    ignore-time 1800;
                    reset-time 3600;
                }
            }
        }
        
      9. Router 5 - NS208
      10. ScreenOS uses a command under the OSPF protocol, lsa-threshold <time duration> <max number of LSAs>,

        to limit both the time over which a number of LSAs are received, and the maximum number of LSAs received. This doesn't discern between types of LSA (Type 1, Type 4, Type 10); they're all treated the same. We'll set the NS208 to accept 640 LSAs over a one-hour period.
        set vrouter OSPF protocol ospf lsa-threshold 3600 640
         

        Note, this only rate-limits the flooding. It doesn't seem to limit the database size. This may delay the NS208's implosion, but I don't think it will prevent it.

        ns208-> get vrouter OSPF protocol ospf          
        VR: OSPF RouterId: 2.2.2.5
        ----------------------------------
        Status:					enabled
        State:					autonomous system boundary router
        Auto-Vlink creation:			disabled
        Number of areas:			1
        Number of external LSA(s):		5042
        External LSAs with DNA:			0
        Advertising default-route lsa:		disabled
        Default-route learnt by ospf:		will be added to the routing table
        RFC 1583 compatibility:			disabled
        Hello packet flooding protection:	disabled
        LSA flooding protection:		enabled (threshold 640 packets per 3600 second(s))
        Maximum Retransmit limit:		For nbrs on demand-circuits 12
        					For nbrs on non-demand-circuits 24
        Area 0.0.0.0 
        	Total number of interfaces is 6, Active number of interfaces is 6
        	Intra-SPF algorithm executed 22 times
        	Last Intra-SPF executed before 00:00:32
        	Number of LSA(s) is 22
        
        Inter-SPF algorithm executed: 22 times
        Last Inter-SPF executed before 00:02:26
        Extern-SPF algorithm executed: 13 times
        Last Extern-SPF executed before 00:02:26
        SPF Aborted: 2 times
        ns208-> 
        
      11. Router 6 - Cisco 2811
      12. Same command as the Cisco 3640, but adjusted to account for having twice the amount of memory. Applying this command didn't seem to cause the OSPF process to restart, but removing it did.

        router ospf 1
         max-lsa 1280 75 ignore-time 30 reset-time 60 ignore-count 10
        
      13. Router 7 - RB750G
      14. RouterOS does not appear to support this functionality in software.

    3. Cluster 3

      1. Router 1 - SRX100B
      2. Same as all the other Junos boxes that support this. 512MB of RAM *5 = 2560 LSA max.

        protocols {
            ospf {
                database-protection {
                    maximum-lsa 2560;
                    warning-threshold 75;
                    ignore-count 10;
                    ignore-time 1800;
                    reset-time 3600;
                }
            }
        }
        
      3. Router 2 - J2300
      4. Junos doesn't support this until version 10.2

      5. Router 3 - Cisco 3750
      6. Same command as the C3640, except applying it didn't seem to restart our OSPF process.

        router ospf 1
         max-lsa 640 75 ignore-time 30 reset-time 60 ignore-count 10
        
      7. Router 4 - SRX100H
      8. Exactly the same as C2R2

        protocols {
            ospf {
                database-protection {
                    maximum-lsa 5120;
                    warning-threshold 75;
                    ignore-count 10;
                    ignore-time 1800;
                    reset-time 3600;
                }
            }
        }
        
      9. Router 4 - SRX210HE
      10. Exactly the same as C2R2

        protocols {
            ospf {
                database-protection {
                    maximum-lsa 5120;
                    warning-threshold 75;
                    ignore-count 10;
                    ignore-time 1800;
                    reset-time 3600;
                }
            }
        }
        
      11. Router 5 - EX2200C
      12. Same as the other Junos devices with 512MB.

        protocols {
            ospf {
                database-protection {
                    maximum-lsa 2560;
                    warning-threshold 75;
                    ignore-count 10;
                    ignore-time 1800;
                    reset-time 3600;
                }
            }
        }
        
      13. Router 6 - RB133
      14. Same as the other RouterBoard. Doesn't appear to support this functionality.

      15. Router 7 - Cisco 1760
      16. Same command as the other IOS boxes. Applying it does cause the OSPF process to pop on this platform.

        router ospf 1
         max-lsa 480 75 ignore-time 30 reset-time 60 ignore-count 10
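
The limits above all follow the "512MB of RAM *5 = 2560" rule of thumb, so they can be generated rather than typed by hand. The sketch below is only an illustration: the function names are made up for this example, and the RAM figures behind the smaller limits (for instance 128MB for a 640 LSA ceiling) are back-calculated from that rule rather than stated anywhere in the test notes. IOS takes the ignore and reset timers in minutes while Junos takes them in seconds, which is why the same 30/60 pair shows up as 1800/3600 in the Junos stanzas.

```python
def max_lsa_for_ram(ram_mb, lsas_per_mb=5):
    """Return the LSA ceiling for a box with ram_mb megabytes of RAM."""
    return ram_mb * lsas_per_mb


def ios_max_lsa(ram_mb, warn_pct=75, ignore_min=30, reset_min=60, ignore_count=10):
    """Render the IOS max-lsa line; timers are given in minutes."""
    return (
        f" max-lsa {max_lsa_for_ram(ram_mb)} {warn_pct} "
        f"ignore-time {ignore_min} reset-time {reset_min} "
        f"ignore-count {ignore_count}"
    )


def junos_database_protection(ram_mb, warn_pct=75, ignore_min=30,
                              reset_min=60, ignore_count=10):
    """Render the Junos stanza; the same timers are converted to seconds."""
    return "\n".join([
        "database-protection {",
        f"    maximum-lsa {max_lsa_for_ram(ram_mb)};",
        f"    warning-threshold {warn_pct};",
        f"    ignore-count {ignore_count};",
        f"    ignore-time {ignore_min * 60};",
        f"    reset-time {reset_min * 60};",
        "}",
    ])


if __name__ == "__main__":
    print("router ospf 1")
    print(ios_max_lsa(128))                 # assumed 128MB box
    print(junos_database_protection(512))   # 512MB Junos box
```

Keeping the ratio in one place makes it easy to rerun the test with a tighter or looser ceiling across every platform at once.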
        

Run 3: The Internet Table Dump with Protection

This is a repeat of test 2, but with LSA protection applied to the routers above. The protection definitely hasn't been applied to every router on the list; notably, Cluster 1 doesn't have a single router that supports any of it. However, all of the switches, Cisco boxes and SRXs have some sort of database protection applied. In addition, the NetScreen has LSA rate-limiting applied.

First, we'll load up our 2GB Olive with 500K BGP prefixes:

juniper@Olive2GB# run show bgp summary    
Groups: 1 Peers: 1 Down peers: 0
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0            500000     500000          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
172.20.10.117         65069     500188        232       0     245    11:25:14 500000/500000/500000/0 0/0/0/0

[edit]
juniper@Olive2GB# 

A quick check on our network with Nagios reveals everything is once again nice and green.

Nagios: Green, before the export of 500k BGP routes into OSPF.
Nagios - Once Again, Green Happy network!

And off we go...

juniper@Olive2GB# set protocols ospf export EXPORT-BGP-TO-OSPF 

[edit]
juniper@Olive2GB# commit 

Almost immediately, every router that had some sort of database protection applied starts to complain about its LSA count: the EX3200, EX2200C, SRX100B, SRX100H in packet mode, SRX100H in flow mode, SRX210HE, 3640, 1760, 2811 and finally the 3750.

The NS208, which doesn't have database protection but does throttle LSA advertisements, sort of froze when it reached a certain level. Checking the OSPF process status repeatedly revealed the same number of LSAs in its database. Note that it has also aborted SPF calculations several times already.

ns208-> get vrouter OSPF protocol ospf 
VR: OSPF RouterId: 2.2.2.5
----------------------------------
Status:					enabled
State:					autonomous system boundary router
Auto-Vlink creation:			disabled
Number of areas:			1
Number of external LSA(s):		13266
External LSAs with DNA:			0
Advertising default-route lsa:		disabled
Default-route learnt by ospf:		will be added to the routing table
RFC 1583 compatibility:			disabled
Hello packet flooding protection:	disabled
LSA flooding protection:		enabled (threshold 640 packets per 3600 second(s))
Maximum Retransmit limit:		For nbrs on demand-circuits 12
					For nbrs on non-demand-circuits 24
Area 0.0.0.0 
	Total number of interfaces is 6, Active number of interfaces is 6
	Intra-SPF algorithm executed 179 times
	Last Intra-SPF executed before 00:02:21
	Number of LSA(s) is 29

Inter-SPF algorithm executed: 179 times
Last Inter-SPF executed before 00:02:21
Extern-SPF algorithm executed: 125 times
Last Extern-SPF executed before 00:02:21
SPF Aborted: 18 times
ns208-> 

OpenOSPFd and Quagga both blew their memory bounds, deciding not to participate in the test any longer. XORP gave up and quit before anyone could even say "Too many LSAs", and both of the RouterBoard boxes looked like they quit too, with the RB133 console port not budging and the RB750G closing my telnet connection.

The 1GB boxes that survived the initial flood loaded up their LSDBs significantly faster this time, with Vyatta, BIRD, the 1GB Olive and the J2300 stuffed full of 500,000+ LSAs in a few minutes. On test run 2, with a lot of boxes thrashing about and rebooting, it took hours for the J2300 to learn all of the external advertisements. After only 15 minutes, the network looked more or less converged.

Triage

After about 15 minutes, things seemed to be already at a steady state. The Nagios map had settled down without change to look like this:

Nagios - Steady State after 15 min

Each participant's status:

  1. Cluster 1

    1. Router 1 - Vyatta
    2. Fully loaded LSDB.

    3. Router 2 - OpenOSPFd
    4. The ospfd process terminated. The box still responded to pings, but did not attempt to participate in OSPF again.

    5. Router 3 - Olive 2GB
    6. The route injector was happily keeping the 500K routes it had injected up to date.

    7. Router 4 - BIRD
    8. Fully loaded LSDB.

    9. Router 5 - Quagga
    10. The ospfd process for Quagga was dead. The OS still responded to pings.

    11. Router 6 - XORP
    12. No xorp processes were left running, all had terminated. The OS still allowed the box to be pinged.

    13. Router 7 - 1GB Olive
    14. The 1GB Olive was running steady.

  2. Cluster 2

    1. Router 1 - EX3200
    2. The EX3200 was ignoring LSAs.

    3. Router 2 - SRX100H in Packet Mode
    4. Ignoring LSAs.

    5. Router 3 - Cisco 3640
    6. Ignoring LSAs

    7. Router 4 - SRX210HE
    8. Ignoring LSAs.

    9. Router 5 - NS208
    10. After surviving for a longer period of time, the NS208 started to go into a reboot loop. It took much longer than before for it to overload and burst its memory seams.

    11. Router 6 - Cisco 2811
    12. Ignoring LSAs.

    13. Router 7 - RB750G
    14. The system responded to pings, but was otherwise completely dead. It did not participate in routing, and would not offer a login prompt. It had closed the telnet connection I had open to it.

  3. Cluster 3

    1. Router 1 - SRX100B
    2. The SRX100B was ignoring LSAs.

    3. Router 2 - J2300
    4. Fully loaded LSDB.

    5. Router 3 - Cisco 3750
    6. Ignoring LSAs.

    7. Router 4 - SRX100H
    8. Ignoring LSAs.

    9. Router 5 - EX2200C
    10. Ignoring LSAs.

    11. Router 6 - RB133
    12. This box appeared to be completely dead. No response to pings, no action on the console port.

    13. Router 7 - Cisco 1760
    14. Ignoring LSAs. It also started to lodge a complaint about the OSPF process hogging all of the CPU's clock cycles:

      Jan 20 09:28:41.466: %SYS-3-CPUHOG: Task is running for (2003)msecs, more than (2000)msecs (4/1),process = OSPF Router 1.
      -Traceback= 8142DAD0 813FFFDC 80561688 80565EFC 
      

Stopping the madness...

Since things already seemed pretty stable, and every full OSPF participant was fat and full with all half a million LSAs, I decided to let the network cook for another 15 minutes and stop the redistribution only half an hour after it began. I programmed a timed commit on the 2GB Olive to withdraw its LSAs at precisely 10:30.

juniper@Olive2GB# delete protocols ospf export EXPORT-BGP-TO-OSPF 

[edit]
juniper@Olive2GB# commit at 10:30 
configuration check succeeds
commit at will be executed at 2013-01-20 10:30:00 UTC
The configuration has been changed but not committed
Exiting configuration mode

juniper@Olive2GB> 

Right on cue, the 2GB Olive expired all of its LSAs that fell in the 13/8 range.

juniper@Olive2GB> show ospf database    

    OSPF database, Area 0.0.0.0
 Type       ID               Adv Rtr           Seq      Age  Opt  Cksum  Len 
Router   1.1.1.1          1.1.1.1          0x8000016b   341  0x2  0xcd96  96
Router   1.1.1.2          1.1.1.2          0x800000e7  1984  0x2  0x380c  96
Router  *1.1.1.3          1.1.1.3          0x800000d7    82  0x22 0x3075  96
Router   1.1.1.4          1.1.1.4          0x800000d9   875  0x2  0x5d0a  96
Router   1.1.1.5          1.1.1.5          0x80000108  1889  0x2  0xf45b 108
Router   1.1.1.6          1.1.1.6          0x80000028    86  0x2  0x1ce7  96
Router   1.1.1.7          1.1.1.7          0x800000c1    65  0x22 0x73ae  96
Router   2.2.2.4          2.2.2.4          0x80000026   330  0x22 0x9862  96
Router   2.2.2.5          2.2.2.5          0x80000045   173  0x22 0x3c92  96
Router   2.2.2.7          2.2.2.7          0x80000271   469  0x2  0xd0c3  96
Router   3.0.0.10         3.0.0.10         0x8000014a  1587  0x2  0x902a  60
Router   3.3.3.1          3.3.3.1          0x8000002e   347  0x22 0xfedd  96
Router   3.3.3.2          3.3.3.2          0x80000020   212  0x22 0x4d21  96
Router   3.3.3.4          3.3.3.4          0x8000002d   340  0x22 0x18d7  96
Router   3.3.3.6          3.3.3.6          0x8000019d   262  0x2  0x3f2b  96
Network  1.0.0.4          1.1.1.4          0x800000f3    13  0x2  0x21f8  48
Network  1.1.2.2          1.1.1.2          0x800000dc   931  0x2  0xc4a5  32
Network  1.1.7.1          1.1.1.1          0x800000b2   433  0x2  0x3854  32
Network  1.2.3.2          1.1.1.2          0x800000b2  1478  0x2  0x167a  32
Network *1.3.4.3          1.1.1.3          0x80000087  2021  0x22 0x7f16  32
Network  2.0.0.4          2.2.2.4          0x80000001   336  0x22 0xd146  32
Network  2.0.0.5          2.2.2.5          0x80000003   215  0x22 0xff0f  32
Network  3.0.0.2          3.3.3.2          0x80000001   212  0x22 0x1af7  32
Network  11.1.1.31        3.3.3.1          0x80000011   343  0x22 0xfbe6  32
Network  22.2.2.21        1.1.1.2          0x800000a8  2135  0x2  0x8ae3  32
OpaqArea 1.0.0.0          3.3.3.6          0x8000017b  3600  0x42 0x7402  28
    OSPF AS SCOPE link state database
 Type       ID               Adv Rtr           Seq      Age  Opt  Cksum  Len 
Extern   1.1.0.128        1.1.1.1          0x800000f9  1566  0x2  0x803e  36
Extern   1.2.0.128        1.1.1.2          0x800000dc  2259  0x0  0x66a5  36
Extern  *1.3.0.128        1.1.1.3          0x80000095  2021  0x22 0x7a98  36
Extern   1.4.0.128        1.1.1.4          0x800000dc   169  0x2  0xbcf5  36
Extern   1.5.0.128        1.1.1.5          0x800000e6  2085  0x2  0x81f7  36
Extern   1.6.0.128        1.1.1.6          0x80000017   464  0x2  0xf38   36
Extern   1.7.0.128        1.1.1.7          0x80000070  1014  0x22 0x6863  36
Extern   2.1.0.128        2.2.2.1          0x80000010   849  0x22 0x8414  36
Extern   2.2.0.128        2.2.2.2          0x80000010  3102  0x22 0x5ed3  36
Extern   2.4.0.128        2.2.2.4          0x80000010   826  0x22 0x3af3  36
Extern   2.5.0.128        2.2.2.5          0x8000001c   343  0x22 0x8c14  36
Extern   2.7.0.128        2.2.2.7          0x80000189   468  0x2  0xefdd  36
Extern   3.1.0.128        3.3.3.1          0x80000010  2867  0x22 0x4be4  36
Extern   3.2.0.128        3.3.3.2          0x8000000f  1106  0x22 0x3bf3  36
Extern   3.4.0.128        3.3.3.4          0x80000010  2208  0x22 0x1515  36
Extern   3.5.0.128        3.3.3.5          0x80000010  3205  0x22 0x325   36
Extern   3.6.0.128        3.3.3.6          0x8000017c   265  0x2  0xf6e1  36
Extern   3.7.0.128        3.3.3.7          0x80000018   276  0x20 0xec31  36
Extern  *13.0.14.192      1.1.1.3          0x80000001  3600  0x22 0x7b30  36
Extern  *13.0.14.255      1.1.1.3          0x80000001  3600  0x22 0x7eae  36
Extern  *13.0.15.0        1.1.1.3          0x80000001  3600  0x22 0x73b8  36
Extern  *13.0.15.63       1.1.1.3          0x80000001  3600  0x22 0x7fac  36
Extern  *13.0.15.64       1.1.1.3          0x80000001  3600  0x22 0x75b5  36
Extern  *13.0.15.127      1.1.1.3          0x80000001  3600  0x22 0x7bb0  36
Extern  *13.0.15.128      1.1.1.3          0x80000001  3600  0x22 0x71b9  36
Extern  *13.0.15.191      1.1.1.3          0x80000001  3600  0x22 0x7a31  36
Extern  *13.0.15.192      1.1.1.3          0x80000001  3600  0x22 0x703a  36
Extern  *13.0.16.0        1.1.1.3          0x80000001  3600  0x22 0x45ec  36
Extern  *13.0.16.127      1.1.1.3          0x80000001  3600  0x22 0x70ba  36
Extern  *13.0.16.255      1.1.1.3          0x80000001  3600  0x22 0x68c2  36
Extern  *13.0.17.191      1.1.1.3          0x80000001  3600  0x22 0x6445  36
Extern  *13.0.17.192      1.1.1.3          0x80000001  3600  0x22 0x5a4e  36
                    
juniper@Olive2GB> 

The other routers began to expire and pull LSAs from their LSDBs at a massive rate, much quicker than in the previous run, when the low-memory routers were stirring up network chaos. A mere 5 minutes later, hundreds of thousands of LSAs were already missing from the other full participants. Then at 10:36, the Quagga OSPF daemon on the Nagios box blew its memory bounds, turning the hosts page into a sea of red.

At 10:39, the 2GB Olive had most of the LSAs cleared from its LSDB. However, there was still some good network churn going on, with anywhere from 400 to 800 external LSAs in the LSDB. Below are three checks at about 1 second intervals. Note the varied number of external LSAs.

juniper@Olive2GB> show ospf database summary    
Area 0.0.0.0:
   15 Router LSAs
   10 Network LSAs
Externals:
   394 Extern LSAs
Interface em1.1:
Area 0.0.0.0:
Interface em1.123:
Area 0.0.0.0:
Interface em1.134:
Area 0.0.0.0:
Interface em1.33:
Area 0.0.0.0:
Interface em2.0:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:

juniper@Olive2GB> show ospf database summary    
Area 0.0.0.0:
   15 Router LSAs
   10 Network LSAs
Externals:
   426 Extern LSAs
Interface em1.1:
Area 0.0.0.0:
Interface em1.123:
Area 0.0.0.0:
Interface em1.134:
Area 0.0.0.0:
Interface em1.33:
Area 0.0.0.0:
Interface em2.0:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:

juniper@Olive2GB> show ospf database summary    
Area 0.0.0.0:
   15 Router LSAs
   10 Network LSAs
Externals:
   449 Extern LSAs
Interface em1.1:
Area 0.0.0.0:
Interface em1.123:
Area 0.0.0.0:
Interface em1.134:
Area 0.0.0.0:
Interface em1.33:
Area 0.0.0.0:
Interface em2.0:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:

juniper@Olive2GB> 
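
To put a number on that churn without eyeballing each capture, the external count can be pulled out of every `show ospf database summary` output and compared. This is a rough sketch rather than the tooling used during the test: the helper names are invented for the example, and the capture strings are trimmed copies of the three outputs above.

```python
import re

# Matches the "<n> Extern LSAs" line of `show ospf database summary`.
EXTERN_RE = re.compile(r"^[ \t]*(\d+)[ \t]+Extern LSAs[ \t]*$", re.MULTILINE)


def extern_lsa_count(summary_text):
    """Return the Extern LSA count from one summary capture, or None."""
    match = EXTERN_RE.search(summary_text)
    return int(match.group(1)) if match else None


def churn(captures):
    """Return (lowest, highest, spread) of the Extern counts seen."""
    counts = [c for c in map(extern_lsa_count, captures) if c is not None]
    if not counts:
        raise ValueError("no Extern LSA counts found")
    return min(counts), max(counts), max(counts) - min(counts)


# Trimmed copies of the three captures, one string per check.
HEADER = "Area 0.0.0.0:\n   15 Router LSAs\n   10 Network LSAs\nExternals:\n"
CAPTURES = [HEADER + f"   {n} Extern LSAs\n" for n in (394, 426, 449)]

if __name__ == "__main__":
    print(churn(CAPTURES))  # (394, 449, 55)
```

A spread that keeps changing between samples taken a second apart is what "churn" means here: LSAs are still being flooded in and aged out faster than the count can settle.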

This external LSA variance went on until 32 minutes after the Olive started to withdraw its external LSAs.

At 11:02, the Vyatta box was already down to 122894 externals:

vyatta@vyatta:~$ sh ip ospf
 OSPF Routing Process, Router ID: 1.1.1.1
 Supports only single TOS (TOS0) routes
 This implementation conforms to RFC2328
 RFC1583Compatibility flag is disabled
 OpaqueCapability flag is disabled
 Initial SPF scheduling delay 200 millisec(s)
 Minimum hold time between consecutive SPFs 1000 millisec(s)
 Maximum hold time between consecutive SPFs 10000 millisec(s)
 Hold time multiplier is currently 1
 SPF algorithm last executed 13.519s ago
 SPF timer is inactive
 Refresh timer 10 secs
 This router is an ASBR (injecting external routing information)
 Number of external LSA 122894. Checksum Sum 0xf1bdda12
 Number of opaque AS LSA 0. Checksum Sum 0x00000000
 Number of areas attached to this router: 1

 Area ID: 0.0.0.0 (Backbone)
   Number of interfaces in this area: Total: 6, Active: 6
   Number of fully adjacent neighbors in this area: 3
   Area has no authentication
   SPF algorithm executed 904 times
   Number of LSA 30
:

And BIRD was totally clear.

bird> show ospf 
OSPFol:
RFC1583 compatibility: disabled
RT scheduler tick: 2
Number of areas: 1
Number of LSAs in DB:	46
	Area: 0.0.0.0 (0) [BACKBONE]
		Stub:	No
		NSSA:	No
		Transit:	No
		Number of interfaces:	6
		Number of neighbors:	7
		Number of adjacent neighbors:	4
bird> 

The 1GB Olive still had more than 300K LSAs in its memory, but it had already learned that they were all past their lifetime:

juniper@Olive1GB> show ospf database | except "  3600  0x22 " | match ^External | count
Count: 0 lines

juniper@Olive1GB> 

And the J2300 had already expired more than half of them:

juniper@J2300-7> show ospf database | except "  3600  0x22 " | match ^Extern | count    
Count: 237801 lines
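
The same kind of count can be done offline against a saved `show ospf database` capture. The sketch below is an assumption-laden illustration, not what was run in the lab: it keys off the Age column instead of the literal "  3600  0x22 " string, the function name is invented, and the sample rows are copied from the 2GB Olive output earlier in this run. Note that the type column in the Junos output reads `Extern`, so that is the label the code matches.

```python
MAX_AGE = 3600  # seconds; an LSA at this age is on its way out of the LSDB


def count_externals(db_text):
    """Return (total, at_max_age) for the Extern rows in a database dump."""
    total = at_max_age = 0
    for line in db_text.splitlines():
        fields = line.split()
        # Columns: Type, ID (a leading * marks self-originated), Adv Rtr,
        # Seq, Age, Opt, Cksum, Len
        if len(fields) < 5 or fields[0] != "Extern":
            continue
        total += 1
        if int(fields[4]) == MAX_AGE:
            at_max_age += 1
    return total, at_max_age


SAMPLE = """\
Router   1.1.1.1          1.1.1.1          0x8000016b   341  0x2  0xcd96  96
Extern   1.1.0.128        1.1.1.1          0x800000f9  1566  0x2  0x803e  36
Extern  *13.0.14.192      1.1.1.3          0x80000001  3600  0x22 0x7b30  36
Extern  *13.0.15.0        1.1.1.3          0x80000001  3600  0x22 0x73b8  36
"""

if __name__ == "__main__":
    total, aged = count_externals(SAMPLE)
    print(f"{total} externals, {aged} at MaxAge, {total - aged} still live")
```

Subtracting the MaxAge rows from the total gives the number of externals a router still considers live, which is the figure the pipe-and-count commands are after.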

At 11:20, the RB750G let me back in with a telnet session, and was already neighbored up with everyone who was still playing the OSPF game. Vyatta began its massive spew of MaxAge messages to the syslog server. It continued this for another six minutes, then decided to take a break. Once it was done, it had a free and clean LSDB:

vyatta@vyatta:~$ sh ip ospf
 OSPF Routing Process, Router ID: 1.1.1.1
 Supports only single TOS (TOS0) routes
 This implementation conforms to RFC2328
 RFC1583Compatibility flag is disabled
 OpaqueCapability flag is disabled
 Initial SPF scheduling delay 200 millisec(s)
 Minimum hold time between consecutive SPFs 1000 millisec(s)
 Maximum hold time between consecutive SPFs 10000 millisec(s)
 Hold time multiplier is currently 1
 SPF algorithm last executed 3.534s ago
 SPF timer is inactive
 Refresh timer 10 secs
 This router is an ASBR (injecting external routing information)
 Number of external LSA 16. Checksum Sum 0x0006cd4d
 Number of opaque AS LSA 0. Checksum Sum 0x00000000
 Number of areas attached to this router: 1

 Area ID: 0.0.0.0 (Backbone)
   Number of interfaces in this area: Total: 6, Active: 6
   Number of fully adjacent neighbors in this area: 3
   Area has no authentication
   SPF algorithm executed 966 times
   Number of LSA 28
   Number of router LSA 19. Checksum Sum 0x000862be
   Number of network LSA 9. Checksum Sum 0x0005a665
   Number of summary LSA 0. Checksum Sum 0x00000000
   Number of ASBR summary LSA 0. Checksum Sum 0x00000000
   Number of NSSA LSA 0. Checksum Sum 0x00000000
   Number of opaque link LSA 0. Checksum Sum 0x00000000
   Number of opaque area LSA 0. Checksum Sum 0x00000000

vyatta@vyatta:~$ 

All of the boxes with OSPF database protection were still ignoring the network, as exemplified by the SRX100H:

juniper@SRX100-6_OSPF> show ospf overview    
Instance: master
  Router ID: 2.2.2.2
  Route table index: 0
  AS boundary router
  LSA refresh time: 50 minutes
  Database protection state: Ignore (719 seconds remaining)
    Warning threshold: 75 percent
    Non self-generated LSAs: Current 0, Warning 3840, Allowed 5120
    Ignore time: 1800, Reset time: 3600
    Ignore count: Current 3, Allowed 10
  Area: 0.0.0.0
    Stub type: Not Stub
    Authentication Type: None
    Area border routers: 0, AS boundary routers: 0
    Neighbors
      Up (in full state): 0
  Topology: default (ID 0)
    Prefix export count: 1
    Full SPF runs: 178
    SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
    Backup SPF: Not Needed

juniper@SRX100-6_OSPF> 
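
The interplay of the ignore-time, reset-time and ignore-count knobs is easier to follow as a small state machine. What follows is a simplified model built from the behaviour seen in these outputs (the "Ignore" and "Retry" states with their countdowns, and the permanent isolation the 3640 fell into earlier), not vendor code. The class name, the "isolated" label and the exact transition rules are assumptions made for illustration.

```python
class DatabaseProtection:
    """Simplified model of the ignore/retry/isolate cycle (timers in seconds)."""

    def __init__(self, max_lsa, ignore_time=1800, reset_time=3600, ignore_count=10):
        self.max_lsa = max_lsa
        self.ignore_time = ignore_time
        self.reset_time = reset_time
        self.ignore_count = ignore_count
        self.state = "normal"
        self.ignores = 0      # times the ignore state has been entered
        self.remaining = 0    # seconds left in the current ignore/retry window

    def on_lsa_count(self, count):
        """Feed the current number of non-self-generated LSAs."""
        if self.state in ("normal", "retry") and count > self.max_lsa:
            self.ignores += 1
            if self.ignores > self.ignore_count:
                # Assumed permanent until the OSPF process is restarted by hand.
                self.state = "isolated"
            else:
                self.state, self.remaining = "ignore", self.ignore_time

    def tick(self, seconds):
        """Advance the clock: ignore -> retry -> normal as the timers run out."""
        if self.state not in ("ignore", "retry"):
            return
        self.remaining -= seconds
        if self.remaining > 0:
            return
        if self.state == "ignore":
            self.state, self.remaining = "retry", self.reset_time
        else:
            # A clean reset-time window clears the ignore counter.
            self.state, self.ignores, self.remaining = "normal", 0, 0


if __name__ == "__main__":
    dp = DatabaseProtection(max_lsa=5120)
    dp.on_lsa_count(500_000)      # the flood arrives
    print(dp.state, dp.ignores)   # ignore 1
    dp.tick(1800)                 # ignore-time runs out
    print(dp.state)               # retry
```

In this model a box only returns to normal after surviving a full reset-time window without another overflow, which matches the long "seconds remaining" countdowns in the captures.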

Right before the hour mark, by which time all of the withdrawn LSAs should have expired whether or not they had been explicitly withdrawn, I restarted the Quagga daemons on the Nagios box so it could see what was going on in the network.

A full hour after the slew of prefixes in 13/8 was pulled from the network by the 2GB Olive, I checked the state of each router again:

  1. Cluster 1

    1. Router 1 - Vyatta
    2. The Vyatta box had purged itself of all of the LSAs from the "accident." However, it started spewing another round of MaxAge LSAs to the syslog server.

    3. Router 2 - OpenOSPFd
    4. Still dead.

    5. Router 3 - Olive 2GB
    6. Had expired and purged all of the exported LSAs.

      juniper@Olive2GB> show ospf database summary    
      Area 0.0.0.0:
         17 Router LSAs
         11 Network LSAs
         1 OpaqArea LSAs
      Externals:
         19 Extern LSAs
      Interface em1.1:
      Area 0.0.0.0:
      Interface em1.123:
      Area 0.0.0.0:
      Interface em1.134:
      Area 0.0.0.0:
      Interface em1.33:
      Area 0.0.0.0:
      Interface em2.0:
      Area 0.0.0.0:
      Interface lo0.0:
      Area 0.0.0.0:
      
      juniper@Olive2GB> 
      
    7. Router 4 - BIRD
    8. Still happy!

    9. Router 5 - Quagga
    10. Still a dead parrot.

    11. Router 6 - XORP
    12. The box was RIP, but this RIP had nothing to do with Bellman-Ford algorithms.

    13. Router 7 - 1GB Olive
    14. Purged all expired LSAs.

      juniper@Olive1GB> show ospf database summary                                
      Area 0.0.0.0:
         17 Router LSAs
         11 Network LSAs
         1 OpaqArea LSAs
      Externals:
         19 Extern LSAs
      Interface em1.1:
      Area 0.0.0.0:
      Interface em1.117:
      Area 0.0.0.0:
      Interface em1.167:
      Area 0.0.0.0:
      Interface em1.77:
      Area 0.0.0.0:
      Interface em2.0:
      Area 0.0.0.0:
      Interface lo0.0:
      Area 0.0.0.0:
      
      juniper@Olive1GB> 
      
  2. Cluster 2

    1. Router 1 - EX3200
    2. Ignoring LSAs.

      juniper@EX3200-2_OSPF> show ospf overview 
      Instance: master
        Router ID: 2.2.2.1
        Route table index: 0
        AS boundary router
        LSA refresh time: 50 minutes
        Database protection state: Ignore (1625 seconds remaining)
          Warning threshold: 75 percent
          Non self-generated LSAs: Current 0, Warning 1920, Allowed 2560
          Ignore time: 1800, Reset time: 3600
          Ignore count: Current 4, Allowed 10
        Area: 0.0.0.0
          Stub type: Not Stub
          Authentication Type: None
          Area border routers: 0, AS boundary routers: 0
          Neighbors
            Up (in full state): 0
        Topology: default (ID 0)
          Prefix export count: 1
          Full SPF runs: 172
          SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
          Backup SPF: Not Needed
      
      juniper@EX3200-2_OSPF> 
      
    3. Router 2 - SRX100H in Packet Mode
    4. Ignoring LSAs.

      juniper@SRX100-6_OSPF> show ospf overview    
      Instance: master
        Router ID: 2.2.2.2
        Route table index: 0
        AS boundary router
        LSA refresh time: 50 minutes
        Database protection state: Ignore (1570 seconds remaining)
          Warning threshold: 75 percent
          Non self-generated LSAs: Current 0, Warning 3840, Allowed 5120
          Ignore time: 1800, Reset time: 3600
          Ignore count: Current 4, Allowed 10
        Area: 0.0.0.0
          Stub type: Not Stub
          Authentication Type: None
          Area border routers: 0, AS boundary routers: 0
          Neighbors
            Up (in full state): 0
        Topology: default (ID 0)
          Prefix export count: 1
          Full SPF runs: 193
          SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
          Backup SPF: Not Needed
      
      juniper@SRX100-6_OSPF> 
      
    5. Router 3 - Cisco 3640
    6. Ignoring LSAs

      C3640-1#show ip ospf 1
       Routing Process "ospf 1" with ID 2.2.2.3
       Start time: 00:00:23.404, Time elapsed: 14:00:16.232
       Supports only single TOS(TOS0) routes
       Supports opaque LSA
       Supports Link-local Signaling (LLS)
       Supports area transit capability
       Maximum number of non self-generated LSA allowed 640
          Threshold for warning message 75%
          Ignore-time 30 minutes, reset-time 60 minutes
          Ignore-count allowed 10, current ignore-count 2 - time remaining: 00:24:40
       It is an autonomous system boundary router
       Redistributing External Routes from,
          connected with metric mapped to 100, includes subnets in redistribution
       Router is not originating router-LSAs with maximum metric
       Initial SPF schedule delay 5000 msecs
       Minimum hold time between two consecutive SPFs 10000 msecs
       Maximum wait time between two consecutive SPFs 10000 msecs
       Incremental-SPF disabled
       Minimum LSA interval 5 secs
       Minimum LSA arrival 1000 msecs
       LSA group pacing timer 240 secs
       Interface flood pacing timer 33 msecs
       Retransmission pacing timer 66 msecs
       Number of external LSA 19. Checksum Sum 0x3ACC949
       Number of opaque AS LSA 0. Checksum Sum 0x000000
       Number of DCbitless external and opaque AS LSA 5
       Number of DoNotAge external and opaque AS LSA 0
       Number of areas in this router is 1. 1 normal 0 stub 0 nssa
       Number of areas transit capable is 0
       External flood list length 0
          Area BACKBONE(0)
      	Number of interfaces in this area is 6 (1 loopback)
      	Area has no authentication
      	SPF algorithm last executed 00:01:20.192 ago
      	SPF algorithm executed 68 times
      	Area ranges are
      	Number of LSA 29. Checksum Sum 0x3C9020
      	Number of opaque link LSA 0. Checksum Sum 0x000000
      	Number of DCbitless LSA 10
      	Number of indication LSA 0
      	Number of DoNotAge LSA 0
      	Flood list length 0
      
      C3640-1#
      
    7. Router 4 - SRX210HE
    8. Ignoring LSAs.

      juniper@SRX210HE_OSPF> show ospf overview    
      Instance: master
        Router ID: 2.2.2.4
        Route table index: 0
        AS boundary router
        LSA refresh time: 50 minutes
        DoNotAge uncapable
          AS scope LSAs received with no DC bit: 5
          Area scope LSAs received with no DC bit: 10
        Database protection state: Retry (3145 seconds remaining)
          Warning threshold: 75 percent
          Non self-generated LSAs: Current 45, Warning 3840, Allowed 5120
          Ignore time: 1800, Reset time: 3600
          Ignore count: Current 3, Allowed 10
        Area: 0.0.0.0
          Stub type: Not Stub
          Authentication Type: None
          Area border routers: 0, AS boundary routers: 11
          Neighbors
            Up (in full state): 5
          DoNotAge uncapable
            Area scope LSAs received with no DC bit: 10
        Topology: default (ID 0)
          Prefix export count: 1
          Full SPF runs: 177                  
          SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
          Backup SPF: Not Needed
      
      juniper@SRX210HE_OSPF> 
      
    9. Router 5 - NS208
    10. Hadn't rebooted in more than 30 minutes -- a new record! And it was back in the network.

      ns208-> get vrouter OSPF protocol ospf
      VR: OSPF RouterId: 2.2.2.5
      ----------------------------------
      Status:					enabled
      State:					autonomous system boundary router
      Auto-Vlink creation:			disabled
      Number of areas:			1
      Number of external LSA(s):		19
      External LSAs with DNA:			0
      Advertising default-route lsa:		disabled
      Default-route learnt by ospf:		will be added to the routing table
      RFC 1583 compatibility:			disabled
      Hello packet flooding protection:	disabled
      LSA flooding protection:		enabled (threshold 640 packets per 3600 second(s))
      Maximum Retransmit limit:		For nbrs on demand-circuits 12
      					For nbrs on non-demand-circuits 24
      Area 0.0.0.0 
      	Total number of interfaces is 6, Active number of interfaces is 6
      	Intra-SPF algorithm executed 153 times
      	Last Intra-SPF executed before 00:01:04
      	Number of LSA(s) is 28
      
      Inter-SPF algorithm executed: 153 times
      Last Inter-SPF executed before 00:01:04
      Extern-SPF algorithm executed: 201 times
      Last Extern-SPF executed before 00:01:04
      SPF Aborted: 22 times
      ns208-> 
      

      Note the number of times the SPF algorithm was aborted on the NS208.

    11. Router 6 - Cisco 2811
    12. Ignoring LSAs.

      C2811-1#sh ip ospf 1
       Routing Process "ospf 1" with ID 2.2.2.6
       Start time: 00:00:41.716, Time elapsed: 13:44:54.176
       Supports only single TOS(TOS0) routes
       Supports opaque LSA
       Supports Link-local Signaling (LLS)
       Supports area transit capability
       Maximum number of non self-generated LSA allowed 1280
          Current number of non self-generated LSA 46
          Threshold for warning message 75%
          Ignore-time 30 minutes, reset-time 60 minutes
          Ignore-count allowed 10, current ignore-count 2 - time remaining: 00:22:08
       Event-log enabled, Maximum number of events: 1000, Mode: cyclic
       It is an autonomous system boundary router
       Redistributing External Routes from,
          connected, includes subnets in redistribution
       Router is not originating router-LSAs with maximum metric
       Initial SPF schedule delay 5000 msecs
       Minimum hold time between two consecutive SPFs 10000 msecs
       Maximum wait time between two consecutive SPFs 10000 msecs
       Incremental-SPF disabled
       Minimum LSA interval 5 secs
       Minimum LSA arrival 1000 msecs
       LSA group pacing timer 240 secs
       Interface flood pacing timer 33 msecs
       Retransmission pacing timer 66 msecs
       Number of external LSA 19. Checksum Sum 0x78607D1
       Number of opaque AS LSA 0. Checksum Sum 0x000000
       Number of DCbitless external and opaque AS LSA 5
       Number of DoNotAge external and opaque AS LSA 0
       Number of areas in this router is 1. 1 normal 0 stub 0 nssa
       Number of areas transit capable is 0
       External flood list length 0
       IETF NSF helper support enabled
       Cisco NSF helper support enabled
       Reference bandwidth unit is 100 mbps
          Area BACKBONE(0.0.0.0)
      	Number of interfaces in this area is 6 (1 loopback)
      	Area has no authentication
      	SPF algorithm last executed 00:01:35.676 ago
      	SPF algorithm executed 73 times
      	Area ranges are
      	Number of LSA 29. Checksum Sum 0x38E832
      	Number of opaque link LSA 0. Checksum Sum 0x000000
      	Number of DCbitless LSA 10
      	Number of indication LSA 0
      	Number of DoNotAge LSA 0
              Flood list length 0
      
      C2811-1#
      
    13. Router 7 - RB750G
    14. Was back in the network!

      [admin@RB750G] /routing ospf lsa> print
      AREA       TYPE         ID             ORIGINATOR     SEQUENCE-NU...        AGE
      backbone   router       1.1.1.1        1.1.1.1            0x80000174        433
      backbone   router       1.1.1.3        1.1.1.3            0x800000DA       2237
      backbone   router       1.1.1.4        1.1.1.4            0x800000DC        440
      backbone   router       1.1.1.6        1.1.1.6            0x80000029       2140
      backbone   router       1.1.1.7        1.1.1.7            0x800000DA        415
      backbone   router       2.2.2.3        2.2.2.3            0x80000009        443
      backbone   router       2.2.2.4        2.2.2.4            0x80000032        440
      backbone   router       2.2.2.5        2.2.2.5            0x8000005C        273
      backbone   router       2.2.2.6        2.2.2.6            0x8000001D        272
      backbone   router       2.2.2.7        2.2.2.7            0x8000027E        294
      backbone   router       3.0.0.10       3.0.0.10           0x8000000D        152
      backbone   router       3.3.3.1        3.3.3.1            0x80000035        614
      backbone   router       3.3.3.2        3.3.3.2            0x8000003B        154
      backbone   router       3.3.3.3        3.3.3.3            0x80000005        206
      backbone   router       3.3.3.4        3.3.3.4            0x8000003E        506
      backbone   router       3.3.3.6        3.3.3.6            0x800001A3        279
      backbone   router       3.3.3.7        3.3.3.7            0x80000005        279
      backbone   network      1.0.0.4        1.1.1.4            0x800000F9        698
      backbone   network      1.1.7.1        1.1.1.1            0x800000B4        701
      backbone   network      1.3.4.3        1.1.1.3            0x80000089       1189
      backbone   network      2.0.0.7        2.2.2.7            0x8000000F        404
      backbone   network      3.0.0.7        3.3.3.7            0x80000003        349
      backbone   network      11.1.1.21      2.2.2.1            0x80000001        447
      backbone   network      11.1.1.31      3.3.3.1            0x80000012        621
      backbone   network      44.4.4.42      2.2.2.4            0x80000002        440
      backbone   network      44.4.4.43      3.3.3.4            0x80000003        509
      backbone   network      55.5.5.53      3.3.3.5            0x80000001        587
      backbone   network      77.7.7.73      3.3.3.7            0x80000002        300
      backbone   opaque-area  1.0.0.0        3.3.3.6            0x8000017E       1802
      external   as-external  1.1.0.128      1.1.1.1            0x800000FD        620
      external   as-external  1.3.0.128      1.1.1.3            0x80000097        487
      external   as-external  1.4.0.128      1.1.1.4            0x800000DE        455
      external   as-external  1.6.0.128      1.1.1.6            0x80000018       2506
      external   as-external  1.7.0.128      1.1.1.7            0x80000072       1239
      external   as-external  2.1.0.128      2.2.2.1            0x80000011       1736
      external   as-external  2.2.0.128      2.2.2.2            0x80000012        991
      external   as-external  2.3.0.128      2.2.2.3            0x80000003        261
      external   as-external  2.4.0.128      2.2.2.4            0x80000011       1712
      external   as-external  2.5.0.128      2.2.2.5            0x8000001E        629
      external   as-external  2.6.0.128      2.2.2.6            0x80000003        165
      external   as-external  2.7.0.128      2.2.2.7            0x8000018C        272
      external   as-external  3.1.0.128      3.3.3.1            0x80000012        758
      external   as-external  3.2.0.128      3.3.3.2            0x80000010       2871
      external   as-external  3.3.0.128      3.3.3.3            0x80000004        207
      external   as-external  3.4.0.128      3.3.3.4            0x80000011       3051
      external   as-external  3.5.0.128      3.3.3.5            0x80000012       1092
      external   as-external  3.6.0.128      3.3.3.6            0x8000017D       2219
      external   as-external  3.7.0.128      3.3.3.7            0x8000001B        408
      
      [admin@RB750G] /routing ospf lsa> 
      
  3. Cluster 3

    1. Router 1 - SRX100B
    2. Ignoring LSAs.

      juniper@SRX100-5_OSPF> show ospf overview    
      Instance: master
        Router ID: 3.3.3.1
        Route table index: 0
        AS boundary router
        LSA refresh time: 50 minutes
        Database protection state: Ignore (1198 seconds remaining)
          Warning threshold: 75 percent
          Non self-generated LSAs: Current 0, Warning 1920, Allowed 2560
          Ignore time: 1800, Reset time: 3600
          Ignore count: Current 4, Allowed 10
        Area: 0.0.0.0
          Stub type: Not Stub
          Authentication Type: None
          Area border routers: 0, AS boundary routers: 0
          Neighbors
            Up (in full state): 0
        Topology: default (ID 0)
          Prefix export count: 1
          Full SPF runs: 175
          SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
          Backup SPF: Not Needed
      
      juniper@SRX100-5_OSPF> 
      
    3. Router 2 - J2300
    4. Had completely cleaned itself of more than half of the nasty LSAs.

      copek@J2300-7> show ospf database summary 
      Area 0.0.0.0:
         20 Router LSAs
         15 Network LSAs
         1 OpaqArea LSAs
      Externals:
         213082 Extern LSAs
      Interface fe-0/0/0.22:
      Area 0.0.0.0:
      Interface fe-0/0/0.3:
      Area 0.0.0.0:
      Interface fe-0/0/0.312:
      Area 0.0.0.0:
      Interface fe-0/0/0.3201:
      Area 0.0.0.0:
      Interface fe-0/0/0.323:
      Area 0.0.0.0:
      Interface lo0.0:
      Area 0.0.0.0:
      
      copek@J2300-7> 
      

      And it had expired all of them, keeping only the "good" externals in its LSDB:

      juniper@J2300-7> show ospf database | except "  3600  0x22 " | match ^Extern | count    
      Count: 19 lines
      
      juniper@J2300-7> 
      
    5. Router 3 - Cisco 3750
    6. The box had blown its memory stack again and the OSPF process restarted.

      Jan 20 10:41:54.449: %SYS-2-MALLOCFAIL: Memory allocation of 38128 bytes failed from 0x18ECE58, alignment 0 
      Pool: Processor  Free: 29940  Cause: Not enough free memory 
      Alternate Pool: None  Free: 0  Cause: No Alternate pool 
       -Process= "HQM Stack Process", ipl= 0, pid= 142
      -Traceback= 1EA47C0 1EA4F0C 294A8A0 294CDC0 294D024 2BC1BDC 18ECE5C 18C453C 1A72C88 1A69758
      C3750-1#
      Jan 20 10:42:36.434: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.7 on Vlan3 from LOADING to FULL, Loading Done
      C3750-1#
      

      However, the box was back in the network, fully participating in OSPF. The OSPF process restart seems to have prematurely cut off the ignore state. But by the time this had happened, the number of external LSAs floating around in area 0.0.0.0 was at a safe level.

    7. Router 4 - SRX100H
    8. Ignoring LSAs.

      juniper@SRX100-7_OSPF> show ospf overview    
      Instance: master
        Router ID: 3.3.3.4
        Route table index: 0
        AS boundary router
        LSA refresh time: 50 minutes
        Database protection state: Ignore (1050 seconds remaining)
          Warning threshold: 75 percent
          Non self-generated LSAs: Current 0, Warning 3840, Allowed 5120
          Ignore time: 1800, Reset time: 3600
          Ignore count: Current 4, Allowed 10
        Area: 0.0.0.0
          Stub type: Not Stub
          Authentication Type: None
          Area border routers: 0, AS boundary routers: 0
          Neighbors
            Up (in full state): 0
        Topology: default (ID 0)
          Prefix export count: 1
          Full SPF runs: 160
          SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
          Backup SPF: Not Needed
      
      juniper@SRX100-7_OSPF> 
      
    9. Router 5 - EX2200C
    10. Ignoring LSAs.

      juniper@EX2200C-3> show ospf overview    
      Instance: master
        Router ID: 3.3.3.5
        Route table index: 0
        AS boundary router
        LSA refresh time: 50 minutes
        Database protection state: Ignore (1033 seconds remaining)
          Warning threshold: 75 percent
          Non self-generated LSAs: Current 0, Warning 1920, Allowed 2560
          Ignore time: 1800, Reset time: 3600
          Ignore count: Current 4, Allowed 10
        Area: 0.0.0.0
          Stub type: Not Stub
          Authentication Type: None
          Area border routers: 0, AS boundary routers: 0
          Neighbors
            Up (in full state): 0
        Topology: default (ID 0)
          Prefix export count: 1
          Full SPF runs: 126
          SPF delay: 0.200000 sec, SPF holddown: 5 sec, SPF rapid runs: 3
          Backup SPF: Not Needed
      
      {master:0}
      juniper@EX2200C-3> 
      
    11. Router 6 - RB133
    12. Miraculously, this box was back from the dead. Fully neighbored up with the other active participants.

    13. Router 7 - Cisco 1760
    14. Like the 3750, this box had blown its memory bounds despite the LSDB protection.

      -Process= "OSPF Router 1", ipl= 0, pid= 157
      -Traceback= 8000C1CC 8000F868 8001FD5C 8142BDA4 8142C27C 8142EAF8 8142EC00 8141483C 8142D6B4 813FFFDC 80561688 80565EFC
      Jan 20 10:34:31.112: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x8001FD58, alignment 0 
      Pool: Processor  Free: 80900  Cause: Memory fragmentation 
      Alternate Pool: I/O  Free: 651604  Cause: Memory fragmentation 
      
      -Process= "OSPF Router 1", ipl= 0, pid= 157
      -Traceback= 8000C1CC 8000F868 8001FD5C 8142C05C 8142F1C4 8142F7A8 8142FFA4 813FFC08 81400194 80561688 80565EFC
      Jan 20 10:35:01.121: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x8001FD58, alignment 0 
      Pool: Processor  Free: 88488  Cause: Memory fragmentation 
      Alternate Pool: I/O  Free: 655364  Cause: Memory fragmentation 
      
      -Process= "OSPF Router 1", ipl= 0, pid= 157
      -Traceback= 8000C1CC 8000F868 8001FD5C 8142C05C 8142F1C4 8142F7A8 8142FFA4 813FFC08 81400194 80561688 80565EFC
      Jan 20 10:36:04.012: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x8001FD58, alignment 0 
      Pool: Processor  Free: 127292  Cause: Memory fragmentation 
      Alternate Pool: I/O  Free: 661224  Cause: Memory fragmentation 
      
      -Process= "OSPF Router 1", ipl= 0, pid= 157
      -Traceback= 8000C1CC 8000F868 8001FD5C 8142C05C 8142F1C4 8142F7A8 8142FFA4 813FFC08 81400194 80561688 80565EFC
      Jan 20 10:37:27.025: %SYS-2-MALLOCFAIL: Memory allocation of 10000 bytes failed from 0x8001FD58, alignment 0 
      Pool: Processor  Free: 209132  Cause: Memory fragmentation 
      Alternate Pool: I/O  Free: 684212  Cause: Memory fragmentation 
      
      -Process= "OSPF Router 1", ipl= 0, pid= 157
      -Traceback= 8000C1CC 8000F868 8001FD5C 8142C05C 8142F1C4 8142F7A8 8142FFA4 813FFC08 81400194 80561688 80565EFC
      Jan 20 10:41:35.398: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet0/0.3 from FULL to DOWN, Neighbor Down: Too many retransmissions
      Jan 20 10:42:35.401: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet0/0.3 from DOWN to DOWN, Neighbor Down: Ignore timer expired
      Jan 20 10:42:36.438: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet0/0.3 from LOADING to FULL, Loading Done
      

      After an hour though, it was ignoring the network LSAs

      C1760-1#sh ip ospf 1
       Routing Process "ospf 1" with ID 3.3.3.7
       Supports only single TOS(TOS0) routes
       Supports opaque LSA
       Supports Link-local Signaling (LLS)
       Supports area transit capability
       Maximum number of non self-generated LSA allowed 480
          Threshold for warning message 75%
          Ignore-time 30 minutes, reset-time 60 minutes
          Ignore-count allowed 10, current ignore-count 3 - time remaining: 00:46:56
       It is an autonomous system boundary router
       Redistributing External Routes from,
          connected, includes subnets in redistribution
       Initial SPF schedule delay 5000 msecs
       Minimum hold time between two consecutive SPFs 10000 msecs
       Maximum wait time between two consecutive SPFs 10000 msecs
       Incremental-SPF disabled
       Minimum LSA interval 5 secs
       Minimum LSA arrival 1000 msecs
       LSA group pacing timer 240 secs
       Interface flood pacing timer 33 msecs
       Retransmission pacing timer 66 msecs
       Number of external LSA 19. Checksum Sum 0x3AE19A0
       Number of opaque AS LSA 0. Checksum Sum 0x000000
       Number of DCbitless external and opaque AS LSA 5
       Number of DoNotAge external and opaque AS LSA 0
       Number of areas in this router is 1. 1 normal 0 stub 0 nssa
       Number of areas transit capable is 0
       External flood list length 24365
          Area BACKBONE(0)
      	Number of interfaces in this area is 6 (1 loopback)
      	Area has no authentication
      	SPF algorithm last executed 00:01:22.415 ago
      	SPF algorithm executed 16 times
      	Area ranges are
      	Number of LSA 29. Checksum Sum 0x46971C
      	Number of opaque link LSA 0. Checksum Sum 0x000000
      	Number of DCbitless LSA 10
      	Number of indication LSA 0
      	Number of DoNotAge LSA 0
      	Flood list length 0
      
      C1760-1#
      

I waited another hour to let all of the ignore timers expire, and then checked to see what the final damage was two hours after the "event". Most everything had recovered by itself, as evidenced by Nagios's monitoring page: a field of green with only a couple of bloody red gashes where OpenOSPFd, Quagga and XORP lived.

Nagios - Network Self Recovery

To bring the network fully back to life only took three actions:

  1. Restart ospfd on the OpenOSPFd box.
  2. Restart the Quagga daemons on the Quagga box.
  3. Restart the XORP service on the XORP box.

Conclusions

It's very obvious that having routers remove themselves from OSPF had a drastic effect on how the network behaved. The flooding of 500K external LSAs still had the same end effect (most of the network was inaccessible), but the time it took to reach a converged state was greatly reduced. The damage done was less severe as well. Most boxes were able to recover on their own, no box had to be rebooted by hand, and segments of the network didn't have to be shut down. The recovery, although it still took nearly 2 hours to come back as far as it could (due mostly to ignore timers needing to expire), was remarkably quick compared to the network without any protection. The overall network churn was orders of magnitude less.

If all of the routers on the network had supported LSA database protection, like the Junos > 10.2 boxes or the Ciscos, I have no doubt the convergence and recovery times would have been even quicker. With a mix of boxes like here, the ignore timers need to be paced out enough that any stray LSAs that are floating around have a chance to expire by themselves, so that the protected boxes -- especially the low-memory ones that tend to develop serious problems **cough cough** 3750 **cough** -- don't try to participate in OSPF too early and implode. Some of this will also depend on how vigilant the network operators are. If somebody leaks too many routes into the network on Friday afternoon, and nobody does anything until Monday, a lot of the boxes would have gone into isolation and yet others may have exploded during the periodic refreshes and refloods. On a carefully monitored network, the operators would have been alerted to problems pretty quickly and would have a much easier time intervening and bringing things back to normal.
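
To put a rough number on that pacing, here is a small sketch. It assumes the stray LSA is no longer being refreshed by its originator, so it simply ages out at the OSPF MaxAge of 3600 seconds (RFC 2328). The function name and the idea of counting whole ignore periods are my own illustration, not something any of the routers report.

```python
import math

MAX_AGE = 3600  # OSPF MaxAge in seconds (RFC 2328)


def ignore_periods_to_outlast(lsa_age: int, ignore_time: int = 1800) -> int:
    """How many full ignore periods pass before an unrefreshed LSA ages out.

    Assumption: nobody refreshes the LSA, so it expires when it reaches MaxAge.
    """
    remaining = max(MAX_AGE - lsa_age, 0)
    return math.ceil(remaining / ignore_time)


# A freshly originated stray LSA outlives a single 30 minute ignore period.
print(ignore_periods_to_outlast(0))
# An LSA that is already 2871 seconds old is gone within one ignore period.
print(ignore_periods_to_outlast(2871))
```

With the 30 minute ignore time used here, a box can come out of its first ignore period while young stray LSAs are still alive, which fits with the ignore counts above 1 seen in the outputs.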

With careful tuning, and study of the network devices to understand their limitations, database protection can be a powerful tool in the arsenal against catastrophic mistakes. One would have to identify the weaker devices, like the Cisco 3750, and protect these to a higher degree than the other routers. Setting warning levels on database protection can also serve as an early warning against creeping levels of external LSAs that over time might start to cause problems.
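
The knobs involved can be sketched as a tiny state model. This is a simplified illustration and not vendor code: the class `LsdbProtection` and its method names are hypothetical, and the transitions are only my reading of the counters printed in the outputs above (allowed LSAs, 75 percent warning threshold, ignore time, reset time, ignore count).

```python
from dataclasses import dataclass


@dataclass
class LsdbProtection:
    """Simplified model of OSPF LSDB overload protection (hypothetical names)."""
    max_lsa: int                    # "Allowed" non self-generated LSAs
    warning_pct: int = 75           # warning threshold, percent of max_lsa
    ignore_time: int = 1800         # seconds spent ignoring neighbors
    reset_time: int = 3600          # quiet seconds before ignore_count resets
    ignore_count_allowed: int = 10
    ignore_count: int = 0

    @property
    def warning_level(self) -> int:
        return self.max_lsa * self.warning_pct // 100

    def classify(self, foreign_lsas: int) -> str:
        """Return the state implied by the non self-generated LSA count."""
        if foreign_lsas > self.max_lsa:
            self.ignore_count += 1
            if self.ignore_count > self.ignore_count_allowed:
                # Assumed: the box stays out until an operator steps in.
                return "isolated"
            return "ignore"         # drop adjacencies for ignore_time seconds
        if foreign_lsas >= self.warning_level:
            return "warning"
        return "normal"

    def note_quiet_period(self, seconds: int) -> None:
        """Forget earlier ignore events after a long enough quiet period."""
        if seconds >= self.reset_time:
            self.ignore_count = 0


srx100b = LsdbProtection(max_lsa=2560)
print(srx100b.warning_level)        # 1920, matching the SRX100B output
print(srx100b.classify(2000))       # warning
print(srx100b.classify(500_000))    # ignore
```

Protecting a weak box "to a higher degree" in this model just means giving it a smaller `max_lsa`, like the 480 on the Cisco 1760 versus the 5120 on the SRX100H.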

The Great OSPF vs. BGP Race

So we've seen that in great numbers, OSPF, which is generally regarded as quite quick, can actually be pretty slow at delivering large amounts of routing information. On the other hand, BGP, which was designed to handle routing information at large scales, is (in my opinion falsely) regarded as being fairly slow. So why not have a race!

Right away, I had to disqualify any router that didn't have enough memory to hold either a big BGP table or a big OSPF LSDB -- so anybody who is under 1GB is out! I also tried loading up the SRX100Hs and the SRX210HE with a big full BGP table, but their poor little MIPS processors were soon pretty overwhelmed and started to drop BGP connections. I also tried like hell to get BIRD to listen to a BGP port, but I couldn't get it to work at all. I booted XORP out of the race due to the persistent stability issues I've had with it. So the remaining players are: Vyatta, the Olives, Quagga, and the J2300. Since the topology now isolated the J2300, it was also given leave to be a member of Cluster 1. To try to make things fair, the Olive that is originating the routes was forced to be the DR for the segment.

So which is faster at blasting 500,000 prefixes from the Olive2GB to the rest of the routers, OSPF or BGP?

To test BGP, each router was set up in an internal BGP full mesh, with the 2GB Olive waiting to blast 500,000 BGP prefixes to all of the unsuspecting neighbors. OSPF is used to do what it should at this point: just provide topological information. We set up a new BGP group, and use two new policies on our iBGP neighbors: NHS to fix the next hop, and REJECT. REJECT is used simply to hold back all of the routes learned from exaBGP.

Additional Config on the 2GB Olive for iBGP
routing-options {
    router-id 1.1.1.3;
    autonomous-system 65066;
}
protocols {
    bgp {
        group iBGP {
            type internal;
            local-address 1.1.1.3;
            export [ REJECT NHS ];
            neighbor 1.1.1.1;
            neighbor 1.1.1.5;
            neighbor 1.1.1.7;
            neighbor 3.3.3.2;
        }
    }
}
policy-options {
    policy-statement FIX-NH {
        then {
            next-hop 172.20.1.1;
        }
    }
    policy-statement NHS {
        then {
            next-hop self;
        }
    }
    policy-statement REJECT {
        then reject;
    }
}

With the 2GB Olive loaded up with 500,000 BGP routes, we delete the REJECT export policy and unleash the deluge.

[edit protocols bgp group iBGP]
juniper@Olive2GB# delete export REJECT               

[edit protocols bgp group iBGP]
juniper@Olive2GB# commit 
commit complete
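
Why deleting one policy opens the floodgates can be sketched with a toy model of an export policy chain. This is not how Junos is implemented. It only assumes that policies are evaluated left to right, that the first terminating action (accept or reject) wins, and that a route falling off the end of the chain gets the protocol's default action, which I take to be "accept" for active BGP routes. The function names and the sample prefix are made up for illustration.

```python
from typing import Callable, List, Optional

# A policy takes a route and returns "accept", "reject", or None.
# None means "no terminating action, keep evaluating the chain".
Policy = Callable[[dict], Optional[str]]


def reject_all(route: dict) -> Optional[str]:
    """Models REJECT: terminate with reject for every route."""
    return "reject"


def nhs(route: dict) -> Optional[str]:
    """Models NHS: rewrite the next hop, no terminating action."""
    route["next_hop"] = "self"
    return None


def evaluate(chain: List[Policy], route: dict, default: str = "accept") -> str:
    """Walk the chain; the first terminating action decides the outcome."""
    for policy in chain:
        verdict = policy(route)
        if verdict is not None:
            return verdict
    return default  # assumed default export action for active BGP routes


route = {"prefix": "198.51.100.0/24", "next_hop": "172.20.10.117"}
print(evaluate([reject_all, nhs], dict(route)))  # reject
print(evaluate([nhs], dict(route)))              # accept
```

With `export [ REJECT NHS ]` nothing ever reaches NHS, so the neighbors see no routes. Once REJECT is gone, NHS rewrites the next hop and everything is advertised.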

I tried several methods of putting a stopwatch on how long it took each router to learn all the BGP routes. None of my scientific approaches, like setting off maximum prefix warnings, seemed to work really well or provide a common comparison across all of the different platforms. So I resorted to alternately executing a command to check the time on the router and one to check the overall status of the routing table.
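
That manual stopwatch boils down to a polling loop. The sketch below is a generic version of it and not the procedure I actually ran: `get_count` is a placeholder for whatever returns the current route count on a given platform, and the fake router at the bottom exists only so the example runs on its own.

```python
import time
from typing import Callable, Optional


def time_until(get_count: Callable[[], int],
               done: Callable[[int], bool],
               poll_interval: float = 1.0,
               timeout: float = 3600.0) -> Optional[float]:
    """Poll get_count() until done(count) is true; return elapsed seconds.

    Returns None if the timeout expires first. The result is only as
    precise as poll_interval.
    """
    start = time.monotonic()
    while True:
        if done(get_count()):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(poll_interval)


def make_fake_router(step: int = 200_000) -> Callable[[], int]:
    """Stand-in for a real query: the route count grows by `step` per poll."""
    state = {"routes": 0}

    def get_count() -> int:
        state["routes"] += step
        return state["routes"]

    return get_count


elapsed = time_until(make_fake_router(), lambda n: n >= 500_000,
                     poll_interval=0.01)
print(f"learned 500,000 routes after {elapsed:.2f} s")
```

For a withdrawal the same loop works with a predicate such as `lambda n: n <= baseline`, where `baseline` is the route count before the test.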

BGP proved to be fairly quick, especially for the routers that are all VMs and just speaking across an OpenVswitch to one another.

Advertising 500,000 BGP prefixes:

Vyatta       Quagga       Olive        J2300
5 seconds    6 seconds    12 seconds   83 seconds

To withdraw the routes, the REJECT policy was simply reapplied. BGP proved to be even quicker at retracting prefixes.

Withdrawing 500,000 BGP prefixes:

Vyatta       Quagga       Olive        J2300
5 seconds    4 seconds    9 seconds    38 seconds

Next up was OSPF, using the same 500k routes and the same export policy that readvertised BGP routes into OSPF in the previous test runs. Not surprisingly, OSPF was a lot slower. Once again the machines talking directly to the OpenVswitch learned all of the new LSAs first, while the J2300 with its old FastEthernet interfaces took almost twice as long. The three VM based routers all converged at more or less the same time.

Advertising 500,000 OSPF prefixes:

Vyatta        Quagga        Olive         J2300
549 seconds   549 seconds   549 seconds   941 seconds

Withdrawing OSPF was another story entirely. The Olive was a bit quicker pulling expired routes from its inet.0 table, while Vyatta took an inordinate amount of time to yank the routes from its RIB. Quagga couldn't finish as it blew its memory bounds. The J2300 was working so hard that the mgd daemon quit responding:

juniper@J2300-7> show route summary    
error: the routing subsystem is not responding to management requests

I'm not sure if it dropped a core, or it's still working on getting rid of the old LSAs. I can still ping the box, but the CLI is basically frozen.

Withdrawing 500,000 OSPF prefixes:

Vyatta         Quagga       Olive         J2300
2119 seconds   Empty set.   420 seconds   ? seconds
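
To compare the four runs on one scale, the timings can be turned into rough prefixes-per-second figures. The seconds below are copied from the tables in this section, with `None` standing in for the two OSPF withdrawals that never produced a number. Since the timings were taken by hand, the rates are ballpark values at best.

```python
PREFIXES = 500_000

# Seconds measured above; None marks runs that never finished.
results = {
    "BGP advertise":  {"Vyatta": 5, "Quagga": 6, "Olive": 12, "J2300": 83},
    "BGP withdraw":   {"Vyatta": 5, "Quagga": 4, "Olive": 9, "J2300": 38},
    "OSPF advertise": {"Vyatta": 549, "Quagga": 549, "Olive": 549, "J2300": 941},
    "OSPF withdraw":  {"Vyatta": 2119, "Quagga": None, "Olive": 420, "J2300": None},
}


def rate(seconds):
    """Prefixes per second, or None when the run did not finish."""
    return None if seconds is None else PREFIXES / seconds


for test, timings in results.items():
    cells = []
    for router, seconds in timings.items():
        r = rate(seconds)
        cells.append(f"{router}: " + ("n/a" if r is None else f"{r:,.0f}/s"))
    print(f"{test:15s} " + "  ".join(cells))
```

On these numbers Vyatta learned the prefixes at about 100,000 per second over BGP and about 911 per second over OSPF, a difference of roughly 110 times.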

Conclusions

BGP was definitely a ton faster at this scale. The other thing that doesn't really show up well is the reliability and CPU time involved. While BGP prefixes were being learned or withdrawn, there were short CPU spikes. While OSPF LSAs were changing, there was a very slow logarithmic kind of buildup in CPU usage over time -- especially in the Vyatta box. I also exercised BGP quite a bit, advertising and withdrawing all of the routes a few dozen times. If there were any route dampening policies in place they surely would have had a lot of fun. Anyway, all of the BGP implementations stood up very well and acted in a consistent and predictable manner. I only pushed and pulled OSPF once, and had 1 box implode, 1 box drag down to the point I can't really see what is going on, and 1 box push its CPU usage up and up and up -- taking what I considered an inordinate amount of time to reconverge.

Definitely, at scale BGP is demonstrably a far superior protocol.

Killing the Survivors


Through all of this chaos, only two boxes weathered all of the tests. The 1GB Olive stood very strong through everything (once the NICs were corrected), taking everything that was thrown at it. The Vyatta box showed a lot of strain at times, but came through relatively unscathed. So do they get a reward? YES! 500,000 prefixes via BGP and OSPF simultaneously!

A quick check to make sure we're ready to go:

juniper@Olive2GB# run show bgp summary 
Groups: 2 Peers: 5 Down peers: 2
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0            500000     500000          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
1.1.1.1               65066        424     289541       0       0     3:31:28 0/0/0/0              0/0/0/0
1.1.1.5               65066       2488     461405       0       1     2:23:27 Active
1.1.1.7               65066       3055      78531       0       1     6:38:00 0/0/0/0              0/0/0/0
3.3.3.2               65066          4         10       0      15     1:28:16 Active
172.20.10.117         65069     500354        416       0       2    20:36:37 500000/500000/500000/0 0/0/0/0

[edit]
juniper@Olive2GB# run show route advertising-protocol bgp 1.1.1.1 

[edit]
juniper@Olive2GB# run show route advertising-protocol bgp 1.1.1.7    

[edit]
juniper@Olive2GB# run show ospf database summary 
Area 0.0.0.0:
   4 Router LSAs
   2 Network LSAs
Externals:
   3 Extern LSAs
Interface em1.1:
Area 0.0.0.0:
Interface em1.123:
Area 0.0.0.0:
Interface em1.134:
Area 0.0.0.0:
Interface em1.33:
Area 0.0.0.0:
Interface em2.0:
Area 0.0.0.0:
Interface lo0.0:
Area 0.0.0.0:

[edit]
juniper@Olive2GB# 

GO!

juniper@Olive2GB# delete protocols bgp group iBGP export REJECT         

[edit]
juniper@Olive2GB# set protocols ospf export EXPORT-BGP-TO-OSPF 

[edit]
juniper@Olive2GB# commit 
commit complete

[edit]
juniper@Olive2GB# 

MU-HAHAHA!

WHAT! They still live!

juniper@Olive1GB> show route summary 
Autonomous system number: 65066
Router ID: 1.1.1.7

inet.0: 501818 destinations, 1003354 routes (501818 active, 0 holddown, 0 hidden)
              Direct:      7 routes,      7 active
               Local:      6 routes,      6 active
                OSPF: 501548 routes, 501548 active
                 BGP: 501792 routes,    256 active
              Static:      1 routes,      1 active

juniper@Olive1GB> 

Another 500,000 routes! And some slop prefixes as well with two more exaBGP neighbors!

And Vyatta Dies at about 720K BGP and OSPF prefixes!
Jan 21 21:34:50 vyatta kernel: [522722.103536] Out of memory: Kill process 1544 (ospfd) score 406 or sacrifice child
Jan 21 21:34:50 vyatta kernel: [522722.107877] Killed process 1544 (ospfd) total-vm:424928kB, anon-rss:419876kB, file-rss:0kB
Jan 21 21:34:50 vyatta kernel: [522722.190039] ospfd: page allocation failure: order:0, mode:0x201da
Jan 21 21:34:50 vyatta kernel: [522722.190045] Pid: 1544, comm: ospfd Not tainted 3.3.8-1-586-vyatta-virt #1
Jan 21 21:34:50 vyatta kernel: [522722.190048] Call Trace:
Jan 21 21:34:50 vyatta kernel: [522722.190057]  [] ? warn_alloc_failed+0xc0/0xd1
Jan 21 21:34:50 vyatta kernel: [522722.190063]  [] ? __alloc_pages_nodemask+0x577/0x5ce
Jan 21 21:34:50 vyatta kernel: [522722.190067]  [] ? filemap_fault+0x26a/0x32f
Jan 21 21:34:50 vyatta kernel: [522722.190073]  [] ? __do_fault+0x97/0x403
Jan 21 21:34:50 vyatta kernel: [522722.190078]  [] ? handle_pte_fault+0x389/0x93a
Jan 21 21:34:50 vyatta kernel: [522722.190084]  [] ? common_interrupt+0x29/0x30
Jan 21 21:34:50 vyatta kernel: [522722.190088]  [] ? handle_mm_fault+0x1e0/0x1f6
Jan 21 21:34:50 vyatta kernel: [522722.190094]  [] ? do_page_fault+0x2cd/0x2e9
Jan 21 21:34:50 vyatta kernel: [522722.190099]  [] ? tick_program_event+0x1c/0x1f
Jan 21 21:34:50 vyatta kernel: [522722.190104]  [] ? hrtimer_interrupt+0x143/0x1f5
Jan 21 21:34:50 vyatta kernel: [522722.190108]  [] ? kvm_async_pf_task_wait+0x167/0x167
Jan 21 21:34:50 vyatta kernel: [522722.190112]  [] ? error_code+0x5a/0x60
Jan 21 21:34:50 vyatta kernel: [522722.190117]  [] ? detect_ht+0xc4/0x169
Jan 21 21:34:50 vyatta kernel: [522722.190120]  [] ? kvm_async_pf_task_wait+0x167/0x167
Jan 21 21:34:50 vyatta kernel: [522722.190123] Mem-Info:
Jan 21 21:34:50 vyatta kernel: [522722.190124] DMA per-cpu:
Jan 21 21:34:50 vyatta kernel: [522722.190127] CPU    0: hi:    0, btch:   1 usd:   0
Jan 21 21:34:50 vyatta kernel: [522722.190129] Normal per-cpu:
Jan 21 21:34:50 vyatta kernel: [522722.190131] CPU    0: hi:  186, btch:  31 usd:   0
Jan 21 21:34:50 vyatta kernel: [522722.190132] HighMem per-cpu:
Jan 21 21:34:50 vyatta kernel: [522722.190134] CPU    0: hi:   42, btch:   7 usd:   0
Jan 21 21:34:50 vyatta kernel: [522722.190140] active_anon:225514 inactive_anon:31 isolated_anon:0
Jan 21 21:34:50 vyatta kernel: [522722.190141]  active_file:8 inactive_file:11 isolated_file:0
Jan 21 21:34:50 vyatta kernel: [522722.190142]  unevictable:0 dirty:0 writeback:0 unstable:0
Jan 21 21:34:50 vyatta kernel: [522722.190143]  free:13235 slab_reclaimable:685 slab_unreclaimable:15401
Jan 21 21:34:50 vyatta kernel: [522722.190145]  mapped:3 shmem:112 pagetables:572 bounce:0
Jan 21 21:34:50 vyatta kernel: [522722.190153] DMA free:4448kB min:784kB low:980kB high:1176kB active_anon:9676kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:1292kB kernel_stack:0kB pagetables:12kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 21 21:34:50 vyatta kernel: [522722.190159] lowmem_reserve[]: 0 869 1000 1000
Jan 21 21:34:50 vyatta kernel: [522722.190168] Normal free:48364kB min:44216kB low:55268kB high:66324kB active_anon:762424kB inactive_anon:0kB active_file:32kB inactive_file:44kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:890008kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:2740kB slab_unreclaimable:60312kB kernel_stack:488kB pagetables:960kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:28213 all_unreclaimable? yes
Jan 21 21:34:50 vyatta kernel: [522722.190174] lowmem_reserve[]: 0 0 1047 1047
Jan 21 21:34:50 vyatta kernel: [522722.190182] HighMem free:128kB min:128kB low:1792kB high:3456kB active_anon:129956kB inactive_anon:124kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:134112kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:448kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:1316kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 21 21:34:50 vyatta kernel: [522722.190188] lowmem_reserve[]: 0 0 0 0
Jan 21 21:34:50 vyatta kernel: [522722.190192] DMA: 6*4kB 59*8kB 9*16kB 7*32kB 4*64kB 4*128kB 5*256kB 3*512kB 0*1024kB 0*2048kB 0*4096kB = 4448kB
Jan 21 21:34:50 vyatta kernel: [522722.190200] Normal: 179*4kB 135*8kB 66*16kB 27*32kB 16*64kB 27*128kB 29*256kB 16*512kB 10*1024kB 1*2048kB 3*4096kB = 48388kB
Jan 21 21:34:50 vyatta kernel: [522722.190209] HighMem: 18*4kB 5*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 128kB
Jan 21 21:34:50 vyatta kernel: [522722.190217] 125 total pagecache pages
Jan 21 21:34:50 vyatta kernel: [522722.190219] 0 pages in swap cache
Jan 21 21:34:50 vyatta kernel: [522722.190221] Swap cache stats: add 0, delete 0, find 0/0
Jan 21 21:34:50 vyatta kernel: [522722.190223] Free swap  = 0kB
Jan 21 21:34:50 vyatta kernel: [522722.190224] Total swap = 0kB
Jan 21 21:34:50 vyatta kernel: [522722.192911] 262126 pages RAM
Jan 21 21:34:50 vyatta kernel: [522722.192914] 33792 pages HighMem
Jan 21 21:34:50 vyatta kernel: [522722.192915] 3545 pages reserved
Jan 21 21:34:50 vyatta kernel: [522722.192917] 160 pages shared
Jan 21 21:34:50 vyatta kernel: [522722.192918] 245105 pages non-shared

The last one standing is the 1GB Olive, but the 2GB Olive is still feeding it routes!

juniper@Olive2GB# run show bgp summary    
Groups: 2 Peers: 7 Down peers: 2
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0            727097     725300          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
1.1.1.1               65066         43      13290       0       1       20:12 0/0/0/0              0/0/0/0
1.1.1.5               65066       2488     461405       0       1     3:01:45 Active
1.1.1.7               65066       3139      95008       0       1     7:16:18 0/0/0/0              0/0/0/0
3.3.3.2               65066          4         10       0      15     2:06:34 Connect
172.20.10.117         65069     500366        428       0       2    21:14:55 500000/500000/500000/0 0/0/0/0
172.20.10.118         65069     203010         10       0       0       26:48 203008/203008/203008/0 0/0/0/0
172.20.10.119         65069      24091         10       0       0        7:16 22292/24089/24089/0  0/0/0/0

[edit protocols bgp group exaBGP]
juniper@Olive2GB# 
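
As a quick sanity check on that summary, the per-peer counters should add up to the inet.0 totals at the top. Here is a rough Python sketch that does the sum. It assumes the first Active/Received/Accepted/Damped group on each peer line belongs to inet.0 (the totals are consistent with that, but it is an assumption), and PEER_LINES and sum_paths are names invented for this example.

```python
import re

# Peer lines copied from the "show bgp summary" output above.
PEER_LINES = """\
1.1.1.1               65066         43      13290       0       1       20:12 0/0/0/0              0/0/0/0
1.1.1.5               65066       2488     461405       0       1     3:01:45 Active
1.1.1.7               65066       3139      95008       0       1     7:16:18 0/0/0/0              0/0/0/0
3.3.3.2               65066          4         10       0      15     2:06:34 Connect
172.20.10.117         65069     500366        428       0       2    21:14:55 500000/500000/500000/0 0/0/0/0
172.20.10.118         65069     203010         10       0       0       26:48 203008/203008/203008/0 0/0/0/0
172.20.10.119         65069      24091         10       0       0        7:16 22292/24089/24089/0  0/0/0/0
"""

# Active/Received/Accepted/Damped counters, e.g. "22292/24089/24089/0".
COUNTERS = re.compile(r"(\d+)/(\d+)/(\d+)/(\d+)")


def sum_paths(text):
    """Sum active and received paths over all established peers."""
    active = received = 0
    for line in text.splitlines():
        match = COUNTERS.search(line)  # first group on the line only
        if match is None:
            # Peers stuck in Active or Connect have no counters to add.
            continue
        active += int(match.group(1))
        received += int(match.group(2))
    return active, received


if __name__ == "__main__":
    active, received = sum_paths(PEER_LINES)
    print(f"Active paths: {active}, total received paths: {received}")
```

The sums come out to 725300 active and 727097 received, matching the Act Paths and Tot Paths columns, so everything in the table at this point is coming from the three exaBGP peers in AS 65069.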

And Vyatta's BGP connection times out, letting it restart OSPF!

But now we have our first complaints from the Olive. It's getting a bit unresponsive and logging some of its concerns at 830K routes!

Jan 21 21:45:19  Olive1GB rpd[4952]: RPD_OS_MEMHIGH: Using 998900 KB of memory, 123 percent of available

At almost 1.1 million routes (in both BGP and OSPF), the Olive is inching up on its available swap space:

Jan 21 22:07:18  Olive1GB rpd[4952]: RPD_OS_MEMHIGH: Using 1308533 KB of memory, 171 percent of available

And now we're getting some pretty good scheduler slips:

Jan 21 22:12:19  Olive1GB rpd[4952]: RPD_SCHED_SLIP: 8 sec scheduler slip, user: 1 sec 249321 usec, system: 0 sec, 0 usec

Finally, at just under 1.25 million routes (*2, OSPF + BGP), some real problems:

Jan 21 22:35:19  Olive1GB rpd[4952]: RPD_OS_MEMHIGH: Using 1437839 KB of memory, 194 percent of available
Jan 21 22:38:31  Olive1GB rpd[4952]: RPD_SCHED_SLIP: 183 sec scheduler slip, user: 1 sec 767385 usec, system: 0 sec, 138129 usec
Jan 21 22:38:31  Olive1GB rpd[4952]: RPD_BFD_WRITE_ERROR: bfd_send: write error on pipe to bfdd (Broken pipe)
Jan 21 22:38:31  Olive1GB rpd[4952]: bgp_pp_recv: rejecting connection from 1.1.1.3 (Internal AS 65066), peer in state Established
Jan 21 22:38:31  Olive1GB rpd[4952]: bgp_pp_recv:2939: NOTIFICATION sent to 1.1.1.3+51359 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Jan 21 22:38:31  Olive1GB rpd[4952]: RPD_PPM_WRITE_ERROR: ppm_send: write error on pipe to ppmd (Broken pipe)
Jan 21 22:38:31  Olive1GB rpd[4952]: RPD_OS_MEMHIGH: Using 1438843 KB of memory, 194 percent of available

BGP is down. The router is unresponsive. I'd call it pretty much unusable at this point. And we got a massive scheduler slip of more than 4 minutes!

Jan 21 22:42:36  Olive1GB rpd[4952]: RPD_SCHED_SLIP: 242 sec scheduler slip, user: 3 sec 535128 usec, system: 1 sec, 683421 usec
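
To pull the Olive's decline out of the noise, the rpd messages can be scraped for just the two numbers that matter: memory in use and scheduler slip length. Below is a small Python sketch along those lines. The regular expressions are written against the exact messages pasted above and are not guaranteed to match other Junos releases, and RPD_LOG and summarize are names made up for this example.

```python
import re

# The rpd messages quoted above, trimmed to the fields this sketch needs.
RPD_LOG = """\
Jan 21 21:45:19  Olive1GB rpd[4952]: RPD_OS_MEMHIGH: Using 998900 KB of memory, 123 percent of available
Jan 21 22:07:18  Olive1GB rpd[4952]: RPD_OS_MEMHIGH: Using 1308533 KB of memory, 171 percent of available
Jan 21 22:12:19  Olive1GB rpd[4952]: RPD_SCHED_SLIP: 8 sec scheduler slip
Jan 21 22:35:19  Olive1GB rpd[4952]: RPD_OS_MEMHIGH: Using 1437839 KB of memory, 194 percent of available
Jan 21 22:38:31  Olive1GB rpd[4952]: RPD_SCHED_SLIP: 183 sec scheduler slip
Jan 21 22:38:31  Olive1GB rpd[4952]: RPD_OS_MEMHIGH: Using 1438843 KB of memory, 194 percent of available
Jan 21 22:42:36  Olive1GB rpd[4952]: RPD_SCHED_SLIP: 242 sec scheduler slip
"""

MEMHIGH = re.compile(r"RPD_OS_MEMHIGH: Using (\d+) KB of memory, (\d+) percent")
SLIP = re.compile(r"RPD_SCHED_SLIP: (\d+) sec scheduler slip")


def summarize(log_text):
    """Return (memory_kb_samples, slip_seconds) in log order."""
    memory_kb = [int(m.group(1)) for m in MEMHIGH.finditer(log_text)]
    slips = [int(m.group(1)) for m in SLIP.finditer(log_text)]
    return memory_kb, slips


if __name__ == "__main__":
    memory_kb, slips = summarize(RPD_LOG)
    growth = memory_kb[-1] - memory_kb[0]
    minutes, seconds = divmod(max(slips), 60)
    print(f"rpd memory grew by {growth} KB between first and last warning")
    print(f"Worst scheduler slip: {minutes} min {seconds} sec")
```

On these lines the worst slip works out to 4 minutes 2 seconds, and rpd grew by roughly 440,000 KB between the first and last memory warnings.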

So the ironman of routing was the Olive running Junos 10.0R4.7.

Conclusions and Recommendations

First, a few caveats and regrets

What I Didn't Do That I Probably Should Have Done

What Else I Wanted to Throw in the Mix but Couldn't

Conclusions

Recommendations




      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   Version 0   |       C       |            Plenty             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           Router ID - www.blackhole-networks.com              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                   Area ID - OSPF Overload                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          Checksum  OK         |         Construction          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +-                                                             -+
      |                        PAGE IS A                              |
      +-                        GENERAL                              -+
      |                          MESS                                 |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+