This is a short paper outlining how to apply risk analysis techniques to network engineering to obtain quantitative results. It is an attempt to turn network engineering from an artful guessing game into a science.
Before applying Fault Tree Analysis (FTA) to network engineering, it's worth explaining what it actually is. Fault Tree Analysis is a risk assessment technique that was developed for the United States Air Force by Bell Labs for use with ICBM systems in the early 1960s. It is an analytical technique for graphically modelling the pathways in a system that can lead to an undesirable or unintended event. It is useful for identifying the inter-relationships between events, systems, subsystems and behavior. This analysis technique makes it possible to thoroughly identify the contributing factors that may lead to an undesirable event.
FTA can be used to model very complex systems. However, it pays to keep each model as simple as possible, because the models can become quite large and harder to work with. Breaking a complex system down into several models of smaller subsystems is an effective way to keep the model size workable. The model uses standard logic gates (AND, OR, XOR, etc.) to interconnect events and conditions. Using standard logic gates makes it easy to convert the model from a graph into a quantitative result using Boolean algebra. Each event and condition can be assigned a probability in order to quantify the risk of the undesired event occurring. These probabilities can be derived from manufacturers' data such as MTBF, test data, service level agreements or historical events from a ticketing system. The analysis is very rigorous, well defined and provides consistent results. Once the math is run, the computed numbers can be checked against how often the undesired event actually occurs. The effect of each subsystem or contributing event can be seen, quantified and assessed.
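To make the gate arithmetic concrete, here is a minimal sketch in Python. It is not taken from any FTA tool. It assumes that all input events are statistically independent, and it converts MTBF to a failure probability with a constant failure rate (exponential) model. The function names, the tree (a site goes down if both WAN links fail or the core router fails) and every number in it are hypothetical, chosen only for illustration.

```python
import math

def failure_prob_from_mtbf(mtbf_hours, period_hours):
    """Probability of at least one failure within period_hours.

    Assumption: constant failure rate (exponential model), rate = 1 / MTBF.
    """
    return 1.0 - math.exp(-period_hours / mtbf_hours)

def and_gate(*probs):
    """Output event occurs only if ALL independent inputs occur."""
    return math.prod(probs)

def or_gate(*probs):
    """Output event occurs if AT LEAST ONE independent input occurs."""
    return 1.0 - math.prod(1.0 - p for p in probs)

def xor_gate(p_a, p_b):
    """Output event occurs if EXACTLY ONE of two independent inputs occurs."""
    return p_a * (1.0 - p_b) + p_b * (1.0 - p_a)

# Hypothetical inputs for a one-year period (8,760 hours).
p_link_a = 0.02   # e.g. derived from a carrier SLA (made-up value)
p_link_b = 0.02   # second, independent WAN link (made-up value)
p_router = failure_prob_from_mtbf(mtbf_hours=100_000, period_hours=8_760)

# Tree: SITE DOWN = (LINK A DOWN AND LINK B DOWN) OR CORE ROUTER DOWN
p_both_links = and_gate(p_link_a, p_link_b)
p_site_down = or_gate(p_both_links, p_router)

print(f"P(both links down) = {p_both_links:.6f}")
print(f"P(core router down) = {p_router:.6f}")
print(f"P(site down)        = {p_site_down:.6f}")
```

With these made-up inputs the single router dominates the result, which is the kind of observation that helps decide where to spend money on redundancy. If events share a common cause (for example, both links in the same conduit), the independence assumption no longer holds and the simple products above will understate the risk.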
One of the real strengths of FTA is the fact that it creates a graphical model of a system and the relationships between systems and events. It makes it easy to visualize the specific sequence of events that will lead to a failure.
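The combinations of events that lead to a failure can also be listed programmatically. In FTA terminology the smallest such combinations are usually called minimal cut sets. The following is a rough sketch, assuming a tree built only from AND and OR gates and represented as nested tuples. The representation, the function names and the event names are hypothetical and used only for illustration.

```python
from itertools import product

def minimize(sets):
    """Drop duplicates and any set that strictly contains another set."""
    unique = set(sets)
    return [s for s in unique if not any(other < s for other in unique)]

def cut_sets(node):
    """Return the minimal cut sets of a fault tree.

    A node is either a basic event (a string) or a tuple
    (gate, children) where gate is "AND" or "OR".
    """
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any cut set of any child triggers the gate.
        combined = [cs for sets in child_sets for cs in sets]
    elif gate == "AND":
        # Need one cut set from every child at the same time.
        combined = [frozenset().union(*combo) for combo in product(*child_sets)]
    else:
        raise ValueError(f"unsupported gate: {gate}")
    return minimize(combined)

# Hypothetical tree: the site is down if both WAN links are down
# or the core router is down.
tree = ("OR", [
    ("AND", ["wan_link_a_down", "wan_link_b_down"]),
    "core_router_down",
])

for cs in sorted(cut_sets(tree), key=lambda s: (len(s), sorted(s))):
    print(sorted(cs))
```

A cut set with a single event is a single point of failure, so sorting by size puts the most urgent items first.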
Example Fault Tree
Several in-depth FTA tutorials have already been created by experts in the field. So rather than try to rehash, reword, steal and plagiarize them, we're going to jump right in and demonstrate how this technique can be applied to network engineering. The focus will be on how it can be used for educated decision making and prioritization of resources.
Some excellent tutorials on FTA can be found in the references section.
FTA with Network Engineering | www.blackhole-networks.com | Page still under construction