R Code for finding faults in a router – Network Analytics


In this tutorial, we will present a few simple yet effective methods that you can use to build a simple analytics tool in R to identify congestion in a router.

At our company, Redeem Systems, we have a large amount of expertise in engineering and networking as well as analytics. Using that networking expertise, we are able to identify problems in the domain and see where analytics can help. The code discussed in this blog is just a very simple, primitive example of what we can achieve in networking and engineering analytics.

Having that expertise was a great help in identifying the problems engineers face while designing a router and the issues they need to fix. In this example, we are going to go over how to identify congestion in a router, given its system log file.

Prerequisites

Programming Language: R

R is a very powerful statistical computing tool. What makes R good for analytics is that most of the analysis can be done using the vast collection of libraries it has.

Programming IDE (optional but preferable; you can also use any editor for coding): RStudio

Download the installer for your OS from the link below:

Download R Studio for any OS

Once you have R installed, you will need a few libraries before you can get started with the analysis. So fire up your favorite editor, or open RStudio and start a new R script:
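If the two packages are not installed yet, grab them from CRAN first (a one-time step):

install.packages(c("reshape", "stringr"))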

library(reshape)
library(stringr)
The library reshape provides a variety of methods for reshaping data prior to analysis.
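As a tiny, hypothetical illustration (the data frame here is made up), melt from reshape turns a wide table into one row per measurement:

#Hypothetical example: melt a wide data frame into long form
df <- data.frame(id = 1:2, in_bytes = c(10, 20), out_bytes = c(5, 8))
melt(df, id = "id")   #one row per (id, variable, value) combination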

There are four main families of functions in stringr:

  • Character manipulation: these functions let you manipulate the individual characters inside the strings of character vectors.
  • Whitespace tools to add, remove, and manipulate whitespace.
  • Locale-sensitive operations, whose behaviour varies from locale to locale.
  • Pattern matching functions. These recognise four engines of pattern description; the most common is regular expressions, but there are three other tools.

We will be using these two libraries for quite a few tasks, as you will see ahead.
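Before moving on, here is a tiny, self-contained illustration of the stringr families we will lean on (the example strings are invented):

library(stringr)

#Character manipulation: take the first three characters of each string
str_sub(c("Feb 12 10:23:01", "Mar 03 08:15:42"), 1, 3)    # "Feb" "Mar"

#Whitespace tools: collapse runs of spaces and trim the ends
str_squish("  Feb  12   10:23:01  ")                      # "Feb 12 10:23:01"

#Pattern matching: regular expressions are the default engine
str_detect("link congestion cleared", "cleared")          # TRUE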

They say that most of the analysis is done once you have pre-processed your data and made it ready. OK, I don't know who said that; let's assume I said it. In this case it is entirely true, because most of the work here is pre-processing: we are doing root cause analysis, so there is no model to train. This is the most primitive use case. I'll also mention a few of the more complex problems we are working on.

First of all, we need the data for the analysis. It is a syslog file obtained from one of our routers, and it is completely unstructured, as it comes as a plain .log file. I'll give the download link to it below. But what we are going to focus on is how to process any unstructured data, so you can use the same approach on other data; the reason I'm using this particular file is that I want to explain the router use case.
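For orientation, router syslog lines look roughly like the two invented lines below (the fields in the actual file may differ):

Feb 12 10:23:01 router01 %QOS-4: queue limit exceeded on Gi0/1
Feb 12 10:25:47 router01 %QOS-5: congestion cleared on Gi0/1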

Data Processing:

So now we are going to process the data file and convert it into a data frame. A data frame is like a table containing the data, with all the parameters as columns. First we load the required libraries, then load the data we want to analyse.

library(reshape)
library(stringr)

#Loading the data and converting it into a data frame
#When you run the final code, file.choose() will automatically ask you to select the file 

data <- read.table(file.choose(),header=FALSE,sep="\t")
data<-as.data.frame(data)
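If read.table trips over quote characters or lines with uneven fields (common in raw logs), a more forgiving alternative, offered here as a sketch, is readLines, which keeps each log line as one string:

#Alternative: read each line of the log as a single string
raw_lines <- readLines(file.choose())
data <- data.frame(V1 = raw_lines, stringsAsFactors = FALSE)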

Now we have to eliminate the first few lines of the data, since they contain redundant information that is not needed for the analysis, and start from where the log file has actual entries. In this case, data is only available for the month of February, so we use the grepl function to keep only the log lines that contain the value Feb.

You can do the same thing for month-wise analysis; a sketch of that follows after the processing code below. We also remove the extra whitespace inside many of the lines.

#Keeping only the lines that contain "Feb", using grepl
data_cleaned <- subset(data, grepl("Feb", V1, ignore.case = TRUE))

#Collapsing runs of whitespace within each line to a single space
data_cleaned <- gsub("\\s+", " ", data_cleaned$V1)

data_cleaned <- as.data.frame(data_cleaned)

#Parsing the date from the start of each line (the year defaults to the current one)
Date <- as.Date(data_cleaned$data_cleaned, "%b %d")

#Splitting each log line into 6 columns; the sixth keeps the rest of the message
data_final <- str_split_fixed(data_cleaned$data_cleaned, " ", 6)
data_final <- as.data.frame(data_final)
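As promised, month-wise analysis is the same grepl idea applied per month. A minimal sketch, assuming (as here) that each line carries a month abbreviation:

#Hypothetical month-wise tally: count log lines mentioning each month abbreviation
months <- month.abb   #built-in constant: "Jan", "Feb", ..., "Dec"
counts <- sapply(months, function(m) sum(grepl(m, data_cleaned$data_cleaned, ignore.case = TRUE)))
counts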

Data Insights:

Now the data has been completely processed, and we can identify the points at which the syslog defence was cleared:

#Syslog defence was cleared at these points
data_protection_approved <- subset(data_final, grepl("cleared", V6, ignore.case = TRUE))
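As a quick follow-up (a sketch, not part of the original script): since columns V1 and V2 hold the month and day, we can tabulate when these events occur:

#Hypothetical follow-up: parse the date of each matching row and tabulate
cleared_dates <- as.Date(paste(data_protection_approved$V1, data_protection_approved$V2), "%b %d")
table(cleared_dates)   #the year defaults to the current one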

Now, here is the complete code in one place. Having walked through it step by step above, anyone should be able to follow what is going on.

#Complete Code

library(reshape)
library(stringr)

#Loading the data and converting it into a data frame
#When you run the final code, file.choose() will automatically ask you to select the file 

data <- read.table(file.choose(),header=FALSE,sep="\t")
data<-as.data.frame(data)

#Keeping only the lines that contain "Feb", using grepl
data_cleaned <- subset(data, grepl("Feb", V1, ignore.case = TRUE))

#Collapsing runs of whitespace within each line to a single space
data_cleaned <- gsub("\\s+", " ", data_cleaned$V1)

data_cleaned <- as.data.frame(data_cleaned)

#Parsing the date from the start of each line
Date <- as.Date(data_cleaned$data_cleaned, "%b %d")

#Splitting each log line into 6 columns
data_final <- str_split_fixed(data_cleaned$data_cleaned, " ", 6)
data_final <- as.data.frame(data_final)

#Syslog defence was cleared at these points
data_protection_approved <- subset(data_final, grepl("cleared", V6, ignore.case = TRUE))

Please note that this code is written for this particular log file. But suppose you have a system that generates the same kind of log every day: you can store all of that data and find out at which points anomalies occur.

Similarly, you can build more complex analytics applications by collecting this data over months or years. Some of the things you can do with that data:

1) You can use the historic data to forecast congestion by day, date, month of the year, weekly trends and more, and draw a lot of insights from it. There are many applications of forecasting analytics alone; a sketch follows.
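Here is a minimal sketch of the forecasting idea, using base R's HoltWinters smoothing on daily event counts. It assumes the hypothetical cleared_dates vector from the Data Insights sketch and at least a few weeks of data:

#Hypothetical forecasting sketch: daily event counts as a weekly-seasonal series
daily_counts <- as.numeric(table(cleared_dates))
ts_counts <- ts(daily_counts, frequency = 7)   #assume weekly seasonality
fit <- HoltWinters(ts_counts)                  #exponential smoothing from base R's stats package
predict(fit, n.ahead = 7)                      #forecast the next 7 days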

2) You can use this data to find faults and, by learning the patterns around them at particular times, predict and prevent faults before they occur. A minimal anomaly-flagging sketch follows.
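Under the same assumptions, a simple way to flag anomalies is to mark any day whose event count sits well above the average:

#Hypothetical anomaly flag: days whose count exceeds the mean by 2 standard deviations
threshold <- mean(daily_counts) + 2 * sd(daily_counts)
which(daily_counts > threshold)   #indices of the anomalous days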
