Confusion Matrix and Cyber Crime Cases related to it

Shubham Kumar
Jun 4, 2021


First of all, let us understand what a confusion matrix is.

A confusion matrix is a table that summarizes a model’s predictions and contrasts them with the real-world values. Confusion matrices are used in statistics, data mining, machine learning and other artificial intelligence (AI) applications. A confusion matrix is also called an error matrix. It is a summary of prediction results on a classification problem: the numbers of correct and incorrect predictions are counted and broken down by class. This is the key to the confusion matrix: it shows the ways in which the classification model is confused when it makes predictions.

For a binary classifier, the confusion matrix is a 2×2 table with four cells: True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN). It can be used with any classification model, for example one based on logistic regression, to compare the original (actual) values with the predicted values.

Here we will consider a binary classification model that has made predictions for some values of a dataset, with 0 as the negative class and 1 as the positive class.

True Positive (TP): The number of cases that are actually positive and that the model correctly predicted as positive.

False Positive (FP): The number of cases that the model predicted as positive but that are actually negative, so the prediction does not match the original value. Also known as a Type-1 error, it is a result indicating that a given condition is present when it actually is not. For example, a shepherd who identifies an animal as a wolf when in reality it is a dog has made a false positive.

False Negative (FN): The number of cases that the model predicted as negative but that are actually positive. Also known as a Type-2 error, it is a result indicating that a given condition is not present when it actually is. In the shepherd example, that would be identifying the animal as a dog when in reality it is a wolf.

True Negative (TN): The number of cases that are actually negative and that the model correctly predicted as negative, so the prediction matches the original value.
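The four definitions above can be sketched in a few lines of Python. This is only a minimal illustration: the function name confusion_counts and the two label lists are made up for the example and do not come from any real dataset.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN and TN for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, actually positive
        elif p == positive and a != positive:
            fp += 1   # predicted positive, actually negative (Type-1 error)
        elif p != positive and a == positive:
            fn += 1   # predicted negative, actually positive (Type-2 error)
        else:
            tn += 1   # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Made-up labels for illustration: 1 = positive, 0 = negative
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```

The positive argument decides which label counts as the positive class, so the same function works whichever way the two classes are named.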

For example, suppose a company uses a classification model as a security alert system, to stop hackers from entering its server and misusing its data, and uses a confusion matrix to check how well the model performed. Say 180 visitors visit the company’s site in one day. The model predicts that 120 of them are harmless (treated here as the positive class, so no alert is needed). For the remaining 60 visitors it raises alerts to the company’s SecOps team so that these possible hackers can be stopped before they breach any data (treated here as the negative class, because they are harmful for the company). Note that this labelling is the reverse of the usual security convention, in which an alert is the positive result.

Negative (an alert is shown): Here the model predicted that 60 visitors were attempting a data breach. When we check these predictions against the confusion matrix, 40 were right (True Negative [TN]) and 20 were wrong (False Negative [FN]). Those 20 were innocent visitors who were not attempting a breach, but because of the model’s limited accuracy they were flagged as harmful. The SecOps team had to check them and found no issue. This is a Type-2 error, and it is not very harmful for the company, because the only cost is some unnecessary work for the SecOps team.

Positive (no alert is shown): Here the model predicted that 120 visitors were harmless, so it raised no alert and the SecOps team was not involved at all. When we verify these predictions against the confusion matrix, 90 were right (True Positive [TP]), meaning 90 visitors were genuine. The other 30 were wrong (False Positive [FP]): the model treated them as ordinary visitors, but they were actually hackers attempting a data breach. This is very harmful for the company, because no alert was shown and nobody from the SecOps team was there to block their activity, so these visitors could easily tamper with the data. This is a Type-1 error, and it is the dangerous one for the company.
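To put numbers on the two scenarios above, here is a small sketch that plugs the four counts (90, 30, 40 and 20) into a few standard ratios. The variable names are my own, and the labelling follows this example, where a harmless visitor is the positive class and an alert is the negative class.

```python
# Counts from the two scenarios above
# (positive = harmless visitor, negative = alert raised)
tp, fp = 90, 30   # no alert: 90 genuine visitors, 30 undetected hackers
tn, fn = 40, 20   # alert:    40 real hackers, 20 innocent visitors

total = tp + fp + tn + fn
accuracy = (tp + tn) / total

# Share of real hackers that got through without an alert (Type-1 errors here)
missed_hackers = fp / (fp + tn)

# Share of alerts that were raised on innocent visitors (Type-2 errors here)
wasted_alerts = fn / (fn + tn)

print(f"visitors checked: {total}")
print(f"accuracy:         {accuracy:.2%}")
print(f"missed hackers:   {missed_hackers:.2%}")
print(f"wasted alerts:    {wasted_alerts:.2%}")
```

Accuracy alone hides the difference between the two error types, which is why the share of hackers who raised no alert is worth looking at separately.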

Cyber Crime:

Nowadays, cyber crimes have increased as more and more activities are moved online to reduce human effort, and, as described above, wrong predictions by detection models are one of the ways attackers slip through. Cybercrime is criminal activity that either targets or uses a computer, a computer network or a networked device. Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organizations.

What Are False Positives?

False positives are mislabeled security alerts, indicating there is a threat when in actuality there isn’t. These false/non-malicious alerts (SIEM events) increase noise for already over-worked security teams and can be caused by software bugs, poorly written software, or unrecognized network traffic.

By default, most security teams are conditioned to ignore false positives. Unfortunately, this practice of ignoring security alerts, no matter how trivial they may seem, can create alert fatigue and cause your team to miss actual, important alerts related to real/malicious cyber threats (as was the case with the Target data breach).

These false alarms account for roughly 40% of the alerts cybersecurity teams receive on a daily basis, and at large organizations they can be overwhelming and a huge waste of time.

What Are False Negatives?

False negatives are uncaught cyber threats, overlooked by security tooling because they’re dormant or highly sophisticated (e.g. file-less or capable of lateral movement), or because the security infrastructure in place lacks the technological ability to detect these attacks.

These advanced/hidden cyber threats are capable of evading prevention technologies, like next-gen firewalls, antivirus software, and endpoint detection and response (EDR) platforms trained to look for “known” attacks and malware.

No cybersecurity or data breach prevention technology can block 100% of the threats it encounters. False negatives are among the 1% (roughly) of malware and cyber threats that most methods of prevention are prone to miss.
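The two definitions in these sections treat a threat (an alert) as the positive class, which is the opposite of the labelling used in the visitor example earlier. As a rough sketch, the same counts from that example (40, 20, 30 and 90) can be restated under the security convention. The variable names below are hypothetical and chosen only for readability.

```python
# Counts from the visitor example, restated with "threat" as the positive class
true_alerts = 40      # hackers that triggered an alert        (true positives)
false_alarms = 20     # harmless visitors that triggered one   (false positives)
missed_threats = 30   # hackers that triggered no alert        (false negatives)
quiet_visitors = 90   # harmless visitors, no alert            (true negatives)

# Of all alerts raised, how many pointed at a real threat?
precision = true_alerts / (true_alerts + false_alarms)

# Of all real threats, how many raised an alert?
recall = true_alerts / (true_alerts + missed_threats)

# Of all alerts raised, how many were noise for the security team?
false_alarm_share = false_alarms / (true_alerts + false_alarms)

print(f"precision:         {precision:.2%}")
print(f"recall:            {recall:.2%}")
print(f"false alarm share: {false_alarm_share:.2%}")
```

Low precision corresponds to the alert fatigue described under false positives, and low recall corresponds to the hidden threats described under false negatives.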

Solution:

Here are a few simple rules to help govern your approach to cybersecurity with a preventative, reactive, and proactive mindset:

  • Assume you’re breached and begin your offensive (proactive) initiatives with the goal of finding those breaches. By doing so, you’ll seek to validate the strength of your defensive/prevention tools with the understanding that none of them are 100% effective.
  • Use asset discovery tools to discover the hosts, systems, servers, and applications within your network environment, because you can’t protect what you don’t know exists.
  • Execute regular compromise assessments (we recommend at least once a week) and inspect every asset residing on your network.
  • Define security policies and procedures, and implement educational/training requirements so your entire team knows what to do in the event you discover a hidden breach, or worse, fall victim to a data breach.
  • Time is your most valuable asset, so implementing tools/technology that improve your speed of detection and time to respond is key and can help your security team prevent a data breach.

Thank you for giving your valuable time to my blog, I hope you liked it 👐
