Abstract
To ensure Network Security (NS) against malicious activities, Dedicated Link Aggregation (DLA) in Computer Network Traffic (CNT) optimizes data transmission by increasing bandwidth and reliability. Nevertheless, traditional works failed to identify the Attack Patterns (APs) centered on timestamps, Error Rates (ERs), and flow duration, which resulted in inefficient threat detection in NS. Thus, this paper proposes Ensemble Adaptive Entropy Density-Based Spatial Clustering of Applications with Noise (EAE-DBSCAN) and MeDecay Heuristic-based Radial Basis Function Networks (MDH-RBFN) techniques to identify patterns and to classify normal and malicious traffic, respectively. First, the data is pre-processed, followed by DLA utilizing EAE-DBSCAN and feature extraction. After that, by using EAE-DBSCAN, the patterns are identified from the extracted features for enhanced network performance. Subsequently, utilizing MDH-RBFN, the data is categorized as normal or malicious traffic with a Mean Absolute Error (MAE) of 0.0025. Here, the malicious traffic is blocked, whereas the non-attacked data is encrypted. Thereafter, the traffic level of the non-attacked traffic data is predicted as low, medium, high, very high, or extreme. Finally, the required loads are balanced for storing the data in the cloud.
Keywords: Reed-Solomon Quantum Turbo Codes Cryptography (RSQ2C), Generalized Bell-Π Fuzzy Inference System (GBΠFIS), Weighted Round Robin with Overflow Handling (WR2QH), Computer Network Traffic (CNT), Time Series Analysis, Pattern Identification (PI), and Dedicated Link Aggregation (DLA).
1. Introduction
To increase bandwidth, enhance reliability, and improve load balancing across high-demand environments, DLA in CNT integrates multiple link connections into a single logical connection (Abbasi et al., 2021; Choi et al., 2021). These multiple link connections can give rise to traffic in the network (Weerakody et al., 2021). Here, normal traffic refers to legitimate data (Li et al., 2021), whereas malicious traffic comprises unauthorized or harmful data packets (Lindemann et al., 2021). Therefore, the malicious traffic must be blocked for secure data transmission.
The malicious traffic data are detected and blocked by prevailing Machine Learning and Deep Learning (DL) methods, namely Decision Trees (Liu et al., 2021) and Long Short-Term Memory (LSTM) (Drewek-Ossowicka et al., 2021). However, their scalability and real-time responsiveness (Ensafi et al., 2022; Barrera-Animas et al., 2022) are limited by high computational demands and data requirements (Ruiz et al., 2021). In addition, the conventional works failed to identify the APs centered on timestamps, ERs, and flow duration, thus leading to inefficiencies in threat detection. Thus, in the proposed work, EAE-DBSCAN and MDH-RBFN techniques are leveraged to efficiently identify and block malicious traffic.
1.1 Problem Statement
The limitations of traditional works are as follows:
• None of the prevailing works identified the APs centered on timestamps, ERs, and flow duration, thereby causing inefficiencies in threat detection.
• Many works did not address traffic level prediction for non-attacked data, thus leading to inefficient resource allocation and network congestion.
• Non-aggregation of dedicated links in (Fotiadou et al., 2021) resulted in suboptimal network performance, bandwidth inefficiencies, and limited scalability.
• The non-attacked data in (Balamurugan et al., 2022) were stored in the cloud in unencrypted form, which led to potential security breaches.
• Anomaly and normal traffic were identified utilizing unprocessed data in (Shen et al., 2021), which caused misclassified results.
The objectives of the proposed work are detailed below:
• The traffic APs centered on timestamps, ERs, and flow duration are identified utilizing EAE-DBSCAN.
• The traffic level prediction is performed using the GBΠFIS technique.
• DLA is carried out using the EAE-DBSCAN algorithm.
• Data encryption is done using RSQ2C to improve NS.
• Data preprocessing is done to enhance the classification process.
The paper is structured as follows: the related works are discussed in Section 2, the proposed methodology is described in Section 3, the results and discussion are presented in Section 4, and the proposed work is concluded in Section 5, followed by future recommendations in Section 6.
2. Literature Survey
(Fotiadou et al., 2021) presented a DL-based approach for threat detection and control of traffic flow in NS. In this approach, the traffic flows were controlled based on patterns that were automatically learned through LSTM. Hence, the framework effectively identified the malicious traffic data. Yet, the use of unprocessed data caused misclassifications, thereby hindering the effectiveness of the DL-based threat detection method.
(Balamurugan et al., 2022) proposed an Enhanced Deep Reinforcement Learning (EDRL) algorithm to enhance Network Traffic (NT) analysis and prediction. In this framework, the EDRL technique was used to analyze and predict different types of NT, comprising unencrypted and encrypted data traffic. However, during NT analysis, the high computational complexity of this framework caused increased latency.
(Shen et al., 2021) introduced a Decentralized Applications (DApp) fingerprinting approach utilizing Graph Neural Networks (GNNs) to efficiently identify users’ visits to specific DApps by analyzing encrypted NT. To preserve multi-dimensional features in bidirectional client-server interactions, a Traffic Interaction Graph was used as an information-rich representation. However, due to the slow learning process, the framework had issues with adaptability to traffic changes and time efficiency.
(Khan et al., 2022) established a Bayesian model that automatically analyzed the abnormal traffic flow patterns. In this work, Distributed Denial of Service attacks and Flash Crowds in Wireless Sensor Networks were also distinguished. In addition, this model differentiated high traffic caused by malicious attacks from legitimate spikes in user activity. Despite efficient categorization, this method failed to encrypt the non-attacked data, thus risking security breaches and hindering overall performance.
(Dong, 2021) developed a Cost-Sensitive Support Vector Machine (CMSVM) to accurately identify application types in internet traffic using network flow level characteristics. Moreover, in this work, the dynamic assignment of weights enhanced the classification performance. Nevertheless, as CMSVM failed to handle highly imbalanced datasets, it attained an increased error rate. Hence, the accuracy of predicting the traffic flow levels decreased.
3. Proposed Methodology for Pattern Identification and Traffic Level Prediction in Dedicated Link Aggregation
Figure 1 shows the structural diagram of the proposed EAE-DBSCAN and MDH-RBFN techniques.
Figure 1: Structural Diagram of the Proposed Work.
3.1. Data Collection
The proposed work begins by collecting the data from two datasets, namely Network Anomaly Detection (NAD) and Network Analytics Time Series (NATS). It is shown as,
(1)
Where, the total number of collected data is signified as
.
3.2. Preprocessing
After that, by using Data Deduplication (DD), Missing Value
Imputation (MVI), and Normalization techniques, the collected data is preprocessed as
shown below,
• The duplicate copies are
removed from using deduplicated
data size
. Hence, the reduced data
is articulated as,
(2)
• Further, MVI fills in the missing data for completeness as exhibited below,
(3)
Here, the process to fill in the missing
values is specified as, and the data after the imputation of missing values is
notated as
.
• Afterward, based on minimum and maximum values, the data is normalized, which is shown as,
(4)
Here, the minimal and maximal values of are represented as
, and
is the preprocessed
data.
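The three preprocessing steps above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this work: the function names, the list-of-rows data layout, and the choice of column-mean imputation are assumptions, since the text does not state which rule MVI applies.

```python
from statistics import mean


def deduplicate(rows):
    """Data Deduplication: drop exact duplicate rows, keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique


def impute_missing(rows):
    """MVI (assumed rule): replace None with the mean of the observed column values."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def min_max_normalize(rows):
    """Normalization: scale every column to [0, 1] using its min and max."""
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    return [[0.0 if highs[j] == lows[j]
             else (r[j] - lows[j]) / (highs[j] - lows[j])
             for j in range(n_cols)]
            for r in rows]


def preprocess(rows):
    """Apply the three steps in the order given in Section 3.2."""
    return min_max_normalize(impute_missing(deduplicate(rows)))
```

Constant columns are mapped to 0.0 to avoid a division by zero; that convention is also an assumption.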
3.3. Dedicated Link Aggregation
Subsequently, the dedicated links are aggregated from the preprocessed data by identifying the dense regions in traffic data using the DBSCAN technique. Nevertheless, due to fixed parameters, DBSCAN struggles with clusters of varying density. Hence, the Ensemble Adaptive Entropy (EAE) technique, which dynamically adjusts MinPts and Epsilon based on local density estimates and quantifies cluster uncertainty, is used. The algorithmic steps are explained below,
Primarily, the numbers of
are signified as,
(5)
Further, the core points required for grouping are centered on MinPts and
Epsilon. Here, the EAE technique with min-max adaptive parameter is used for Epsilon
determination, which is shown as,
(6)
Where, and
signify the optimal
epsilon value and the local density-based adaptive measure for
, respectively.
After that, the core points are calculated based on MinPts
using minimum and maximum of
, which are equated as,
(7)
(8)
Subsequently, the noise points (that do
not come under the boundary of core points) are removed as,
(9)
Here, the noise points to be ignored for
aggregation are notated as.
Thus,
specifies the optimally aggregated
dedicated links.
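As a rough illustration of density-based aggregation with a data-driven Epsilon, the sketch below runs a plain DBSCAN whose Epsilon is estimated from the median distance of each point to its k-th nearest neighbor. That estimator is a common stand-in chosen here for illustration; it is not the EAE rule of Eqs. (6)-(8), and the function names and the fixed `min_pts` are assumptions.

```python
import math
from statistics import median


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_epsilon(points, k):
    """Assumed heuristic: median k-th nearest-neighbor distance (needs >= 2 points, k >= 1)."""
    kth = []
    for i, p in enumerate(points):
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        kth.append(dists[min(k, len(dists)) - 1])
    return median(kth)


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point: cluster id >= 0, or -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if euclidean(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels
```

Points labeled -1 correspond to the noise points that are ignored during aggregation; the remaining cluster ids group the links that would be aggregated together.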
3.4. Feature Extraction and Error Rate Analysis
Thereafter, the features like flow_duration, protocol_type, service_flag, dst_host_count, src_bytes, dst_bytes, num_failed_logins, srv_count, dst_host_srv_count, class,
Timestamp, Outbound_Utilzation, and more are extracted from and are denoted as
.
Further, by using error count, ERs are analyzed to enhance the overall
traffic security and performance. Thus, the analyzed ERs are represented as,
(10)
Where, the synchronization and rejection
error count based on NTs is signified as,
and the total connections (error count) are denoted as
.
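Read literally, the description of Eq. (10) suggests a ratio of the synchronization and rejection error counts to the total number of connections. A minimal sketch under that assumption (the function and argument names are hypothetical):

```python
def error_rate(syn_errors, rej_errors, total_connections):
    """Assumed form of the ER: (synchronization + rejection errors) / total connections."""
    if total_connections <= 0:
        return 0.0  # no connections observed, so no error rate can be formed
    return (syn_errors + rej_errors) / total_connections
```

For example, 3 synchronization errors and 2 rejection errors over 100 connections give an ER of 0.05 under this reading.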
3.5. Pattern Identification
Based on timestamp, analyzed ERs, and
flow duration, the APs are
identified after ER analysis to avoid inefficiencies in threat detection using
EAE-DBSCAN, which is explained in (section 3.3). Thus, represents the identified APs.
Then, the malicious and normal traffic data are categorized as explained
in the below sections.
3.6. Normal and Malicious Traffic Classification
Further, to categorize the malicious and normal traffic data, the identified APs and extracted features are input to Radial Basis Function Networks (RBFN). Nevertheless, a poor selection of width parameters in RBFNs can cause underfitting or oversensitivity, thereby degrading the model's accuracy. Hence, the MeDecay Heuristic (MDH) technique, which adaptively sets width parameters for improved stability and generalization, is introduced. Figure 2 exhibits the MDH-RBFN classifier.
Figure 2: MDH-RBFN classifier
Input State Calculation
• Primarily, the inputs and
are combinedly signified
as
. Thus, the
numbers of
are articulated as,
(11)
• In addition, the center point is
initialized based on the input data , which is shown as,
(12)
Where, the center point based on the Radial Basis Function (RBF) is
represented as.
Hidden State Calculation
• Here, using Euclidean Distance
(ED), the distance is calculated between and
, and the calculated distance
is equated as,
(13)
• After that, the hidden state is computed for each
as,
(14)
Here, the exponential function is signified as, and the width parameter is notated as
. Here,
is adaptively set utilizing
the MDH technique as,
(15)
Where, the loss function is represented as, the regularization parameter is specified as
, and the median function is denoted as
.
Output State Calculation
The output is further computed utilizing
and weights
assigned for each
as shown below,
(16)
(17)
Here, the malicious and normal traffic data in the cloud environment are notated
as .
Pseudo code of MDH-RBFN
Input: Combined
input,
Output: Malicious and Normal traffic,
Begin
Initialize iterations,
While
Initialize ,
Calculate ED,
Evaluate width parameter,
Compute,
End while
Return
End
Here, to prevent security breaches, the malicious traffic data are blocked; for enhanced security, the normal traffic data are encrypted.
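The forward pass described in this section (distance to each center, exponential hidden activation, weighted output) can be sketched as follows, assuming a Gaussian basis function. The width is set here by the plain median heuristic over pairwise center distances; the loss- and regularization-dependent part of MDH in Eq. (15) is not modeled, and the centers, weights, threshold, and function names are illustrative assumptions.

```python
import math
from statistics import median


def euclidean(a, b):
    """Euclidean Distance (ED) between an input and a center."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def median_width(centers):
    """Assumed width rule: median of the pairwise distances between centers."""
    dists = [euclidean(a, b)
             for i, a in enumerate(centers)
             for b in centers[i + 1:]]
    # Fall back to 1.0 for a single center; clamp to avoid a zero width.
    return max(median(dists), 1e-12) if dists else 1.0


def hidden_layer(x, centers, sigma):
    """Gaussian RBF activation for every center."""
    return [math.exp(-euclidean(x, c) ** 2 / (2.0 * sigma ** 2)) for c in centers]


def rbfn_score(x, centers, weights, sigma, bias=0.0):
    """Output state: weighted sum of the hidden activations."""
    hidden = hidden_layer(x, centers, sigma)
    return bias + sum(w * h for w, h in zip(weights, hidden))


def classify(x, centers, weights, sigma, threshold=0.0):
    """Label an input as malicious or normal by thresholding the score."""
    score = rbfn_score(x, centers, weights, sigma)
    return "malicious" if score > threshold else "normal"
```

In practice the weights would be fitted on labeled traffic; they are passed in as fixed values here to keep the sketch short.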
3.7. Data Encryption
The normal traffic data are encrypted after categorization by using Quantum Cryptography (QC), which provides unconditional security based on the laws of quantum mechanics. However, the random qubit sequence in QC slows key generation, particularly over longer distances. Thus, Reed-Solomon Turbo Codes (RSTC), which correct single-qubit errors and enhance key generation, are utilized.
Initially, for faster key generation, a
quantum key is generated using RSTC.
Here, to create a unique key,
analyzes the public and
private keys
and is shown as,
(18)
(19)
Where, the coefficient of is specified as
. Then,
are encrypted
as exhibited below,
(20)
Here, the polarization
factor that converts
all into
photons
for respective
encryption is specified as
.
Finally, for secure communications, eavesdropping
is checked as equated below,
(21)
Here, and
represent the absence of
when the value is 1 and the
presence of
when the value is 0,
respectively.
Pseudo code of RSQ2C
Input: Normal traffic data,
Output: Encrypted data,
Begin
Initialize iterations,
While
Initialize
Generate ,
Encrypt,
Check
End while
End
3.8. Traffic Level Prediction
By using the Fuzzy Inference System (FIS), the traffic level is predicted after encrypting the data. However, the membership functions and control rules of FIS are difficult to tune. Thus, the Generalized Bell-Π (GBΠ) membership function, which balances interpretability and effectiveness in FIS by addressing the tuning challenges, is used. The working steps of GBΠFIS are detailed as follows,
i.Initially, the rules are set centered on
the if-then condition as,
(22)
Here, the condition states that if
outbound utilization is
, then low, medium, high, very high, and extreme traffic
is predicted.
ii.After that, to overcome the tuning
difficulties, the GBΠ membership function is assigned for fuzzy and
is shown as,
(23)
Here, the scaling parameters of the GBΠ
function are specified as.
iii.Further, using a fuzzy relationship, the final decision
is obtained as shown
below,
(24)
iv.Subsequently, the crisp output is obtained from
and is exhibited as,
(25)
Therefore, the traffic levels are categorized based on rules.
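To make the rule-based prediction concrete, the sketch below maps an outbound utilization value to one of the five traffic levels using the standard generalized bell membership function, 1 / (1 + |(x - c) / a|^(2b)). It covers only the bell part and does not reproduce the GBΠ function of Eq. (23); the level centers, the parameters `a` and `b`, and the weighted-average defuzzification are assumptions made for illustration.

```python
# Hypothetical level centers on a 0-100 outbound-utilization scale.
LEVEL_CENTERS = {
    "low": 10.0,
    "medium": 30.0,
    "high": 50.0,
    "very high": 70.0,
    "extreme": 90.0,
}


def gbell(x, a, b, c):
    """Standard generalized bell membership: width a, slope b, center c."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2.0 * b))


def predict_level(utilization, a=10.0, b=2.0):
    """Return (label, crisp value) for one outbound-utilization reading."""
    memberships = {name: gbell(utilization, a, b, c)
                   for name, c in LEVEL_CENTERS.items()}
    # Rule firing: the level with the highest membership wins.
    label = max(memberships, key=memberships.get)
    # Assumed defuzzification: membership-weighted average of the level centers.
    total = sum(memberships.values())
    crisp = sum(memberships[n] * LEVEL_CENTERS[n] for n in LEVEL_CENTERS) / total
    return label, crisp
```

With these assumed parameters, a reading near 10 falls in the low level and a reading near 90 in the extreme level; tuning `a` and `b` changes how sharply neighboring levels are separated.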
3.9. Load Balancing
Subsequent to categorization, data are directly stored in the cloud for future usage,
whereas
are balanced utilizing
the WR2QH technique.
• Primarily, the weights are
assigned for each
server/link/data
.
• After that, the total weight is
determined for each as
.
• Then, the portion of traffic is calculated as,
(26)
Here, the total incoming traffic is specified as.
• Thereafter, the traffic is handled
in based on capacity and
is shown below,
(27)
Where, the current and next server are signified as , and
denotes the balanced
data, which are then stored in the cloud for future usage. In further sections, the
performance assessment of the proposed work is described.
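The weighting and overflow steps above can be sketched as follows. The sketch assumes that each server receives a share of the incoming traffic proportional to its weight and that any load above a server's capacity is passed to the next server in order; the function names and the handling of overflow left after the last server are assumptions, not details taken from Eqs. (26)-(27).

```python
def weighted_shares(total_traffic, weights):
    """Share of traffic per server: total_traffic * weight / sum of weights."""
    total_weight = sum(weights)
    return [total_traffic * w / total_weight for w in weights]


def balance_with_overflow(total_traffic, weights, capacities):
    """Assign weighted shares, pushing any excess over capacity to the next server.

    Returns (assigned loads, unplaced traffic left after the last server).
    """
    shares = weighted_shares(total_traffic, weights)
    assigned = []
    carry = 0.0
    for share, capacity in zip(shares, capacities):
        load = share + carry             # own share plus overflow from the previous server
        placed = min(load, capacity)
        assigned.append(placed)
        carry = load - placed            # overflow handed to the next server
    return assigned, carry
```

For instance, with weights 3:1:1, 100 units of traffic, and capacities of 50, 40, and 40, the first server's excess of 10 units moves to the second server.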
4. Results and Discussion
This section compares the performance of the proposed and traditional techniques. The entire work is implemented in Python.
4.1. Dataset description
The NAD and NATS datasets, which are gathered from publicly available sources, are used in the proposed work. Together, the NAD and NATS datasets contain 73367 records with 42 features and 2 classes, namely normal and anomaly traffic. From the datasets, the proposed work used 45697 records (80%) and 27670 records (20%) for training and testing, respectively.
4.2. Performance analysis of the proposed work
Here, the proposed EAE-DBSCAN, RSQ2C, GBΠFIS, and MDH-RBFN are compared with existing techniques and related works. The performance assessment of the proposed MDH-RBFN is shown below,
Figure 3: Performance Assessment of the Proposed MDH-RBFN.
Figure 3 shows the performance assessment of the proposed MDH-RBFN and the traditional RBFN, Bidirectional Long Short-Term Memory (BiLSTM), LSTM, and Artificial Neural Network (ANN) techniques. Here, the proposed MDH-RBFN attained higher Accuracy (99.05%), Precision (98.95%), Recall (99.01%), F-measure (99.17%), Sensitivity (99.01%), and Specificity (98.64%) values than the prevailing techniques. This enhanced performance is owing to the utilization of MDH, which adaptively sets width parameters centered on penalizing deviations.
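For reference, classification metrics of this kind follow from confusion-matrix counts in the usual way. The sketch below uses the standard definitions, with malicious traffic treated as the positive class (an assumption); the function names are hypothetical.

```python
def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)          # recall and sensitivity are the same quantity
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": _ratio(tn, tn + fp),
        "f_measure": _ratio(2.0 * precision * recall, precision + recall),
    }
```

Calling `classification_metrics(tp, fp, tn, fn)` with the counts from a test split returns all six values as fractions in [0, 1]; multiply by 100 to express them as percentages.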
Figure 4: Comparative Analysis based on MAE, MAPE and RMSE.
Table 1: Training Time Analysis.
| Techniques | Training Time (ms) |
|---|---|
| Proposed MDH-RBFN | 34252 |
| Existing RBFN | 56893 |
| Existing BiLSTM | 89436 |
| Existing LSTM | 97803 |
| Existing ANN | 107845 |
The MAE, Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and Training Time (TT) of the proposed MDH-RBFN and the traditional RBFN, BiLSTM, LSTM, and ANN techniques are illustrated in Figure 4 and Table 1. Here, the proposed MDH-RBFN has the minimum MAE (0.0135), MAPE (0.0167), RMSE (0.0127), and TT (34252ms) values owing to the adaptive width selection of MDH, whereas the traditional techniques exhibit degraded performance due to oversensitivity issues.
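The three error measures can be computed as sketched below, using their standard definitions. MAPE is returned as a fraction rather than a percentage, and targets equal to zero are skipped in the MAPE average; both conventions, like the function names, are assumptions.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error as a fraction, skipping zero targets."""
    terms = [abs((t - p) / t) for t, p in zip(y_true, y_pred) if t != 0]
    return sum(terms) / len(terms) if terms else 0.0


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

All three functions take two equal-length sequences of true and predicted values.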
Figure 5: Pattern Identification and Aggregation Time Validation.
As shown in Figure 5, the Pattern Identification Time (PIT) and Aggregation Time (AT) of the proposed EAE-DBSCAN are compared with those of the traditional DBSCAN, K-Means Clustering (KMC), Fuzzy C Means (FCM), and K-Nearest Neighbor (KNN) techniques. Here, the proposed EAE-DBSCAN attained the minimum PIT (3578ms) and AT (2389ms). In contrast, the traditional DBSCAN, KMC, FCM, and KNN attained, on average, a higher PIT (8493ms) and AT (7457ms). The proposed technique performs better since the use of EAE in DBSCAN dynamically adjusts the parameters for varying-density datasets.
Figure 6: (a) Encryption, Decryption Time, and (b) Security Level validation of the proposed RSQ2C.
As shown in Figures 6 (a) and 6 (b), the proposed RSQ2C is compared with the traditional QC, Elliptic Curve Cryptography (ECC), Rivest-Shamir-Adleman (RSA), and ElGamal techniques. Here, RSTC enhances QC by improving error correction and key generation. Consequently, the proposed RSQ2C has a minimum encryption time of 987ms and a decryption time of 945ms, with a high Security Level (SL) of 98.85%. In contrast, the traditional approaches had higher encryption and decryption times with a lower SL, which degraded the overall performance.
Figure 7: Performance Analysis of the Proposed GBΠFIS.
Figure 7 shows the performance analysis of the proposed GBΠFIS and the prevailing FIS, Sigmoid Fuzzy (SF), Trapezoidal Fuzzy (TF), and Singleton Fuzzy (SiF) techniques. Here, the proposed work improves traffic level prediction by achieving the minimum Fuzzification Time (FT = 2132ms), Defuzzification Time (DFT = 2085ms), and Rule Generation Time (RGT = 1045ms). This enhancement is owing to the utilization of GBΠ in FIS, which provides flexible membership modeling and enhances traffic prediction accuracy. In contrast, the traditional techniques exhibit higher FT, DFT, and RGT values than the proposed GBΠFIS, thereby hindering the prediction of traffic levels.
Table 2: Comparative Analysis with Related Works.
| Study | Techniques | RMSE | MAPE |
|---|---|---|---|
| Proposed Work | MDH-RBFN | 0.0127 | 0.0167 |
| (Yang et al., 2021) | ARIMA-BPNN | 0.076 | 0.099 |
| (Bi et al., 2022) | ST-LSTM | 0.036 | - |
| (Xu et al., 2021) | AE | 0.129 | - |
| (Wan et al., 2022) | LSTM | 0.112 | - |
| (Pan et al., 2022) | FPKNet | 0.509 | 0.369 |
In Table 2, the proposed work is compared with prevailing works. Here, for enhanced performance, the proposed work used the MDH technique, thus attaining the minimum RMSE and MAPE values. In contrast, (Yang et al., 2021) and (Pan et al., 2022) utilized the AutoRegressive Integrated Moving Average model-based Back Propagation Neural Network (ARIMA-BPNN) and the Fusion Prior Knowledge Network (FPKNet), respectively, and reported higher MAPE values. Additionally, (Bi et al., 2022), (Xu et al., 2021), and (Wan et al., 2022) used Savitzky–Temporal-based Long Short-Term Memory (ST-LSTM), AutoEncoder (AE), and LSTM techniques, which yielded higher RMSE values owing to overfitting and slow learning effects; across the five related works in Table 2, the average RMSE is 0.1724. Hence, the proposed work outperformed the traditional works in analyzing malicious network traffic.
5. Conclusion
In this research, the APs based on timestamp, flow duration, and ERs are effectively identified for enhanced NS. Initially, the data gathered from the datasets were preprocessed. After that, the dedicated links were aggregated within 2389ms. In addition, using EAE-DBSCAN, the APs were identified with a minimum PIT (3578ms). Subsequently, MDH-RBFN categorized the normal and malicious traffic with 99.05% accuracy, and RSQ2C encrypted the non-attacked data with an encryption time of 987ms. Finally, the traffic level was predicted with a minimum RGT (1045ms). Hence, the proposed work performed better in detecting malicious traffic for enhanced NS.
6. Future Recommendation
In the future, additional approaches will be implemented to predict the traffic severity for more secure data transmission.
7. References