A novel multi-scale CNN and Bi-LSTM arbitration dense network model for low-rate DDoS attack detection


ABSTRACT Low-rate distributed denial of service attacks, also known as LDDoS attacks, pose notorious security risks to cloud computing networks. They overload cloud servers and degrade network service quality with a stealthy strategy. Furthermore, this kind of small-ratio, pulse-like abnormal traffic leads to a serious data-scale problem. As a result, the existing models for detecting minority and adversarial LDDoS attacks are insufficient in both detection accuracy and time consumption. This paper proposes a novel multi-scale Convolutional Neural Network (CNN) and bidirectional Long Short-Term Memory (bi-LSTM) arbitration dense network model (called MSCBL-ADN) for learning and detecting LDDoS attack behaviors under the constraints of limited data and time consumption. The MSCBL-ADN incorporates a CNN for preliminary spatial feature extraction and an embedding-based bi-LSTM for time relationship extraction. It then employs an arbitration network to re-weight feature importance for higher accuracy. Finally, it uses a 2-block dense connection network to perform the final classification. The experimental results on the popular ISCX-2016-SlowDos dataset demonstrate that the proposed MSCBL-ADN model achieves a significant improvement in detection accuracy and superior time performance


over the state-of-the-art models. INTRODUCTION With the widespread deployment of cloud computing networks, LDDoS attacks have been reported as


notorious network security issues. These attacks are a form of periodic multi-variate time-series pulse flow in which a malicious user floods a cloud service stealthily. They can severely degrade the quality of service, as confirmed by researchers1. However, their behaviors are similar to normal flows in terms of speed and volume, making them difficult to distinguish from normal flows. For example, the BlackNurse attack reported by the TDC Security Operations Center2 could degrade operators' cloud services with a traffic speed of 5 packets per second, and knock operators' firewalls offline if the traffic volume rose to 15–18 Mbit/s. Thus, LDDoS attacks attract more and more attention, and many LDDoS variants with smarter behaviors have been designed. Hivenets3 used a bee colony algorithm to guide a group of intelligent bots to fire autonomous LDDoS attacks with minimal supervision. A multi-target LDDoS attack model4 tried to utilize and orchestrate bots' spare time slots between bursts to fire new LDDoS attacks. The bad news is that these smarter LDDoS attacks are harder to capture. Even worse, all the traffic in the network is impacted heavily at the same time, which leads to packet loss to some extent. This is why large-scale public LDDoS datasets are scarce. As a result, designing and training a powerful and satisfactory LDDoS attack detection model on such limited, low-quality data is a challenging task. Since the performance of models is positively correlated with the dataset, researchers have tried to


centralize datasets that are scattered across different clients to train a powerful model with high performance. Federated Learning (FL) is exactly this kind of paradigm, offering joint learning across multiple datasets. Wang et al.5 proposed an intrusion detection method based on FL and CNNs to solve the problem of training a deep model with high accuracy on the limited labelled data generated by a single organization. Li et al.6 proposed a novel DeepFed federated deep learning framework. The framework employed a CNN and a gated recurrent unit (GRU) as the basic layers of the intrusion detection model, and allowed multiple clients to collectively train the model parameters. The experiments showed that the proposed DeepFed scheme was highly effective in detecting network intrusions. Mothukuri et al.7 proposed a GRU-based IoT network intrusion detection model. The weights of the central model were updated from multiple clients to optimize its accuracy over federated training rounds. This method kept the data intact on local IoT devices by sharing only the learned weights, and thus kept the data safe. Idrissi et al.8 proposed an autoencoder-based method for distributed network intrusion detection systems by leveraging FL and anomaly detection. The method computed an intrusion score based on the reconstruction error while preserving the privacy of local data. Wu et al.9 proposed a novel LDDoS attack detection FL framework using transfer learning and a support vector machine (SVM). The framework performed aggregation of data distributed across various clients through FL, then utilized transfer learning and SVM to build personalized models. Bertoli et al.10 introduced a stacked-unsupervised FL approach for detecting flow-based network intrusions. The method combined a deep autoencoder with an energy flow classifier in an ensemble learning task. These methods can make use of the high-quality LDDoS attack examples in different data nodes. However, their deep learning models are insufficient to learn time-sequence features.


Thus, Tang et al.11 used a Gramian angular summation field transformation to analyze time-based features for discovering the attacks, and the results were effective. With the emergence of LSTM, researchers integrated it into the FL framework. Taking advantage of LSTM, Zhao et al.12 proposed an effective intelligent intrusion detection method based on an FL-aided LSTM framework, which addresses the difficulty of training a powerful deep learning model while avoiding the intrusion risks and user-privacy violations of collecting datasets from all user servers at a central server. Huong et al.13 proposed a hybrid variational autoencoder-based LSTM model built on FL for time-series anomaly detection, and the architecture achieves high detection performance on time-series anomalies. Zhang et al.14 proposed a vertical federated learning framework based on an LSTM fault classification network. The framework could optimize model parameters across the entire firefighting IoT platform and obtain effective prediction results. Wang et al.15 designed an attention-based bi-LSTM model within a multi-domain FL framework for detecting coordinated network attacks. Liu et al.16 proposed an asynchronous FL arbitration framework named AsyncFL-bLAM based on bi-LSTM and an attention model. The novel arbitration detection model took on the responsibility of LDDoS attack detection locally. The above methods can use well-labelled training data across many organizations. However, when researchers examined and compared the performance of LDDoS attack detection methods deployed in centralized mode and FL mode17, the accuracy of models in FL mode could only approach, but remained lower than, that of models in centralized mode. Thus, researchers turned to LDDoS data enhancement and novel model design. At an early stage, Random Forest (RF)18, SVM19, K-Nearest Neighbor (KNN)20 and their enhanced shallow


models with well-designed feature engineering were adopted for LDDoS attack detection. Tang et al.21 proposed a performance-and-features framework for LDDoS attack detection based on SVM, KNN and so on. The framework analyzed the performance of normal traffic and made full use of flow features, which yielded a high detection accuracy. Tang et al.22 also enhanced a fine-grained detection model with an adaptive Kohonen network cluster analysis algorithm. As a result, the proposal was able to perform accurate fine-grained detection and detect every attack burst. Muhammad et al.23 proposed a multivariate chart by integrating support vector data description and kernel density estimation. The chart can monitor network anomalies with high performance. Later, researchers realized that deep learning models were superior to shallow ones in detection accuracy and scalability. Liu et al.4 introduced a locality-sensitive feature extraction method to classify the different flows into small buckets in advance, and then proposed a simple 3-layer CNN-based model to extract key feature representations. Zhou et al.24 proposed a few-shot learning model with a Siamese CNN (FSL-SCNN) to alleviate the over-fitting issue and enhance the performance of intelligent anomaly detection. The FSL-SCNN encoding network helped optimize feature representations and enhance the efficiency of the training process. Asgharzadeh et al.25 devised the CNN-BMECapSA-RF model based on a CNN and hybrid layers to enhance classification accuracy, and developed a binary multi-objective enhanced capuchin search algorithm to implement feature extraction automatically. Ren et al.26 proposed a novel CANET network, in which the CNN network and attention mechanism are mixed for local spatio-temporal feature extraction of LDDoS attacks. Experiments demonstrated that CANET is efficient in accuracy, detection rate, and false positive rate. Venkateshwarlu et al.27 proposed a framework named LRDADF, which used a CNN and a deep sparse autoencoder to detect LDDoS attacks. The sparse autoencoder was used to learn traffic patterns and the CNN was used for classification. Considering the time-series properties of LDDoS, Salahuddin et al.28 proposed an autoencoder-based anomaly detection system to leverage time-based features over multiple time windows for efficiently detecting anomalous LDDoS traffic. Romany29 employed an adaptive harmony search algorithm for feature selection, and an attention-based


bi-directional gated recurrent neural network model for time-based feature classification. Researchers have also introduced LSTM networks into LDDoS attack detection, given their good performance on time-sequence data. Zhou et al.30 designed a variational LSTM (VLSTM) deep model for IoT attack detection. It used an encoder-decoder network to reconstruct feature representations, a variational reparameterization scheme to learn key feature representations, and three loss functions to control weight learning. Mushtaq et al.31 proposed a hybrid framework comprising a deep auto-encoder with LSTM and bi-LSTM for attack detection. In this framework, the auto-encoder was used to obtain optimal features and the LSTMs were used to perform the classification task. Liu et al.32 proposed a novel, practical and fast ensemble model, FastCBLA-EM. It used a competitive network composed of a 1-D CNN and a bi-LSTM network to learn feature representations of samples, and then an arbitration mechanism was used to weight the representations for classifying LDDoS attacks. Du et al.33 proposed the NIDS-CNNLSTM model for network security. It combined the powerful learning ability of the LSTM network on time-series data to extract key features, and used a CNN network to classify those filtered features. In summary, the state-of-the-art centralized methods made use of both CNN and LSTM for feature extraction. But these existing methods used the CNN and LSTM sequentially: the CNN was used to extract internal features, and then the LSTM was used to extract associations based on the output of the CNN. This means important spatio-temporal features may be lost, which is especially unacceptable on a limited, low-quality dataset. Thus, in this paper, we propose a novel multi-scale CNN and bi-LSTM arbitration dense network model, MSCBL-ADN, to alleviate the impact of the insufficiency of high-quality attack examples.


The contributions of this paper are as follows: * The paper introduces a novel fixed _T_-length sliding segmentation data augmentation method to split raw multi-variate data into multiple equal-length detection blocks in real time. This not only enlarges the set of high-quality attack examples, but also speeds up LDDoS attack detection. * The paper proposes the MSCBL-ADN model, which incorporates a multi-scale CNN for preliminary spatial feature extraction and an embedding-based bi-LSTM for time relationship extraction, employs an arbitration network that achieves high accuracy by redistributing the weights of key representations, and uses a 2-block dense connection network to perform the final classification. * The paper compares the MSCBL-ADN model with state-of-the-art models in terms of classification accuracy and time consumption on the ISCX-2016-SlowDos dataset. The experimental results demonstrate the effectiveness of the proposed method. The rest of the paper is organized as follows: In Section “Methods”, the data augmentation method and the architecture of the MSCBL-ADN model are described in detail. In Section “Results”, the ISCX-2016-SlowDos dataset augmentation process and the experimental environment are introduced, and the performance evaluation is described in terms of detection accuracy and time consumption. Section “Discussion” covers the analysis of the ablation experiments and the comparison experiments. Finally, in Section “Conclusions”, the paper is concluded and future directions are suggested


briefly. METHODS A novel multi-scale CNN and bi-LSTM arbitration dense network model is proposed and explained in detail in this section. As illustrated in Fig. 1, the block diagram includes data pre-processing and the MSCBL-ADN network model. In the data pre-processing part, the captured packets are organized into different flows per their flow 5-tuple (source IP, source port, protocol, destination IP, destination port). Then, each flow is segmented into multiple detection blocks of default length _T_, named \(F_x\)-\(SubF_{kx}\). The number of features in \(F_x\)-\(SubF_{kx}\) is _n_. Since the raw multi-variate data contains spatio-temporal features, \(F_x\)-\(SubF_{kx}\) is split into \(F_x\)-\(SubF_{kx}\)-_S_ for the spatial feature set and \(F_x\)-\(SubF_{kx}\)-_T_ for the time relationship feature set, whose sizes are \(n_1\) and \(n_2\) respectively. In the MSCBL-ADN network model part, the spatial features are handled by a multi-scale CNN network with \(1 \times 1\), \(3 \times 3\) and \(5 \times 5\) convolutional kernels. The extracted features are concatenated and flattened into a vector. In parallel, the bi-LSTM networks are used to learn the valid information from the time-delta-embedded \(F_x\)-\(SubF_{kx}\)-_T_ dataset through forward and backward propagation. After that, the learnt temporal representation vector is concatenated with the spatial feature vector. Then, an arbitration network is adopted to redistribute weights via a multi-head attention mechanism. As shown, the critical representations are given heavy weights. Finally, a 1-D dense network with two dense blocks is employed for classification. Overall, several important techniques, such as batch normalization, the _softmax_ function, cross-entropy loss and dropout, are used in the MSCBL-ADN model to mitigate over-fitting.


FIXED _T_-LENGTH SLIDING SEGMENTATION METHOD The MSCBL-ADN model needs to be trained on the limited, low-quality LDDoS dataset and run on a real-time network. For that, the paper introduces a novel fixed _T_-length sliding segmentation method for both data augmentation and speeding up detection. The method is illustrated in Fig. 2 and mainly comprises four tasks. The first task is to sort packets and extract features to form flows based on the communication quintuple and arrival time. The second task is to segment flows into fixed _T_-length sub-flows. The third task is to pad with flow-agnostic packets when the length of the last sub-flow is less than the required length during the training phase, or when the expected packets have not arrived yet. Finally, the method splits the multi-variate sub-flows into the spatial part \(F_x\)-\(SubF_{kx}\)-_S_ and the time relationship part \(F_x\)-\(SubF_{kx}\)-_T_. As a result, the high-quality LDDoS training examples are augmented and the LDDoS attack detection is sped up. In task one, we find that the packets collected in the dataset and in the real network are mixed up and sorted by arrival time. The reason is that the physical links are shared and multiplexed by flows. However, network attacks normally target fixed victims, which requires us to distinguish flows. Therefore, we sort packets based on the communication quintuple and arrival time. Let the number of packets be _m_ and the number of flows be _z_; then \(m=\sum _{i=1}^zl_i\), where \(l_i\) is the length of the \(i^{th}\) flow. Next, in task two, a sliding window of fixed length _T_ is used to segment the flows into LDDoS attack detection blocks. By default, the stride of the sliding window is set to 1 for data augmentation. Thus, a new sub-flow is available for detection as soon as a new packet arrives. Take sub-flow \(F_1\)-\(SubF_1\) as an example: when the new packet “Pkg #1,2,_T_+1” arrives, the new _T_-length detection block \(F_1\)-\(SubF_2\) is formed. Its first _T_-1 packets are the same as those of \(F_1\)-\(SubF_1\). By this method, the MSCBL-ADN model can detect whether an LDDoS attack happens whenever any new packet arrives in the real network. Obviously, the detection speed is accelerated. As for task three, the method dilates the sub-flows with zero-padded packets when the number of packets in the sub-flow is less than _T_. Take \(F_2\)-\(SubF_1\) as an example: only a few packets have been collected when LDDoS attack detection is triggered, since it is a new sub-flow. Obviously, the count of collected packets is less than _T_. To meet the input requirement of the MSCBL-ADN model, we must dilate the sub-flow. Take \(F_z\)-\(SubF_{kz}\) as another example: its last packet is agnostic, that is, the count of packets in the last segment may be less than _T_. For that, we must also pad the sub-flow. Reviewing the MSCBL-ADN model, there is a bi-LSTM network part. It is time-consuming, especially when a huge feature set is fed in. But the time-independent features have almost no impact on detection accuracy32. Therefore, in task four, we split the original multi-variate sub-flows into spatial feature sub-flows \(F_x\)-\(SubF_{kx}\)-_S_ and time relationship sub-flows \(F_x\)-\(SubF_{kx}\)-_T_. After the above operations, the raw data are reshaped into a tensor of \(\sum _{i=1}^z k_i*(n_1+n_2)*T\) as shown in Eq. (1), $$\begin{aligned} k_i= {\left\{ \begin{array}{ll} l_i-T+1 &{} l_i \ge T \\ 1 &{} l_i < T \end{array}\right. }, \end{aligned}$$ (1) where \(k_i\) is the number of segmentations to be trained/detected, and \(n_1\) and \(n_2\) are the numbers of spatial and temporal features, respectively. We label the corresponding sub-flows per their original flow classification in the training set.


Obviously, the method has several advantages. First, it speeds up the frequency of detection: once any new packet arrives, detection can start without waiting for all the packets on the flow. Second, the multi-variate data is split and fed into the CNN and bi-LSTM paths in parallel, which further reduces time consumption. A minimal code sketch of the segmentation procedure is given below.
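The following is a minimal NumPy sketch of the fixed _T_-length sliding segmentation described above. The function names, the toy flow and the column-index arguments are illustrative assumptions rather than the authors' released code; only the stride-1 windowing, the zero-padding of short flows (Eq. 1) and the spatial/temporal split follow the description in this section.

```python
import numpy as np

def segment_flow(flow, T=7):
    """Split one flow (an l_i x n feature matrix) into stride-1 windows of length T.
    A flow shorter than T is zero-padded into a single T-length block, so k_i = 1 (Eq. 1)."""
    l_i, n = flow.shape
    if l_i < T:
        padded = np.zeros((T, n), dtype=flow.dtype)   # pad with "zero packets"
        padded[:l_i] = flow
        return padded[np.newaxis, ...]
    # otherwise k_i = l_i - T + 1 overlapping detection blocks
    return np.stack([flow[s:s + T] for s in range(l_i - T + 1)])

def split_spatial_temporal(blocks, spatial_cols, temporal_cols):
    """Split each T x n block into F_x-SubF_kx-S (n1 spatial features)
    and F_x-SubF_kx-T (n2 time-relationship features)."""
    return blocks[..., spatial_cols], blocks[..., temporal_cols]

# toy usage: one flow of 10 packets, each described by 5 features
flow = np.random.rand(10, 5)
blocks = segment_flow(flow, T=7)                       # shape (4, 7, 5)
spatial, temporal = split_spatial_temporal(blocks, [0, 1, 2], [3, 4])
print(blocks.shape, spatial.shape, temporal.shape)     # (4, 7, 5) (4, 7, 3) (4, 7, 2)
```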


MULTI-SCALE CNN NETWORK LDDoS attacks have the characteristic of periodicity.


As shown in Fig. 3, the multi-scale CNN network is designed to address the insufficient utilization of LDDoS attack packets. It adopts a multi-scale feature fusion extraction channel that combines skip connections and feature fusion operations. Usually, this kind of periodicity can be analyzed by observing at least 3–5 packets. Take slow-body attacks as an example: they keep sending small data packets to the server at an ultra-slow speed once the connections are established. The first packets in these attacks normally have larger Content-Length values in their HTTP headers, and the subsequent data packets have almost the same HTTP header but different HTTP bodies. Thus, we need to analyze 3–5 packets before making a decision, and the 5\(\times\)5 kernel is used for exactly this kind of analysis. Other LDDoS attacks, such as slow-headers, aim to deplete CPU resources or network connection pools. They attack servers with endless HTTP headers. These packets have the same HTTP headers and can be analyzed from 1–3 packets; the \(3 \times 3\) kernel is good at handling these cases. The \(1 \times 1\) kernel helps to add non-linearity to the original data and raise its dimensionality, which improves the representation ability of the multi-scale CNN network. After the above convolutional operations, the extracted representations are fused and flattened via long skip connections for prediction. Firstly, three convolutional kernels of different sizes, 1\(\times\)1, 3\(\times\)3 and 5\(\times\)5, are used to extract features from the detection blocks. The 1\(\times\)1 convolutional kernel in the first channel is used to obtain detailed features of network packets. In this channel, we use 56 1\(\times\)1 convolutional kernels. The formula is shown in Eq. (2), $$\begin{aligned} F_1(Y)=W_1 \times F_x\text {-}SubF_{kx}\text {-}S + B_1 \end{aligned}$$ (2) where \(F_1(Y)\) is the output feature map of the first channel, \(W_1\) is the weights of the 56 1\(\times\)1 convolutional kernels and \(B_1\) represents the bias. Normally, LDDoS attack flows are periodic multi-variate time-series pulses; that is, nearby packets have strong positive correlation. Thus, we use 28 3\(\times\)3 convolutional kernels in the second channel and 28 5\(\times\)5 convolutional kernels in the third channel. They are used to capture the spatial relationships among packets. The formula is shown in Eq. (3), $$\begin{aligned} \begin{aligned} {\begin{matrix} &{}F_2(Y)=W_2 \times F_x\text {-}SubF_{kx}\text {-}S + B_2, \\ &{}F_3(Y)=W_3 \times F_x\text {-}SubF_{kx}\text {-}S + B_3 \end{matrix}} \end{aligned} \end{aligned}$$ (3) where \(F_2(Y)\) and \(F_3(Y)\) are the output feature maps of the second and third channels, respectively, \(W_2\) and \(W_3\) are the weights of the 3\(\times\)3 and 5\(\times\)5 convolutional kernels, and \(B_2\) and \(B_3\) represent the biases. Then, a _Concat_ operation is adopted to combine the feature information of different scales after the second and third channels. At the same time, we use long skip connections to fuse the multi-scale feature information. This ensures the completeness of the LDDoS attack features extracted from detection blocks, and thereby improves the ability to extract useful high-frequency spatial information. The formula is shown in Eq. (4), $$\begin{aligned} \begin{aligned} {\begin{matrix} &{}F_{23}(Y)=Concat(F_2(Y), F_3(Y)), \\ &{}F(Y)=W \times (F_1(Y) + F_{23}(Y)) + B \end{matrix}} \end{aligned} \end{aligned}$$ (4) where \(F_{23}(Y)\) is the concatenated feature map of the second and third channels, whose size is \((56, T, n_1)\). The size of \(F_1(Y)\) is \((56, T, n_1)\), too. Thus, we add \(F_{1}(Y)\) and \(F_{23}(Y)\) to get _F_(_Y_). _W_ is the weights and _B_ is the bias of this fusion layer. Finally, a _Flatten_ layer transforms the multi-dimensional feature maps into one-dimensional arrays for the subsequent arbitration network. A hedged Keras-style sketch of this multi-scale block is given below.
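The sketch below illustrates one possible Keras implementation of this block. The window length _T_, the number of spatial features \(n_1\) and the ReLU activations are assumptions for the example; the kernel counts (56/28/28), the concatenation of the second and third channels, the long skip connection with the 1\(\times\)1 branch, the fusion convolution and the final _Flatten_ follow Eqs. (2)–(4).

```python
from tensorflow.keras import Input, Model, layers

T, n1 = 7, 16                                   # assumed window length and spatial feature count
spatial_in = Input(shape=(T, n1, 1), name="F_x-SubF_kx-S")

# Eq. (2): 56 1x1 kernels add non-linearity and capture per-packet detail
f1 = layers.Conv2D(56, (1, 1), padding="same", activation="relu")(spatial_in)

# Eq. (3): 28 3x3 and 28 5x5 kernels capture relations across roughly 1-3 and 3-5 packets
f2 = layers.Conv2D(28, (3, 3), padding="same", activation="relu")(spatial_in)
f3 = layers.Conv2D(28, (5, 5), padding="same", activation="relu")(spatial_in)

# Eq. (4): concatenate the two coarse scales (28 + 28 = 56 channels),
# add the 1x1 branch through a long skip connection, then fuse
f23 = layers.Concatenate()([f2, f3])
fused = layers.Add()([f1, f23])
fused = layers.Conv2D(56, (1, 1), padding="same", activation="relu")(fused)

spatial_vec = layers.Flatten()(fused)           # 1-D vector handed to the arbitration network
cnn_branch = Model(spatial_in, spatial_vec, name="multi_scale_cnn")
cnn_branch.summary()
```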


TIME DELTA EMBEDDING-BASED BI-LSTM NETWORK The collected LDDoS attack sub-flows are composed of sequences of attack packets, which can be seen as time-series data. Thus, they can be processed with an LSTM to capture time relationship information. However, some packets may be missing from detection blocks under network congestion in our case, so we need to infer whether LDDoS attacks happen from the forward and backward directions simultaneously. Naturally, a bi-LSTM network, which stacks two LSTM layers, is an effective means of doing so, as shown in Fig. 4. It can learn long-term and short-term time relationship information of network packets from both directions. Nevertheless, LSTM-like networks are time-consuming, and the consumed time mainly comes from two aspects. The first aspect is introduced by the recurrent structure. As is known, the calculation of the current time step must wait for the hidden variables of the last time step in an LSTM-like network. As a result, the calculations at different time steps are hard to execute in parallel. Thus, we turn to the second aspect, the computation. Per our research, too many inputs mean large computations. We propose a time-delta embedding method that encodes and fuses a limited set of key time-related features as input to obtain the hidden variables. These features include ‘arrival time’, ‘begin time’, ‘time delta’ and so on. The ‘arrival time’ is the key feature, which encodes the chronological order of the LDDoS attack. The ‘time to live’ indicates how many networks the packet has passed through over time, and can be used to infer the ‘begin time’ of attacks. Most importantly, we introduce the ‘time delta’ feature between any two packets in the same sub-flow. It is an important indicator of an LDDoS attack, which can indicate whether flows are similar. As the input features decrease, the computational workload is greatly reduced. A small sketch of computing the ‘time delta’ feature within a sub-flow is given below.
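The snippet below is a small, hedged illustration of how the ‘time delta’ feature could be derived from packet arrival times within one sub-flow; the column names and the toy timestamps are assumptions.

```python
import pandas as pd

# toy sub-flow: arrival timestamps (in seconds) of T = 7 consecutive packets
sub_flow = pd.DataFrame({"arrival_time": [0.00, 0.98, 2.01, 3.02, 3.99, 5.05, 6.00]})

# 'time delta' between consecutive packets in the same sub-flow; the first packet gets 0
sub_flow["time_delta"] = sub_flow["arrival_time"].diff().fillna(0.0)
print(sub_flow)
```

The roughly constant one-second gaps in this toy example are exactly the kind of periodic pulse pattern that the bi-LSTM branch is meant to pick up.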


The important components of the bi-LSTM are the LSTM memory cell and the bidirectional self-cycling mechanism. The LSTM memory cell adopts the forget gate \(f_t\), input gate \(i_t\) and output gate \(o_t\). The \(f_t\) determines what is essential to retain from the previous state, the \(i_t\) determines how to add new information to the cell state, and the \(o_t\) defines what should be the next hidden state. In addition, \(\tilde{C}_t\) represents the candidate updating value. Thus, the outputs of the current LSTM cell, \(h_t\) and \(h'_t\), can be calculated through Eq. (5), $$\begin{aligned} \begin{aligned} {\begin{matrix} &{}h_t=\sigma (W_ox_t+U_oh_{t-1}+b_o) \times tanh(\sigma (W_fx_t+U_fh_{t-1}+b_f)C_{t-1}+\sigma (W_ix_t+U_ih_{t-1}+b_i)\tilde{C}_{t-1}),\\ &{}h'_t=\sigma (W'_ox_t+U'_oh'_{t-1}+b'_o) \times tanh(\sigma (W'_fx_t+U'_fh'_{t-1}+b'_f)C'_{t-1}+\sigma (W'_ix_t+U'_ih'_{t-1}+b'_i)\tilde{C'}_{t-1}), \end{matrix}} \end{aligned} \end{aligned}$$ (5) where \(x_t\) is the input at time _t_, and the \(W\)'s, \(U\)'s and \(b\)'s denote the weights and biases of \(f_t\), \(i_t\) and \(o_t\). \(\sigma\) and _tanh_ are activation functions, while \(\times\) means dot multiplication. Then, \(og_t\) can be obtained by Eq. (6), $$\begin{aligned} og_t=W_4h_t+W_5h'_t, \end{aligned}$$ (6) which carries the bi-directional feature representations. A minimal Keras sketch of this branch is shown below.
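Below is a minimal Keras sketch of this branch. The embedding width, the number of LSTM units and the window length are assumptions; feeding only the \(n_2\) time-related features, reading the sequence in both directions, and combining the two directions with learned weights (Eq. 6) follow the description above.

```python
from tensorflow.keras import Input, Model, layers

T, n2 = 7, 3                       # assumed window length and time-related feature count
temporal_in = Input(shape=(T, n2), name="F_x-SubF_kx-T")

# per-packet embedding of the time-delta features before the recurrent layers (assumed width)
embedded = layers.TimeDistributed(layers.Dense(16, activation="relu"))(temporal_in)

# forward and backward LSTMs; the concatenated states are combined by a learned
# linear layer, mirroring og_t = W4*h_t + W5*h'_t in Eq. (6)
both_dirs = layers.Bidirectional(layers.LSTM(32), merge_mode="concat")(embedded)
og = layers.Dense(32)(both_dirs)

lstm_branch = Model(temporal_in, og, name="time_delta_bilstm")
lstm_branch.summary()
```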


MULTI-HEAD ATTENTION ARBITRATION NETWORK The features extracted by the multi-scale CNN and the bi-LSTM are concatenated into a vector, and the feature vector is fed into the arbitration network for weight redistribution. The arbitration network relies mainly on a multi-head attention mechanism to efficiently extract information from the input feature vector, as depicted in Fig. 5. Firstly, the arbitration network projects the _Qs_, _Ks_, and _Vs_ _N_ times with different linear projections. They are self-attended; that is, the projected _Qs_, _Ks_, and _Vs_ all come from the same extracted feature vector. Then, the _Qs_, _Ks_, and _Vs_ are fed into scaled dot-product attention modules in parallel, yielding the result values by Eq. (7), $$\begin{aligned} Head_i=Softmax\left( \frac{QK^T}{\sqrt{d}}\right) V, \end{aligned}$$ (7) where \(Head_i\) is the result value of head _i_ (\(i<N\)) and _d_ represents the dimension of the _Qs_ or _Ks_. Finally, the result values are concatenated and projected once again, resulting in the arbitration values by Eq. (8), $$\begin{aligned} arbitration=FC(Concat(head_1, \cdots , head_N)). \end{aligned}$$ (8) After this arbitration network, the features extracted upstream by the different network models are weighted. This helps identify the important features, and its strong parallelism further speeds up LDDoS detection. A hedged sketch of this arbitration step, using the built-in Keras multi-head attention layer, is given below.
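The sketch below uses Keras' built-in multi-head attention layer. The flat feature size, the number of heads, the token width and the reshaping of the concatenated vector into short ‘tokens’ are assumptions made so that self-attention over a single flat vector is well defined; the per-head scaled dot-product attention, the concatenation of heads and the final projection of Eqs. (7)–(8) happen inside the layer.

```python
from tensorflow.keras import Input, Model, layers

feat_dim, tokens, d = 512, 8, 64                # assumed: flat vector viewed as 8 tokens of width 64

concat_in = Input(shape=(feat_dim,), name="cnn_plus_bilstm_features")
x = layers.Reshape((tokens, d))(concat_in)

# self-attention: Q, K and V are all projections of the same extracted feature vector;
# scaled dot-product per head (Eq. 7), head concatenation and output projection (Eq. 8)
# are performed inside MultiHeadAttention
attended = layers.MultiHeadAttention(num_heads=4, key_dim=d)(query=x, value=x, key=x)

arbitrated = layers.Flatten()(attended)         # re-weighted representations for the dense classifier
arbitration = Model(concat_in, arbitrated, name="mha_arbitration")
arbitration.summary()
```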


1-D DENSE CLASSIFICATION NETWORK To further preserve the key LDDoS detection feature representations between layers, direct connections from any layer to all of its subsequent layers are adopted in a feed-forward fashion. That is, the output vectors produced by layers 0 to \(l-1\) are concatenated into a single tensor as described in Eq. (9), $$\begin{aligned} x_l= Concat([x_0, x_1, \cdots , x_{l-1}]), \end{aligned}$$ (9) where \(x_i\) refers to the output vector of layer _i_. There are two dense blocks in our proposal, and we add a \(2\times 2\) average pooling operation between the blocks, which down-samples and shrinks the size of the feature vectors. In this 1-D dense network, there are many different paths between layers, which effectively alleviates the vanishing-gradient problem. At the same time, the number of convolutional kernels can be reduced since features are propagated well between layers. This means the number of parameters in the model is substantially reduced, which further speeds up LDDoS detection. To obtain the LDDoS classification, _softmax_ activations are used in the last layer. During MSCBL-ADN model compilation, the \(categorical\_crossentropy\) loss function is used to calculate the cost on the training and validation sets, and the _Adam_ optimizer is used to adjust the weights and biases through back propagation. A hedged Keras sketch of this classification head and the compile step is given below.
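The following is a hedged Keras sketch of this classification head. The number of layers per block, the growth rate, the kernel size and the assumed six output classes are illustrative choices; the layer-wise concatenation of Eq. (9), the average pooling between the two dense blocks, the _softmax_ output and the \(categorical\_crossentropy\)/_Adam_ compile step follow the text.

```python
from tensorflow.keras import Input, Model, layers

def dense_block(x, num_layers=3, growth=16):
    """1-D dense block: every layer receives the concatenation of all previous outputs (Eq. 9)."""
    for _ in range(num_layers):
        y = layers.Conv1D(growth, 3, padding="same", activation="relu")(x)
        x = layers.Concatenate()([x, y])
    return x

arb_in = Input(shape=(512,), name="arbitrated_features")   # assumed arbitration output size
x = layers.Reshape((512, 1))(arb_in)                       # treat the vector as a 1-D sequence

x = dense_block(x)
x = layers.AveragePooling1D(pool_size=2)(x)                # pooling between the two blocks
x = dense_block(x)

x = layers.Flatten()(x)
out = layers.Dense(6, activation="softmax")(x)             # assumed six traffic classes

classifier = Model(arb_in, out, name="dense_classifier")
classifier.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
classifier.summary()
```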


RESULTS In this section, the dataset, experimental environment and results are given. Firstly, the public ISCX-2016-SlowDos dataset is described and revised for the subsequent experiments. Then, the experimental environment is deployed, the hyperparameters are listed, and the evaluation indicators are described. Finally, experiments and comparison experiments are carried out in terms of accuracy and computational time complexity. DATASET DESCRIPTION, REVISION AND PRE-PROCESSING To verify the detection accuracy and time performance of the MSCBL-ADN model, a reliable, multi-class, small-scale and balanced raw-format dataset is needed. The popular low-rate DDoS detection dataset ISCX-2016-SlowDos34, provided by the Canadian Institute for Cybersecurity, is collected from a dynamic, real and complex network testbed and stored as raw-format data. This establishes the reliability of the collected packets without doubt. In terms of multi-class labeling, the dataset includes different kinds of DDoS packets, such as low-rate and high-volume ones, but unfortunately the packets are all labelled as a single malign type. Even worse, no benign packets are captured in the dataset. On this point, the dataset without revision is not suitable for verifying the multi-class detection ability of the MSCBL-ADN model. The other reason for revising the ISCX-2016-SlowDos dataset is to validate the model’s learning ability on a small-scale dataset. As described in the above sections, novel LDDoS attacks are increasingly hard to capture. As a result, the size of available training datasets is also decreasing. To simulate this situation, we need to perform down-sampling on the original dataset. Based on the above principles, we first filter out the RUDY, slow-body, slow-headers, and slow-read LDDoS attacks. Taking RUDY as an example, its target servers are “208.113.162.153” and some other listed IPs per the official documentation34.


Thus, we apply the filter “ip.src==208.113.162.153 or ip.dst==208.113.162.153” to filter out the raw RUDY network packets in the Wireshark tool. The same method is applied to the other IPs and the other LDDoS attack types; the same display filter can also be applied on the command line, as sketched below.
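For readers who prefer the command line to the Wireshark GUI, the same display filter can be applied with tshark; the Python wrapper below is a hedged illustration and the file names are placeholders.

```python
import subprocess

# keep only packets to or from one RUDY target listed in the official documentation
display_filter = "ip.src==208.113.162.153 or ip.dst==208.113.162.153"
subprocess.run(
    ["tshark", "-r", "iscx_slowdos.pcap",      # placeholder input capture
     "-Y", display_filter,                     # Wireshark display filter
     "-w", "rudy_only.pcap"],                  # filtered packets written back as pcap
    check=True,
)
```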


Then, we keep the hulk attack in this dataset as a high-volume DDoS attack and choose packets from the 1999 DARPA dataset35 as benign ones for the purpose of multi-class detection. The training data of the first week of the 1999 DARPA dataset does not contain any attacks35, and the packets are stored as raw-format data, which is consistent with the ISCX-2016-SlowDos dataset, so they can easily be intermixed later. Next, we sort the packets by communication quintuple and segment them into flows by the TCP SYN, FIN, and RESET flags. A hedged sketch of this flow-grouping step is shown below.
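The following is a hedged Python sketch of this step, assuming the Scapy library for packet parsing; the SYN/FIN/RST segmentation heuristic is a simplification of the procedure described in the text.

```python
from collections import defaultdict
from scapy.all import IP, TCP, rdpcap

def build_flows(pcap_path):
    """Group packets by communication quintuple; a SYN opens a new flow and
    a FIN or RST closes the current one."""
    flows = defaultdict(list)                   # quintuple -> list of flows (packet lists)
    for pkt in sorted(rdpcap(pcap_path), key=lambda p: float(p.time)):
        if not (pkt.haslayer(IP) and pkt.haslayer(TCP)):
            continue
        key = (pkt[IP].src, pkt[TCP].sport, pkt[IP].proto, pkt[IP].dst, pkt[TCP].dport)
        flags = str(pkt[TCP].flags)
        if "S" in flags or not flows[key]:      # SYN (or first packet seen) starts a flow
            flows[key].append([])
        flows[key][-1].append(pkt)
        if "F" in flags or "R" in flags:        # FIN / RST ends the flow
            flows[key].append([])
    return {k: [f for f in v if f] for k, v in flows.items()}
```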


Finally, a random counter and a packet counter are used to generate the revised dataset. With the random counter, a certain flow is selected; with the packet counter, the data down-sampling is performed and the size of the dataset is kept under control. When we set the packet counter for every attack, a small-scale revised dataset is obtained. The details of the revised dataset are shown in Table 1. After obtaining the revised dataset, the packets in the flows are pre-processed. One-hot encoding is applied to transform categorical features, such as the protocol type, into numeric features, and max-min normalization is performed to prevent gradient explosion during model training. A small sketch of these two pre-processing steps is given below.
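The snippet below is a small sketch of these two pre-processing steps, assuming pandas and scikit-learn; the toy columns are illustrative, and in practice the scaler would be fitted on the training split only.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# toy per-packet feature table: one categorical column and two numeric ones
df = pd.DataFrame({"protocol":   ["TCP", "UDP", "TCP"],
                   "pkt_len":    [60, 1500, 512],
                   "time_delta": [0.0, 0.8, 12.5]})

df = pd.get_dummies(df, columns=["protocol"])            # one-hot encode categorical features
numeric = ["pkt_len", "time_delta"]
df[numeric] = MinMaxScaler().fit_transform(df[numeric])  # max-min normalization into [0, 1]
print(df)
```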


To learn the patterns of flows, the flows are further split into sub-flows (samples) and labeled by the fixed _T_-length sliding segmentation method; the detailed operation is described in the Methods section. In this way, the conventional packet-based features are formed into many fixed-length two-dimensional sub-flows, from which the model can learn LDDoS attack patterns based on the current and the previous \((T-1)\) packets. The sub-flows overlap by 85.71%, since the default stride is set to 1. The detailed experimental results of the MSCBL-ADN model are discussed for both binary-class and multi-class scenarios. In the binary-class scenario, we label both hulk and benign samples as the normal class, and label the others as the LDDoS class. In the multi-class scenario, the attacks keep their original labels. Finally, the dataset is randomly partitioned into two sets by 80%:20% using the train_test_split() function of the sklearn library. The 80% split is used for training, while the 20% split is used for testing, as sketched below.
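A minimal sketch of this split is shown below; the toy tensors, the six-class labels and the stratification are assumptions, since the text only specifies the 80%:20% ratio and the use of train_test_split().

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 7, 19)                 # toy sub-flow tensors (samples, T, features)
y = np.tile(np.arange(6), 17)[:100]            # toy labels for six traffic classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)   # 80% training / 20% testing
print(X_train.shape, X_test.shape)
```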


After the above data operations, we analyze whether the data is balanced. As shown in Fig. 6, the numbers of LDDoS and normal sub-flows are almost equal in the binary-class scenario, and the numbers of the different LDDoS attacks are between 3600 and 5650 in the multi-class scenario, which is suitable for training and testing. Thus, we also down-sample the number of benign sub-flows to 5000 for the multi-class experiments. EXPERIMENTAL SETUP AND EVALUATION INDICATORS The experiments are conducted on an Ubuntu deep learning server with an Intel(R) Xeon(R) Silver 4216 CPU, 128 GB RAM and two Nvidia GeForce RTX 3080 graphics cards. The proposed model is implemented using Python 3.6.2 and the Keras library with TensorFlow as the backend. The key tuning experiment regarding the selection of the fixed length _T_ is shown in Fig. 7. Normally, some kinds of LDDoS attacks based on the TCP three-way handshake are finished within 3 or 4 packets, while other kinds of LDDoS attacks, such as slow-read attacks, may need about 7 packets. Thus, the tuning range of _T_ should theoretically be from 3 to 8. The good news is that the MSCBL-ADN model has a multi-scale CNN network which can extract feature representations among the 5 most recently arrived packets. As a result, the range of _T_ shrinks to [6, 7, 8]. The tuning result shows that the accuracy of the model is best when _T_ is set to 7: the accuracy is as high as 96.74%, and its curve is above the others. At the same time, Table 1 shows that the average flow size is 7 after the packets are sorted by communication quintuple. That is, a sliding window of size 7 can segment most flows into sub-flows without padding flow-agnostic packets, which reduces the impact of additional operations. Once _T_ is greater than or equal to 8, there could be too many flow-agnostic packets in some segments to be detected. In other words, even the largest 5\(\times\)5 convolutional kernel cannot extract any useful information from them. Obviously, this may cause the multi-scale CNN network in the MSCBL-ADN model to fail. Additionally, in a real network, the collection time of packets increases since the model has to wait for the 8th packet before detection. Overall, the chosen and recommended _T_ is 7. The MSCBL-ADN model


parameters and settings are shown in Table 2. To evaluate the performance of the model, three evaluation metrics (accuracy, precision, and recall rate) based on the confusion matrix, together with the CPU/GPU elapsed time (time complexity), are adopted. The confusion matrix contains the elements _TP_, _FN_, _FP_ and _TN_. _TP_ means the number of normal sub-flows classified as normal traffic, and _TN_ means the number of abnormal sub-flows classified as abnormal traffic. _FP_ represents the number of abnormal sub-flows classified as normal ones, while _FN_ represents the number of normal sub-flows classified as abnormal ones. According to the confusion matrix, the accuracy, precision, and recall rate are calculated by Eq. (10). Each experiment is run 10 times with StratifiedKFold, and the


average values of the above metrics are used as the results. $$\begin{aligned} \begin{aligned} {\begin{matrix} &{}accuracy=\frac{TP+TN}{TP+TN+FP+FN},\\ &{}precision=\frac{TP}{TP+FP},\\ &{}recall\,rate=\frac{TP}{TP+FN}. \end{matrix}} \end{aligned} \end{aligned}$$ (10) A minimal sketch of computing these averaged metrics with scikit-learn is shown below.
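The sketch below shows, under stated assumptions, how these averaged metrics could be computed with scikit-learn over 10 stratified folds; the toy data and the placeholder classifier stand in for the trained MSCBL-ADN model, and macro averaging is an assumption for the multi-class case.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(120, 7 * 19)                # toy flattened sub-flows
y = np.tile(np.arange(6), 20)                  # toy labels for six classes

accs, precs, recs = [], [], []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True, random_state=42).split(X, y):
    # placeholder model with a fit/predict interface; the real experiments use MSCBL-ADN
    clf = DummyClassifier(strategy="stratified", random_state=42).fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    accs.append(accuracy_score(y[test_idx], y_pred))
    precs.append(precision_score(y[test_idx], y_pred, average="macro", zero_division=0))
    recs.append(recall_score(y[test_idx], y_pred, average="macro", zero_division=0))

print(f"accuracy={np.mean(accs):.4f} precision={np.mean(precs):.4f} recall={np.mean(recs):.4f}")
```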


EXPERIMENTS AND COMPARISON EXPERIMENTS EXPERIMENTAL RESULTS: CLASSIFICATION AND TIME INDICATORS Figure 8 shows the accuracy and loss curves on the revised ISCX-2016-SlowDos dataset. In the binary-class scenario, the accuracy jumps to 98.53% during the first several epochs, and then increases smoothly and steadily to 98.99%. The loss looks good, too, decreasing gradually to 0.0366. In the multi-class scenario, the accuracy grows quickly from 89.16% to 95.05% in the first four epochs, and then increases slowly to 96.90%. At the same time, its loss curve is smooth, with a final loss of 0.1108. Obviously, the performance of the MSCBL-ADN model is excellent for classifying network LDDoS attacks, and benefiting from its multi-channel structure, it can be trained for many epochs. Figure 9 shows the confusion matrices of the MSCBL-ADN model on the 20% testing set. Figure 9a covers the binary-class scenario, where the accuracy, precision, and recall rate are 98.78%, 98.82%, and 98.93%, respectively. The _FP_ and _FN_ are both small, which proves the model has good classification ability. Figure 9b covers the multi-class scenario. The MSCBL-ADN model has a higher but acceptable error rate between RUDY and slow-body. The reason is that RUDY is a kind of slow-body attack fired by the RUDY tool, so its features are similar to the slow-body ones. The overall accuracy, precision and recall rate are 96.74%, 96.77%, and 96.74%, respectively. This indicates the MSCBL-ADN model performs LDDoS attack detection well. The time consumption experiments are run on the aforementioned Ubuntu server. The GPU elapsed time is 5.831 ms per 32 sub-flows, and the time consumption increases linearly when we enlarge the number of sub-flows to be detected. Thus, the MSCBL-ADN model can be deployed into a real


network for the detection task. PERFORMANCE COMPARISON EXPERIMENTS In the comparison experiments, we compare the proposed MSCBL-ADN model with the RF18, SVM19 and KNN20 shallow classifiers on the testing set. We also compare it with CNN-based models (CNN-BMECapSA-RF25, LRDADF27), LSTM-based models (VLSTM30), and CNN-LSTM-based models (AsyncFL-bLAM16, NIDS-CNNLSTM33) during the training and testing phases. Figure 10 displays the validation accuracy and training loss of the multi-class scenario over 100 epochs. We can clearly see that the loss curve of the CNN-based CNN-BMECapSA-RF25 model is not stable. This shows that LDDoS attacks have strong time relationships, which are hard for a CNN-based model to capture as time features. The LRDADF27 model is enhanced by a deep sparse autoencoder, so its loss curve is smoother than those of other CNN-based models. LSTM-based models look better: their accuracies all reach above 96%, and their losses are lower than 0.15. The CNN-LSTM-based models improve the accuracies and decrease the losses further. Per Fig. 10, our proposed model has the best performance. Figure 11 shows the binary-class training scenario. As mentioned, we merge the hulk and benign samples into normal sub-flows. As a result, the VLSTM30 model spends many epochs learning time relationship representations; only after about the 80th epoch do its accuracy and loss become normal. Its first 80 training accuracies and losses are not shown in Fig. 11, since the values are out of the display range. In contrast, the accuracy curves of the CNN-based models are smooth, which proves that sparse features are also important for detecting LDDoS attacks. Obviously, the proposed model is a little better than the other comparison models; it has an accuracy of 99.04% and a loss of 0.0057 on the training set. The shallow machine learning classifiers cannot analyze time steps, so we treat a single sub-flow as a training/testing example. As listed in Table 3, we compare the proposed model with the classic machine learning classifiers and the deep learning models on the testing set in the multi-class scenario. The experimental results show that the key indicators of RF18, SVM19 and KNN20 are significantly lower than those of the deep learning models, but their detection time is shorter, since they have only a few parameters for classification. Among the deep learning models, the CNN-based models have faster detection speed but lower detection accuracy; on the contrary, the detection speed of the LSTM-based models is slower. In CNN-LSTM-based models, such as AsyncFL-bLAM16 and NIDS-CNNLSTM33, researchers enhanced the LSTM part, so their accuracy and detection time improve a little. The testing results demonstrate that the performance of the MSCBL-ADN model is slightly better than that of the other state-of-the-art models, with an accuracy as high as 96.74% and a detection time as low as 5.831 ms per batch. DISCUSSION Per the results of the experiments and comparison experiments, the proposed MSCBL-ADN


model has a high accuracy, precision and recall rate, and an acceptable detection time. This benefits from the data augmentation method, the multi-scale CNN network, the time delta embedding-based bi-LSTM network, and the multi-head attention arbitration network. We discuss them through ablation studies. The first experiment verifies the effectiveness of the fixed _T_-length sliding segmentation data augmentation method. It is conducted with all modules on the original dataset and on the augmented dataset. As shown in Fig. 12, on the original limited data, the accuracy and loss of the MSCBL-ADN model are both impacted: its final accuracy is lower than 95%, and it has a higher loss of 0.26. When we augment the original dataset, the training data is enlarged by almost 10 times. The new augmented dataset is good for training the model, since its accuracy and loss curves both improve steadily. This experiment shows that the proposed data augmentation method is effective. The experimental results shown in Fig. 13 concern the effectiveness of the multi-scale CNN network. We conduct the experiments without a CNN network, with a plain CNN network and with the multi-scale CNN network. Without a CNN network, the proposed model can learn how to detect LDDoS attacks, but the accuracy is low. By using a CNN network, sparse representations can be learned for detection. As mentioned, some LDDoS attacks, such as slow-read attacks, need more than one packet to fire an attack; that is, we need CNN kernels of different scales to learn representations among packets. As a result, the multi-scale CNN network model has the best accuracy and the smallest loss. The basic idea of the third group of experiments is to verify the time delta embedding-based bi-LSTM network. We compare the accuracies and losses of the experiments without an LSTM network, with an LSTM network, with a bi-LSTM network and with the embedding-based bi-LSTM network. The packets of LDDoS attacks have strong positive temporal correlation. As shown in Fig. 14, without an LSTM network, the accuracy curve is volatile, and the loss curve grows after dozens of epochs. With an LSTM network, the curves look better. By using a bi-LSTM network, the model can learn from both the preceding and the following packets, which alleviates the negative impact of missing packets in sub-flows and improves the accuracy further. When we embed the time delta information between packets into the bi-LSTM network, the model is enhanced further. From the experimental outputs, our proposed network is more accurate and effective. The arbitration network helps a lot, too. The representations from the multi-scale CNN network and the embedding-based bi-LSTM network are extracted in parallel and independently, so an arbitration network is needed to decide which representations are the key items. As illustrated in Fig. 15, the performance is better with it. As for saving detection time, benefiting from the fixed _T_-length sliding segmentation method and the padding technology, LDDoS detection can start as soon as any packet arrives, which speeds up the frequency of detection in the real network. In the bi-LSTM network part, we only allow time-related features as inputs; thus, the number of bi-LSTM neurons can be small, which reduces the scale of the training parameters and speeds up detection. The multi-head attention structure handles the input representations in parallel, which makes use of multi-GPU processors and saves time. With the above technologies and tricks, the MSCBL-ADN model obtains LDDoS detection results in an acceptable amount of computational time. CONCLUSIONS LDDoS attacks are a new kind of worldwide network security issue.


Currently, there is only a small amount of low-quality public data available for training AI models. In this paper, we have designed and verified an effective LDDoS attack detection model, called MSCBL-ADN, under the constraint of such limited data. It augments the dataset with a novel fixed _T_-length sliding segmentation data augmentation method, incorporates a multi-scale CNN and an embedding-based bi-LSTM for key feature extraction, and employs an arbitration network for redistributing the weights of the key representations. The experimental results demonstrate that the MSCBL-ADN model has good performance in both the binary-class and multi-class scenarios. In the near future, we wish to enhance the MSCBL-ADN model through two tasks. The first task is to try to replace the LSTM-like network with an attention network in order to speed up detection further. At the same time, we also find a high error rate between the hulk and slow-body attacks per the experimental results; thus, the other task is to analyze the detailed reason and retrain the enhanced MSCBL-ADN model. DATA AVAILABILITY The datasets analysed during the current study are available in the Canadian


Institute for Cybersecurity webpage https://www.unb.ca/cic/datasets/dos-dataset.html, and in the MIT webpage


http://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset. REFERENCES * Tang, D., Zhang, S., Yan, Y., Chen, J. & Qin, Z. Real-time detection and mitigation of


ldos attacks in the SDN using the HGB-FP algorithm. _IEEE Trans. Serv. Comput._ 15, 3471–3484. https://doi.org/10.1109/TSC.2021.3102046 (2022). Article  Google Scholar  * BlackNurse.


Blacknurse-it can bring you down. BlackNurse (2018). * Fortinet. Fortinet predicts highly destructive and self-learning “swarm” cyberattacks in 2018. Fortinet (2018). * Liu, Z., Yin, X.


& Hu, Y. CPSS lr-ddos detection and defense in edge computing utilizing DCNN q-learning. _IEEE Access_ 8, 42120–42130. https://doi.org/10.1109/ACCESS.2020.2976706 (2020). Article  Google


Scholar  * Wang, R., Ma, C. & Wu, P. An intrusion detection method based on federated learning and convolutional neural network. _Netinfo


Secur._https://doi.org/10.3969/j.issn.1671-1122.2020.04.006 (2020). Article  Google Scholar  * Li, B. _et al._ Deepfed: Federated deep learning for intrusion detection in industrial


cyber-physical systems. _IEEE Trans. Industr. Inf._ 17, 5615–5624. https://doi.org/10.1109/TII.2020.3023430 (2021). Article  Google Scholar  * Mothukuri, V. _et al._ Federated-learning-based


anomaly detection for iot security attacks. _IEEE Internet Things J._ 9, 2545–2554. https://doi.org/10.1109/JIOT.2021.3077803 (2022). Article  Google Scholar  * Idrissi, M. J. _et al._


Fed-anids: Federated learning for anomaly-based network intrusion detection systems. _Expert Syst. Appl._ 234, 121000. https://doi.org/10.1016/j.eswa.2023.121000 (2023). Article  Google


Scholar  * Wu, W. & Zhang, Y. An efficient intrusion detection method using federated transfer learning and support vector machine with privacy-preserving. _Intell. Data Anal._ 27,


1121–1141. https://doi.org/10.3233/IDA-226617 (2023). Article  Google Scholar  * de Carvalho Bertoli, G., Júnior, L. A. P., Saotome, O. & dos Santos, A. L. Generalizing intrusion


detection for heterogeneous networks: A stacked-unsupervised federated learning approach. _Comput. Secur._ 127, 103106. https://doi.org/10.1016/j.cose.2023.103106 (2023). Article  Google


Scholar  * Tang, D., Wang, S., Liu, B., Jin, W. & Zhang, J. GASF-IPP: Detection and mitigation of ldos attack in SDN. _IEEE Trans. Serv. Comput._ 16, 3373–3384.


https://doi.org/10.1109/TSC.2023.3266757 (2023). Article  Google Scholar  * Zhao, R., Yin, Y., Shi, Y. & Xue, Z. Intelligent intrusion detection based on federated learning aided long


short-term memory. _Phys. Commun._ 42, 101157. https://doi.org/10.1016/j.phycom.2020.101157 (2020). Article  Google Scholar  * Huong, T. T. _et al._ Detecting cyberattacks using anomaly


detection in industrial control systems: A federated learning approach. _Comput. Ind._ 132, 1–16. https://doi.org/10.1016/j.compind.2021.103509 (2021). Article  Google Scholar  * Zhang, X.,


Ma, Z., Wang, A., Mi, H. & Hang, J. Lstfcfedlear: A LSTM-FC with vertical federated learning network for fault prediction. _Wirel. Commun. Mob. Comput._ 1–10, 2021.


https://doi.org/10.1155/2021/2668761 (2021). Article  Google Scholar  * Wang, X., Liu, J. & Zhang, C. Network intrusion detection based on multi-domain data and ensemble-bidirectional


LSTM. _EURASIP J. Inf. Secur._ 2023, 5. https://doi.org/10.1186/s13635-023-00139-y (2023). Article  Google Scholar  * Liu, Z., Guo, C., Liu, D. & Yin, X. An asynchronous federated


learning arbitration model for low-rate ddos attack detection. _IEEE Access_ 11, 18448–18460. https://doi.org/10.1109/ACCESS.2023.3247512 (2023). Article  Google Scholar  * Rahman, S. A.,


Tout, H., Talhi, C. & Mourad, A. Internet of things intrusion detection: Centralized, on-device, or federated learning?. _IEEE Network_ 34, 310–317.


https://doi.org/10.1109/MNET.011.2000286 (2020). Article  Google Scholar  * Jiang, J., Wang, Q., Shi, Z., Lv, B. & Qi, B. RST-RF: A hybrid model based on rough set theory and random


forest for network intrusion detection. In Proceedings of the 2nd International Conference on Cryptography, Security and Privacy, ICCSP 2018, Guiyang, China, March 16-19, 2018, 77–81,


https://doi.org/10.1145/3199478.3199489 (ACM, 2018). * Kaushik, R., Singh, V. & Kumari, R. Multi-class svm based network intrusion detection with attribute selection using infinite


feature selection technique. _J. Discrete Math. Sci. Cryptogr._ 24, 2137–2153. https://doi.org/10.1080/09720529.2021.2009189 (2021). Article  Google Scholar  * de Miranda Rios, V., Inácio,


P. R. M., Magoni, D. & Freire, M. M. Detection of reduction-of-quality ddos attacks using fuzzy logic and machine learning algorithms. _Comput. Netw._ 186, 107792.


https://doi.org/10.1016/j.comnet.2020.107792 (2021). Article  Google Scholar  * Tang, D., Yan, Y., Zhang, S., Chen, J. & Qin, Z. Performance and features: Mitigating the low-rate


tcp-targeted dos attack via SDN. _IEEE J. Sel. Areas Commun._ 40, 428–444. https://doi.org/10.1109/JSAC.2021.3126053 (2022). Article  Google Scholar  * Tang, D., Wang, X., Li, X.,


Vijayakumar, P. & Kumar, N. AKN-FGD: adaptive kohonen network based fine-grained detection of ldos attacks. _IEEE Trans. Dependable Secur. Comput._ 20, 273–287.


https://doi.org/10.1109/TDSC.2021.3131531 (2023). Article  Google Scholar  * Muhammad, A., Hidayatul, K., Wibawati & Lee, M. H. Support vector data description with kernel density


estimation (svdd-kde) control chart for network intrusion monitoring. Sci. Rep. 13, 1–12, https://doi.org/10.1038/s41598-023-46719-3 (2023). * Zhou, X., Liang, W., Shimizu, S., Ma, J. &


Jin, Q. Siamese neural network based few-shot learning for anomaly detection in industrial cyber-physical systems. _IEEE Trans. Industr. Inf._ 17, 5790–5798.


https://doi.org/10.1109/TII.2020.3047675 (2021). Article  Google Scholar  * Asgharzadeh, H., Ghaffari, A., Masdari, M. & Gharehchopogh, F. S. Anomaly-based intrusion detection system in


the internet of things using a convolutional neural network and multi-objective enhanced capuchin search algorithm. _J. Parallel Distrib. Comput._ 175, 1–21.


https://doi.org/10.1016/j.jpdc.2022.12.009 (2023). Article  Google Scholar  * Ren, K., Yuan, S., Zhang, C., Shi, Y. & Huang, Z. CANET: A hierarchical cnn-attention model for network


intrusion detection. _Comput. Commun._ 205, 170–181. https://doi.org/10.1016/j.comcom.2023.04.018 (2023). Article  Google Scholar  * Venkateshwarlu, V., Ranjith, D. & Raju, A. Lrdadf: An


ai enabled framework for detecting low-rate ddos attacks in cloud computing environments. In 2023 Fifth International Conference on Electrical, Computer and Communication Technologies


(ICECCT), 1–8, https://doi.org/10.1109/ICECCT56650.2023.10179834 (2023). * Salahuddin, M. A., Pourahmadi, V., Alameddine, H. A., Bari, M. F. & Boutaba, R. Chronos: Ddos attack detection


using time-based autoencoder. IEEE Transactions on Network and Service Management 1–1, https://doi.org/10.1109/TNSM.2021.3088326 (2021). * Mansour, R. F. Artificial intelligence based


optimization with deep learning model for blockchain enabled intrusion detection in cps environment. _Sci. Rep._ 12, 1–14. https://doi.org/10.1038/s41598-022-17043-z (2022). Article  CAS 


Google Scholar  * Zhou, X., Hu, Y., Liang, W., Ma, J. & Jin, Q. Variational LSTM enhanced anomaly detection for industrial big data. _IEEE Trans. Ind. Inform._ 17, 3469–3477.


https://doi.org/10.1109/TII.2020.3022432 (2021). Article  Google Scholar  * Mushtaq, E., Zameer, A., Umer, M. & Abbasi, A. A. A two-stage intrusion detection system with auto-encoder and


lstms. _Appl. Soft Comput._ 121, 108768. https://doi.org/10.1016/j.asoc.2022.108768 (2022). Article  Google Scholar  * Liu, Z., Yu, J., Yan, B. & Wang, G. A deep 1-d CNN and


bidirectional LSTM ensemble model with arbitration mechanism for lddos attack detection. _IEEE Transact. Emerg. Top. Comput. Intell._ 6, 1396–1410. https://doi.org/10.1109/TETCI.2022.3170515


(2022). Article  Google Scholar  * Du, J., Yang, K., Hu, Y. & Jiang, L. NIDS-CNNLSTM: Network intrusion detection classification model based on deep learning. _IEEE Access_ 11,


24808–24821. https://doi.org/10.1109/ACCESS.2023.3254915 (2023). Article  Google Scholar  * Jazi, H. H., Gonzalez, H., Stakhanova, N. & Ghorbani, A. A. Detecting http-based application


layer dos attacks on web servers in the presence of sampling. _Comput. Netw._ 121, 25–36. https://doi.org/10.1016/j.comnet.2017.03.018 (2017). Article  Google Scholar  * MIT. Darpa intrusion


detection evaluation dataset. MIT (1999). ACKNOWLEDGEMENTS This work was supported in part by the Key Technologies R&D Program of Weifang under Grant 2023GX063 and


2021GX056, in part by the Foundation for the Talents by the Shandong Vocational College of Science and Technology, in part by the Foundation for the Talents by the Weifang University of


Science and Technology under Grant KJRC2021002, in part by the Natural Science Foundation of Shandong Province under Grant ZR2021MF086, and in part by the Key R&D Program of Shandong


Province under Grant 2019GNC106034. AUTHOR INFORMATION AUTHORS AND AFFILIATIONS * Shandong Provincial University Laboratory for Protected Horticulture, Weifang Key Laboratory of Blockchain


on Agricultural Vegetables, Weifang University of Science and Technology, Weifang, 262700, China Xiaochun Yin * Computer Science and Engineering College, Weifang University of Science and


Technology, Weifang, 262700, China Wei Fang & Deyong Liu * School of Information Engineering, Shandong Vocational College of Science and Technology, Weifang, 261053, China Zengguang Liu


CONTRIBUTIONS X.Y. conceived the methodology and wrote the manuscript, Z.L. conceived the experiments. W.F., Z.L. and D.L. analyzed the results. All authors reviewed


the manuscript. CORRESPONDING AUTHOR Correspondence to Zengguang Liu. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no competing interests. ADDITIONAL INFORMATION


PUBLISHER'S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. RIGHTS AND PERMISSIONS OPEN ACCESS This article


is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give


appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in


this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative


Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a


copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. ABOUT THIS ARTICLE CITE THIS ARTICLE Yin, X., Fang, W., Liu, Z. _et al._ A novel multi-scale CNN and Bi-LSTM arbitration dense network model for low-rate DDoS attack detection. _Sci Rep_ 14, 5111 (2024). https://doi.org/10.1038/s41598-024-55814-y * Received: 05 October 2023 * Accepted: 28 February 2024 * Published: 01 March 2024 * DOI: https://doi.org/10.1038/s41598-024-55814-y KEYWORDS * Arbitration mechanism * Dense connection * Embedding-based bi-LSTM * LDDoS attacks * Multi-scale CNN * Network security