Machine Learning Based Device Type Classification for IoT device Continuous and Re-Authentication

College of Engineering
Kaustubh Gupta
Master's Thesis Defense
Academic Advisor – Dr. Nirnimesh Ghose
University of Nebraska – Lincoln
Committee Members – Dr. Byrav Ramamurthy and Dr. Lisong Xu
4/26/22
Introduction
Explosion in number of IoT Devices [1]
 
Security vulnerabilities of IoT devices [2]
[1] https://iot-analytics.com/state-of-the-iot-2020-12-billion-iot-connections-surpassing-non-iot-for-the-first-time/
[2] https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=IoT
Challenges
Many IoT devices today are manufactured by traditional home appliance manufacturers [1]
Using the vulnerabilities in IoT devices, an adversary can launch multiple attacks on other devices in the network [1]
Previous work does not provide a strict protocol for IoT authentication that is independent of vulnerable characteristics of the network, or a protocol that does not rely significantly on user input [1, 2]
Existing machine learning solutions do not classify the type of IoT device being authenticated, leaving the network vulnerable to actuating attacks carried out using vulnerable IoT devices [3]
There needs to be an efficient protocol that uses the least amount of time and memory for each authentication instance
[1] Miettinen, Markus, et al. "Iot sentinel: Automated device-type identification for security enforcement in iot." 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2017.
[2] Zhang, Jiansong, et al. "Proximity based IoT device authentication." IEEE INFOCOM 2017-IEEE Conference on Computer Communications. IEEE, 2017.
[3] Al-Garadi, Mohammed Ali, et al. "A survey of machine and deep learning methods for internet of things (IoT) security." IEEE Communications Surveys & Tutorials 22.3 (2020): 1646-1685.
Contribution
We present the RADTEC (Re- and continuous Authentication based on Device TypE Classification) protocol
The protocol leverages cross-layer data (from the data link, network, transport, and application layers) and Machine Learning (ML) techniques to classify the type of device with an accuracy over 95%
Different types of machine learning classifiers were compared to best estimate the type of IoT device and to develop a stricter and more efficient method for authentication
Contribution
We present a device type-based re- and continuous authentication protocol
We address the case where vulnerable IoT devices are compromised by an adversary and manipulated to:
1. Capture network traffic
2. Poison the network by uploading malicious packets
3. Actuate a device to perform activities of some other type of device
We perform extensive experimentation with the available data for our ML-based device type classification technique to show the performance of classifiers based on the following algorithms:
1. Random Forest
2. K-Nearest Neighbors
3. Support Vector Machine
4. Gradient Boosting
5. Gaussian Naive Bayes
We also show that the Random Forest Classifier is overall the most accurate and efficient at performing classifications
Related Work
Existing work includes machine learning-based techniques to differentiate several types of IoT devices from non-IoT devices or to detect attacks, while other work includes proximity-based solutions
Miettinen et al. [1] rely on MAC addresses, which can be spoofed, to identify a new device trying to authenticate with the network
Bremler-Barr et al. [2] focus on distinguishing between IoT and non-IoT (NoT) devices using ML; however, their classification model is only capable of classifying a device as IoT or non-IoT
Meidan et al. [3] whitelist devices that are classified as trustworthy by using machine learning; however, this could give a compromised whitelisted device unrestricted access to the network
Al-Garadi et al. [4] present work that focuses mainly on detecting attacks by monitoring the behavior of the IoT device but does not extend machine learning classification techniques to differentiate between several types of IoT devices
[1] Miettinen, Markus, et al. "Iot sentinel: Automated device-type identification for security enforcement in iot." 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2017.
[2] Bremler-Barr, Anat, Haim Levy, and Zohar Yakhini. "Iot or not: Identifying iot devices in a short time scale." NOMS 2020-2020 IEEE/IFIP Network Operations and Management Symposium. IEEE, 2020.
[3] Meidan, Yair, et al. "Detection of unauthorized IoT devices using machine learning techniques." arXiv preprint arXiv:1709.04647 (2017).
[4] Al-Garadi, Mohammed Ali, et al. "A survey of machine and deep learning methods for internet of things (IoT) security." IEEE Communications Surveys & Tutorials 22.3 (2020): 1646-1685.
Related Work
Xiao et al. [5] present machine learning techniques to improve the spoofing resistance and detection of IoT systems, and to authenticate a device to protect data privacy. However, identifying different attacks can end up utilizing many resources, and an attack in real life might differ from what the models are trained to identify
Liu et al. [6] propose ML-based classification to detect and identify legitimate and rogue IoT devices. However, the work does not present a complete and formal authentication protocol that can incorporate the ML techniques presented in order to protect the network from certain attacks in real life
Finally, Zhang et al. [7] present a proximity-based solution that requires a significant amount of work to be performed by the user. Furthermore, the authors only address initial authentication
[5] Xiao, Liang, et al. "IoT security techniques based on machine learning: How do IoT devices use AI to enhance security?." IEEE Signal Processing Magazine 35.5 (2018): 41-49.
[6] Liu, Yongxin, et al. "Machine Learning for the Detection and Identification of Internet of Things Devices: A Survey." IEEE Internet of Things Journal 9.1 (2021): 298-320.
[7] Zhang, Jiansong, et al. "Proximity based IoT device authentication." IEEE INFOCOM 2017-IEEE Conference on Computer Communications. IEEE, 2017.
System Model
The main components of the system are:
Legitimate devices (D)
Hub (A)
Verification Server (V)
Adversary Model
The adversary (M) can utilize compromised knowledge to hijack a vulnerable device in the network in an attempt to:
1. Poison the network by injecting malicious packets
2. Capture network traffic to extract sensitive data
3. Actuate a device to perform activities of some other type of device
We assume that the adversary has no prior knowledge of the traffic pattern of any compromised device
Security Requirements
The security requirement of RADTEC is to authenticate devices based on the classification of the device type
Identifying the IoT device type helps prevent device actuation attacks; for example, the protocol can identify an IoT camera as compromised if it is performing tasks that a smart lock would
It also helps identify attacks performed by an adversary's IoT device present in the network to either capture traffic or inject malicious traffic into the network
The hub is responsible for verifying the credentials input by the user and for comparing the claimed and observed device types based on the traffic pattern
The hub and the verification server can be assumed to be a single entity, a secured gateway. The secured gateway performs:
1. Initial trust establishment
2. Policy-based network access
3. Continuous and re-authentication of devices
Machine Learning Models
The supervised learning algorithms implemented in this thesis are the Random Forest Classifier, K-Nearest Neighbors, Support Vector Machine Classifier, Gradient Boost Classifier, and Naive Bayes Classifier
Supervised learning is a sub-category of machine learning in which the algorithm is taught by using labelled examples
Machine Learning Models
Random Forest Classifier – It is built on decision trees at its core and can be categorized as an ensemble-based learning method used for making classifications
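The ensemble idea can be illustrated with a minimal sketch of the voting step only. The three hand-written "trees", the feature names, and the thresholds below are hypothetical and are not taken from the thesis; a real random forest learns its trees from bootstrap samples of the training data.

```python
from collections import Counter

# Hypothetical decision trees: each one maps a feature dict to a device type.
def tree_a(features):
    return "camera" if features["mean_packet_len"] > 500 else "smart_lock"

def tree_b(features):
    return "camera" if features["packets_per_sec"] > 10 else "smart_lock"

def tree_c(features):
    is_camera = features["mean_packet_len"] > 900 and features["packets_per_sec"] > 20
    return "camera" if is_camera else "smart_lock"

def forest_predict(trees, features):
    """Return the class that receives the most votes across all trees."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]

sample = {"mean_packet_len": 700, "packets_per_sec": 15}
forest_label = forest_predict([tree_a, tree_b, tree_c], sample)
```

Here two of the three trees vote for the same class, so the ensemble follows the majority even though one tree disagrees.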
Machine Learning Models
K-Nearest Neighbor – It is a supervised machine learning algorithm, mostly used for solving classification problems. The algorithm assigns a test data point to a group based on its “K” nearest neighbors
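A minimal sketch of this neighbor vote, assuming Euclidean distance and a simple majority among the K closest training points. The two-feature fingerprints and device labels are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical fingerprints: (mean packet length, packets per second).
train_points = [(60.0, 2.0), (64.0, 2.5), (1200.0, 30.0), (1100.0, 28.0)]
train_labels = ["smart_lock", "smart_lock", "camera", "camera"]

knn_label = knn_predict(train_points, train_labels, (1150.0, 29.0), k=3)
```

In practice the features would be standardized first, since an unscaled feature with a large range dominates the distance.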
Machine Learning Models
Gradient Boosting Classifier – It is a supervised machine learning algorithm based on the ensemble technique, which means that it combines the predictions made by several weak decision trees to give a strong final prediction
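The following is a simplified sketch of boosting, not the classifier used in the thesis. It assumes a single feature, 0/1 labels, squared-error loss (so each weak learner is fitted to the current residuals), and one-split decision stumps as the weak trees. The data and the `rounds` and `learning_rate` values are illustrative.

```python
def fit_stump(xs, residuals):
    """Fit a one-split tree: choose the threshold that best predicts the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def fit_boosted(xs, ys, rounds=20, learning_rate=0.5):
    """Add stumps one at a time, each correcting the errors left by the previous ones."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(stump(x) for stump in stumps)

# Hypothetical data: mean packet length, with label 1 = camera and 0 = smart lock.
xs = [60.0, 64.0, 58.0, 1200.0, 1100.0, 1150.0]
ys = [0, 0, 0, 1, 1, 1]
boosted_score = fit_boosted(xs, ys)
```

A score above 0.5 is read as class 1. Library implementations typically use a classification loss and deeper trees, but the additive structure is the same.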
Machine Learning Models
Support Vector Machine – It performs classification by finding hyperplanes that separate the classes; a test data point is assigned the class of the region, bounded by those hyperplanes, in which it falls
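For the two-class linear case, the decision reduces to checking which side of a hyperplane a point lies on. The sketch below assumes the weights and bias have already been learned; the values shown are hypothetical. Multi-class problems are usually handled by combining several such binary decisions.

```python
def decision_score(weights, bias, point):
    """Signed score w.x + b: its sign tells which side of the hyperplane the point is on."""
    return sum(w * x for w, x in zip(weights, point)) + bias

def classify(weights, bias, point, positive="camera", negative="smart_lock"):
    """Assign the class of the half-space in which the point falls."""
    return positive if decision_score(weights, bias, point) >= 0 else negative

# Hypothetical trained parameters for features (mean packet length, packets per second).
weights = (1.0, 0.0)
bias = -600.0

svm_label = classify(weights, bias, (1150.0, 29.0))
```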
Machine Learning Models
Gaussian Naive Bayes – It is often used for classification tasks where the values of all features are continuous and follow a Gaussian distribution
The algorithm is called naive because it assumes that, within a class, each feature is completely independent of every other feature
It is based on Bayes' theorem, which gives the probability of hypothesis A given that the data B has already been observed:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
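A minimal sketch of how a Gaussian Naive Bayes classifier applies this rule: it estimates a prior and a per-feature mean and variance for each class, then picks the class with the largest posterior. The denominator is the same for every class, so it is dropped. The training rows and labels are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate the prior and the per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        columns = list(zip(*members))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (len(members) / len(rows), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the largest log posterior (up to a shared constant)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for value, mean, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Hypothetical fingerprints: (mean packet length, packets per second).
rows = [(60.0, 2.0), (64.0, 2.5), (58.0, 1.8), (1200.0, 30.0), (1100.0, 28.0), (1150.0, 31.0)]
labels = ["smart_lock"] * 3 + ["camera"] * 3
nb_model = fit_gaussian_nb(rows, labels)
nb_label = predict_gaussian_nb(nb_model, (1120.0, 29.0))
```

Summing log-likelihoods per feature is exactly where the independence assumption enters.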
RADTEC
RADTEC (Re- and continuous Authentication based on Device TypE Classification) protocol:
Performs device type level classification based on the traffic pattern (device fingerprint)
Utilizes the device type to perform additional verification during the authentication process
Device Fingerprint Generation – The traffic from the legitimate device (D) is collected by the hub (A)
RADTEC - The Protocol
[Figure: protocol message diagram between IoT Device (D), Hub (A), and Verifier (V). Labels: Initialization, Limited Access, Device Traffic, Capture Fingerprint, Fingerprint, Device Type Classification, Device Type, Device Type Verification, Full Access / Access Denied, Continuous & Re-Authentication]
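The diagram labels suggest a flow in which a device starts with limited access, its traffic is fingerprinted and classified, and access is then upgraded or denied. The sketch below is one reading of that flow under stated assumptions: the order of the steps, the `Access` names, and the `classify` callback are all hypothetical and are not the thesis implementation.

```python
from enum import Enum

class Access(Enum):
    LIMITED = "limited"
    FULL = "full"
    DENIED = "denied"

def run_initial_authentication(claimed_type, fingerprint, classify):
    """Return the sequence of access levels a device passes through.

    classify stands in for the verifier: it maps a traffic fingerprint
    to an observed device type.
    """
    states = [Access.LIMITED]  # assumed: limited access while traffic is captured
    observed_type = classify(fingerprint)
    states.append(Access.FULL if observed_type == claimed_type else Access.DENIED)
    return states

# Hypothetical verifier: large mean packet length means camera.
def toy_classifier(fingerprint):
    return "camera" if fingerprint["mean_packet_len"] > 500 else "smart_lock"

honest = run_initial_authentication("camera", {"mean_packet_len": 1100}, toy_classifier)
mismatch = run_initial_authentication("smart_lock", {"mean_packet_len": 1100}, toy_classifier)
```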
RADTEC - Verifier
Security Analysis - RADTEC
Using classification techniques, RADTEC can identify whether a device is known or unknown, and whether the device has been compromised
The protocol addresses scenarios where an adversary can:
1. Exploit a vulnerable device to inject malicious packets and thereby poison the network
2. Use a vulnerable IoT device to extract sensitive data even if the network packets are encrypted
3. Compromise a vulnerable IoT device and actuate the activities that would typically be performed by a different type of device
RADTEC does so by constantly tracking changes in a device fingerprint and following the access policies defined within the hub
Security Analysis - Classification Technique
An adversary can easily spoof the MAC address of an already authenticated device and authenticate itself with the hub [1]
The only time we use the MAC address is to check whether the device already exists in the database
If the verifier confirms to the hub that the MAC addresses match but the fingerprint does not, the device is treated as compromised
This is based on the idea that a machine learning model trained and tested on the same, or at least a similar, dataset should provide predictions with the highest accuracy
A device is reclassified every time it needs re-authentication, irrespective of its previous authentication with the server
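The decision described above can be sketched as a lookup followed by a fresh classification. This is an illustration under assumptions: the database is modelled as a plain dict from MAC address to stored device type, and `classify` is a hypothetical stand-in for the trained model.

```python
def reauthenticate(mac, fingerprint, known_devices, classify):
    """Return 'unknown', 'authenticated', or 'compromised' for a device.

    The MAC address is used only to find the stored device type. The device
    is reclassified from its current traffic on every call.
    """
    stored_type = known_devices.get(mac)
    if stored_type is None:
        return "unknown"
    observed_type = classify(fingerprint)
    return "authenticated" if observed_type == stored_type else "compromised"

# Hypothetical database and classifier.
known_devices = {"aa:bb:cc:dd:ee:01": "camera"}

def toy_classifier(fingerprint):
    return "camera" if fingerprint["mean_packet_len"] > 500 else "smart_lock"

genuine = reauthenticate("aa:bb:cc:dd:ee:01", {"mean_packet_len": 1100}, known_devices, toy_classifier)
spoofed = reauthenticate("aa:bb:cc:dd:ee:01", {"mean_packet_len": 60}, known_devices, toy_classifier)
```

A spoofed MAC address alone passes the lookup but fails the fingerprint comparison, which is the case the slide describes.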
[1] Sheng, Yong, et al. "Detecting 802.11 MAC layer spoofing using received signal strength." IEEE INFOCOM 2008-The 27th Conference on Computer Communications. IEEE, 2008.
Discussion
We assume three scenarios in which a device needs to be authenticated:
1. A new device is first introduced in the network
2. A previously authenticated device moves out of and back into the network and requires re-authentication
3. A device that constantly remains inside the network requires continuous authentication
The protocol addresses all three scenarios by correlating a device MAC address with the ones previously stored in the database, and then providing authentication by using the classification techniques applied in this work
If a vulnerable device is compromised by an adversary and the authentication credentials are stolen, it will generally be the case that the adversary does not know the type of the device
However, even if the adversary can identify the type of device, the adversary will not be able to perform any actions using the vulnerable device, since the classification model will notice the change in device fingerprint and the access given to the device will be blocked
Implementation - Dataset
The implementation techniques used to classify IoT devices include data preprocessing, which involves data cleaning and splitting, standardizing features, numerical imputation, and feature engineering
We chose 15 devices, as shown in Table 6.1, and obtained relevant information from packet capture files by extracting important features using tshark into a .csv file [1]
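As a rough illustration of such an extraction, the sketch below only builds a tshark command line that prints selected fields as comma-separated values. The capture file name and the subset of fields are assumptions, not the thesis's exact invocation. The `-r`, `-T fields`, `-e`, and `-E` options are standard tshark options.

```python
import shlex

# Assumed subset of fields; the thesis extracts 19 features in total.
FIELDS = ["ip.proto", "ip.flags.df", "ip.flags.mf", "tcp.len",
          "tcp.seq", "tcp.ack", "tcp.flags", "udp.port"]

def build_tshark_command(pcap_path, fields):
    """Build a tshark invocation that prints one CSV row per packet."""
    command = ["tshark", "-r", pcap_path, "-T", "fields"]
    for field in fields:
        command += ["-e", field]
    # Header row, comma separator, double-quoted values, first occurrence only.
    command += ["-E", "header=y", "-E", "separator=,",
                "-E", "quote=d", "-E", "occurrence=f"]
    return command

tshark_command = build_tshark_command("capture.pcap", FIELDS)  # hypothetical file name
print(shlex.join(tshark_command))
```

Redirecting the printed command's standard output to a file yields the .csv used for training.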
[1] Sivanathan, Arunan, et al. "Classifying IoT devices in smart environments using network traffic characteristics." IEEE Transactions on Mobile Computing 18.8 (2018): 1745-1759.
Implementation - Device Identification
The device identification process utilizes the device fingerprint collected by the hub
The process for developing the ML model is scalable
The combined classifier and targeted classifiers can simply be trained on new categories of IoT devices by using device packet captures
Algorithm selection - Based on previous work and approaches, we use the following five classifiers:
1. Random Forest
2. K-Nearest Neighbors
3. Support Vector Machine
4. Gradient Boosting
5. Gaussian Naive Bayes
Implementation - Data Pre-Processing
Data Pre-processing
1. Data Cleaning and Splitting
Irrelevant data points were removed
Data was labelled and split into train and test data, 80% and 20% respectively
2. Standardizing Features
Features were standardized by subtracting the mean from the values and then scaling them to unit variance
3. Numerical Imputation
Missing values were imputed with the median value of each individual feature
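The three steps above can be sketched with the standard library alone. This is an illustration, not the thesis code: the rows are hypothetical, labels are omitted for brevity, and the sketch imputes before standardizing and takes the medians, means, and standard deviations from the training split only, which is an assumption about ordering that the slides do not state.

```python
import random
import statistics

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them 80/20 by default."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def column_medians(rows):
    """Median of each feature, ignoring missing values (None)."""
    return [statistics.median(v for v in col if v is not None) for col in zip(*rows)]

def impute(rows, medians):
    """Replace each missing value with the median of its feature."""
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]

def standardize(rows, means, stdevs):
    """Subtract the mean and scale to unit variance, feature by feature."""
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, stdevs)] for row in rows]

# Hypothetical rows: (mean packet length, packets per second), with two gaps.
rows = [[60.0, 2.0], [64.0, None], [58.0, 1.8], [1200.0, 30.0], [1100.0, 28.0],
        [None, 31.0], [70.0, 2.2], [1150.0, 29.0], [62.0, 2.1], [1180.0, 30.5]]

train, test = train_test_split(rows)
medians = column_medians(train)
train, test = impute(train, medians), impute(test, medians)
means = [statistics.fmean(col) for col in zip(*train)]
stdevs = [statistics.pstdev(col) or 1.0 for col in zip(*train)]
train_std = standardize(train, means, stdevs)
test_std = standardize(test, means, stdevs)
```

Reusing the training statistics on the test split keeps information from the test data out of the model.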
Implementation - Feature Engineering
4. Feature Engineering
We use a random forest search classifier to extract feature importance scores from the 19 features
A threshold of 0.05 was set for the importance score of features
All features with importance scores below the threshold were eliminated
Such features included tcp.urgent_pointer, ip.flags.mf, tcp.analysis.ack_rtt, ip.proto, tcp.time_delta, ip.flags.df, tcp.flags, tcp.len, tcp.ack, udp.port, tcp.time_relative, tcp.seq
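The thresholding step can be sketched as a simple filter. The importance scores below are invented for illustration, and `feature_a` and `feature_b` are placeholder names for retained features, which the slides do not list.

```python
def select_features(importances, threshold=0.05):
    """Keep features whose importance score is at or above the threshold."""
    return [name for name, score in importances.items() if score >= threshold]

# Hypothetical importance scores (not the values measured in the thesis).
importances = {
    "feature_a": 0.41,   # placeholder name for a retained feature
    "feature_b": 0.22,   # placeholder name for a retained feature
    "tcp.len": 0.03,
    "ip.proto": 0.01,
    "tcp.flags": 0.02,
}

kept_features = select_features(importances)
dropped_features = [name for name in importances if name not in kept_features]
```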
Implementation - Training & Testing
5. Training and Testing
Training was performed by fitting each model to the training data
During testing, labels for the test data were predicted by the trained models
Accuracy scores obtained during testing were recorded
The time taken by the training process and by the classification of a single device was recorded for each classifier (Table 6.2)
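One way such timings could be collected is with a wall-clock timer around the training call and around a single prediction. The sketch below assumes this approach; `MajorityClassifier` is a hypothetical stand-in for any of the five models, and the thesis does not say how its timings were measured.

```python
import time
from collections import Counter

class MajorityClassifier:
    """Stand-in model (hypothetical): always predicts the most common training label."""

    def fit(self, rows, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict_one(self, row):
        return self.label_

def timed(fn, *args):
    """Run fn(*args) and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

rows = [(60.0, 2.0), (1200.0, 30.0), (1100.0, 28.0)]
labels = ["smart_lock", "camera", "camera"]

model, train_seconds = timed(MajorityClassifier().fit, rows, labels)
prediction, classify_seconds = timed(model.predict_one, (1150.0, 29.0))
```

Averaging the single-device time over many test points gives a steadier figure than one measurement.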
Results - F1 Score
The highest average F1 score across all classes was provided by the Random Forest Classifier (RFC)
RFC is near perfect at differentiating between an IoT and a non-IoT device
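For reference, an average F1 score over classes can be computed as sketched below. The sketch assumes an unweighted (macro) average, which the slides do not specify, and uses made-up labels rather than the thesis results.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)

# Hypothetical labels for three device types.
y_true = ["camera", "camera", "lock", "lock", "plug", "plug"]
y_pred = ["camera", "camera", "lock", "plug", "plug", "plug"]
average_f1 = macro_f1(y_true, y_pred)
```

In this example the per-class scores are 1.0, 2/3, and 0.8, so the single misclassified lock lowers two classes at once.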
Results - Accuracy Score
After making the predictions, we established that the Random Forest Classifier (RFC) was the most accurate model, with an accuracy of 95.2%
The second most accurate classifier was the Gradient Boost Classifier (GBC), with an accuracy of 94.8%
K-Nearest Neighbors Classifier (KNN) accuracy – 93.3%
Support Vector Machine Classifier (SVM) accuracy – 88.3%
Gaussian Naïve Bayes Classifier (NAB) accuracy – 76.8%
Conclusion
We proposed the RADTEC protocol as a solution to improve the security of IoT devices
Through machine learning and cryptographic techniques, RADTEC addresses the events in which an adversary can exploit a vulnerable device to:
1. Capture network traffic
2. Poison the network by injecting malicious packets
3. Actuate a device to perform activities of some other type of device
We used the Random Forest Classifier, Gradient Boost Classifier, K-Nearest Neighbors Classifier, Support Vector Machine Classifier, and the Naive Bayes Classifier to identify the type of IoT device using cross-layer data
The Random Forest Classifier made the most accurate predictions, with an accuracy score of 95.2% and the highest average F1 score
The time required to train the model and the average time taken to classify a single data point were recorded for all classifiers. The Gaussian Naive Bayes Classifier was found to be the fastest classifier for training and testing
However, we would prefer to use the Random Forest Classifier, as it still required less time than most other models for training and testing and provided the most accurate results
Future Work
In our future work, we plan to train classification models for several more types of IoT devices
We would also collect more data for each type of device to eliminate bias in our model, if any, and to provide more accurate classifications for all the devices considered
In addition to the machine learning classifiers used in this thesis, we would perform device type level classification using several other types of classifiers
Finally, we would apply model fine-tuning techniques to improve the performance of our classifiers
Thank you!
  1. 4/26/22 Machine Learning Based Device Type Classification for IoT device Continuous and Re-Authentication Kaustubh Gupta Masters Thesis Defense Academic Advisor Dr. Nirnimesh Ghose University of Nebraska Lincoln Committee Members Dr. Byrav Ramamurthy and Dr. Lisong Xu COLLEGE OF ENGINEERING

  2. 4/26/22 Introduction Introduction Explosion in number of IoT Devices [1] Security vulnerabilities of IoT devices [2] [1] https://iot-analytics.com/state-of-the-iot-2020-12-billion-iot-connections-surpassing-non-iot-for-the-first-time/ [2] https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=IoT

  3. 4/26/22 Challenges Challenges Many IoT devices today are manufactured by traditional home appliance manufacturers [1] Using the vulnerabilities in IoT devices, an adversary can launch multiple attacks on other devices in the network [1] Previous work does not provide a strict protocol for IoT authentication that is independent of vulnerable characteristics of the network or a protocol that does not rely significantly on the user input [1, 2] Existing machine learning solutions do not classify the type of IoT device being authenticated, leaving the network vulnerable to actuating attacks carried out using vulnerable IoT devices [3] There needs to be an efficient protocol that uses the least amount of time and memory for each authentication instance [1] Miettinen, Markus, et al. "Iot sentinel: Automated device-type identification for security enforcement in iot." 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2017. [2] Zhang, Jiansong, et al. "Proximity based IoT device authentication." IEEE INFOCOM 2017-IEEE Conference on Computer Communications. IEEE, 2017. [3] Al-Garadi, Mohammed Ali, et al. "A survey of machine and deep learning methods for internet of things (IoT) security." IEEE Communications Surveys & Tutorials 22.3 (2020): 1646-1685. 3

  4. 4/26/22 Contribution Contribution We present the RADTEC (Re and continuous Authentication based on Device TypE Classification) protocol The protocol leverages cross layer data (including network, data link, transport, and application,) and Machine Learning (ML) techniques to classify the type of device with an accuracy over 95% Different types ofmachine learning classifiers were compared to best estimate the types of IoT device and todevelop a stricter and more efficient method for authentication 4

  5. 4/26/22 Contribution Contribution We present a device type-based re and continuous authentication protocol We address the case where vulnerable IoT devices that can be compromised by an adversary and manipulated to: 1. 2. 3. Capture network traffic Poison the network by uploading malicious packets and Actuate a device to perform activities of some other type of device Perform extensive experimentation by utilizing the available data for our ML-based device type classification technique to show the performance of various classifiers based on algorithms including: 1. 2. 3. 4. 5. Random Forest K-Nearest Neighbors Support Vector Machine Gradient Boosting Gaussian Naive Bayes We also show that Random Forest Classifier is overall most accurate and efficient in performing accurate classifications 5

  6. 4/26/22 Related Work Related Work Existing work includes machine learning-based techniques to differentiate several types of IoT devices from non-IoT devices or detect attacks while others include proximity-based solutions Miettinen et al. [1] relies on MAC addresses to identify a new device trying to authenticate with the network, which can be spoofed Bremler et al. [2] focus on distinguishing between IoT and Non-IoT (NoT) devices using ML, however, their classification model is only capable of classifying a device as IoT or not-IoT Meidan et al. [3] whitelist devices that are classified as trustworthy by using machine learning, however, this could give a compromised whitelisted device unrestricted access to the network. Al-Garadi et al. [4] present work that focuses mainly on detecting attacks by monitoring the behavior of the IoT device but does not extend machine learning classification techniques to differentiate between several types of IoT devices. [1] Miettinen, Markus, et al. "Iot sentinel: Automated device-type identification for security enforcement in iot." 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2017. [2] Bremler-Barr, Anat, Haim Levy, and Zohar Yakhini. "Iot or not: Identifying iot devices in a short time scale." NOMS 2020-2020 IEEE/IFIP Network Operations and Management Symposium. IEEE, 2020. [3] Meidan, Yair, et al. "Detection of unauthorized IoT devices using machine learning techniques." arXiv preprint arXiv:1709.04647 (2017). [4] Al-Garadi, Mohammed Ali, et al. "A survey of machine and deep learning methods for internet of things (IoT) security." IEEE Communications Surveys & Tutorials 22.3 (2020): 1646-1685. 6

  7. 4/26/22 Related Work Related Work Xiao et al. [5] present machine learning techniques to improve IoT systems spoofing resistance and detection, and to authenticate a device to protect data privacy Identifying different attacks can end up utilizing many resources The attack in real-life might differ from what the models are trained to identify Liu et al. [6] propose ML-based classification to detect and identify legitimate and rogue IoT devices It does not present a complete and formal authentication protocol that can incorporate the ML techniques presented in order to protect the network from certain attacks in real life Finally, Zhang et al. [7] is a proximity-based solution requires a significant amount of work to be performed by the user Furthermore, the authors only address initial authentication 7 [5] Xiao, Liang, et al. "IoT security techniques based on machine learning: How do IoT devices use AI to enhance security?." IEEE Signal Processing Magazine 35.5 (2018): 41-49. [6] Liu, Yongxin, et al. "Machine Learning for the Detection and Identification of Internet of Things Devices: A Survey." IEEE Internet of Things Journal 9.1 (2021): 298-320. [7] Zhang, Jiansong, et al. "Proximity based IoT device authentication." IEEE INFOCOM 2017-IEEE Conference on Computer Communications. IEEE, 2017.

  8. 4/26/22 System Model System Model The main components of the system are: Legitimate devices (D) Hub (A) Verification Server (V ) 8

  9. 4/26/22 Adversary Model Adversary Model The adversary (M) can utilize compromised knowledge to hijack a vulnerable device in the network as an attempt to: Poison the network by injecting malicious packets Capture network traffic to extract sensitive data Actuate a device to perform activities of some other type of device We assume that the adversary has no prior knowledge of the traffic pattern of any compromised device 9

  10. 4/26/22 Security Requirements Security Requirements The security requirement of RADTEC is to authenticate devices based on the classification of the device type Identifying IoT device type helps prevent device actuation attacks, for example the protocol can identify an IoT Camera as compromised if it is performing tasks that a Smart Lock would Help identify attacks performed by an adversary s IoT device present in the network to either capture traffic or inject malicious traffic into the network The hub is responsible for the verification of the credentials input by the user and for the comparison of claimed and observed device types based on the traffic pattern The hub and the verification server can be assumed to be a single entity as a secured gateway. The secured gateway performs: 1. 2. 3. Initial trust establishment Policy-based network access Continuous and re-authentication of devices 1 0

  11. 4/26/22 Machine Learning Models Machine Learning Models Supervised learning algorithms implemented in this thesis include the Random Forest Classifier, K-Nearest Neighbors, Support Vector Machine Classifier, Gradient Boost Classifier, and Naive Bayes Classifier Supervised Learning is a sub-category of machine learning where the learning of an algorithm is supervised which essentially means that the algorithm is taught by using examples 11

  12. 4/26/22 Machine Learning Models Machine Learning Models Random Forest Classifier It is based on a decision tree like structure at its core, and it can be categorized as an ensemble-based learning method used for making classifications 12

  13. 4/26/22 Machine Learning Models Machine Learning Models K-Nearest Neighbor It is a supervised machine learning algorithm, mostly used for solving classification problems. The algorithm works by estimating the test data point in a group, based on its nearest K number of neighbors 13

  14. 4/26/22 Machine Learning Models Machine Learning Models Gradient Boosting Classifier It is a supervised machine learning algorithm based on the ensemble technique, which means that it utilizes the predictions made by several different weak decision trees to give a strong final prediction 14

  15. 4/26/22 Machine Learning Models Machine Learning Models Support Vector Machine It performs classification by finding hyperplanes that would differentiate the multiple classes, and if a test data point can be placed within a certain hyperplane, it will share the same class with the data points in its neighborhood 15

  16. 4/26/22 Machine Learning Models Machine Learning Models Gaussian Naive Bayes It is often used for classification jobs where the values of all features are continuous and distributed in a Gaussian distribution The algorithm is called naive because it implies that the presence of any feature is completely independent of the existence of any other feature It is based on the Bayes theorem which helps define the probability of the occurrence of hypothesis A after the data B, is already given by: 16

  17. 4/26/22 RADTEC RADTEC RADTEC - Re and continuous Authentication based on Device TypE Classification protocol, Performs device type level classification based on traffic pattern/device fingerprint Utilizes the device type to perform additional verification during the authentication process Device Fingerprint Generation The traffic from the legitimate device (D) is collected by the hub (A) 17

  18. 4/26/22 RADTEC RADTEC The Protocol The Protocol Verifier (V) Hub (A) IoT Device (D) Capture Fingerprint Device Type Classification Device Type Verification 18

  19. 4/26/22 RADTEC RADTEC Verifier Verifier 19

  20. 4/26/22 Security Analysis Security Analysis RADTEC RADTEC It can identify whether a device is known or unknown by using classification techniques, and if the device has been compromised The protocol addresses scenarios where an adversary can: 1. Exploit a vulnerable device to inject malicious packets and therefore, poison the network 2. Use a vulnerable IoT device to extract sensitive data even if the network packets are encrypted 3. Compromise a vulnerable IoT device and actuate the activities that would typically be performed by a different type of device RADTEC does so by keeping a constant track of changes in a device fingerprint and following the access policies defined within the hub 20

  21. 4/26/22 Security Analysis Security Analysis Classification Technique Classification Technique Adversary can easily spoof the MAC address of an already authenticated device and authenticate itself with the hub [1] The only time we use the MAC address is to check if the device already exists in the database The verifier will automatically be able to send confirmation to the hub that the MAC addresses match, but if the fingerprint does not match, so this device must be compromised This is based on the idea that when a machine learning model is trained and tested on the same or at least similar dataset, it should provide predictions with the highest accuracy Device is reclassified every time when it needs re-authentication irrespective of its previous authentication with the server [1] Sheng, Yong, et al. "Detecting 802.11 MAC layer spoofing using received signal strength." IEEE INFOCOM 2008-The 27th Conference on Computer Communications. IEEE, 2008. 21

  22. 4/26/22 Discussion Discussion We assume that there are three scenarios where a device needs to be authenticated when, A new device is first introduced in the network A previously authenticated device moves into and out of the network and requires re-authentication A device that constantly remains inside the network requires continuous authentication The protocol addresses all three scenarios by correlating a device MAC address to the ones previously stored in the database, and then providing authentication by using the classification techniques applied in this paper If a vulnerable device is compromised by an adversary and the authentication credentials are stolen, it will generally be the case that the adversary does not know the type of the device However, even if the adversary can identify the type of device, the adversary will not be able to perform any actions using the vulnerable device since the classification model will notice the change in device fingerprint, resulting in blocking the access given to the device 22

  23. Implementation: Dataset
      - The implementation techniques used to classify IoT devices include data pre-processing (data cleaning and splitting, feature standardization, and numerical imputation) and feature engineering.
      - We chose 15 devices, as shown in Table 6.1, and extracted the relevant features from packet capture files into a .csv file using tshark [1].
      [1] Sivanathan, Arunan, et al. "Classifying IoT devices in smart environments using network traffic characteristics." IEEE Transactions on Mobile Computing 18.8 (2018): 1745-1759.
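
A feature extraction step of this kind might look like the sketch below. It is not the command used in the thesis: the helper names, file paths, and the choice of fields are assumptions for illustration. Only the tshark options shown (`-r`, `-T fields`, `-e`, `-E header=y`, `-E separator=,`) are standard.

```python
import subprocess


def build_tshark_command(pcap_path, fields):
    """Build a tshark command that prints the given fields as CSV rows."""
    cmd = [
        "tshark",
        "-r", pcap_path,        # read packets from a capture file
        "-T", "fields",         # output selected fields only
        "-E", "header=y",       # first row holds the field names
        "-E", "separator=,",    # comma-separated columns
    ]
    for field in fields:
        cmd += ["-e", field]    # one -e option per extracted field
    return cmd


def export_fields_to_csv(pcap_path, csv_path, fields):
    """Run tshark and write its output to csv_path (requires tshark installed)."""
    with open(csv_path, "w", newline="") as out:
        subprocess.run(build_tshark_command(pcap_path, fields), stdout=out, check=True)


# Example with field names that appear in this deck; the paths are hypothetical.
# export_fields_to_csv("device.pcap", "device.csv", ["ip.proto", "tcp.len", "tcp.seq"])
```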

  24. Implementation: Device Identification
      - Device identification uses the device fingerprint collected by the hub.
      - The process for developing the ML model is scalable: new categories of IoT devices can be added by training the combined classifier and the targeted classifiers on packet captures from those devices.
      - Algorithm selection: based on previous work and approaches, we use the following five classifiers:
          1. Random Forest
          2. K-Nearest Neighbors
          3. Support Vector Machine
          4. Gradient Boosting
          5. Gaussian Naive Bayes

  25. Implementation: Data Pre-Processing
      1. Data Cleaning and Splitting
         - Irrelevant data points were removed.
         - Data was labelled and split into training (80%) and test (20%) sets.
      2. Standardizing Features
         - Each feature was standardized by subtracting its mean and scaling it to unit variance.
      3. Numerical Imputation
         - Missing values were imputed with the median of the corresponding feature.
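
The three pre-processing steps can be sketched in plain Python as follows. This is a minimal illustration, not the thesis code. It assumes missing values are represented as `None`, applies imputation before standardization (a mean cannot be computed over missing values), and computes the medians, means, and standard deviations on the training split only. All function names are hypothetical.

```python
import random
import statistics


def split_80_20(rows, labels, seed=0):
    """Shuffle the samples and split them into 80% train / 20% test."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def column_medians(rows):
    """Median of each feature, ignoring missing (None) entries."""
    return [statistics.median(r[j] for r in rows if r[j] is not None)
            for j in range(len(rows[0]))]


def impute(rows, medians):
    """Replace every missing entry with the median of its feature."""
    return [[medians[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def fit_scaler(rows):
    """Per-feature mean and (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant features
    return means, stds


def standardize(rows, means, stds):
    """Subtract the mean and scale to unit variance, feature by feature."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
```

Fitting the statistics on the training split and reusing them for the test split keeps information from the test data out of the model.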

  26. Implementation: Feature Engineering
      4. Feature Engineering
         - We use a random forest search classifier to extract feature importance scores for the 19 features.
         - A threshold of 0.05 was set for the feature importance score.
         - All features with importance scores below the threshold were eliminated.
         - The eliminated features were tcp.urgent_pointer, ip.flags.mf, tcp.analysis.ack_rtt, ip.proto, tcp.time_delta, ip.flags.df, tcp.flags, tcp.len, tcp.ack, udp.port, tcp.time_relative, and tcp.seq.
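
The thresholding step itself is simple to sketch. The importance scores below are made-up illustration values, not results from the thesis, and `feature_a` and `feature_b` are placeholder names for retained features, which this slide does not list.

```python
def select_features(importances, threshold=0.05):
    """Split features into kept and dropped according to their importance score."""
    kept = [name for name, score in importances.items() if score >= threshold]
    dropped = [name for name, score in importances.items() if score < threshold]
    return kept, dropped


# Hypothetical scores for illustration only.
example_importances = {
    "feature_a": 0.31,   # placeholder for a retained feature
    "feature_b": 0.12,   # placeholder for a retained feature
    "tcp.seq": 0.01,     # below the 0.05 threshold, so eliminated
    "ip.flags.mf": 0.0,  # below the 0.05 threshold, so eliminated
}
kept, dropped = select_features(example_importances)
```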

  27. Implementation: Training & Testing
      5. Training and Testing
         - Training was performed by fitting each model to the training data.
         - During testing, the trained models predicted labels for the test data.
         - Accuracy scores on the test data were recorded.
         - The time taken by the training process and by the classification of a single device was recorded for each classifier (Table 6.2).
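
One way to record these two timings is sketched below. It assumes a model object with `fit` and `predict` methods in the common scikit-learn style. `MajorityClassifier` is a trivial stand-in used only to keep the example self-contained, and the function name is hypothetical.

```python
import time
from collections import Counter


class MajorityClassifier:
    """Stand-in model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def time_training_and_single_prediction(model, X_train, y_train, x_single):
    """Return (training seconds, single-sample prediction seconds, predicted label)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    prediction = model.predict([x_single])[0]
    predict_seconds = time.perf_counter() - start
    return train_seconds, predict_seconds, prediction
```

Running the same function over each of the five classifiers would produce one row per model in a timing table.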

  28. Results: F1 Score
      - The highest average F1 score across all classes was provided by the Random Forest Classifier (RFC).
      - RFC is near perfect at differentiating between IoT and non-IoT devices.

  29. Results: Accuracy Score
      - The Random Forest Classifier (RFC) was the most accurate model, with an accuracy of 95.2%.
      - The second most accurate classifier was the Gradient Boost Classifier (GBC), with an accuracy of 94.8%.
      - K-Nearest Neighbors Classifier (KNN): 93.3% accuracy.
      - Support Vector Machine Classifier (SVM): 88.3% accuracy.
      - Gaussian Naive Bayes Classifier (NAB): 76.8% accuracy.
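
For reference, the two metrics reported in the results can be computed as in the sketch below. It assumes the "average F1 score" is an unweighted (macro) average over classes. The slides do not state which averaging was used, so treat that as an assumption. The labels in the example are invented.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_per_class(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for every class."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Invented labels for illustration.
y_true = ["camera", "camera", "plug", "plug"]
y_pred = ["camera", "plug", "plug", "plug"]
print(accuracy(y_true, y_pred))  # 0.75
```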

  30. Conclusion
      - We proposed the RADTEC protocol as a solution to improve the security of IoT devices.
      - Through machine learning and cryptographic techniques, RADTEC addresses the events in which an adversary can exploit a vulnerable device to:
          1. Capture network traffic
          2. Poison the network by injecting malicious packets
          3. Actuate a device to perform activities of some other type of device
      - We used the Random Forest Classifier, Gradient Boost Classifier, K-Nearest Neighbors Classifier, Support Vector Machine Classifier, and Gaussian Naive Bayes Classifier to identify the type of IoT device using cross-layer data.
      - The Random Forest Classifier made the most accurate predictions, with an accuracy score of 95.2% and the highest average F1 score.
      - The time required to train the model and the average time taken to classify a single data point were recorded for all classifiers. The Gaussian Naive Bayes Classifier was the fastest classifier for training and testing.
      - However, we would prefer to use the Random Forest Classifier, as it still required less time than most other models for training and testing and provided the most accurate results.

  31. Future Work
      - We plan to train classification models for several more types of IoT devices.
      - We would collect more data for each type of device to eliminate any bias in our model and provide more accurate classifications for all the devices considered.
      - In addition to the machine learning classifiers used in this thesis, we would perform device-type-level classification using several other types of classifiers.
      - Finally, we would apply model fine-tuning techniques to improve the performance of our classifiers.
      Thank you!
