Tutoring Case: CSCM38

CSCM38 Advanced Topics: Artificial Intelligence and Cyber Security
by Dr. Jingjing Deng
Released on 2nd March 2020

The assignment consists of multiple tasks that are designed to be completed during the lab sessions and signed off by either the module instructor or a teaching assistant. If you are not able to complete the tasks during the lab sessions, you should do them at home and have them ready to be marked off in the lab session by the deadline. All lab tasks must also be uploaded to Blackboard before the deadline stated on each assignment sheet. Any report or dissertation must be written and submitted in PDF format. Source code must be organised and formatted neatly; sufficient and clear comments are welcome and necessary for markers to assess your work. Submissions and feedback will be handled via the Blackboard-Turnitin system. Plagiarism will not be tolerated.

Zip all your files with the following naming convention for submission:
• [Student Number]-[Last Name][First Initial]-[Assignment][Number].zip
• For example: 123456-DengJ-Assignment1.zip

CSCM38 Assignment 2
Complete by 20/03/2020

Background: The dataset to be audited consists of a wide variety of intrusions simulated in a military network environment. The environment was created to acquire raw TCP/IP dump data for a network by simulating a typical US Air Force LAN. The LAN was operated like a real environment and subjected to multiple attacks. A connection is a sequence of TCP packets starting and ending at well-defined times, during which data flows between a source IP address and a target IP address under a well-defined protocol. Each connection is labelled either as normal or as an attack, with exactly one specific attack type. Each connection record consists of about 100 bytes. For each TCP/IP connection, 41 features are obtained from normal and attack data (3 qualitative and 38 quantitative).
The class variable has two categories: Normal or Anomalous. [1] A copy of the dataset can be downloaded from Blackboard.

[1] Anonymised reference for module assignment purposes.

Task 2.1 – AI-Driven Network Intrusion Detection (15 marks)

This assignment is to construct an artificial intelligence model using a machine learning approach to detect anomalous network flows. The dataset consists of a training and a testing data file in CSV format, and the problem can be formulated as a supervised classification problem given the network measurements (all columns except the last one) and the intrusion label (the last column). The following steps must be completed using the Python 3 programming language in Jupyter Notebook format. The machine learning package Scikit-Learn [2], the data manipulation package Pandas [3], and the visualisation packages matplotlib [4] and ggplot2 [5] can be used in this assignment.

• Load the Dataset (2 marks): Read the training and the testing data from the two CSV files into your program. Hints: You can use either the Python built-in CSV reader [6] or the Pandas CSV reader [7] to achieve this. Be aware that some attributes are categorical variables in string format, which you might need to convert into numerical variables.
• Visualise the Features (3 marks): Select a few attributes to visually assess the differences between the normal and anomalous categories. Hints: For the same attribute, you can plot the frequencies or distributions of the two categories using a bar chart or histogram.
• Train a Machine Learning Model (6 marks): To predict whether a network flow is normal or anomalous, train a machine learning model on the training data in a supervised fashion. Hints: Several algorithms can be used for prediction, such as nearest neighbours, linear discriminant analysis, random forests, naive Bayes classifiers and neural networks. [8] Normally, you need to pre-process the data before applying the learning step, which may include data normalisation, feature extraction and feature selection.
• Test the Model (4 marks): Evaluate your model on the testing data and calculate the prediction accuracy. Hints: For a detection problem, you can quantitatively compute the confusion matrix [9] given the predictions from your model and the labels provided in the data. In addition, you can plot the ROC (Receiver Operating Characteristic) curve [10] to illustrate the diagnostic ability of a binary classifier as its discrimination threshold is varied.

[2] https://scikit-learn.org/stable/
[3] https://pandas.pydata.org/
[4] https://matplotlib.org/
[5] https://ggplot2.tidyverse.org/
[6] https://docs.python.org/3/library/csv.html
[7] https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
[8] https://scikit-learn.org/stable/user_guide.html
[9] https://en.wikipedia.org/wiki/Confusion_matrix
[10] https://en.wikipedia.org/wiki/Receiver_operating_characteristic
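
The load, train and test steps of Task 2.1 can be sketched with the Python standard library alone. This is a minimal illustration, not a model solution. The CSV text, the column names (`duration`, `protocol`, `bytes`, `label`) and all helper function names are hypothetical stand-ins, not the real dataset schema, and a hand-written 1-nearest-neighbour rule stands in for whichever Scikit-Learn model is eventually chosen.

```python
import csv
import io

# Hypothetical stand-ins for the two CSV files; column names and values
# are invented for illustration and are NOT the real dataset schema.
TRAIN_CSV = """duration,protocol,bytes,label
0,tcp,200,normal
1,udp,150,normal
0,tcp,5000,anomaly
2,icmp,7000,anomaly
"""
TEST_CSV = """duration,protocol,bytes,label
0,tcp,180,normal
1,icmp,6500,anomaly
0,udp,5200,normal
"""


def load_rows(text):
    """Read CSV text into a list of dicts, one per connection record."""
    return list(csv.DictReader(io.StringIO(text)))


def build_encoder(rows, column):
    """Map each distinct string in a categorical column to an integer code."""
    values = sorted({row[column] for row in rows})
    return {value: code for code, value in enumerate(values)}


def to_features(rows, encoder):
    """Convert records to numeric feature vectors and 0/1 labels (1 = anomaly)."""
    features, labels = [], []
    for row in rows:
        features.append([
            float(row["duration"]),
            float(encoder.get(row["protocol"], -1)),  # -1 marks an unseen category
            float(row["bytes"]),
        ])
        labels.append(1 if row["label"] == "anomaly" else 0)
    return features, labels


def predict_1nn(train_x, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean)."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_x]
    return train_y[distances.index(min(distances))]


def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp), treating anomaly (1) as the positive class."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


train_rows = load_rows(TRAIN_CSV)
test_rows = load_rows(TEST_CSV)
encoder = build_encoder(train_rows, "protocol")  # fit the encoding on training data only
train_x, train_y = to_features(train_rows, encoder)
test_x, test_y = to_features(test_rows, encoder)

predictions = [predict_1nn(train_x, train_y, x) for x in test_x]
tn, fp, fn, tp = confusion_counts(test_y, predictions)
accuracy = (tp + tn) / len(test_y)
print("TN, FP, FN, TP:", (tn, fp, fn, tp))
print("accuracy:", accuracy)
```

In this toy data the third test record is flagged as anomalous even though it is labelled normal, because the distance is dominated by the raw byte count. That is one reason the normalisation hint matters. In a notebook, `pandas.read_csv`, `sklearn.neighbors.KNeighborsClassifier` and `sklearn.metrics.confusion_matrix` could replace the hand-written helpers.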
