Category | Assignment | Subject | Computer Science |
---|---|---|---|
University | Swinburne University of Technology | Module Title | COS10022 Data Science Principles |
Assessment Title | Predictive Model Creation and Evaluation | Academic Year | 2025 |
Submit one (1) written report of no more than 10 pages, together with the signed Assignment Cover Sheet.
The report must be checked with Turnitin, and the similarity score for the content outside the template must be below 12%.
The report must answer all questions in the assignment task section, in the order given.
You must include a digitally signed Assignment Cover Sheet with your submission.
This assignment aims to evaluate students' achievement of the following unit learning outcomes:
1. Explain the key concepts, techniques, and tools for handling the data and creating prediction models.
2. Work on feature and model selection and implementation in a data science project.
This is an individual assignment that requires peer review and communication with colleagues. Refer to the Unit Outline for the late-submission penalty policy. High similarity on the cover page and in the template wording can be ignored, but not in your own report content. Your submitted report must have a total similarity below 12% and less than 6% from any single source; otherwise, it will not be marked.
You are asked to split the dataset and then use linear regression and logistic regression to build two models in the KNIME Analytics Platform.
The dataset contains 150 tuples covering 7 fish species commonly sold in markets. The source data includes 6 attributes. This assignment has two goals: first, to build a linear regression model that predicts the weight of a fish, i.e., the value of the "Weight_of_Fish_in_Gram" attribute; second, to build a logistic regression model that predicts the species of a fish. Follow the instructions to build your predictive models and answer the questions.
This assignment aims to give students experience in selecting independent attributes, splitting data into training and test sets, training a usable predictive model, and explaining its outputs. A small discovery and research component is included to broaden students' skill sets.
The dataset has been cleaned and organised and contains no missing data. Your tasks are to select suitable attributes and create the predictive models according to the instructions, so that you can answer the questions listed below. The source file is "Fish_Species_2024.csv". Prepare the report using the template and answer the questions, covering the required information, the dataset split, and model training and testing. A table of contents is not required.
You must follow these instructions to split the given dataset into training and test sets. Remember that a well-split dataset is the foundation for model training and testing. Use a Shuffle node with 9214 as the seed value to shuffle the input data. Then partition 80% of the input data into the training set using the "draw randomly" method, again with 9214 as the seed value.
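In KNIME, the Shuffle and Partitioning nodes perform these two steps inside the workflow. To show the logic outside KNIME, here is a minimal Python sketch of the same idea: shuffle with a fixed seed, then randomly draw 80% of the rows for training. It is an illustration under stated assumptions, not a substitute for the workflow. Python's `random` module uses a different generator from KNIME, so seed 9214 here will **not** reproduce the KNIME partition, and your assignment answers must come from KNIME. The function name `shuffle_and_partition` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_and_partition(
    rows: Sequence[T], seed: int = 9214, train_fraction: float = 0.8
) -> Tuple[List[T], List[T]]:
    """Shuffle rows with a fixed seed, then draw a random training subset.

    Conceptually mirrors KNIME's Shuffle node followed by a Partitioning node
    ("draw randomly"), but NOT their exact output: Python's random module is
    a different pseudo-random generator from the one KNIME uses.
    """
    # Step 1: shuffle a copy of the input with the given seed.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    # Step 2: randomly choose which positions go to the training set.
    n_train = round(len(shuffled) * train_fraction)
    train_idx = set(random.Random(seed).sample(range(len(shuffled)), n_train))

    # Keep the shuffled order within each partition.
    train = [r for i, r in enumerate(shuffled) if i in train_idx]
    test = [r for i, r in enumerate(shuffled) if i not in train_idx]
    return train, test
```

With a list of 150 rows, `train, test = shuffle_and_partition(rows)` gives 120 training rows and 30 test rows (0.8 × 150 = 120). Because the seed is fixed, repeated calls return the same split, which is the property the brief relies on when it insists on a specific seed.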
Each record in the data source contains several attributes. The first goal is to build a predictive model for the weight of the fish, which is recorded in the "Weight_of_Fish_in_Gram" attribute of the given file. Your task is to create a linear regression model in KNIME and visualise the prediction result.
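In KNIME this step typically uses a Linear Regression Learner on the training set, a Regression Predictor on the test set, and a Numeric Scorer to evaluate the predictions. As a rough picture of what the learner computes, here is a plain-Python sketch of ordinary least squares, which solves the normal equations for an intercept and one coefficient per input attribute, together with two common accuracy measures (RMSE and R²). The function names are hypothetical. Which attributes you pass in as `X` is your own choice, since the brief does not fix them, and KNIME's internal solver may differ from the Gauss-Jordan elimination used here.

```python
from typing import List, Sequence


def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares: return [intercept, b1, ..., bk].

    Solves the normal equations (A^T A) beta = A^T y, where A is X with a
    leading column of ones, by Gauss-Jordan elimination with partial pivoting.
    """
    A = [[1.0, *map(float, row)] for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(p)]
        + [sum(a[i] * t for a, t in zip(A, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: bring the row with the largest entry into place.
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("input attributes are collinear; drop one and refit")
        M[col], M[pivot] = M[pivot], M[col]
        # Normalise the pivot row, then eliminate this column from the others.
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(p):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]


def predict(beta: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Apply fitted coefficients to new rows (e.g. the test set)."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the target's units (grams here)."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

Fit on the training rows only and compute `rmse` and `r_squared` on the test rows. Scoring the model on data it was trained on would overstate its accuracy.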
Using the same source file, we aim to build a predictive model for classifying the input fish into the corresponding species.
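For this classification task KNIME provides a Logistic Regression Learner and a Logistic Regression Predictor, and a Scorer node can report the confusion matrix and accuracy. With seven species the model is multinomial: it produces one probability per species and predicts the most probable one. The plain-Python sketch below illustrates that idea using softmax regression trained by batch gradient descent. This fitting method was chosen for readability, is not necessarily the solver KNIME uses, and its coefficients will not match KNIME's output. All function names, the learning rate and the epoch count are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def standardise(train_X: Sequence[Sequence[float]], X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each column of X using means/stds from the training rows only,
    so no information from the test set leaks into the model."""
    cols = list(zip(*train_X))
    means = [sum(c) / len(c) for c in cols]
    # `or 1.0` guards against a constant column (standard deviation of 0).
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_softmax_regression(
    X: Sequence[Sequence[float]], labels: Sequence[str],
    lr: float = 0.1, epochs: int = 2000,
) -> Tuple[List[str], List[List[float]]]:
    """Multinomial logistic regression trained by batch gradient descent.

    Returns the class names and one weight vector [bias, w1, ..., wk] per class.
    """
    classes = sorted(set(labels))
    k, n = len(X[0]), len(X)
    W = [[0.0] * (k + 1) for _ in classes]
    for _ in range(epochs):
        grad = [[0.0] * (k + 1) for _ in classes]
        for row, label in zip(X, labels):
            x = [1.0, *row]
            probs = softmax([sum(w * v for w, v in zip(wc, x)) for wc in W])
            for c, cls in enumerate(classes):
                # Cross-entropy gradient: (predicted prob - one-hot target) * x
                err = probs[c] - (1.0 if cls == label else 0.0)
                for j in range(k + 1):
                    grad[c][j] += err * x[j]
        for c in range(len(classes)):
            for j in range(k + 1):
                W[c][j] -= lr * grad[c][j] / n
    return classes, W


def predict_class(classes: List[str], W: List[List[float]], row: Sequence[float]) -> str:
    """Return the class with the highest linear score (= highest probability)."""
    x = [1.0, *row]
    scores = [sum(w * v for w, v in zip(wc, x)) for wc in W]
    return classes[scores.index(max(scores))]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Standardising with training-set statistics lets one learning rate suit attributes measured on different scales, such as grams and centimetres. The same training means and standard deviations are then applied to the test rows.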
Creating a linear regression model is simple; improving the accuracy of its predictions takes more effort. Let's focus on a single species of fish, the Perch. If you may select only three (3) attributes as inputs to your linear regression model, find a way to decide which attributes to include. Note that when building this model, you must ensure that the tuples in the new training and test sets are subsets of the original training and test sets, respectively.
### Important Note ### You must use the seed value specified in the instructions. Otherwise, your results will differ from the correct answers for almost all questions.
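The brief leaves the selection method open. One simple heuristic, shown here as a sketch and not as the expected answer, works in two steps. First, filter the Perch rows out of the existing training and test partitions separately, which guarantees the subset requirement. Second, rank the candidate attributes by the absolute Pearson correlation with weight, computed on the training rows only. Only "Weight_of_Fish_in_Gram" is named in the brief. The species column name `Species` and the function names are hypothetical placeholders for whatever the CSV actually uses.

```python
import math
from typing import Dict, List, Sequence

TARGET = "Weight_of_Fish_in_Gram"  # target attribute named in the brief
SPECIES_COL = "Species"            # hypothetical: replace with the file's real column name


def only_species(rows: Sequence[Dict[str, str]], species: str) -> List[Dict[str, str]]:
    """Filter one partition. Apply to train and test separately so each new
    set is a subset of the original set it came from."""
    return [r for r in rows if r[SPECIES_COL] == species]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def top_attributes(train_rows: Sequence[Dict[str, str]], candidates: Sequence[str], k: int = 3) -> List[str]:
    """Rank candidates by |Pearson r| with the target on the training rows
    only, and keep the k strongest."""
    y = [float(r[TARGET]) for r in train_rows]
    scores = {c: abs(pearson([float(r[c]) for r in train_rows], y)) for c in candidates}
    return sorted(candidates, key=lambda c: scores[c], reverse=True)[:k]
```

A caveat on this design: correlation ranking scores each attribute in isolation. If the top candidates are strongly correlated with one another, they add little new information together. When the number of candidates is small, a more direct check is to fit a model for every three-attribute combination and compare the errors, keeping the test set out of the selection so it remains an honest final check.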
This assignment is worth 100 marks. Your report must address the following tasks.
1. Follow the instructions above to split the source data into training and test sets. Answer the following questions after splitting the data. [10 marks in total]
1) Submit the workflow of Assignment 1 via Assignment 1.1. [2.5 marks]
2) How many tuples are included in the training set? [2.5 marks]
3) How many species are included in the test set? [2.5 marks]
4) Do species "Whitefish" and "Smelt" have the same number of tuples included in the test set? [2.5 marks]
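Inside KNIME, these counts can be read from the partition outputs or from a GroupBy node on the species column. To cross-check them outside KNIME, one option is to export the test partition to CSV and count it in Python, as in this small sketch. The column name `Species`, the export step and the function names are assumptions, not part of the brief.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read an exported partition (hypothetical CSV file) as a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def species_counts(rows: Iterable[Dict[str, str]], species_col: str = "Species") -> Counter:
    """Count tuples per species in one partition."""
    return Counter(r[species_col] for r in rows)
```

For an exported test partition, `len(rows)` gives the number of tuples, `len(species_counts(rows))` gives the number of distinct species, and `counts["Whitefish"] == counts["Smelt"]` answers the last question. A `Counter` returns 0 for a species that does not appear at all, so a species missing from the test set is handled correctly.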