Category | Coursework
Subject | Computer Science
University | London Metropolitan University
Module Title | CC5067NT Smart Data Discovery
Coursework Type | Individual
Academic Year | 2025
Submission Instructions
Submit the following to Itahari International College's MST Portal before 01:00 PM on the due date:
- A report (document) in .pdf format, submitted through the MST Portal or any other medium that the Module Leader specifies.
- The associated Python program(s) in a ZIP file.
Plagiarism
You are reminded that there exist regulations concerning plagiarism. Extracts from these regulations are printed overleaf. Please sign below to say that you have read and understood these extracts:
Extracts from University Regulations on Cheating, Plagiarism, and Collusion
Section 2.3: "The following broad types of offence can be identified and are provided as indicative examples ....
- Cheating: including taking unauthorised material into an examination; consulting unauthorised material outside the examination hall during the examination; obtaining an unseen examination paper in advance of the examination; copying from another examinee; using an unauthorised calculator during the examination or storing unauthorised material in the memory of a programmable calculator which is taken into the examination; copying coursework.
- Falsifying data in experimental results.
- Personation, where a substitute takes an examination or test on behalf of the candidate. Both the candidate and substitute may be guilty of an offence under these Regulations.
- Bribery or attempted bribery of a person thought to have some influence on the candidate's assessment.
- Collusion to present joint work as the work solely of one individual.
- Plagiarism, where the work or ideas of another are presented as the candidate's own.
- Other conduct calculated to secure an advantage on assessment.
- Assisting in any of the above.
Some notes on what this means for students:
- Copying another student's work is an offence, whether from a copy on paper or a computer file, and in whatever form the intellectual property being copied takes, including text, mathematical notation, and computer programs.
- Taking extracts from published sources without attribution is an offence. To quote ideas, sometimes using extracts, is generally to be encouraged. Quoting ideas is achieved by stating an author's argument and attributing it, perhaps by quoting, immediately in the text, his or her name and year of publication, e.g. "e = mc2 (Einstein 1905)". A reference section at the end of your work should then list all such references in alphabetical order of authors' surnames. (There are variations on this referencing system which your tutors may prefer you to use.) If you wish to quote a paragraph or so from published work, then indent the quotation on both left and right margins, using an italic font where practicable, and introduce the quotation with attribution.
Coursework Assignment
The coursework is an individual assessment, weighted at 60% of the marks for the module. It is primarily an exercise in applying programming knowledge and skills to data analysis tasks, demonstrating your skills in problem-solving and critical thinking/evaluation. This assignment involves the analysis of Customer Service Requests. You are expected to write Python programs and a technical report on data understanding, preparation, exploration, and initial analysis.
Data Set Description
The data contains information about various factors that can influence Customer Service requests. The objective of this analysis is to examine service requests made through New York City 311 calls. You will focus on data wrangling techniques to understand the patterns in the data and to visualise the major complaint types.
Domain: Customer Service
The primary objective of your work is to prepare data for further data mining and analysis.
Requirements Specifications
1. Data Understanding
- Understand what your data resources are and the characteristics of those resources, and write down your findings. [10 Marks]
2. Data Preparation
- Import the dataset [5 marks]
- Provide your insight on the information and details that the provided dataset carries. [5 marks]
- Convert the columns "Created Date" and "Closed Date" to the datetime datatype and create a new column "Request_Closing_Time" as the time elapsed between request creation and request closing [10 Marks]
- Write a Python program to drop the irrelevant columns listed below. [5 Marks]
['Agency Name', 'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2', 'Intersection Street 1', 'Intersection Street 2', 'Address Type', 'Park Facility Name', 'Park Borough', 'School Name', 'School Number', 'School Region', 'School Code', 'School Phone Number', 'School Address', 'School City', 'School State', 'School Zip', 'School Not Found', 'School or Citywide Complaint', 'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up location', 'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp', 'Bridge Highway Segment', 'Garage Lot Name', 'Ferry Direction', 'Ferry Terminal Name', 'Landmark', 'X Coordinate (State Plane)', 'Y Coordinate (State Plane)', 'Due Date', 'Resolution Action Updated Date', 'Community Board', 'Facility Type', 'Location']
- Write a Python program to remove the NaN missing values from the updated dataframe. [5 Marks]
- Write a Python program to see the unique values from all the columns in the dataframe. [5 Marks]
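The preparation steps above could be sketched as follows. This is a minimal illustration under stated assumptions, not a model answer: the three-row sample, the date layout `%m/%d/%Y %I:%M:%S %p`, and the two-column `irrelevant_columns` subset are all made up for the example, and the real dataset may differ.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for the real 311 export (illustrative values only).
csv_text = """Unique Key,Created Date,Closed Date,Complaint Type,City,Agency Name,Landmark
1,12/31/2015 11:59:45 PM,01/01/2016 12:55:15 AM,Noise - Street/Sidewalk,NEW YORK,NYPD,
2,12/31/2015 11:59:44 PM,01/01/2016 01:26:57 AM,Blocked Driveway,ASTORIA,NYPD,
3,12/31/2015 11:58:00 PM,,Illegal Parking,BRONX,NYPD,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Convert both date columns to datetime; the format string is an assumption.
date_format = "%m/%d/%Y %I:%M:%S %p"
for col in ["Created Date", "Closed Date"]:
    df[col] = pd.to_datetime(df[col], format=date_format, errors="coerce")

# Elapsed time between creation and closing, stored as a timedelta.
df["Request_Closing_Time"] = df["Closed Date"] - df["Created Date"]

# Drop irrelevant columns; only a short subset of the brief's list is used here.
irrelevant_columns = ["Agency Name", "Landmark"]
df = df.drop(columns=irrelevant_columns, errors="ignore")

# Remove rows that still contain missing values (row 3 has no closing date).
df = df.dropna()

# Collect the unique values of every remaining column.
unique_values = {col: df[col].unique() for col in df.columns}
for col, values in unique_values.items():
    print(f"{col}: {values}")
```

Passing `errors="ignore"` to `drop` lets the same list be reused even if some listed columns are absent from a given export; whether silently skipping them is desirable is a design choice.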
3. Data Analysis
- Write a Python program to show summary statistics of sum, mean, standard deviation, skewness, and kurtosis of the data frame. [5 Marks]
- Write a Python program to calculate and show the correlation of all variables. [5 Marks]
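One possible way to approach the two analysis tasks above is sketched below. The data frame is invented for illustration, and `Request_Closing_Hours` is a hypothetical numeric column (for instance, "Request_Closing_Time" converted to hours) introduced so that the elapsed time can take part in numeric statistics.

```python
import pandas as pd

# Made-up data; Request_Closing_Hours is an assumed numeric version of the closing time.
df = pd.DataFrame({
    "Request_Closing_Hours": [1.0, 2.0, 3.0, 4.0, 10.0],
    "Latitude": [40.70, 40.72, 40.75, 40.80, 40.85],
    "Longitude": [-74.00, -73.98, -73.95, -73.90, -73.85],
    "Complaint Type": ["Noise", "Noise", "Parking", "Parking", "Noise"],
})

# Summary statistics and correlation are only defined for numeric columns.
numeric = df.select_dtypes(include="number")

# One row per column, one statistic per output column.
summary = pd.DataFrame({
    "sum": numeric.sum(),
    "mean": numeric.mean(),
    "std": numeric.std(),
    "skew": numeric.skew(),
    "kurtosis": numeric.kurt(),
})
print(summary)

# Pairwise Pearson correlation between the numeric columns.
correlation = numeric.corr()
print(correlation)
```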
4. Data Exploration
- Provide four major insights through visualisation that you come up with after data mining. [10 Marks]
- Arrange the complaint types according to their average 'Request_Closing_Time', categorised by various locations. Illustrate it through a graph as well. [10 Marks]
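The ordering of complaint types by average closing time per location might be sketched like this. The sample rows are invented, "City" is assumed to be the location column, and `Request_Closing_Hours` is a hypothetical numeric form of "Request_Closing_Time"; other location columns could be substituted.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Made-up rows; column choices are assumptions for illustration.
df = pd.DataFrame({
    "City": ["BROOKLYN", "BROOKLYN", "BROOKLYN", "BRONX", "BRONX", "BRONX"],
    "Complaint Type": [
        "Blocked Driveway", "Blocked Driveway", "Illegal Parking",
        "Blocked Driveway", "Illegal Parking", "Illegal Parking",
    ],
    "Request_Closing_Hours": [2.0, 4.0, 1.0, 5.0, 3.0, 5.0],
})

# Average closing time per (location, complaint type), ordered within each location.
avg_closing = (
    df.groupby(["City", "Complaint Type"])["Request_Closing_Hours"]
    .mean()
    .reset_index()
    .sort_values(["City", "Request_Closing_Hours"])
)
print(avg_closing)

# Grouped bar chart: one group per complaint type, one bar per location.
pivot = avg_closing.pivot(
    index="Complaint Type", columns="City", values="Request_Closing_Hours"
)
ax = pivot.plot(kind="bar")
ax.set_ylabel("Average closing time (hours)")
ax.set_title("Average request closing time by complaint type and city")
ax.figure.tight_layout()
```

In a notebook the chart is displayed automatically; in a script, `ax.figure.savefig(...)` with a file name of your choice would write it to disk.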
Milestone 1 (Week 7)
1. Data Understanding
- Understand what your data resources are and the characteristics of those resources, and write down your findings. [10 Marks]
2. Data Preparation
- Import the dataset [5 marks]
- Provide your insight on the information and details that the provided dataset carries. [5 marks]
- Convert the columns "Created Date" and "Closed Date" to the datetime datatype and create a new column "Request_Closing_Time" as the time elapsed between request creation and request closing [10 Marks]
- Write a Python program to drop the irrelevant columns listed below. [5 Marks]
['Agency Name', 'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2', 'Intersection Street 1', 'Intersection Street 2', 'Address Type', 'Park Facility Name', 'Park Borough', 'School Name', 'School Number', 'School Region', 'School Code', 'School Phone Number', 'School Address', 'School City', 'School State', 'School Zip', 'School Not Found', 'School or Citywide Complaint', 'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up location', 'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp', 'Bridge Highway Segment', 'Garage Lot Name', 'Ferry Direction', 'Ferry Terminal Name', 'Landmark', 'X Coordinate (State Plane)', 'Y Coordinate (State Plane)', 'Due Date', 'Resolution Action Updated Date', 'Community Board', 'Facility Type', 'Location']
- Write a Python program to remove the NaN missing values from the updated dataframe. [5 Marks]
- Write a Python program to see the unique values from all the columns in the dataframe. [5 Marks]
3. Data Analysis
- Write a Python program to show summary statistics of sum, mean, standard deviation, skewness, and kurtosis of the data frame. [5 Marks]
- Write a Python program to calculate and show the correlation of all variables. [5 Marks]
Milestone 2 (Week 10)
7. Data Exploration
- Provide four major insights through visualisation that you come up with after data mining. [10 Marks]
- Arrange the complaint types according to their average 'Request_Closing_Time', categorised by various locations. Illustrate it through a graph as well. [10 Marks]
8. Statistical Testing
Test 1: Whether the Average Response Time Across Complaint Types is Similar or Not.
- State the Null Hypothesis (H0) and Alternate Hypothesis (H1).
- Perform the statistical test and provide the p-value.
- Interpret the results to decide whether to reject or fail to reject the Null Hypothesis. [10 Marks]
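One common choice for comparing mean response time across several complaint types is a one-way ANOVA; the brief does not prescribe a test, so this is only a sketch under that assumption. The data, the `Request_Closing_Hours` column, and the 0.05 significance level are all illustrative. Here H0 is that all complaint types share the same mean response time, and H1 is that at least one mean differs.

```python
import pandas as pd
from scipy import stats

# Made-up response times (hours) for three hypothetical complaint types.
df = pd.DataFrame({
    "Complaint Type": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "Request_Closing_Hours": [
        1.0, 1.5, 2.0, 2.5, 3.0,
        4.0, 4.5, 5.0, 5.5, 6.0,
        9.0, 9.5, 10.0, 10.5, 11.0,
    ],
})

# One array of response times per complaint type.
groups = [
    values.to_numpy()
    for _, values in df.groupby("Complaint Type")["Request_Closing_Hours"]
]

# One-way ANOVA: tests whether all group means are equal.
f_stat, p_value = stats.f_oneway(*groups)

alpha = 0.05  # assumed significance level
reject_h0 = p_value < alpha
print(f"F = {f_stat:.2f}, p-value = {p_value:.4g}, reject H0: {reject_h0}")
```

If the response times are strongly skewed, a rank-based alternative such as `scipy.stats.kruskal` takes the same group arrays and may be more appropriate.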
Test 2: Whether the Type of Complaint or Service Requested and the Location are Related.
- State the Null Hypothesis (H0) and Alternate Hypothesis (H1).
- Perform the statistical test and provide the p-value.
- Interpret the results to decide whether to reject or fail to reject the Null Hypothesis. [10 Marks]
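For two categorical variables, a chi-square test of independence on a contingency table is a usual candidate; again the brief does not name a test, so treat this as a sketch. The counts are invented and "City" is assumed to be the location column. H0 is that complaint type and location are independent; H1 is that they are related.

```python
import pandas as pd
from scipy import stats

# Made-up records: 40 "Noise" and 40 "Parking" complaints across two cities.
df = pd.DataFrame({
    "Complaint Type": ["Noise"] * 40 + ["Parking"] * 40,
    "City": ["BROOKLYN"] * 30 + ["BRONX"] * 10 + ["BROOKLYN"] * 10 + ["BRONX"] * 30,
})

# Observed counts of each complaint type in each city.
contingency = pd.crosstab(df["Complaint Type"], df["City"])
print(contingency)

# Chi-square test of independence on the observed counts.
chi2, p_value, dof, expected = stats.chi2_contingency(contingency)

alpha = 0.05  # assumed significance level
reject_h0 = p_value < alpha
print(f"chi2 = {chi2:.2f}, dof = {dof}, p-value = {p_value:.4g}, reject H0: {reject_h0}")
```

With many complaint types and locations, some expected counts may be very small; inspecting the returned `expected` array helps judge whether the chi-square approximation is reasonable.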
9. Document Organisation
- Report Structure [5 Marks]