771762 Assignment: Big Data and Data Mining Project Report 2025

Published: 17 Jun, 2025
Category: Assignment
Subject: Computer Science
University: ___
Module Title: 771762 Big Data and Data Mining
Word Count: 2000 words
Assignment Format: Report
Due date: Thursday 28th of August 2025

771762 Context

Unlike our prior module, Fundamentals of Data Science, this assignment is based on real-world data from two sources for two separate tasks. First, we would like you to focus on road traffic accidents in 2019; you will have already encountered this database for the presentation assignment. Second, we will provide you with social network data from the Stanford Network Analysis Platform (SNAP), which was obtained via Facebook. This assignment is a chance to test your skills against real-world data by analysing it, then understanding and interpreting the results of that analysis to produce meaningful conclusions based on what has been taught in this module.

Project Report Background Information.

The project report is split into two parts, each with a set of tasks for you to complete:

  1. The first part deals with real accident data from 2019 in the ‘accident_data_v1.0.0_2023.db’ database. You will collate, process and analyse these data, then interpret your findings and conclude with recommendations for policy changes or interventions that the UK Government should make.
  2. The second part deals with real Facebook edge data from a series of egonets produced by SNAP, compiled into ‘facebook_combined.txt’. You will explore the structure and properties of a social network using this dataset. The objective here is to analyse the network's structure, identify communities and provide insights from a network science perspective, rather than to generate policy recommendations as in the first part.

Again, in this assignment we will be using the data from 2019 as it represents a very complete sample with a lot of ancillary data available. We have uploaded the relevant data to Canvas here.

771762 Tasks

(a) - Report on Accidents

Imagine that you are a data scientist working for the Department for Transport (DfT) confronted with the accident database detailed earlier.

Your task is to advise the DfT (and any other relevant UK Gov department) on the policy changes/interventions required to improve road safety, as well as to create a model that would predict such accidents and the injuries that people may incur.

Main Objective: You will write a formal report detailing your analysis and results (with visualisations), along with concluding remarks that recommend changes in policy or interventions, either locally or nationally, depending on what the data tell us.

Alongside the main objective, you should also aim to address these questions (at minimum) in the project report using accidents from 2019 only (although there may be occasional use of ‘historical data’ present in the dataset):

  1. Are there any particular hours of the day, and days of the week, on which these accidents are significantly more likely to occur? If there are, using your data analysis, what possible reasons could help explain such a pattern?
  2. For motorbikes, are there any particular hours of the day, and days of the week, on which these accidents are significantly more likely to occur? We suggest a focus on comparisons between Motorcycles of 125cc and under; Motorcycles over 125cc and up to 500cc; Motorcycles over 500cc. If there are, using your data analysis, what reasons could explain why one category of motorcycle has more accidents than the others on certain days of the week and at certain times of day?
  3. For pedestrians involved in accidents, are there any particular hours of the day, and days of the week, on which they are significantly more likely to be involved in said accidents? If there are, using your data analysis, what could explain the patterns you observe?
  4. Using the apriori algorithm, explore the impact of selected variables on the accident severity.
  5. Identify accidents in our region: Kingston upon Hull, Humberside, and the East Riding of Yorkshire ONLY. You can do this by filtering on the LSOA, the police region, or another method if you can find one. Run clustering algorithms on these data and analyse the results. What do these clusters reveal about the distribution of accidents across our region?
  6. Choose three policing areas by filtering the data using the "police_force" column, then create a separate time series model for each policing area chosen to predict weekly accident counts for 2019 based on historical data from 2017 to 2018. How do these predictions compare for each of the chosen policing areas with the actual 2019 accident data?
  7. Identify the top thirty (30) Lower Layer Super Output Areas (LSOAs) for the City of Hull that recorded the highest number of road accidents in the first three months of 2019. Then aggregate the records for these thirty areas and fit a time series model to the data for the first six months of 2019 (e.g., January to June), so that you can forecast the daily accident occurrences in these high-incident areas for the following month (e.g., July).
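
For questions 1 to 3, a minimal sketch of the hour-of-day and day-of-week counts might look like the following. It assumes a table named accident with columns accident_year, date (DD/MM/YYYY) and time (HH:MM); these names are illustrative assumptions, not confirmed by this brief, so check the actual schema of the supplied database first. The demo_connection helper is hypothetical and only builds a tiny in-memory table with made-up rows for illustration.

```python
import sqlite3
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def count_by_weekday_hour(conn, year=2019):
    """Count accidents per (weekday name, hour) for one year.

    ASSUMPTION: a table `accident` with columns `accident_year`,
    `date` (DD/MM/YYYY) and `time` (HH:MM). Adjust to the real schema.
    """
    rows = conn.execute(
        "SELECT date, time FROM accident WHERE accident_year = ?", (year,)
    )
    counts = Counter()
    for date_str, time_str in rows:
        if not date_str or not time_str:
            continue  # skip records with a missing date or time
        stamp = datetime.strptime(f"{date_str} {time_str}", "%d/%m/%Y %H:%M")
        counts[(WEEKDAYS[stamp.weekday()], stamp.hour)] += 1
    return counts


def demo_connection():
    """Hypothetical helper: build an in-memory table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accident (accident_year INTEGER, date TEXT, time TEXT)"
    )
    conn.executemany(
        "INSERT INTO accident VALUES (?, ?, ?)",
        [
            (2019, "04/01/2019", "08:15"),  # a Friday
            (2019, "04/01/2019", "08:45"),  # same Friday, same hour
            (2019, "05/01/2019", "17:30"),  # a Saturday
            (2018, "05/01/2018", "17:30"),  # excluded: not 2019
        ],
    )
    return conn


if __name__ == "__main__":
    for key, n in count_by_weekday_hour(demo_connection()).most_common():
        print(key, n)
```

To run this against the supplied data, replace demo_connection() with sqlite3.connect pointed at the database file. Raw counts alone do not establish significance, so you would still need an appropriate statistical test to support any claim that a peak is significant.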

(b) - Social Network Analysis

Main Objective: You will write an analysis of the outcomes of constructing a social network from the edge information in ‘facebook_combined.txt’. You will explore the structure and properties of the social network in this dataset, identify any communities that may be present, and explain the implications and meaning of this analysis within a network science context. This task will be shorter than task (a).
Alongside the Main Objective for task (b), please complete the tasks below:

  1. Construct a social network using the provided data and visualise the network, then provide the basic network characteristics, including numbers of nodes and edges, network density, average degree.
  2. Calculate the edge centrality of this network and plot the distribution of the edge centrality values.
  3. Use two community detection algorithms to detect the clusters/communities within this social network, then compare the results (the number of clusters and the number of nodes in each cluster).
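
The basic characteristics in point 1 can be computed directly from the edge list. The sketch below assumes each line of the file holds two whitespace-separated integer node IDs and treats the network as undirected; both are assumptions to verify against the supplied file. The function names are illustrative. For n nodes and m edges it uses density = 2m / (n(n - 1)) and average degree = 2m / n.

```python
def read_edges(lines):
    """Parse an undirected edge list, ignoring blanks, self-loops and duplicates.

    ASSUMPTION: each line is "u v" with integer node IDs.
    """
    edges = set()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        u, v = int(parts[0]), int(parts[1])
        if u != v:
            # store each undirected edge once, smaller ID first
            edges.add((min(u, v), max(u, v)))
    return edges


def basic_stats(edges):
    """Return node count, edge count, density and average degree."""
    nodes = {node for edge in edges for node in edge}
    n, m = len(nodes), len(edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    average_degree = 2 * m / n if n else 0.0
    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "average_degree": average_degree,
    }


if __name__ == "__main__":
    # Toy example: a triangle (0, 1, 2) with one extra node attached to 2.
    toy = ["0 1", "1 2", "0 2", "2 3"]
    print(basic_stats(read_edges(toy)))
```

With the real data, you could pass an open file handle for ‘facebook_combined.txt’ to read_edges, since it accepts any iterable of lines. Note that nodes with no edges cannot appear in an edge list, so they are not counted here.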

771762 Report Structures (Suggested Approach).

Your structure for (a).
Please structure your report as follows.

  1. Short introduction. No more than a few sentences introducing the dataset and the problems that you seek to solve using it.
  2. Analysis and Results. Present an analysis of the data, including any visualizations, that addresses questions 1-7 above. This should be broken down into analysing when, where, and under what conditions accidents happen, as per the questions above.
  3. Predictions and Discussion. Present working models that address points 6 and 7 in Task (a) above, and that can predict the conditions under which accidents are most likely to occur and the severity of injuries sustained given those conditions.
  4. Recommendations. What recommendations can be made to government agencies based on this data and your analysis to improve safety? Keep this to your top 4 or 5 bullet points.

Your structure for (b).

Please structure your report as follows.

  1. Short introduction. No more than a few sentences introducing the dataset and how you will construct the social network with the provided data.
  2. Analysis and Results. Present an analysis of the constructed social network, detailing its structure and the communities including any visualizations. Do not forget to justify, by referencing where appropriate, any method or algorithm you use for those objectives outlined in task (b) (i.e. for each point 1-3).
  3. Discussion. Discuss the results of your analysis and the implications/interpretations/meaning behind what you find.
  4. Conclusions. Keep this to a few bullet points that highlight the key scientific outcomes from constructing this social network.

Grading.

The following grading rubric (on the next page) will be applied to your supplied answers. The total number of marks available for this assignment is 100. Please note that submitting lots of data is unlikely to attract many marks. Instead, we want to see fully reasoned analyses supported by evidence derived from the data supplied.

Given the word count, it is essential to be concise in your answers. It is strongly suggested that you illustrate your answers with appropriate diagrams (i.e. visualisations) or appendices of example calculations. Further, you might need to read around the topic and undertake library/online research to help with this assignment to achieve the highest grades.

Please upload:

  1. Your report for Tasks (a) and (b).
  2. The code you wrote to produce the results and/or visualisations used in the assignment.
