CSCI312 Big Data Management Assignment 2 Questions | UOW

Published: 21 Apr, 2025
Category: Assignment | Subject: Computer Science
University: University of Wollongong | Module Title: CSCI312 Big Data Management

CSCI312 Assignment 2

Scope

The objectives of Assignment 2 include conceptual modelling of a data warehouse, implementation of 0NF tables in HQL, implementation of external tables in HQL, and querying a data cube. This assignment is due on Sunday, 11 May 2025, 8:00 pm (sharp) Singapore Time (SGT). It is worth 20% of the total evaluation of the subject. The assignment consists of 4 tasks, and the specification of each task starts on a new page.

A policy regarding late submissions is included in the subject outline. Only one submission of Assignment 2 is accepted per student. A late submission penalty (5% of the total mark) is applied for every 24 hours late. A submission with an incorrect file attached is treated as a valid submission and is evaluated as submitted, with all the consequences that follow.

All files left on Moodle in the state “Draft (not submitted)” will NOT be evaluated. An implementation that does not compile or run cleanly because of one or more syntax and/or runtime errors scores no marks.

The second assignment is an individual assignment, and all its tasks are expected to be solved individually, without any cooperation with other students. However, you may declare in the submission comments that a particular component or task of this assignment has been implemented in cooperation with another student. In such a case, the marks for that task or component may be shared with the other student. In all other cases, plagiarism will result in a FAIL grade being recorded for the entire assignment. If you have any doubts or questions, please consult your lecturer or tutor during laboratory/tutorial classes or via email.

CSCI312 Task 1 

Intuitive design of a data cube from a conceptual schema of an operational database

Consider the following conceptual schema of an operational database owned by a multinational real estate company. The database contains information about the real estate properties offered for sale, owners of the properties, potential buyers who are interested in the properties and real estate agents involved in the selling of the properties.

[Figure: conceptual schema of the operational database; image not included]

Whenever a property is put on the market by an owner, a description of the property is entered into the operational database. Whenever a property is purchased, its description is removed from the operational database.

The real estate company would like to create a data warehouse to keep information about the finalised real estate transactions, properties involved in the transactions, sellers/owners, and agents involved in the real estate transactions. The real estate company would like to use a data warehouse to implement the following classes of analytical applications.

  • Find the total number of real estate properties sold per month, year, street, city, country, and agent involved.
  • Find the average asking price of real estate properties sold per month, year, street, city, country, and agent involved.
  • Find the average final price of real estate properties sold per month, year, street, city, country, and agent involved.
  • Find the average period on the market of real estate properties sold per month, year, street, city, country, and agent involved.
  • Find the total number of times each real estate property has been sold in a given period.
  • Find the total number of buyers interested in purchases of real estate properties sold per day, month, year, street, city, country, and agent involved.

Note, the operational database does not contain all the information necessary to implement the classes of applications listed above. Additional information must be added when data is transferred from an operational database to a data warehouse.
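As a rough illustration of how measures such as those listed above might be derived, the Python sketch below aggregates a few invented sales by (year, month, city). Every field name and value in `SALES` is a made-up assumption, not part of the assignment data; the sale date and final price stand in for the kind of additional information that would have to be captured when data moves to the data warehouse.

```python
from collections import defaultdict
from datetime import date

# Hypothetical finalised transactions; all names and values are invented.
SALES = [
    {"city": "Sydney", "listed": date(2024, 1, 10), "sold": date(2024, 3, 10),
     "asking": 900_000, "final": 880_000},
    {"city": "Sydney", "listed": date(2024, 2, 9), "sold": date(2024, 3, 20),
     "asking": 950_000, "final": 920_000},
    {"city": "Wollongong", "listed": date(2024, 3, 1), "sold": date(2024, 3, 31),
     "asking": 640_000, "final": 650_000},
]


def summarise(sales):
    """Group sales by (year, month, city) of the sale date and aggregate."""
    groups = defaultdict(list)
    for sale in sales:
        key = (sale["sold"].year, sale["sold"].month, sale["city"])
        groups[key].append(sale)

    cube = {}
    for key, rows in groups.items():
        count = len(rows)
        cube[key] = {
            # Count of fact rows in the cell.
            "properties_sold": count,
            # Average of a measure stored directly on the fact.
            "avg_final_price": sum(r["final"] for r in rows) / count,
            # Derived measure: period on the market = sale date - listing date.
            "avg_days_on_market": sum(
                (r["sold"] - r["listed"]).days for r in rows
            ) / count,
        }
    return cube


cube = summarise(SALES)
```

Each key of `cube` is one cell of a small (year, month, city) cube. Rolling up to a coarser level, for example (year, city), would only require dropping the month from the grouping key.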

  (1) Using the short explanation of the database domain and the conceptual schema provided above, identify a suitable data cube that the multinational real estate company could implement for its data warehouse. In your specification, list the facts, measures, dimension names, and their corresponding hierarchies. For each measure, include an explanation of how it is derived.
  (2) Pick any three dimensions from the data cube found in the previous step, at least 4 values in each dimension, and one measure. Use them to draw a sample three-dimensional data cube in a perspective view similar to the view included in the presentation 09 Data Warehouse Concepts, slide 6.

Deliverables

A file solution1.pdf that contains

  • a specification of a data cube as a list of facts, measures, dimensions, and hierarchies obtained as a result of task (1),
  • a perspective drawing of a three-dimensional data cube as a result of task (2).

CSCI312 Task 2 

Conceptual modelling of a data warehouse

An objective of this task is to create a conceptual schema of a sample data warehouse domain described below. Read and analyse the following specification of a data warehouse domain.

A person is represented as a patient, a medical worker, or an administrative worker. Medical and administrative workers work in medical facilities that have a name, an address, and optionally a specialisation. Each medical worker is described as a unique staff member at a facility, including name, address, and phone number.

A patient visits a medical facility for the treatment of a health problem. Each service involves a patient, a medical worker, and an administrative worker. The service can be a diagnosis, treatment, or checkup. A description and date of each service are recorded. Time spent on service and the costs are recorded as well.

A patient is eligible for his or her company health care benefits. Patient data includes name, ID number (social security number), address (street, city, state, zip), and phone.

A medical worker must hold one or more credentials that are granted to work in a particular medical facility. Doctors are allowed to deliver diagnoses and give treatment based on their specialisation. Paramedics are allowed to deliver only emergency diagnosis and treatment for any type of life-threatening problem. Nurses do not deliver diagnoses, but they do participate in treatment, particularly if the patient must be prepared for surgery or remain at the facility overnight.

The administrative workers are concerned with personnel needs and assignments. Each medical worker has at most one assignment at a facility. Several administrative workers can be assigned to one assignment.

Medical facilities are located in different suburbs of different cities. A medical facility is uniquely identified by an address.

A data warehouse must be designed so that the following classes of applications can be implemented easily. The management of the medical facilities would like to obtain the following information from the data warehouse:

  • the total number of medical services performed per medical facility, per year, per month, per day, per city, and per medical worker,
  • the total length of medical services per medical facility, per year, per month, per day, per city, and per medical worker,
  • the average length of medical services per medical facility, per year, per month, per day, per city, and per medical worker,
  • the total number of doctors/paramedics/nurses involved in medical services per year, per month, per day, per medical facility, and per city,
  • the average time spent on medical services per year, month, and day,
  • the total costs of medical services per year, month, day, medical facility, and city.

To draw a conceptual schema, use the graphical notation introduced in Presentation 11: Conceptual Data Warehouse Design.

Follow the steps below to create a conceptual schema for a sample data warehouse domain:

  • Identify a fact entity and the measures that describe it.
  • Identify the dimensions.
  • Identify hierarchies within each dimension.
  • Identify the descriptions (attributes) of all entity types.
  • Draw the conceptual schema.

You can use the UMLet diagram drawing tool and select the Conceptual Modelling notation. The notation selection is available in the top-right corner of UMLet’s main menu. UMLet version 14.3 can be downloaded from the subject’s Moodle website under the WEB LINKS section. Alternatively, a neat hand-drawn diagram is also acceptable.

Deliverables

A file solution2.pdf with a drawing of a conceptual schema of a sample data warehouse domain.

CSCI312 Task 3 

Implementation of a table with a complex column type (0NF table) in Hive

Assume that we have a collection of semi-structured data with information about the employees (unique employee number and full name), the projects they are assigned to (project name and percentage of involvement), and their programming skills (the names of known programming languages). Some of the employees are on leave and are not involved in any project. Also, some of the employees do not know any programming languages.

A few sample records from the collection are listed below.

007|James Bond|DB/3:30,Oracle:25,SQL-2022:100|Java,C,C++
008|Harry Potter|DB/3:70,Oracle:75|
010|Robin Banks| |C,Rust
009|Robin Hood| |
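To make the nested shape of these records explicit, here is a minimal Python sketch that parses one record into an employee number, a name, a mapping from project name to percentage, and a list of languages. The function name `parse_record` and the output field names are illustrative assumptions. The resulting shape is the kind of structure that a table with complex column types holds, for instance with a map-typed column for projects and an array-typed column for skills, which is one possible design and not the only one.

```python
def parse_record(line):
    """Split one pipe-delimited record into number, name, projects, skills."""
    number, name, projects_field, skills_field = line.split("|")

    # Projects look like "DB/3:30,Oracle:25"; a blank field means no projects.
    projects = {}
    for item in projects_field.strip().split(","):
        if item:
            # Split on the last colon so the project name is kept whole.
            project, _, percent = item.rpartition(":")
            projects[project] = int(percent)

    # Skills look like "Java,C,C++"; a blank field means no known languages.
    skills = [skill for skill in skills_field.strip().split(",") if skill]

    # The employee number stays a string so leading zeros are preserved.
    return {"number": number, "name": name,
            "projects": projects, "skills": skills}


SAMPLE = [
    "007|James Bond|DB/3:30,Oracle:25,SQL-2022:100|Java,C,C++",
    "008|Harry Potter|DB/3:70,Oracle:75|",
    "010|Robin Banks| |C,Rust",
    "009|Robin Hood| |",
]
employees = [parse_record(line) for line in SAMPLE]
```

An employee on leave ends up with an empty mapping, and an employee without programming skills ends up with an empty list, which mirrors the two optional parts of each record.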

  • Implement an HQL script solution3.hql that creates an internal relational table to store information about the employees, the projects they are assigned to (project name and percentage of involvement), and their programming skills.
  • Include in the script INSERT statements that load sample data into the table. Insert at least 5 rows into the relational table created in the previous step. Two employees must participate in a few projects and must know a few programming languages. One employee must participate in a few projects and must not know any programming languages. One employee must know a few programming languages and must not participate in any projects. One employee must not know any programming languages and must not participate in any projects.
  • Include in the script SELECT statements that list the contents of the table. When ready, use the command line interface beeline to process the script solution3.hql and to save a report from the processing in a file solution3.rpt. If the processing of the script returns errors, you must eliminate them.

Deliverables

A file solution3.rpt with a report from the processing of the HQL script solution3.hql. The report MUST NOT include any errors, and the report must list all SQL statements processed.

CSCI312 Task 4 

Implementation of a data warehouse as a collection of external tables in Hive

Consider the following two-dimensional data cube.

[Figure: two-dimensional data cube of parts and suppliers; image not included]

The data cube contains information about the parts that can be shipped by the suppliers. Download and unzip the file task4.zip. You should obtain a folder task4 with the following files: part.tbl, supplier.tbl, partsupp.tbl.

Use an editor to examine the contents of *.tbl files. Note that the contents of the files can be loaded into the relational tables obtained from the transformation of the two-dimensional data cube given above into the relational tables PART, SUPPLIER, and PARTSUPP.

Transfer the files into HDFS.

Implement an HQL script solution4.hql that creates the external tables obtained from the logical design step performed earlier. The external tables must be defined over the files transferred to HDFS in the previous step. Note that the header in each *.tbl file must be removed before creating the external tables.
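The header removal can be done in any editor or scripting tool. As one hedged possibility, the Python sketch below copies a file without its first line and returns the number of remaining rows. It assumes the header is exactly the first line of the file; the helper name `strip_header`, the demo file names, and the pipe-delimited demo contents are all invented, since the brief does not show the real layout of the *.tbl files.

```python
import tempfile
from pathlib import Path


def strip_header(src, dst):
    """Copy src to dst without its first line; return the number of data rows."""
    lines = Path(src).read_text().splitlines(keepends=True)
    # Assumption: the header occupies exactly the first line.
    Path(dst).write_text("".join(lines[1:]))
    return len(lines) - 1 if lines else 0


# Demo on a throwaway file with invented contents.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.tbl"
    dst = Path(tmp) / "sample_noheader.tbl"
    src.write_text("KEY|NAME\n1|bolt\n2|nut\n")
    row_count = strip_header(src, dst)
    cleaned = dst.read_text()
```

The returned row count can serve as a cross-check against the total number of rows later reported for the corresponding external table.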

Include in the script solution4.hql SELECT statements that return any 5 rows from each of the external tables implemented in the previous step, and the total number of rows in each table.

When ready, use the command line interface beeline to process the script solution4.hql and to save a report from the processing in a file solution4.rpt.

Deliverables

A file solution4.rpt with a report from the processing of the HQL script solution4.hql.

Submission of Assignment 2

Note that you have only one submission, so make sure that you submit the correct files with the correct contents. Please submit an Academic Consideration in SOLS if an extension (of at most 1 week) is required.

Please combine all files into a single zipped file (A2-solutions.zip). Please submit the zipped file through Moodle in the following way:
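Any zip tool will do for this step. As a minimal sketch, the Python snippet below builds the archive with the standard `zipfile` module and refuses to proceed if a deliverable is missing. The list of files is taken from the Deliverables sections above; whether anything else should be included is not stated in the brief, so treat the list as an assumption. The helper name `build_archive` is illustrative, and the demo uses placeholder files in a temporary folder.

```python
import tempfile
import zipfile
from pathlib import Path

# Deliverables named in Tasks 1 to 4 (assumed to be the complete list).
DELIVERABLES = ["solution1.pdf", "solution2.pdf", "solution3.rpt", "solution4.rpt"]


def build_archive(folder, archive_name="A2-solutions.zip"):
    """Zip the deliverables found in folder; fail if any of them is missing."""
    folder = Path(folder)
    missing = [name for name in DELIVERABLES if not (folder / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing deliverables: {missing}")

    archive = folder / archive_name
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in DELIVERABLES:
            # arcname keeps the files at the top level of the archive.
            zf.write(folder / name, arcname=name)
    return archive


# Demo with placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in DELIVERABLES:
        (Path(tmp) / name).write_text("placeholder")
    with zipfile.ZipFile(build_archive(tmp)) as zf:
        archived = sorted(zf.namelist())
```

Checking for missing files before zipping matters here because only one submission is allowed.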

  • To log in, use the Login link located in the top right corner of the Web page or in the middle of the bottom of the Web page
  • When logged in, select the site ISIT312 (SP225) Big Data Management
  • Scroll down to the section SUBMISSIONS
  • Click on the Assignment 2 link
  • Click on the Add Submission button
  • Move the zipped file A2-solutions.zip into the area where you can drag and drop files to add them. You can also use the link Add…
  • Click on the button to save changes
  • Click on the button Submit assignment
  • Click on the checkbox with the text attached: By checking this box, I confirm that this submission is my work, … to confirm authorship of your submission
  • Click on the button Continue
