Category | Assignment | Subject | Computer Science |
---|---|---|---|
University | University of Wollongong | Module Title | CSCI312 Big Data Management |
The objectives of Assignment 2 include conceptual modelling of a data warehouse, implementation of 0NF tables in HQL, implementation of external tables in HQL, and querying a data cube. This assignment is due on Sunday, 11 May 2025, 8:00 pm (sharp) Singapore Time (SGT). This assignment is worth 20% of the total evaluation of the subject. The assignment consists of 4 tasks, and the specification of each task starts on a new page.
A policy regarding late submissions is included in the subject outline. Only one submission of Assignment 2 is allowed, and only one submission per student is accepted. A late submission penalty (5% of the total mark) will be applied for every 24 hours late. A submission that contains an incorrect file attached is treated as a correct submission, with all consequences coming from the evaluation of the file attached.
All files left on Moodle in the state “Draft (not submitted)” will NOT be evaluated. An implementation that does not compile or run correctly due to one or more syntactical and/or runtime errors scores no marks.
The second assignment is an individual assignment, and it is expected that all its tasks will be solved individually without any cooperation with the other students. However, it is allowed to declare in the submission comments that a particular component or task of this assignment has been implemented in cooperation with another student. In such a case, evaluation of a task or component may be shared with another student. In all other cases, plagiarism will result in a FAIL grade being recorded for the entire assignment. If you have any doubts, questions, etc., please consult your lecturer or tutor during laboratory/tutorial classes or via email.
Intuitive design of a data cube from a conceptual schema of an operational database
Consider the following conceptual schema of an operational database owned by a multinational real estate company. The database contains information about the real estate properties offered for sale, owners of the properties, potential buyers who are interested in the properties and real estate agents involved in the selling of the properties.
Whenever a property is put on the market by an owner, a description of the property is entered into an operational database. Whenever a property is purchased, its description is removed from an operational database.
The real estate company would like to create a data warehouse to keep information about the finalised real estate transactions, properties involved in the transactions, sellers/owners, and agents involved in the real estate transactions. The real estate company would like to use a data warehouse to implement the following classes of analytical applications.
Note that the operational database does not contain all the information necessary to implement the classes of applications listed above. Additional information must be added when data is transferred from an operational database to a data warehouse.
Deliverables
A file solution1.pdf that contains
Conceptual modelling of a data warehouse
An objective of this task is to create a conceptual schema of a sample data warehouse domain described below. Read and analyse the following specification of a data warehouse domain.
A person is represented as a patient, a medical worker, or an administrative worker. Medical and administrative workers work in medical facilities that have a name, address, and possibly (not obligatory) specialisation. Each medical worker is described as a unique staff member at a facility, including name, address, and phone number.
A patient visits a medical facility for the treatment of a health problem. Each service involves a patient, a medical worker, and an administrative worker. The service can be a diagnosis, treatment, or checkup. A description and date of each service are recorded. Time spent on service and the costs are recorded as well.
A patient is eligible for his or her company health care benefits. Patient data includes name, ID number (social security number), address (street, city, state, zip), and phone.
A medical worker must hold one or more credentials that are granted to work in a particular medical facility. Doctors are allowed to deliver diagnoses and give treatment based on their specialisation. Paramedics are allowed to deliver only emergency diagnosis and treatment for any type of life-threatening problem. Nurses do not deliver diagnoses, but they do participate in treatment, particularly if the patient must be prepared for surgery or remain at the facility overnight.
The administrative workers are concerned with personnel needs and assignments. Each medical worker has at most one assignment at a facility. Several administrative workers can be assigned to one assignment.
Medical facilities are located in different suburbs of different cities. A medical facility is uniquely identified by an address.
Follow the steps below to create a conceptual schema for a sample data warehouse domain:
You can use the UMLet diagram drawing tool and select the Conceptual Modelling notation. The notation selection is available in the top-right corner of UMLet’s main menu. UMLet version 14.3 can be downloaded from the subject’s Moodle website under the WEB LINKS section. Alternatively, a neat hand-drawn diagram is also acceptable.
Deliverables
A file solution2.pdf with a drawing of a conceptual schema of a sample data warehouse domain.
Implementation of a table with a complex column type (0NF table) in Hive
Assume that we have a collection of semi-structured data with information about the employees (unique employee number and full name), the projects they are assigned to (project name and percentage of involvement) and their programming skills (the names of known programming languages). Some of the employees are on leave and they are not involved in any project. Also, some of the employees do not know any programming languages.
A few sample records from the collection are listed below.
007|James Bond|DB/3:30,Oracle:25,SQL-2022:100|Java,C,C++
008|Harry Potter|DB/3:70,Oracle:75|
010|Robin Banks| |C,Rust
009|Robin Hood| |
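To make the structure of these records concrete, here is a minimal Python sketch that parses them. It is not HQL and not the required solution. It only illustrates the three delimiter levels visible in the sample: `|` between fields, `,` between items of a collection, and `:` between a project name and its percentage. In a Hive table these nested fields would correspond roughly to a map column and an array column. The names `Employee`, `enumber`, `full_name`, `projects`, `skills`, and `parse_employee` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, NamedTuple


class Employee(NamedTuple):
    """One nested (0NF-style) record; all field names are hypothetical."""
    enumber: str               # unique employee number, kept as text ("007")
    full_name: str
    projects: Dict[str, int]   # project name -> percentage of involvement
    skills: List[str]          # names of known programming languages


def parse_employee(line: str) -> Employee:
    # Top-level fields are separated by '|'; stray blanks are stripped.
    enumber, full_name, projects_raw, skills_raw = (
        field.strip() for field in line.split("|")
    )
    projects: Dict[str, int] = {}
    if projects_raw:  # blank for employees on leave
        for item in projects_raw.split(","):
            # Split on the last ':' so the project name is kept whole.
            name, percent = item.rsplit(":", 1)
            projects[name] = int(percent)
    # Blank for employees who know no programming languages.
    skills = skills_raw.split(",") if skills_raw else []
    return Employee(enumber, full_name, projects, skills)


SAMPLE = [
    "007|James Bond|DB/3:30,Oracle:25,SQL-2022:100|Java,C,C++",
    "008|Harry Potter|DB/3:70,Oracle:75|",
    "010|Robin Banks| |C,Rust",
    "009|Robin Hood| |",
]

employees = [parse_employee(line) for line in SAMPLE]
for employee in employees:
    print(employee)
```

Note that employees on leave and employees without programming skills end up with an empty map and an empty list respectively, rather than being dropped. Any table design for this data needs to accommodate both cases.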
Deliverables
A file solution3.rpt with a report from the processing of the HQL script solution3.hql. The report MUST NOT include any errors, and the report must list all SQL statements processed.
Implementation of a data warehouse as a collection of external tables in Hive
Consider the following two-dimensional data cube.
The data cube contains information about the parts that can be shipped by the suppliers. Download and unzip the file task4.zip. You should obtain a folder task4 with the following files: part.tbl, supplier.tbl, partsupp.tbl.
Use an editor to examine the contents of *.tbl files. Note that the contents of the files can be loaded into the relational tables obtained from the transformation of the two-dimensional data cube given above into the relational tables PART, SUPPLIER, and PARTSUPP.
Transfer the files into HDFS.
Implement an HQL script solution4.hql that creates the external tables obtained from a step of logical design performed earlier. The external tables must overlap with the files transferred to HDFS in the previous step. Note that a header in each *.tbl file must be removed before creating the external tables.
Include in the script solution4.hql SELECT statements that return any 5 rows from each of the external tables implemented in the previous step, and the total number of rows included in each table.
When ready, use a command line interface, beeline, to process a script solution4.hql and to save a report from processing in a file solution4.rpt.
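The steps above require the header of each *.tbl file to be removed before the external tables are created. One possible way to do this locally, before the files are transferred to HDFS, is sketched below in Python. This is only an illustration under stated assumptions: it assumes the header occupies exactly the first line of the file, and the function name `strip_header`, the output file name `part_noheader.tbl`, and the demo column names and rows are all hypothetical rather than taken from task4.zip.

```python
import tempfile
from pathlib import Path


def strip_header(src: Path, dst: Path) -> int:
    """Copy src to dst without its first line; return the number of data rows written."""
    rows = 0
    with src.open("r", encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        next(fin, None)  # skip the header line (no error if the file is empty)
        for line in fin:
            fout.write(line)
            rows += 1
    return rows


# Demo on a throwaway file; the header and the two rows are invented.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "part.tbl"
    dst = Path(tmp) / "part_noheader.tbl"
    src.write_text("P_PARTKEY|P_NAME\n1|bolt\n2|nut\n", encoding="utf-8")
    demo_rows = strip_header(src, dst)
    demo_output = dst.read_text(encoding="utf-8")
    print(demo_rows)
    print(demo_output, end="")
```

The row count returned by the function can serve as a sanity check against the total number of rows that the SELECT statements report for the corresponding external table.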
Deliverables
A file solution4.rpt with a report from the processing of the HQL script solution4.hql.
Submission of Assignment 2
Note that you have only one submission, so make sure that you submit the correct files with the correct contents. Please submit an Academic Consideration in SOLS if an extension (at most 1 week) is required.
Please combine all files into a single zipped file (A2-solutions.zip). Please submit the zipped file through Moodle in the following way: