Category: Assignment
Subject: Business
University: University of Surrey
Module Title: MANM526 Statistics and Econometrics
Essential Information:
- Students must prepare the final project individually and submit it via SurreyLearn before 4:00 pm on Thursday, 2 January 2025. The assessment accounts for 100% of the final grade for this module.
- The project involves applying econometric analysis to a real-world dataset using the Stata statistical software. The expected level of analysis is that covered in the lectures and lab sessions.
- Stata is available on all FABSS lab and library computers. It can also be accessed remotely via workspaces.surrey.ac.uk. If you need help with remote access, please contact IT Services (itservicedesk@surrey.ac.uk).
- All documents and files (e.g., Dataset, Assessment Brief, Sample Report) related to the final project are available from SurreyLearn: Course Materials - Assessment Information.
- Please ensure you are familiar with the assessment regulations and submission requirements.
Context
The app industry is one of the most lucrative and fastest-growing sectors of the digital economy. Apps compete fiercely in app stores, such as the Apple iOS App Store, for users’ attention. Reaching the top of an app store chart is challenging but can greatly affect an app’s success. User engagement is essential to generating revenue, because engaged users are more likely to make in-app purchases while actively using the app. Examples include monthly subscriptions, new levels or characters in games, additional features in photo-editing apps, Pro version upgrades, and digital consumables. Active users also generate revenue by viewing third-party advertisements shown while they use the app. However, user behaviour varies across countries, and the composition of top charts can differ from one country to another. This project examines some of the determinants of app revenue, accounting for country effects.
MANM526 Data
The dataset contains variables for a sample of free apps in the iOS App Store, with data captured on 4 December 2022. Free apps are those that users can download and install without an upfront purchase. Consequently, free apps’ primary revenue sources are in-app purchases, supplemented by third-party advertising revenue. The apps in the sample are among the top 200 revenue-generating apps in four countries: Germany, the UK, China, and Japan.
The variables in the dataset are described below.
- monthly_revenue: average monthly revenue ($) in the given country
- monthly_downloads: average monthly downloads in the given country
- active_users: number of active users (those who download the app and use it frequently) in the given country
- rank: rank in the top chart in the given country (lower ranks mean higher on the chart)
- updates: a score from 1 (low) to 5 (high) measuring how frequently the app has been updated since its release
- app_id: app unique identifier
- in_app_purchases: indicates whether the app offers in-app purchases
- shows_ads: indicates whether the app shows third-party advertisements (advertisement strategy)
- country: the country where the app appears in the top chart (in this sample)
Other general variables are also included (app’s main category, app name, app’s publisher name and ID, app release date, app version, and operating system).
Note: You may encounter the error “matsize too small” during your analysis, depending on your computer’s available memory. If you do, run the following command and then continue your analysis: set matsize 1000
Content and Structure
Introduction
Provide a brief explanation of the project’s objective and methodology. This should cover the data, the definitions of the dependent, independent, and control variables, and the baseline model (described in the Main Regression Analysis section).
- The app’s monthly revenue, monthly downloads, and active users should be used in natural-log-transformed form in all analyses (descriptive statistics, graphs, regressions, etc.).
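The report itself must be produced in Stata, where this step is typically a generate command using the ln() function. The minimal Python sketch below only illustrates the transformation on one hypothetical record. The field names mirror the variable list above, the numbers are made up, and the add_logs helper is illustrative only.

```python
import math

# Variables the brief says must be analysed in natural-log form.
LOGGED_VARS = ("monthly_revenue", "monthly_downloads", "active_users")


def add_logs(record: dict) -> dict:
    """Return a copy of record with an ln_<name> field added for each variable in LOGGED_VARS."""
    out = dict(record)
    for name in LOGGED_VARS:
        value = record[name]
        if value <= 0:
            # The natural log is only defined for strictly positive values.
            raise ValueError(f"{name} must be positive to take its natural log, got {value}")
        out["ln_" + name] = math.log(value)
    return out


# Hypothetical app-country observation (made-up numbers, not taken from the dataset).
app = {"monthly_revenue": 250_000.0, "monthly_downloads": 40_000.0, "active_users": 120_000.0}
logged = add_logs(app)
for name in LOGGED_VARS:
    print(f"ln_{name} = {logged['ln_' + name]:.3f}")
```

The original fields are kept alongside the logged ones, which makes it easy to check a transformed value by exponentiating it back.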
Descriptive Analysis
- Provide a single two-way table of summary statistics for the dependent, independent, and control variables, covering each of the four country subsamples as well as the full sample. Briefly discuss the results.
- Apply an appropriate test to evaluate whether there is a statistically significant difference (at the 0.05 significance level) across countries in apps’ monthly downloads (logged). Briefly discuss the results. Can you show this difference visually with a graph?
- Apply an appropriate test to evaluate whether there is a statistically significant difference (at the 0.05 significance level) across countries in apps’ active users (logged). Briefly discuss the results.
- Provide the correlation matrix of the dependent, independent, and control variables. Briefly discuss the results.
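One common choice for comparing a logged variable across the four countries is a one-way ANOVA F-test. In Stata this is the oneway command; if the normality or equal-variance assumptions look doubtful, kwallis gives the nonparametric Kruskal-Wallis alternative. The minimal Python sketch below (Python 3.8+ for statistics.fmean) only shows what the F statistic measures: between-country variation in group means relative to within-country variation. The ln(monthly_downloads) values are made up and are not from the dataset.

```python
from statistics import fmean


def one_way_anova(groups: dict) -> tuple:
    """Return (F, df_between, df_within) for a one-way ANOVA across the given groups."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)

    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(values) * (fmean(values) - grand_mean) ** 2 for values in groups.values())
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((v - fmean(values)) ** 2 for values in groups.values() for v in values)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up ln(monthly_downloads) values for a few apps per country (illustration only).
toy = {
    "Germany": [9.1, 9.4, 8.8, 9.0],
    "UK": [9.6, 9.9, 9.3, 9.5],
    "China": [10.2, 10.5, 9.9, 10.1],
    "Japan": [9.0, 9.2, 8.7, 8.9],
}
f_stat, df1, df2 = one_way_anova(toy)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

To judge significance, compare F with the critical value of the F(k - 1, n - k) distribution at the 0.05 level, or read the p-value from Stata’s output. A significant result only says that at least one country mean differs. Identifying which pairs differ needs post-hoc comparisons, such as those from oneway’s bonferroni option.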
Exploratory Analysis
- Inspect the data graphically. For example, examine the distributions of variables, check for potential outliers, and make initial assessments of the relationships between the dependent and independent variables. The choice of graphs is at your discretion. The aim is a concise yet informative overview of the data that visually highlights patterns relevant to the project’s objective. You may choose from the suggested types of graphs (or others) that effectively describe various aspects of the data before running the regression analyses. Each reported graph needs a brief discussion.
- Plot a graph that shows the distribution of apps across categories and countries. That is, by looking at the graph, one should be able to see, for example, that most of the top Finance apps are in Germany (based on this sample). Briefly discuss the graph.
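The graph requested above is, in effect, a picture of a category-by-country frequency table. In Stata, tabulate with the category and country variables produces that table, and a grouped or stacked bar chart can display it. The minimal Python sketch below builds the same counts from hypothetical (category, country) pairs. The rows are made up, and the category variable name is an assumption, since this brief does not give the dataset’s actual name for the main-category variable.

```python
from collections import Counter

# Hypothetical (category, country) pairs, one per app-country observation; made up for illustration.
rows = [
    ("Finance", "Germany"), ("Finance", "Germany"), ("Finance", "UK"),
    ("Games", "China"), ("Games", "Japan"), ("Games", "Japan"),
    ("Photo & Video", "UK"),
]


def crosstab(pairs: list) -> dict:
    """Count observations per (category, country) cell; returns {category: {country: count}}."""
    counts = Counter(pairs)
    categories = sorted({cat for cat, _ in pairs})
    countries = sorted({cty for _, cty in pairs})
    # Counter returns 0 for cells that never occur, so every country appears in every row.
    return {cat: {cty: counts[(cat, cty)] for cty in countries} for cat in categories}


table = crosstab(rows)
for category, by_country in table.items():
    top = max(by_country, key=by_country.get)
    print(f"{category:<14} {by_country}  most apps in: {top}")
```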
Main Regression Analysis
Conduct an OLS regression to estimate the effects of monthly downloads (logged), active users (logged), rank, and country on the app’s monthly revenue (logged). Control for updates, main category, and advertisement strategy. This is the baseline model. Carefully interpret and discuss the results (e.g., the R-squared, the statistical significance of the coefficients, and their effect sizes).
- Briefly explain the relationships you find between the independent variables and the dependent variable in this context. For example, why might the app’s monthly downloads have a positive or negative effect (as indicated by your results) on the app’s monthly revenue?
- Overall, in which country do apps have the lowest monthly revenue, and in which do they seem to have the highest? Discuss, using a suitable graph to support your discussion.
- Discuss whether monthly downloads or active users have a stronger impact on the app’s monthly revenue.
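The following is one way to write the baseline model described above, offered as a sketch. Here i indexes an app-country observation. Country and Category are sets of dummy variables, each with one reference level omitted. ShowsAds is the advertisement-strategy indicator. The symbol names are illustrative, not prescribed by the brief.

```latex
$$
\begin{aligned}
\ln(\text{Revenue}_i) ={}& \beta_0 + \beta_1 \ln(\text{Downloads}_i) + \beta_2 \ln(\text{ActiveUsers}_i) + \beta_3\,\text{Rank}_i + \sum_{c \neq \text{ref}} \gamma_c\,\text{Country}_{ci} \\
&+ \beta_4\,\text{Updates}_i + \sum_{k \neq \text{ref}} \delta_k\,\text{Category}_{ki} + \beta_5\,\text{ShowsAds}_i + \varepsilon_i
\end{aligned}
$$
```

Because revenue, downloads, and active users are all logged, β1 and β2 are elasticities and are directly comparable in size. For example, a 1% increase in monthly downloads is associated with roughly a β1% change in monthly revenue, holding the other regressors fixed. Rank enters in levels, so a one-position change in rank is associated with roughly a 100·β3% change in revenue. A country coefficient γc implies that revenue differs by about 100·(exp(γc) - 1)% from the reference country. In Stata, the i. prefix (e.g., i.country) creates such dummy sets automatically.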
Modify the baseline model to evaluate the differential effect of active users on the app’s monthly revenue (logged) in different countries. Discuss the results and explain which country has the most lucrative active users. Use a suitable graph to enhance your discussion.
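The sketch below uses illustrative notation: i indexes an app-country observation, and Country and Category are dummy sets with one reference level omitted. One way to write the modified model is to add interactions between each non-reference country dummy and ln(ActiveUsers):

```latex
$$
\begin{aligned}
\ln(\text{Revenue}_i) ={}& \beta_0 + \beta_1 \ln(\text{Downloads}_i) + \beta_2 \ln(\text{ActiveUsers}_i) + \sum_{c \neq \text{ref}} \theta_c \left[\text{Country}_{ci} \times \ln(\text{ActiveUsers}_i)\right] \\
&+ \beta_3\,\text{Rank}_i + \sum_{c \neq \text{ref}} \gamma_c\,\text{Country}_{ci} + \beta_4\,\text{Updates}_i + \sum_{k \neq \text{ref}} \delta_k\,\text{Category}_{ki} + \beta_5\,\text{ShowsAds}_i + \varepsilon_i \\[6pt]
\frac{\partial \ln(\text{Revenue}_i)}{\partial \ln(\text{ActiveUsers}_i)} ={}& \begin{cases} \beta_2 & \text{reference country} \\ \beta_2 + \theta_c & \text{country } c \end{cases}
\end{aligned}
$$
```

The country with the largest β2 + θc has the highest revenue elasticity with respect to active users. In Stata, factor-variable notation such as i.country##c.ln_active_users fits these interactions. Running margins country, dydx(ln_active_users) followed by marginsplot then reports and plots the country-specific slopes. The name ln_active_users is a hypothetical name for the logged active-users variable.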
Diagnostics and Robustness Analysis
Apply diagnostic tests to the baseline model to check for potential heteroskedasticity, and apply an appropriate remedy if needed. Briefly compare the new results with the original baseline results.
Returning to the baseline model’s results, the effect of updates on the app’s monthly revenue may seem strange or even counterintuitive. This may be because update frequency has a nonlinear effect. Check for a possible quadratic effect of updates on the app’s monthly revenue (logged) and discuss the result. You can use graphs to support your discussion. Can you offer some explanations for this relationship?
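In Stata, factor-variable notation such as c.updates##c.updates adds the squared term. Running margins with at(updates=(1(1)5)) followed by marginsplot can then show the fitted pattern. With a quadratic term, the effect of updates is no longer a single coefficient. It changes with the updates score and switches sign at a turning point. The minimal Python sketch below computes both from two coefficients. The coefficient values are hypothetical placeholders, not estimates from this dataset.

```python
def updates_effect(b_lin: float, b_sq: float, updates: float) -> float:
    """Marginal effect of updates on ln(revenue) when the model contains b_lin*updates + b_sq*updates**2."""
    return b_lin + 2 * b_sq * updates


def turning_point(b_lin: float, b_sq: float) -> float:
    """Updates score at which the fitted quadratic peaks (b_sq < 0) or bottoms out (b_sq > 0)."""
    if b_sq == 0:
        raise ValueError("No turning point: with a zero squared term the effect is linear.")
    return -b_lin / (2 * b_sq)


# Hypothetical coefficients for updates and updates^2 (placeholders, not estimates).
b_lin, b_sq = 0.6, -0.1
for score in range(1, 6):  # the updates score runs from 1 to 5
    print(score, round(updates_effect(b_lin, b_sq, score), 3))

tp = turning_point(b_lin, b_sq)
print(f"turning point at updates = {tp:.2f}; inside observed 1-5 range: {1 <= tp <= 5}")
```

If the turning point falls outside the observed range of 1 to 5, the relationship is effectively monotonic over the data. In that case the squared term mainly captures curvature.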
- Explain potential endogeneity problems in the baseline model. Discuss this specifically related to the model (not a generic explanation or definition of endogeneity).
- How can you use other variables available in the dataset to improve your model? When suggesting a variable, discuss specifically how it would improve the model.
- If you could collect panel (i.e., longitudinal) data for these apps, discuss how it could mitigate the endogeneity problems further and enhance causal inference of results (again, please be specific).
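The following sketches one standard panel approach, a two-way fixed-effects model; the notation is illustrative. Let i index an app-country pair and t a time period in which the data were collected:

```latex
$$
\ln(\text{Revenue}_{it}) = \alpha_i + \lambda_t + \beta_1 \ln(\text{Downloads}_{it}) + \beta_2 \ln(\text{ActiveUsers}_{it}) + \beta_3\,\text{Rank}_{it} + \beta_4\,\text{Updates}_{it} + \varepsilon_{it}
$$
```

Subtracting each app-country’s own time average (the within transformation) removes α_i. Unobserved, time-invariant characteristics, for example brand strength or publisher reputation, may plausibly drive both downloads and revenue; after the transformation they no longer bias β1 through β4. Time-invariant regressors such as the main category are absorbed by α_i. λ_t absorbs shocks common to all apps in a period, such as holiday seasons. In Stata, such a model is typically estimated with xtset followed by xtreg with the fe option.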
Appendix
- Include your code as text in the appendix of your Word document; do not paste it as a screenshot. Alternatively, you can upload your Stata do-file to SurreyLearn as a separate file alongside your report. It is important that the code reproduces exactly the results presented in your report. Ensure that all results are obtained directly from your code, and do not include any redundant code.
Important Notes
- The project report must be submitted only in Microsoft Word (recommended) or PDF format. Other formats, such as ZIP files, are not acceptable. As mentioned above, code can be uploaded as a Stata do-file.
- The report must not exceed 3,500 words. The word count covers everything from the first word of the introduction to the last word of the conclusion. It excludes tables, figures, images, and appendices. It also excludes the abstract, table of contents, abbreviations pages, and references, although these are not required for this project. Report the word count, your name, and your student number at the beginning of your project. Under university policy, exceeding the word limit incurs a 10-point penalty.
- Please note that if you submit to the wrong module page, your work will be considered a non-submission and will result in a mark of 0% being awarded. It is your responsibility to familiarise yourself with SurreyLearn before submission deadlines. Please do NOT submit links to shared drives (e.g., Google, OneDrive).
Guidelines and Tips
- Apply the analyses as explained section by section (from Introduction to Diagnostics and Robustness Analysis).
- The report—the writing, explanations, tables, and graphs—should be clear and informative as a self-explanatory and stand-alone document for readers who do not have access to this Final Project Description document.
- In the introduction, concisely explain the aim of the empirical report, the sample and data, and the definitions of all final variables included in your regression models. Although this information (e.g., the sample and variable definitions) is provided in this brief, you must summarise it concisely in your report.
- All tables and graphs should be numbered and titled (with captions if an additional explanation is required) and should be referred to in the report accordingly. The labels of the variables in tables and graphs should be informative.
- Graphs should be visually clear (axis title, colour, legend, scale, etc.). Do not populate the report with too many graphs; be selective and use the most informative ones for your purpose.
- Tables should be exported from Stata in a clear, readable Word format (do not use screenshots of tables or graphs). You may report several regression models in one or two tables (one model per column), but you must number your models and refer to them accordingly in the discussion.
- In the regression tables, report the standard error in parentheses below each coefficient and indicate significance levels with asterisks. Report the R-squared and number of observations for each model (see the Sample Report).
- The code used to prepare the tables, graphs, and regressions should be provided in a clear, easy-to-follow, readable format in the appendix or as a separate do-file.
- You do not need to cite any references, but if you do, use a consistent, recognised citation style and provide the reference list in the appendix.
- Overall, the project’s quality (i.e., clarity, rigour, precision, and depth) is more important than the length.