This is an R program that uses Mallows' Cp, adjusted R-squared, forward selection, and backward elimination to fit regression models for the number of COVID-19 cases and deaths as a function of the input variables listed below. The resulting models could possibly be used to predict the number of COVID-19 cases/deaths in other countries.
There are two files: the R code that determines the regression models (the comments in the code describe it), and the data spreadsheet.
I will define the variables in the data spreadsheet here:
- covCases - the number of coronavirus cases in a given country
- covDeaths - the number of deaths due to coronavirus in a given country
- airQual - the air quality of a given country, measured as the concentration of harmful particles in the air. Higher values indicate worse air quality. Units: micrograms per cubic meter.
- percentEducation - the amount a given country's government spends on education as a percentage of its total GDP
- GDP - the gross domestic product of a given country, measured in US dollars (USD).
- avgInc - the average yearly household income in a given country, measured in US dollars (USD).
- landlocked - 1 if a country is landlocked, 0 if it isn't.
- waterlocked - 1 if a country is entirely surrounded by water, 0 if it isn't.
- popDens - the average population density in a given country, measured in people per square kilometer.
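
As a rough illustration of how the variables above feed into the four methods, here is a minimal sketch in base R. It is not the code from this repository: the data frame `dat` is randomly generated stand-in data (the spreadsheet's file name and values are not assumed here), the helper names `adj_r2` and `mallows_cp` are hypothetical, and `step()` selects by AIC, which may differ from the criterion the actual script uses.

```r
# Synthetic stand-in data using the column names defined above.
# These values are made up for illustration; they are NOT the real spreadsheet.
set.seed(1)
n <- 60
dat <- data.frame(
  airQual          = runif(n, 5, 80),
  percentEducation = runif(n, 1, 8),
  GDP              = rlnorm(n, 25, 1),
  avgInc           = rlnorm(n, 9, 0.5),
  landlocked       = rbinom(n, 1, 0.2),
  popDens          = rlnorm(n, 4, 1)
)
# A country cannot be both landlocked and entirely surrounded by water.
dat$waterlocked <- ifelse(dat$landlocked == 1, 0, rbinom(n, 1, 0.2))
# Made-up response so the example has some signal to find.
dat$covCases <- 1000 + 50 * dat$popDens + 200 * dat$airQual + rnorm(n, sd = 2000)

# Full model (all predictors) and null model (intercept only).
full <- lm(covCases ~ airQual + percentEducation + GDP + avgInc +
             landlocked + waterlocked + popDens, data = dat)
null <- lm(covCases ~ 1, data = dat)

# Forward selection: start from the null model and add terms up to the full model.
forward <- step(null, scope = formula(full), direction = "forward", trace = 0)

# Backward elimination: start from the full model and drop terms.
backward <- step(full, direction = "backward", trace = 0)

# Adjusted R-squared of a fitted model (hypothetical helper).
adj_r2 <- function(fit) summary(fit)$adj.r.squared

# Mallows' Cp = SSE_p / sigma2_full - n + 2p, where p counts the
# coefficients of the candidate model, including the intercept (hypothetical helper).
mallows_cp <- function(fit, full) {
  sigma2_full <- sum(resid(full)^2) / df.residual(full)
  sum(resid(fit)^2) / sigma2_full - nobs(full) + 2 * length(coef(fit))
}

# Compare the candidate models side by side.
print(formula(forward))
print(formula(backward))
print(c(forward = adj_r2(forward), backward = adj_r2(backward)))
print(c(forward = mallows_cp(forward, full), backward = mallows_cp(backward, full)))
```

Under these assumptions, a model with a high adjusted R-squared and a Cp value close to its number of coefficients would be a reasonable candidate. The same steps would be repeated with covDeaths as the response.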