Problem Set 2

GNI per capita, PPP (current international \$)

http://data.worldbank.org/indicator/NY.GNP.PCAP.PP.CD?view=chart

Adult literacy rate, population 15+ years, both sexes (%)

Create a single data set for the year 2010. Your final data set should have countries in rows and variables in columns. Your data set should contain countries only, and not aggregates. You should not have observations for regions, such as High Income, Euro area, and World.

There are two ways that you can create this data set.

1. The first option is to download the datasets individually from the links provided above. Then you can use copy and paste to put the data for 2010 into a single spreadsheet. You will have to look through the data and remove observations for regional aggregates.

2. The second option is to format the dataset using the World DataBank (http://databank.worldbank.org), then download a single file. If you choose to use the databank method, make sure that you do not download formatted data. If your data contain commas (e.g. 74,342.00), Excel will read the data as text, and you will be unable to perform calculations on them. Your data should not contain regional aggregates. You can exclude these in the DataBank selection, or delete them in Excel after downloading the data.
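For anyone who wants to check their spreadsheet work outside Excel, the cleanup described in the two options above can be sketched in Python. This is only a minimal illustration under stated assumptions: the column names, the country labels, the numbers, and the `AGGREGATES` list are all made up for the example, and the `..` missing-value marker is an assumption about how the download marks empty cells. A real download will have different headers and many more rows.

```python
import csv
import io

# Invented sample rows for illustration only; these are NOT World Bank figures.
SAMPLE_CSV = '''Country Name,GNI per capita,Adult literacy
Country A,"1,790.00",35.4
World,"12,345.00",84.1
Euro area,"35,000.00",..
Country B,"14,400.00",93.1
Country C,"9,100.00",
'''

# Assumed (incomplete) list of aggregate labels; extend it with whatever
# regional or income-group rows appear in your own download.
AGGREGATES = {"World", "Euro area", "High income"}


def to_number(text):
    """Turn '74,342.00' into 74342.0; return None for blank or '..' cells."""
    cleaned = text.replace(",", "").strip()
    if cleaned in ("", ".."):
        return None
    return float(cleaned)


def clean_rows(csv_text):
    """Keep only country rows that have both variables present."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        name = record["Country Name"]
        gni = to_number(record["GNI per capita"])
        literacy = to_number(record["Adult literacy"])
        if name in AGGREGATES or gni is None or literacy is None:
            continue  # skip aggregates and observations with missing data
        rows.append((name, gni, literacy))
    return rows


rows = clean_rows(SAMPLE_CSV)
for row in rows:
    print(row)
```

Stripping the thousands separators before converting is the same fix as downloading unformatted data: the values become numbers rather than text.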

In either case, your data set should be organized like the ones on the next page.

After downloading your data, delete all observations that have missing data. To make this faster, first sort the data by literacy rate, then delete observations with missing data. Your final data set should have 43 observations with no missing data. In Excel, your data set should look like this:

Question 1: Scatter plot

Create a scatter plot with GNI per capita on the vertical axis and adult literacy on the horizontal axis. Give your graph an informative title and label both axes appropriately. Do the two variables appear to have a linear relationship?

Question 2: Correlation

Calculate the correlation coefficient between the two variables. Follow the format used in the handout on correlation, page 7. Consider GNI per capita as the y-variable and adult literacy as the x-variable. Including the scatter plot, your Excel sheet should look like the picture below. What is the correlation between the two variables? How would you interpret this statistic?

Question 3: Linear Regression Formula

Suppose you are interested in explaining GNI per capita in terms of adult literacy using linear regression. Which variable is your dependent variable? Which variable is your independent variable? Write the linear regression equation that you would use to explore this relationship.

Question 4: Linear Regression Coefficients

Calculate the slope and intercept coefficients from your linear regression equation. What are the coefficients? Interpret the meaning of the slope coefficient. Include the measurement units of each variable. (Hint: Are you interpreting a one percent change in the adult literacy rate or a one percentage point change in the adult literacy rate?)
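As a cross-check on the spreadsheet arithmetic, the correlation coefficient and the two regression coefficients can be computed from the same sums of deviations. The sketch below uses the standard textbook formulas for a regression with one explanatory variable; it is not necessarily laid out the way the course handout is, and the five data points are invented for illustration (they are not World Bank data). The function name `ols` is hypothetical.

```python
from math import sqrt

# Invented values for illustration only.
literacy = [40.0, 55.0, 70.0, 85.0, 95.0]        # x: adult literacy rate (%)
gni = [1500.0, 2500.0, 6000.0, 9000.0, 16000.0]  # y: GNI per capita (international $)


def mean(values):
    return sum(values) / len(values)


def ols(x, y):
    """Return (slope, intercept, r) for a one-variable linear regression."""
    x_bar, y_bar = mean(x), mean(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = ols(literacy, gni)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
```

The slope is measured in units of y per one unit of x, which is worth keeping in mind when answering the hint about percent versus percentage point.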

Question 5: Predicted values of GNI

Using your regression coefficients from question 4, calculate the predicted value of GNI for each observed value of adult literacy. Add these data points to your scatter plot as a line, as pictured below. Are all of the predicted values reasonable? Are there any that are impossible? Explain.

Question 6: Predicted residuals

Use the predicted values of yhat to calculate estimated residuals, uhat. What is the expected value (mean) of your estimated residuals?
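The predicted values and residuals from Questions 5 and 6 can be sketched the same way. The example below is a minimal illustration with invented data (not World Bank figures) and a hypothetical helper named `fit_line`; it computes yhat and uhat, flags any negative predictions, and reports the mean residual. For a least-squares line that includes an intercept, the residuals should average to zero up to floating-point rounding.

```python
# Invented values for illustration only.
x = [40.0, 55.0, 70.0, 85.0, 95.0]             # adult literacy rate (%)
y = [1500.0, 2500.0, 6000.0, 9000.0, 16000.0]  # GNI per capita (international $)


def fit_line(x, y):
    """Least-squares slope and intercept for one explanatory variable."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(x, y)

# Predicted value for each observed x, and the residual left over.
y_hat = [intercept + slope * xi for xi in x]
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]

# A negative predicted GNI per capita is not a possible value.
negative_predictions = [yh for yh in y_hat if yh < 0]

mean_residual = sum(u_hat) / len(u_hat)
print("predicted:", [round(v, 1) for v in y_hat])
print("negative predictions:", len(negative_predictions))
print("mean residual:", round(mean_residual, 6))
```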