ECON 3100
Statistical investigation:
This exercise again draws on the data set "grandfather clocks," available on Canvas or under the
Help menu, as a sample data set in JMP.
1. Develop an estimated regression equation with price as the dependent variable and age as the
independent variable (yes, you did this for the previous problem set; use that output if you'd like,
or run it again). Call this Model A.
2. Develop an estimated regression equation with price as the dependent variable and with age and
number of bidders as the independent variables. Call this Model B.
3. Which model do you prefer, Model A or Model B? Why?
4. Interpret the coefficient on age in Model B. What precisely does it tell you about the
relationship between age and price?
5. Interpret the coefficient on number of bidders in Model B. What precisely does it tell you
about the relationship between the number of bidders and price?
6. In Model B, conduct an F-test as to whether there is a useful linear connection between the
dependent variable and the independent variables in the population. Explain your results in
non-technical terms.
7. In Model B, conduct t-tests as to whether the variables age or bidders individually have a
statistically significant relationship to price, holding the other constant. Explain your findings
fully.
8. Again, in Model B, include a plot of the residuals against the predicted values (y-hat) and a
histogram of the residuals (created in "Analyze > Distribution").
9. List and briefly explain the four assumptions of the regression model. Looking at the residual
plot and histogram created in problem 8, do any of the assumptions appear to be violated?
10. Calculate the studentized residual of each observation. Are there any observations in the
data set that you would consider outliers, based on a "studentized residual greater than 2 in
absolute value" rule? Which observation has the largest studentized residual, in absolute value?
11. Are any observations unusually influential? Use Cook's distance measure to determine the
influence of each observation. Which observation has the greatest influence?
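For anyone who wants to see where the numbers in the JMP output come from, the quantities asked
for in problems 1 through 11 can be sketched in plain Python. This is only a rough illustration
under stated assumptions, not part of the assignment and not a substitute for the JMP output: the
eight observations below are made up and are NOT the grandfather clocks data, the helper names
inverse and fit_ols are hypothetical, and the studentized residuals are the internally studentized
version, which may not match the definition your software reports.

```python
from math import sqrt

# Hypothetical illustration data. These are NOT the grandfather clocks values.
age = [110, 125, 140, 150, 160, 170, 180, 190]
bidders = [6, 9, 7, 12, 8, 10, 5, 11]
price = [900, 1250, 1200, 1800, 1500, 1750, 1300, 2100]


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_ols(predictors, y):
    """Least-squares fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in predictors] for i in range(n)]
    p = len(X[0])  # number of coefficients, intercept included
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xtx_inv = inverse(xtx)
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = [sum(xtx_inv[j][k] * xty[k] for k in range(p)) for j in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    sse = sum(e * e for e in resid)            # unexplained variation
    y_mean = sum(y) / n
    sst = sum((yi - y_mean) ** 2 for yi in y)  # total variation
    mse = sse / (n - p)
    # Leverage h_i: diagonal of the hat matrix X (X'X)^-1 X'.
    lev = [sum(row[j] * xtx_inv[j][k] * row[k] for j in range(p) for k in range(p))
           for row in X]
    # Internally studentized residuals, then Cook's distance built from them.
    stud = [e / sqrt(mse * (1 - h)) for e, h in zip(resid, lev)]
    cooks = [r * r * h / (p * (1 - h)) for r, h in zip(stud, lev)]
    return {
        "beta": beta,
        "t": [b / sqrt(mse * xtx_inv[j][j]) for j, b in enumerate(beta)],
        "r2": 1 - sse / sst,
        "adj_r2": 1 - (sse / (n - p)) / (sst / (n - 1)),
        "f": ((sst - sse) / (p - 1)) / mse,
        "resid": resid, "leverage": lev, "studentized": stud, "cooks": cooks,
    }


model_a = fit_ols([age], price)           # price on age
model_b = fit_ols([age, bidders], price)  # price on age and number of bidders

for name, m in (("A", model_a), ("B", model_b)):
    print("Model", name, "adj R^2:", round(m["adj_r2"], 3), "F:", round(m["f"], 2))
print("Model B t statistics (intercept, age, bidders):",
      [round(t, 2) for t in model_b["t"]])

# The "greater than 2 in absolute value" rule, and the largest Cook's distance.
flagged = [i for i, r in enumerate(model_b["studentized"]) if abs(r) > 2]
most_influential = max(range(len(price)), key=lambda i: model_b["cooks"][i])
print("Flagged rows:", flagged, "Largest Cook's D at row:", most_influential)
```

The sketch stops at the test statistics. Turning the F and t values into p-values requires the F
and t distributions, which JMP reports directly.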
Everyday statistics
in this folder on Canvas. You might also listen to or read the related radio story:
http://www.mprnews.org/story/2012/03/07/target-data-mining-privacy
1. How does the process of habit formation make it more difficult to market new products?
2. Why does Target have an interest in predicting whether a customer ("guest") is expecting a
baby? What other life events might be of interest to retailers, in terms of selling new products?
3. What information (variables) might be relevant in a model used to predict whether a customer
is expecting a baby? List at least five or six independent variables that data analysts might
include in a model intended to predict whether a customer is expecting a baby.
4. Felix Wu, a scholar of privacy issues, describes three big data "threats," that is, three ways in
which the use of big data might be problematic. They are surveillance (giving people the
uncomfortable feeling of being watched and studied), disclosure (the possibility that private
information about you might be revealed to others inappropriately), and discrimination (the
possibility that you will be treated differently than others based on information collected about
you). Does Target's use of customer data raise any of these threats (surveillance, disclosure,
discrimination)? Which one(s) and why?
5. How did Target alter its approach to potentially pregnant customers to make it less
threatening? Why were customers feeling threatened?
