- Intro
- Synopsis
- Protocol
- Data
- Questions
- Projects
- References
- Credits

**Introduction**

How much alcohol can one consume before one’s blood alcohol content (BAC) is above the legal limit? Two undergraduate statistics projects, one conducted at The Ohio State University in Columbus, Ohio, and the other at the University of Western Sydney in Sydney, Australia, explored the relationship between BAC and other variables such as amount of alcohol consumed, weight, gender, and age.

**Synopsis**

__Abstract__

This example examines the prediction of blood alcohol content (measured from the breath) based on the amount of beer/wine consumed as well as the weight and gender of the individual.

__Data Set__

Ohio Study: 7 variables, 16 cases

Australian Study: 7 variables, 22 cases

**Extensions**

Project Ideas.

**10 Questions**

Experimental design, measurement validity, correlation, regression, examining residuals, transformations

Basic: Q1-10

**Protocol
**

__The Ohio State University__

The experiment took place in February of 1986 at a student dormitory. Sixteen students volunteered to be the subjects in the experiment. Each student blew into a breathalyzer*Breathalyzer:*A machine that measures alcohol in the breath and uses that measurement to infer a blood alcohol level. (shown in picture above) to indicate that his or her initial BAC was zero. The number (between 1 and 9) of 12 ounce beers to be drunk was assigned to each of the subjects by drawing tickets from a bowl. Thirty minutes after consuming their final beer, students had their BAC measured by a police officer of the OSU police department. The officer also administered a road sobriety test before and after the alcohol consumption. This involved performing four simple tasks, graded on a scale of 1 to 10 (ten being a perfect rating), demonstrating coordination: balancing on one foot, touching the tip of one’s nose with a forefinger, placing one’s head back with one’s eyes closed, and walking heel to toe. The police officer was not aware of how much alcohol each subject had consumed.

__University of Western Sydney__

In October of 1993, 22 volunteers, mainly university staff and students, attended a lunchtime “cocktail party.” Prior to the start of the experiment, a police officer from the Drug and Alcohol Unit breath-tested all of the participants using a breathalyzer. White wine (10% alcohol/volume) was distributed in 120 ml glasses, and each participant chose his or her own rate of drinking. Snack food and water were available at all times. After 45 minutes participants were told to stop drinking. After waiting an additional 15 minutes each subject was breath-tested again and the number of glasses consumed was recorded for all participants.

**Data**

The following variables from the OSU and Australia studies are contained in the stored data:

__OSU__

ID = identification number

Gender

Weight = weight of each subject in pounds

Beers = number of 12 ounce beers consumed

BAC = blood alcohol content

1st-Sobriety = combined score on the four road sobriety tests before alcohol consumption

2nd-Sobriety = combined score on the four road sobriety tests after alcohol consumption

__Australia__

ID = identification number

Gender

Weight (kg)

Height (cm)

Age

1hr-BAC = BAC after the first hour

Wine = number of glasses of wine consumed during the first hour

**Questions**

- Question 1
- Question 2
- Question 3
- Question 4
- Question 5
- Question 6
- Question 7
- Question 8
- Question 9
- Question 10

Question 1

**The 1986 OSU Study:**

Make a scatterplot of BAC versus number of beers consumed. Do you think the number of beers consumed would be a good predictor of BAC? Why?

Learning Objectives

- Be able to use your software to make a scatterplot.

Be able to interpret the display (for example, recognizing patterns or spotting outliers).

- Be able to interpret scatterplots including identifying the distinction between response and explanatory variables and the form of association and correlation between the response and explanatory variables.

Solution

The number of beers consumed would be a good predictor because it has a strong relationship with BAC. In general, as the number of beers consumed increases, BAC also increases.

Question 2

**The 1986 OSU Study:**

**a) **For each of the situations below, make a scatterplot showing the relation between BAC and the difference in sobriety scores (you will need to create this variable). Decide which variables should be the independent and dependent variables in each case, and comment on the plots.

- Suppose you wanted to know how much coordination deteriorates because of increased alcohol in the body.
- It is illegal to drive with a blood alcohol content of 0.1. Suppose you wanted to know how reliable the sobriety test is at determining BAC.

**b) **Which variable is the most valid measure for a police officer “on the street” addressing the following issues? Why? What are the assumptions involved in using even the most valid measures?

Issue 1: What is the true alcohol level in the driver’s blood?

Issue 2: How safely can a person drive a motor vehicle?Variable 1: The number of drinks consumed by an individual.

Variable 2: The BAC measured by a breathalyzer.

Variable 3: The difference between pre and post alcohol consumption sobriety tests.

Variable 4: The sobriety test after suspected alcohol consumption.

Learning Objectives

- Be able to use your software to make a scatterplot.

Be able to interpret the display (for example, recognizing patterns or spotting outliers). - Be able to interpret scatterplots including identifying the distinction between response and explanatory variables.
- Understand what it means for a measurement to be valid, reliable, and unbiased. Understand that the validity of a measurement involves appropriate standardization of measurements.

Solution

**a(i)**

The independent variable should be BAC, while the dependent variable should be the difference in the sobriety test scores (first test minus second test). In general, the plot shows that as your BAC increases, the deterioration in your coordination increases. However, the plot also shows that there are exceptions to the trend. The red **o** shows a case where the subject had a very low BAC, but still showed a marked drop in the score on the sobriety test. On the other hand, if we consider removing the blue **x**, the resulting plot would show a weaker relationship.

**a(ii)** The independent variable should be the difference between the sobriety test scores, while the dependent variable should be BAC. In general, the plot shows that when a subject's coordination deteriorates, his or her BAC is at a high level. Again, the same exceptions stand out. The red **o** represents a subject who showed a marked loss of coordination, but registered a very low BAC, and the blue **x** represents a subject who was unusually high in both variables.

**b)** For Issue 1, the measurement of choice should be Variable 2, the BAC measured by a breathalyzer. It would be impossible for a police officer who stops a motorist to know the number of drinks an individual has consumed or the difference between the two road sobriety tests. The sobriety test given to an individual after suspected alcohol consumption can be measured, but it measures coordination, not alcohol level. The assumption involved in using the breathalyzer to determine the true alcohol level in the blood is that the machine actually does this! The breathalyzer infers alcohol level in the blood by checking alcohol level in a person's breath.

For Issue 2, the sobriety test given to an individual after suspected alcohol consumption should be preferred, as it measures coordination. The assumption in using this measure, however, is that the coordination skills tested in a sobriety test have something to do with driving coordination skills. In other words, if a person is not able to walk heel to toe, does that give an indication that he would be an impaired driver?

Question 3

**The 1986 OSU Study:**

**a)**Using the OSU data, regress BAC on the difference between the road sobriety test scores. What value of BAC would you predict for a subject who had a difference in sobriety test scores of 2.5? Is this prediction reliable? Explain.

**b)**Regress BAC on the number of beers consumed. What value of BAC would you predict for a subject who drank 4 beers? Calculate the sample standard deviation of BAC and compare it to the estimate of the standard deviation of BAC about the regression line. Why do the two numbers differ?

**c)**Which of the two independent variables (i.e. number of beers consumed or difference on road sobriety test scores) is a better predictor of BAC for this data? Why?

Learning Objectives

- Be able to use the computer to calculate simple linear regression estimates and know how to interpret the resulting output including (
*y*= response variable,= estimated response, and*x*= the explanatory variable): the*R*-squared value (the square of the correlation measures the proportion of the variance of one variable that can be explained by straight-line dependence on the other variable) and RMSE (the new standard deviation of*y*after you know*x*).

Solution

**a)**

The regression output above indicates the equation for the least-squares line in this setting is

predicted BAC = 0.031377 + 0.022048 × Sobriety diff.

So for a sobriety test score difference of 2.5, we predict

BAC = 0.031377 + 0.022048 × (2.5) = 0.086497.

The value of *R*^{2} for this regression is only 51.1%. Thus, there is a good deal of variation in BAC that isn't explained by the difference between sobriety test scores. This indicates that the prediction may not be reliable.

**b)**

The regression output above indicates that the least-squares line in this setting is

predicted BAC = –0.012701 + 0.017964 × Beers.

So for a subject who drank 4 beers we predict

BAC = –0.012701 + 0.017964 × (4) = 0.059155.

The sample standard deviation of BAC is 0.0441, while the estimate of the standard deviation of BAC about the regression line, *s*, is 0.0204, less than half as large. This decrease is due to the ability of the number of beers consumed to explain differences in BAC.

**c) **The number of beers consumed is a better predictor of BAC for this data. Comparing the regression outputs for Question 3 Part a and Part b we see that:

*R*^{2} = 51.1% for the regression of BAC on the difference between sobriety test scores

and

*R*^{2} = 80.0% for the regression of BAC on the number of beers consumed.

In other words, the number of beers consumed explains more of the variation in BAC than does the difference between sobriety test scores. In addition, a glance at the scatterplots of Question 1 shows that the points in the BAC versus numbers of beers consumed plot are more clustered around a straight line than those in the plot of BAC versus sobriety test scores difference. The higher value of *R*^{2} substantiates that visual cue.

Question 4

**The 1986 OSU Study:**

**a)** Make a scatterplot of residuals versus predicted values for the regression in Question 3 Part b. Does this scatterplot show that there may be problems with the regression?

**b)** For the plot of residuals versus predictors from the regression of BAC on the number of beers consumed mark the points for male and female subjects differently. Do you notice any pattern in the plot when gender is indicated? If so, explain.

Learning Objectives

- Understand how to check the key model assumptions that are made in regression using residual plots and other model diagnostics.

Solution

**a)**

**Residuals versus Predicted values for the regression of BAC on number of beers consumed. **

There is no pattern to this plot. All of the points fall within 2 RMS error (i.e. 2 × (0.0204) = 0.0408) from zero. Thus, this plot does not suggest any problems with the regression.

**b) **Males are indicated by a blue **x** in the plot, while females are indicated by a red **o**. Males tend to have negative residuals, while females tend to have positive residuals. Thus the regression tends to predict too high a BAC for males and too low a BAC for females. This may be due to other factors such as weight.

Question 5

**The 1986 OSU Study:**

Before the breathalyzer came into widespread use, the road sobriety test was commonly used to decide whether a driver should be cited for drunk driving. Examine the scatterplot of BAC versus the road sobriety test administered after alcohol was consumed to see whether you think the test would be a good predictor of BAC. What could be a problem with its use as a predictor of BAC?

Learning Objectives

- Be able to use your software to make a scatterplot.

Be able to interpret the display (for example, recognizing patterns or spotting outliers). - Be able to interpret scatterplots including identifying the form of association and correlation between the response and explanatory variables.

Solution

The scatterplot indicates that the road sobriety test administered after alcohol was consumed would not be a good predictor of BAC. In using the road sobriety test score to predict BAC one must take into account that there are differences in natural abilities among subjects. For example, someone who is a dancer may be much better than others at balancing on one foot even if he/she is impaired by alcohol.

Question 6

**The 1986 OSU Study:**

**a) **How could the weight of a person affect your prediction of BAC based on the number of beers consumed by an individual? Give evidence to support your claim. Give a potential confounding factor with respect to how weight affects a person’s BAC.

**b) **Form a new variable from a ratio of the two variables beers and weight in the following manner. Let the beer-to-weight ratio be:

B/W-ratio = Beers/(Weight + 120.0 lbs)

Make a scatterplot of BAC versus the beer-to-weight ratio. Is the beer-to-weight ratio a better predictor of BAC than the number of beers consumed?

**c)** Regress BAC on the beer-to-weight ratio. Give a prediction of the value of BAC for a subject who drank 3 beers and weighed 140 pounds. Is this prediction reliable? Explain.

**d)** Make a scatterplot of the residuals predicted for the regression of Part c. Indicate whether the residual came from a male subject or female subject. Does the scatterplot reveal any potential problems?

**e)** The scatterplot from Part a of this question reveals one point that is highly separated from the others (i.e. the point with B/W-ratio > 0.03, BAC > 0.16). Remove that point and perform another regression of BAC on the beer-to-weight ratio. Did the regression change much? Comment.

Learning Objectives

- Understand how to check the key model assumptions that are made in regression using residual plots and other model diagnostics.
- Be able to use your software to make a scatterplot.

Be able to interpret the display (for example, recognizing patterns or spotting outliers). - Understand that simple linear regression is inappropriate when an outlier (or influential observation) drives the results.

Solution

**a)** A relatively heavy person may consume more beers than a relatively light person and still register a lower BAC. Evidence for this claim may be seen by plotting the residuals from the regression of BAC on the number of beers versus the subject's weight.

The plot reveals that heavy people tend to register a lower BAC than what the regression line would predict, and that light people tend to register a higher BAC than what the regression line would predict.

Gender may also play a role in determining the prediction of BAC since males tend to have a lower BAC than the regression line would predict (see Question 4 Part b), while females tend to have a higher BAC than the regression would predict. Likewise, males tend to be heavier than females, as can be seen in the given dotplot (males are indicated by a blue **x**, while females are indicated by a red **o**). That is, the factors of weight and gender are confounded.

**b)** The beer-to-weight ratio appears to be an even better predictor of BAC than the number of beers. The points in this plot fall very close to a straight line.

**c)**

From the output above the regression equation given is:

predicted BAC = –0.013873 + 5.24897 × B/W-ratio

The beer-to-weight ratio for someone who drank 3 beers and weighs 140 pounds is:

B/W-ratio = 3 / (140 + 120) = 0.01154

So the prediction of BAC for the given subject would be:

BAC = –0.013873 + 5.24897 × (0.01154) = 0.0467

This regression is even better than was the regression of BAC on the number of beers consumed. The *R*^{2} value of 95.4% shows that the prediction is quite reliable. Note that the intercept is now significantly different from 0 (*P*-value of 0.0293), but still near 0 in magnitude.

**d)** The scatterplot doesn't indicate any real problems, although the point with the largest predicted value may be quite influential to the regression (see next question where this is examined further). Note that we no longer have the problem that the residuals for males (marked with a blue **x**) tend to be negative and the residuals for females (marked with a red **o**) tend to be positive. The gender effect disappears after controlling for the weight of the subject.

**e) **The regression with the influential point removed gives the following result:

The regression again gives an excellent fit. The value of *R*^{2} changed a bit, but overall the results are more or less the same. The regression equation for this case is:

predicted BAC = –0.014937 + 5.32572 × B/W-ratio.

This is very similar to our equation in Part c. Thus we conclude that this single observation does not influence the fitted regression line much.

Question 7

**The 1993 Australian Study:**

Make a scatterplot of BAC versus the number of drinks consumed in the first hour for the Australian study. Do you think that the number of drinks consumed would be a good predictor of BAC?

Learning Objectives

- Be able to use your software to make a scatterplot.

Be able to interpret the display (for example, recognizing patterns or spotting outliers). - Be able to interpret scatterplots including identifying the form of association and correlation between the response and explanatory variables.

Solution

While there is definitely a trend in the data (in general as the number of drinks increases so does BAC), the number of drinks consumed doesn't appear to be as good a predictor of BAC as the number of beers was in the previous study. We should also note that as the number of drinks increases, the values of BAC become more spread out, indicating a greater variance in BAC.

Question 8

**The 1993 Australian Study:**

**a) **Perform a regression of BAC on the number of drinks consumed, and make a scatterplot of the residuals versus the predicted values. Explain how this relates to the scatterplot of Question 7.

**b) **Predict the value of BAC for a person who drank 4 glasses of wine, and compare your prediction to that of Question 3 Part b. Does the comparison make sense? What differences in the experiments could have caused beers to be a better predictor of BAC in the OSU experiment than number of drinks consumed was of BAC in the Australian experiment?

Learning Objectives

- Be able to use the computer to calculate simple linear regression estimates and know how to interpret the resulting output including (
*y*= response variable,= estimated response, and*x*= the explanatory variable): the*R*-squared value (the square of the correlation measures the proportion of the variance of one variable that can be explained by straight-line dependence on the other variable).

Solution

**a)**

The regression output indicates that the equation for the least-squares line in this setting is:

predicted 1hr-BAC = –0.00437534 + 0.010889 × Wine.

The residuals versus predicted plot shows that the regression suffers from non-constant variance. This potential problem was seen in the scatterplot of Question 7, but is clearer in the residual plot.

**b) **For a person who drank 4 glasses of wine we would predict

BAC = –0.004375 + 0.010889 × 4 = 0.0479

Comparing this value to the value we found for BAC in Question 3 Part b (calculated for a subject who drank 4 beers) we see that it is lower (0.048 as opposed to 0.059) and that the prediction is less reliable (*R*^{2} of 39.9% as opposed to 80.0%). We would not expect the predictions to be exactly the same, or to have the same reliability, as the experimental settings aren't the same for each set of data.

The regression of BAC on the number of beers consumed for the OSU experiment gave a better fit than did the regression of BAC on the number of drinks consumed for the Australian experiment. However, there are a few major differences between the two experiments which we outline below.

- The subjects in the OSU experiment waited thirty minutes after consuming their final beer to have the breathalyzer administered, while the Australian subjects took their breath tests only fifteen minutes after consuming their final drink. Since alcohol takes a time to enter the bloodstream this could cause a difference in the BAC values.

- The Australian subjects had snack food available to them at all times during the experiment. Food can slow the absorption of alcohol into the bloodstream, so subjects who ate more may have registered a lower BAC than others who chose not to eat the snack food or ate little of it. This effect could add variation in the data for the Australian study.

- The number of beers that a subject drank was randomly assigned in the OSU study, while the number of drinks consumed was subject determined in the Australian study.

- All of the subjects in the OSU study were students, while staff were included in the Australian study. Age could be a factor in determining BAC.

- The two police officers may have been trained to use the breathalyzer quite differently.

Question 9

**The 1993 Australian Study**

**
a) **Consider a transformation similar to Question 6 Part b. In particular, let the alcohol-to-weight ratio be:

A/W-ratio = (number of drinks consumed) / (Weight – 20.0 kg).

Make a scatterplot of BAC versus the alcohol-to-weight ratio. Do you think that the alcohol-to-weight ratio will be a better predictor of BAC than the number of drinks consumed is?

**b) **Perform a regression of BAC on the alcohol-to-weight ratio. Predict the value of BAC that would be obtained for a person who consumed 3 drinks and who weighs 63.63 kilograms (i.e. 140 pounds). Be sure to include a plot of the residuals versus predicted values. Indicate whether the residuals came from a male or female subject.

Learning Objectives

- Understand how transformations can be used in regression to extend simple linear regression to nonlinear trends.
- Be able to interpret results pertaining to transformed data.

Solution

**a)** The alcohol-to-weight ratio appears to be a better predictor of BAC than the number of drinks consumed. The points are more tightly clustered around a line than in the scatterplot for Question 7.

**b)**

The regression output indicates that the equation for the least-squares regression line in this setting is:

predicted 1hr-BAC = –0.020924 + 0.825856 × (A/W-ratio).

So the predicted value of BAC for a subject who consumed 3 drinks and weighs 63.63 kilograms would be

1hr-BAC = –0.020924 + 0.825856 × (3 / (63.63 – 20.0)) = 0.03586

The value of *R*^{2} indicates that 72.2 percent of the variation in the dependent variable is accounted for by the independent variable. The regression gives a very good fit.

The residual plot shows no clear pattern or indication of non-constant variance, nor is there any differing pattern for either males (marked with a blue **x**) or females (marked with a red **o**). No potential problems with the regression are revealed by this plot.

**Residuals versus Predicted values for the regression of 1hr-BAC on A/W-ratio.**

Question 10

**The 1993 Australian Study:**

Use plots to examine whether age could be a major factor in determining BAC.

Learning Objectives

- Be able to use your software to make a scatterplot.

Be able to interpret the display (for example, recognizing patterns or spotting outliers). - Be able to interpret scatterplots including identifying the form of association and correlation between the response and explanatory variables.

Solution

The scatterplot of BAC versus age shows that age is probably not a major factor in determining BAC. The scatterplot does not show either a negative association or positive association of BAC with age. In addition, the scatterplot of the residuals from the regression in Question 9 versus age shows no apparent trend.

**Projects**

- Use multiple linear regression to examine the prediction of BAC using weight, gender, and glasses consumed, in the OSU study. Compare the linear models to the nonlinear models suggested in the story.

- Find a way to estimate the percent of body fat based on the height and weight of an individual. Examine whether this new variable helps in predicting BAC for the Australian study.

**References**

Pearl, D. K., and Stasny, E. A. (1992)

**Credits**

This story was initiated by Nicole DePriest and prepared by Mark Zabel, and last modified on 7/25/95. Thanks to Prof. Kevin Donegan, Univ. of Western Sydney, for the Australian data and Ms. Laura Dowling for data from her Ohio State class project.