Is Caffeine Dependence Real?


    Many people start their day with a jolt of caffeine from coffee or a soft drink. Most experts agree that people who take in large amounts of caffeine each day may suffer from physical withdrawal symptoms if they stop ingesting their usual amounts of caffeine. Strain, Mumford, Silverman, and Griffiths (1994) ask whether, aside from the fact that a person may develop a physical addiction to caffeine, there is evidence in some individuals of a more serious addiction called caffeine dependence syndrome. To classify caffeine use as dependence, the researchers looked for factors such as continued use despite a doctor's diagnosis that caffeine was causing or worsening physical problems. The criteria used were from among those commonly used to diagnose an individual who has a dependence on some other drug.


A double-blind, randomized experiment was conducted on subjects classified as having caffeine dependence to see whether withdrawal from caffeine leads not just to physical symptoms but to a more serious caffeine dependence syndrome.

Data Set
16 variables, 11 cases

4 Questions
Estimation, experimental design issues, double-blinding, randomization, normal approximation to binomial, means, standard deviations, hypothesis testing.
Basic: Q1-4


Twenty-seven volunteers were recruited through newspaper ads seeking individuals who believed they were psychologically or physically addicted to caffeine but otherwise were in good health. Of these twenty-seven volunteers, sixteen were diagnosed as being caffeine dependent. An individual was diagnosed as caffeine dependent if he or she met at least three out of the following four criteria: (1) tolerance, (2) withdrawal, (3) persistent desire or unsuccessful efforts to cut down or control use, and (4) use continued despite knowledge of a persistent or recurrent physical or psychological problem that is likely to have been caused or exacerbated by substance use.

Of the sixteen subjects who were diagnosed as caffeine dependent, eleven agreed to participate in a double-blind withdrawal study. Daily caffeine intake was measured by evaluating food diaries kept for one week by each subject. The experiment was conducted over two 2-day periods, which in most cases occurred exactly one week apart. During one of the 2-day periods, the subjects were given a set of capsules containing the amount of caffeine normally ingested by that subject in one day. During the other study period, the subjects were given placebos. The order in which each subject received the two types of capsules was randomized. The subjects' diets were restricted during each of the study periods. All products containing caffeine were prohibited; to divert the subjects' attention from caffeine, products containing ingredients such as Nutrasweet® and saccharin were prohibited as well.
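The story does not say how the order of the capsules was randomized. As an illustration only, the sketch below assumes an independent "coin flip" for each of the eleven subjects; the function name assign_orders, the subject numbering 1 to 11, and the seed are all hypothetical.

```python
import random

def assign_orders(subject_ids, seed=None):
    """Randomly choose which capsule type each subject receives first.

    Assumption: an independent 50/50 choice per subject. The original
    study may have used a different (for example, balanced) scheme.
    """
    rng = random.Random(seed)
    orders = {}
    for sid in subject_ids:
        first = rng.choice(["caffeine", "placebo"])
        second = "placebo" if first == "caffeine" else "caffeine"
        orders[sid] = (first, second)
    return orders

# Hypothetical subject numbers 1..11; the seed only makes the sketch repeatable.
orders = assign_orders(range(1, 12), seed=0)
for sid, (first, second) in orders.items():
    print(f"subject {sid}: period 1 = {first}, period 2 = {second}")
```

Whatever scheme is used, every subject receives both types of capsules, so each subject serves as his or her own control.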

At the end of each 2-day study period, subjects were evaluated using three questionnaires. The first questionnaire assessed depressive symptoms using the Beck Depression Inventory (BDI), the second assessed mood states using the Profile of Mood States (POMS), and the third was a checklist developed by the researchers to assess the presence of headaches, level of drowsiness, etc. The subjects also completed a tapping task in which they were instructed to press a button 200 times as fast as they could. Finally, subjects were interviewed by a researcher blinded to the subject's condition to find other evidence of functional impairment. Saliva analyses were conducted to make sure each subject complied with the dietary restrictions.

Beck Depression Inventory (BDI): The BDI lists 21 items which respondents answer on a scale from 0 to 3 about their severity of depression in response to that item. The BDI was designed primarily to assess depression severity in people already diagnosed with clinical depression, but it can be used with normal subjects as well. When the inventory is given to a non-clinical subject, a score of 15 indicates elevated levels of depression.

Profile of Mood States (POMS): The POMS records responses to 65 adjectives rated by the subject on a scale of 0 to 4. The POMS measures six factors, two of which are fatigue and vigor. The norms for the POMS scores were developed from a sample of 340 male and 516 female college students.









Nine of the eleven subjects showed evidence of withdrawal symptoms during the period in which they took the capsules that did not contain caffeine. A subject showed evidence of withdrawal if at least one of the following occurred: severe headaches, increased fatigue, reduced vigor, severe depression, or a significant difference between the subject's tapping scores during the two periods. The following criteria, as measured from their responses to the questionnaires, were used to classify a subject as exhibiting withdrawal symptoms.

1) Severe headaches—a score of 3 on a range from 0 to 3.
2) Fatigue—a score at least 2 SD above the norm for college students on the Profile of Mood States.
3) Vigor—a score at least 2 SD below the norm for college students on the Profile of Mood States.
4) Depression—a score of 15 or higher on the Beck Depression Inventory.
5) Tapping—the lowest score of three trials during the caffeine period was higher than the highest score of three trials during the no-caffeine period.
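The five criteria above can be sketched as a small classification function. This is only an illustration: the names PomsNorm and withdrawal_criteria are hypothetical, the norm means and standard deviations in the example are made-up placeholders (not the published POMS norms), and the sketch assumes the questionnaire criteria are applied to the scores from the no-caffeine period.

```python
from dataclasses import dataclass

@dataclass
class PomsNorm:
    """Norm mean and SD for one POMS factor (placeholder values below)."""
    mean: float
    sd: float

def withdrawal_criteria(headache_nc, fatigue_nc, vigor_nc, depression_nc,
                        taps_caffeine, taps_no_caffeine,
                        fatigue_norm, vigor_norm):
    """Return which of the five criteria a subject meets."""
    return {
        "headache": headache_nc == 3,                                    # criterion 1
        "fatigue": fatigue_nc >= fatigue_norm.mean + 2 * fatigue_norm.sd,  # criterion 2
        "vigor": vigor_nc <= vigor_norm.mean - 2 * vigor_norm.sd,          # criterion 3
        "depression": depression_nc >= 15,                               # criterion 4
        # criterion 5: slowest caffeine trial still beats fastest no-caffeine trial
        "tapping": min(taps_caffeine) > max(taps_no_caffeine),
    }

# Made-up norms and scores, for illustration only.
fatigue_norm = PomsNorm(mean=10.0, sd=6.0)
vigor_norm = PomsNorm(mean=15.0, sd=6.0)
met = withdrawal_criteria(
    headache_nc=1, fatigue_nc=23, vigor_nc=8, depression_nc=9,
    taps_caffeine=[250, 260, 255], taps_no_caffeine=[240, 245, 262],
    fatigue_norm=fatigue_norm, vigor_norm=vigor_norm,
)
shows_withdrawal = any(met.values())  # at least one criterion is enough
print(met, shows_withdrawal)
```

A subject is classified as showing withdrawal as soon as any one criterion is met, which is why the sketch combines the five checks with any().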


The following variables are contained in the stored data:

Headache-Caffeine = headache score during caffeine period
Headache-NoCaffeine = headache score during no-caffeine period
Fatigue-C = fatigue score during caffeine period
Fatigue-NC = fatigue score during no-caffeine period
Vigor-C = vigor score during caffeine period
Vigor-NC = vigor score during no-caffeine period
Depr-C = depression score during caffeine period
Depr-NC = depression score during no-caffeine period
Tapping-C = mean of three tapping scores taken during caffeine period (beats per minute)
Tapping-NC = mean of three tapping scores taken during no-caffeine period (beats per minute)
Impairment = level of functional impairment during the no-caffeine period, either 'None,' 'Mild,' 'Moderate,' or 'Severe'
Gender = M for male, F for female
Smoker = Y if smoked cigarettes daily, N otherwise
Caffeine Intake = daily intake of caffeine (mg)
Primary Beverage = beverage which accounted for the majority of the subject's caffeine intake, either 'Coffee' or 'Soft Drink'

Question 1

Sixteen of the 27 subjects examined in this study were diagnosed as being caffeine dependent. Can we reliably estimate that (16/27) × 100% = 59% of the general population is caffeine dependent? Why or why not? If you answered no, for what population can we estimate that 59% of that population meet the criteria for caffeine dependence?


Learning Objectives
  • Understand the distinction between a population and a sample.
  • Understand what it means for a measurement to be valid, reliable, and unbiased. Understand that the validity of a measurement involves appropriate standardization of measurements.
  • Be able to calculate (using software where appropriate) and interpret the principal measures of center, relative standing, and spread (rates, percentages, mean, median, mode, percentiles, lower quartile, upper quartile, minimum, maximum, standard score, the five-number summary, variance, and standard deviation).

We cannot reliably estimate that 59% of the general population is caffeine dependent. The original 27 subjects were self-selected volunteers who believed they were psychologically or physically addicted to caffeine and were in good health. Thus, all we can say is that of the group of volunteers who believe they are caffeine dependent, about 59% qualify for the clinical diagnosis of caffeine dependence.

Question 2

a) Why were both the subjects and the experimenters interviewing the subjects blinded to whether the subject was receiving the caffeine pills or the caffeine-free pills?

b) Why was the order in which the two series of capsules were taken randomized?

c) Why were the two study periods held one week apart instead of using two consecutive 2-day periods? What is the advantage of having one full week between the start of the two sessions rather than, say, either four days or ten days?

Learning Objectives
  • Understand the notion of confounding. Understand how randomization of group assignments yields comparison groups that are similar with respect to confounding factors.
  • Understand the advantages of a comparative experiment. Be able to identify the basic features of a randomized comparative experiment.
  • Understand three basic techniques fundamental to well-designed experiments: randomization, replication, and blocking. Understand that the first is used to decrease bias and that the last two help to increase precision.

a) Subjects were not told which capsules were the placebos so that they would not develop physical symptoms just from the belief that they should experience withdrawal when they did not ingest caffeine. Because the subjects did not know which capsules were which, differences between the two groups would more likely be caused by the presence or absence of caffeine in the capsule. The interviewers were not told of the subjects' status so that they would not be influenced as they tried to record any evidence of functional impairment. For example, if a subject experienced no unusual problems during one of the study periods, yet the interviewer knew that the subject had been taking the caffeine-free pills, the interviewer might try to elicit examples of impairment from the subject, when in fact none exist.

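b) The order of the two series of capsules was randomized so that any differences between the two periods that might affect the items measured would not consistently coincide with one type of capsule. Randomizing the order was also necessary to ensure that the experiment was truly double-blind: if every subject received the capsules in the same fixed order, subjects or interviewers could learn which period was which.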

c) The two study periods were not held consecutively to lessen the impact of carry-over effects. If a subject was denied caffeine in the first study period, withdrawal effects might linger during the next few days and confound the results from the second period if the periods are held too close together. Having seven days between the starts of the two sessions ensures that the two study periods fall on the same weekdays, and thus the subjects' routines for the two study periods should be similar. If the two sessions are held either four or ten days apart, some of the study days may fall during the week and some during the weekend. This extra variation from weekday versus weekend routine could confound the results. Thus, it is best to run the experiment on days that are as similar for the subject as possible, and yet choose days far enough apart to minimize any carry-over effects.
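The weekday argument in part c can be checked with a short sketch. The start date below is a hypothetical Monday chosen only for illustration; the story does not give the actual study dates.

```python
from datetime import date, timedelta

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def period_days(start, length=2):
    """Return the weekday names covered by a study period of `length` days."""
    return [DAY_NAMES[(start + timedelta(days=i)).weekday()] for i in range(length)]

first_start = date(2024, 1, 8)  # hypothetical start date (a Monday)
first_days = period_days(first_start)

alignment = {}
for gap in (4, 7, 10):
    second_days = period_days(first_start + timedelta(days=gap))
    # Record the second period's weekdays and whether they match the first period's.
    alignment[gap] = (second_days, second_days == first_days)
    print(f"gap of {gap} days: first {first_days}, second {second_days}, "
          f"same weekdays: {second_days == first_days}")
```

Only the 7-day gap reproduces the weekdays of the first period; with this particular start date, the 4-day gap also pushes the second period into the weekend.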

Question 3

Each subject's daily caffeine intake was measured by evaluating a diary in which the subject recorded all of the food and beverages that he or she consumed during one week's time. Discuss the advantages and disadvantages of evaluating one week's food diary as opposed to diaries kept for longer or shorter periods. Also comment on any potential biases in such a record that may affect the caffeine intake measurement.

Learning Objectives
  • Be able to identify potential sources of bias.
  • Understand the advantages of a comparative experiment. Be able to identify the basic features of a randomized comparative experiment.
  • Be able to identify problems with poorly designed experiments.


An advantage of keeping a food diary for a full week is that it can reveal significant differences in the amount of caffeine consumed on weekdays and weekends, for example. Diaries kept for periods longer than a week might help lessen the influence of days of unusually high or low caffeine intake. A person keeping a food diary during a week of final exams, for example, may show a much higher level of caffeine intake than he or she would consume in a more typical week. The disadvantage of having a subject keep a diary for longer periods is that the subject may become annoyed with the amount of time required to complete the diaries accurately and become careless about recording all food consumed. Bias could enter the measurement if a person does not fill out the record shortly after consuming any food. Later, he or she may very easily over- or under-estimate the amounts of food that were actually eaten. Also, some people who are embarrassed by their actual eating patterns may alter the records to make their food diaries appear as they want them to appear, rather than reflect the truth.

Question 4

Scores for the fatigue and vigor indices were standardized against a norm calculated from a sample of approximately 850 college students. Five of the eleven participants scored more than two standard deviations below the mean for vigor during the period in which they received no caffeine, where the mean and standard deviation were calculated from the sample of college students.

a) What is the (approximate) probability that a randomly selected person's score would fall more than two standard deviations below the mean on such an exam?

b) Test the hypothesis that the probability that a caffeine-deprived caffeine-dependent individual would score more than 2 SD below the mean is the same as the probability for someone selected from the entire population, versus the hypothesis that the probability for the caffeine-deprived individual is greater. Use your value from Part a for the population probability.


Learning Objectives
  • Understand that the goal of inference is to use the sample statistics to estimate the population parameters with confidence.
  • Understand the difference between the null and alternative hypotheses. Be able to state the null and alternative hypotheses for a given problem.

a) The norms were created by administering the exam to a large sample of approximately 850 subjects, and we assume that the scores used to determine the norm mean and standard deviation are approximately normally distributed. Under that assumption, the probability that a randomly selected person's score would fall more than 2 SD below the mean is about P(Z < –2) = 0.023, where Z has a standard normal distribution.

b) Using the estimate calculated in Part a, we wish to test the hypotheses H0: P = 0.023 vs. Ha: P > 0.023. Since the value of P is so close to zero and the number of subjects is only 11 (the expected count under H0 is only 11 × 0.023 ≈ 0.25), we should do an exact binomial test rather than a test requiring a normal approximation. There were 5 subjects who scored more than 2 SD below the mean. To calculate the P-value for our test, we compute the binomial probability of observing 5 or more such subjects out of 11 when P = 0.023:

P(X ≥ 5) = Σ (k = 5 to 11) C(11, k) (0.023)^k (0.977)^(11 – k) ≈ 0.0000026

Since this P-value is extremely small, we conclude that a caffeine-deprived caffeine-dependent individual is much more likely to score very low on the test for vigor than someone selected from the whole population. Note: since our subjects are a group of volunteers and not a random sample, it may not be appropriate to make an inference to the whole population of caffeine-dependent persons.
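The two calculations in this answer can be reproduced with the Python standard library. This is a sketch under the same assumptions as the answer itself (normally distributed norm scores, and the rounded value 0.023 used as the null probability); the helper name binom_upper_tail is hypothetical.

```python
from math import comb
from statistics import NormalDist

# Part a: probability of scoring more than 2 SD below the mean,
# assuming the norm scores are approximately normal.
p_tail = NormalDist().cdf(-2)

def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Part b: exact one-sided binomial test of H0: P = 0.023 vs. Ha: P > 0.023,
# with 5 of the 11 subjects scoring more than 2 SD below the mean.
p0 = 0.023
p_value = binom_upper_tail(5, 11, p0)

print(f"P(Z < -2)  = {p_tail:.4f}")
print(f"P(X >= 5)  = {p_value:.2e}")
```

The tail probability comes out near 0.023 and the P-value is on the order of a few in a million, in line with the conclusion above.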




Keyser, D., and Sweetland, R. C. (eds) (1984)

Kramer, J. J., and Conoley, J. C. (eds) (1992)

Strain, E. C., Mumford, G. K., Silverman, K., and Griffiths, R. R. (1994)



This story was prepared by Mike Bowcut and last modified on 5/12/93.