Quantitative methods for rating wines - a preliminary lit review

I am preparing a much larger article, or series of articles, on the statistical and methodological validity of wine rating systems. It’s a major undertaking…so major, in fact, that judging from my literature searches, few people seem to have tackled this subject before. Basically, my mind started churning on a simple problem: If many wine critics and reviewers agree that the 100-point system is flawed, and if there are several competing systems out there to quantify wine quality numerically, which system is the best? By “best” I mean: which system gives the most accurate and repeatable scores across many critics?

Here’s a list of the journal articles and books I was able to track down that appear to relate to my subject:

  • Ashenfelter, Orley and Quandt, Richard E. (1999), “Analyzing a wine tasting statistically”, Chance, New Directions for Statistics and Computers, 12 (3), 16-20
  • Lindley, Dennis V. (1993), “The analysis of experimental data: The appreciation of tea and wine”, Teaching Statistics, 15, 22-25
  • Amerine, Maynard A. and Roessler, Edward B. (1982), “Wines: Their sensory evaluation”, W. H. Freeman & Co (New York)

Of these three, only the Ashenfelter and Quandt article seems relevant to my topic.  Happily, I discovered the Liquid Assets Web site (run by Ashenfelter and Quandt themselves).  At this site, the journal article listed above is provided, along with several other relevant articles.

Interestingly, Ashenfelter and Quandt seem to have focused on comparative wine tastings as their method of choice. Additionally, they focus on using advanced statistical analyses to check the wine tasters’ results against one another; they developed a software program, in fact, that conducts these analyses and generates a final report automatically. As they put it, “The primary goal in the analysis of a wine tasting is to determine the extent to which the conclusions that have been drawn are likely to be reproduced on another occasion.”
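
To make the idea of checking tasters’ results against one another concrete, here is a minimal sketch of one standard agreement measure, Kendall’s coefficient of concordance (W). This is an illustration only: it is not a reproduction of Ashenfelter and Quandt’s program, the function names are made up for this example, and the scores in the usage lines are hypothetical. W runs from 0 (no agreement among the tasters’ rankings) to 1 (every taster ranks the wines identically).

```python
def rank_scores(scores):
    """Rank one taster's scores (1 = lowest score), averaging ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def kendalls_w(score_table):
    """Kendall's W for score_table[taster][wine].

    Uses the basic formula without a correction for tied ranks,
    so treat the result as approximate when tasters give many ties.
    """
    m = len(score_table)          # number of tasters
    n = len(score_table[0])       # number of wines
    rank_table = [rank_scores(row) for row in score_table]
    rank_sums = [sum(row[w] for row in rank_table) for w in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((total - mean_rank_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical scores: three tasters, four wines.
tasting = [
    [88, 91, 85, 93],
    [90, 92, 84, 89],
    [86, 90, 87, 94],
]
print(f"Kendall's W: {kendalls_w(tasting):.2f}")
```

Because W works on ranks, it fits comparative tastings well; it says nothing about whether two critics’ absolute scores (a 90 versus a 90) mean the same thing.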

All of this makes sense to me, but what I was hoping to find was a statistical analysis of wine rating systems, not blind tasting results.  Blind tastings tend to involve judging wines against one another, whereas individual wine tastings and ratings simply assign a score to each wine.  This score then becomes the widely printed and reprinted quantitative measure of that wine’s quality.  But how accurate is this sort of assessment?

That’s the question I want to answer using statistics, sound methodology, and survey design techniques. I learned all about this stuff as an engineering master’s student, so I know how to design a study, for example. Because of this background, my interest in this topic was piqued when I read Robert Parker’s description of how he scores a wine. He and I agree that the 20-point systems don’t make sense, and to that I would add the Wine Spectator’s 16-point system (85-100 points possible in printed reviews). But I have some questions about Parker’s system too, questions that I will investigate in another blog entry in the next few days. In the meantime, you can see how he ranks his wines here.

8 Responses to “Quantitative methods for rating wines - a preliminary lit review”

  1. cml Says:

    In one of your other posts you refer to those that don’t share your conception of reality. From what I understand, that conception is rather more towards subjectivism than objectivism.

    I’m curious as to whether you think that any wine rating system could really get beyond the subjectivism inherent in taste (literally in this case). That is not to say that one person’s subjective appreciation of something is necessarily totally unknowable or that two people can’t have substantially similar tastes on specific matters (in fact, one might say that shared tastes lead to some people remaining friends for decades while others drift off, never to be seen again). Still, I wonder if it wouldn’t be necessary to have several different systems so that people could determine which system matched well with their own taste. Alternatively, the system would have to have greater dimension than the simple numerical scale that is generally utilized today. In a sense, just such a system is already in place in that wines are divided into various categories depending upon region and grape type and only then (hopefully only then) rated. However, it seems like some further categorization would be necessary, this time directed at the individual drinker rather than the wine.

    As Nietzsche said, out of chaos comes order . . . wait, no, the other thing he said was that all systems are lies. So, it seems like the trick to making the rating system objective is to find the happy balance between particularization and abstraction. In my mind, that would include accounting for different individual tastes. Thoughts?

  2. huevosconvino Says:

    It is my perspective that objectivity is a myth, much the same as the idea of “normality” with regard to an individual as compared to an entire population. Nobody is “normal” in the sense that a person has no features that separate him or her from the rest of humanity. If each person is slightly different (abnormal) from every other person, there can be no objective point of view…only the subjective POV that each person maintains.

    Therefore, the idea that an objective review can be achieved is, in my mind, impossible. However, I believe that an empirically-derived and scientifically tested procedure for tasting and rating wines would enable direct, accurate comparison among reviewers’ results. In other words, one 90-point review would be equal to any other 90-point review because the review criteria would be the same. Even if two reviewers arrived at the same score through slightly different measures (better or worse aroma vs. better or worse taste, etc.), the resulting scores would be comparable. All of the subjectivity would be assigned to the people reviewing the wines. Today, both the review systems AND the reviewers’ opinions are subjective. I think we can do better.

    You seem to be suggesting that wines have such variance in terms of their drinkability, so to speak, that several systems would be required for such a complex beverage. I’m not so sure, though. I think the place to account for individual taste is in the written review itself. The score, on the other hand, is ipso facto represented as a quantitative measure when it’s not, at least not yet.

    My essential point is that we should attempt a scientific study to see whether the process of assigning a point value to a wine can be made quantitative (objective) rather than qualitative (subjective).

  3. cml Says:

    I agree that one could devise standards that would allow people to reach similar results for the same wine. Although it would be difficult, they could calibrate their responses based off some sort of baseline (an analogue would be the way AP tests are graded–about ten people sit around a table and read the same exams and grade them until they all begin to arrive at the same scores).

    However, I don’t think simply being able to arrive at the same score necessarily addresses my concern. Even if reviewers can agree that a wine deserves a given score, a number on a straight numerical scale will not tell individual readers what it means to them unless the readers are calibrated just as precisely as the reviewers. They would have to know exactly what sort of taste the reviewers were looking for in assigning the score. That seems unlikely to me.

    As you suggest, the written description can cover some of this. However, the problem with that is it just shunts the system’s ambiguity off into another portion that is recognized as inherently ambiguous. We do that all the time, but it also gives the number a veneer of accuracy that it simply doesn’t possess–we’ve just hidden the fuzzy edges out of view.

    I think my main point is that perhaps such vagueness is inevitable when dealing with matters of taste, especially when you consider that the apparent quality of the wine is so dependent on factors outside the bottle. If you have an extra ten dollars, should you spend it on the wine or on getting a really high-quality Fourme d’Ambert to eat along with the wine? And how about the walnuts? Where do you buy them? It seems like any sort of strict rating system is deracinated from the eating experience.

  4. huevosconvino Says:

    These are excellent points. In particular, I think you may be right with regard to the usefulness of a written review: I feel the current style of wine reviews seems to be a melange of several nouns, usually names of fruit, sprinkled with a selection from the same basic handful of adjectives (a “cherry oak hedonist’s delight” or “bursting with blueberry and aroma of cedar,” for example). For the average wine drinker who can’t really distinguish between the many aromas or flavors in a glass of wine, the choice comes down to something else: a score, perhaps, or most likely a combination of cost, high reviewer scores, and how cool the label looks.

    As for the usefulness of a numeric score, you may be right there too…I certainly think the general GRE exam is, for example, particularly silly as an indicator of preparedness to participate in graduate studies. But at the same time I think it would help homogenize the meaning, and possibly the reporting, of scores among professional wine reviewers. Even amateur reviewers could get in on the act if there was a rigorous methodology.

    Parker starts down this road by assigning points to color/appearance (5 points), aroma/bouquet (15 points), flavor/finish (20 points), and something he describes as “overall quality level or potential for further evolution and improvement—aging,” which “merits up to 10 points” (http://www.erobertparker.com/info/legend.asp). I think I’d like to see a more rigorous, empirically designed and proven version of this sort of scale.

    But, to address your main point: If you drink a glass of Stag’s Leap Chardonnay at home by yourself, and if you drink the same wine at a family barbecue, and if you drink the same wine at a restaurant with lousy service…well, you see where I’m going. It isn’t the same experience even if it’s the same wine. If objectivity truly is a myth, so is the possibility of tasting the same wine under the same conditions more than once. While a few tasting experiences may be nearly the same, they won’t be identical.

    For this reason, I would love to see an empirical rating system devised so someone like Robert Parker could blind-taste the same wine 50 times in 50 days, for example, noting his score every time. I would love to see what sort of variance appears in that set of scores and in his tasting notes.

  5. cml Says:

    When I get back to Seattle this fall, if you haven’t done so before then I’m going to set up just the test you propose in your last paragraph, except it’ll be you and not Robert Parker doing the tasting.

    I see what you mean about Parker’s system being a bit wishy-washy. If half as many points as are available for actual taste are assigned to how good the wine might be at some future point, that seems like a built-in bullshit factor. Since wines are consumed at a wide variety of ages, that part of the score means absolutely nothing. Plus, if you’re going to assign points for color, why not assign points for label design?

    Would you change the categories he uses entirely? Or do you think the main issue is more rigorous standards for judging within the different fields?

    I thought that you might be interested in my comment about the written portion. As you may have recognized, my comment is just a restatement of one of deconstructionism’s basic principles. I wouldn’t want to build my entire career around that interpretive method (partly because it seems a bit played out at this point), but it does offer some interesting insights when dealing with possible ambiguity and vagueness. Anyway, might be worth exploring a bit more when it comes time to write your article.

  6. huevosconvino Says:

    Ha! If I had the money to taste the same wine 50 times in 50 days, opening a new bottle every time, I’d be…well…Robert Parker.

    The strangest thing about Parker’s rating system is his note on the color of the wine. He says that nearly all modern wines get a 4 or a 5 on this scale (out of 5) because modern technology enables more successful wine production, which results in uniformly nice-looking wine. So why even have color as a rating method? It just inflates all the scores by a few points by default. And as you rightly point out, the color of a wine changes with age (dramatically so sometimes). But that doesn’t necessarily mean the wine tastes worse…in fact, it tastes better to a point.

    The big question to me is the categories. Which categories make sense, and which should be ignored? And then how would you weight each category so it’s in line with all the others? Frankly, once you get beyond the categories for aroma and flavor, it’s pretty tough to know what else to include in the score. The ability to age a wine plays into Parker’s score, but much like the wine’s color I don’t think that’s a useful measure. Who cares how long the wine can be aged? If I want to drink a 95-point wine tomorrow, it’s still a 95-point wine, right? But if it won’t age for 20 years, does that somehow give the wine a 92-point rating even if I want to open it now? I don’t think so.

    Personally, I think if you’re going to use categories, you should look at a couple of elements that don’t, at first, seem directly related to enjoyment of the wine. For example, for my $10 I would prefer to purchase a wine of which only 426 cases were released, rather than a wine that had 25,000 cases released. Why? Because the 426-case wine is a little harder to come by, and because buying it helps support the small regional grower who can’t possibly hope to get Trader Joe’s distribution. By supporting this type of winery, I am helping keep as much heterogeneity in the wine industry as possible, and that’s a very good thing in any industry, especially wine. So perhaps there are other aspects of a wine that merit attention…maybe not as part of a numeric rating system, but somehow they should be addressed.

    Finally, I think any numeric rating system for wine should be about 5-8 points total, give or take…and I think it should work backwards, like golf. Therefore, the best wines get a rating of 1 because they have the fewest aspects to detract from their enjoyment. The worst wines rank 7 or 8. Something like that. But I’d want to study this idea more before I make such a sweeping judgment.

  7. cml Says:

    So when making the evaluation, do you know what the wine is before you rate it? If so, that seems to automatically incorporate factors outside of flavor or aroma. If you go down that road, you’re incorporating social factors (like getting to tell people that they are drinking wine from a small vintner rather than a large one–which I would argue is directly related to your, and by “your” I specifically mean you, enjoyment of the wine–, having a prettier color wine to go with your centerpiece), and at that point you might as well add back in color and add label design. Not that those are necessarily wrong, it’s just that it seems you must make a choice when designing the rating system between socially neutral results and results that incorporate social factors.

    And don’t worry, we’ll just use a wine cube so we don’t have to open a new bottle every time . . . ha! Or maybe just do some number lower than 50.

  8. huevosconvino Says:

    Here’s what I’d suggest: a blind tasting of a flight of wines with a set of criteria based solely on the aroma and flavor of the wine, followed by another tasting using the same criteria, but the second time you can see the wine and the label, etc. I don’t know…I still need to think about how to do this, exactly, but I’m thinking of a system in which you do two tastes of the same wine but in a random order and with several other wines. Something like that. So I think the taste- and aroma-specific rating system would exclude everything else…but then maybe I’d add some context to the rating by looking at the number of cases produced (which I agree is something I personally care about, but not something that others care about), label design, Web presence, and god knows what else.

    Yes, let’s taste a box wine 50 times in 50 days…or, simply, 50 glasses in 5 days. Either way. *)
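
The two experiments floated in the comments above (scoring the same wine on many occasions, and blind flights in which each wine is poured twice in random order) can be sketched in a few lines. This is a rough illustration under stated assumptions, not a finished study design: the function names are invented for this example, and the wine labels and scores are hypothetical placeholders, not real tasting data.

```python
import random
import statistics


def retest_summary(scores):
    """Summarize repeated scores one taster gave to the same wine."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),   # sample standard deviation
        "range": max(scores) - min(scores),
    }


def blind_flight(wines, repeats=2, seed=None):
    """Return a randomized pour order in which each wine appears `repeats` times.

    The taster sees only the pour position; the key mapping positions back
    to wines stays with whoever runs the tasting.
    """
    rng = random.Random(seed)
    pours = [wine for wine in wines for _ in range(repeats)]
    rng.shuffle(pours)
    return pours


# Hypothetical scores from five blind tastings of one wine.
print(retest_summary([90, 92, 88, 90, 91]))

# Hypothetical flight: three wines, each poured twice, in random order.
for position, wine in enumerate(blind_flight(["Wine A", "Wine B", "Wine C"], seed=7), start=1):
    print(position, wine)
```

The standard deviation and range of the repeated scores give a first look at how repeatable a single taster is; comparing the two scores each wine receives within a blind flight does the same job in a single sitting.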
