
Additionally, I looked up 20 or so different enneagram experts. Many have theology degrees, and a few have degrees that are in the ballpark of personality science (Jerome Wagner & Beatrice Chestnut, Ph.D. in clinical psychology; Don Richard Riso, M.A. in social psychology), but none appear to have degrees that would offer thorough training in psychometrics, test construction, and personality, which are all necessary to evaluate the accuracy of such a tool. For example, Russ Hudson, president of the Enneagram Institute, describes himself as "one of the principal scholars and innovative thinkers in the enneagram world today," yet his LinkedIn page only lists a BA in East Asian Studies. This does not automatically mean the enneagram is invalid. I mention it because it helps explain why I hadn't heard of it.
Theological Considerations
There are some valid theological concerns about the enneagram, but most of the critiques boil down to its questionable origins. As a scientist, I'm not concerned about its origins because this has nothing to do with whether or not it's accurate (this is what philosophers call the genetic fallacy), so I'm not going to address those concerns. The only point I want to make theologically is that the Bible charges us to be wise (Matt 10:16), discerning (Phil 1:9-10), and to test everything (1 Thess 5:21 & 1 John 4:1), especially things concerned with spiritual matters like the enneagram.
On the one hand, many people personally testify to the usefulness of the enneagram. Even for me, when I read the description of my type, there are aspects of it that seem eerily accurate. On the other hand, there are aspects of other enneagram types that seem eerily accurate about me, the claims about the enneagram seem to be too good to be true, and the enneagram experts don't have the proper training to substantiate these claims.
So how are we to follow the biblical command to test it, especially when there are seemingly conflicting data about the enneagram? Enter science.
Science of Personality
I'm often told that the enneagram is not a personality test and that it cannot be tested scientifically. As a scientist who studies personality, I can tell you that both of these objections are plainly false. The enneagram makes the same kind of claims as every other personality test. There's nothing magical about it that makes it beyond the realm of science. Therefore, we can and should test it accordingly, which means we need to understand the science of personality.
Personality is hard to assess because it's easy to let non-personality factors slip into the test, such as intelligence, education (correlated with, but still different from, intelligence), religious beliefs, identity, etc. Personality often correlates with these factors, but a good personality test will discriminate between personality and these other factors. Studying personality scientifically is important because it helps us remove our personal biases so we can accurately assess different measures. This allows us to consider multiple variables and see whether a measure applies to large populations of people rather than being limited to a single person's experiences or best guesses.
The two personality tests that are usually considered the gold standard are the NEO, which assesses personality according to the big five traits, and the MMPI. Scientists debate which is better, but the big five is used more in research because it is more accessible. The MMPI is expensive and requires certification to administer or interpret the data. The Myers-Briggs (MBTI) is the most popular among laypeople because it is simple and flashy, but most scientists don't use it because its validity is questionable (in fact, it's not uncommon for psychologists to openly mock it).
Unfortunately, there is very little scientific data on the enneagram so it's hard to draw definitive conclusions about it. I could only find a handful of scientific studies that examined it. None of them were in top-tier journals and their methodology was questionable. This does not invalidate them but does raise more red flags. Either way, I will take these studies at face value and assume they are valid.
Scientific Evidence
Perhaps the most important factor for a personality test is test-retest reliability, which checks whether the test reproduces consistent results when someone takes it more than once. Only one of the studies actually looked at this measure, and it found that 79-100% of participants, depending on the type, were in the same type at the pre- and post-tests. This is really good, but it was also based on a biased sample of people who were trained in the enneagram and self-selected their own type both times.
As a comparison, the test-retest reliability of the NEO PI-R, which measures personality by the big five factors, ranges from .86 to .91 after 3 months and .63 to .83 over 6 years. While personality is fairly stable over time, particularly in adulthood, it does gradually change, so some changes should be expected for any personality test. Interestingly, one of the claims many people make about the enneagram is that your type does not change, even from childhood, which is at odds with other personality research.
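Note that these two sets of numbers are not the same kind of statistic. The enneagram study reports the percentage of people who landed in the same type both times, while figures like .86 are typically correlations between scores from the two sittings. Here is a minimal Python sketch of both calculations. The data and the function names are hypothetical, made up for illustration, and not taken from any of the studies.

```python
from math import sqrt

def percent_same_type(first, second):
    """Share of people assigned the same type on both occasions."""
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)

def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical data: six people tested twice.
types_time1 = [5, 7, 2, 9, 4, 1]          # enneagram type at first sitting
types_time2 = [5, 7, 3, 9, 4, 1]          # one person switched from 2 to 3
scores_time1 = [32, 41, 27, 38, 45, 30]   # a trait score at first sitting
scores_time2 = [34, 40, 25, 39, 44, 33]   # the same trait score later

print(percent_same_type(types_time1, types_time2))
print(round(pearson_r(scores_time1, scores_time2), 2))
```

Because one number counts category matches and the other measures how closely scores track each other, they should be compared with some caution.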
Related to this is inter-rater reliability, which looks to see if two people rate a person the same way. For the enneagram, the highest score for this came from people with at least 2.5 years of experience with the enneagram, and they only agreed 55% of the time. The scores only went down from there in other studies or when less experienced people were tested. In fact, one researcher who advocates for the enneagram states that trained enneagram practitioners are pretty good (although the data doesn't support this), but they are "not as good as they think they are!"
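To put 55% in context, it helps to correct for chance. If two raters picked among the nine types purely at random, they would agree about 1 time in 9 (roughly 11%). The sketch below applies a kappa-style correction in Python. It assumes all nine types are equally likely to be chosen, which is a simplification (a full Cohen's kappa would use each rater's actual distribution of choices), and the function name is hypothetical.

```python
def chance_corrected_agreement(observed, n_categories):
    """Kappa-style correction: how far observed agreement sits between
    chance (0) and perfect agreement (1).

    Assumes every category is equally likely, so chance agreement is
    1 / n_categories. Real raters rarely choose types uniformly.
    """
    chance = 1 / n_categories
    return (observed - chance) / (1 - chance)

# 55% observed agreement across nine enneagram types
kappa_like = chance_corrected_agreement(0.55, 9)
print(round(kappa_like, 2))  # 0.49
```

Under that assumption, the raters are about halfway between guessing and perfect agreement: clearly better than chance, but well short of what a reliable measure needs.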
Another important factor, and the one most commonly reported, is internal consistency (reported as Cronbach's alpha), which checks to see if the questions for each enneagram type are testing the same thing. An acceptable score is considered .70 or higher. The enneagram types ranged from .37 to .82, with at least three of the types falling below the .70 threshold. This means that 18-63% of the variation in scores is due to measurement error! For comparison, the NEO PI-R ranges from .86 to .92 (8-14% measurement error). Since the enneagram is ipsative, meaning the questions force you to choose between two answers instead of choosing the degree to which you agree, the low internal consistency means that most people typically have characteristics of multiple types.
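For readers who want to see the mechanics, here is a minimal Python sketch of how Cronbach's alpha is calculated and where the error percentages come from (they are simply 1 minus alpha). The responses are invented for illustration, and the function names are hypothetical. This is the standard textbook formula, not the exact procedure used in any of the studies.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a scale.

    responses: one row per person, each row a list of item scores.
    Formula: (k / (k - 1)) * (1 - sum of item variances / variance of totals)
    """
    k = len(responses[0])                       # number of questions
    items = list(zip(*responses))               # regroup scores by question
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

def error_share(alpha):
    """Share of score variation attributed to measurement error."""
    return 1 - alpha

# Hypothetical answers from five people to four questions (1-5 scale)
answers = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

print(round(cronbach_alpha(answers), 2))
print(round(error_share(0.37), 2), round(error_share(0.82), 2))  # 0.63 0.18
```

In this made-up example people answer all four questions in a similar way, so alpha comes out high. If one question were unrelated to the others, the item variances would make up a larger share of the total and alpha would drop.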
The next factor is predictive validity, which checks to see how well the test predicts behavior. One of the studies specifically compared the enneagram to the big five (Sutton, Allinson, & Williams, 2013), which is great in theory, but they compared apples to oranges, so it's hard to draw conclusions. Unfortunately, they compared the enneagram to only single factors of the big five rather than combining scores across all five factors, which would have enhanced the predictive utility of the big five. Even so, the big five still fared better overall, even though they used it in a less than optimal way. The enneagram did as well as a single factor of the big five, and in one case, it did better. The authors should have used a multiple regression with the big five to incorporate all five factors before comparing it to the enneagram.
The final study looked at the organization of the types and the notion of having a "wing." One study had participants organize the types based on similarities and the results showed vastly different organizations from how the types are actually organized according to the enneagram. More research needs to be done here, but it does seem to suggest that even if the types are valid, the organization of them on the circle may simply be arbitrary.
Conclusion of Scientific Evidence
Overall, the psychometric properties of the enneagram are mixed. Some properties are below standard thresholds, a few are very good, and a lot of them are right around minimally acceptable standards. It's not a terrible test, but it's not good either. This won't change unless someone develops a revised version of it, in which case, it will be different from what anyone is currently using.
The Wagner Enneagram Personality Style Scales (WEPSS) appears to be a little more accurate than other versions but still has mixed or uncertain results. Additionally, where it improves in some areas, it creates other issues. I am still waiting to hear back from the company regarding the reliability and validity statistics so I can see more than just the basic information that was reported in The Fifteenth Mental Measurements Yearbook.
Additionally, the current research only looks at the basic explanations and delineations of each type. The issue is that the enneagram is also supposed to tell a person what their sins and weaknesses are, how they can get healthy, and how they can best relate to other people. These are all additional claims that stem from assumptions about the types, meaning they will have the same degree of error as the type, plus more!
Think of it this way. If you are playing pool and you are off by a millimeter, you may still make the shot. But if you are off by a millimeter when you try to shoot a combo by banking one ball off of another, you will almost certainly miss because the first margin of error will affect the next ball and multiply the error. Even more so if you try banking off two balls, and so on. This is how the enneagram is supposed to work. As a 5, I supposedly become more like a 7 when stressed and an 8 when I am relaxed. This is like a quadruple combo because it assumes each number is correct, plus the relationships between each number are correct.
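To put rough numbers on the pool analogy, here is a small Python sketch of how accuracy shrinks when several claims each have to be right. The 80% figure is purely hypothetical (I don't have real accuracy estimates for these claims), and the calculation assumes the errors are independent of one another, which may not hold in practice.

```python
from math import prod

def chain_accuracy(step_accuracies):
    """Probability that every step in a chain is correct,
    assuming the steps succeed or fail independently."""
    return prod(step_accuracies)

# Hypothetical: each link (my type, the stress type, the relaxed type,
# and the relationship between them) is right 80% of the time.
single_claim = chain_accuracy([0.8])
quadruple_combo = chain_accuracy([0.8, 0.8, 0.8, 0.8])

print(round(single_claim, 2))     # 0.8
print(round(quadruple_combo, 2))  # 0.41
```

Even with a fairly generous 80% per link, a four-link chain would be right less than half the time under these assumptions.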
My guess is that the sins associated with each type are probably only a little more accurate than a roll of the dice. Some are probably above chance while others are probably below chance. I suspect the same is true for how people are supposed to get healthy, what they do when stressed, the triads, or what their "wing" is (assuming it could theoretically be any number and not just a neighboring number).
Finally, there is no cross-cultural data on the enneagram, so even if it were valid in the U.S., it may not be in other cultures. The big five, however, has been tested in several cultures and has been shown to reliably describe personality for people of all cultures. I'm not aware of any culture it does not apply to. The only caveat is that testing it in collectivist cultures has revealed there might be another factor pertaining to interpersonal relatedness.
General Conclusion
Unless you've done graduate work in psychometrics, the scientific data probably doesn't mean a whole lot to you (which is why there are two parts to this article). For those who have studied psychometrics, it's a no-brainer that the enneagram simply cannot do all its proponents claim it can. Any scientist who studies personality would simply look at the reliability scores and conclude the test is not accurate enough to be helpful, and therefore, they wouldn't use it because the potential for harm would be too high.
I hope this information is helpful and informative, for those who've been silently skeptical of the enneagram and for those who are fans of it. My goal was and is to be as objective as possible, which is why I included statistics that may have been hard to understand. In this article, I mostly wanted to get the data out. In part 2, I explain why the enneagram still seems to work (for some), why it matters if we use it or not, and offer recommendations for better tools that can be used as a replacement.
For thoughts on it from a theological perspective, consider this article from the Christian Research Journal.
Works Cited
Here's a list of scientific(ish) sources I consulted (it does not include the books and websites I used to personally understand the enneagram). Many of these sources are not actually peer-reviewed or they are in low level and inappropriate journals (meaning the reviewers may not be qualified to properly critique the methods, statistical analyses, or interpretation of results). This is due to the limited number of articles available that test the enneagram. Most of these are favorable to the enneagram and therefore, I am accepting these as more valid than I would otherwise to try to be fair and present the best possible case for the enneagram. There were also a few other peer-reviewed articles on the enneagram, but they were not looking at the validity of it so they are not included here.
- Bland, A. M. (2010). The Enneagram: A review of the empirical and transformational literature. The Journal of Humanistic Counseling, Education and Development, 49(1), 16-31.
- Costa, P. T., & McCrae, R. R. (2010). The NEO Personality Inventory: 3. Odessa, FL: Psychological assessment resources.
- Edwards, A. C. (1991). Clipping the wings off the enneagram; a study in people's perceptions of a ninefold personality typology. Social Behavior and Personality: an international journal, 19(1), 11-20.
- Matise, M. (2007). The enneagram: An innovative approach. Journal of Professional Counseling: Practice, Theory & Research, 35(1).
- McCrae, R. R., & Costa, P. T. (1983). Joint factors in self-reports and ratings: Neuroticism, extraversion and openness to experience. Personality and Individual Differences, 4(3), 245-255.
- McCrae, R. R., & Costa, P. T. (1987). Validation of the five-factor model of personality across instruments and observers. Journal of personality and social psychology, 52(1), 81.
- McCrae, R. R., & John, O. P. (1992). An introduction to the five‐factor model and its applications. Journal of personality, 60(2), 175-215.
- McCrae, R. R., & Costa Jr, P. T. (1997). Personality trait structure as a human universal. American psychologist, 52(5), 509.
- McCrae, R. R., & Costa, P. T. (2003). Personality in adulthood: A five-factor theory perspective. Guilford Press.
- McCrae, R. R., Kurtz, J. E., Yamagata, S., & Terracciano, A. (2011). Internal consistency, retest reliability, and their implications for personality scale validity. Personality and social psychology review, 15(1), 28-50.
- Newgent, R. A., Parr, P. E., & Newman, I. (2002). The Enneagram: Trends in Validation.
- Newgent, R. A., Parr, P. H., Newman, I., & Wiggins, K. K. (2004). The Riso-Hudson Enneagram type indicator: Estimates of reliability and validity. Measurement and Evaluation in Counseling and Development, 36(4), 226-237.
- Scott, S. A. (2011). An analysis of the validity of the enneagram. The College of William and Mary.
- Sutton, A. M. (2012). But Is It Real? A Review of Research on Enneagram. Enneagram Journal, 5.
- Sutton, A., Allinson, C., & Williams, H. (2013). Personality type and work-related outcomes: An exploratory application of the Enneagram model. European Management Journal, 31(3), 234-249.
- Wagner, J. P., & Walker, R. E. (1983). Reliability and validity study of a Sufi personality typology: The enneagram. Journal of Clinical Psychology, 39(5), 712-717.
- Yilmaz, E. D., Gençer, A. G., Ünal, Ö., & Aydemir, Ö. (2014). From enneagram to nine types temperament model: A proposal. Egitim ve Bilim, 39(173).
- Yilmaz, E. D., Gençer, A. G., Aydemir, Ö., Yilmaz, A., Kesebir, S., Ünal, Ö., ... & Bilici, M. (2014). Validity and Reliability of Nine Types Temperament Scale. Egitim ve Bilim, 39(171).
Here's a link to my Google Drive folder with the Enneagram articles saved in case you want to read them.
Super interesting post, thanks, Jay!
"Related to this is inter-rater reliability which looks to see if two people rate a person the same way. For the enneagram, the highest score for this came from people with at least 2.5 years experience with the enneagram and they only agreed 55% of the time."
55% actually seems surprisingly high to me. The random chance of two experts picking the same type would be 11% (1/9), right? So that's a lot better, far from random, no? What do you think?
"Another important factor, which is the most common, is the internal consistency (reported as Cronbach's alpha), which checks to see if the questions for each enneagram type are testing the same thing. An acceptable score is considered .70 or higher. The enneagram types ranged from .37 to .82 with at least three of the types falling below the .70 threshold. This means that 18-63% of the variation in scores is due to measurement error!"
Could you explain this a bit more? What exactly does the test do when you say it "checks to see if the questions for each enneagram type are testing the same thing"? How does that happen? Or which study should I look at to understand this?
Once again, thanks so much for this, really interesting!
Hi Nicolai. Thanks for reading and commenting. I apologize for the delay as I only just saw your comments yesterday.
You are right that chance would be 11%. The enneagram experts in that study did do better than chance but it's still a far cry from reliable.
Internal consistency (Cronbach's alpha) is a measure of how consistent the questions of a scale are. If I create a scale to measure one thing, then people should answer most or all of those questions in the same way. For instance, if I ask 4 questions about whether you are an extrovert, you probably won't say "strongly agree" to all the questions, but most of your answers will be close to the same (e.g. slightly agree, agree, strongly agree, agree). If any one question has a very low correlation with the other questions, then it is likely measuring something else and not extroversion. Cronbach's alpha is one of the ways we quantify this. It's not the only measure, nor is it perfect, but it's pretty useful. Any stats textbook should have an overview of it and there are lots of good sources if you just Google it.
The review article you posted in your other comment is pretty good for the most part. They do a pretty comprehensive report of the studies with the enneagram but they overemphasize where it does well and gloss over or downplay some of the results or methods that are unfavorable to the enneagram. For instance, they report that the enneagram was supported in some studies when the studies found support for hypotheses unrelated to the enneagram and they don't discuss major methodological flaws with some of the studies.
I hope this is helpful and please let me know if you have other questions. Thanks again.
I'd also be interested, Jay, in what you make of this new review study in Clinical Psychology. I'll read it myself, but I am not a psychologist nor a psychometrician, so I would be interested in your expert opinion.
https://onlinelibrary.wiley.com/doi/abs/10.1002/jclp.23097
I'm not necessarily opposed to the Enneagram being invalid and was very interested in reading what you had to say. But I'm not sure you understand that the Enneagram is not actually based on traits and personality elements. It's about a Core Fear and Core Desire (or, motivation). This is why Enneagram teachers say your type doesn't change, because while personality traits may evolve or change or not be common of a certain type, these core beliefs and feelings are what everything is based on. I don't ever recommend someone take a test to find out their type. Because it's ultimately not about traits in the first place, it's a self-discovery tool and only each individual can discover what their core fear and core desire is. What I like to do is sit down with someone and read through the Core Fears and Core Desires until one of them resonates with the person, and that's how they begin finding their type.
Now, is the Enneagram ultimately completely invalid? Or demonic? Not of God? Possibly. I just learned about these possibilities today - hence my research. But I've come across your kind of argument before about the tests and the personality traits, and it's simply a faulty premise. It's about motivations, not traits. My understanding is that it really was never meant to be a little test that someone takes in the first place. I wouldn't mind reading an argument against the Enneagram that takes into account how it's helped people and the fact that it's not ultimately based on traits.
Maybe in the end what I'm saying doesn't make any sense. But this is how I've always understood it, and it's also why I have preferred it above other personal-growth tools.
Thanks!
Thank you for your comment and sorry for the delayed response. I only just saw it today.
What you're saying about traits vs. motivations makes sense and with that understanding, I can see why you think I've missed the point. The enneagram uses different terminology than personality scientists so trying to translate between the two can lead to misunderstandings. In this case, what the enneagram refers to as core motivations, desires, or fears all fall under the umbrella of what personality scientists call traits. A trait is any stable or semi-stable pattern of thinking, feeling, or acting, which certainly includes what the enneagram refers to as core motivations or fears.
Even if you don't agree with me on that point, the enneagram does make claims about the traits that correspond to the core motivations and fears of each type. This means that if the enneagram is wrong about the traits associated with each type, then we should ask whether it's right about core desires and fears, and whether we're still using the enneagram at all if we've disregarded a huge number of its claims.
I'd be interested to hear what you think. Thanks again.