Tuesday, January 15, 2019

The Enneagram, Science, and Christianity - Part 1

If you're looking for part 2 of this article, which discusses some practical questions, please click here.

The enneagram is a psychological and spiritual personality assessment that has quickly become very popular, particularly among Christians as a tool for spiritual growth. I first heard of it a little over a year ago (mid to late 2017) and, I'll be honest, I was a bit embarrassed that I didn't know what it was. Why? Well, I have an MA and BS in psychology, taught psychology as an adjunct, worked as a behavioral scientist for nine years, and I do research in a psych lab at a university, yet all the people in the conversation knew so much more about the enneagram than I did.

Despite my embarrassment, I swallowed my pride and sought to learn more about it. I asked them to tell me more about it, and when I got home, I researched it even further. I quickly discovered why I had never heard of it: the enneagram is not a scientifically valid tool for assessing personality and it is not used by professionals (clinicians or researchers). I searched the peer-reviewed scientific literature and I could only find a handful of studies that used it in any way. It wasn't in any textbooks on personality, psychology of religion, or counseling.


Additionally, I checked into the "experts" on the enneagram. Out of the 20 or so people I checked, not a single one of them had a PhD in psychology or in any other field (I did later find a couple of people with PhDs who advocate for it, one with a PhD in psychology, but they are not the big names in the enneagram world). Russ Hudson, the president of the Enneagram Institute, describes himself as "one of the principal scholars and innovative thinkers in the enneagram world today," yet his LinkedIn page only lists a BA in East Asian Studies. Many others had some sort of theology degree, but still no psychology degree that I found (not even a bachelor's).

All this does not necessarily mean the enneagram is invalid; however, these are huge red flags that should make people go hmmmm. Why is it that all the people who are known for their enneagram expertise have no apparent training in test development, psychometrics, or psychology? Another red flag should be the grand claims of the enneagram. If it could accomplish so many great things, just about every therapist would be using it by now and there would be a multitude of research studies on it.

On the other hand, what about the people who've been helped by it? There are loads of personal testimonies from these people. Even for me, when I read the description of my type (I'm a 5), it seems eerily accurate in some ways. So what are we to make of this seemingly conflicting data about the enneagram? Enter science.

Science of Personality
Personality is notoriously hard to assess because it's easy to include non-personality factors in the test, such as intelligence, education (correlated with, but still different from, intelligence), religious beliefs, identity, etc. Personality often correlates with these factors, but a good personality test will discriminate between personality and these other factors. Studying personality scientifically is important because it helps us remove our personal biases so we can accurately assess different measures. This allows us to consider multiple variables and see whether a finding applies to large populations of people rather than being limited to a single person's experiences or best guesses.

The two personality tests that are usually considered the gold standard are the NEO, which assesses personality according to the big five traits (often just called the big five), and the MMPI. Scientists debate which is better, but the big five is used more in research because it is more accessible. The MMPI is expensive and requires a person to be certified before they can administer it or interpret the data. The Myers-Briggs (MBTI) is the most popular among lay people and businesses because it is simple and flashy, but most scientists don't use it because its validity is questionable (in fact, it's not uncommon for psychologists to openly mock it).

Unfortunately, there is very little scientific data on the enneagram so it's hard to draw definitive conclusions about it. I could only find a handful of peer-reviewed articles that examined the reliability and validity of the enneagram. None of them were in top-tier journals and their methodology was questionable. This does not invalidate them, but does raise more red flags. Either way, I will take these studies at face value and assume they are valid.

Scientific Evidence
Perhaps the most important factor for a personality test is test-retest reliability, which checks whether the test produces consistent results when someone takes it more than once. Only one of the studies actually looked at this measure. On the one hand, the study showed high test-retest reliability (a little above .80 overall), but on the other hand, it was based on people who were trained in the enneagram and who self-selected their own type both times. The authors explicitly state they did not calculate a test-retest reliability score, so they are admitting this statistic does not count. However, when they used a 135-question inventory, the average test-retest score for each individual question was only 55%.

As a comparison, the test-retest reliability of the NEO PI-R, which measures personality by the big five factors, ranges from .86 to .91 after 3 months and from .63 to .83 over 6 years. While personality is fairly stable over time, particularly in adulthood, it does gradually change, so some changes should be expected for any personality test. Interestingly, one of the claims many people make about the enneagram is that your type does not change, even from childhood, which runs contrary to other personality research.
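To make the test-retest idea above concrete, here is a minimal Python sketch. It assumes the usual approach of correlating scores from two administrations (Pearson's r), and it separately computes simple percent agreement for categorical types. All of the scores and types below are invented for illustration; none of them come from the studies cited in this article.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def percent_agreement(first, second):
    """Share of people assigned the same category on both occasions."""
    same = sum(1 for a, b in zip(first, second) if a == b)
    return same / len(first)

# Hypothetical scale scores for six people, tested twice (invented numbers).
time_1 = [12, 18, 25, 31, 40, 44]
time_2 = [14, 17, 27, 29, 41, 45]

# Hypothetical enneagram types for the same six people on both occasions.
type_1 = [5, 2, 9, 4, 1, 7]
type_2 = [5, 3, 9, 4, 6, 7]

retest_r = pearson_r(time_1, time_2)
type_agreement = percent_agreement(type_1, type_2)
print(f"test-retest r = {retest_r:.2f}")
print(f"type agreement = {type_agreement:.0%}")
```

Note that a correlation on continuous scores and a percent agreement on categories are different statistics, which is one reason numbers like ".80" and "55%" cannot be compared directly.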

Related to this is inter-rater reliability, which looks at whether two people rate a person the same way. For the enneagram, the highest score for this came from people with at least 2.5 years of experience with the enneagram, and they only agreed 55% of the time. The scores only went down from there in other studies or when less experienced people were tested. In fact, one researcher who advocates for the enneagram states that trained enneagram practitioners are pretty good (although the data doesn't support this), but they are "not as good as they think they are!"
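As a rough illustration of how inter-rater agreement can be quantified, the Python sketch below computes raw percent agreement and Cohen's kappa, which corrects for the agreement two raters would reach by chance (with nine equally likely types, that is about 1 in 9, or 11%). Cohen's kappa is a standard choice, but it is not necessarily the statistic the enneagram studies reported, and the ratings below are invented.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Fraction of people both raters assigned to the same type."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same type at random,
    # given how often each rater uses each type.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical types assigned to ten people by two raters (invented data).
rater_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 5]
rater_b = [1, 2, 3, 4, 5, 9, 8, 7, 6, 4]

observed = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"percent agreement = {observed:.0%}")
print(f"Cohen's kappa = {kappa:.2f}")
```

Kappa comes out lower than raw agreement because some of the matches would be expected even if the raters were guessing.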

Another important factor, and the most commonly reported one, is internal consistency (reported as Cronbach's alpha), which checks whether the questions for each enneagram type are testing the same thing. An acceptable score is considered .70 or higher. The enneagram types ranged from .37 to .82, with at least three of the types falling below the .70 threshold. This means that 18-63% of the variation in scores is due to measurement error! For comparison, the NEO PI-R ranges from .86 to .92 (8-14% measurement error). Since the enneagram is ipsative, meaning the questions force you to choose between two answers instead of choosing the degree to which you agree, the low internal consistency means that most people typically have characteristics of multiple types.
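For readers who want to see where these numbers come from, here is a small Python sketch of the standard Cronbach's alpha formula, plus the "1 minus alpha" arithmetic behind the measurement-error percentages above. The item responses are invented for illustration; only the alpha values .37 and .82 come from the enneagram studies discussed here.

```python
def variance(values):
    """Sample variance (n - 1 denominator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def cronbach_alpha(item_scores):
    """Cronbach's alpha.

    item_scores: one list per respondent, one score per item.
    """
    k = len(item_scores[0])
    items = list(zip(*item_scores))  # regroup the scores by item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def error_share(alpha):
    """Share of score variance attributed to measurement error."""
    return 1 - alpha

# Hypothetical responses: five people answering four items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(responses)
print(f"alpha for the invented data = {alpha:.2f}")

# The reported enneagram range of .37 to .82 translates to:
print(f"error share at alpha .37 = {error_share(0.37):.0%}")
print(f"error share at alpha .82 = {error_share(0.82):.0%}")
```

The invented items hang together very well, so their alpha is high; the point of the example is the formula, not the particular value.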

The next factor is predictive validity, which checks how well the test predicts behavior. One of the studies specifically compared the enneagram to the big five (Sutton, Allinson, & Williams, 2013), which is great in theory, but they compared apples to oranges so it's hard to draw conclusions. Unfortunately, they compared the enneagram to only single factors of the big five rather than combining scores across all five factors, which would have enhanced the predictive utility of the big five. Even so, the big five still fared better even though they used it in a less than optimal way. The enneagram did as well as a single factor of the big five, and in one case, it did better. The authors should have used a multiple regression with the big five to incorporate all five factors before comparing it to the enneagram.
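The point about multiple regression can be sketched in Python. The example below fits an ordinary least squares model with an intercept and compares the variance explained (R squared) by each of five predictors on its own against all five combined. The data are synthetic and the weights are made up; this is not a reanalysis of Sutton et al., just an illustration that combining predictors can never explain less variance in the sample than the best single predictor.

```python
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x

def r_squared(predictors, outcome):
    """R^2 of an ordinary least squares fit with an intercept.

    predictors: one list per person, holding that person's predictor scores.
    """
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return 1 - ss_res / ss_tot

# Synthetic data: 200 people, five trait scores each, and an outcome that
# depends a little on every trait plus noise. The weights are invented.
rng = random.Random(0)
weights = [0.4, 0.3, 0.2, 0.2, 0.1]
traits = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
outcome = [sum(w * t for w, t in zip(weights, person)) + rng.gauss(0, 1)
           for person in traits]

single_r2 = [r_squared([[person[i]] for person in traits], outcome)
             for i in range(5)]
combined_r2 = r_squared(traits, outcome)

for i, value in enumerate(single_r2, start=1):
    print(f"trait {i} alone: R^2 = {value:.3f}")
print(f"all five traits: R^2 = {combined_r2:.3f}")
```

In a fair comparison, the combined figure is the one that would be set against the enneagram's predictive power.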

The final study looked at the organization of the types and the notion of having a "wing." One study had participants organize the types based on similarities and the results showed vastly different organizations from how the types are actually organized according to the enneagram. More research needs to be done here, but it does seem to suggest that even if the types are valid, the organization of them on the circle may simply be arbitrary.

Conclusion of Scientific Evidence
Overall, the psychometric properties of the enneagram are mixed. Some properties are below standard thresholds, a few are very good, and a lot of them are right around minimally acceptable standards. It's not a terrible test, but it's not good either. This could change with more research, but I doubt it will unless someone revises it and drastically improves it, in which case, it will be different from what anyone is now using.

Update: The Wagner Enneagram Personality Style Scales (WEPSS) appears to be a little more reliable than other versions, but still has mixed or uncertain results. Additionally, where it improves in some areas, it creates other issues. I am still waiting to hear back from the company regarding the reliability and validity statistics so I can see more than just the basic information that was reported in The Fifteenth Mental Measurements Yearbook.

Additionally, the current research only looks at the basic explanations and delineations of each type. The issue is that the enneagram is also supposed to tell a person what their sins and weaknesses are, how they can get healthy, and how they can best relate to other people. These are all additional claims that stem from assumptions about the types, meaning they will have the same degree of error as the type, plus more!

Think of it this way. If you are playing pool and you are off by a millimeter, you may still make the shot. But if you are off by a millimeter when you try to shoot a combo by banking one ball off of another, you will almost certainly miss because the first margin of error will affect the next ball and multiply the error. Even more so if you try banking off two balls, and so on. This is how the enneagram is supposed to work. As a 5, I supposedly become more like a 7 when stressed and an 8 when I am relaxed. This is like a quadruple combo because it assumes each number is correct, plus the relationships between each number are correct.
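The pool analogy can be put into rough numbers. The Python sketch below assumes, purely for illustration, that each link in the chain (the type itself, the stress type, the relaxed type, and so on) is correct with some fixed probability and that the links are independent. Both the 80% figure and the independence assumption are hypothetical simplifications, not measured values.

```python
def chain_accuracy(step_accuracies):
    """Probability that every step in a chain is right.

    Assumes the steps are independent, which is a simplification.
    """
    result = 1.0
    for p in step_accuracies:
        result *= p
    return result

# Hypothetical accuracies, chosen only for illustration.
type_only = chain_accuracy([0.8])
type_plus_stress = chain_accuracy([0.8, 0.8])
quadruple = chain_accuracy([0.8, 0.8, 0.8, 0.8])

print(f"one claim:   {type_only:.0%}")
print(f"two claims:  {type_plus_stress:.0%}")
print(f"four claims: {quadruple:.0%}")
```

Even with a generous starting accuracy, a chain of four dependent claims is right less than half the time under these assumptions.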

My guess is that the sins associated with each type are probably only a little more accurate than a roll of the dice. Some are probably above chance while others are probably below chance. I suspect the same is true for how people are supposed to get healthy, what they do when stressed, the triads, or what their "wing" is (assuming it could theoretically be any number and not just a neighboring number).

Finally, there is no cross-cultural data on the enneagram, so even if it were valid in the U.S., it may not be in other cultures. The big five, however, has been tested in several cultures and has been shown to reliably describe personality for people of all cultures. I'm not aware of any culture it does not apply to. The only caveat is that testing it in collectivist cultures has revealed there might be another factor pertaining to interpersonal relatedness.

General Conclusion
Unless you've done graduate work in psychometrics, the scientific data probably doesn't mean a whole lot to you (which is why there are two parts to this article). For those who have studied psychometrics, it's a no-brainer that the enneagram simply cannot do all its proponents claim it can. Any scientist who studies personality would simply look at the reliability scores and conclude the test is not accurate enough to be helpful, and therefore, they wouldn't use it.

I hope this information is helpful and informative, for those who've been silently skeptical of the enneagram and for those who are fans of it. My goal was and is to be as objective as possible, which is why I included statistics that may have been hard to understand. In this article, I mostly wanted to get the data out. In part 2, I explain why the enneagram still seems to work (for some), why it matters if we use it or not, and offer recommendations for better tools that can be used as a replacement.

For thoughts on it from a theological perspective, consider this article from the Christian Research Journal.

Works Cited
Here's a list of scientific(ish) sources I consulted (it does not include the books and websites I used to personally understand the enneagram). Many of these sources are not actually peer-reviewed, or they are in low-level and inappropriate journals (meaning the reviewers may not be qualified to properly critique the methods, statistical analyses, or interpretation of results). This is due to the limited number of articles available that test the enneagram. Most of these are favorable to the enneagram, and therefore I am accepting them as more valid than I otherwise would, to try to be fair and present the best possible case for the enneagram. There were also a few other peer-reviewed articles on the enneagram, but they were not looking at the validity of it so they are not included here.
  1. Bland, A. M. (2010). The Enneagram: A review of the empirical and transformational literature. The Journal of Humanistic Counseling, Education and Development, 49(1), 16-31.
  2. Costa, P. T., & McCrae, R. R. (2010). The NEO Personality Inventory: 3. Odessa, FL: Psychological Assessment Resources.
  3. Edwards, A. C. (1991). Clipping the wings off the enneagram; a study in people's perceptions of a ninefold personality typology. Social Behavior and Personality: an international journal, 19(1), 11-20.
  4. Matise, M. (2007). The enneagram: An innovative approach. Journal of Professional Counseling: Practice, Theory & Research, 35(1).
  5. McCrae, R. R., & Costa, P. T. (1983). Joint factors in self-reports and ratings: Neuroticism, extraversion and openness to experience. Personality and Individual Differences, 4(3), 245-255.
  6. McCrae, R. R., & Costa, P. T. (1987). Validation of the five-factor model of personality across instruments and observers. Journal of personality and social psychology, 52(1), 81.
  7. McCrae, R. R., & John, O. P. (1992). An introduction to the five‐factor model and its applications. Journal of personality, 60(2), 175-215.
  8. McCrae, R. R., & Costa Jr, P. T. (1997). Personality trait structure as a human universal. American psychologist, 52(5), 509.
  9. McCrae, R. R., & Costa, P. T. (2003). Personality in adulthood: A five-factor theory perspective. Guilford Press.
  10. McCrae, R. R., Kurtz, J. E., Yamagata, S., & Terracciano, A. (2011). Internal consistency, retest reliability, and their implications for personality scale validity. Personality and social psychology review, 15(1), 28-50.
  11. Newgent, R. A., Parr, P. E., & Newman, I. (2002). The Enneagram: Trends in Validation.
  12. Newgent, R. A., Parr, P. E., Newman, I., & Higgins, K. K. (2004). The Riso-Hudson Enneagram type indicator: Estimates of reliability and validity. Measurement and Evaluation in Counseling and Development, 36(4), 226-237.
  13. Scott, S. A. (2011). An analysis of the validity of the enneagram. The College of William and Mary.
  14. Sutton, A. M. (2012). But Is It Real? A Review of Research on Enneagram. Enneagram Journal, 5.
  15. Sutton, A., Allinson, C., & Williams, H. (2013). Personality type and work-related outcomes: An exploratory application of the Enneagram model. European Management Journal, 31(3), 234-249.
  16. Wagner, J. P., & Walker, R. E. (1983). Reliability and validity study of a Sufi personality typology: The enneagram. Journal of Clinical Psychology, 39(5), 712-717.
  17. Yilmaz, E. D., Gençer, A. G., Ünal, Ö., & Aydemir, Ö. (2014). From enneagram to nine types temperament model: A proposal. Egitim ve Bilim, 39(173).
  18. Yilmaz, E. D., Gençer, A. G., Aydemir, Ö., Yilmaz, A., Kesebir, S., Ünal, Ö., ... & Bilici, M. (2014). Validity and Reliability of Nine Types Temperament Scale. Egitim ve Bilim, 39(171).
Here's a link to my Google Drive folder with the Enneagram articles saved in case you want to read them.