Is the Enneagram Legitimate for Spiritual Growth?

This article is the paper I read at the Evangelical Theological Society annual conference in 2021. It is long but organized under many headings. Because it was adapted for oral presentation, some citations may have been removed, though I have still tried to cite my sources to avoid the appearance of plagiarism. The context and content of each statement should make clear where I am reporting what others have done or said and where I am giving my own ideas. The paper is a product of my current research and is still limited in the scope of its analysis of the enneagram.


I’m assuming that if you are here, you have probably heard of the enneagram by now. For the few who may not have heard of it, the short answer is that it’s a personality typing system and personal growth tool. In the past 5 years, there has been an explosion of books, podcasts, businesses, and YouTube channels devoted to the enneagram. Available resources claim that it can improve every area of your life, including, but not limited to your own well-being, marriage, sex life, leadership, business practices, parenting, and most relevant for this conference, your spiritual growth and discipleship. 

Within the church, opinions are mixed. Some ardently oppose it while others elevate its status to somewhere between Jesus and the apostle Paul. I’m only partially kidding about the fervor people have for it, particularly Christians. Seminaries are using it, churches are teaching or hosting seminars on it, and enneagram books are often found in the Christian section of bookstores. Most of the Christian critiques of the enneagram focus on its suspicious or occultic origins and the theological issues with relating a self-help tool to salvation. Some of these critiques are valid concerns while some are probably better left as issues for an individual’s own conscience as Paul explains in Romans 14 and 1 Corinthians 8.

I want to take a different approach to understanding the enneagram and evaluate it based on its own merits. I was first introduced to the enneagram while in seminary in 2017 by friends who were using it to aid their evangelism efforts. It was explained to me as a new and revolutionary approach to understanding people that was way better than anything else. At the time, I already had a master’s degree in psychology and had spent the last few years teaching college psychology courses, so I was a little embarrassed that I had not heard of it. Still, I just assumed I had missed a major new development and so I set out to learn more about it.

As I sought to learn about the enneagram, my primary concern was whether it accurately describes people, and this is the same approach I will take for this paper. I will only mention issues of its origins as they relate to its accuracy. The degree to which the enneagram can be successfully used for spiritual growth is largely dependent on its accuracy and other potential mechanisms for growth. In other words, if it makes false statements about the way people are, how can we expect it to change people for the better? Before discussing the evidence, I will explain in more depth what the enneagram is, as this will be important for understanding why it is subject to scientific testing and how that testing can be done.


What is the enneagram?

In the most basic sense, an enneagram is a shape. The prefix ennea- is nine in Greek just like hexa- is six and penta- is five, so enneagram just means nine-pointed figure. The enneagram as a personality system gets its name because it has 9 different personality types, which are organized in ascending order on a circle. Think of it as a clock with 9 at the 12 o’clock position.

The types are the primary aspects of the system but also just the beginning of it because the organization of the types is also important. See figure 1 for a visual (or your phone). Each type is connected to several other types in different ways. Every type has two possible wings, which can be thought of as subtypes. An important aspect of enneagram theory is that the wings can only come from one of the two adjacent types. For instance, a 2 can only have a 1 or a 3 as its wing, which is often written in short form as 2w1 or 2w3.
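
For readers who find it easier to see the wing rule spelled out, here is a minimal sketch in Python. The function names `wings` and `wing_labels` are hypothetical and chosen only for illustration. The sketch assumes nothing beyond what is described above: nine types on a circle, with 9 sitting next to 1.

```python
def wings(enneagram_type: int) -> tuple[int, int]:
    """Return the two adjacent types on the nine-point circle."""
    if not 1 <= enneagram_type <= 9:
        raise ValueError("type must be between 1 and 9")
    lower = (enneagram_type - 2) % 9 + 1  # one step back (1 wraps around to 9)
    upper = enneagram_type % 9 + 1        # one step forward (9 wraps around to 1)
    return lower, upper


def wing_labels(enneagram_type: int) -> list[str]:
    """Format the allowed wings in the usual shorthand, such as '2w1'."""
    return [f"{enneagram_type}w{w}" for w in wings(enneagram_type)]


print(wing_labels(2))
```

Because the rule is this rigid, it is also a testable claim: if wings were real, people should resemble their neighboring types more than the non-neighboring ones.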

The next most prominent feature of the enneagram is the set of intersecting lines connecting the types. These are the directions of growth and stress, which correspond to the direction people move when they are healthy or when they are under stress. These are also referred to as the directions of integration and disintegration. For example, a type 1 will be more like a type 4 when stressed, but more like a type 7 when healthy.

According to enneagram theory, your type (and to some extent, your wing) tells you who you are, which includes your strengths and sinful tendencies. The directions of growth are then the solution to overcoming your sinful ways so that you can mature personally, relationally, and spiritually. 

The final major aspect of the enneagram I will discuss is the three triads, which describe the general way of thinking for the three types included in each triad (Riso & Hudson, 1996), and are represented by the lines that look like an upside-down peace sign on the diagram. The three triads are the instinctual types (Types 8, 9, and 1), the feeling types (2, 3, and 4), and the thinking types (5, 6, and 7; Wagner, 2021).

There are many versions of the enneagram but for the most part, they’re all the same. They may use slightly different terminology, often swapping synonyms to describe the types and the system, and there are some minor disagreements regarding the content of the types. Even so, the main aspects of the system are largely consistent from one version to another (Wagner, 2021).


Type 1 Description

For the sake of time and lack of necessity, I won’t go through all nine types, but I will give a brief description of type 1s to give a better idea of how the system describes people and how it functions. Type 1s are referred to as The Reformers (Riso & Hudson, 1996), The Good Person (Wagner, 2021), and The Perfectionist (Cron & Stabile, 2016; Palmer, 1988), but all of these sources describe type 1s as good, conscientious, perfectionistic, and idealistic, amongst a long list of other attributes.

A more detailed description from Riso and Hudson’s Enneagram Institute says that type 1s are “conscientious and ethical, with a strong sense of right and wrong. They are teachers, crusaders, and advocates for change always striving to improve things, but afraid of making a mistake. Well-organized, orderly, and fastidious, they try to maintain high standards, but can slip into being critical and perfectionistic. They typically have problems with resentment and impatience. At their Best: wise, discerning, realistic, and noble. Can be morally heroic.”

Basic Fear: Of being corrupt/evil, defective 
Basic Desire: To be good, to have integrity, to be balanced 
Key Motivations: Want to be right, to strive higher and improve everything, to be consistent with their ideals, to justify themselves, to be beyond criticism so as not to be condemned by anyone. 
Addictions: Extreme dieting (fasts, diet pills, enemas) and in extreme cases anorexia and bulimia or alcohol to relieve tension. 

For personal growth, they suggest that type 1s “Learn to relax. Take some time for yourself, without feeling that everything is up to you or that what you do not accomplish will result in chaos and disaster…Your Achilles' heel is your self-righteous anger…Try to step back and see that your anger alienates people so that they cannot hear many of the good things you have to say.”

By now we should have a pretty good idea of what the enneagram is and how it works, so we turn to the question of whether it can be scientifically tested.


Scientific Testing

Psychologists study a lot of different things: personality, IQ, depression, religiosity, well-being, marital satisfaction, memory, and just about everything else you can think of that pertains to people. As you can imagine, the better we can quantify these constructs, the better we can use them to understand people, predict their behavior, and improve their well-being. Every measure we use is tested for accuracy, and reporting standards for journal articles require these results for every scale used in a study.

Despite the claims of some enneagram advocates, there is nothing magical about the enneagram that makes it immune to scientific evaluation. In fact, every study testing the enneagram, except one, has been done by a proponent of the enneagram, which is evident from their hypotheses predicting that the enneagram will be supported. The Enneagram Institute proudly proclaims that their enneagram scale, the RHETI 2.5, “has been independently scientifically validated.”

As an example, I counted the number of empirically testable claims just in the short description of type 1s and there were 41, not including relationships between claims. Claiming that type 1s are conscientious and impatient can be empirically tested by comparing type 1s to other types on conscientiousness and impatience. These are two direct claims, but there’s a third, indirect claim that stems from these two. If type 1s are impatient and conscientious, there should be a higher correlation between these two traits than for traits associated with other types. I did not count these hidden or indirect claims because that would have led to an astronomically large number of claims.
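
To give a rough sense of how quickly the indirect claims multiply, here is a small sketch. The counting rule is my assumption for illustration only: it treats every pair of the 41 direct claims as one possible indirect claim, and then every grouping of two or more claims as one. The variable names are hypothetical.

```python
from math import comb

direct_claims = 41  # the count reported above for the type 1 description

# Pairwise relationships only: every unordered pair of direct claims.
pairwise_claims = comb(direct_claims, 2)

# Any grouping of two or more claims: all subsets, minus the empty set
# and the 41 subsets that contain a single claim.
grouped_claims = 2 ** direct_claims - direct_claims - 1

print(pairwise_claims, grouped_claims)
```

Even under the stricter pairwise rule, a single type description already implies hundreds of relationships that could be checked against data.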

These are just claims about a single type. The enneagram also makes claims about how the types relate to each other, how people act when they are stressed, how people make decisions, and how people can become healthier or more integrated, all of which are scientifically testable claims that are very similar to the types of things psychologists test every day.



When psychologists evaluate a scale for accuracy, they first look at the internal consistency of the scale, usually reported as Cronbach’s alpha (α). Wagner and Walker (1983) were among the first to develop and test their own version of the enneagram, the WEPSS. Their results yielded low consistency within the types with scores ranging from α = .37 to .78. For reference, .70 is typically considered the lowest acceptable value for this metric.
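
For those unfamiliar with the metric, the sketch below shows one common way Cronbach's alpha is computed from item responses. It is a minimal illustration, not the procedure used in any of the studies cited here. The function name and the tiny data set are hypothetical, and the rows are assumed to be respondents with one column per question for a single type.

```python
from statistics import variance


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha for rows = respondents, columns = items of one type."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*item_scores)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in item_scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from three people to two questions for the same type.
made_up_answers = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(made_up_answers), 2))
```

When the questions for a type rise and fall together across people, alpha approaches 1. When they move independently, alpha falls toward 0, which is what values such as .37 indicate.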

Newgent et al. (2004) examined the reliability of the Riso-Hudson Enneagram Type Indicator (RHETI) and found reliability coefficients for the nine types in their study ranged from α = .56 to .82 with three types falling below α = .70. Dameyer (2001) found the internal consistency for each type on the WEPSS and RHETI ranged from α = .35 to .84.

In contrast, Tastan (2019) and Yilmaz et al. (2014) developed novel enneagram scales and found slightly higher average values for their scales (α = .84 and α = .75, respectively). Yilmaz et al. (2014) only had one type fall below α = .70, but it’s unknown how many were below .70 for the Tastan (2019) scale because values for each type were not reported. Demir et al. (2020) also tested the Tastan enneagram scale and found lower values for internal consistency, averaging α = .76. Sharp (1994) conducted a factor analysis on each of three enneagram scales and found only five factors rather than nine.

Wagner (WEPSS Manual, 1999) did find a 9-factor solution for his scale, but the Mental Measurements Yearbook notes that important details of the analysis were omitted, and the analysis was likely incorrect (exploratory factor analysis with forced factors and no confirmatory factor analysis). Yilmaz et al. (2014) also did a factor analysis with their version of the scale and found nine factors, but the scale items for their version and Wagner’s are not published in peer-reviewed journals, so they cannot be properly critiqued.

While this may sound like a minor detail, transparency about the questions is vital for evaluation. For her dissertation, Scott (2011) helped create a new version of the RHETI, which is the basis for The Enneagram Institute’s claim that their test has been “independently scientifically verified.” She conducted several factor analyses to revise the scale and eventually achieved a nine-type solution. This is a good and acceptable process when done correctly, but the methods she used were suspicious at best. Not only did several questions fail to group with the other questions for their type, but some grouped with the wrong type. Rather than stating that The Enneagram Institute incorrectly predicted personality for those types, she simply used these questions as items for the other type, and The Enneagram Institute did not change any of their claims.

Whether Yilmaz et al. (2014) and Wagner (1999) used similar or other questionable practices to increase the scores of their scales cannot be known without access to these scales and further investigation. The takeaway from these studies is that the enneagram types are not clearly delineated types. Imagine a painter’s palette with 9 colors on it. Enneagram theory claims there are nine distinct colors that have very little mixing between them. However, the results of these studies suggest the types are more like 9 shades of brown with just a hint of remaining color still discernable. In other words, the enneagram types are largely overlapping with very little distinctiveness to each type.


Multi-test Consistency

Another common way to evaluate a scale is by checking to see how consistent it is between multiple tests or human raters. If enneagram theory is correct, there should be very high agreement. If the enneagram types are more like shades of brown, low agreement would be expected. 

When examining how people score when taking the same test twice, Wagner and Walker (1983) found correlations between pre- and posttests that ranged from r = .17 to .78 with an average of r = .53 (SD = .11), which is very low. For reference, scores from r = .50 to .74 are considered poor to moderate (Portney & Watkins, 2015). When Wagner and Walker (1983) compared how people typed themselves after an enneagram training with how the scale typed them, they found inter-rater reliability between κ = .28 and .40 for the pre- or posttest.
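
As a reminder of what a test-retest correlation measures, here is a minimal sketch of the Pearson correlation between two administrations of the same scale. The function name and the scores are hypothetical and are not taken from any of the studies discussed here.

```python
from math import sqrt


def pearson_r(first: list[float], second: list[float]) -> float:
    """Pearson correlation between paired scores from two administrations."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists of at least two scores")
    mean_1 = sum(first) / len(first)
    mean_2 = sum(second) / len(second)
    covariation = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    spread_1 = sqrt(sum((a - mean_1) ** 2 for a in first))
    spread_2 = sqrt(sum((b - mean_2) ** 2 for b in second))
    return covariation / (spread_1 * spread_2)


# Hypothetical scores for four people on one type, tested twice.
print(round(pearson_r([1, 2, 3, 4], [2, 1, 4, 3]), 2))
```

If a person's type were a stable trait, the same people should score high on a type both times and r should be close to 1.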

Demir et al. (2020) found virtually non-existent agreement between the Tastan enneagram scale and their own enneagram scale (κ = .11). For reference, scores between κ = .21 and .39 indicate minimal agreement (McHugh, 2012). Dameyer (2001) directly compared two enneagram scales (RHETI and WEPSS) and three independent enneagram experts (Don Riso, Jerry Wagner, and Virginia Price, who is an associate of Helen Palmer). She found that only 42% of participants were classified as the same type by both scales, which is higher than chance but still low. She also found that the correlations between the three experts’ judgments about the attributes of each type ranged from r = .09 to .94, that is, from virtually no correlation to high.
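
Kappa differs from simple percent agreement because it subtracts the agreement that two sources would reach by chance. The sketch below is a minimal illustration of Cohen's kappa for two sources that each assign one type per person. The function name and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Cohen's kappa for two sources assigning one type to each person."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty lists")
    n = len(rater_a)
    # Share of people given the same type by both sources.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, given how often each source uses each type.
    expected = sum(counts_a[t] * counts_b[t] for t in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical typings of four people by two different scales.
print(cohen_kappa([1, 1, 2, 2], [1, 2, 2, 2]))
```

As a rough point of comparison, if two scales used all nine types equally often, which is an assumption and not something the studies report, they would agree on about 1 in 9 people (roughly 11%) by chance alone.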

These results indicate that people do change their types, the types do not accurately describe people, the types are very similar to each other, or any combination of the three. Regardless of which is the case, they all show that the enneagram makes several false claims about personality.


In addition to testing the consistency of the enneagram, other tests have sought to examine its validity. 

Koocher et al. (2015) polled 150 doctoral-level mental health experts to rate the degree to which they felt various psychological treatments or assessments have been discredited. The enneagram was rated as the second most discredited tool on the list of 89 items, behind the Szondi personality test (but ahead of the Rorschach inkblot test). 

In the only critical study of the enneagram, Edwards (1991) tested the concept of wings by asking participants to arrange the types in a circle so that adjacent types would be connected and found that neighboring types were not placed together more frequently than chance.

Maxon and Daniels (2008) conducted a twin study on enneagram types and found that twins did not share the same type more frequently than chance, which did not support their hypothesis, nor does it agree with other literature showing robust results for the heritability of personality traits among twins (Polderman et al., 2015). 

When relating enneagram types to other measures of personality, the enneagram does a little better (Hook et al., 2020). The results generally show that the enneagram types correlate with the other personality measures as hypothesized, but not all expected correlations are supported, and the strength of the correlations has been lower than expected.

Once again, these studies all failed to support the claims of the enneagram, suggesting that the claims are exaggerated or plainly false.


Applied Tests

A handful of studies have sought to test the efficacy of the enneagram in practical situations. Godin (2013) found no significant effect on psychological well-being or unconditional self-acceptance after training participants on the enneagram. 

A study by Daniels et al. (2018) hypothesized that training participants on the enneagram system would lead to a stronger sense of identity but found no differences between the experimental and control groups. Thrasher (1994) and Twomey (1995) both tested participants under stress and found that they did not act according to their enneagram stress direction.

Sutton (2013) compared the enneagram to the Big 5 on how well each could predict employment outcomes. She found that the Big 5 better predicted job self-efficacy and perceived stress by large margins whereas the enneagram better predicted job involvement; however, she did not use all five factors of the Big 5 for her comparison, making the results tentative at best because she compared the full enneagram to only a portion of the Big 5.

The enneagram system, the experts who profit from it, and the scales used to type people have been tested in several ways and every single test contradicts the grand claims of the enneagram. The types seem to correctly identify how some people are, but in general, they are inaccurate and overly simplified descriptions of people. The secondary aspects of the system, including the directions of growth, which are the alleged mechanism for spiritual growth, seem to be no more accurate than random chance.


Why does the enneagram seem to work?

At this point, you might be wondering, if the scientific evidence against the enneagram is so strong, then why are so many people so strongly convinced that it works and how has it withstood the test of time? Once again, we can find answers to these questions from psychological science. However, to do this, I want to try something a little out of the ordinary for this type of forum. Rather than just tell you, I want to show you. As I read the following list of personality descriptions, count how many of them apply to you.
  1. You have a tendency to be critical of yourself.
  2. You have a great deal of unused capacity which you have not turned to your advantage.
  3. While you have some personality weaknesses, you are generally able to compensate for them.
  4. Disciplined and self-controlled outside, you tend to be worrisome and insecure inside.
  5. You pride yourself as an independent thinker and do not accept others’ statements without satisfactory proof.
  6. At times you are extroverted, friendly, sociable, while at other times you are introverted, wary, reserved.

Bertram Forer (1949) took this list and seven similar statements from a newsstand astrology book and used them for a study on personality. Students were given a personality test and a week later, they received this list as the description of their personality, which they thought was unique to them. All 39 students rated the descriptions positively and 38 out of 39 rated them as very good (4) or perfect (5). This has come to be known as the Forer effect, and it describes our tendency to rate general personal statements as highly accurate descriptions of us individually.

Here’s a fun video from an old Dateline episode showing the same thing.

For this next one, I’m going to need everyone to participate by trying to remember a list of words. When I’m done reading them, write down as many as you can remember.
















Quickly, try to write as many as you can remember.

Let’s skip right to the point. Please raise your hand if you had the word window on your list. If you did, you are like 84% of people. Unfortunately, window was not on the list. All the words related to window, but I didn’t say it. This is known as the Deese-Roediger-McDermott (DRM) procedure, and it demonstrates how easily our memory can be selective or misled.

There are countless other ways that our minds are unconsciously tricked. When we watch a video of someone saying fa-fa-fa but the sound is dubbed over with the sounds ba-ba-ba, our brain overrides the auditory signal for the visual signal and we hear the F-sound even when we know it’s incorrect. This is called the McGurk effect. Then there’s confirmation bias, self-fulfilling prophecy, the false-consensus effect, belief bias, the backfire effect, the placebo effect, attribution errors, and several other observed effects that can explain how and why so many can believe something that is demonstrably false. 

The book Thinking, Fast and Slow is perhaps the best-known book that popularizes these types of cognitive biases, but plenty of others do the same and discuss different biases (see You Are Not So Smart, Think Again, Predictably Irrational, Blindspot, Fooled by Randomness, and many more).

Putting these biases together, several simpler explanations seem adequate to explain the popularity and alleged efficacy of the enneagram.

  1. The enneagram hasn’t really helped people as much as they think it has.
  2. Introspection and talking to others about their strengths and weaknesses has helped people. The enneagram was just the thing that led them to do that but wasn’t the actual cause of the growth.
  3. People in the most need of change are the most likely to improve even if they do nothing (regression to the mean).
  4. Coincidence. Even a broken clock is right twice a day, so for some people it likely has had revolutionary benefits.
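
Regression to the mean is the least intuitive item on this list, so here is a small simulation to illustrate it. It is only a sketch under assumptions I have chosen for illustration: each simulated person has a stable level of well-being, each measurement adds random noise, and nobody receives any intervention. The function name, sample size, and seed are hypothetical.

```python
import random


def simulate_regression_to_mean(n_people: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Return the mean first and second scores of the lowest-scoring 10%."""
    rng = random.Random(seed)
    # Each person has a stable level; each measurement adds independent noise.
    stable = [rng.gauss(0, 1) for _ in range(n_people)]
    first = [s + rng.gauss(0, 1) for s in stable]
    second = [s + rng.gauss(0, 1) for s in stable]
    # Select the people who looked worst at the first measurement.
    cutoff = sorted(first)[n_people // 10]
    worst = [i for i in range(n_people) if first[i] < cutoff]
    mean_first = sum(first[i] for i in worst) / len(worst)
    mean_second = sum(second[i] for i in worst) / len(worst)
    return mean_first, mean_second


before, after = simulate_regression_to_mean()
print(f"worst 10% at first measurement: {before:.2f}, same people later: {after:.2f}")
```

The group that scored worst the first time scores noticeably better the second time even though nothing was done for them. Anyone who picked up the enneagram at a low point would see the same apparent improvement.
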

These biases also explain how the enneagram has withstood the test of time. However, there’s an even simpler answer for that: it hasn’t. While many proponents of the enneagram claim it is an ancient system, the evidence says otherwise. In the second edition of their book, enneagram leaders Riso and Hudson say they were mistaken about the ancient origins of the enneagram and attribute the system to Ichazo and Naranjo. Agreeing with this, Naranjo said in an interview that he and Ichazo made up its ancient origins to convince people it was accurate.

Negative Effects

Anyone who’s heard of the enneagram has probably heard several anecdotal stories about how helpful it has been, but what about stories of its harm? In 1949, Egas Moniz won the Nobel Prize in medicine for the prefrontal lobotomy. It helped a huge number of people, but at the same time, it did severe damage to others. Any potential good that might come from the enneagram has to be weighed against the potential harm. Unfortunately, the people it harms are unlikely to come forward in a group of people who rave about it. Since there is no empirical research on the harms of the enneagram, we have to rely on indirect evidence and the same type of personal testimonies used to promote it.

Since I have started to publicly speak out against the enneagram, I’ve had several people confide in me and relay other testimonies. The most drastic is a woman from North Carolina whose husband left her after a 22-year marriage after he got involved in the enneagram and learned his “true self.” While this is an extreme example, others have relayed similar stories of the enneagram causing relationships to end because one person realized their types are not compatible. Others have felt defective because none of the enneagram types described them. Even though it’s meant to help people grow, humans have a stronger proclivity to stereotype and judge, so the enneagram becomes an excuse for their own actions and a convenient way to place blame on others for their actions.



I once mediated a disagreement where two of the people were heavily into the enneagram. Their knowledge of the enneagram, and the other person’s type, led them to believe they understood what the other person was really trying to say. After the meeting, they felt the conversation went well and explained how the enneagram had helped them get to the bottom of the issue. On the other hand, when I spoke to the person who was stereotyped by the enneagram, he felt like he wasn’t heard or understood by the other two. Same conversation, two drastically different interpretations of the outcome because a tool that promises to give understanding led to false beliefs and expectations. 

Is the enneagram legitimate for spiritual growth? That is a question that is highly dependent on our own views of what is or is not legitimate. The overwhelming majority of scientific evidence shows that the enneagram does not accurately describe how people are, how they think, how they act, or how they grow. It has negative effects that are often overlooked while most of its apparent positive effects can be explained through other psychological mechanisms.



Enneagram Types from Riso & Hudson (Enneagram Institute)

  1. The Rational, Idealistic Type: Principled, Purposeful, Self-Controlled, and Perfectionistic
  2. The Caring, Interpersonal Type: Demonstrative, Generous, People-Pleasing, and Possessive
  3. The Success-Oriented, Pragmatic Type: Adaptive, Excelling, Driven, and Image-Conscious
  4. The Sensitive, Withdrawn Type: Expressive, Dramatic, Self-Absorbed, and Temperamental
  5. The Intense, Cerebral Type: Perceptive, Innovative, Secretive, and Isolated
  6. The Committed, Security-Oriented Type: Engaging, Responsible, Anxious, and Suspicious
  7. The Busy, Fun-Loving Type: Spontaneous, Versatile, Distractible, and Scattered
  8. The Powerful, Dominating Type: Self-Confident, Decisive, Willful, and Confrontational
  9. The Easygoing, Self-Effacing Type: Receptive, Reassuring, Agreeable, and Complacent

Summary of scientific tests for accuracy of the enneagram

  1. Internal consistency: Questions for the same type should receive similar scores.
    1. Mixed results ranging from bad to good.
  2. Test-retest reliability: People should be typed the same when retested.
    1. Mixed results from low to acceptable.
  3. Interrater reliability: Different sources (people or scales) should type people the same way.
    1. Scores were generally very low to low.
  4. Convergent validity: The enneagram types should correlate with other personality measures in expected ways and not correlate with measures they should be different from.
    1. Results are mixed. The enneagram types often correlate with the personality measures that they should correlate with, but they fail to correlate with some, and the strength of the correlations is lower than expected.
  5. Predictive validity: The enneagram should predict behaviors and attitudes of people better than other personality systems.
    1. Results are mostly negative. The enneagram weakly predicted some outcomes, but typically it did not, and other tests typically performed better.

