Circular definitions.

A definition is circular if it uses the very thing it is trying to define.

Example 1: Suppose someone asks you: what is the function ln(x)? If you say:

Definition: the function ln(x) is the function ln(x)

then that is a circular definition: it tells nobody what ln(x) is unless they already knew what ln(x) is!

Example 2: Let S be the statement that says:

S: The statement S is false.

What would happen if math allowed circular definitions? In Example 1, we simply wouldn't make progress. But Example 2 is much worse because it undermines:

Key Goal: Every statement in math should have precisely one truth-value: true or false.

We do not want any statements that are both true and false at the same time! We need this Key Goal. We do not want "true" to be the same as "false", because then logic and reasoning would be meaningless.

Consider the truth-value of the statement S from Example 2. If S is true, then what S says must hold, so S is false. But if S is false, then what S says does not hold, so S is not false, that is, S is true! Either way we get a contradiction: if S is true then S is false, but if S is false then S is true. In other words: S <==> not S. In other words: true is the same as false. So if we allowed circularly defined statements like Example 2, then the Key Goal would be impossible to achieve.

Example 3: Let S be the set of all sets.

In order to understand this "definition" of S, the set of all sets, we would already have to know what S is, because S would be one of those sets! So at first sight it looks like Example 1. But actually, it is much worse! I'll explain why Example 3 is as bad as Example 2.

The "set" S in Example 3 is an element of itself. I have never seen a set with this property (a set that is an element of itself). However, the fact that I have never seen something is not enough to prove that it is impossible. So we need to study this property more carefully.
In general, the best way to study a property is to give it a name:

Definition: Let's call a set A abnormal if A is an element of A, and normal if A is not an element of A.

All sets that I have ever seen are normal, but that does not prove that abnormal sets do not exist. Now look at the set S from Example 3. It is abnormal, but it must have many normal elements (including all sets I've ever seen). So let's take the following subset of S:

Let B = {A in S | A normal} = { all normal sets }.

We now get the same problem as we had in Example 2! If B is normal, then B is an element of the set { all normal sets }. But that set is B. So then B is an element of B, which means B is abnormal. But if B is abnormal, then B is an element of B, so B is an element of { all normal sets }, which means that B is normal! So the definition of the set B leads to the same problem as Example 2. Example 2 had:

S <==> not S

and in Example 3 we get:

"B is normal" <==> not "B is normal".

For logic and math to make sense, we can't have statements like that. But the only way to prevent "B is normal" <==> not "B is normal", while still allowing the usual operations on sets, is to forbid circular definitions like the one in Example 3.

Example 4: Let's play a game where we all do our best to define the biggest integer that we (students and teacher) are able to define in the next 5 minutes in class. Whoever defines the biggest integer wins a prize. Now I "define" B to be: that biggest integer plus 1.

Notice that my "definition" of B in Example 4 is circular: B is "defined" as 1 more than the biggest integer we were able to define, but one of those integers is B itself! So B is bigger than B. Surely that will cause problems! Example 4 is just as circular as Examples 1, 2 and 3. If we allowed circular definitions, then the Key Goal would be immediately undermined. Note that my "definition" in Example 4 feels like cheating the game!
Also, consider the possibility that several students had written down this same "definition". If we accepted that "definition", then each of those students would be the sole winner, because each of them would beat all players (including themselves!).

In summary: The Key Goal requires that we forbid self-referential statements in math. The axioms of set theory (ZFC) are designed in such a way that they forbid circular definitions, while still allowing all sets that mathematicians need.

Example 2 shows that if we allow a self-referencing statement, then we get a statement that, if true, is false, and, if false, is true. That would turn "true" and "false" into meaningless concepts.

Example 3 shows that if we allow a self-referencing set, then we also get a statement that, if true, is false, and, if false, is true. This is why ZFC, the axioms of set theory, do not allow self-referencing definitions.

Conclusions:

If the collection of all sets exists, then it can't be a set! More generally, the collection of all sets with property P may or may not be a set (depending on P), but either way, the collection of all sets with property P is not a set with property P.

If we take 1 plus the maximum number we can define in the next 5 minutes, then that's not a number that we can define in the next 5 minutes! The reason that the number B in Example 4 is cheating the game is that it's not defined during the game. It's only defined after the game!

Now let's go back to Example 3. Calling the collection of all sets a set is self-referential and causes contradictions. So we may not call that collection a set. But there's nothing wrong with calling it something else! There is an extension of ZFC set theory (for example NBG, von Neumann-Bernays-Gödel set theory) that has not only sets, but also classes. One can prove that if ZFC is consistent, then so is that extension. That means: it is OK to use classes. In this extension, it is OK to talk about the "class of all sets". But you cannot define the class of all classes!
(That would cause the same problem as in Example 3.)

When you allow classes, you get a nice short definition of cardinality:

Definition C: The cardinality of a set A is defined as the class of all sets B for which there exists a bijection between A and B.

Thus, two sets have the same cardinality if and only if there exists a bijection between them, if and only if they are members of the same class in the above definition.

If we restrict to ZFC, then we don't have classes, and then Definition C is not valid. But one can modify Definition C to obtain a definition that is valid in ZFC (see Von Neumann cardinals on Wikipedia). That definition is a bit more technical, so I prefer Definition C, which uses an extension of ZFC, but it has been proven that it is OK to use that extension (to be precise: if ZFC is consistent, then so is that extension).
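
Definition C relies on the fact that "there exists a bijection between A and B" behaves like an equivalence relation: every set is in bijection with itself, a bijection can be reversed, and two bijections can be composed. Here is a minimal sketch of those three facts in Lean 4 (core only, no Mathlib). It is only an illustration and not a formalization of ZFC or of classes: as a simplifying assumption, sets are modelled as Lean types, and the names Bij, toFun, invFun, left_inv and right_inv are hypothetical names chosen for this sketch.

```lean
-- Assumption for this sketch: "sets" are modelled as Lean types.
-- `Bij A B` (hypothetical name) packages a bijection between A and B
-- as a pair of mutually inverse functions.
structure Bij (A B : Type) where
  toFun     : A → B
  invFun    : B → A
  left_inv  : ∀ a : A, invFun (toFun a) = a
  right_inv : ∀ b : B, toFun (invFun b) = b

-- Reflexive: the identity map is a bijection from A to itself.
def Bij.refl (A : Type) : Bij A A where
  toFun     := fun a => a
  invFun    := fun a => a
  left_inv  := fun _ => rfl
  right_inv := fun _ => rfl

-- Symmetric: swap the two directions of a bijection.
def Bij.symm {A B : Type} (e : Bij A B) : Bij B A where
  toFun     := e.invFun
  invFun    := e.toFun
  left_inv  := e.right_inv
  right_inv := e.left_inv

-- Transitive: compose a bijection A -> B with a bijection B -> C.
def Bij.trans {A B C : Type} (e : Bij A B) (f : Bij B C) : Bij A C where
  toFun     := fun a => f.toFun (e.toFun a)
  invFun    := fun c => e.invFun (f.invFun c)
  -- First undo f, then undo e.
  left_inv  := fun a =>
    (congrArg e.invFun (f.left_inv (e.toFun a))).trans (e.left_inv a)
  -- First redo e, then redo f.
  right_inv := fun c =>
    (congrArg f.toFun (e.right_inv (f.invFun c))).trans (f.right_inv c)
```

In this sketch, "there exists a bijection between A and B" corresponds to Nonempty (Bij A B). The three constructions above are what make "being in the same class" in Definition C well behaved: if B is in the class of A, then A is in the class of B, and the two classes contain exactly the same sets.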