

**Definition. **If $X\approx \mathbb{N}$, we call $X$ **denumerable**, and we call any bijection $\psi:X\rightarrow \mathbb{N}$ a **denumeration of $X$**. We call $X$ **countable** if it is either finite or denumerable. Sometimes denumerable sets are called **countably infinite**.

**E.g. **$\mathbb{N}$ is denumerable.

**Theorem. **Any subset of a denumerable set is countable.

**Proof. **Let $X$ be denumerable and $A\subseteq X$. Assume that $A$ is not finite; we’ll show that $A$ is denumerable.

Since $X$ is denumerable, there is a bijection $\psi:X\rightarrow\mathbb{N}$. We’ll construct a denumeration of $A$ using induction.

For the base case: by the Well-Ordering Principle, there is a least element of $\psi_*(A)$. By injectivity, this element is the image of a unique element of $A$, call it $a_1$. Observe that $A\setminus\{a_1\}$ is not empty, so $\psi_*(A\setminus\{a_1\})$ is not empty.

Suppose we’ve denumerated $k$ elements of $A$ so far, say $a_1,\ldots, a_k$. Since $A$ is not finite, $A\setminus\{a_1,\ldots, a_k\}$ is not empty. So $\psi_*(A\setminus\{a_1,\ldots,a_k\})$ is a nonempty subset of $\mathbb{N}$, hence by the Well-Ordering Principle has a least element. By injectivity, this element is the image of a unique $a_{k+1}\in A$, which is distinct from all the $a_1,\ldots, a_k$.

Proceeding in this way, we construct a bijection $\mathbb{N}\rightarrow \{a_1,\ldots, a_n,\ldots\}$. Moreover, any $a\in A$ will be one of the $a_k$, since $\psi(a)$ will occur in the list of values. So we have $\mathbb{N}\approx A$. $\Box$.
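The inductive construction in this proof can be sketched computationally. Here is an illustrative Python generator (my own sketch, not part of the notes): given a membership predicate describing an infinite subset $A\subseteq\mathbb{N}$, it yields the elements of $A$ in exactly the order the proof picks them, namely always the least element not yet listed.

```python
from itertools import islice

def denumerate(is_member):
    """Yield the elements of A = {n in N : is_member(n)} in increasing order,
    i.e. a_1, a_2, a_3, ... as constructed in the proof."""
    n = 1
    while True:
        if is_member(n):   # n is the least element of A not yet yielded
            yield n
        n += 1

# Example: A = the even natural numbers.
evens = denumerate(lambda n: n % 2 == 0)
print(list(islice(evens, 5)))  # → [2, 4, 6, 8, 10]
```

Of course, a program can only ever produce finitely many terms of the denumeration; the Well-Ordering Principle and induction are what guarantee the whole bijection exists.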

**Corollary. **The following sets are equivalent to $\mathbb{N}$:

- The set of prime numbers.
- The set of even natural numbers.
- The set of odd natural numbers.
- The set of positive powers of 2.
- The set of positive powers of 3.

**Proof. **These are all infinite subsets of $\mathbb{N}$. Since they’re not finite, they must be denumerable. $\Box$.

**Theorem. **Any subset of a countable set is countable.

**Theorem. **If $B$ is countable and there is an injection $g:A\rightarrow B$, then $A$ is countable.

**Theorem. **If $A$ is countable and there is a surjection $g:A\rightarrow B$, then $B$ is countable.

One weird thing about denumerable sets — actually any infinite set — is that if **you add a single element, the set doesn’t get any larger.**

**Theorem. **$\mathbb{N}\approx\mathbb{N}\cup\{0\}$.

**Proof. **The bijection is given by sending $n\in\mathbb{N}$ to $n-1\in\mathbb{N}\cup\{0\}$. $\Box$.

Okay, but what if we tried to add an entire additional copy of $\mathbb{N}$ — say, the negative numbers?

**Theorem. **$\mathbb{Z}\approx \mathbb{N}$.

**Proof. **We’ll give a bijection $\mathbb{N}\cup\{0\}\rightarrow \mathbb{Z}$. Try \begin{align*}\mathbb{N}\cup\{0\}&\rightarrow \mathbb{Z}\\n&\mapsto\begin{cases}-\frac{n}{2} & \text{ if }n \text{ is even}\\\frac{n+1}{2} & \text{ if }n\text{ is odd}\end{cases}\end{align*}

As an exercise, show that this function is a bijection. Actually, give *two* proofs: one by showing it is injective and surjective, and one by building the inverse function. Since the previous theorem gives $\mathbb{N}\approx\mathbb{N}\cup\{0\}$, transitivity then yields $\mathbb{N}\approx\mathbb{Z}$. $\Box$.
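As a quick finite check of the formula (an illustrative Python sketch of my own, not from the notes), we can implement the map and a candidate inverse and verify that they undo each other on a range of values:

```python
# The bijection N ∪ {0} → Z from the proof: even n ↦ -n/2, odd n ↦ (n+1)/2.
def to_z(n):
    return -(n // 2) if n % 2 == 0 else (n + 1) // 2

# A candidate inverse Z → N ∪ {0} (one of the two suggested proofs).
def from_z(k):
    return 2 * k - 1 if k > 0 else -2 * k

# Verify on a finite range that the two maps are mutually inverse.
for n in range(0, 100):
    assert from_z(to_z(n)) == n
for k in range(-50, 51):
    assert to_z(from_z(k)) == k

print([to_z(n) for n in range(7)])  # → [0, 1, -1, 2, -2, 3, -3]
```

A finite check is not a proof, but it makes the zig-zag pattern $0, 1, -1, 2, -2, \ldots$ visible.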

Notice that this is a bit weird: $\mathbb{Z}=\mathbb{N}\cup\left(\{0\}\cup\{-n : n\in\mathbb{N}\}\right)$ is the union of two disjoint sets, each of which is equivalent to $\mathbb{N}$. But it’s no bigger than $\mathbb{N}$.

**Theorem. **If $A$ is denumerable and $B$ is denumerable, then $A\times B$ is denumerable.

**Proof**. We’ll show first that $\mathbb{N}\times\mathbb{N}$ is denumerable.

We can represent $\mathbb{N}\times\mathbb{N}$ as a grid of points in the plane:

\begin{tikzpicture}

\foreach \n in {(1,1),(1,2),(2,1),(2,2),(1,3),(2,3),(3,3),(3,2),(3,1),(1,4),(2,4),(3,4),(4,4),(4,3),(4,2),(4,1),(1,5),(2,5),(3,5),(4,5),(5,5),(5,4),(5,3),(5,2),(5,1)}

\node at \n [circle,fill,inner sep=1.5pt]{};

\node at (5.5,3){$\cdots$};

\node at (3,5.5){$\vdots$};

\node at (1,5.5){$\vdots$};

\node at (5,5.5){$\vdots$};

\node at (5.5,1){$\cdots$};

\node at (5.5,5){$\cdots$};

\end{tikzpicture}

and then denumerate them in any order we like, starting perhaps from the origin and snaking around like so:

\begin{tikzpicture}

\foreach \n in {(1,1),(1,2),(2,1),(2,2),(1,3),(2,3),(3,3),(3,2),(3,1),(1,4),(2,4),(3,4),(4,4),(4,3),(4,2),(4,1),(1,5),(2,5),(3,5),(4,5),(5,5),(5,4),(5,3),(5,2),(5,1)}

\node at \n [circle,fill,inner sep=1.5pt]{};

\node at (5.5,3){$\cdots$};

\node at (3,5.5){$\vdots$};

\node at (1,5.5){$\vdots$};

\node at (5,5.5){$\vdots$};

\node at (5.5,1){$\cdots$};

\node at (5.5,5){$\cdots$};

\draw [thick, ->] (1,1) -- (1,2);

\draw [thick, ->] (1,2) -- (2,2);

\draw [thick, ->] (2,2) -- (2,1);

\draw [thick, ->] (2,1) -- (3,1);

\draw [thick, ->] (3,1) -- (3,2);

\draw [thick, ->] (3,2) -- (3,3);

\draw [thick, ->] (3,3) -- (2,3);

\draw [thick, ->] (2,3) -- (1,3);

\draw [thick, ->] (1,3) -- (1,4);

\draw [thick, ->] (1,4) -- (2,4);

\node at (.75,.75) [shape=circle,draw,inner sep=1pt]{$1$};

\node at (.75,2.25) [shape=circle,draw,inner sep=1pt]{$2$};

\node at (2.25,2.25) [shape=circle,draw,inner sep=1pt]{$3$};

\node at (1.75,.75) [shape=circle,draw,inner sep=1pt]{$4$};

\node at (3.25,.75) [shape=circle,draw,inner sep=1pt]{$5$};

\node at (3.25,2) [shape=circle,draw,inner sep=1pt]{$6$};

\node at (3.25,3.25) [shape=circle,draw,inner sep=1pt]{$7$};

\node at (2.25,3.25) [shape=circle,draw,inner sep=1pt]{$8$};

\node at (.75,3.25) [shape=circle,draw,inner sep=1pt]{$9$};

\node at (.75,4.25) [shape=circle,draw,inner sep=1pt]{$10$};

\node at (2.25,4.25) [shape=circle,draw,inner sep=1pt]{$11$};

\end{tikzpicture}

It’s clear that we’ll eventually get all the dots numbered (so our denumeration is onto), and that each dot will get only one number (so our denumeration is one-to-one).
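The snaking path in the picture can be written down explicitly. Here is an illustrative Python generator (my own sketch, not from the notes) that reproduces the numbering in the figure: shell $k$ consists of the points whose larger coordinate equals $k$, and alternate shells are traversed in opposite directions.

```python
from itertools import islice

def snake():
    """Enumerate N x N following the picture's snaking path."""
    yield (1, 1)
    k = 2
    while True:
        if k % 2 == 0:
            for x in range(1, k + 1):      # along the row y = k
                yield (x, k)
            for y in range(k - 1, 0, -1):  # down the column x = k
                yield (k, y)
        else:
            for y in range(1, k + 1):      # up the column x = k
                yield (k, y)
            for x in range(k - 1, 0, -1):  # back along the row y = k
                yield (x, k)
        k += 1

pairs = list(islice(snake(), 11))
print(pairs[:4])   # → [(1, 1), (1, 2), (2, 2), (2, 1)]
# No repeats, so the denumeration is one-to-one on this prefix:
assert len(set(pairs)) == len(pairs)
```

The first eleven pairs produced match the circled labels $1$ through $11$ in the figure.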

Now let’s go from this special case to the general case.

**Lemma. **If $A\approx B$ and $X\approx Y$, then $A\times X\approx B\times Y$.

**Proof.** This is homework. $\Box$.

To complete the proof: we know that $A$ and $B$ are denumerable, so we have $A\approx \mathbb{N}$ and $B\approx \mathbb{N}$. So by the lemma $A\times B\approx \mathbb{N}\times\mathbb{N}$. But we showed $\mathbb{N}\times\mathbb{N}\approx\mathbb{N}$. So by transitivity of $\approx$, we have $A\times B\approx\mathbb{N}$. $\Box$.

**Theorem. **If $A$ is countable and $B$ is countable, then $A\times B$ is countable.

**Proof. **We have already handled the cases where both sets are finite and where both sets are denumerable. So we only need to handle the case where one set is finite and the other is denumerable. This is an exercise. $\Box$.

**Theorem. **$\mathbb{Q}$ is denumerable.

**Proof. **By the theorem above, $\mathbb{Z}\times\left(\mathbb{Z}\setminus\{0\}\right)$ is countable. There is a surjection from $\mathbb{Z}\times\left(\mathbb{Z}\setminus\{0\}\right)$ to $\mathbb{Q}$ given by $(m,n)\mapsto \frac{m}{n}$, so $\mathbb{Q}$ is countable. But we have an injection from $\mathbb{N}$ to $\mathbb{Q}$ given by $n\mapsto\frac{1}{n}$, so $\mathbb{Q}$ is infinite. Being countable and infinite, $\mathbb{Q}$ is denumerable. $\Box$.
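The surjection $(m,n)\mapsto\frac{m}{n}$ hits each rational many times ($\frac{1}{2}=\frac{2}{4}=\cdots$). Here is an illustrative Python sketch (my own, not from the notes) that walks through the pairs in $\mathbb{Z}\times(\mathbb{Z}\setminus\{0\})$ in shells of increasing size and skips the duplicates, producing a duplicate-free listing of $\mathbb{Q}$:

```python
from fractions import Fraction
from itertools import islice

def rationals():
    """Enumerate Q without repetition, via the surjection (m, n) -> m/n."""
    seen = set()
    k = 0
    while True:
        # Shell k: pairs (m, n) with max(|m|, |n|) = k and n != 0.
        for m in range(-k, k + 1):
            for n in range(-k, k + 1):
                if n != 0 and max(abs(m), abs(n)) == k:
                    q = Fraction(m, n)
                    if q not in seen:   # skip duplicates like 1/2 = 2/4
                        seen.add(q)
                        yield q
        k += 1

print(list(islice(rationals(), 5)))
```

Every rational $\frac{m}{n}$ eventually appears (once its pair's shell is reached), so this is a denumeration of $\mathbb{Q}$.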

Let’s say we want to count the (living) former Presidents of the United States in a photograph.

We’re so good at counting that we’d probably just glance and say “5”. But if we were three or four years old, we’d probably have to count more carefully, which means doing something like placing our index finger on Obama and pronouncing “one”, shifting the index finger to W. Bush and pronouncing “two”, shifting the index finger to Clinton and pronouncing “three”, shifting it to H. W. Bush and pronouncing “four”, and finally shifting to Carter and pronouncing “five”.

In other words, we would construct an explicit bijection between the set of presidents and the set $\{1,2,3,4,5\}$.

**Definition. **We call the set $A$ **finite** if either $A$ is empty, or there is some $n\in\mathbb{N}$ and a bijection $\psi:A\rightarrow\{1,\ldots,n\}$. The number $n$ is called the **cardinality** of $A$. The cardinality of the empty set is defined to be 0.

We’ve been dealing with finite sets our whole lives, and much of set theory is inspired by them. We’ll take the time to record some important facts, which may seem obvious but are worth noting.

**Theorem. **If $A$ is finite and there is a surjection $g:A\rightarrow B$, then $B$ is finite.

**Proof. **We’ll start with a lemma:

**Lemma. **If there is a surjection $g:A\rightarrow B$, then there is an injection $h:B\rightarrow A$.

**Proof. **Since $g$ is a surjection, for each $b\in B$, there is some $a\in A$ with $g(a)=b$. Pick one of them and call it $h(b)$; note that by construction $g(h(b))=b$.

We claim $h:B\rightarrow A$ is an injection. Indeed, if $h(b_1)=h(b_2)$, then $b_1=g(h(b_1))=g(h(b_2))=b_2$. $\Box$.

So now we have an injection $h$ from $B$ to a finite set. Further, there is some bijection $\psi:A\rightarrow\{1,\ldots,n\}$. Then $\psi\circ h:B\rightarrow \{1,\ldots, n\}$ is an injection. But every function is a surjection onto its range, so $B$ is bijective with a subset of $\{1,\ldots,n\}$, hence must be finite. $\Box$

**Corollary. **If $A$ is finite and there is an injection $h:B\rightarrow A$, then $B$ is finite.

**Corollary. **Any subset of a finite set is finite.

**Proof.** If $X\subseteq Y$ and $Y$ is finite, consider the function $\iota_X:X\rightarrow Y$ given by $\iota_X:t\mapsto t$. This is an injection, so the previous corollary applies.$\Box$.

**Definition. **We call $B$ a **proper subset of $A$** if $B\subseteq A$ but $B\neq A$; that is, if every element of $B$ is an element of $A$, but there is some element of $A$ which is not an element of $B$.

**Theorem. **If $A$ is finite, then $A$ is not equivalent to any of its proper subsets.

**Proof. **Suppose not. Then there is some finite set $A$ and a proper subset $B$ of $A$ with $B\approx A$. So there is some $a_0\in A$ with $a_0\notin B$.

Now let $\psi: A\rightarrow \{1,\ldots, n\} $ be the bijection which shows that $A$ is finite, and arrange so that $\psi(a_0)=n$. Then the range of $\psi|_B$ is a subset of $\{1,\ldots, n-1\}$. But then $\{1,\ldots, n\}$ would be equivalent to a subset of $\{1,\ldots, n-1\}$, which is absurd. $\Box$.

Perhaps the most surprising part of this theorem is that we have the assumption “$A$ is finite”. You might think that there’s no way a set could be equivalent to the set you get after you remove an element. But! Infinite sets are kind of weird; or at least we aren’t used to dealing with them so our intuitions are sometimes wrong.

**Definition.** We call a set $A$ **infinite** if it is not finite.

**Theorem. **If $A$ is equivalent to one of its proper subsets, then $A$ is infinite.

**Proof. **This is the contrapositive of the last theorem. $\Box$.

This theorem (though you might think its hypothesis never applies) turns out to be a very useful way to prove that a particular set is infinite.

**Theorem. **$\mathbb{N}$ is infinite.

**Proof. **Consider the function $f:n\mapsto n+1$. This is a function from $\mathbb{N}$ to $\mathbb{N}\setminus\{1\}$. Notice that there is a function $g:\mathbb{N}\setminus\{1\}\rightarrow \mathbb{N}$ given by $g(p)=p-1$, which has\begin{align*}g(f(n))&=g(n+1)=n+1-1=n\\f(g(p))&=f(p-1)=p-1+1=p\end{align*}so that $g$ serves as an inverse to $f$. Therefore $f$ is a bijection, and so $\mathbb{N}\approx\mathbb{N}\setminus\{1\}$. $\Box$

We can also give another proof, which is how most proofs that something is infinite go. Since *infinite* just means **not finite**, we challenge the reader to try and tell us how many elements the infinite set has. Then we show we can always defeat them, by finding an element they left out of their count.

**Another proof that $\mathbb{N}$ is infinite.**

Suppose $\mathbb{N}$ had finitely many elements, say $K$ of them. Then there would be a bijection $\psi:\{1,\ldots,K\}\rightarrow \mathbb{N}$.

Let $L=\psi(1)+\psi(2)+\cdots+\psi(K)+1$. Then $L$ is strictly greater than any of the $\psi(i)$. Therefore $L\notin \operatorname{Range}(\psi)$, so $\psi$ is not onto. This contradiction establishes the theorem. $\Box$.

The following proof of this type is the oldest recorded *reductio ad absurdum*.

**Theorem. **(Euclid, ca. 300 BCE) There are infinitely many prime numbers.

**Proof. **Suppose there were finitely many prime numbers, say $K$ of them. List them all: $p_1,\ldots, p_K$. Consider the numbers \begin{align*}L&=p_1p_2\cdots p_K\\M&=L+1 \end{align*}

Now we know that $M$ has at least one prime factor; call that factor $q$. Observe that $q$ could not be 2, because $L$ is even, so $M$ is odd. Further, $q$ could not be 3, because $L$ is a multiple of 3, so $M$ has remainder 1 when we divide by 3. Proceeding in this way, we find that $q$ cannot be any of the prime numbers we listed.

So we have found a prime that was not on our list, which we supposed to be complete. This contradiction establishes the theorem. $\Box$.
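Euclid's argument is constructive enough to run. Here is an illustrative Python sketch (my own, not from the notes): given any finite list of primes, it forms the product plus one and extracts a prime factor, which the proof guarantees is missing from the list. Note the subtlety visible in the code: $M=L+1$ need not itself be prime; we only know some prime *factor* of $M$ is new.

```python
def least_prime_factor(n):
    """Smallest prime dividing n (for n >= 2), by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n itself is prime

def prime_not_in(primes):
    """Euclid: a prime factor of (product of the list) + 1 is not on the list."""
    m = 1
    for p in primes:
        m *= p
    q = least_prime_factor(m + 1)
    assert q not in primes  # dividing M by any listed prime leaves remainder 1
    return q

print(prime_not_in([2, 3, 5, 7]))  # 2*3*5*7 + 1 = 211, which is prime → 211
```

Running it on $[2, 3, 5, 7, 11, 13]$ gives $30031 = 59\cdot 509$, so the new prime ($59$) is not $M$ itself — the classic cautionary example.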

We can also show that a set is infinite by finding an infinite subset:

**Theorem. **If $A\subseteq B$ and $A$ is infinite, then $B$ is infinite.

**Proof. **Assume $A\subseteq B$. We will show that if $A$ is infinite, then $B$ is infinite. The contrapositive of this claim is: **If $B$ is finite, then $A$ is finite**, which we already showed. $\Box$

**E.g. **$\mathbb{R}$ is infinite, since $\mathbb{N}$ is an infinite subset of $\mathbb{R}$.

Actually this can be souped up:

**Theorem. **If $A$ is infinite and there is an injection $\psi:A\rightarrow B$, then $B$ is infinite.

**E.g. **$\mathbb{N}\times\mathbb{N}$ is infinite.

**Proof. **Consider the function $\psi:\mathbb{N}\rightarrow \mathbb{N}\times\mathbb{N}$ given by $\psi(k)=(k,k)$.

If $\psi(k_1)=\psi(k_2)$, then $(k_1,k_1)=(k_2,k_2)$, so $k_1=k_2$. This shows that $\psi$ is an injection, so the above theorem shows that $\mathbb{N}\times \mathbb{N}$ is infinite. $\Box$

$\mathbb{N}$ is in fact the *smallest* infinite set, in the following sense:

**Theorem. **Any infinite set has a subset which is equivalent to $\mathbb{N}$.

**Proof. **Let $X$ be an infinite set. We’ll show, by induction, that we can pick a distinct element $x_n\in X$ for each $n\in\mathbb{N}$.

For the base case, pick $x_1\in X$. Note that $X\setminus\{x_1\}$ is not empty (otherwise $X$ would have had only one element to begin with).

For the inductive step, suppose we’ve picked $x_1,\ldots,x_k$ distinct elements of $X$ already. Then $X\setminus\{x_1,\ldots, x_k\}$ is not empty (otherwise $X$ would have only $k$ elements), so there is some $x_{k+1}\in X$ which is distinct from $x_1,\ldots,x_k$.

Mathematical induction allows us to continue this way to define $x_n$ for any $n\in\mathbb{N}$, thus building a bijection $\mathbb{N}\rightarrow \{x_1,\ldots,x_n,\ldots\}$. $\Box$.

**Definition. **If $R$ is a relation from $A$ to $B$, then $R^{-1}$ is a relation from $B$ to $A$, defined by $(b,a)\in R^{-1}$ iff $(a,b)\in R$.

Thus (in our formalism) **every function has an inverse**. The inverse of a function so defined, however, may not be a function itself! (Remember that most operations we do to sets or to relations will not preserve functionhood.) So let’s discover when taking the inverse (“swapping $x$ and $y$”) actually does result in a function.

Recall that $g:A\rightarrow B$ means that two conditions hold:

- $\operatorname{Domain}(g)=A$
- $\left((a,b)\in g\text{ and }(a,c)\in g\right)\Rightarrow b=c$.

**Exercise 1. **Suppose $f:X\rightarrow Y$. By substituting $f^{-1}$ for $g$, $X$ for $A$, and $Y$ for $B$, write what it means for $f^{-1}:Y\rightarrow X$.

**Exercise 2. **Use the definition of $f^{-1}$ to rewrite your conditions solely in terms of $f$ (with no $f^{-1}$ anywhere).

**Definition. **Let $f:X\rightarrow Y$. We call $f$ onto or surjective if $\forall y\in Y, \exists x\in X: y=f(x)$. We call $f$ one-to-one or injective if for any $x,z\in X$, we have that $f(x)=f(z)$ guarantees $x=z$.

**Definition. **A function which is both one-to-one and onto is called a one-to-one correspondence or a bijection.

The following theorem is important, but its proof is just the combination of Exercises 1 and 2.

**Theorem. **Let $f:X\rightarrow Y$. Then $f^{-1}:Y\rightarrow X$ iff $f$ is both one-to-one and onto.

**Theorem. **Suppose $f:X\rightarrow Y$ is one-to-one and $g:\operatorname{Range}(f)\rightarrow Z$ is one-to-one. Then $g\circ f:X\rightarrow Z$ is one-to-one.

**Proof. **Note that the condition that $g$ be defined on the range of $f$ is necessary in order to compose the two functions as functions.

Let $x,z\in X$. Suppose $g\circ f(x)=g\circ f(z)$. Applying the fact that the composition of functions is a function, we have \begin{align*}g(f(x))=g(f(z))\end{align*}

Since $g$ is one-to-one, we have $f(x)=f(z)$. Since $f$ is one-to-one, we have $x=z$. Since $x,z$ were arbitrary, we’ve shown $g\circ f$ is one-to-one. $\Box$.

**Theorem. **Suppose $f:X\rightarrow Y$ is onto and $g:Y\rightarrow Z$ is onto. Then $g\circ f$ is onto.

**Proof.** This is an exercise for you.

**Definition.** Given sets $A$, $B$, define \begin{equation*}A^B=\left\{f:B\rightarrow A\right\}\end{equation*} to be the set of all functions from $B$ to $A$.

This notation connects with our powerset notation by writing $2$ for the set $\{0,1\}$ (which has two elements): a function $X\rightarrow\{0,1\}$ picks out the subset of elements sent to $1$, so $2^X$ can be read either as the set of such functions or as the powerset.

**Theorem.** If $A$ has exactly $n$ elements and $B$ has exactly $m$ elements, then $A^B$ has exactly $n^m$ elements.

**Proof.** To define a function $B\rightarrow A$ requires choosing, for each $b\in B$, one of the $n$ elements of $A$.

Each time we make such a choice, there are $n$ options; and we have to make the choice $m$ times. So in total there are $n\cdot n\cdot\cdots \cdot n=n^m$ possibilities. $\Box$
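The counting argument can be checked by brute force on small sets. Here is an illustrative Python sketch (my own; the sets and names are invented for the example) that literally enumerates all functions $B\rightarrow A$ as the choices described in the proof:

```python
from itertools import product

def all_functions(B, A):
    """Each function B -> A is a choice of an element of A for each b in B;
    itertools.product generates exactly those tuples of choices."""
    B = list(B)
    return [dict(zip(B, values)) for values in product(A, repeat=len(B))]

A = {'x', 'y', 'z'}          # n = 3 elements
B = {1, 2}                   # m = 2 elements
fns = all_functions(B, A)
print(len(fns))              # → 9, i.e. 3**2
assert len(fns) == len(A) ** len(B)
```

Each dict in the list is one function, recorded as its set of (input, output) pairs.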


A function $f:X\rightarrow Y$ gives us a way of taking elements in $X$ and *pushing them over* to $Y$. We call $f(x)$ the *image* of $x$ under $f$. In fact, we can also think of $f$ as giving a way to push a set of elements of $X$ over to $Y$.

**Definition. **Given $f:X\rightarrow Y$ and $A\subseteq X$, we define $f_*A=\left\{f(x)\middle\vert x\in A\right\}$. We call $f_*A$ the image of $A$ under $f$ or the pushforward of $A$ by $f$.

Notice that by definition, $f_*A$ is a subset of the target $Y$.

**E.g. **If $f:t\mapsto t^2$, and $A=[-2,3]$, then $f_*A=[0,9]$.

**Proof.** ($\Rightarrow$) Let $y\in f_*A$. Then $\exists x\in A: f(x)=y$. So $-2\leq x\leq 3$.

If $-2\leq x\leq 0$, then $y=x^2$ is between $0$ and $4$. So $y\in[0,9]$.

If $0\leq x\leq 3$, then $y=x^2$ is between $0$ and $9$. So $y\in[0,9]$.

($\Leftarrow$) Let $y\in[0,9]$. Then $0\leq y\leq 9$. Let $x=\sqrt{y}$. So $0\leq x\leq 3$. Then $f(x)=\left(\sqrt{y}\right)^2=y$. Since $0\leq x\leq 3$, $x\in[-2,3]$, so $y\in f_*A$. $\Box$.

**Notation.** Most sources write $f(A)$ for the image of $A$ under $f$. This is not an entirely unreasonable choice, but it means that the symbol $f(A)$ has two interpretations — if $A$ is an element of $X$, then $f(A)$ is an element of $Y$; if $A$ is a subset of $X$, then $f(A)$ is a subset of $Y$. Our notation $f_*A$ is always going to be a set. But you should know, for your future mathematics courses, that $f(A)$ is a standard notation.

**Theorem/Realization.** If $f:X\rightarrow Y$, then $f_*:2^X\rightarrow 2^Y$.

That is, $f_*$ takes subsets of $X$ and yields subsets of $Y$.

**Definition.** Given $f:X\rightarrow Y$ and $B\subseteq Y$, we define $f^*B=\left\{x\middle\vert f(x)\in B\right\}$. We call $f^*B$ the preimage of $B$ under $f$ or the pullback of $B$ by $f$.

Note that $f^*B$ is a subset of the domain $X$.

**E.g. ** If $f:\mathbb{R}\rightarrow\mathbb{R}$ is given by $t\mapsto t^2$, then $f^*((-2,-1))=\varnothing$ and $f^*([-4,4))=(-2,2)$.

**Theorem/Realization.** If $f:X\rightarrow Y$, then $f^*:2^Y\rightarrow 2^X$.

**Notation**. Just like with the image, there is a standard-but-terrible notation for the preimage. Most sources write $f^{-1}(B)$ for the preimage of $B$ under $f$. This is just nutty, because the object in question doesn’t have anything to do with inverses. Nevertheless, you should expect at least once in your mathematical life to have to deal with the string of symbols $f^{-1}(B)$, where $B$ is some set. We live in a fallen world.

**Theorem.** Let $f:X\rightarrow Y$, $A\subseteq X$, $B\subseteq X$, $C\subseteq Y$, $D\subseteq Y$. Then

- $f_*(A\cup B)=f_*(A)\cup f_*(B)$
- $f_*(A\cap B)\subseteq f_*(A)\cap f_*(B)$
- $f^*(C\cup D)=f^*(C)\cup f^*(D)$
- $f^*(C\cap D)=f^*(C)\cap f^*(D)$
- $f^*(C^c)=\left(f^*(C)\right)^c$

**Proof. **This is an exercise, but it’s just an exercise in proving set equalities.
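For intuition before doing that exercise, the identities can be sanity-checked on a small finite example. Here is an illustrative Python sketch (my own; the particular function is invented, and chosen non-injective so the second item is a *proper* inclusion):

```python
f = {1: 'a', 2: 'a', 3: 'b', 4: 'c'}          # f : X -> Y as a dict (non-injective)
X, Y = set(f), set(f.values()) | {'d'}

def push(A):   # the pushforward f_* A
    return {f[x] for x in A}

def pull(C):   # the pullback f^* C
    return {x for x in X if f[x] in C}

A, B = {1, 3}, {2, 4}
C, D = {'a', 'b'}, {'b', 'c'}

assert push(A | B) == push(A) | push(B)
assert push(A & B) <= push(A) & push(B)        # only an inclusion: here LHS = {}, RHS = {'a'}
assert pull(C | D) == pull(C) | pull(D)
assert pull(C & D) == pull(C) & pull(D)
assert pull(Y - C) == X - pull(C)              # complements taken in Y and in X
print("all identities hold")
```

Notice that $A\cap B=\varnothing$ here while $f_*(A)\cap f_*(B)=\{a\}$: pushforward genuinely fails to preserve intersections for non-injective $f$.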

Recall that for $S$ to be a set means that “$x\in S$” is an open sentence. We can view this in terms of functions.

**Definition.** Given a set $A$, the characteristic function of $A$ is the function \begin{equation*} \chi_A(x)=\begin{cases}1 &\text{ if } x\in A\\0&\text{ otherwise}\end{cases} \end{equation*}

Technically in order to define this function, we need to specify a universe for $x$ — but notice that it really doesn’t much matter, because as long as $x\notin A$, $\chi_A(x)=0$.

**Exercise.** If $A$ and $B$ are subsets of some common universe $U$, then for any $x\in U$, we have:

- $\chi_{A\cap B}(x)=\chi_A(x)\chi_B(x)$,
- $\chi_{A\cup B}(x)=\max\{\chi_A(x),\chi_B(x)\}$
- $\chi_{A^c}(x)=1-\chi_{A}(x)$.

**Theorem.** $\chi_A^*(\{1\})=A$.

This says that $A$ determines its characteristic function, and $\chi_A$ determines $A$ as well.
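Both the exercise's identities and the theorem can be checked mechanically on a small universe. An illustrative Python sketch (my own; the universe and sets are invented):

```python
U = set(range(10))           # a small finite universe standing in for the ambient set
A = {1, 2, 3, 4}
B = {3, 4, 5}

def chi(S):
    """The characteristic function of S."""
    return lambda x: 1 if x in S else 0

for x in U:
    assert chi(A & B)(x) == chi(A)(x) * chi(B)(x)
    assert chi(A | B)(x) == max(chi(A)(x), chi(B)(x))
    assert chi(U - A)(x) == 1 - chi(A)(x)

# The theorem: the preimage of {1} under chi_A recovers A.
assert {x for x in U if chi(A)(x) == 1} == A
print("characteristic-function identities verified")
```

So passing from $A$ to $\chi_A$ loses no information: the set and its characteristic function determine each other.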

Functions are sets, so we can try to combine them like we combine sets. That is, we can ask the following questions:

- Is the complement of a function necessarily a function?
- Is the intersection of two functions necessarily a function?
- Is the union of two functions necessarily a function?
- Is the set product of two functions necessarily a function?

Let’s investigate the first three; the last one I’ll leave for homework.

To start with, let’s pick two functions and run with them. How about: $f:x\mapsto x+1$ and $g:t\mapsto t^2$.

The complement of $g$ is $g^c=\left\{(x,y)\middle\vert y\neq x^2\right\}$, everything in the plane except the parabola. That’s definitely not a function: it badly fails the vertical line test.

What about intersection?

The intersection is just two points; so $f\cap g$ is not a function from $\mathbb{R}$ to $\mathbb{R}$. But $f\cap g$ *is* a function from a smaller set.

Let’s look at $f\cup g$:

It also fails the vertical line test. But:

**The Pasting Lemma (in the category of** **sets)**. Let $f:X\rightarrow Y$ and $g:A\rightarrow B$ be two functions. Then $f\cup g:X\cup A\rightarrow Y\cup B$ if and only if $f|_{A\cap X}=g|_{A\cap X}$.

**Proof. **This is a biconditional, so we have two directions:

($\Leftarrow$) Assume $f|_{A\cap X}=g|_{A\cap X}$. We need to show $f\cup g$ is a function; to do this, there are two clauses.

(clause 1: domain) Let $t\in X\cup A$. Then either $t\in X$ or $t\in A$.

If $t\in X$, then because $f:X\rightarrow Y$, $\exists y\in Y: (t,y)\in f$. Then $(t,y)\in f\cup g$.

If $t\in A$, then because $g:A\rightarrow B$, $\exists b\in B: (t,b)\in g$. Then $(t,b)\in f\cup g$.

In either case, there’s an element $z\in Y\cup B$ so that $(t,z)\in f\cup g$. That’s clause 1 of the definition of function.

(clause 2: unambiguousness) Let $t\in X\cup A$ and $z_1,z_2\in Y\cup B$ be such that $(t,z_1)\in f\cup g$ and $(t,z_2)\in f\cup g$. There are four cases:

If $(t,z_1)\in f$ and $(t,z_2)\in f$, then the fact that $f$ is a function implies $z_1=z_2$.

If $(t,z_1)\in g$ and $(t,z_2)\in g$, then the fact that $g$ is a function implies $z_1=z_2$.

If $(t,z_1)\in f$ and $(t,z_2)\in g$, then we have $t\in \operatorname{Domain}(f)=X$ and $t\in\operatorname{Domain}(g)=A$, so $t\in X\cap A$. But then $z_1=f(t)=f|_{X\cap A}(t)=g|_{X\cap A}(t)$. So $(t,z_1)\in g|_{X\cap A}\subseteq g$. So $(t,z_2)\in g$. Since $g$ is a function, $z_1=z_2$.

If $(t,z_1)\in g$ and $(t,z_2)\in f$, then we have $t\in \operatorname{Domain}(f)=X$ and $t\in\operatorname{Domain}(g)=A$, so $t\in X\cap A$. But then $z_2=f(t)=f|_{X\cap A}(t)=g|_{X\cap A}(t)$. So $(t,z_2)\in g|_{X\cap A}\subseteq g$. So $(t,z_1)\in g$. Since $g$ is a function, $z_1=z_2$.

In any of these four cases, $z_1=z_2$, which is what we needed for clause 2 of the definition of function.

($\Rightarrow$) In this direction, we need to show two functions are equal. So we should probably use the TAWFAE. I’ll leave this direction to you.
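With functions represented as dicts (sets of input–output pairs), the lemma becomes a concrete compatibility check. An illustrative Python sketch (my own, not from the notes):

```python
def agree_on_overlap(f, g):
    """Check f|_(X ∩ A) = g|_(X ∩ A), with domains read off the dicts."""
    return all(f[t] == g[t] for t in f.keys() & g.keys())

def paste(f, g):
    """Return f ∪ g when it is a function; otherwise raise."""
    if not agree_on_overlap(f, g):
        raise ValueError("f and g disagree on the overlap; the union is not a function")
    return {**f, **g}

f = {1: 'a', 2: 'b'}          # domain {1, 2}
g = {2: 'b', 3: 'c'}          # agrees with f at the overlap point 2
print(paste(f, g))            # → {1: 'a', 2: 'b', 3: 'c'}

h = {2: 'z'}                  # disagrees with f at 2
try:
    paste(f, h)
except ValueError:
    print("not pastable")
```

The `ValueError` branch is exactly the ambiguous-piecewise situation discussed below: two pairs $(2,\mathrm{'b'})$ and $(2,\mathrm{'z'})$ would violate clause 2.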

Why does this theorem bear such a silly name? Consider this diagram, which tells you how to build a cube from a piece of paper. In order to attach the sides, you have to paste the red tab to the back of the appropriate square (I’ve indicated where the red tab gets attached by coloring it green). We won’t succeed in pasting unless the red tab and the green target match up.

That’s what the condition $f|_{X\cap A}=g|_{X\cap A}$ ensures: that the functions $f$ and $g$ agree on the overlap of their domains.

Again, an example from precalculus mathematics: when we define a function piecewise, like $$h(x)=\begin{cases}x^2 & \text{ if } x\leq 3\\ -x+1&\text{ if } x>3\end{cases}$$we usually want to make sure that the two pieces have *nonoverlapping* domains. This corresponds, in the Theorem above, to the case that $X\cap A=\varnothing$. If we weren’t so careful, we might have a situation like this:$$h(x)=\begin{cases}x^2 & \text{ if } x\leq 3\\ -x+1&\text{ if } x>1\end{cases}$$where it’s **ambiguous** what $h(2)$ is: the first clause gives $h(2)=4$, while the second gives $h(2)=-1$.

Because functions are relations, we can consider their composites. We have the following nice theorem:

**The Composition of Functions Is a Function. **If $f:X\rightarrow Y$ and $g:Y\rightarrow Z$, then $g\circ f:X\rightarrow Z$.

**Proof. **There are two things to show:

(clause 1: domain) Let $x\in X$. Then, because $f$ is a function, $\exists y\in Y:(x,y)\in f$. Since $y\in Y$ and $g$ is a function, $\exists z\in Z:(y,z)\in g$. Then $(x,z)\in g\circ f$.

(clause 2: unambiguousness) Let $x\in X$ and $z_1,z_2\in Z$ be such that $(x,z_1)\in g\circ f$ and $(x,z_2)\in g\circ f$. We need to show $z_1=z_2$.

Since $(x,z_1)\in g\circ f$, there is some $y_1\in Y$ with $(x,y_1)\in f$ and $(y_1,z_1)\in g$. We also have $y_2\in Y$ with $(x,y_2)\in f$ and $(y_2,z_2)\in g$.

Since $f$ is a function, $y_1=y_2$. So we have $(y_1,z_2)\in g$. But since $g$ is a function, $(y_1,z_2)\in g$ and $(y_1,z_1)\in g$ forces $z_1=z_2$. $\Box$.

**Important Formula.** If $f$ and $g$ are functions, so that $g\circ f$ is a function, then $(g\circ f)(x)=g(f(x))$.

**Proof. **What does it mean for $z=(g\circ f)(x)$? It means $(x,z)\in g\circ f$. But that means $\exists y: (x,y)\in f$ and $(y,z)\in g$. So $y=f(x)$ and $z=g(y)$. By substituting, we obtain $z=g(f(x))$. $\Box$
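Since functions are sets of pairs, the composite can be computed directly from the relation definition, and the Important Formula checked against it. An illustrative Python sketch (my own; the two sample functions are invented):

```python
# (x, z) ∈ g∘f iff there is some y with (x, y) ∈ f and (y, z) ∈ g.
f = {(1, 'a'), (2, 'b'), (3, 'a')}     # a function as a set of pairs
g = {('a', 10), ('b', 20)}

compose = {(x, z) for (x, y1) in f for (y2, z) in g if y1 == y2}
print(sorted(compose))  # → [(1, 10), (2, 20), (3, 10)]

# This agrees with the Important Formula (g∘f)(x) = g(f(x)):
fd, gd = dict(f), dict(g)
assert all(gd[fd[x]] == z for (x, z) in compose)
```

The set comprehension is literally the definition of relational composition; no function notation is needed to compute it.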

This formula explains the “backwards” looking choice of how we write compositions: even though $f$ comes first, we write $g\circ f$. The only way out of this would be to switch our function notation to something like $(x)f$, so that the composite would be $((x)f)g$. (Some people do this, and it has a name: *reverse Polish notation*. But most normal people don’t do it.)

**Theorem About When Functions Are Equal** (TAWFAE)**.** Let $f$ and $g$ be functions. Then $f=g$ if and only if:

- $\operatorname{Domain}(f)=\operatorname{Domain}(g)$, and
- $\forall x\in \operatorname{Domain}(f), f(x)=g(x)$.

**Proof. **This theorem sort of looks like it will be complicated to prove, but let’s keep our wits about us and rely on logic to help us out.

Globally, this is a biconditional proof so it has two directions:

($\Rightarrow$) Assume $f=g$. We need to prove two things:

(*clause 1*) This is the claim that two sets are equal, so we have to prove each is a subset of the other:

($\subseteq$) Let $t\in \operatorname{Domain}(f)$. Then $\exists y: (t,y)\in f$. Since $f=g$, this means $(t,y)\in g$. So $t\in\operatorname{Domain}(g)$.

($\supseteq$) Let $t\in \operatorname{Domain}(g)$. Then $\exists y: (t,y)\in g$. Since $f=g$, this means $(t,y)\in f$. So $t\in\operatorname{Domain}(f)$.

(*clause 2*) This is a universal claim, so we start the same way as always: Let $x\in \operatorname{Domain}(f)$. Then $\exists y: (x,y)\in f$. We can rewrite this as $y=f(x)$. On the other hand, we know from clause 1 that $\operatorname{Domain}(g)=\operatorname{Domain}(f)$, so $\exists z: (x,z)\in g$. We can rewrite this as $z=g(x)$. But since $g=f$, we actually have $(x,z)\in f$.

Since $(x,y)\in f$ and $(x,z)\in f$, and $f$ is a function, $y=z$.

So $f(x)=y=z=g(x)$.

($\Leftarrow$) Assume both clauses 1 and 2 hold. In this direction, we need to prove two sets are equal.

($\subseteq$) Let $Q\in f$. Then $Q=(x,y)$ for some $x$ and some $y$. $x\in\operatorname{Domain}(f)$, so by clause 1, $x\in\operatorname{Domain}(g)$. Then there’s some $z$ with $(x,z)\in g$. We can rewrite in function notation as $y=f(x)$ and $z=g(x)$. By clause 2, we know $f(x)=g(x)$, so $y=z$. That is, $Q=(x,y)\in g$.

($\supseteq$) This is so similar to $\subseteq$, I’ll let you work it out.

Since we proved both directions, we’re done. $\Box$.

This theorem gives us a new way to prove two functions are equal, should we ever need it: we establish clause 1 (“they have the same domain”) and clause 2 (“they have the same rule”).

In middle and high school algebra courses, we often ask questions like “What’s the domain of $f(x)=\frac{\sin(x)}{x^2-25}$?”. According to the TAWFAE, this is actually a dumb question: TAWFAE says that in order to know what a function is, we need to know what its domain is. So a question like this doesn’t actually contain enough information to be answerable.

What the question is really asking, though, is: *what’s the most natural*, or maybe *what’s the largest possible*, domain on which this rule makes sense?

As a middle-school student, I was very confused by questions like this one:

Find the domain of $f(x)=\frac{x^2-9}{x+3}$.

My answer would usually go like this: that function is really the same as $x-3$, the domain of which is all of $\mathbb{R}$. And I would always get that question wrong. Why?

The functions $\psi:t\mapsto \frac{t^2-9}{t+3}$ and $\varphi:x\mapsto x-3$ do indeed have the same rule, in the sense that, whenever $\psi(t)$ makes sense, we have

$$ \psi(t)=\frac{t^2-9}{t+3}=\frac{(t-3)(t+3)}{t+3}=t-3=\varphi(t)$$

**But **the natural domain of $\psi$ is $\mathbb{R}\setminus\{-3\}$ and the natural domain for $\varphi$ is $\mathbb{R}$; so that means the domains are different — hence by the TAWFAE, we know $\psi\neq\varphi$. Nevertheless, the two functions *are* related.

**Definition.** If $f:X\rightarrow Y$ and $A\subseteq X$, we define the *restriction* of $f$ to $A$ as

\begin{align*}f|_{A}:A&\rightarrow Y\\ a&\mapsto f(a)\end{align*}

That is, $f|_A$ has the same rule as $f$; we just restrict attention to inputs that come from $A$.

Equipped with this notion, we can describe the relationship between $\psi$ and $\varphi$:

$$\psi=\varphi|_{\mathbb{R}\setminus\{-3\}}$$
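We can at least spot-check this relationship numerically. An illustrative Python sketch (my own, not from the notes): wherever $\psi$ is defined, it agrees with $\varphi$, up to floating-point error.

```python
def psi(t):
    return (t * t - 9) / (t + 3)   # undefined at t = -3 (division by zero)

def phi(x):
    return x - 3

# Sample points avoiding -3: psi and phi agree wherever psi makes sense.
for t in [-10, -4, -2.5, 0, 1, 7]:
    assert abs(psi(t) - phi(t)) < 1e-12

print("psi agrees with phi away from -3")
```

Of course `psi(-3)` raises a `ZeroDivisionError` while `phi(-3)` happily returns `-6`: the two functions have different domains, which is exactly why TAWFAE says $\psi\neq\varphi$.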

**Definition.** If $f:X\rightarrow Y$, and $X\subseteq Z$, and there is $g:Z\rightarrow Y$ with $g|_X=f$, we call $g$ an *extension* of $f$.

Here’s an illustration: if we let $f$ denote the function displayed in green, $g$ denote the function displayed in blue, and $h$ the function displayed in red, then we have that:

- $f|_{[-2,\infty)}=g$
- $f|_{[3,\infty)}=h$
- $g|_{[3,\infty)}=h$
- $f$ is an extension of $g$, from $[-2,\infty)$ to all of $\mathbb{R}$
- $g$ is an extension of $h$, from $[3,\infty)$ to $[-2,\infty)$

**Definition. **A **function from $X$ to $Y$** is a relation $f$ from $X$ to $Y$ such that:

- $\forall x\in X, \exists y\in Y: (x,y)\in f$, and
- $\forall x\in X, \forall y_1,y_2\in Y: \left[(x,y_1)\in f \wedge (x,y_2)\in f\right]\Rightarrow y_1=y_2$

That is, a function assigns to *each* (clause 1) element of $X$ an element of $Y$, and that assignment is unambiguous (clause 2). Since we could rewrite clause 1 as $\operatorname{Domain}(f)=X$ (**why?**), we’ll refer to clause 1 as the “domain clause” and clause 2 as the “unambiguousness clause”.

**Notation. **Instead of the mouthful $f$ is a function from $X$ to $Y$, we write the shorthand $f:X\rightarrow Y$.

We also have this familiar notation:

**Notation. **If $f$ is a function from $X$ to $Y$, and $(x,y)\in f$, we write $y=f(x)$.

Observe that if $(x,y_1)\in f$ and $(x,y_2)\in f$, then this notation reads as follows: $y_1=f(x)$ and $y_2=f(x)$, so we’d *darn well better* have $y_1=y_2$, or else something is very wrong.

There is another way to represent a function, by specifying its domain and its rule:

**Notation. **To specify the rule that a function $f:X\rightarrow Y$ uses, we write \begin{align*}f:X&\rightarrow Y\\x&\mapsto f(x) \end{align*}

For example, we could denote the function which takes in a real number as input, and squares it by

\begin{align*} sq:\mathbb{R}&\rightarrow\mathbb{R}\\ t&\mapsto t^2 \end{align*}

Notice that the arrow between the sets is a different shape from the one between elements of the set. As usual we want to let go of our $x$s, so we could as well have written

\begin{align*} sq:\mathbb{R}&\rightarrow\mathbb{R}\\ z&\mapsto z^2 \end{align*}

or

\begin{align*} sq:\mathbb{R}&\rightarrow\mathbb{R}\\ Q&\mapsto Q^2 \end{align*}

This frees us from the tyranny of always writing $y=f(x)$.
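In Python terms (a sketch, modeling $sq$ on a finite stand-in for $\mathbb{R}$), renaming the bound variable changes nothing: all three comprehensions below build the exact same set of input-output pairs.

```python
# The comprehension variable is a bound variable, so its name is irrelevant:
# t, z, and Q all produce the same function (the same dict of pairs).
domain = range(-5, 6)
sq_t = {t: t ** 2 for t in domain}
sq_z = {z: z ** 2 for z in domain}
sq_Q = {Q: Q ** 2 for Q in domain}
print(sq_t == sq_z == sq_Q)  # True
```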


What is a function? The notion of a function is fundamental to mathematics, but like many fundamental aspects of mathematics, its definition was not nailed down until relatively recently. Mathematicians were using functions without thinking all that hard about what functions were! Let’s rectify that.

The definition that I prefer to use is the following:

**Definition. **A *function* from the set $X$ to the set $Y$ is an unambiguous rule which assigns to each element of $X$ an element of $Y$.

By unambiguous, I mean that each $x\in X$ has only one element of $Y$ assigned to it. This may seem obvious, but in fact there are some interesting mathematical operations that turn out to be a little ambiguous, for example:

**Definition. **We say that $t$ is a square root of $x$ if $t^2=x$.

It’s not hard to see that both $-7$ and $7$ are square roots of $49$, so the phrase “the square root of $49$” is actually ambiguous. There are various ways to deal with such ambiguity: for example, we could just content ourselves with the symbol $\pm 7$, representing the two-element set $\{-7,7\}$, but then *the square root of a number isn’t a number!* The most straightforward way is just to exclude ambiguity by assumption, as I’ve done with the definition of function I gave above.

To functionize the notion of square root, we usually do the following:

**Definition. **We say that $t$ is the principal square root of $x$, and write $t=\sqrt{x}$, if $t^2=x$ and $t\geq 0$.

That is, if there are the two possible choices for square root, we decide always to take the positive one. This notion of “the square root” is now unambiguous, i.e. a function.
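The contrast between the ambiguous relation and its functionized version can be sketched in Python, under the assumption that we search a finite grid of integer candidates; the helper names `square_roots` and `principal_sqrt` are hypothetical, not from the text.

```python
def square_roots(x, candidates):
    """All t with t² = x: the (possibly ambiguous) square-root relation."""
    return {t for t in candidates if t * t == x}

def principal_sqrt(x, candidates):
    """Functionized: the unique nonnegative square root, per the t ≥ 0 convention."""
    roots = {t for t in square_roots(x, candidates) if t >= 0}
    (t,) = roots  # unpacking fails unless there is exactly one such root
    return t

candidates = range(-10, 11)
print(square_roots(49, candidates))    # the two-element set {-7, 7}: ambiguous
print(principal_sqrt(49, candidates))  # 7: the principal (nonnegative) root
```

Discarding the negative candidate is exactly the move in the definition above: it collapses a two-element answer set down to a single unambiguous value.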

Let’s connect this idea back to our formalism with relations. It’s clear that $t^2=x$ expresses a relationship between $t$ and $x$, so we can hope that our work on relations will help us develop a good definition of function as a kind of relation. For a function from $X$ to $Y$, we will want to consider pairs $(x,y)$, so our relation (let’s be traditional and call it $f$) will be a subset of $X\times Y$.

What would it mean for the rule to be ambiguous? There would be some $x\in X$ with $(x,y_1)\in f$ and $(x,y_2)\in f$, where $y_1$ and $y_2$ are distinct. That is, $f$ being ambiguous means

\begin{equation*}
\exists x\in X,\exists y_1\in Y, \exists y_2\in Y: (x,y_1)\in f \wedge (x,y_2)\in f\wedge y_1\neq y_2
\end{equation*}

**Activity. **Write a denial of the above logical sentence. This is what it means for $f$ to be unambiguous.

**Activity. **If your denial only includes $\sim$ and $\vee$, alter it using logical facts so that it contains a $\Rightarrow$.

- Ambiguity (this page)
- **The basics of functions.** Functions as sets and as relations.
- **Domains and Rules.**
- **Operations on functions.** How to combine functions.
- **Functions and Sets.**
- **Sets of Functions.**
- **Functions whose Inverses are Functions.**