Emails contain only text because, underneath, every part of an email looks the same: “10110100” and a whole lot more like it.
There must exist a method of telling what byte means what: Does that byte stand for a character, describe a pixel, describe formatting such as bold, or describe some custom animation in a PowerPoint presentation? It all looks the same…
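To make the ambiguity concrete, here is a minimal Python sketch that reads the byte from above in a few different ways. The choice of Latin-1 for the text reading and "brightness" for the number reading are assumptions made for illustration; nothing in the byte itself picks one over the other.

```python
# The same byte from the text, 10110100 in binary.
raw = bytes([0b10110100])

# Reading 1: a plain number, e.g. the brightness of a grayscale pixel (assumed meaning).
as_number = raw[0]

# Reading 2: a character, assuming the Latin-1 text encoding.
as_text = raw.decode("latin-1")

# Reading 3: the raw bits, which is all the computer really has.
as_bits = format(raw[0], "08b")

print(as_number, repr(as_text), as_bits)
```

All three readings come from the identical byte; only outside knowledge about what the byte is supposed to mean tells them apart.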
To do this, you have to tag everything. A given tag says that the next few bytes are a letter, or a Chinese character, or something like a text color or another kind of formatting.
But now here is another problem: how do you know whether something is a tag or not? Obviously you can’t tag the tags, or you haven’t solved the problem. Instead, you must make all the tags the same length. This helps because, by knowing exactly where the tags begin and end, you can always figure out where they are. Problem solved.
Or is it? By limiting the size of the tags, you create a second problem: there are only so many different tags of a given length (a tag that is n bits long can take at most 2^n different values). This means that you can only tag so many different things. What happens when a new image format comes along? You won’t be able to create a tag that says “this next part of the email is in ___ image format”. What do you do when you want to send an image?
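The fixed-length tagging idea, and its limit, can be sketched in a few lines of Python. Everything here is a made-up toy format for illustration: the one-byte tag, the three-byte payload, and the tag values `0x01` and `0x02` are assumptions, not part of any real email standard.

```python
# Hypothetical toy format: every record is 1 tag byte followed by 3 payload bytes.
TAG_SIZE = 1
PAYLOAD_SIZE = 3
RECORD_SIZE = TAG_SIZE + PAYLOAD_SIZE

# Made-up tag values; a real format would define its own.
TAG_NAMES = {0x01: "text", 0x02: "color"}

def parse(stream: bytes):
    """Split the stream into (tag name, payload) pairs using only fixed positions."""
    records = []
    for i in range(0, len(stream), RECORD_SIZE):
        tag = stream[i]                                   # the tag is always the first byte
        payload = stream[i + TAG_SIZE : i + RECORD_SIZE]  # the rest of the record is data
        records.append((TAG_NAMES.get(tag, "unknown"), payload))
    return records

# One "text" record holding "Hi!" and one "color" record holding red as RGB.
stream = bytes([0x01]) + b"Hi!" + bytes([0x02, 0xFF, 0x00, 0x00])
records = parse(stream)

# A tag of this size can only ever name this many different kinds of data.
max_distinct_tags = 2 ** (8 * TAG_SIZE)

print(records)
print(max_distinct_tags)
```

Because every record has the same size, the parser never has to guess where a tag is. The cost is the ceiling on `max_distinct_tags`: once every value is taken, a new kind of data has nowhere to go.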
Here’s the way it works:
On the sender’s end, when you want to send an image, a PowerPoint, or a Word document, you use a standard (agreed-upon) method to convert the data into text, which can be sent. The person on the other end can then decode it and, assuming they know what to do with the result, regenerate the file.
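As a rough sketch of that round trip, the Python below uses Base64, one agreed-upon binary-to-text method that is commonly used for email attachments. The text above does not name a specific method, so treat Base64 as an assumption here, and the bytes in `original` are an arbitrary stand-in for a real file.

```python
import base64

# Arbitrary binary data standing in for an image or document (not a real file).
original = bytes([0x89, 0x50, 0x4E, 0x47, 0xB4, 0x00, 0xFF])

# Sender's side: turn the raw bytes into plain ASCII text that can travel in an email.
encoded = base64.b64encode(original).decode("ascii")

# Receiver's side: turn the text back into the exact same bytes.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == original)
```

The encoded text is longer than the original data (Base64 spends four characters on every three bytes), which is the price of squeezing arbitrary bytes through a text-only channel.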