Reverse Engineering Moderna’s SARS-CoV-2 Vaccine

A closer look at the mRNA patent reveals an incredible technology safe for human use

Introduction

In this article, we are going to have a closer look at the mRNA technology used for Moderna’s SARS-CoV-2 vaccine. We are going to decode the sequence disclosed by Moderna’s patent and go through it bit by bit. We are going to learn the basics of our genetic code and why an mRNA vaccine is effective and safe for human use.

Wait, I said we are going to decode a sequence. Why so? After all, the vaccine is a liquid that gets injected into your arm. Well, that’s a good question to start with. But to answer it we need to get through a little bit of the RNA basics.

DNA and RNA: The Basics

DNA is pretty much like a digital code. In computers, we store information as 0s and 1s (bits), that is, as the presence or absence of a charge. That is the basic way information is stored and transferred in digital systems.

Nature stores its basic information in 4 different molecules, the ‘nucleotides’: A, C, G and T in DNA, with U taking the place of T in RNA. These 4 nucleotides make up the chains of our DNA or RNA.

In computers, we group 8 bits (0 or 1) as one byte. The byte is the common unit in which information is processed. Historically, the byte was the number of bits used to encode a single character of text or a symbol in a computer.

Nature groups 3 nucleotides into a ‘codon’, which is its typical processing unit. Each codon encodes one of 20 different proteinogenic amino acids. These amino acids make up every single one of our proteins and enzymes.
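To make the byte analogy a bit more tangible, here is a minimal Python sketch that cuts an RNA string into codons and works out how many bits a nucleotide and a codon can carry. The example sequence and the function name split_into_codons are made up for illustration.

```python
import math

NUCLEOTIDES = "ACGU"  # the four RNA letters


def split_into_codons(rna: str) -> list[str]:
    """Cut an RNA string into consecutive triplets, dropping an incomplete tail."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


# Four possible letters need log2(4) = 2 bits; a codon of three letters carries 6 bits.
bits_per_nucleotide = math.log2(len(NUCLEOTIDES))
bits_per_codon = 3 * bits_per_nucleotide

example = "AUGUUUGUGUUCC"  # made-up 13-letter sequence; the last letter has no full codon
codons = split_into_codons(example)
print(codons)
print(bits_per_nucleotide, bits_per_codon)
```

So where a byte packs 8 bits, a codon packs 6, which is enough for 64 different values.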

Pretty digital so far.

Nature stores these codes in the DNA in the nucleus of every cell. When the information is needed, it gets ‘transcribed’ into a messenger RNA (mRNA). This is a short-lived copy of a piece of our DNA, a blueprint our protein machinery can read to synthesize the things that we are made of: proteins.
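On the level of letters, transcription is a very small step. The sketch below assumes we are given the coding strand of the DNA, so the mRNA has the same letters with every T written as U. In the cell, the enzyme actually reads the complementary template strand, and the function name transcribe is just a label chosen for this example.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand: same letters, with T written as U."""
    dna = coding_strand.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("DNA may only contain A, C, G and T")
    return dna.replace("T", "U")


print(transcribe("ATGTTTGTT"))  # AUGUUUGUU
```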

How Do Vaccines Work?

To understand Moderna’s mRNA code we must first understand how vaccines work: Vaccines teach our immune system what a pathogen looks like so it can develop antibodies against it.

Pathogens are bacteria, viruses, fungi or parasites that can cause disease within our body. These pathogens are made of several subparts, called antigens, that are unique to each specific pathogen. Our immune system can learn to recognize these little subparts of the pathogens and produce antibodies in an immune response. These little antibodies will attach to the pathogen and help our immune cells to fight it.

The idea behind vaccines is to teach our immune system what the antigen looks like, without encountering the pathogen or getting ill. Historically, this has been done using inactivated or weakened pathogens, or parts of the pathogen (antigens) that trigger an immune response. Modern vaccines contain the blueprint (e.g. mRNA) for the antigen, rather than the antigen itself. Our cells are then ‘transfected’ with the blueprint of the antigen using a lipid nanoparticle (LNP) delivery system. This way our cells can temporarily produce the antigen themselves and prompt our immune system to respond and produce antibodies against it.

BioNTech’s BNT162b2 and Moderna’s mRNA-1273 vaccine are two of these newer ‘blueprint’ vaccines. Both of the vaccines contain a volatile genetic blueprint — an mRNA — to encode the well-known SARS-CoV-2 ‘spike’ protein. And this SARS-CoV-2 spike protein is the antigen to stimulate the immune response.

However, our cells are very unenthusiastic about foreign genetic blueprints thrown at them, so a lot of modification and technology was needed to make the blueprint work. So let’s have a closer look at Moderna’s mRNA patent.

The Code: Moderna’s mRNA-1273 SARS-CoV-2 Vaccine

Unlike BioNTech, Moderna has not published the exact code of its mRNA vaccine. But Moderna holds a patent for its vaccine, which is published and contains the most information about the technology. Hence, it is possible to reverse engineer the structure of the vaccine with good accuracy.

A closer look at the patent US 10,702,600 shows us the following:

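They use a highly modified mRNA encoding the full-length SARS-CoV-2 spike protein together with some other functional elements, which we will have a closer look at later. If we draw a scheme with all the elements covered by the patent, we get the following order from start to end: cap, 5’-UTR, signal peptide, spike protein coding sequence, 3’-UTR and poly(A) tail.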

This is our code on a high level. But what does it all mean?

The Cap: Marks the RNA as What it is

We’ll start with the cap which is depicted as a little hat. The cap has the following composition:

In some embodiments, a 5’ terminal cap is 7mG(5’)ppp(5’)N1mpNp.

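7mG(5′)ppp(5′)N1mpNp translates into the following nucleotide code: GNN, where G is altered (7-methylated), the first N is altered as well (methylated at the 2′-O position of its ribose, which makes this a so-called Cap 1 structure) and the second N is just any possible nucleotide.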

This standard cap structure is found in virtually all eukaryotes (e.g. animals, plants, fungi) and many viruses and is an evolutionarily conserved modification of eukaryotic mRNA. The mRNA cap has multiple functions: It is essential for the initiation of translation, that is, the start of protein synthesis from the blueprint. Further, it makes the mRNA look legitimate to our cells and protects it from degradation. In a cell, it marks the mRNA as coming from the cell’s nucleus, even though that is not the case for our vaccine.

The cap can be compared to the flag on a ship: it identifies where the ship comes from and what it is. Just as pirates fly false flags, viruses use our natural flag to deceive and disguise.

The “five-prime untranslated region” (5’-UTR)

The 5’-UTR region is just another standard sequence found in every mRNA. The 5’ marks the reading direction of the sequence. Just as we read from left to right, mRNA is read from 5’ to 3’.

In some embodiments, an in vitro transcription template encodes a 5’ untranslated (UTR) region, contains an open reading frame, and encodes a 3’ UTR and a polyA tail.

Untranslated region means that this part of the code is not translated into any part of the protein. Its main function is to provide a sequence that our protein factory (the ribosome) can grab onto and hold firm: the (ribosomal) binding site. Further, it contains the start sequence for the initiation of protein synthesis: the Kozak sequence.
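To get a feeling for what ‘finding the start’ means, here is a small Python sketch that searches an mRNA string for the textbook Kozak consensus GCCRCCAUGG, where R stands for A or G and AUG is the start codon. The toy sequence and the name find_start are assumptions for this example; real Kozak contexts often deviate from the consensus, and the patent does not disclose which 5’-UTR Moderna actually uses.

```python
import re
from typing import Optional

# Textbook Kozak consensus around the start codon AUG; [AG] is the purine position R.
KOZAK = re.compile(r"GCC[AG]CC(AUG)G")


def find_start(mrna: str) -> Optional[int]:
    """Return the 0-based index of the AUG in the first consensus match, or None."""
    match = KOZAK.search(mrna)
    return match.start(1) if match else None


toy_mrna = "GGGAAAUAAGCCACCAUGGUUUGUG"  # made-up 5'-UTR followed by a made-up start
start = find_start(toy_mrna)
print(start, toy_mrna[start:start + 3])
```

Everything before the returned index would be the untranslated region; translation starts at the AUG itself.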

The Signal Peptide

We have to go through one last layer of informational sequence before we continue with the antigen itself.

Once the ribosome has produced the protein, the protein still needs to go somewhere. And this is the function of the signal peptide. The signal peptide is attached to the protein and contains information about further processing (e.g. cutting, folding, cell exiting) in the cell’s post-production department: the endoplasmic reticulum.

In the case of the virus, the ‘S glycoprotein signal peptide’ sends the protein immediately into the endoplasmic reticulum for further processing and afterwards to the cell membrane for the virus assembly.

BioNTech’s vaccine uses the natural virus’ signal peptide for the protein. In contrast, according to the patent, Moderna’s vaccine uses a different signal peptide with the same function. It uses one of the following:

HuIgGk signal peptide, IgE heavy chain epsilon-1 signal peptide, Japanese encephalitis PRM signal sequence, VSVg protein signal sequence or Japanese encephalitis JEV signal sequence.

These signal peptides are derived either from our own cells (IgE and IgG) or from different viruses. The signal peptides used are shorter and provide highly efficient processing and secretion of the produced viral protein. During the process, the signal peptide is cleaved off the protein by enzymes.

The Actual (Modified) Spike Protein

Up to this point we just encountered instructive sequences for the identification, production, processing and secretion of the protein. Now let’s go to the actual antigen: the SARS-CoV-2 Spike protein. The spike protein sits in the lipid membrane (basically a fat droplet) of the virus and binds to the host cell receptor to induce fusion of the membranes.

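So here Moderna did publish the exact sequence of the Spike protein. (It can be verified in the UniProt database using the identifier P59594. Note that this entry belongs to the spike protein of the original SARS-CoV; the SARS-CoV-2 spike protein is listed under P0DTC2.)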

But the illustrated sequence you see is not the mRNA Sequence (remember: A, C, G and U), but the resulting sequence of amino acids.

So, if the mRNA is our blueprint, this sequence is the finished product and no longer the underlying code. Presumably, Moderna chose to publish the amino acid sequence instead of the mRNA sequence because they do not want to disclose all the details of their code.

But just using a sequence that is already published wouldn’t make for a great innovation. So Moderna made at least three state-of-the-art changes to their sequence that can be found in the patent:

1. Modification: Disguise the mRNA

The human body has established a pretty powerful anti-virus system. Our cells are extremely unhappy about foreign RNA and try their best to get rid of it before it does anything.

This is the main problem with using an mRNA as a vaccine: it needs to sneak past our immune system. Overcoming this challenge is one of the selling points of many RNA technology companies.

So how does Moderna sneak its mRNA past the antivirus system? The answer is found in the patent and is an exceptionally clever bit:

In some embodiments, 100% of the uracil in the open reading frame have a chemical modification. In some embodiments, a chemical modification is in the 5-position of the uracil. In some embodiments, a chemical modification is a N1-methyl pseudouridine.

Every uracil (‘U’) was replaced by N1-methylpseudouridine (sometimes written 1-methyl-3’-pseudouridine). This slight chemical alteration decreases its susceptibility to degradation by our enzymes: degrading enzymes (nucleases) have a much harder time attacking it, as the alteration has made it too slippery for them. However, our protein machinery (the ribosome) is not that picky when it comes to protein synthesis and is still able to use it:

The use of 1-methyl-3’-pseudouridine is currently the most promising nucleotide substitution that substantially outperforms all other modified nucleotides studied.
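On the level of letters, this ‘100% of the uracil’ substitution is a simple search and replace. The Python sketch below uses the Greek letter Ψ as a stand-in symbol for the modified base; that symbol, the toy sequence and both function names are choices made for this example, not notation taken from the patent.

```python
PSI = "Ψ"  # stand-in symbol for N1-methylpseudouridine in this sketch


def modify_uracils(orf: str) -> str:
    """Replace every U in an open reading frame with the stand-in symbol."""
    return orf.replace("U", PSI)


def read_as_standard(modified: str) -> str:
    """Model the ribosome's view: the modified base is read like a normal U."""
    return modified.replace(PSI, "U")


orf = "AUGUUUGUGUUC"  # made-up open reading frame
modified = modify_uracils(orf)
print(modified)                    # AΨGΨΨΨGΨGΨΨC
print(read_as_standard(modified))  # AUGUUUGUGUUC
```

The second function is the important point: the letters look different to the cell’s defenses, but the encoded message stays the same.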

2. Modification: Maintain the Protein’s Natural Structure

The next modification is also a particularly smart one. In the patent, we find claims that the amino acid sequence used can differ by as much as 20% from the natural sequence.

In some embodiments, the amino acid sequence of the SARS-CoV antigenic polypeptide is, or is a fragment, or is a homolog or variant having at least 80% (e.g. 85%, 90%, 95%, 98%, 99%) identity to, the amino acid sequence identified by any one of SEQ ID NO 29, 32 or 34.

Unfortunately, how much it actually varies is not disclosed by the patent. One reason is that they do not want to disclose the details of their technology; another is that it extends the scope of the patent.
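What does ‘at least 80% identity’ mean in practice? A minimal Python sketch, assuming two sequences that are already aligned and equally long: we simply count the positions where both carry the same amino acid. The two short sequences are made up, and real comparison tools first align the sequences and handle gaps, which this sketch skips.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Share of identical positions between two equally long, already aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


reference = "MFVFLVLLPL"  # made-up 10-residue reference
variant = "MFVFLVLAPA"    # same sequence with two substitutions
print(percent_identity(reference, variant))  # 80.0
```

For a protein of well over a thousand amino acids, an 80% threshold leaves room for a couple of hundred changes.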

But it’s likely they used a state-of-the-art alteration of the protein sequence, the same as or similar to the one used by BioNTech. If we have a look at BioNTech’s sequence, we find that two amino acids, a lysine and a valine, have been replaced by two prolines.

It turns out that these two changes are essential to maintain vaccine efficiency. Why so? If we look at an electron microscope image we see that the Spike protein is usually incorporated in the shell of the virus together with some other proteins.

In the case of the vaccine, all these crucial parts of the natural virus are missing. This leads to a different conformation or shape of the spike protein in our cells. This altered shape leads to a huge loss of vaccine efficiency. Our body would develop an immune response against a wrongly shaped protein and would therefore not be able to identify the viral protein when it encounters it.

To prevent this, Moderna most likely altered two amino acids to maintain the natural structure of the protein.
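Swapping a lysine (K) and a valine (V) for two prolines (P) can be sketched as two point substitutions in the amino acid string. The fragment and the positions below are invented for illustration, since the article’s source does not state where in the sequence the change sits; the helper substitute is hypothetical as well.

```python
def substitute(protein: str, changes: dict[int, tuple[str, str]]) -> str:
    """Apply point substitutions given as {1-based position: (expected, replacement)}."""
    residues = list(protein)
    for position, (expected, replacement) in changes.items():
        found = residues[position - 1]
        if found != expected:
            raise ValueError(f"expected {expected} at position {position}, found {found}")
        residues[position - 1] = replacement
    return "".join(residues)


toy = "ACDKVEFG"  # made-up fragment with a K-V pair at positions 4 and 5
stabilized = substitute(toy, {4: ("K", "P"), 5: ("V", "P")})
print(stabilized)  # ACDPPEFG
```

Checking the expected letter before replacing it is a cheap guard against editing the wrong position.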

3. Optimizing the Protein Synthesis

The last change from the original nucleotide sequence Moderna made is called ‘Codon Optimization’. They optimized the sequence to achieve a much higher production rate of the spike protein. Remember the goal of the vaccine is to get the cell to produce high amounts of the SARS-CoV-2 spike protein, to trigger a high immune response with minimal mRNA required.

So let’s quickly get back to the basics: nature groups 3 nucleotides into 1 codon, which translates into one of 20 different proteinogenic amino acids. But wait, with 4 possible nucleotides (A, C, G and U/T) and a length of 3, we have 4³ = 64 possible combinations, but only 20 amino acids (plus a stop signal) to encode. That means that multiple codons can encode the same amino acid. This is why the genetic code is often described as degenerate, or redundant: a single amino acid may be coded for by more than one codon.

For this reason, it is possible to change the codon (triplet) but still encode for the same amino acid.
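The redundancy is easy to verify with a few lines of Python. The sketch below builds the standard genetic code from its usual compact notation (one letter per codon, with the bases ordered U, C, A, G) and groups the codons by the amino acid they encode. The variable names are chosen for this example.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group the codons by what they encode.
synonyms = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    synonyms[amino_acid].append(codon)

print(len(CODON_TABLE))              # 64 codons in total
print(len(synonyms) - 1)             # 20 amino acids (the extra key is the stop signal)
print(synonyms["L"])                 # leucine has six codons
print(synonyms["M"], synonyms["W"])  # methionine and tryptophan have one each
```

Leucine, for example, can be written in six different ways, while methionine (which also serves as the start signal) has only one spelling.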

But why would someone change the code, if the result, the translated protein, would still be the same?

It turns out that changes in the RNA characters can make our machinery (enzymes) translate the code faster and more accurately. One of the reasons for this is that our machinery uses a pool of “transfer blocks” (tRNA pool), that transfer the right amino acid based on the matching codon. However, some of these transfer blocks are more abundant and more frequently used, thus accelerating the translation.

But there are many other reasons, e.g. RNA with higher amounts of ‘G’ and ‘C’ is converted more efficiently into proteins.
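As a toy illustration of the idea, the following Python sketch rewrites a sequence so that every amino acid is spelled with the synonymous codon that contains the most G and C letters, and then checks that the translated protein is unchanged. This is explicitly not Moderna’s algorithm: real codon optimization also weighs codon usage, RNA folding and other factors, and the function maximize_gc and the example sequence are invented here.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_count(codon: str) -> int:
    """Number of G and C letters in a codon."""
    return sum(base in "GC" for base in codon)


def codons_of(rna: str) -> list[str]:
    """Split an RNA string into complete triplets."""
    return [rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3)]


def translate(rna: str) -> str:
    """Translate an RNA string into one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in codons_of(rna))


# For each amino acid, keep the first synonymous codon with the highest GC count.
BEST_CODON: dict[str, str] = {}
for codon, amino_acid in CODON_TABLE.items():
    current = BEST_CODON.get(amino_acid)
    if current is None or gc_count(codon) > gc_count(current):
        BEST_CODON[amino_acid] = codon


def maximize_gc(rna: str) -> str:
    """Re-spell every codon with its GC-richest synonym (toy optimization)."""
    return "".join(BEST_CODON[CODON_TABLE[codon]] for codon in codons_of(rna))


original = "AUGUUUGUAUUAUAA"  # made-up sequence: Met-Phe-Val-Leu-stop
optimized = maximize_gc(original)
print(original, translate(original))
print(optimized, translate(optimized))
```

Both lines print the same protein, even though most of the RNA letters in between have changed.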

But why doesn’t nature always use the most efficient and accurate code?

There are many reasons why nature uses inefficient sequence patterns. One way our cells regulate how much of a protein is synthesized is by simply altering the speed of production. Another reason is that protein synthesis fits into a complex network of other processes, some of them being slower or needing more time (e.g. protein folding). As many of the processes are sequential, it can be beneficial to slow things down to not waste resources.

However, this is not the case for the Moderna vaccine. The vaccine needs to be highly efficient to produce enough antigen to trigger an immune response. The patent names algorithms and services from GeneArt (Life Technologies) and DNA2.0 (Menlo Park, Calif.) as tools for this codon optimization.

The “three-prime untranslated region” (3’-UTR)

Much like our ribosome machinery needed a starting point to lead into the sequence (5’-UTR), we find a similar sequence at the end of the RNA: the 3’-untranslated region. Just like the 5’-UTR, this region is not translated into any part of the protein, but it does contain many elements that play a crucial role in gene expression by “influencing the localization, stability, export, and translation efficiency of an mRNA.” Much could be said about the variety of mechanisms by which the 3’-UTR works, but probably the most prominent one is a sequence that can be targeted by a small regulatory RNA, such as a small interfering RNA (siRNA) or a microRNA (miRNA). These little RNA molecules contain a complementary sequence and can thus bind to the mRNA and block it. We call this ‘silencing’.

However, certain 3’-UTR sequences are very successful at enhancing protein expression and increasing RNA stability. These sequences are a nice add-on to any RNA therapeutic, and as their use is described in the patent, it is likely Moderna chose some of them.

The end of it: AAAAAAAA

The end of almost every human mRNA is polyadenylated. That means that the sequence ends in many As (AAAAAAA...), or in other words, it has a poly(A) tail. The poly(A) tail is important for nuclear export, translation and stability of the mRNA. It is another tag on the mRNA that marks it as what it is and helps it get out of the nucleus of the cell. But probably its most important function is to give the RNA an expiry date: the shorter the tail, the more likely the mRNA is to expire. The poly(A) tail is shortened over time, and, when it is short enough, the mRNA has expired and is enzymatically degraded.
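The expiry-date idea can be sketched in Python by measuring the run of As at the end of a sequence and trimming it step by step. The toy sequence, the tail length of 20 and the function names are assumptions for this example; real tails are considerably longer, and the actual length used in the vaccine is not something this sketch tries to guess.

```python
def poly_a_length(mrna: str) -> int:
    """Count the uninterrupted run of A letters at the 3' end."""
    return len(mrna) - len(mrna.rstrip("A"))


def shorten_tail(mrna: str, lost: int) -> str:
    """Remove up to `lost` A letters from the 3' end, never touching the rest."""
    removable = min(lost, poly_a_length(mrna))
    return mrna[:len(mrna) - removable]


toy = "AUGUUUUAAGCU" + "A" * 20  # made-up message followed by a 20-letter tail
print(poly_a_length(toy))                    # 20
print(poly_a_length(shorten_tail(toy, 5)))   # 15
print(shorten_tail(toy, 100))                # AUGUUUUAAGCU
```

Once the tail is gone, only the message itself is left, which is the point at which the cell’s enzymes would break it down.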

Summary

With this, we know how Moderna’s mRNA vaccine was engineered for efficient production of the SARS-CoV-2 spike protein. And for the most part, we understand why the individual elements are used. If we add the function of each element of the mRNA to our first scheme of the mRNA, it looks like this:

The mRNA vaccine displays the incredible knowledge researchers have gathered over the past years and a product that is considered safe and effective for human use.

Concerns

But wait, one commonly raised concern hasn’t been addressed so far: Can an mRNA vaccine interfere with or alter our genetic information, our DNA? This is a far-fetched argument and there are many reasons why it cannot:

Even though RNA and DNA are chemically similar, RNA is still too different to be integrated into our genome: single-stranded vs double-stranded, uracil vs thymine, to point out just a few differences. In over 60 years of RNA research, it has not been observed that mRNA integrates into our DNA.

Further, RNA cannot easily be converted into DNA, and there are just a few exceptions to this central dogma of molecular biology. One is the human immunodeficiency virus (HIV), which can reverse transcribe its RNA into DNA. But it does not do this randomly to every mRNA. Specific elements and aiding molecules (primers) need to be present for this to work.

But even if RNA could do this, it would not be relevant to us at all. Our genetic information sits safely in our nucleus. Everything that the virus does and everything that the vaccine does happens outside the nucleus. As we remember, the mRNA is tagged multiple times for transport out of the nucleus and does not contain any elements that could grant it entrance into the tightly sealed nucleus.

But even if RNA could do all that and also overcome the last barrier of our cells and get into our precious nucleus, it would still not be relevant to us. In the end, it contains only natural RNA elements with a few viral RNA elements, the same ones our cells encounter with every common cold or contain naturally. If you fear getting RNA into your cells, well, it’s too late for that: at this very moment, hundreds of thousands of common viral RNAs enter and exit our cells without harming our DNA at all. The mRNA vaccine is no exception to that.

Those who claim that vaccination could change our genes must also claim that infection with SARS-CoV-2 itself or any other virus could change our genes. We are not turning into genetic mutants just because we catch a common cold.

If you want to hear more about this amazing technology, just let me know. I am going to write a second part about Moderna’s mRNA vaccine that covers all the additional adjuvants and excipients to get the RNA into our cells.

Lastly, I want to acknowledge the author Bert Hubert, who recently wrote a similar article about the BioNTech mRNA vaccine, which provided me with the basis for this article. I highly recommend reading it.

