Sequencing Proteins With Nanopores Is Hard... But How Hard Exactly?

May 2024 · 3 minute read

Someone on the Discord was asking me about nanopore protein sequencing, and I made the following off-the-cuff comment:

“DNA sequencing on protein nanopores is hard, because you're cramming 4^5 states into ~20 picoamps.” … “with protein sequencing you're looking at a LOT more states (3200000 for 20 amino acids)”
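The numbers in that comment are just alphabet size raised to the window length. A minimal sketch of the arithmetic, assuming the same 5-residue window for proteins as for DNA (the `WINDOW` name is only for illustration):

```python
# An alphabet of n symbols read through a window of k residues
# gives n**k distinguishable states.
WINDOW = 5  # assumption: proteins use the same 5-residue window as DNA

dna_states = 4 ** WINDOW       # 4 nucleotides
protein_states = 20 ** WINDOW  # 20 amino acids

print(dna_states)                   # 1024
print(protein_states)               # 3200000
print(protein_states / dna_states)  # 3125.0
```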

While I think it makes the point, as someone else pointed out, this is actually not a very useful statement. The problem is that amino acids and nucleotides differ in size quite a bit!

In a previous post I said this, which is far more reasonable I think:

Conservatively, the single amino acid change contributes to differences in at least 3 steps of the trace. When building out a complete platform we can expect >8000 different current levels, over what we assume will be a 10pA range, ~0.001pA (1fA) between states. This is similar to what we saw with Dreampore.
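To put the two cases side by side, here is a rough sketch of the average gap between current levels. It assumes the levels are spread evenly across the range, which real traces won't do, and `spacing_fa` is a hypothetical helper name:

```python
def spacing_fa(range_pa, n_states):
    """Average gap between levels in femtoamps, assuming even spacing."""
    return range_pa * 1000 / n_states  # 1 pA = 1000 fA

# Protein case from the quote: >8000 levels over ~10 pA
print(spacing_fa(10, 8000))  # 1.25

# DNA case from the earlier comment: 4^5 = 1024 levels over ~20 pA
print(spacing_fa(20, 1024))  # 19.53125
```

On these assumptions the protein levels sit roughly 15x closer together than the DNA ones.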

To work this through a bit further I decided to go hunting for amino acid lengths and came across this table:

In MspA our constriction region covers about 2nm:

Generally it seems like ~5 nucleotides contribute to the signal in Nanopore DNA sequencing. This would be 0.34*5 = 1.7nm, so based on these observations, we’ll call it 2nm.

With this in hand, we can write some (awful) Python code to print all possible combinations that fit within 2nm. AAs don’t need to completely fit to contribute to the signal. They can also repeat, so 0.7 + 0.7 + 0.85 = 2.25nm is a valid combination, with one of the AAs overhanging by 0.25nm.
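The overhang rule can be made concrete with a small check. This is only a sketch: `overhang_angstrom` is a hypothetical helper, and lengths are given in angstroms so that the 2nm constriction becomes 20. A combination counts as valid when the AAs before the last one do not yet cover the constriction, but adding the last one does.

```python
def overhang_angstrom(lengths, constriction=20.0):
    """Return the overhang in angstroms if the combination is valid, else None."""
    if not lengths:
        return None
    total = sum(lengths)
    before_last = sum(lengths[:-1])
    # Valid only if the last AA is the one that crosses the constriction edge
    if before_last < constriction <= total:
        return total - constriction
    return None

print(overhang_angstrom([7.0, 7.0, 8.5]))  # 2.5, i.e. 0.25nm
```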

This analysis is quite limited. Obviously overhanging AAs and those not directly over the minimal constriction point will contribute less to the overall signal. Luckily this is a newsletter not a scientific paper, and we can play things fast and loose to get a general idea of how things might play out.

Here’s the code:

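    from copy import deepcopy

    # Amino acid lengths, one per amino acid.
    # Units inferred to be angstroms: the cutoff of 20 below is the 2nm constriction.
    AA = [5.5, 11.0, 7.5, 6.5, 6.4, 3.9, 8.0, 9.0, 8.5, 8.5,
          8.5, 11.3, 10.3, 9.7, 6.2, 6.1, 6.1, 10.9, 10.4, 7.0]

    def addone(l):
        # Total length of the combination built so far
        total = 0
        for i in l:
            total += i
        if total >= 20:
            # The constriction is covered: print this combination
            for i in l:
                print(i, " ", end='')
            print()
        else:
            # Still room: try extending with every amino acid in turn
            for j in AA:
                nl = deepcopy(l)
                nl.append(j)
                addone(nl)

    addone([])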

When run, we see that an MspA-like pore gives us a total of 27228 possible states. That is still a lot bigger than 1024, but a lot less than the number I originally quoted. A lot of these are amino acid combinations that only overhang by 0.1nm. So let’s have a look at how this varies with constriction size:

Even at 1.8nm, we’re still looking at ~10,000 states. We can see what happens at even smaller pore sizes by plotting this on a log scale:

To reach a “DNA equivalent” of 1024 states we’d need a roughly 1.2nm constriction (though I’m sure in practice other factors come into play).
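The sweep over constriction sizes can be sketched as a counting version of the script above, which tallies combinations instead of printing them. This is not the code behind the plots, just one way to reproduce the idea. It stores the lengths as integer tenths of an angstrom (an assumption made here to keep the sums exact), so counts may differ slightly from the float version where a combination lands exactly on the boundary. `count_states` and `AA_TENTHS` are hypothetical names.

```python
from functools import lru_cache

# The post's amino acid lengths multiplied by 10 (tenths of an angstrom),
# kept as integers so that sums are exact. 200 corresponds to 2nm.
AA_TENTHS = (55, 110, 75, 65, 64, 39, 80, 90, 85, 85,
             85, 113, 103, 97, 62, 61, 61, 109, 104, 70)

def count_states(limit, lengths=AA_TENTHS):
    """Count ordered combinations whose running total first reaches `limit`."""
    @lru_cache(maxsize=None)
    def remaining(gap):
        # gap <= 0 means the constriction is covered: one finished combination
        if gap <= 0:
            return 1
        # Otherwise extend with every amino acid and add up the results
        return sum(remaining(gap - a) for a in lengths)
    return remaining(limit)

if __name__ == "__main__":
    # Sweep the constriction from 1.0nm to 2.0nm in 0.1nm steps
    for tenths in range(100, 201, 10):
        print(f"{tenths / 100:.1f} nm: {count_states(tenths)} states")
```

Memoising on the remaining gap means each distinct gap is only expanded once, so the sweep runs quickly even though the original recursion visits every combination.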

Overall, I’m still largely of the opinion that we’re looking at a problem which is an order of magnitude harder than DNA sequencing. But perhaps not several orders of magnitude harder.
