Aligning large numbers of reads

Hello all. I have been working with the results of a MinION experiment with ~30 samples. The experiment produced ~2 million reads, each around 350 bp, spread fairly evenly across the 30 samples (humble brag). I work at a small institution with limited computing resources, so I have had to limit myself to 1000 reads per sample so as not to overwhelm my desktop (!) computer’s memory and/or the MAFFT server. I have been doing this by randomly sampling reads after filtering, demultiplexing, etc. Does anyone have suggestions for computational resources or strategies that might allow me to actually use all (or at least more of) the 2 million reads? I am just not enjoying the idea of leaving data on the table.
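For anyone following along, the random subsampling step described above can be sketched roughly as below. This is only an illustration, not the exact procedure used in the post: it assumes plain 4-line-per-record FASTQ input, and the function names `read_fastq` and `reservoir_sample` are made up for this example. It uses reservoir sampling, so only the `k` kept reads are held in memory no matter how many reads are in the file.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a FASTQ stream.

    Assumes the simple 4-lines-per-record layout (no wrapped sequences).
    """
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(line.rstrip("\n") for line in record)


def reservoir_sample(records, k, seed=None):
    """Return up to k records chosen uniformly at random from an iterable.

    Only k records are kept in memory at any time (Algorithm R).
    """
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(rec)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = rec
    return reservoir
```

Opening one demultiplexed sample's file and calling `reservoir_sample(read_fastq(fh), k=1000, seed=1)` would give a reproducible 1000-read subset for that sample. Raising `k` costs memory only in proportion to `k`, not to the size of the file.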

Hi @Maxy,

If you can get permission to host the data outside your institution, AWS might be a good solution for you. It’s essentially a supercomputer you can rent for a short period of time. You do need to deal with uploading and downloading the data, but it may still be the right fit. If you’re working with human samples, you may need to discuss whether it’s appropriate for your data, but if you can get permission, AWS may be one of your easiest solutions.

Depending on where you are in the world, there may also be national or regional compute infrastructure available, so it might be good to look into that as well. I know several US states and at least a few European countries have networks where it’s possible to get access to computational infrastructure even if you’re not at a big school.

Best,
Justine
