Analyzing random sequences

Hello @llenzi @colinbrislawn ,

Is there a way to randomly pick 1000 denoised sequences from a sample and get their taxonomic classification?

Thank you in advance,

Hi @Brigitta1,

What I do in this case is use seqtk (GitHub - lh3/seqtk: Toolkit for processing sequences in FASTA/Q formats). You can install it into your QIIME 2 environment with: conda install -c bioconda seqtk (run this with the environment active!)

You can run something like the following (change 10000 to the number of sequences you want, e.g. 1000):
seqtk sample read1.fq 10000 > sub1.fq
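
If it helps to see the idea behind drawing a fixed number of sequences, here is a minimal Python sketch using reservoir sampling. This is only an illustration of one way to pick exactly k records in a single pass, not a description of how seqtk works internally. The function name reservoir_sample and the made-up sequence IDs are hypothetical.

```python
import random


def reservoir_sample(records, k, seed=None):
    """Return k records chosen uniformly at random in a single pass.

    Hypothetical helper for illustration only. If fewer than k records
    are available, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


# Made-up input: 5000 sequence IDs standing in for real records.
records = [f"seq{i}" for i in range(5000)]
picked = reservoir_sample(records, 1000, seed=100)
print(len(picked), "records picked")
```

Using the same seed gives the same selection each time, which is handy if you need to repeat the subsampling later.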

For the taxonomic assignment of the subsampled reads, you may try:

Hope it helps


Hi @Brigitta1,

In addition to @llenzi's suggestion you can also make use of: qiime rescript subsample-fasta ...

If you'd like to subsample about 5% of the sequences, you'd use the following command:

qiime rescript subsample-fasta \
    --i-sequences seqs.qza \
    --p-subsample-size 0.05 \
    --p-random-seed 1234 \
    --o-sample-sequences sub-sampled-seqs.qza

Note that in this example you may not always get exactly 5% of the sequences in your output. That is because we worked to make this fast and memory efficient. The command iterates through each sequence and picks a random value between 0 and 1. If that value is less than the --p-subsample-size, the sequence is written to file. Thus, if you had 100 sequences and wanted to subsample ~5% of them, you might end up with 4, 5, or 6 sequences in your output.
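
To make the per-sequence random draw described above concrete, here is a minimal Python sketch of that idea. It is an illustration under my own assumptions, not the actual RESCRIPt code: the function name bernoulli_subsample and the made-up sequence IDs are hypothetical.

```python
import random


def bernoulli_subsample(records, fraction, seed=None):
    """Yield each record independently with probability `fraction`.

    Hypothetical helper for illustration only. Records are streamed,
    so nothing beyond the current record is held in memory.
    """
    rng = random.Random(seed)
    for record in records:
        # Draw a value in [0, 1); keep the record if it is below the fraction.
        if rng.random() < fraction:
            yield record


# Made-up input: 100 sequence IDs standing in for real records.
ids = [f"seq{i}" for i in range(100)]
kept = list(bernoulli_subsample(ids, 0.05, seed=1234))
print(len(kept), "of", len(ids), "kept")
```

Because every sequence is decided on its own, the output size varies around the requested fraction from seed to seed instead of matching it exactly, but the input never has to be counted or loaded in full first.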



I always forget how magical RESCRIPt is, lol


Thank you so much @llenzi @SoilRotifer
