Hello @llenzi @colinbrislawn ,
Can I know if there is a way to randomly pick 1000 denoised sequences of a sample and get their taxonomic classification?
Thank you in advance,
Brigitta
Hi @Brigitta1,
What I do in this case is use seqtk (GitHub - lh3/seqtk: Toolkit for processing sequences in FASTA/Q formats). You can install it into your QIIME 2 environment with: conda install -c bioconda seqtk (run this with the environment active!)
You can run something like the following:
seqtk sample read1.fq 10000 > sub1.fq
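If you need exactly 1000 sequences rather than a fraction, one standard single-pass way to draw a fixed number of records is reservoir sampling (Algorithm R). The sketch below is only an illustration of that technique in POSIX shell and awk; `reservoir_fasta` is a hypothetical helper, not part of seqtk or QIIME 2, and it assumes a plain FASTA file whose records start with `>`.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not seqtk's source):
# draw exactly N records from a FASTA file in one pass with reservoir
# sampling (Algorithm R). Multi-line records are kept together.
# Usage: reservoir_fasta N SEED < input.fasta > subsample.fasta
reservoir_fasta() {
  awk -v n="$1" -v seed="$2" '
    BEGIN { srand(seed); seen = 0 }
    # Called once per complete record: keep the first n records, then
    # overwrite a random slot with probability n/seen.
    function offer(rec,   j) {
      seen++
      if (seen <= n) { res[seen] = rec; return }
      j = int(rand() * seen) + 1
      if (j <= n) res[j] = rec
    }
    # A header line closes the previous record and starts a new one.
    /^>/ { if (cur != "") offer(cur); cur = $0; next }
    # Sequence lines are appended to the current record.
    { cur = cur "\n" $0 }
    END {
      if (cur != "") offer(cur)
      # If the file has fewer than n records, all of them are printed.
      kept = (seen < n) ? seen : n
      for (i = 1; i <= kept; i++) print res[i]
    }
  '
}
```

For example, `reservoir_fasta 1000 42 < input.fasta > sub-1000.fasta` (the filenames are placeholders). The reservoir holds N records in memory, the output is not in input order, and a given seed only reproduces the same draw with the same awk implementation.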
For the taxonomic assignment of the subsampled reads, you may try:
https://library.qiime2.org/plugins/q2-metaphlan2/12/
Hope it helps
Luca
Hi @Brigitta1,
In addition to @llenzi's suggestion, you can also use: qiime rescript subsample-fasta ...
If you'd like to subsample about 5% of the sequences, you'd use the following command:
qiime rescript subsample-fasta \
--i-sequences seqs.qza \
--p-subsample-size 0.05 \
--p-random-seed 1234 \
--o-sample-sequences sub-sampled-seqs.qza
Note that in this example you may not always get exactly 5% of the sequences in your output. This is because we designed the command to be fast and memory efficient: it iterates through each sequence and draws a random value between 0 and 1. If that value is less than the --p-subsample-size, the sequence is written to file. Thus, if you had 100 sequences and wanted to subsample ~5% of them, you might end up with 4, 5, or 6 sequences in your output.
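The per-sequence coin flip described above can be sketched in a few lines of awk. This is an illustration of the idea only, not RESCRIPt's implementation, and `bernoulli_fasta` is a hypothetical name. Because each sequence is kept independently with probability p, the number kept out of n follows a binomial distribution with mean np and standard deviation sqrt(np(1 - p)); for n = 100 and p = 0.05 that is a mean of 5 and a standard deviation of about 2.2, so counts outside 4 to 6 also occur regularly.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not RESCRIPt's code):
# keep each FASTA record independently when a uniform draw in [0, 1)
# falls below the fraction p.
# Usage: bernoulli_fasta FRACTION SEED < input.fasta > subsample.fasta
bernoulli_fasta() {
  awk -v p="$1" -v seed="$2" '
    BEGIN { srand(seed) }
    # Decide once per record, at its header line, and reuse that
    # decision for the sequence lines that follow it.
    /^>/ { keep = (rand() < p) }
    keep { print }
  '
}
```

It streams one line at a time, so memory use stays constant regardless of input size, which is the trade-off that makes the output count approximate.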
-Cheers!
-Mike
I always forget how magic RESCRIPt is, lol
Thank you so much, @llenzi and @SoilRotifer!
This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.