Hello,
so I am working with a subset of my dataset (10 libraries out of 200). I have demultiplexed them using the process_shortreads from the "Stacks" pipeline. Afterwards, I uploaded them into Qiime, and I cut the primers using Cutadapt. Then I denoised the sequences with DADA2. After checking the dada2 stast I see that most of my reads did not pass the quality filtering steps and I ended up with very few rep-seqs and low total feature count. The commands I used were the following ones:
The visualization for the demux-seqs is the following one: demux-ITS-2.qzv (311.6 KB)
Then I trimmed the primer with cutadapt with the following command:
qiime cutadapt trim-paired
--i-demultiplexed-sequences demux-ITS-2.qza
--p-adapter-f ^CAWCGATGAAGAACGYAGC...GAAYCATCGARTCTTTGAACGC
--p-adapter-r ^RGTTTCTTTTCCTCCGCTTA...CCTTGTTACGACTTYTMCTTCC
--o-trimmed-sequences demultiplexed_trimmed_sequences-ITS-2.qza
Then I denoised the sequences with the following command:
qiime dada2 denoise-paired
--i-demultiplexed-seqs demultiplexed_trimmed_sequences-ITS-2.qza
--p-trunc-len-f 0
--p-trunc-len-r 0
--p-pooling-method "pseudo"
--p-chimera-method "consensus"
--p-n-threads 10
--o-table feature-table-ITS-2-dada_2.qza
--o-representative-sequences rep-seqs-ITS-2-dada_2.qza
--o-denoising-stats dada-2-stats-ITS-2.qza
I have seen other post related to this question, however they were dealing with the trunc length of the sequences. So I am wondering if there are other ways to solve the problem since I am not truncating the sequences. I believe that changing the default values for "--p-max-ee-f", "--p-max-ee-r", and "--p-trunc-q" would help me. However, I do not fully understand their meaning.
Your quality drops a lot by 140 in the forward read and by 160 in the reverse read.
Because of this, you may want to truncate your reads at this location
--p-trunc-len-f 140
--p-trunc-len-r 160
Try some of these settings, especially the --p-trunc-len-* settings, and let us know what you find!
Thank you for your feedback. I will try to apply the truncation of the reads and see the what the results are!! I will also read the information you provided me with regarding the expected error parameters. I will keep you posted!!
However, I have incurred into another problem when performing the taxonomic assignment. Firstly I downloaded all the ITS-2 sequences for Magnoliopsida from genebank with Rescript. The command I used was he following:
qiime rescript get-ncbi-data
--p-query 'txid3398[ORGN] AND (ITS2 OR Internal Transcribed Spacer 2) NOT environmental sample[Title] NOT environmental samples[Title] NOT environmental[Title] NOT uncultured[Title] NOT unclassified[Title] NOT unidentified[Title] NOT unverified[Title]'
--p-n-jobs 5
--o-sequences ncbi-refseqs-unfiltered_magnoliopsida.qza
--o-taxonomy ncbi-refseqs-taxonomy-unfiltered_magnoliopsida.qza
Then, I continued to taxonomic assignment with the following command:
Everything ran perfectly, however the resolution fo the taxonomic assignment was really poor. Most of the representative sequences were just assigned to the level of order.
Agreed! About 20% of the reads cannot be joined, but this is pretty good given the quality.
ITS classification is harder than 16S classification for a number of reasons. For a first run with a custom database (that you just made!), I think getting to order is pretty good!
Yes. It's all about classify-consensus-vsearch.
Aftering finding hits to the database with vsearch, the taxonomy of these hits are compaired. For each level where >50% of the hits agree a classification is given, until you reach a level where there is not longer a consensus.
For your data, the level in which hits no longer agree is mostly Order.
Your progress here is awesome. If I were you, I would keep these results and continue with the paper. If reviewer 3 thinks there is a better way to classify ITS-2, they will certainly tell you.
EDIT: If you do want to improve the database, try using RESCRIPt's 'extract-seq-segments', or adding a few "out-group" taxa to your database in case you amplified non-Magnoliopsida taxa.
@colinbrislawn
Thanks for your feedback! I will have a look to what you suggest and come back to you with the results. I hope I can get further than order since it is essential to our study to reach to the genus level at least!! Luckily for us we used two markers that can be used in combination to reach further taxonomic levels. We are studying plant-bird interactions so stopping at the order level it is a bit meaningless.
Thank a lot again!! I am very grateful to Qiime2 community and specially to you moderators