I have some doubts about the taxonomic results I am getting. At the first taxonomic level, the analysis seems to have worked correctly (the unassigned percentage is not too large, approximately 10%).
However, at deeper levels, even though the unassigned percentage stays the same, there is an assignment called K_bacteria that accounts for almost 80-90% of the assignments and gives no further information (I attach some pictures so the problem can be better understood).
The version of QIIME 2 I am using is QIIME 2 Core 2020.6, installed in Oracle VM VirtualBox.
The analysis I have performed is based on the "Moving Pictures" tutorials, using DADA2.
The samples correspond to feces of patients and controls.
I don't know whether more information (exact commands or more pictures) is needed to make it easier to help me.
Hi!
I think you should provide additional information so that the QIIME 2 team members can help you resolve your issue.
Did you train your classifier yourself, or did you download a pretrained one?
Are you sure that the classifier you used was trained on the same rRNA region as your amplicons?
Yes, you are right! I'm sorry, but I only started using QIIME 2 a few days ago and I'm pretty lost.
The classifier I used was Greengenes 13_8 99% OTUs, and I downloaded it with the following commands, as described in the "Moving Pictures" tutorial.
For the amplicons, I used the Ion 16S Metagenomics Kit from ThermoFisher, which amplifies 7 of the hypervariable regions using two primers (Primer 1: V2, V4, V8 and Primer 2: V3, V6-7, V9).
Now I am a bit confused, because I don't actually know which region the classifier was trained on.
I am not really familiar with the kit you used, but do you know the expected amplicon length? You mentioned you followed the Moving Pictures tutorial.
How many sequences passed the denoising step?
One way to analyse these data may be to use closed-reference clustering instead of the de novo approach you used (there may be other ways, but I cannot think of any that does not involve extracting the sequences for each amplicon). Please refer to the QIIME 2 2020.8.0 documentation page "Clustering sequences into OTUs using q2-vsearch".
The following link provides more information about the Ion 16S Metagenomics Kit. According to that flyer, the kit works with a length of 400 bp. However, the rep-seqs.qzv that I get from DADA2 shows that the reads have a length of 175.
I guess that having only 60% of the sequences pass the filter is not a good sign.
Finally, does the de novo clustering step appear in the Moving Pictures tutorial? I checked whether I had performed that step, and I had not. At which point in the process should I apply these clustering steps (either de novo or closed-reference)? After DADA2?
I forgot that the Moving Pictures tutorial denoises single-end reads (with either DADA2 or Deblur).
So you are working with R1 only, which is why all sequences are 175 bp, which I guess is your truncation length. Do you have paired-end reads? If so, and you want to merge them, please have a look at the Atacama soil tutorial: https://docs.qiime2.org/2020.8/tutorials/atacama-soils/
What length are your sequences? If you have 2x250bp, with an amplicon of about 400bp, you should be able to get enough overlap to merge them.
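To make the overlap reasoning concrete, here is a minimal sketch of the arithmetic. It assumes the forward and reverse reads each cover the amplicon from opposite ends and ignores primers and quality trimming; the function name `expected_overlap` is just made up for this example.

```python
def expected_overlap(read_length: int, amplicon_length: int) -> int:
    """Number of bases shared by a forward and a reverse read spanning one amplicon.

    A negative result means the two reads leave a gap and cannot be merged.
    """
    return 2 * read_length - amplicon_length


# 2x250 bp reads on an amplicon of about 400 bp
print(expected_overlap(read_length=250, amplicon_length=400))
```

Keep in mind that truncating the reads during denoising shortens `read_length`, so the overlap that actually remains can be smaller than this estimate.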
On the percentage of sequences passing the filters: it is a bit on the low side, but it may be enough to work with, depending on the complexity of your data.
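If it helps to look at this per sample rather than overall, the retention can be computed from the read counts before and after denoising. This is only a sketch: the sample names and counts below are hypothetical placeholders, not values from this thread, and `percent_retained` is a made-up helper name.

```python
def percent_retained(input_reads: int, retained_reads: int) -> float:
    """Percentage of input reads that survived filtering and denoising."""
    if input_reads <= 0:
        raise ValueError("input_reads must be positive")
    return 100.0 * retained_reads / input_reads


# Hypothetical (input, retained) counts; replace with the numbers from your own stats table
stats = {"sample-A": (10000, 6000), "sample-B": (8000, 5200)}
for sample, (n_input, n_retained) in stats.items():
    print(f"{sample}: {percent_retained(n_input, n_retained):.1f}% retained")
```

Samples that fall far below the rest are usually the ones worth inspecting first.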
Sorry for the confusion about de novo versus closed reference! What I meant by de novo is that you assigned taxonomy to the denoised sequences, as opposed to clustering your sequences by aligning them to your reference database (and specifying a minimum similarity threshold), as described in the vsearch tutorial. That would be my last resort if all the other possibilities fail!
As for the classifier you used, it was trained on the V4 region only, and that would certainly explain the result you are seeing! You may try to train your own classifier, I suppose using the full-length genes (that is, skipping the 'qiime feature-classifier extract-reads' step).
Yes, exactly, the --p-trunc-len I chose is 175.
At first we thought that the reads were paired-end, but when I tried to convert the .bam file I was given into two .fastq files, it was impossible. So I started reading some reviews and found that, while Illumina offers single-end or paired-end sequencing, Ion Torrent does not offer paired-end. So I guess the reads are single-end.
I'm sorry, but I don't know where to find the length of the sequences. Is it in this table?
I will try the vsearch tutorial (where can I find it?) and training my own classifier (I don't really know how I'm supposed to do this either), and then tell you what happens.
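On the question of where to find the read lengths: one option, independent of QIIME 2, is to count them directly from the raw FASTQ file. The sketch below assumes a standard four-line-per-record FASTQ (optionally gzipped) that has already been obtained from the .bam file; the function names and the file name in the comment are hypothetical.

```python
import gzip
from collections import Counter
from pathlib import Path


def read_length_counts(fastq_path):
    """Count how many reads have each length in a 4-line-per-record FASTQ file."""
    path = Path(fastq_path)
    opener = gzip.open if path.suffix == ".gz" else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In each 4-line record, the second line holds the sequence
            if line_number % 4 == 1:
                counts[len(line.rstrip("\r\n"))] += 1
    return counts


def summarize(counts):
    """Return the number of reads and the min, max, and mean read length."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads found")
    mean = sum(length * n for length, n in counts.items()) / total
    return {"reads": total, "min": min(counts), "max": max(counts), "mean": mean}


# Example call (hypothetical file name):
# print(summarize(read_length_counts("sample1.fastq.gz")))
```

With single-end Ion Torrent data the lengths typically vary from read to read, so the minimum, maximum, and mean together give a better picture than a single number when choosing a truncation length.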