Taxa Barplot classification levels

horsemant · July 13, 2021, 2:49pm

Hi,

I am running a QIIME2-2021.4 Virtual Box with linux environment on WSL2. I am attempting to duplicate the results of a paper for a larger meta-analysis effort. Their methods are as follows:

Primers, 27F and 534R were used. Sequences were quality trimmed (Q20) and reads shorter than 200 bases were removed. Due to amplicon size and quality trimming, forward and reverse reads could not be consistently merged, thus only the forward read was used for analyses. Subsequently, each sample sequence set was sub-sampled to the smallest sample size to avoid analytical issues associated with variable library size. Sub-sampled data were pooled and renamed and clustered into operational taxonomic units (OTU) at 97% similarity.

When I looked at the distribution of reads, the overall quality looked poor, so in DADA2 I decided to use --p-trim-left 5 and --p-trunc-len 95. I also tried truncating at 120, 150 and 200 to troubleshoot.

Downstream, when I attempted to visualize using qiime taxa barplot, only 2 samples had any classification below the domain level. I also tried cutadapt to remove non-biological sequences, which wasn't fruitful.

Any suggestions on troubleshooting would be welcomed!

timanix · July 13, 2021, 7:18pm

Hello!
To help the community troubleshoot this issue, it would be very useful to also provide information about how you performed the taxonomy classification, against which database, and targeting which region. Did you train the classifier yourself or use a pre-trained one?

horsemant · July 13, 2021, 7:50pm

Hi,
I assigned taxonomy to my representative sequences using a pre-trained full-length 16S rRNA Greengenes classifier:
qiime feature-classifier classify-sklearn \
  --i-reads rep-seqs.qza \
  --i-classifier gg-13-8-99-nb-classifier.qza \
  --o-classification taxonomy.qza
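
One quick way to see how deep these assignments go is to export taxonomy.qza and count the named ranks per feature. The following is only a rough sketch: it assumes the export produces a tab-separated taxonomy.tsv with a header row and Greengenes-style strings (k__; p__; ...), and the function name taxonomy_depth_summary and the output directory are made up for illustration. Note that it summarizes features, not samples, so it complements the barplot rather than replacing it.

```shell
# Assumed export step (needs a QIIME 2 environment, so it is left commented out):
#   qiime tools export --input-path taxonomy.qza --output-path exported-taxonomy
#   taxonomy_depth_summary exported-taxonomy/taxonomy.tsv

# Hypothetical helper: prints "<depth><TAB><number of features>" for each
# depth found, where depth is the number of ranks with a non-empty name
# (0 = Unassigned, 1 = domain/kingdom only, 7 = species).
taxonomy_depth_summary() {
  awk -F'\t' '
    NR == 1 { next }                    # skip the header row
    {
      n = split($2, ranks, ";")
      depth = 0
      for (i = 1; i <= n; i++) {
        r = ranks[i]
        gsub(/^[ ]+|[ ]+$/, "", r)      # trim surrounding spaces
        sub(/^[a-z]__/, "", r)          # drop the rank prefix, e.g. p__
        if (r != "" && r != "Unassigned") depth++
      }
      count[depth]++
    }
    END {
      for (d = 0; d <= 7; d++) if (d in count) print d "\t" count[d]
    }
  ' "$1"
}
```

If nearly all features land at depth 1, the classifier itself is stopping at the domain level, which points toward the reads or the reference rather than the barplot.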

timanix · July 21, 2021, 7:13am

Hi @horsemant
I apologize for the long silence.
I asked other moderators for advice, and I am copying and pasting some of the hints I received:

Instead of the full-length classifier, you could try one that was trained on your target region (tutorial).
You should check whether these samples actually are what you expect to see as far as microbial composition goes. If it's a low-biomass sample with tons of non-target host cells, then it may just be a matter of removing those non-target reads.
Expand on the non-biological sequence removal process. This is certainly necessary, and failing to remove those sequences before DADA2/OTU clustering can lead to the issue you observed. What exactly did you do when you say it wasn't fruitful?
Since you are using DADA2 to denoise, it is better not to do any initial preprocessing like Q20 filtering or trimming, and instead let DADA2 handle all that (in case you did something like this).
Also, if nothing works for you, you can share your actual table and taxonomy artifact provenance with us; that will help us troubleshoot further.
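
For the first hint, training a region-specific classifier could look roughly like the sketch below. It uses the extract-reads and fit-classifier-naive-bayes actions of the feature-classifier plugin. The file names (ref-seqs-full.qza, ref-taxonomy.qza, and the outputs), the run wrapper, and the train_region_classifier function are assumptions for illustration, not something taken from this thread. The primer sequences are passed in as arguments and should come from the paper, and the truncation length should match whatever was used in DADA2.

```shell
# Hypothetical helper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Arguments: forward primer, reverse primer, truncation length.
# Input and output file names are placeholders (assumed, not from the thread).
train_region_classifier() {
  fwd_primer=$1
  rev_primer=$2
  trunc_len=$3

  # Cut the reference sequences down to the amplified region and truncate
  # them to resemble the single-end reads kept after denoising.
  run qiime feature-classifier extract-reads \
    --i-sequences ref-seqs-full.qza \
    --p-f-primer "$fwd_primer" \
    --p-r-primer "$rev_primer" \
    --p-trunc-len "$trunc_len" \
    --o-reads ref-seqs-region.qza

  # Train a naive Bayes classifier on the extracted region.
  run qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads ref-seqs-region.qza \
    --i-reference-taxonomy ref-taxonomy.qza \
    --o-classifier region-classifier.qza
}
```

Setting DRY_RUN=1 and calling train_region_classifier "$FWD_PRIMER" "$REV_PRIMER" 95 prints the two commands without running them, which makes it easy to check the parameters first. The resulting region-classifier.qza would then replace gg-13-8-99-nb-classifier.qza in the classify-sklearn step.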

Thanks @Mehrbod_Estaki