Analysis of fungi at LSU (28S)


(Kohei Takahashi) #1

Hi.
I tried to get taxonomic information for an LSU fungal community using “qiime feature-classifier classify-sklearn”.
I got the database from SILVA, release 119.
I trained the classifier on this database by following this tutorial (https://docs.qiime2.org/2018.6/tutorials/feature-classifier/). The commands I used are below.

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path SILVA_119_LSURef_trunc_sequences.fasta \
  --output-path 97_otus.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --source-format HeaderlessTSVTaxonomyFormat \
  --input-path SILVA_119_LSURef_trunc_taxonomy.txt \
  --output-path ref-taxonomy.qza

However, in the results (taxa-barplot), most sequences were not assigned at the phylum level and were shown only as “Eukaryota” (please see the attached figure).
I think something is wrong.
When I searched for some of the unassigned features with BLAST, some of them had high similarity to known species.

[Attached image: LSU]


(Nicholas Bokulich) #3

Hi @F18D029G,
It looks like your barplot did not attach. Could you please try again? It may be too large to upload here, so you may need to produce a smaller example barplot (filter out samples first), or post it to Dropbox or a similar service and share the link here.

Most or all? The difference is important — if all, this is usually human error (e.g., the wrong amplicon site was used). If most, then the query sequences usually have something wrong with them. Once I see your barplot I can diagnose a little better.
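To put a number on "most or all", a minimal Python sketch like the one below could tally how deep each classification goes. It assumes the taxonomy labels are semicolon-separated strings (as in SILVA-style taxonomies), and the example labels are hypothetical, not taken from this dataset.

```python
from collections import Counter


def depth(taxon):
    """Count the assigned ranks in a semicolon-separated taxonomy string."""
    ranks = [rank.strip() for rank in taxon.split(";")]
    # Skip empty ranks and bare rank prefixes such as "p__".
    return sum(1 for rank in ranks if rank and not rank.endswith("__"))


def depth_summary(taxa):
    """Return the fraction of features found at each classification depth."""
    counts = Counter(depth(taxon) for taxon in taxa)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {d: counts[d] / total for d in sorted(counts)}


# Hypothetical labels for illustration only.
example = [
    "Eukaryota",
    "Eukaryota",
    "Eukaryota;Opisthokonta;Nucletmycea;Fungi",
    "Eukaryota;Opisthokonta;Nucletmycea;Fungi;Dikarya;Ascomycota",
]

for d, fraction in depth_summary(example).items():
    print(f"depth {d}: {fraction:.0%} of features")
```

This counts features, not reads, so a barplot weighted by abundance can look different from these fractions.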

Are your sequences in mixed orientations? classify-sklearn cannot currently handle mixed-orientation reads. Mixed-orientation reads are also bad news for denoising and OTU clustering (since you effectively duplicate all unique sequence variants), so the best approach is to re-orient these reads before importing to QIIME 2 (we do not yet have a method in QIIME 2 to re-orient them).
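As a rough illustration of what re-orienting means, the Python sketch below compares how many k-mers a read shares with a reference in each orientation and flips the read when the reverse complement fits better. This is a simple heuristic written for this explanation, not the method of any particular tool, and the sequences and the choice of k = 8 are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k=8):
    """Return the set of k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(read, ref_kmers, k=8):
    """Return the read in the orientation that shares more k-mers with the reference."""
    flipped = reverse_complement(read)
    forward_hits = len(kmers(read, k) & ref_kmers)
    reverse_hits = len(kmers(flipped, k) & ref_kmers)
    if reverse_hits > forward_hits:
        return flipped, "reversed"
    return read, "kept"


# Hypothetical reference and read for illustration only.
reference = "ATGGCGTACCTTAGCCAAGGTTCCGATTACGGCATCGTAAGCAT"
ref_kmers = kmers(reference)
read = reference[5:30]

print(orient(read, ref_kmers))
print(orient(reverse_complement(read), ref_kmers))
```

Both calls should return the read in the reference orientation, the second one flagged as "reversed". Real data would need a reference covering the amplicon and some handling of reads that match neither orientation.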

Could you try classifying with qiime feature-classifier classify-consensus-blast? That will also help diagnose if this is an issue with the classifier or reference sequences.


(Angus Bishop) #5

Out of interest, which tool would you recommend for re-orienting mixed-orientation reads before importing to QIIME 2?


(Nicholas Bokulich) #6

See this topic.


(Kohei Takahashi) #7

Sorry, I failed to upload it.

Can you see my barplot?


(Nicholas Bokulich) #8

Thanks for uploading! I am guessing that this is a level 2 view and e.g., Dikarya are classified more deeply.

Since you say those unclassified sequences BLAST to known species, I suspect this may be a mixed read orientation issue. You can try re-orienting reads, or use one of QIIME 2’s other taxonomy classifiers, which are not sensitive to read orientation. At the very least, try this and if you get similar results then there is a different issue afoot (probably issues with the reference since you made your own custom reference):
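For instance, a call to the BLAST-based classifier might look like the sketch below, which only builds and prints the command line and does not run it. The artifact names are assumptions: rep-seqs.qza stands for your query sequences, ref-seqs.qza and ref-taxonomy.qza for the reference artifacts mentioned in this thread, and taxonomy-blast.qza is a made-up output name.

```python
import shlex


def blast_classifier_command(query, reference_reads, reference_taxonomy, output):
    """Build the argument list for classify-consensus-blast."""
    return [
        "qiime", "feature-classifier", "classify-consensus-blast",
        "--i-query", query,
        "--i-reference-reads", reference_reads,
        "--i-reference-taxonomy", reference_taxonomy,
        "--o-classification", output,
    ]


# File names are placeholders; substitute your own artifacts.
cmd = blast_classifier_command(
    "rep-seqs.qza", "ref-seqs.qza", "ref-taxonomy.qza", "taxonomy-blast.qza"
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the untrimmed reference (97_otus.qza in the first post) instead of ref-seqs.qza would also show whether trimming the reference to the amplicon region plays a role.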

Let us know if that works!


(Kohei Takahashi) #9

I tried feature-classifier classify-consensus-blast.

I got a similar result, so the issue seems to be with my reference.

I used the SILVA 119 database for QIIME (https://drive.google.com/drive/folders/0Bz1utnb_nbhIfmFJRWxJeFdMOEk2emVBbWxxNy1LZVVBRllkeVdXbWtvMVJIZDdKYjNPV3c).

Does anyone who analyzes LSU know which reference is most suitable for it?


(Nicholas Bokulich) #11

Hi @F18D029G,
Indeed, it looks like this is probably an issue with the reference. What steps did you take to format the reference sequences and taxonomy?

I imagine SILVA is fine for LSU analysis but I am not sure. RDP also has a fungal LSU database that you could download for this. I have not used either of these so cannot comment on pros/cons.

Another possibility is that you are getting ~50% unclassified sequences because you are amplifying non-target DNA. Do you know if your primers amplify non-fungal DNA?


(Kohei Takahashi) #13

Hi

I used the commands from this site.
(https://docs.qiime2.org/2018.8/tutorials/feature-classifier/)

I tried to find the RDP fungal LSU database, but I couldn’t find it.
I would like to know how to download the RDP fungal LSU database.

I use fungal-specific primers.
So my sequence data is probably fungal DNA.


(Nicholas Bokulich) #15

What I meant is: did you just download the SILVA LSU database and import to QIIME 2 or did you take any other steps to format those files? Could you send me the fasta and taxonomy file that you imported to QIIME 2? I can take a look to make sure the format is correct.
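A quick format check along these lines can be sketched in Python. It assumes the taxonomy file is a headerless TSV with the sequence ID in the first column and the taxonomy string in the second, and that FASTA IDs are the first token after ">". The function names and example records are made up for illustration.

```python
def fasta_ids(lines):
    """Collect sequence IDs (first token after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def parse_taxonomy(lines):
    """Parse headerless TSV lines; return (id -> taxon, malformed line numbers)."""
    taxonomy, malformed = {}, []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0]:
            malformed.append(number)
            continue
        taxonomy[fields[0]] = fields[1]
    return taxonomy, malformed


def check_reference(fasta_lines, taxonomy_lines):
    """Report IDs present in only one file and taxonomy lines that are not tab-separated."""
    seq_ids = fasta_ids(fasta_lines)
    taxonomy, malformed = parse_taxonomy(taxonomy_lines)
    return {
        "missing_taxonomy": sorted(seq_ids - taxonomy.keys()),
        "missing_sequence": sorted(taxonomy.keys() - seq_ids),
        "malformed_lines": malformed,
    }


# Hypothetical records for illustration only.
fasta = [">A some description\n", "ACGT\n", ">B\n", "GGCC\n"]
tax = ["A\tEukaryota;Fungi\n", "C\tEukaryota\n", "bad line without tab\n"]
print(check_reference(fasta, tax))
```

With real files, the two line lists could come from `open(path)` on the FASTA and taxonomy files. Non-empty reports would point to a formatting problem in the reference.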

You should contact the RDP admins for that — I do not know if it is available for download, but I do know that they have a LSU reference for their webtool classifier.

Could you pull out some of the sequences that are not classifying, and run an NCBI blast search on them to see what these seqs could be? (this will just confirm that these are in fact fungal LSU sequences)

Thanks!


(Kohei Takahashi) #17

Hi

I got the SILVA 119 database for QIIME (https://drive.google.com/drive/folders/0Bz1utnb_nbhIfmFJRWxJeFdMOEk2emVBbWxxNy1LZVVBRllkeVdXbWtvMVJIZDdKYjNPV3c).

The commands I used are in the uploaded text file.
txt.txt (2.1 KB)

I ran an NCBI BLAST search on the unassigned sequences.
I got Camptobasidium, Lactifluus, uncultured Glomeromycota, etc.
They are fungal sequences.

Thanks
Kohei


(Nicholas Bokulich) #18

Could you please share ref-seqs.qza and SILVA_119_LSURef_trunc_taxonomy.txt?

I wonder if something is going wrong at the extract-reads step. Is it possible that the SILVA 119 database does not contain full LSU and does not overlap your amplicon region?
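One way to test for overlap outside QIIME 2 is to look for the primer sites directly in the reference sequences. The Python sketch below does this with exact matching of IUPAC-degenerate primers. The primers and the reference sequence are hypothetical, and exact matching is stricter than a search that tolerates mismatches, so a missing hit here is only a hint.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(primer):
    """Reverse-complement a primer, including IUPAC ambiguity codes."""
    return primer.upper().translate(IUPAC_COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a degenerate primer into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def amplicon_span(reference, fwd_primer, rev_primer):
    """Return (start, end) of the region between the primer sites, or None."""
    # Treat RNA-alphabet references (U) the same as DNA (T).
    ref = reference.upper().replace("U", "T")
    fwd = primer_regex(fwd_primer).search(ref)
    if fwd is None:
        return None
    rev = primer_regex(reverse_complement(rev_primer)).search(ref, fwd.end())
    if rev is None:
        return None
    return fwd.end(), rev.start()


# Hypothetical reference and primers for illustration only.
ref = "TTTACGTACGGGGGGGGGGCTGCAACC"
print(amplicon_span(ref, "ACGTAY", "TTGCRG"))
print(amplicon_span("AAAAAAAAAAAA", "ACGTAY", "TTGCRG"))
```

If most reference sequences return None for your real primers, the reference probably does not cover the amplicon region.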


(Kohei Takahashi) #19

Hello

I have shared ref-seqs.qza and SILVA_119_LSURef_trunc_taxonomy.txt at the links below.
https://www.dropbox.com/s/v6k2c85cqb5sjtw/ref-seqs.qza?dl=0
https://www.dropbox.com/s/s80ati69vyhnfyk/SILVA_119_LSURef_trunc_taxonomy.txt?dl=0

Thank you for your help!
Kohei

