Make phylogenetic tree based on sequence

Nelli · December 1, 2022, 9:06pm

Hey there,

I have used the DADA2 algorithm to analyze my fastq files, and now I want to switch to QIIME 2 for downstream analysis. I have converted the ASV table into a feature table (.qza), and I also have a metadata.tsv file. However, to generate alpha and beta diversity metrics I need to build a phylogenetic tree. I have a data frame with the taxa, their sequences, and the sample each taxon comes from.
My question is: is it possible to create a phylogenetic tree from this information using a command like this?

qiime phylogeny align-to-tree-mafft-fasttree \
--i-sequences filtered-sequences/filtered-rep-seqs.qza \
--o-alignment aligned-rep-seqs.qza \
--o-masked-alignment masked-aligned-rep-seqs.qza \
--o-tree unrooted-tree.qza \
--o-rooted-tree rooted-tree.qza

Mehrbod_Estaki · December 1, 2022, 9:50pm

Hi @Nelli,

To build a phylogenetic tree in QIIME 2 you need a set of representative sequences for your ASV table, of type FeatureData[Sequence]. If you run DADA2 in QIIME 2, it generates this automatically alongside your ASV table. However, if I understand correctly, you ran your DADA2 step in R?
If so, you can go back to your original ASV table in R (make sure you use the one that has the actual sequences as feature names, not taxon names) and export a representative sequence set like this:

# yourasvtable: the DADA2 sequence table, with the sequences as column names
uniquesToFasta(yourasvtable, fout = 'rep-seqs.fna', ids = colnames(yourasvtable))

Then import that into QIIME 2:

qiime tools import \
--input-path rep-seqs.fna \
--output-path rep-seqs.qza \
--type "FeatureData[Sequence]"

Now you can build a tree using this.
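
For example, the command from your post should work once it points at the newly imported file. This is only a sketch under a few assumptions: rep-seqs.qza sits in the current directory, your QIIME 2 environment is active, and the output file names are simply the ones from your original command. The `qiime tools peek` line is an optional check that the artifact has the expected type.

```shell
# Assumption: rep-seqs.qza is the artifact produced by the import step above.
REP_SEQS="rep-seqs.qza"

if ! command -v qiime >/dev/null 2>&1; then
  echo "qiime not found: activate your QIIME 2 environment first" >&2
elif [ ! -f "$REP_SEQS" ]; then
  echo "missing $REP_SEQS: run the import step first" >&2
else
  # Optional: confirm the artifact type is FeatureData[Sequence].
  qiime tools peek "$REP_SEQS"

  # Same pipeline as in the original question, pointed at the imported sequences.
  qiime phylogeny align-to-tree-mafft-fasttree \
    --i-sequences "$REP_SEQS" \
    --o-alignment aligned-rep-seqs.qza \
    --o-masked-alignment masked-aligned-rep-seqs.qza \
    --o-tree unrooted-tree.qza \
    --o-rooted-tree rooted-tree.qza
fi
```

The feature IDs in rep-seqs.qza need to match the feature IDs in your feature table, which is why the export above uses the sequences themselves as IDs.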

Hope that helps

Nelli · December 2, 2022, 2:03pm

Thank you for your help

Nelli · December 3, 2022, 3:34pm

Yes, you are right: I'm trying to convert the DADA2 results from R into a file that QIIME 2 can read. I tried using the ASV table (where row names are sequences and column names are samples) and got this error:

Error in getUniques(unqs, collapse = FALSE) :
Unrecognized format: Requires named integer vector, fastq filename, dada-class, derep-class, sequence matrix, or a data.frame with $sequence and $abundance columns.

Then I found this solution, but it uses only the sequence column, which is why I can't import rep-seqs.fna into QIIME 2.

I also tried what the error message suggests and created $sequence and $abundance columns, but that didn't work either:

Duplicate sequences detected.
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'writeFasta': invalid class “ShortRead” object: sread() and id() length mismatch: 120855, 2

Mehrbod_Estaki · December 5, 2022, 6:19am

Hi @Nelli,

Can you give us a little more information about your ASV table? How did you make it, and could you show us the first few lines? Also, what exact commands are you running?

The second error suggests that you somehow have duplicated sequences. I'm not sure how you ended up with them, because the output of DADA2 should be an ASV table without any duplicated sequences. Did you make any modifications after denoising when creating your ASV table?
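
If it helps, one quick way to look for duplicates outside of R is to scan a FASTA file from the shell. This is only a rough sketch and not part of DADA2 or QIIME 2: the file names demo-rep-seqs.fna and duplicate-seqs.txt are made up for illustration, the three demo records are invented, and the comparison is an exact, case-sensitive match on the full sequence.

```shell
# Hypothetical stand-in file; replace with the FASTA you exported from R.
FASTA="demo-rep-seqs.fna"
printf '>asv1\nACGT\n>asv2\nTTGA\n>asv3\nACGT\n' > "$FASTA"

# Join wrapped sequence lines into one line per record,
# then keep only the sequences that occur more than once.
awk '/^>/ { if (seq != "") print seq; seq = ""; next }
     { seq = seq $0 }
     END { if (seq != "") print seq }' "$FASTA" \
  | sort | uniq -d > duplicate-seqs.txt

echo "records: $(grep -c '^>' "$FASTA")"
echo "duplicated sequences: $(wc -l < duplicate-seqs.txt | tr -d ' ')"
```

An empty duplicate-seqs.txt would mean every sequence in the file is unique; anything listed there appears in more than one record.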

system · January 5, 2023, 12:19pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.