Visualization of SampleData[Sequences] file

Hello, I am running QIIME 2 and my data is now one big .fna file.
So I used the following command to make a .qza file for further processing:

qiime tools import \
  --input-path /home/koh/01-raw/sequences.fna \
  --output-path /home/koh/AGT_file/demux-sequences.qza \
  --type 'SampleData[Sequences]'
During this process, I am having trouble visualizing it, i.e. converting it to a .qzv file, because
qiime demux summarize doesn't work with the 'SampleData[Sequences]' format.
Is there any way to visualize it?

Hello @Jilim97,

Welcome to the forums! :qiime2:

What would this visualization look like?
The qiime tools import command simply takes your .fna file and wraps it into a .qza file. So I could also ask: how would you like to visualize your sequences.fna file?
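
If you just want to check what is inside the artifact, qiime tools peek will report its type and format:

qiime tools peek /home/koh/AGT_file/demux-sequences.qza

One possible route to an actual .qzv (a sketch, not the only way) is to dereplicate the sequences and then tabulate the unique ones:

qiime vsearch dereplicate-sequences \
  --i-sequences demux-sequences.qza \
  --o-dereplicated-table derep-table.qza \
  --o-dereplicated-sequences derep-seqs.qza
qiime feature-table tabulate-seqs \
  --i-data derep-seqs.qza \
  --o-visualization derep-seqs.qzv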

Hello, thank you for the quick reply.

I just found another way, so this problem is solved.

But can I ask you some other questions?

I am trying to replicate the data from a paper.

They used data from the American Gut Project.

"We downloaded the latest version of the processed OTU count table (similarity level 97%) which includes 19,524 samples and 36,405 OTUs from ftp://ftp.microbio.me/AmericanGut/ag-2017-12-04/0 3-otus.zip/100nt/gg-13_8-97-percent/otu_table.biom"

Based on this statement, I tried to make 100 nt OTUs, but at this point I have a problem.
When I visualize my data, 11% of the sequences are 126 nt and the others are 150 or 151 nt.
I guess the barcode and primer sequences were already removed from those data, since the length of barcode + primer is 33 nt.
At this point, is it OK to just trim the first 20 bases to make it 100 nt, or do I need to use cutadapt to remove those sequences?

And after this process, is it OK to use DADA2, since it makes ASVs, not OTUs?
Finally, there is only a GreenGenes 13-8 99% OTU classifier, but the paper used 97% similarity. How can I deal with this problem?

Thank you in advance. :slight_smile:

To follow up on my questions: for the 97% similarity, should I use vsearch with the GreenGenes reference to completely replicate the data?

Sure, I can help answer more questions.
(Often it is best to open separate threads for separate questions.)

This is the biggest question:

Are you trying to replicate this paper or reproduce this paper? When it was published, OTU clustering methods were common, but now denoising methods that make ASVs are used because they are much better.

If you are trying to use new methods, check out this tutorial: “Moving Pictures” tutorial — QIIME 2 2023.9.2 documentation

If you are trying to replicate parts of this old paper, you could download the processed data and start from there.
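
For example, if you download the paper's otu_table.biom, you can bring it straight into QIIME 2 as a feature table. A sketch, assuming the file is in BIOM v2.1.0 format (if the import fails, try BIOMV100Format instead):

qiime tools import \
  --input-path otu_table.biom \
  --input-format BIOMV210Format \
  --type 'FeatureTable[Frequency]' \
  --output-path ag-otu-table.qza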


Here are some smaller questions:

This is from Illumina sequencing done more than 7 years ago. The original reads were only 100 or 151 base pairs long. Dealing with varying read lengths is a challenge. Trimming everything down to 100 bp is one option.

I'm not sure where to trim to make the longer reads match. You may have to trim 51 bases from the end to make the regions match. Try it and find out!
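
If you go the denoising route, note that DADA2 needs quality scores, so it works on demultiplexed SampleData[SequencesWithQuality], not on your quality-free SampleData[Sequences] artifact. Assuming you re-import with quality scores, truncating inside DADA2 is one way to even out the lengths; a sketch with hypothetical file names:

qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 100 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats stats.qza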

These numbers do not have to match!

In 2017, 97% was the threshold used for OTU clustering. Now, I would make ASVs (100%).
In 2017, GreenGenes 13-8 was used, clustered at 99%. Now I would use GreenGenes2 or Silva.
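
That said, if you do want to mirror the paper's 97% OTUs, closed-reference clustering with vsearch against a Greengenes 13-8 97% reference is the closest match. A sketch (the input and reference file names here are placeholders):

qiime vsearch cluster-features-closed-reference \
  --i-sequences rep-seqs.qza \
  --i-table table.qza \
  --i-reference-sequences gg-13-8-97-otus.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table table-cr-97.qza \
  --o-clustered-sequences rep-seqs-cr-97.qza \
  --o-unmatched-sequences unmatched.qza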


Wow, I really appreciate your reply!
It is really clear and helpful!
Thanks a lot again.
