Using Qiime2 to view the sequences in fastq files

linjie_0911 · June 17, 2021, 6:15am

Hi everyone, this is my first time posting a topic, so I hope I did it the right way and can get some help from the community!
I ran a QIIME 2 analysis on my 16S Illumina results and obtained the community taxonomy. I then isolated some strains from the community and did Sanger sequencing on them to get their nucleotide sequences.
Now I want to see whether a strain I isolated is one of the abundant strains in the Illumina community analysis. Is there a way to do that? Or is there a way to extract the sequence data from the FASTQ files so that I can search it and match it against my Sanger results?

Many thanks!

wburgess · June 17, 2021, 11:27am

Welcome! I'm not an expert, but I'll do my best with your summary.

First, a FASTQ file is just a special type of text file, so if you want to view it directly, you ought to be able to do so. My own files are usually named something like filename.fastq.gz. In Linux, all I have to do is type gunzip filename.fastq.gz at the command line, and then the unzipped file can be opened in any text editor or bioinformatics program (e.g., UGENE or something like it). You don't need Linux, though: any modern computer ought to be able to unzip it, and then it's just a very big text file.
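If the file is too big for a text editor, a few lines of Python can read it without unzipping it first. This is only a minimal sketch: the function name read_fastq is made up for illustration, and it assumes the usual four-lines-per-read layout (header, sequence, "+", quality) that Illumina files normally have.

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (read_id, sequence, quality) tuples from a FASTQ file.

    Works on plain or gzipped files; assumes each record is exactly 4 lines.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = list(islice(handle, 4))  # next 4 lines = one read
            if len(record) < 4:
                break  # end of file (or a truncated final record)
            header, seq, _plus, qual = (line.rstrip("\n") for line in record)
            yield header[1:], seq, qual  # drop the leading "@"


# Example use (filename.fastq.gz is a placeholder):
# for read_id, seq, qual in islice(read_fastq("filename.fastq.gz"), 5):
#     print(read_id, seq)
```

Keep in mind these are raw reads, before any quality filtering or denoising, so for comparing against Sanger results the sequences that come out of the QIIME 2 pipeline are probably the better starting point.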

Second, at some point in your QIIME 2 pipeline, you probably ran either Deblur or DADA2. Both of those commands output (amongst other things) a file probably called something like representative_sequences.qza. Either way, somewhere you should have a QIIME 2 artifact of type FeatureData[Sequence]; you can check the type of any qza file by typing qiime tools peek filename.qza. Once you've finished processing it however you want to, you can run something like qiime tools export --input-path processed_sequences.qza...., and get a FASTA file of the sequences of the features (strains, in your case) that your QIIME pipeline found.

Or, you can run something like qiime feature-table tabulate-seqs --i-data processed_sequences.qza...., then view the resulting visualization. That gets you summary statistics as well as a convenient button to download a FASTA file. From there, the BLASTing to compare with Sanger sequences is up to you.
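Before setting up BLAST, a quick first pass is to check whether any exported feature sequence appears exactly inside a Sanger sequence. The sketch below is not a BLAST replacement: it only finds perfect matches (on either strand), so a single mismatch or sequencing error will hide a hit. All of the function names are hypothetical, and it assumes the exported file is an ordinary FASTA with one feature ID per header line.

```python
def read_fasta(path):
    """Return {record_id: sequence} from a FASTA file (sequences upper-cased)."""
    records, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                # ID is the header text up to the first space
                name, chunks = line[1:].partition(" ")[0], []
            else:
                chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def features_found_in(sanger_seq, features):
    """IDs of features whose sequence occurs exactly in sanger_seq (either strand)."""
    sanger_seq = sanger_seq.upper()
    hits = []
    for feature_id, seq in features.items():
        if seq and (seq in sanger_seq or reverse_complement(seq) in sanger_seq):
            hits.append(feature_id)
    return hits


# Example use (both file names are placeholders):
# features = read_fasta("exported_features.fasta")
# for isolate_id, isolate_seq in read_fasta("my_sanger_isolates.fasta").items():
#     print(isolate_id, features_found_in(isolate_seq, features))
```

The feature IDs that come back can then be looked up in the feature table to see how abundant each one is. If nothing matches exactly, that does not mean the isolate is absent from the community; that is the point at which BLAST, which tolerates mismatches, becomes worth the effort.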

I hope this helped a little, @linjie_0911!

linjie_0911 · June 18, 2021, 12:33am

Thank you so much, wburgess!
Your suggestions look promising! I can't wait to try them out!