I am new to QIIME 2 and to bioinformatics in general, and I need help figuring out how to alter my pipeline.
I have 430 fungal ITS2 Sanger sequences produced through a clone library, and I want to analyze the community structure through ESVs as opposed to OTUs. I have found that I need to use q2-itsxpress, as my primers amplified from the beginning of the 5.8S gene all the way to the LSU. My plan is to compare those ITS2 isolates to the UNITE database, then run the result through q2-ghost-tree, as recommended here.
So, I have imported my trimmed & QC'd (via Geneious 10.2.6) data as a .FASTA file, as described in this Q2 forum post. I have been looking at the Fungal ITS analysis tutorial, but after building their UNITE classifier database & importing the mock community data, they denoise those sequences with DADA2. From what I understand, Sanger data isn't supported by DADA2 or Deblur, and both require .FASTQ files. Can I still get through the pipeline with .FASTA files? Also, is denoising an important step in the pipeline, or would the artifact from the q2-ghost-tree plugin give me the necessary file type to move forward with diversity analyses?
My apologies if this is a really basic question, but I'm a bit lost in the pipeline &, again, bioinformatics is still quite new to me.
That's correct. If you do not want to use OTU clustering, you can just use qiime vsearch dereplicate-sequences to dereplicate — that will give you a feature table and representative sequences artifact.
Yes... see this tutorial. Just follow the import and dereplicate-sequences steps in that section of the tutorial, then proceed through the ITS tutorial.
The feature table you get from dereplicate-sequences is all you need to proceed to diversity analyses. Denoising is important for correcting errors from next-generation sequencing methods, but since you have Sanger data and are already running a Sanger-specific QC protocol in Geneious, denoising should not be necessary.
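To make the dereplication step concrete, here is a minimal Python sketch of the idea behind it. It is not what vsearch or q2-vsearch actually runs. It assumes sequence IDs follow a `<sample-id>_<number>` convention, and the function name, sample names, and sequences are made up for illustration. Uppercasing before comparison is a choice made for this sketch.

```python
from collections import Counter, defaultdict


def dereplicate(records):
    """Collapse identical sequences into features and count them per sample.

    `records` is an iterable of (sequence_id, sequence) pairs. Each
    sequence_id is ASSUMED to look like '<sample-id>_<number>'.
    """
    table = defaultdict(Counter)  # sequence -> Counter({sample_id: count})
    for seq_id, seq in records:
        # Everything before the last underscore is treated as the sample ID.
        sample_id = seq_id.rsplit("_", 1)[0]
        # Compare sequences case-insensitively (a choice made for this sketch).
        table[seq.upper()][sample_id] += 1
    return {seq: dict(counts) for seq, counts in table.items()}


# Hypothetical toy input: three identical reads and one distinct read.
records = [
    ("siteA_1", "ACGTACGT"),
    ("siteA_2", "ACGTACGT"),
    ("siteB_1", "acgtacgt"),
    ("siteB_2", "TTGCAAGC"),
]
feature_table = dereplicate(records)
```

Each key of `feature_table` plays the role of a representative sequence, and the per-sample counts play the role of the feature table. No error correction happens here, which is why this step is only appropriate when the reads have already been quality-checked.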
Thank you @Nicholas_Bokulich for your reply! So sorry for the delay in response.
I have imported my data & have tried to dereplicate the sequences with vsearch. However, when I try to run that command I get an error:
Plugin error from vsearch:
list index out of range
I'm not sure why I would get that error message. I searched the forum for previous reports and found that another user had run into the same error in a different situation. That user fixed it with a simple character replacement, but I don't seem to have that problem. Here are the first couple of lines of my .FASTA file:
Hi @jhines1,
could you please report the full error traceback? Use the --verbose flag in your command to print the full traceback in your terminal, and please post that here, along with the full command.
Do all of your sequence IDs fit that same pattern? You should review all IDs to make sure there are not IDs with special characters or an unusual pattern.
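If reviewing the IDs by eye is impractical, a short script can flag the odd ones. The sketch below is only an illustration: the `EXPECTED_ID` pattern (`<sample-id>_<number>`, restricted to letters, digits, `.`, `-`, and `_`) is an assumption about what a "normal" ID looks like and should be adjusted to your own naming scheme. The function name and example data are hypothetical.

```python
import re

# ASSUMED "normal" ID shape for this sketch: <sample-id>_<number>.
# Adjust the pattern to match your own naming scheme.
EXPECTED_ID = re.compile(r"^[A-Za-z0-9._\-]+_[0-9]+$")


def find_unusual_ids(fasta_text):
    """Return (line_number, sequence_id) pairs for headers that do not match."""
    unusual = []
    for line_number, line in enumerate(fasta_text.splitlines(), start=1):
        if not line.startswith(">"):
            continue  # only header lines carry IDs
        fields = line[1:].split()
        # The ID is the first whitespace-delimited token after '>'.
        seq_id = fields[0] if fields else ""
        if not EXPECTED_ID.match(seq_id):
            unusual.append((line_number, seq_id))
    return unusual


# Hypothetical example: one good header and three problematic ones.
example = ">siteA_1\nACGT\n>siteA 2\nACGT\n>site#B_3\nACGT\n>\nACGT\n"
problems = find_unusual_ids(example)
```

Here `problems` lists the header with a space in the ID, the one containing `#`, and the empty header, each with its line number so it can be found in the file.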
311 unique sequences, avg cluster 1.1, median 1, max 3
Writing output file 100%
Writing uc file, first part 100%
Writing uc file, second part 100%
Traceback (most recent call last):
File "/Users/haselkornlab/anaconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2cli/commands.py", line 274, in call
results = action(**arguments)
File "<decorator-gen-128>", line 2, in dereplicate_sequences
File "/Users/haselkornlab/anaconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/Users/haselkornlab/anaconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in callable_executor
output_views = self._callable(**view_args)
File "/Users/haselkornlab/anaconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_vsearch/_cluster_sequences.py", line 134, in dereplicate_sequences
table = _parse_uc(out_uc)
File "/Users/haselkornlab/anaconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_vsearch/_cluster_sequences.py", line 70, in _parse_uc
@Nicholas_Bokulich Something just occurred to me. Some of my sequences have 'N' in place of normal bases. Would this give vsearch, or Q2 in general, reasons to throw errors?
@thermokarst Yes, I will have to get back to the lab before I can get to the files. I was working on them a bit earlier but haven't saved them to an external source yet (e.g. cloud/thumb drive). I am out of the lab for the day, but will be back first thing tomorrow morning. I will send them over then.
Hey there @jhines1! It was a problem with file line endings: your FASTA file had a mix of LF and CRLF, which vsearch wasn't cleaning out, so when q2-vsearch parsed the intermediate files created by vsearch, there was all kinds of crazy going on in them. I converted the file to use only Unix-style line endings and have returned it to you in a DM. Happy QIIMEing! :qiime2:
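For anyone who hits the same thing, mixed line endings can be detected and cleaned with a few lines of Python. This is a minimal sketch, not the exact conversion used on the file above. The function names and the example bytes are made up for illustration.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LFs that are not part of a CRLF
        "CR": data.count(b"\r") - crlf,  # CRs that are not part of a CRLF
    }


def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to Unix-style LF."""
    # Replace CRLF first so its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Hypothetical FASTA content with a mix of CRLF and LF endings.
mixed = b">siteA_1\r\nACGT\r\n>siteA_2\nACGT\n"
before = count_line_endings(mixed)
cleaned = normalize_line_endings(mixed)
after = count_line_endings(cleaned)
```

To apply this to a real file, read it in binary mode (`open(path, "rb")`) so Python does not translate the line endings on the way in, and write the cleaned bytes to a new file opened with `"wb"` before importing it again.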