Hi All,
I have already-demultiplexed, paired-end .fastq.gz files from a MiSeq PE300 run. There doesn't seem to be a specific tutorial covering this setup, but I did find some help on the forums. I am running into an error when trying to map my reads and am unsure whether it is due to a problem with that step alone or with previous steps.
I'll do my best to outline what I have done:
This was run on an Ubuntu Linux server with q2cli version 2020.6.0.
Files from the sequencing facility (there are 106 samples; a few shown here):
ED1127vert_R1_001.fastq.gz ED1372vert_R1_001.fastq.gz
ED1127vert_R2_001.fastq.gz ED1372vert_R2_001.fastq.gz
ED1128vert_R1_001.fastq.gz ED1373vert_R1_001.fastq.gz
ED1128vert_R2_001.fastq.gz ED1373vert_R2_001.fastq.gz
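For context, a manifest lists each file with its sample ID and read direction. Below is a minimal sketch of how one could be generated for files named like these, assuming the older comma-separated layout that PairedEndFastqManifestPhred33 reads (header `sample-id,absolute-filepath,direction`) and assuming the sample ID is everything before `_R1_001.fastq.gz`. `make_manifest` is a hypothetical helper name, and this is not necessarily how the manifest used below was built.

```shell
#!/usr/bin/env bash
# Sketch: print a PairedEndFastqManifestPhred33-style CSV manifest to stdout.
# Assumption: every sample has <sample>_R1_001.fastq.gz and
# <sample>_R2_001.fastq.gz sitting in the same directory.
make_manifest() {
    local dir r1 r2 sample
    dir=$(cd "$1" && pwd) || return 1      # the manifest needs absolute paths
    echo "sample-id,absolute-filepath,direction"
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue           # glob matched nothing
        sample=$(basename "$r1" _R1_001.fastq.gz)
        r2="$dir/${sample}_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing reverse read for $sample" >&2
            return 1
        fi
        echo "$sample,$r1,forward"
        echo "$sample,$r2,reverse"
    done
}
```

Running something like `make_manifest /path/to/fastq_dir > mts_fastq_manifest_paired` would then write one forward and one reverse line per sample, which makes it easy to confirm that all 106 samples are listed (213 lines including the header).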
Here is the pipeline that I've pieced together from several forum posts:
#paired R1 and R2
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path mts_fastq_manifest_paired \
  --output-path ./import_paired/paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred33
qiime demux summarize \
  --i-data ./import_paired/paired-end-demux.qza \
  --o-visualization ./import_paired/paired-demuxed.qzv
#I get an output that shows the following for F reads (can only upload one image)
#import custom reference
qiime tools import \
  --input-path 16s_ref_nodupes.fasta \
  --output-path 16s_ref_nodupes.qza \
  --type 'FeatureData[Sequence]'
#I am unsure about the trimming and truncation values. The primers are 20 and 21 bases for F and R, respectively.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs ./import_paired/paired-end-demux.qza \
  --p-trim-left-f 20 \
  --p-trunc-len-f 300 \
  --p-trim-left-r 21 \
  --p-trunc-len-r 300 \
  --o-representative-sequences ./import_paired/clean_300/rep-seqs-dada2.qza \
  --o-denoising-stats ./import_paired/clean_300/stats-dada2.qza \
  --o-table ./import_paired/clean_300/table-dada2.qza
#map to reference
qiime vsearch cluster-features-closed-reference \
  --i-sequences ./clean_300/rep-seqs-dada2.qza \
  --i-table ./clean_300/table-dada2.qza \
  --i-reference-sequences ./../16s_ref_nodupes.qza \
  --p-perc-identity 0.95 \
  --output-dir ./map_95/map_95.qza \
  --o-unmatched-sequences ./map_95/unmatched_95.qza \
  --p-strand both \
  --verbose
#here is the error after running the vsearch script:
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.
Command: vsearch --usearch_global /tmp/tmphjr4ecyt --id 0.95 --db /tmp/qiime2-archive-i5k0jgis/3b8b748e-ce9d-4e73-b11c-385f454aa85c/data/dna-sequences.fasta --uc /tmp/tmpeqoqp1kj --strand both --qmask none --notmatched /tmp/tmpbvuaibot --threads 1 --minseqlength 1 --fasta_width 0
vsearch v2.7.0_linux_x86_64, 125.9GB RAM, 80 cores
Reading file /tmp/qiime2-archive-i5k0jgis/3b8b748e-ce9d-4e73-b11c-385f454aa85c/data/dna-sequences.fasta 0%
Fatal error: Invalid FASTA - header must be terminated with newline
Traceback (most recent call last):
File "/home/tangled/miniconda2/envs/qiime2-2020.6/lib/python3.6/site-packages/q2cli/commands.py", line 329, in call
results = action(**arguments)
File "", line 2, in cluster_features_closed_reference
File "/home/tangled/miniconda2/envs/qiime2-2020.6/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
output_types, provenance)
File "/home/tangled/miniconda2/envs/qiime2-2020.6/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in callable_executor
output_views = self._callable(**view_args)
File "/home/tangled/miniconda2/envs/qiime2-2020.6/lib/python3.6/site-packages/q2_vsearch/_cluster_features.py", line 263, in cluster_features_closed_reference
run_command(cmd)
File "/home/tangled/miniconda2/envs/qiime2-2020.6/lib/python3.6/site-packages/q2_vsearch/_cluster_features.py", line 33, in run_command
subprocess.run(cmd, check=True)
File "/home/tangled/miniconda2/envs/qiime2-2020.6/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['vsearch', '--usearch_global', '/tmp/tmphjr4ecyt', '--id', '0.95', '--db', '/tmp/qiime2-archive-i5k0jgis/3b8b748e-ce9d-4e73-b11c-385f454aa85c/data/dna-sequences.fasta', '--uc', '/tmp/tmpeqoqp1kj', '--strand', 'both', '--qmask', 'none', '--notmatched', '/tmp/tmpbvuaibot', '--threads', '1', '--minseqlength', '1', '--fasta_width', '0']' returned non-zero exit status 1.
Plugin error from vsearch:
Command '['vsearch', '--usearch_global', '/tmp/tmphjr4ecyt', '--id', '0.95', '--db', '/tmp/qiime2-archive-i5k0jgis/3b8b748e-ce9d-4e73-b11c-385f454aa85c/data/dna-sequences.fasta', '--uc', '/tmp/tmpeqoqp1kj', '--strand', 'both', '--qmask', 'none', '--notmatched', '/tmp/tmpbvuaibot', '--threads', '1', '--minseqlength', '1', '--fasta_width', '0']' returned non-zero exit status 1.
See above for debug info.
So, the message is "Fatal error: Invalid FASTA - header must be terminated with newline", and I am unsure whether it comes from how my original samples were imported or from the reference .qza. Neither import produced an error, so I would have expected a header problem to surface before the vsearch step.
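Going by the log, the file vsearch was reading when it failed is the one passed to `--db`, which should be the reference rather than my reads. Below is a minimal sketch of a check that could be run on the original reference FASTA. `check_fasta` is a hypothetical helper name, and the idea that line endings are the cause is an assumption, not something the log confirms.

```shell
#!/usr/bin/env bash
# Sketch: flag two properties of a FASTA file worth ruling out:
# carriage-return line endings, and a last line with no trailing newline.
check_fasta() {
    local fasta="$1" status=0 cr_count
    # tr deletes every byte except CR; wc counts what is left.
    cr_count=$(( $(tr -cd '\r' < "$fasta" | wc -c) ))
    if [ "$cr_count" -gt 0 ]; then
        echo "carriage returns found: $cr_count"
        status=1
    fi
    # Command substitution strips a trailing newline, so a non-empty
    # result means the file ends in some other byte.
    if [ -s "$fasta" ] && [ -n "$(tail -c 1 "$fasta")" ]; then
        echo "no newline at end of file"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "line endings look fine"
    fi
    return "$status"
}
```

For example, `check_fasta 16s_ref_nodupes.fasta` would report either finding. If carriage returns do turn up, stripping or converting them (`tr -d '\r'` for CRLF files, `tr '\r' '\n'` for CR-only files) and re-importing the reference would be the next thing to try.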
Please let me know if you think other steps are necessary, should be swapped, or contain an error. I have already read through many of the forum help posts on this and the tutorials they link to, so I need somewhat more specific help.