Trouble importing sequences

victoriamesa · April 23, 2020, 11:21pm

Dear all,

My version of QIIME 2 is qiime2-2020.2, installed with conda.
I have two paired-end fastq files (2x250 bp). The sequences are already demultiplexed: the barcodes are not contained in the sequences but are in the headers.

I tried to import the sequences using the Casava 1.8 paired-end demultiplexed format:
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path Gut_microbiota/Sequences/ \
--input-format CasavaOneEightSingleLanePerSampleDirFmt \
--output-path Gut_microbiota/MicrobiotaPerro/Sequences/demux-paired-end.qza

But I got the following error:
Quality score length doesn't match sequence length for record beginning on line 11450401

Then I tried deleting all of the barcode information from the headers (I removed orig_bc=AGCAGAACATCT new_bc=AGCAGAACATCT bc_diffs=0) and importing again, this time in single-end format, because I had only removed the barcode information from the headers of one of the two fastq files. This time I got the following error:
There was a problem importing Gut_microbiota/Sequences/:
Gut_microbiota/MicrobiotaPerro/Sequences/OS253_S0_L001_R1_001.fastq.gz is not a(n) FastqGzFormat file:

Lowercase case sequence on line 42

I don't see anything strange in the lines around line 42.
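A quick way to see exactly what the validator is objecting to is to scan the file yourself. The sketch below is a minimal Python example, not part of QIIME 2. It assumes strict 4-line FASTQ records and flags sequence lines (the 2nd line of each record, so line 42 is the sequence line of the 11th record) that contain lowercase characters, as well as headers and separators that do not start with `@` and `+`. If editing the headers had accidentally broken the 4-line structure, the separator check would show where that starts. The script name in the usage comment is hypothetical.

```python
import gzip
import sys


def check_fastq_framing(lines):
    """Return (line_number, problem) tuples for a strict 4-line FASTQ stream.

    Line numbers are 1-based so they can be compared with the line
    reported in the import error. Only simple structural checks are done.
    """
    problems = []
    for line_number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        position = line_number % 4  # 1 = header, 2 = sequence, 3 = '+', 0 = quality
        if position == 1 and not line.startswith("@"):
            problems.append((line_number, "header does not start with '@'"))
        elif position == 2 and any(ch.islower() for ch in line):
            problems.append((line_number, "lowercase character in sequence"))
        elif position == 3 and not line.startswith("+"):
            problems.append((line_number, "separator does not start with '+'"))
    return problems


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_fastq.py OS253_S0_L001_R1_001.fastq.gz
    for path in sys.argv[1:]:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            for line_number, problem in check_fastq_framing(handle):
                print(f"{path}:{line_number}: {problem}")
```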

What can I do with these files? Where do I start?
Thank you in advance

jwdebelius · April 24, 2020, 3:06pm

Hi @victoriamesa,

This looks like a problem with your raw sequence files. I would start by contacting your sequencing provider, if you can reach them, and letting them know that you have sequences whose quality scores don't match the sequence length.
This is a red flag that something may have gone wrong during demultiplexing before you received the data, so you should check with them that the files you have aren't corrupted.

I would wait to hear from them before proceeding.
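To see how widespread the problem is before writing to the provider, you could list every record whose quality string is not the same length as its sequence. This is a minimal Python sketch, assuming plain 4-line FASTQ records with no wrapped lines. It reports the 1-based line on which each bad record begins, so the output can be compared with the line QIIME 2 reported (11450401). It only reads the file and never modifies it.

```python
import gzip
import sys


def find_length_mismatches(lines):
    """Return (start_line, sequence_length, quality_length) for each 4-line
    FASTQ record whose sequence and quality strings differ in length."""
    mismatches = []
    iterator = iter(lines)
    start_line = 1
    for _header in iterator:
        sequence = next(iterator, "").rstrip("\r\n")
        next(iterator, "")  # '+' separator line, not checked here
        quality = next(iterator, "").rstrip("\r\n")
        if len(sequence) != len(quality):
            mismatches.append((start_line, len(sequence), len(quality)))
        start_line += 4
    return mismatches


if __name__ == "__main__":
    for path in sys.argv[1:]:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            bad = find_length_mismatches(handle)
        print(f"{path}: {len(bad)} record(s) with mismatched lengths")
        for start_line, seq_len, qual_len in bad[:10]:
            print(f"  line {start_line}: sequence {seq_len} nt, quality {qual_len} chars")
```

A truncated final record (for example, a file cut off mid-record) also shows up here, as a record with a shorter or empty quality string.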

Best,
Justine

victoriamesa · April 25, 2020, 9:27pm

Hi @jwdebelius,
I really appreciate your help. Thank you so much.

I have contacted the sequencing provider but haven't had an answer yet.
In the meantime, I've been able to import the sequences in manifest format with the following command:

qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path Copia_se-33-manifest \
--output-path pair-end-demux2.qza \
--input-format PairedEndFastqManifestPhred33V2

Imported Copia_se-33-manifest as PairedEndFastqManifestPhred33V2 to pair-end-demux2.qza.
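For reference, the manifest behind a `PairedEndFastqManifestPhred33V2` import is a tab-separated file with the columns `sample-id`, `forward-absolute-filepath` and `reverse-absolute-filepath`. The Python sketch below builds one; it makes each path absolute and rejects duplicate sample IDs. The sample ID and file paths in the usage block are hypothetical placeholders.

```python
import csv
import os
import sys

HEADER = ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]


def build_manifest_rows(samples):
    """samples: iterable of (sample_id, forward_path, reverse_path) tuples."""
    rows = [HEADER]
    seen = set()
    for sample_id, forward, reverse in samples:
        if sample_id in seen:
            raise ValueError(f"duplicate sample ID: {sample_id}")
        seen.add(sample_id)
        # The column names require absolute paths, so resolve relative ones.
        rows.append([sample_id, os.path.abspath(forward), os.path.abspath(reverse)])
    return rows


def write_manifest(samples, handle):
    """Write the manifest as tab-separated text to an open text handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(build_manifest_rows(samples))


if __name__ == "__main__":
    # Hypothetical sample and paths, for illustration only; redirect stdout to a file.
    write_manifest(
        [("sample1", "reads/sample1_R1.fastq.gz", "reads/sample1_R2.fastq.gz")],
        sys.stdout,
    )
```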

But when I try to denoise with the following command, I get this error:

qiime dada2 denoise-paired \
--i-demultiplexed-seqs pair-end-demux2.qza \
--p-trim-left-f 13 \
--p-trim-left-r 13 \
--p-trunc-len-f 200 \
--p-trunc-len-r 200 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza \
--verbose

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /var/folders/j1/8tw8sl_17bgbhrlsc47l4xcr0000gp/T/tmpcyf1slci/forward /var/folders/j1/8tw8sl_17bgbhrlsc47l4xcr0000gp/T/tmpcyf1slci/reverse /var/folders/j1/8tw8sl_17bgbhrlsc47l4xcr0000gp/T/tmpcyf1slci/output.tsv.biom /var/folders/j1/8tw8sl_17bgbhrlsc47l4xcr0000gp/T/tmpcyf1slci/track.tsv /var/folders/j1/8tw8sl_17bgbhrlsc47l4xcr0000gp/T/tmpcyf1slci/filt_f /var/folders/j1/8tw8sl_17bgbhrlsc47l4xcr0000gp/T/tmpcyf1slci/filt_r 200 200 13 13 2.0 2.0 2 consensus 1.0 1 1000000

R version 3.5.1 (2018-07-02)
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.3 / RcppParallel: 4.4.4

Filtering
Error in (function (fn, fout, maxN = c(0, 0), truncQ = c(2, 2), truncLen = c(0, :
Mismatched forward and reverse sequence files: 78208, 77691.
Execution halted
Traceback (most recent call last):
File "/Users/victoriamesa/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 257, in denoise_paired
run_commands([cmd])
File "/Users/victoriamesa/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
subprocess.run(cmd, check=True)
File "/Users/victoriamesa/anaconda3/envs/qiime2-2020.2/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/var/folders/j1/8tw8sl_17bgbhrlsc47l4xcr0000gp/T/tmpcyf1slci/forward', '/var/folders/j1/8tw8sl_17bgbhrlsc47l4xcr0000gp/T/tmpcyf1slci/reverse', '/var/folders/j1/8tw8sl_17bgbhrlsc47l4xcr0000gp/T/tmpcyf1slci/output.tsv.biom', '/var/folders/j1/8tw8sl_17bgbhrlsc47l4xcr0000gp/T/tmpcyf1slci/track.tsv', '/var/folders/j1/8tw8sl_17bgbhrlsc47l4xcr0000gp/T/tmpcyf1slci/filt_f', '/var/folders/j1/8tw8sl_17bgbhrlsc47l4xcr0000gp/T/tmpcyf1slci/filt_r', '200', '200', '13', '13', '2.0', '2.0', '2', 'consensus', '1.0', '1', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/victoriamesa/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/q2cli/commands.py", line 328, in call
results = action(**arguments)
File "</Users/victoriamesa/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/decorator.py:decorator-gen-455>", line 2, in denoise_paired
File "/Users/victoriamesa/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
output_types, provenance)
File "/Users/victoriamesa/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in callable_executor
output_views = self._callable(**view_args)
File "/Users/victoriamesa/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 272, in denoise_paired
" and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Plugin error from dada2:

An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

See above for debug info.

I don't know what the next step should be.
Can you please give me some advice, and explain what this means: "Mismatched forward and reverse sequence files: 78208, 77691"?

jwdebelius · April 26, 2020, 4:18pm

Hi @victoriamesa,

This means that you have a different number of sequences in your forward and reverse files. It again points to problems with your files that you should discuss with your sequencing provider. They may need to re-demultiplex, or possibly re-sequence, so they're the best people to advise you. If you change the files yourself, you may compromise the integrity of your data.
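You can confirm the counts DADA2 reported by counting the records in each file of a pair and checking whether the read IDs still line up. This is a minimal Python sketch that only reports and never edits the files. It assumes 4-line FASTQ records and that mates share the first whitespace-delimited token of their header (optionally ending in `/1` or `/2`); that convention is common but not universal, so adjust `read_ids` to your headers.

```python
import gzip
import sys


def read_ids(lines):
    """Yield one read ID per 4-line FASTQ record: the first token of the
    header with the leading '@' and any trailing /1 or /2 removed."""
    for index, line in enumerate(lines):
        if index % 4 == 0:
            tokens = line.strip().split()
            read_id = tokens[0].lstrip("@") if tokens else ""
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id


def compare_pair(forward_lines, reverse_lines):
    """Return (n_forward, n_reverse, index of first mismatched pair or None)."""
    forward_ids = list(read_ids(forward_lines))
    reverse_ids = list(read_ids(reverse_lines))
    first_mismatch = None
    for index, (f_id, r_id) in enumerate(zip(forward_ids, reverse_ids)):
        if f_id != r_id:
            first_mismatch = index
            break
    return len(forward_ids), len(reverse_ids), first_mismatch


def open_text(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


if __name__ == "__main__" and len(sys.argv) == 3:
    with open_text(sys.argv[1]) as forward, open_text(sys.argv[2]) as reverse:
        n_f, n_r, first_bad = compare_pair(forward, reverse)
    print(f"forward: {n_f} reads, reverse: {n_r} reads, first mismatched pair: {first_bad}")
```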

Best,
Justine

victoriamesa · April 30, 2020, 10:34pm

Hi @jwdebelius

I have the original files, before demultiplexing.

I have successfully imported them with the following command:
qiime tools import \
--type MultiplexedPairedEndBarcodeInSequence \
--input-path MULTIPLEXED/ \
--output-path MULTIPLEXED/multiplexed-seqs.qza

Imported MULTIPLEXED/ as MultiplexedPairedEndBarcodeInSequenceDirFmt to MULTIPLEXED/multiplexed-seqs.qza

But I have some doubts about the next step, extracting the barcodes with qiime cutadapt demux-paired:
How do I know whether the barcodes are in the forward or the reverse reads?
I only have a list of the barcodes associated with each sample, plus the two fastq files.

Also, I tried searching for some of the barcodes in the files, and I only find a few matches in the sequences (around 6). Is that normal?

Thank you so much,
Best
Victoria

jwdebelius · May 1, 2020, 5:05pm

Hi @victoriamesa,

Personally, I would try both and see which gives the best results, especially if this is your first time with this protocol/machine/provider. I will say that it's hard to determine exactly what is correct for your specific data/protocol/provider, because everyone does something a bit differently.
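One cheap way to "try both" before running cutadapt is to count, for each barcode, how many reads in each file start with it, contain it anywhere, or contain its reverse complement. This is a minimal Python sketch; it only looks at the first 10,000 reads of each file (an arbitrary choice) and treats barcodes as plain uppercase ACGT strings. The usage line and script name are hypothetical; the barcode in the tests is the one quoted from the headers earlier in this thread. Very low counts in both files and both orientations would be worth raising with the provider.

```python
import gzip
import sys
from itertools import islice

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def sequence_lines(fastq_lines, max_reads=10_000):
    """Yield the sequence line of the first max_reads 4-line FASTQ records."""
    for index, line in enumerate(islice(fastq_lines, 4 * max_reads)):
        if index % 4 == 1:
            yield line.strip().upper()


def barcode_hits(sequences, barcodes):
    """Return {barcode: (starts_with, contains, contains_reverse_complement)}."""
    barcodes = [bc.strip().upper() for bc in barcodes]
    rcs = {bc: reverse_complement(bc) for bc in barcodes}
    counts = {bc: [0, 0, 0] for bc in barcodes}
    for seq in sequences:
        for bc in barcodes:
            if seq.startswith(bc):
                counts[bc][0] += 1
            if bc in seq:
                counts[bc][1] += 1
            if rcs[bc] in seq:
                counts[bc][2] += 1
    return {bc: tuple(c) for bc, c in counts.items()}


if __name__ == "__main__" and len(sys.argv) > 3:
    # Usage (hypothetical): python barcode_hits.py forward.fastq.gz reverse.fastq.gz BARCODE [BARCODE ...]
    barcodes = sys.argv[3:]
    for path in sys.argv[1:3]:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            hits = barcode_hits(sequence_lines(handle), barcodes)
        for bc, (starts, anywhere, revcomp) in hits.items():
            print(f"{path}\t{bc}\tstart={starts}\tanywhere={anywhere}\trevcomp={revcomp}")
```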

Best,
Justine

victoriamesa · May 3, 2020, 10:49pm

Hi @jwdebelius, thank you for your previous reply.
Could you review the procedure I have developed? I have some doubts about the results.

qiime cutadapt demux-paired \
--i-seqs multiplexed-seqs.qza \
--m-forward-barcodes-file metadata.tsv \
--m-forward-barcodes-column Barcode \
--p-error-rate 0 \
--o-per-sample-sequences demultiplexed-seqs.qza \
--o-untrimmed-sequences untrimmed.qza \
--verbose

It worked fine. Next, for the trim-paired (adapter) step: the only information I have is the LinkerPrimerSequence, ATACCATTAACACACGGTCGKCGGCGCCATT. Should I use the whole sequence, as follows?

qiime cutadapt trim-paired \
--i-demultiplexed-sequences demultiplexed-seqs.qza \
--p-front-f ATACCATTAACACACGGTCGKCGGCGCCATT \
--p-front-r ATACCATTAACACACGGTCGKCGGCGCCATT \
--p-error-rate 0 \
--o-trimmed-sequences trimmed-seqs2.qza \
--verbose

I got the following result: the total of 48 samples is correct, but are the frequencies correct?
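Whether the trimming step can find the primer at all can be checked directly. The `K` in this primer is an IUPAC ambiguity code (G or T), which cutadapt matches by default. This minimal Python sketch expands the IUPAC codes into a regular expression and reports the fraction of reads that begin with the primer, with no mismatches allowed (like `--p-error-rate 0`). It only tests the start of each read, whereas cutadapt's `--p-front-*` adapters are not anchored by default and can also be found further into the read. Running it on the reverse reads as well gives a rough idea of whether the same sequence belongs in `--p-front-r`. The script name is hypothetical.

```python
import gzip
import re
import sys

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

PRIMER = "ATACCATTAACACACGGTCGKCGGCGCCATT"


def primer_regex(primer):
    """Compile a regex that matches the primer, expanding IUPAC codes."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def fraction_starting_with(sequences, primer):
    """Fraction of sequences whose first bases match the primer exactly."""
    pattern = primer_regex(primer)
    total = hits = 0
    for seq in sequences:
        total += 1
        if pattern.match(seq.strip().upper()):
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Usage (hypothetical): python primer_check.py forward.fastq.gz reverse.fastq.gz
    for path in sys.argv[1:]:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            sequences = (line for i, line in enumerate(handle) if i % 4 == 1)
            print(f"{path}: {fraction_starting_with(sequences, PRIMER):.1%} of reads start with the primer")
```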

Then I continued with DADA2 denoising:

qiime dada2 denoise-paired \
--i-demultiplexed-seqs Gut_microbiota/MicrobiotaPerro/MULTIPLEXED/trimmed-seqs2.qza \
--p-trim-left-f 20 \
--p-trim-left-r 20 \
--p-trunc-len-f 200 \
--p-trunc-len-r 200 \
--p-n-threads 2 \
--o-table MULTIPLEXED/table.qza \
--o-representative-sequences MULTIPLEXED/rep-seqs.qza \
--o-denoising-stats MULTIPLEXED/denoising-stats.qza \
--verbose

With the following outputs:
denoising-stats.qzv

table.qzv

rep-seqs.qzv

Is this procedure adequate? Do the frequencies look strange?
Thank you in advance for your guidance

jwdebelius · May 4, 2020, 8:17pm

Hi @victoriamesa,

You may want to reconsider your quality-filtering parameters; it looks like you're losing a lot of sequences at the filtering step. You might want to truncate the sequences shorter, although your reads are generally of low quality.
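To see at which step the reads are lost, you can export the stats (for example with `qiime tools export --input-path denoising-stats.qza --output-path denoising-stats`) and compute per-sample retention. This Python sketch assumes the exported TSV has a header row containing the columns `sample-id`, `input`, `filtered` and `non-chimeric`, followed by a `#q2:types` line; that matches what q2-dada2 writes in the releases I know of, but check your own file's header first.

```python
import csv
import sys


def retention_report(lines):
    """Return {sample_id: (fraction_passing_filter, fraction_non_chimeric)}.

    Assumes the column names described above; extra columns are ignored.
    """
    data_lines = (line for line in lines if not line.startswith("#q2:"))
    report = {}
    for row in csv.DictReader(data_lines, delimiter="\t"):
        n_input = float(row["input"])
        if n_input == 0:
            report[row["sample-id"]] = (0.0, 0.0)
            continue
        report[row["sample-id"]] = (
            float(row["filtered"]) / n_input,
            float(row["non-chimeric"]) / n_input,
        )
    return report


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, newline="") as handle:
            for sample, (kept_filter, kept_final) in sorted(retention_report(handle).items()):
                print(f"{sample}\tpassed filter: {kept_filter:.1%}\tnon-chimeric: {kept_final:.1%}")
```

A low "passed filter" fraction points at the truncation and expected-error settings, while a large drop between filtering and the final count would point at merging or chimera removal instead.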

Best,
Justine.

victoriamesa · May 4, 2020, 8:46pm

Thank you @jwdebelius,

What parameters would you recommend?

jwdebelius · May 4, 2020, 9:14pm

Hi @victoriamesa,

You need to look at your data and consider what will work for you. There is no one correct answer. You want to balance sequence length against sequence quality. I would look at where you see large drop-offs in quality and use those to decide where to trim.
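One way to locate those drop-offs numerically, as a complement to the interactive quality plot from `qiime demux summarize`, is to compute the median Phred score at each position and take the first position where it falls below a threshold. This is a minimal Python sketch. The Phred+33 offset matches the `PairedEndFastqManifestPhred33V2` import used earlier in the thread, but the threshold of 25 and the 10,000-read sample are arbitrary assumptions to tune, not recommendations.

```python
import gzip
import sys
from itertools import islice
from statistics import median


def per_position_median_quality(quality_lines, offset=33):
    """Median Phred score at each read position (Phred+33 by default)."""
    columns = []
    for quality in quality_lines:
        for position, char in enumerate(quality.rstrip("\r\n")):
            if position == len(columns):
                columns.append([])
            columns[position].append(ord(char) - offset)
    return [median(column) for column in columns]


def suggest_trunc_len(medians, min_quality=25):
    """Number of leading positions whose median stays at or above min_quality."""
    for position, quality in enumerate(medians):
        if quality < min_quality:
            return position
    return len(medians)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            first_reads = islice(handle, 4 * 10_000)
            qualities = (line for i, line in enumerate(first_reads) if i % 4 == 3)
            medians = per_position_median_quality(qualities)
        print(f"{path}: candidate truncation length {suggest_trunc_len(medians)}")
```

The number printed is only a starting point for `--p-trunc-len-f` / `--p-trunc-len-r`; the forward and reverse files are analyzed separately, and the truncated reads still need enough overlap to merge.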

Best,
Justine

jwdebelius · May 10, 2020, 4:48pm

A post was split to a new topic: Vsearch Concensus is giving lots of unassigned reads

system · June 10, 2020, 10:48pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.