Ion Torrent data denoise with dada2

Marek_Koutny · February 27, 2018, 3:03pm

Hello qiime2 team,

I was happy to import my Ion Torrent data with the help of this post:

and obtained a single-end-demux.qza artifact file.

But now I cannot get past the dada2 step:
qiime dada2 denoise-single \
 --i-demultiplexed-seqs single-end-demux.qza \
 --p-trim-left 15 \
 --p-trunc-len 250 \
 --o-representative-sequences rep-seqs.qza \
 --o-table table.qza

It does not like my data:
Plugin error from dada2:

Argument to parameter 'demultiplexed_seqs' is not a subtype of SampleData[PairedEndSequencesWithQuality | SequencesWithQuality].

Debug info has been saved to /tmp/qiime2-q2cli-err-8hez8tae.log

There is not much help in the log file. It says basically the same.

Thanks,
Regards,
Marek

colinbrislawn · February 27, 2018, 7:41pm

Hello Marek,

Thanks for posting all of this information. I think we are pretty close to figuring this out.

In Matt's post on that other thread, he suggests

qiime tools import \
 --type 'SampleData[JoinedSequencesWithQuality]' \
 --input-path se-33-manifest.csv \
 --output-path single-end-demux.qza \
 --source-format SingleEndFastqManifestPhred33

Did you use this command, or have you tried using another --type?

Colin

thermokarst · February 27, 2018, 7:50pm

Hi @Marek_Koutny, to elaborate a bit on @colinbrislawn's suggestion, what format are your reads in, and how did you import them? Please copy and paste your import command here. Also, what are the results when you run:

qiime tools peek single-end-demux.qza

That error message you posted means that the file single-end-demux.qza doesn't contain SampleData[PairedEndSequencesWithQuality] or SampleData[SequencesWithQuality], which leaves @colinbrislawn's suggestion that these data are SampleData[JoinedSequencesWithQuality]. If that is the case (and the peek command above will confirm that for us), you shouldn't use DADA2, because that technique assumes that your paired-end reads aren't pre-joined. You can look at using q2-deblur instead; check out this tutorial for more info. Thanks!

Marek_Koutny · February 27, 2018, 7:51pm

Hi Colin,

Thanks for answering. I used exactly the command you quoted.

Marek

Marek_Koutny · February 28, 2018, 12:27pm

Hi Matthew,

Thanks for helping.

The result of peek is:
qiime tools peek single-end-demux.qza
UUID: 88af3b88-0836-4069-96bc-8dc7a51f28c3
Type: SampleData[JoinedSequencesWithQuality]
Data format: SingleLanePerSampleSingleEndFastqDirFmt

and I have used this for import:
qiime tools import \
 --type 'SampleData[JoinedSequencesWithQuality]' \
 --input-path manifest.csv \
 --output-path single-end-demux.qza \
 --source-format SingleEndFastqManifestPhred33

I assume I have pre-joined data.
I constructed the manifest as recommended, with "forward" at the end of the line for all fastq files.

Regards,

Marek

thermokarst · February 28, 2018, 12:30pm

Hi @Marek_Koutny --- cool, this all makes sense, it looks like you had pre-joined reads, and you imported them as SampleData[JoinedSequencesWithQuality]. So as I mentioned before, you won't be able to work with DADA2, because it operates on paired-end reads before they are joined, but you have a few other analysis options available to you. Please check out the tutorial I linked to above. Keep us posted!
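If it helps to check this up front, here is a minimal sketch (not part of QIIME 2) that reads the Type line from saved `qiime tools peek` output and compares it with the two types listed in the dada2 error message quoted earlier in this thread. The helper names `peek_type` and `denoise_single_accepts` are made up for illustration, and the accepted set only reflects that one error message from the 2018.2 release.

```python
import re

# Types that `dada2 denoise-single` listed as valid in the error message
# quoted earlier in this thread (assumption: QIIME 2 2018.2 only).
DENOISE_SINGLE_ACCEPTS = {
    "SampleData[PairedEndSequencesWithQuality]",
    "SampleData[SequencesWithQuality]",
}


def peek_type(peek_output: str) -> str:
    """Return the value of the 'Type:' line from `qiime tools peek` output."""
    match = re.search(r"^Type:\s*(\S+)\s*$", peek_output, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Type:' line found in peek output")
    return match.group(1)


def denoise_single_accepts(peek_output: str) -> bool:
    """True if the artifact type is one the error message listed as valid."""
    return peek_type(peek_output) in DENOISE_SINGLE_ACCEPTS


# The peek output posted above in this thread.
example = """UUID: 88af3b88-0836-4069-96bc-8dc7a51f28c3
Type: SampleData[JoinedSequencesWithQuality]
Data format: SingleLanePerSampleSingleEndFastqDirFmt
"""
print(peek_type(example))
print(denoise_single_accepts(example))
```

For the artifact in this thread the check comes back negative, which matches the plugin error.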

Marek_Koutny · March 6, 2018, 3:04pm

Hi Matthew. With your help I have gotten a little bit further. I was able to get my data through deblur and obtain a feature table and a rep-seqs file. I still have several issues. In the feature table I can see that each of my feature sequences occurs in one sample only, which is weird, because these samples are from the same soil at different times. Please, can I create a table with feature sequences, frequencies and sample numbers? I cannot see which feature was found in which sample. Thank you, Marek.

thermokarst · March 6, 2018, 10:11pm

Hi @Marek_Koutny, can you please provide the visualization from demux summarize, as well as the commands you ran for processing through deblur? Please include any intermediate commands, too. Thanks!
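In the meantime, one way to see which feature was found in which sample is to turn the feature table into a long list of (feature, sample, count) rows. The sketch below is only an illustration in plain Python: it assumes the table has already been exported to the tab-separated text layout that `biom convert --to-tsv` writes (a comment line, then a `#OTU ID` header row), and the function names and toy values are made up.

```python
import csv
import io
from collections import Counter


def long_format(tsv_text):
    """Yield (feature_id, sample_id, count) for every non-zero table cell."""
    sample_ids = None
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        if row[0].startswith("#"):
            # A '#' line with several tab-separated fields is the header row;
            # single-field '#' lines are plain comments and are skipped.
            if len(row) > 1:
                sample_ids = row[1:]
            continue
        if sample_ids is None:
            raise ValueError("no header row found before the data rows")
        for sample_id, value in zip(sample_ids, row[1:]):
            count = float(value)
            if count > 0:
                yield row[0], sample_id, count


def samples_per_feature(tsv_text):
    """Count in how many samples each feature has a non-zero frequency."""
    return Counter(feature for feature, _, _ in long_format(tsv_text))


# Made-up toy table, not real data.
toy = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\n"
    "featA\t10.0\t3.0\n"
    "featB\t0.0\t7.0\n"
)
for record in long_format(toy):
    print(record)
print(dict(samples_per_feature(toy)))
```

If every feature shows up in exactly one sample in the second summary, that points at something systematic in the reads rather than at the biology.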

MMC_northS · March 13, 2018, 9:58am

Hello @thermokarst

I am trying something similar to @Marek_Koutny, but I imported my data using --type 'SampleData[SequencesWithQuality]' and a manifest file.

I used that type because I do not have paired-end sequences, so I assume that they are not pre-joined (there is nothing to join). In this case, can I use the single-end option of dada2, or is it more appropriate to denoise with deblur?

My other question is whether I should use deblur, or whether I can use vsearch with open-reference OTU picking directly after quality filtering. I do not want to trim my sequences, and I can filter by number of reads or abundance later.

Thank you very much for your help in advance!
MMC

thermokarst · March 13, 2018, 9:43pm

Hi @MMC_northS!

If they are single end, but pre-joined from the Ion Torrent processing step, you will need to import them as SampleData[JoinedSequencesWithQuality]; otherwise you can import them as SampleData[SequencesWithQuality] --- we have no way of knowing what kind of data your reads represent, so this is something you will need to understand before importing.

You are good to go either direction. Just to clarify though, there is no requirement that you trim your reads, per se - it's just that deblur expects your reads to be a consistent length. Depending on the sequencing technology used, you can get this without any trimming.
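To see whether your reads already have a consistent length, you could tabulate the read lengths in each per-sample fastq file. This is a rough sketch, assuming strict four-line FASTQ records (no wrapped sequence lines); the function names are just illustrative and not part of any QIIME 2 API.

```python
import gzip
from collections import Counter
from pathlib import Path


def lengths_from_lines(lines):
    """Count read lengths, assuming strict 4-line FASTQ records."""
    lengths = Counter()
    for line_number, line in enumerate(lines):
        if line_number % 4 == 1:  # the sequence is the 2nd line of each record
            lengths[len(line.strip())] += 1
    return lengths


def read_lengths(fastq_path):
    """Count read lengths in one FASTQ file (plain text or gzip-compressed)."""
    path = Path(fastq_path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return lengths_from_lines(handle)


# Made-up toy records, not real data.
toy = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGTAC", "+", "IIIIII"]
print(sorted(lengths_from_lines(toy).items()))
```

A single dominant length means little or no trimming is needed; a broad spread means you will have to pick a length and accept losing the shorter reads.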

Hope that helps!

MMC_northS · March 14, 2018, 9:36am

Hi @thermokarst

Thanks for your answers!

I understand that my data are not pre-joined. I used a single-end kit for the PGM, and after its quality filters the sequencer did the demultiplexing, cutting adapters and barcodes and giving us one fastq file per sample. I understand from that processing that my sequences were not pre-joined, so I used SampleData[SequencesWithQuality]. Is this right?

Regarding the deblur (or dada2) analyses, I do not understand what you mean by "consistent length". I understand you are speaking about the trimming parameters in those analyses, but I would prefer not to cut my sequences. In that case, could I use the dereplication step (vsearch plugin) and then go directly to open-reference OTU picking?

Thank you so much for your clarifications!
MMC

thermokarst · March 15, 2018, 1:03am

Yes (please see my answer above for more details).

Some sequencing technologies/amplicons/primers/etc produce consistent length sequences (as in, all sequences are the same length), while some technologies/amplicons/primers/etc don't produce consistent length sequences. As I mentioned above - there is no strict requirement for DADA2 or deblur to trim your sequences ("there is no requirement that you trim your reads"), but oftentimes people need to in order to process using those tools (because sequence lengths can be so variable, as I just mentioned). So, you don't have to trim, but those parameters are there for people that need it. I highly recommend you spend some more time reading about DADA2, deblur, and the QIIME 2 docs --- all of your questions have answers in those documents.

This is unrelated to trimming - you can use vsearch cluster-features-open-reference regardless of whether you want to trim your sequences or not. I answered this above when I said "You are good to go either direction," my apologies if that wasn't clear.
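Going back to the trimming point, here is a toy sketch of what trimming to a fixed length amounts to: reads at least that long are cut down to it, and shorter reads are dropped. This is my understanding of how a fixed trim length behaves in deblur, written as plain Python for illustration. It is not the deblur code, and the function names are made up.

```python
def trim_to_length(reads, trim_length):
    """Truncate reads to trim_length bases and drop any that are shorter."""
    return [seq[:trim_length] for seq in reads if len(seq) >= trim_length]


def retained_fraction(reads, trim_length):
    """Fraction of reads that survive trimming at trim_length."""
    if not reads:
        return 0.0
    return len(trim_to_length(reads, trim_length)) / len(reads)


toy_reads = ["ACGTACGT", "ACGTAC", "ACG"]  # made-up sequences
print(trim_to_length(toy_reads, 6))
print(retained_fraction(toy_reads, 6))
```

Trying a few candidate lengths this way shows the trade-off between keeping longer sequences and keeping more reads.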

Marek_Koutny · March 15, 2018, 7:19am

Hello @thermokarst and others. Sorry for not responding for some time. I have gotten somewhat further. Now I have my taxonomy assigned and I can explore my results. Great! The main issue I had to solve was that my data still contained barcodes (I was told that they had been trimmed). I found this by visualizing the sequences with Jalview, which is a great tool.
I had to use the Greengenes classifier, because both of the provided Silva classifiers produced errors; the provided Atacama data gave the same error with Silva.
Details:
(qiime2-2018.2) qiime2@qiime2core2018-2:~/AtacamaSoil$ qiime feature-classifier classify-sklearn \
 --i-classifier silva-119-99-nb-classifier.qza \
 --i-reads rep-seqs.qza \
 --o-classification taxonomy.qza
Plugin error from feature-classifier:

Debug info has been saved to /tmp/qiime2-q2cli-err-va152nns.log
File content:
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2cli/commands.py", line 246, in call
    results = action(**arguments)
  File "", line 2, in classify_sklearn
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 222, in bound_callable
    spec.view_type, recorder)
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 261, in _view
    result = transformation(self._archiver.data_dir)
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/core/transform.py", line 59, in transformation
    new_view = transformer(view)
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/_taxonomic_classifier.py", line 72, in _1
    pipeline = joblib.load(os.path.join(dirname, 'sklearn_pipeline.pkl'))
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 578, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 508, in _unpickle
    obj = unpickler.load()
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/pickle.py", line 1043, in load
    dispatchkey[0]
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 341, in load_build
    self.stack.append(array_wrapper.read(self))
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 184, in read
    array = self.read_array(unpickler)
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 130, in read_array
    array = unpickler.np.empty(count, dtype=self.dtype)
MemoryError
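
For anyone hitting the same traceback: it ends in a MemoryError raised by np.empty while joblib is loading the classifier, which suggests the process could not allocate enough RAM for it. A minimal sketch for checking how much physical memory the machine (or virtual machine) has, assuming a Linux-like system where os.sysconf knows SC_PAGE_SIZE and SC_PHYS_PAGES. The 8 GiB figure in the example is a hypothetical placeholder, not a documented requirement of any classifier.

```python
import os


def total_memory_gib():
    """Total physical RAM in GiB, read via os.sysconf (Linux-like systems)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1024 ** 3


def enough_memory(total_gib, needed_gib):
    """True if the machine has at least needed_gib of physical RAM."""
    return total_gib >= needed_gib


if hasattr(os, "sysconf"):
    total = total_memory_gib()
    # 8 is a hypothetical threshold chosen only for this example.
    print(f"{total:.1f} GiB total, enough for 8 GiB: {enough_memory(total, 8)}")
```

If the total is small, giving the virtual machine more memory or using a smaller classifier are the obvious things to try.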

Exploring my results so far, I am surprised and a bit disappointed that they do not match my previous analysis. As my samples come from the same locations, I would expect the general pattern of microorganisms to be the same. However, it seems that there is no such pattern. I think it must be some artifact of the analysis or data processing. Please, do you have an explanation?