Ion Torrent data denoise with dada2

Hi @Marek_Koutny, can you please provide the visualization from demux summarize, as well as the commands you ran for processing through deblur? Please include any intermediate commands, too. Thanks! :t_rex:

Hello @thermokarst

I am trying something similar to @Marek_Koutny, but I imported my data using --type 'SampleData[SequencesWithQuality]' and a manifest file.

I used that type because I do not have paired-end sequences, so I assume they are not pre-joined (there is nothing to join). In this case, can I use the DADA2 single-end option, or is it more appropriate to denoise with deblur?

My other question is whether I should use deblur, or whether I can use vsearch open-reference OTU picking directly after quality filtering. I ask because I do not want to trim my sequences, and I can filter by number of reads or abundance later.

Thank you very much for your help in advance!
MMC

Hi @MMC_northS!

If they are single-end but were pre-joined during the Ion Torrent processing step, you will need to import them as SampleData[JoinedSequencesWithQuality]; otherwise you can import them as SampleData[SequencesWithQuality]. We have no way of knowing what kind of data your reads represent, so this is something you will need to understand before importing.

You are good to go either direction. Just to clarify though, there is no requirement that you trim your reads, per se - it’s just that deblur expects your reads to be a consistent length. Depending on the sequencing technology used, you can get this without any trimming.
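Whether a set of reads already has a consistent length can be checked by tallying read lengths in one of the per-sample FASTQ files. The following is a minimal Python sketch, assuming a standard four-line-per-record FASTQ file; the helper name `read_length_counts` is hypothetical and not part of QIIME 2, DADA2, or deblur.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_path):
    """Return a Counter mapping read length -> number of reads.

    Hypothetical helper (not part of QIIME 2). Assumes a standard
    four-line-per-record FASTQ file; paths ending in .gz are read with gzip.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of every four-line record.
            if line_number % 4 == 1:
                counts[len(line.strip())] += 1
    return counts


# Example usage (the file name is a placeholder):
#   for length, n_reads in sorted(read_length_counts("sample1.fastq").items()):
#       print(length, n_reads)
```

If the tally shows a single length, the reads are already consistent; a wide spread of lengths is the situation the trimming parameters exist for.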

Hope that helps! :t_rex:

Hi @thermokarst

Thanks for your answers!

I understand that my data are not pre-joined. I used a single-end kit for the PGM, and the sequencer, after applying its quality filters, did the demultiplexing, cutting adapters and barcodes and giving us one FASTQ file per sample. From that processing I understand that my sequences were not pre-joined, so I used SampleData[SequencesWithQuality]. Is this right?

Regarding the deblur (or DADA2) analyses, I do not understand what you mean by “consistent length”. I understand you are talking about the trimming parameters in those analyses, but I would prefer not to cut my sequences. In that case, could I use the dereplication step (vsearch plugin) and then go directly to open-reference OTU picking?

Thank you so much for your clarifications!
MMC

Yes (please see my answer above for more details).

Some sequencing technologies/amplicons/primers/etc. produce consistent-length sequences (as in, all sequences are the same length), while others don’t. As I mentioned above, there is no strict requirement for DADA2 or deblur to trim your sequences (“there is no requirement that you trim your reads”), but oftentimes people need to in order to process their data with those tools (because sequence lengths can be so variable, as I just mentioned). So, you don’t have to trim, but those parameters are there for people who need them. I highly recommend you spend some more time reading about DADA2, deblur, and the QIIME 2 docs; all of your questions have answers in those documents.

This is unrelated to trimming - you can use vsearch cluster-features-open-reference regardless of whether you want to trim your sequences or not. I answered this above when I said “You are good to go either direction”; my apologies if that wasn’t clear.

Hello @THERMOKARST and others. Sorry for not responding for some time. I have gotten somewhat further. Now I have my taxonomy assigned and I can explore my results. Great! The main issue I had to solve was that my data still contained barcodes (I was told that they had been trimmed). I found this by visualizing the sequences with Jalview, which is a great tool.
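Leftover barcodes of this kind can also be spotted without an alignment viewer by tallying how the reads in a single demultiplexed sample begin. The following is a rough Python sketch, not a QIIME 2 feature: the function names and the default prefix length `k=10` are assumptions chosen for illustration, and a dominant prefix could equally be an untrimmed primer rather than a barcode.

```python
from collections import Counter


def leading_kmer_counts(sequences, k=10):
    """Count the first k bases of every sequence that is at least k long."""
    return Counter(seq[:k].upper() for seq in sequences if len(seq) >= k)


def most_common_prefix_fraction(sequences, k=10):
    """Return (prefix, fraction of counted reads that start with it).

    Returns (None, 0.0) when no sequence is at least k bases long.
    """
    counts = leading_kmer_counts(sequences, k)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    prefix, n_reads = counts.most_common(1)[0]
    return prefix, n_reads / total
```

If nearly every read in a sample shares the same leading bases, it is worth comparing that prefix against the barcode and primer sequences used in the run before denoising.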
I had to use the Greengenes classifier; both of the provided SILVA classifiers produced errors, and the provided Atacama data actually gave the same error with SILVA.
Details:
(qiime2-2018.2) [email protected]:~/AtacamaSoil$ qiime feature-classifier classify-sklearn \
  --i-classifier silva-119-99-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza
Plugin error from feature-classifier:

Debug info has been saved to /tmp/qiime2-q2cli-err-va152nns.log
File content:
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2cli/commands.py", line 246, in __call__
    results = action(**arguments)
  File "", line 2, in classify_sklearn
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 222, in bound_callable
    spec.view_type, recorder)
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 261, in _view
    result = transformation(self._archiver.data_dir)
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/core/transform.py", line 59, in transformation
    new_view = transformer(view)
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/_taxonomic_classifier.py", line 72, in _1
    pipeline = joblib.load(os.path.join(dirname, 'sklearn_pipeline.pkl'))
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 578, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 508, in _unpickle
    obj = unpickler.load()
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/pickle.py", line 1043, in load
    dispatch[key[0]](self)
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 341, in load_build
    self.stack.append(array_wrapper.read(self))
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 184, in read
    array = self.read_array(unpickler)
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 130, in read_array
    array = unpickler.np.empty(count, dtype=self.dtype)
MemoryError

So far, exploring my results, I am surprised and a bit disappointed that they do not match my previous analysis. As my samples come from the same locations, I would expect the general pattern of microorganisms to be identical. However, it seems that there is no such pattern. I think it must be some artifact of the analysis or data processing. Do you have any explanation?

Hi @Marek_Koutny!

We can’t offer any hypotheses about any differences you might be experiencing or observing unless you provide us with more details about what you did previously and what you are doing now.

This error is because you do not have enough RAM (memory) available in your computing environment. Can you provide some details about your computing environment? Thanks.

Hello. I understand that you cannot know what is happening. We analysed the same DNA samples by DGGE and got a similar basic pattern of bands in every sample. DGGE should over-represent the most abundant sequences. I expected the same with NGS. This is my first experience with NGS.

As for the error described, I run QIIME 2 in VirtualBox. The error occurred with your suggested setting (2GB RAM). After your hint that low RAM could be the problem, I increased the RAM to 5GB (the maximum I can allocate). The error occurred again with both of the provided SILVA classifiers.

If you need any further data to find the solution, I will provide it immediately.

Marek

Hi @Marek_Koutny,
Are you using the same PCR primers for sequencing as you used for DGGE? I agree with you that the general patterns should be consistent between the two approaches, but one reason for a difference could be differences in the PCR primers. Each pair of primers will have biases for different microbial taxa, so you’ll get a different view of the community depending on which primers you’re using. Also, at what taxonomic level are you seeing a lot of differences? At the species level, a lot of differences between the approaches isn’t very surprising, as sequencing 16S doesn’t give very accurate species-level assignments (I think the same is true of DGGE, but I know less about DGGE) - this pre-print has some information on this. At the family level, for example, I’d expect to see a lot more similarity in the profiles derived from DGGE and 16S sequencing.

Regarding your memory error, unfortunately the only way around this will be to run the analysis on a system with more memory. I would recommend trying for 16GB (some discussion of this occurred in this topic).
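Before retrying, it can help to confirm how much memory the environment actually sees, for example inside a virtual machine after changing its RAM setting. The following is a small Python sketch, assuming a POSIX system such as Linux where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`; the helper names are hypothetical, and the 16 GiB threshold in the usage comment only mirrors the recommendation above. Reported totals are usually slightly below the nominal figure, so treat the comparison as approximate.

```python
import os

GIB = 1024 ** 3  # bytes per gibibyte


def total_ram_bytes():
    """Total physical memory in bytes (assumes a POSIX system such as Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_at_least(total_bytes, required_gib):
    """True if total_bytes is at least required_gib gibibytes."""
    return total_bytes >= required_gib * GIB


# Example usage:
#   total = total_ram_bytes()
#   print("Total RAM: %.1f GiB" % (total / GIB))
#   print("At least 16 GiB:", has_at_least(total, 16))
```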
