Sequence quality control and feature table construction: DADA2 commands giving error

bollergene · July 25, 2018, 6:42pm

I kindly need help as dada2 commands are giving me
'Error: --output-dir directory already exists, won't overwrite'

I used the below command to import my raw data
qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path /Users/Bollergene/Documents/Metagenomics_15072018/bolfatoyeyemi-manifest --output-path /Users/bollergene/qiime2-bolaji/paired-end-demux.qza --source-format PairedEndFastqManifestPhred33

I used the below command to summarize my demultiplexed sequences
qiime demux summarize --i-data /Users/bollergene/qiime2-bolaji/paired-end-demux.qza --o-visualization demux.qzv

But the command below gave me 'Error: --output-dir directory already exists, won't overwrite.'
qiime dada2 denoise-paired --i-demultiplexed-seqs /Users/bollergene/qiime2-bolaji/paired-end-demux.qza --o-table table.qza --o-representative-sequences rep-seqs.qza --p-trim-left-f 20 --p-trim-left-r 20 --p-trunc-len-f 300 --p-trunc-len-r 300 --verbose --output-dir /Users/bollergene/qiime2-bolaji

I have uploaded some of the output from the demultiplexed sequence.

Thank you in anticipation
Bolaji

Mehrbod_Estaki · July 25, 2018, 7:42pm

Hi @bollergene,

The key here is:

 'Error: --output-dir directory already exists, won't overwrite.'

This is telling you that the output directory qiime2-bolaji already exists, and qiime2 will not overwrite existing folders so as not to accidentally delete important files. It looks as though qiime2-bolaji is the directory you're working in, so simply add a new directory name at the end of your output path, for example:
--output-dir /Users/bollergene/qiime2-bolaji/denoised
Then all the output files will be written to the denoised folder.
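
A quick way to check this before starting a long run is to test whether the target path already exists. This is only a minimal shell sketch; `check_output_dir` is a hypothetical helper name, not part of QIIME 2:

```shell
#!/bin/sh
# Hypothetical helper (not a QIIME 2 command): succeeds only if nothing
# exists yet at the path you plan to pass to --output-dir.
check_output_dir() {
    if [ -e "$1" ]; then
        echo "exists: $1 (pick another name or remove it first)"
        return 1
    fi
    echo "free: $1"
}
```

For example, `check_output_dir /Users/bollergene/qiime2-bolaji/denoised` reports whether that folder is already there from an earlier attempt.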

bollergene · July 25, 2018, 8:29pm

Thank you. I added a new path (/Users/bollergene/qiime2-bolaji/denoised) as you suggested, but it is still giving the same error message.

Mehrbod_Estaki · July 25, 2018, 8:52pm

Hi @bollergene,

Oops, sorry, I just noticed that in your original command you've included both individual output paths (--o-table, --o-representative-sequences) and an output directory (--output-dir). I should have seen this before, sorry!
The error message is then probably related to the existing files table.qza and rep-seqs.qza from previous attempts. Note that if you include --output-dir, you don't need to define individual output names for the table and rep-seqs. So you can either delete those 2 artifacts from previous attempts, or leave out --o-table and --o-representative-sequences and just include --output-dir as above.
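
As a rough sketch of the second option, here is the original command with only --output-dir and no individual output names. It is written as a dry run that prints the command line rather than executing it; the paths and parameter values are the ones already posted in this thread, and the denoised folder is assumed not to exist yet:

```shell
#!/bin/sh
# Dry-run sketch: builds the command and prints it instead of running it.
DEMUX=/Users/bollergene/qiime2-bolaji/paired-end-demux.qza
OUTDIR=/Users/bollergene/qiime2-bolaji/denoised   # assumed not to exist yet

# Only --output-dir is given; --o-table and --o-representative-sequences
# are left out. Trim/truncation values are copied from the original command.
set -- qiime dada2 denoise-paired \
    --i-demultiplexed-seqs "$DEMUX" \
    --p-trim-left-f 20 --p-trim-left-r 20 \
    --p-trunc-len-f 300 --p-trunc-len-r 300 \
    --output-dir "$OUTDIR" --verbose

echo "$*"      # inspect the command line
# "$@"         # uncomment to actually run it (requires QIIME 2)
```

Printing first makes it easy to confirm that no --o-* flag slipped in alongside --output-dir.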

P.S. Unrelated to this thread, but I have concerns about your truncation values based on the quality plots you've posted. You may want to search this forum for topics on picking truncation values before you run your command.

bollergene · July 27, 2018, 6:25am

Dear Mehrbod

Thank you for being patient with me. The command below finally worked for me.

qiime dada2 denoise-paired --i-demultiplexed-seqs /Users/bollergene/qiime2-bolaji/paired-end-demux.qza --p-trim-left-f 20 --p-trim-left-r 20 --p-trunc-len-f 300 --p-trunc-len-r 300 --o-table table1.qza --o-representative-sequences rep-seqs1.qza --o-denoising-stats demonising-stats1.qza --verbose

Last three lines of the output were:
Saved FeatureTable[Frequency] to: table1.qza
Saved FeatureData[Sequence] to: rep-seqs1.qza
Saved SampleData[DADA2Stats] to: demonising-stats1.qza

I would like to ask for suggestions on my truncation values based on the quality plots (attached in the first post), because I realised that the forward reads are clearly of better quality than the reverse reads. I am thinking of using --p-trunc-len-f 270 and --p-trunc-len-r 230, which gives 500 nt in total after truncation. I think this will safely overlap and get rid of the worst-quality regions.
I will also adjust to --p-trim-left-f 60 and --p-trim-left-r 40.

Kindly suggest better values if need be.

Warm regards
Bolaji

Mehrbod_Estaki · July 27, 2018, 6:46am

Great! Glad that worked out @bollergene.

The truncation values depend on a few things. What primer set are you using? For paired-end data such as yours, we first need to know the expected length of the overlap region based on your primer set. A minimum of 20 bp overlap (plus whatever natural variation in length we expect) is required for merging. Next, you want to truncate before the quality scores in your plot start to dip low. For example, a good starting point is wherever the median score drops below 20.
If you leave those poor-quality tails as they are (as with your truncation values of 300), you will lose too many reads, because they will be discarded. The good news is that your reads look pretty good. You won't need to trim anything from the 5' end unless you are trying to remove primers/barcodes etc., but we'll certainly want to truncate from the 3' end a bit. Your proposed 270 and 230 values should be good as long as you have a minimum of 20 bp overlap.

bollergene · July 27, 2018, 9:15am

Thank you Mehrbod

I sequenced the V3 and V4 variable regions of the 16S rRNA gene, which form a single amplicon of approximately 460 bp.
Illumina adapter overhang nucleotide sequences are added to the gene-specific sequences. The full-length primer sequences for the protocol targeting this region, in standard IUPAC nucleotide nomenclature, are:
16S Amplicon PCR Forward Primer = 5' TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG
16S Amplicon PCR Reverse Primer = 5' GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC

Regards
Bolaji

Mehrbod_Estaki · July 27, 2018, 9:48am

Perfect. Then your proposed truncation parameters should be ok. Have a go with those; this should hopefully give you more reads than your first run. Good luck!
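
For a rough check of the overlap arithmetic, the sketch below subtracts the amplicon length from the sum of the two truncation lengths. It assumes the forward and reverse reads start at opposite ends of the amplicon and that the approximate 460 bp figure is the length they have to span; `expected_overlap` is a hypothetical helper name, and natural length variation will eat into the margin:

```shell
#!/bin/sh
# Hypothetical helper: estimated overlap in bp, computed as
#   forward truncation + reverse truncation - amplicon length.
# Assumes the reads start at opposite ends of the amplicon.
expected_overlap() {
    echo $(( $1 + $2 - $3 ))
}

# Values from this thread: trunc-len-f 270, trunc-len-r 230, ~460 bp amplicon
overlap=$(expected_overlap 270 230 460)
echo "expected overlap: ${overlap} bp"

if [ "$overlap" -ge 20 ]; then
    echo "meets the 20 bp minimum mentioned in this thread"
else
    echo "below the 20 bp minimum mentioned in this thread"
fi
```

With these numbers the estimate is 270 + 230 - 460 = 40 bp.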
