Pyrosequence work in Qiime2 (Barcode)

Hello, experts.

I apologize for missing the reply to my last question. Sorry about that. Here is the link for reference: Pyrosequence work in Qiime2.

As mentioned in my earlier question, "I followed this amazing tutorial (analyzing 454 data in QIIME 2), and while applying it to my data, I encountered a problem with cutadapt."

qiime cutadapt demux-single \
--i-seqs seqs.qza \
--m-barcodes-file mappingfile.txt \
--m-barcodes-column BarcodeSequence \
--o-per-sample-sequences demux.qza \
--o-untrimmed-sequences unassigned.qza

Plugin error from cutadapt:

The following samples have duplicate barcode: 504.3

Debug info has been saved to /var/folders/bd/kdj7l_yx1zvf2hfp96yc8s4w0000gn/T/qiime2-q2cli-err-5b775c6e.log

When we ran the 454 pyrosequencing, the same barcode was used for all runs.

When we used QIIME 1, we used the mapping file 'mapping_file_qiime1.txt'.

I reviewed the tutorial dataset (analyzing 454 data in QIIME 2), and noticed that all the sample barcodes were different.

Therefore, my question is: How do I deal with a 454 pyrosequencing dataset in QIIME 2 for Cutadapt, where all samples have the same barcode?

Many thanks

mapping_file_qiime1.txt (216 Bytes)
mapping_file.txt (149 Bytes)
Fasting_Map.txt (982 Bytes)

If the same set of barcodes were used across multiple runs, you can demultiplex each run by itself, then merge all runs together.

It looks like ACACT was used for both samples in this data set.

I have a few questions:

  • how many samples does your full cohort have?
  • Are samples 504.2 and 504.3 in the same file or two separate files?
  • Were these barcodes used on the same 454 run or different runs?

Thank you for helping me understand your experiment!

Thank you for the reply, @colinbrislawn!
Okay, this is just a practice set with 2 samples.
I have 48 samples, and each sample has its own FASTA and quality file.
I had merged them all together using cat.
Regarding "Were these barcodes used on the same 454 run or different runs?": this was done many years ago and I wasn't in charge at the time, so I'm not sure, but I think it may have been the same run.

Many thanks

Perfect! Two samples is good for testing.

Good. You can combine each sample's fasta and qual files into a fastq file. Then import those using the FASTQ manifest format:
https://docs.qiime2.org/2024.2/tutorials/importing/#fastq-manifest-formats
(Note: we are separating samples by file, so the barcodes don't matter.)

If all samples have the same barcode and all samples have been combined into one file, it's too late: you can't tell them apart anymore!
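
For reference, the per-sample manifest import could look something like the sketch below. It assumes one FASTQ file per sample in a directory called fastq/, with each file named after its sample ID (for example 504.2.fastq). The directory name, the file naming, and the output name single-end-demux.qza are assumptions on my part, and the two tiny placeholder reads exist only so the script runs on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed layout (hypothetical): fastq/<sample-id>.fastq, one file per sample.
# The two placeholder files below only make this sketch runnable by itself.
fastq_dir="fastq"
mkdir -p "$fastq_dir"
printf '@r1\nACGT\n+\nIIII\n' > "$fastq_dir/504.2.fastq"
printf '@r1\nTGCA\n+\nIIII\n' > "$fastq_dir/504.3.fastq"

# The manifest needs absolute paths, so resolve the directory once.
abs_dir="$(cd "$fastq_dir" && pwd)"

# Write the manifest: a header line, then one tab-separated row per sample.
manifest="manifest.tsv"
printf 'sample-id\tabsolute-filepath\n' > "$manifest"
for fq in "$fastq_dir"/*.fastq; do
    sample_id="$(basename "$fq" .fastq)"
    printf '%s\t%s\n' "$sample_id" "$abs_dir/$(basename "$fq")" >> "$manifest"
done

# Import only if QIIME 2 is available in the current environment.
if command -v qiime >/dev/null 2>&1; then
    qiime tools import \
        --type 'SampleData[SequencesWithQuality]' \
        --input-path "$manifest" \
        --input-format SingleEndFastqManifestPhred33V2 \
        --output-path single-end-demux.qza
fi
```

Because the sample ID comes from the file name, no barcode column is involved at any point.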

Many thanks @colinbrislawn !

As I understand it, the steps are:

  1. Use 'convert_fastaqual_fastq.py' to create a FASTQ file.
  2. Import it using 'SingleEndFastqManifestPhred33V2'.
  3. Use Cutadapt to remove the primers.
  4. Denoise with 'dada2-denoise-single'.

Is that correct?

Thank you for your support !

Yes! Here are some things to consider:

For step 1: QIIME 1 is deprecated, so I don't recommend using it (even if it works!)

Consider: fastaq fasta_to_fastq from https://github.com/sanger-pathogens/Fastaq
Consider: reformat.sh in=reads.fa qfin=reads.qual out=reads.fq from BBTools (see the Reformat Guide, DOE Joint Genome Institute)
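
With 48 samples, a small loop saves some typing. Here is a rough sketch using the reformat.sh call above. It assumes the files sit in a directory called raw/ as <sample>.fasta and <sample>.qual pairs and that output goes to fastq/; both names are hypothetical. It prints each command and only runs it if reformat.sh is on your PATH.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed layout (hypothetical): raw/<sample>.fasta next to raw/<sample>.qual.
raw_dir="raw"
out_dir="fastq"
mkdir -p "$raw_dir" "$out_dir"

# Convert one sample: print the command, and run it only if BBTools is installed.
convert_sample() {
    local fasta="$1"
    local sample qual
    sample="$(basename "$fasta" .fasta)"
    qual="$raw_dir/$sample.qual"
    if [ ! -f "$qual" ]; then
        echo "skipping $sample: $qual not found" >&2
        return 0
    fi
    local cmd=(reformat.sh "in=$fasta" "qfin=$qual" "out=$out_dir/$sample.fastq")
    echo "${cmd[*]}"
    if command -v reformat.sh >/dev/null 2>&1; then
        "${cmd[@]}"
    fi
}

# nullglob makes the loop a no-op when no FASTA files are present.
shopt -s nullglob
for fasta in "$raw_dir"/*.fasta; do
    convert_sample "$fasta"
done
```

Keeping one FASTQ per sample here is what lets the manifest import tell the samples apart later.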

For step 4, use qiime dada2 denoise-pyro for 454 data (denoise-single is for Illumina).
https://docs.qiime2.org/2024.2/plugins/available/dada2/denoise-pyro/
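
A minimal sketch of the denoise-pyro call, in case it helps. The input name primer_trimmed.qza, the output names, the thread count, and especially the --p-trunc-len value are placeholders that I am assuming here; pick the truncation length from your own quality plots. The script prints the command and only runs it when QIIME 2 and the input artifact are actually present.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values (assumptions): adjust them to your own data.
input="primer_trimmed.qza"
trunc_len=0    # 0 turns off truncation and length filtering
threads=4

cmd=(qiime dada2 denoise-pyro
     --i-demultiplexed-seqs "$input"
     --p-trunc-len "$trunc_len"
     --p-n-threads "$threads"
     --o-table table.qza
     --o-representative-sequences rep-seqs.qza
     --o-denoising-stats denoising-stats.qza)

# Show the command; run it only when QIIME 2 and the input artifact exist.
echo "${cmd[*]}"
if command -v qiime >/dev/null 2>&1 && [ -f "$input" ]; then
    "${cmd[@]}"
fi
```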

As you suggested, I used fastaq.
I have another question regarding the pyrosequencing analysis of bacteria, archaea, and protozoa primers using the 454 Titanium FLX system (Roche).
If I initially use cutadapt to remove the bacteria primers, should the output files then serve as the input for cutadapt with the next set of primers?

For example, I would first remove the bacteria primers and then remove the archaea primers using the following commands.

qiime cutadapt trim-single \
--i-demultiplexed-sequences single-end-demux.qza \
--p-front GAGTTTGATCMTGGCTCAG \
--p-no-match-read-wildcards \
--p-discard-untrimmed \
--p-match-adapter-wildcards \
--o-trimmed-sequences primer_trimmed.qza \
--p-cores 32 \
--verbose

qiime cutadapt trim-single \
--i-demultiplexed-sequences primer_trimmed.qza \
--p-front AGGAATTGGCGGGGGAGCAC \
--p-no-match-read-wildcards \
--p-discard-untrimmed \
--p-match-adapter-wildcards \
--o-trimmed-sequences 2_primer_trimmed.qza \
--p-cores 32 \
--verbose

Or should I do 3 different runs (bacteria, archaea, and protozoa)?

Yes, you can run cutadapt multiple times in a row!
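
One thing to watch when chaining: with --p-discard-untrimmed, each pass keeps only the reads in which its primer was found, so a second pass on primer_trimmed.qza only sees reads that already matched the first primer. If the amplicons are mixed in the same files, running each primer set separately against the original artifact may be closer to what you want. A rough sketch of that variant, assuming single-end-demux.qza as the input; the output names like bacteria_trimmed.qza are hypothetical, and the commands only run when QIIME 2 and the input are present.

```shell
#!/usr/bin/env bash
set -euo pipefail

input="single-end-demux.qza"

# Build the trim-single command for one primer set into the trim_cmd array.
# Output names such as bacteria_trimmed.qza are hypothetical.
build_trim_cmd() {
    local name="$1" primer="$2"
    trim_cmd=(qiime cutadapt trim-single
              --i-demultiplexed-sequences "$input"
              --p-front "$primer"
              --p-match-adapter-wildcards
              --p-no-match-read-wildcards
              --p-discard-untrimmed
              --o-trimmed-sequences "${name}_trimmed.qza")
}

# The two primers are the ones from the commands above; the protozoa primer
# is not given in this thread, so add it as a third row once you have it.
while read -r name primer; do
    build_trim_cmd "$name" "$primer"
    echo "${trim_cmd[*]}"
    if command -v qiime >/dev/null 2>&1 && [ -f "$input" ]; then
        # Keep qiime from reading the primer list on stdin.
        "${trim_cmd[@]}" < /dev/null
    fi
done <<'EOF'
bacteria GAGTTTGATCMTGGCTCAG
archaea AGGAATTGGCGGGGGAGCAC
EOF
```

Each output can then be denoised on its own, which keeps the three amplicon types in separate feature tables.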

One more question:
Do I also need to remove the Barcode and LinkerPrimerSequence using Cutadapt?

Many thanks @colinbrislawn !!

This depends on how the sequencing was done.

You can try running Cutadapt and look at the logs to see if primers were found.

I successfully processed 454 pyrosequencing data following your instructions!
Thanks @colinbrislawn !
