V4 region cutadapt

Good evening to all,

I'm using cutadapt on 4 datasets to trim V3-V4 amplicons and keep only the V4 region.

All the datasets are in single-end format after merging with FLASH.

Of the 4 datasets, 3 are trimmed as expected using the primer GTGYCAGCMGCCGCCGCGGTAA (515).

However, when I used the same primer on the remaining dataset, cutadapt did not find it: every time I searched for it at the 5′ end, it returned only 5, 10, or 3 sequences containing this primer, and so on. I tried different parameters, again with no luck.

But in a paper (doi: 10.1128/mSphere.01202-20) I found this primer, GACTACHVGGGTATCTAATCC, which is the reverse primer when sequencing V3-V4.

With this one, the primer was recognized at the 5′ end as normal, and cutadapt reported finding it e.g. 112365 times in each sample. So it works!!!
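
A quick way to sanity-check counts like these outside cutadapt is to count how many reads literally start with the primer. The sketch below is only an illustration: the function names are made up for this example, it assumes uncompressed 4-line FASTQ records, and it does exact matching on the expanded IUPAC codes, so the numbers will not agree exactly with cutadapt, which tolerates some mismatches by default.

```shell
#!/usr/bin/env bash

# Expand an IUPAC-degenerate primer into a regular expression.
# (Hypothetical helper for this example; expects upper-case input.)
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g'  -e 's/Y/[CT]/g'  -e 's/S/[CG]/g'  -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g'  -e 's/M/[AC]/g'  -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Count reads whose sequence starts with the primer (exact match only).
# Usage: count_5prime_hits PRIMER FILE.fastq
count_5prime_hits() {
  local primer_regex
  primer_regex="$(iupac_to_regex "$1")"
  # In a 4-line FASTQ record the sequence is the 2nd line.
  awk -v re="^${primer_regex}" \
    'NR % 4 == 2 && $0 ~ re { n++ } END { print n + 0 }' "$2"
}

# Example call (the file name is a placeholder):
# count_5prime_hits GACTACHVGGGTATCTAATCC sample.fastq
```

Running it once with the forward primer and once with the reverse primer on the same merged file gives a rough idea of which primer sits at the 5′ end of the reads.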

Is it correct to assume that the sequences I have now are from the V4 region?

Thank you all for your help!!

Hello @iordanis,

Just to be clear:

  • you have four sets of paired end reads, each of which has been merged and now you have four sets of merged reads
  • you are using a single primer that matches in the middle of the sequence to extract the v4 region and you want to keep everything downstream (i.e. towards the 3' end)
  • this is working as you expected for three of the sets but is failing for the fourth

Is this all correct?

Were these four sets of reads generated using precisely the same primer sets? Were they merged using precisely the same parameters to the same merging algorithm/software?

Hello @colinvwood,

Yes, I have 4 datasets. All of them contain V3-V4 paired-end reads, and they were produced with different protocols and different sequencing technologies, such as MiSeq, HiSeq and so on.

Each dataset was merged with FLASH separately.

Then I used this primer, GTGYCAGCMGCCGCCGCGGTAA (515), to keep the V4 region in all datasets, and it works for 3 of them.

But for one dataset, this primer is not recognized at the 5′ end of the sequences.

Specifically, this dataset was generated as follows:

The V3-V4 hypervariable region of the bacterial 16S ribosomal RNA (rRNA) gene was amplified from the DNA samples with the barcoded forward primers 341F (5′-CCTACGGGNBGCASCAG-3′) and the reverse primers 806R (5′-GGACTACNVGGGTWTCTAAT-3′) using KAPA HiFi HotStart ReadyMix (KAPA Biosystems, United States).

So when I use a reverse primer for V4, it is recognized at the 5′ end of the sequences, and I don't know why.

I hope this makes sense!!
Jordan

Hello @iordanis,

Can you share the command(s) you're running to perform the trimming on these datasets? Can you also attach the demux visualizations for each?

Hello @colinvwood,

This is the bash script that I use with cutadapt for all FASTQ files.

input_directory="/home/path/fastq"

output_directory="/home/path/cutadapt"

reverse="ATTAGAWACCCBNGTAGTCC"
forward="GTGCCAGCMGCCGCGGTAA"

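# Trim every FASTQ file and write the result under the same file name
for fastq in "$input_directory"/*.fastq; do
    cut=$(basename "$fastq")
    output_file="$output_directory/$cut"
    # -a searches for a 3' adapter, -g for a 5' adapter
    cutadapt -a "$reverse" -g "$forward" -o "$output_file" "$fastq"
done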
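
For reference, the `reverse` value in this script is the reverse complement of the 806R primer quoted above (GGACTACNVGGGTWTCTAAT). A small sketch for checking primer orientation by hand is shown below; `revcomp` is a made-up helper name for this example, and it assumes upper-case IUPAC input and that the `rev` utility is available.

```shell
#!/usr/bin/env bash

# Reverse-complement a DNA sequence, including IUPAC ambiguity codes.
# (Hypothetical helper; expects upper-case input.)
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTRYSWKMBDHVN' 'TGCAYRSWMKVHDBN'
}

revcomp GGACTACNVGGGTWTCTAAT   # prints ATTAGAWACCCBNGTAGTCC
```

Comparing a primer and its reverse complement against the first bases of a few reads is a simple way to see in which orientation the merged reads are stored.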

And this is the demux.qzv from one of those datasets, after I had run cutadapt outside of qiime2.

demux.qzv (297.0 KB)

Jordan

Hello @iordanis,

I would recommend using qiime2 to perform the trimming, merging, and demultiplexing. This is the only way for us to really be able to provide useful support: by seeing the provenance (history) of the things you've done with your data, we can look for clues that could explain the results you're seeing.

Hello @colinvwood,

I use qiime2 for the denoising step and everything after it.

What do you recommend for merging inside qiime2?

Hello @iordanis,

I would recommend performing all initial processing steps in qiime2--everything after getting your raw sequencing data.

Hello @colinvwood,

I imported the raw paired-end sequences into qiime2 2023.5, as you suggested.

These are the commands:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path "fastq/manifestfile/manifest.tsv" \
  --output-path demux.qza \
  --input-format PairedEndFastqManifestPhred33V2

qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv

and this is the resulting qzv file:

demux.qzv (311.8 KB)

So now the only thing I can think of is to keep only the reverse reads, i.e. to trim away all forward reads with dada2.

Do you have another idea?

Jordan

Hello @iordanis,

Given that your goal is to eventually extract the V4 region from all of your amplicons I think you should probably:

  • run dada2 on each dataset separately
  • use feature-classifier extract-reads to extract your region of interest
  • use vsearch cluster-features-de-novo with a --p-perc-identity of 1 to update your feature table with the newly extracted sequences
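
The three steps above could be sketched roughly as follows. This is an untested outline, not a verified pipeline: the `run` wrapper and `extract_v4` function are made up for this example, the file names are placeholders, the primers and the truncation length of 240 are simply the values mentioned in this thread, and the flags should be checked against `--help` for the installed qiime2 version. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash

# Dry-run wrapper: print each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Placeholder values taken from this thread; adjust per dataset.
F_PRIMER="GTGCCAGCMGCCGCGGTAA"
R_PRIMER="GGACTACNVGGGTWTCTAAT"
TRUNC_LEN=240

# Usage: extract_v4 DATASET_DIR   (directory is assumed to contain demux.qza)
extract_v4() {
  local dir="$1"

  # 1. Denoise this dataset on its own.
  run qiime dada2 denoise-paired \
    --i-demultiplexed-seqs "$dir/demux.qza" \
    --p-trunc-len-f "$TRUNC_LEN" --p-trunc-len-r "$TRUNC_LEN" \
    --o-table "$dir/table.qza" \
    --o-representative-sequences "$dir/rep-seqs.qza" \
    --o-denoising-stats "$dir/stats.qza" || return 1

  # 2. Cut the region of interest out of the representative sequences.
  run qiime feature-classifier extract-reads \
    --i-sequences "$dir/rep-seqs.qza" \
    --p-f-primer "$F_PRIMER" --p-r-primer "$R_PRIMER" \
    --o-reads "$dir/rep-seqs-v4.qza" || return 1

  # 3. Collapse sequences that are now identical and update the table.
  run qiime vsearch cluster-features-de-novo \
    --i-table "$dir/table.qza" \
    --i-sequences "$dir/rep-seqs-v4.qza" \
    --p-perc-identity 1 \
    --o-clustered-table "$dir/table-v4.qza" \
    --o-clustered-sequences "$dir/rep-seqs-v4-clustered.qza" || return 1
}

# Example: print the commands for one dataset directory (placeholder name).
# extract_v4 dataset1
```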

I'm a little confused about the demux visualizations you've uploaded--you should have four of these, correct? Have you been uploading the problematic one only?

Hello @colinvwood,

This demux file is from the dataset that does not work, by which I mean I cannot find the primer inside the sequences.

I imported all datasets into qiime2 separately and denoised them separately at 240 base pairs!

For the scope of my analysis, after those steps I merged the 3 datasets that work, and then I used the SILVA V4 classifier for taxonomy.

So how should I proceed with this dataset, in which I cannot find the V4 primer?

Jordan

Hello @iordanis,

I would begin by running dada2 denoise-paired on this demux (and all others), and then looking at the dada2 stats output as a first troubleshooting step.
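
As a rough illustration of what to look for in those stats, the sketch below lists samples where only a small fraction of input reads survived merging. It rests on assumptions that should be checked against your own files: that the stats artifact was exported to a tab-separated `stats.tsv` with a header row, a `#q2:types` row, and columns named `input` and `merged`. The function name and the 0.5 threshold are made up for this example.

```shell
#!/usr/bin/env bash

# List samples whose merged/input ratio is below a threshold.
# Usage: low_merge_samples stats.tsv [MIN_FRACTION]
# Prints: sample-id <TAB> input <TAB> merged
low_merge_samples() {
  local stats_tsv="$1" min_fraction="${2:-0.5}"
  awk -F'\t' -v min="$min_fraction" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # map column names
    /^#/    { next }                                          # skip "#q2:types"
    {
      n_in = $(col["input"]); n_merged = $(col["merged"])
      if (n_in > 0 && n_merged / n_in < min)
        printf "%s\t%d\t%d\n", $1, n_in, n_merged
    }' "$stats_tsv"
}

# Example call (the path is a placeholder):
# low_merge_samples exported-stats/stats.tsv 0.5
```

If most samples in the problematic dataset show up here while the other three datasets look fine, that would suggest looking at merging before looking at the primers.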
