Hey everyone,
I am currently working on amplicon sequencing data from the 16s V3-V4 region, using the 341-F and 806-R primers. Some information abouth sequencing lenght:
I received my files from the sequencing facility with the barcodes and primers, that I have trimmed using the plugin cutadapt
:
qiime cutadapt trim-paired \
--i-demultiplexed-sequences 01_intermediate_files/META004_full.qza \
--p-cores 12 \
--p-front-f CCTAYGGGRBGCASCAG \
--p-front-r GGACTACNNGGGTATCTAAT \
--verbose \
--p-discard-untrimmed \
--o-trimmed-sequences 01_intermediate_files/META004_full_clean.qza > 01_intermediate_files/META004_cutadapt_log.txt
(saved the log file to a txt file to make it easier to go through).
META004_cutadapt_log.txt (69.1 KB)
This confirmed the presence of primers & barcodes. I then went and used the clean data:
Some of the reads still had more than 231F and 228R lengths, I decided to trim all the reads at the specific positions:
qiime dada2 denoise-paired \
--i-demultiplexed-seqs 01_intermediate_files/META004_full_clean.qza \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-trunc-len-f 231 \
--p-trunc-len-r 228 \
--verbose \
--o-table 01_intermediate_files/META004_full_clean_table.qza \
--o-representative-sequences 01_intermediate_files/META004_full_clean_RepSeqs.qza \
--o-denoising-stats 01_intermediate_files/META004_full_clean_DenoisingStats.qza
After this, I went and check the length distribution of the Features:
qiime feature-table tabulate-seqs \
--i-data 01_intermediate_files/META004_full_clean_RepSeqs.qza \
--o-visualization 02_visualizations/META004_full_clean_RepSeqs.qzv \
--verbose
Following that, I have some questions:
(1) Is it expected for different lengths in the ASV's? Considering the overlap may vary among reads, this might be expected, I would assume. In fact, I saw the overlapped reads had different sizes when I ran the trimmed reads through FLASH. However, length size might also be determinant for ASV's, meaning similar sequences in different ASVs because they have different sizes, which could be a relevant scenario in this context, so I don't know if this is not worrisome;
(2) The expected size of the amplicon is 466bp, 429 if we do not consider the primers. Is the fact that ASV's, on average, are 415 bp's long while the amplicon size is 429bp a reason to be concerned?
Sorry for the long post,
Thanks!
EDIT:
I decided to add reverse complements of the primers into the cutadapt
step:
singularity run --bind /ifs /Users/.singularity_containers/ADA/qiime2_amplicon_2024.2.sif \
qiime cutadapt trim-paired \
--i-demultiplexed-sequences full_clean/01_intermediate_files/META004_full.qza \
--p-cores 12 \
--p-front-f CCTAYGGGRBGCASCAG \
--p-front-r GGACTACNNGGGTATCTAAT \
--p-adapter-f ATTAGATACCCNNGTAGTCC \
--p-adapter-r CTGSAGCVYCCCRTAGG \
--verbose \
--p-discard-untrimmed \
--p-match-read-wildcards \
--p-match-adapter-wildcards \
--o-trimmed-sequences full_clean/01_intermediate_files/META004_full_clean_comp.qza \
> full_clean/01_intermediate_files/META004_comp_cutadapt_log.txt
Overall, I only regained ~40 sequences in total. The remaining profiles are rather similar.