Hello everyone,
I have some conceptual doubts about which parameters I should choose to optimize a classifier. Unfortunately, my paired-end analysis did not work, so I decided to try analyzing the R1 reads or the R2 reads on their own. We amplified and sequenced the V3-V4 region, so I assumed that the R2 reads represent the V4 region, which is widely used in metagenomics.
These are my questions:
1) When amplifying the V3-V4 region, is the V3 region represented by the R1 (forward) reads and, vice versa, the V4 region by the R2 reads?
- For example, when I clean my R2 reads with:
qiime dada2 denoise-single \
--i-demultiplexed-seqs ./readsR2_single.qza \
--p-trunc-len 200 \
--o-table ./dada2_table200.qza \
--o-representative-sequences ./dada2_rep_set200.qza \
--o-denoising-stats ./dada2_stats200.qza \
--p-n-threads 20
2) Am I cutting from the 5' end or the 3' end of the R2 sequence?
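To make question 2 concrete: as far as I can tell, the help text for `--p-trunc-len` describes it as truncating the 3' end of each read (the bases sequenced in the last cycles) and discarding reads shorter than that length. The toy shell sketch below only illustrates that reading of it on a made-up string. The function name `truncate_read` is hypothetical, and this is not how DADA2 is implemented.

```shell
# Toy model of 3'-end truncation (hypothetical helper, not DADA2 code).
# Usage: truncate_read SEQUENCE LENGTH
truncate_read() {
    seq=$1
    len=$2
    # Reads shorter than the truncation length are discarded (nothing printed).
    if [ "${#seq}" -lt "$len" ]; then
        return 1
    fi
    # Keep bases 1..len, counted from the 5' end of the read as sequenced.
    printf '%s\n' "$seq" | cut -c "1-$len"
}

# Made-up 16-base read, truncated to 10 bases.
truncate_read "AAAACCCCGGGGTTTT" 10   # prints AAAACCCCGG
```

If this reading is correct, the bases that get removed are always the last ones sequenced in R2, whatever strand they correspond to in the reference.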
Finally, this leads me to want to create a custom classifier, but these questions arise:
3) Where should I cut the reference sequences (on the right or on the left), given that the query sequences are R2 reads?
4) Is it advisable to truncate the reference sequences to the length of the query sequences?
- For example, my R2 reads should be 300 bp long, but the quality is only good up to 200 bp. Could I therefore conclude that these 100 bp of poor quality correspond to the first part of the V4 region (approximately 515-615F)? If so, should I cut the reference sequences at the beginning, that is, remove about 100 bp from the left?
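The arithmetic behind that last question can be sketched as below. It assumes the R2 read starts at the reverse-primer end of the amplicon and extends toward the forward primer, so that in forward-strand coordinates a read of length N covers the last N positions of the amplicon. The helper name `r2_interval` and the amplicon end position 805 are hypothetical values chosen only for illustration; they are not taken from my data or primers.

```shell
# Hypothetical helper: forward-strand interval covered by an R2 read.
# Assumes R2 starts at the reverse-primer end of the amplicon (AMP_END)
# and extends toward the forward primer.
# Usage: r2_interval AMP_END READ_LENGTH  -> prints "START END"
r2_interval() {
    amp_end=$1
    read_len=$2
    # The read covers the last read_len positions of the amplicon.
    start=$((amp_end - read_len + 1))
    printf '%s %s\n' "$start" "$amp_end"
}

# Illustrative coordinates only (assumed): amplicon ends at position 805.
r2_interval 805 300   # full-length R2, prints: 506 805
r2_interval 805 200   # R2 truncated to 200 bp, prints: 606 805
```

Under these assumptions, the 100 discarded bases would map to positions 506-605, that is, the left side of the region in forward-strand coordinates, which is why I am asking whether the reference should be cut on the left.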
Thanks in advance.
Benja