Hello everyone,
I'm running QIIME 2 to analyse my data from an Ion S5,
but I have run into a problem at the DADA2 step.
I tried this command:
qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 0 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza
I ran a quality check first, and the FASTQ data I used still include the forward and reverse primer sequences.
Quality check with FastQC:
[Figure: FastQC plot, "Distribution of sequence lengths over all sequences"]
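For reference, the same length distribution can be tabulated straight from the FASTQ file. This is only a rough sketch: `length_hist` is a name made up here, and it assumes an uncompressed FASTQ with exactly four lines per record.

```shell
# Hypothetical helper (not part of QIIME 2 or FastQC):
# print "length<TAB>count" for every read length in a FASTQ file.
# Assumes uncompressed input with exactly 4 lines per record.
length_hist() {
  awk 'NR % 4 == 2 { n[length($0)]++ }              # sequence lines only
       END { for (len in n) print len "\t" n[len] }' "$1" | sort -n
}

# Example call (sample.fastq is the file used in this post):
#   length_hist sample.fastq
```

Restricting the count to every fourth line starting at line 2 keeps quality strings and headers out of the tally.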
I also set trim-left and trunc-len to 0 because I wanted to keep the full length of the sequences I used.
But when I checked rep-seqs.qza, there were quite a lot of short sequences ending in a homopolymer run (‘TTTT..’, ‘AAAA..’, ‘GGGG..’ or ‘CCCC..’), like the one below.
TTACCAATTTTAGCGAGCCTGATCTTTTTT
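A sketch of how such sequences could be listed systematically. It assumes the representative sequences were first exported to FASTA (for example with `qiime tools export --input-path rep-seqs-dada2.qza --output-path exported`, which should write `exported/dna-sequences.fasta`) and that each sequence sits on a single line. The function name `homopolymer_tails` and the cutoff of six identical bases are arbitrary choices for illustration.

```shell
# Hypothetical helper: list FASTA records whose sequence ends in a run of
# at least six identical bases. Prints "id<TAB>length<TAB>sequence".
# Assumes one sequence line per record (no line wrapping).
homopolymer_tails() {
  awk '/^>/ { id = substr($0, 2); next }             # remember the record id
       /(AAAAAA|CCCCCC|GGGGGG|TTTTTT)$/ {            # homopolymer at the 3-prime end
         print id "\t" length($0) "\t" $0
       }' "$1"
}

# Example call (path is an assumption based on the export step above):
#   homopolymer_tails exported/dna-sequences.fasta
```

The runs are spelled out rather than written as `T{6,}` because not every awk implementation supports interval expressions.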
Then I checked the FASTQ file with this command (the first 10 bases, ATCAGACACG, are the primer sequence):
awk '/^ATCAGACACGTTACCAATTTTAGCGAGCCTGATCTTTTTT/' sample.fastq
There are over 200 reads like the one below:
ATCAGACACGTTACCAATTTTAGCGAGCCTGATCTTTTTTGATCATGGTCTCGCGAAAATCGTATTTAAAACCCCCACTCTTGTAATGAATCATTTTTTTTAGTGTATAAAAAAATTAAAAAAAGATACAACTTTCAACAATCGGATCTTCTGGCTCTCTGCATCGATGAAGAACGCAGC
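To put a number on this, the matching reads can be counted rather than printed. The sketch below is only an illustration: `count_prefix` is a made-up name, and it again assumes an uncompressed FASTQ with four lines per record. Unlike the plain pattern match above, it looks at sequence lines only, so a quality string that happens to start with the same characters cannot be counted.

```shell
# Hypothetical helper: count reads whose sequence starts with a given prefix.
# Usage: count_prefix PREFIX FILE
# Assumes uncompressed input with exactly 4 lines per record.
count_prefix() {
  awk -v p="$1" 'NR % 4 == 2 && index($0, p) == 1 { c++ }   # prefix at position 1
                 END { print c + 0 }' "$2"                  # prints 0 if no match
}

# Example call with the prefix used in this post:
#   count_prefix ATCAGACACGTTACCAATTTTAGCGAGCCTGATCTTTTTT sample.fastq
```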
I have no idea why I got so many features with such short sequences.
Is there any other way to get full-length reads?
Thank you in advance.