I want to use SourceTracker2 to track contamination sources in my sequenced impaired stream water samples (paired-end, 250 bp). I also downloaded data from the EBI-ENA website that were generated with the same primers I used. The only difference is that the downloaded data are from the Earth Microbiome Project (cat feces, dog feces, human feces, etc.) and are single-end sequences of unequal length (100 to 150 bp).
I used deblur to trim the sequences to the same length (130 bp/100 bp) for all datasets and to produce the feature table, and then ran SourceTracker on it. However, 90% of the sources are not detected. I am very confused.
Then I tried matching these sources to wastewater. Still, 80% of the sink microbiome is unknown, and less than 10% is attributed to human feces. I think this must be wrong. I went back to check the feature table and found only a few overlapping features.
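A check of this kind can be sketched in Python as follows. This is only an illustration, assuming the feature IDs of the sink and source tables have been exported as plain lists; the IDs and the function name `feature_overlap` are made up for the example.

```python
def feature_overlap(sink_ids, source_ids):
    """Return the shared feature IDs and the fraction of sink features that are shared."""
    sink, source = set(sink_ids), set(source_ids)
    shared = sink & source
    frac = len(shared) / len(sink) if sink else 0.0
    return shared, frac

# Made-up feature IDs standing in for the real tables
sink_ids = ["ATGCTGC", "GGCATTA", "TTAGGCA"]
source_ids = ["ATGCTG", "GGCATTA", "CCGATAA"]

shared, frac = feature_overlap(sink_ids, source_ids)
print(sorted(shared), round(frac, 2))
```

A very low shared fraction would mean SourceTracker has almost nothing to match on, whatever the true sources are.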
Therefore, I am wondering how QIIME 2 deblur handles this. It aims to find exact sequence variants. What if a read I sequenced myself differs by one base from the corresponding downloaded read? For example:
My A bacteria ASV: ATGCTGC
Downloaded A bacteria ASV: ATGCTG
Will these two sequences be classified as two different ASVs, even though they are from the same bacterium? That may be why SourceTracker cannot detect the sources.
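To illustrate what I mean, here is a minimal Python sketch of exact-sequence comparison. It is a toy example of my concern, not deblur's actual implementation; the helper `trim_to_length` is hypothetical, and it assumes both reads start at the same position of the amplicon.

```python
def trim_to_length(seq, length):
    """Return the first `length` bases, or None if the read is shorter than that."""
    return seq[:length] if len(seq) >= length else None

my_asv = "ATGCTGC"         # my sequenced read
downloaded_asv = "ATGCTG"  # downloaded read, one base shorter

# Compared as exact strings, the two reads are different variants
same_untrimmed = my_asv == downloaded_asv

# Trimmed to the shorter common length, they become identical
common = min(len(my_asv), len(downloaded_asv))
same_trimmed = trim_to_length(my_asv, common) == trim_to_length(downloaded_asv, common)

print(same_untrimmed, same_trimmed)
```

If this is how the variants are compared, the two datasets would only share ASVs when every read is trimmed to exactly the same length and starts at the same base.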
Could you please help me solve this problem? Will sequences of different lengths generated with the same primers be classified into different ASVs? Or would it be possible to use a traditional OTU clustering method instead?
Tons of thanks.