Thank you very much for this discussion!
I am currently working with human gut 16S microbiome data, although I also have low-biomass samples (urine).
Regarding contaminating sequences: I checked the input files (joined FASTQ pairs) with FastQC and found no adapter contamination in the samples considered.
I cannot switch to QIIME 2 right away. I have already tried setting up the pipeline on other samples using vsearch instead of usearch, but not DADA2 or Deblur, since my first goal is to reproduce the QIIME 1 pipeline, even if with some differences.
For this reason, maybe the problem is related to how well the sequences can be matched to the database:
- contamination from human sequences (?)
- something related to the OTU clustering and representative sequence picking
These are the parameters I used for OTU clustering:
uclust --input slout_single_sample_q20/otus/rep_set.fna --id 0.9 --rev --maxaccepts 3 --allhits --libonly --lib /lustre1/ctgb-usr/local/miniconda3/envs/qiime1/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta --uc /lustre2/scratch/tmp/UclustConsensusTaxonAssigner_GCtxUz.uc
I am also copying part of the parameters used. I do not want to be a nuisance, but maybe you can spot something important in the settings.
parameter file values:
On the laboratory side, could you suggest guidelines to avoid contamination?
For example, is it important to excise the PCR band from the gel to improve specificity?
Is it possible to exclude, for example, human sequences (perhaps mitochondria-derived?) before doing the clustering and taxonomy assignment?
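
To make the last question concrete, here is a minimal sketch of the kind of pre-filtering I have in mind. It assumes I already have a list of read IDs flagged as host-derived, for example from aligning the reads against a human reference in a separate step that is not shown here. The function names and the example IDs are hypothetical, and this is plain Python, not a QIIME script.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_flagged(records, flagged_ids):
    """Keep only records whose ID (first word of the header) is not flagged."""
    for header, seq in records:
        parts = header.split()
        seq_id = parts[0] if parts else ""
        if seq_id not in flagged_ids:
            yield header, seq


# Hypothetical example data: three reads, one flagged as human-derived.
example_fasta = [
    ">sampleA_1 orig_bc=ACGT",
    "ACGT",
    "ACGT",
    ">sampleA_2",
    "GGCC",
    ">sampleA_3",
    "TTAA",
]
flagged = {"sampleA_2"}  # assumed output of a separate host-alignment step

kept = list(drop_flagged(read_fasta(example_fasta), flagged))
for header, seq in kept:
    print(">" + header)
    print(seq)
```

The filtered FASTA would then go into OTU picking in place of the original file. Would something along these lines be a reasonable approach, or is it better to remove such sequences after taxonomy assignment?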
Thank you very much,