Unfortunately, in our sequencing experience, library blanks do contain sequences, sometimes a huge number of them. In my personal opinion this suggests contamination: with 40,000 sequences coming from a blank, for example, I am able to go ahead and obtain a taxonomic classification.
The answer I was given is that in some cases (?) the environment, say the air, is not sterile, and that this matters especially for samples with very low diversity or a very poor microbiome (those supposed to be sterile?).
In my personal opinion, a library-preparation blank (prepared with all the reagents but no DNA) should not contain such a high number of sequences, classifying to many different taxa in the end,
and should be used as a quality control: sequences in it suggest that some contamination is present in the workflow.
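A minimal sketch of treating a blank as a quality-control gate, assuming its reads have already been classified into a taxon-to-count table. The function name `blank_qc`, the placeholder taxon labels and the 1,000-read limit are hypothetical assumptions, not an established standard; the limit would need tuning for a given workflow.

```python
from typing import Dict, List


def blank_qc(blank_counts: Dict[str, int], max_reads: int = 1000) -> List[str]:
    """Return warnings for a blank; an empty list means it passes.

    blank_counts maps taxon -> read count. max_reads is an assumed,
    workflow-specific limit, not a published threshold.
    """
    total = sum(blank_counts.values())
    detected = [taxon for taxon, count in blank_counts.items() if count > 0]
    warnings: List[str] = []
    if total > max_reads:
        warnings.append(
            f"{total} reads across {len(detected)} taxa (limit {max_reads}): "
            "possible contamination in the workflow"
        )
    return warnings


# Hypothetical blank resembling the 40,000-sequence case described above.
blank = {"taxon_A": 21000, "taxon_B": 12000, "taxon_C": 7000}
for warning in blank_qc(blank):
    print("WARNING:", warning)
```

Counting detected taxa alongside total reads reflects the point that it is the combination (many reads classifying to several taxa) that makes a blank worrying.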
Bacterial DNA in reagent kits is a potential source of contamination and a factor to consider in the analysis. This article is the first description of the problem:
There are many other potential sources of contamination, as well as cross-contamination between samples in the same batch. So, in theory, you should have separate blanks (extraction, library prep, sequencing) for each batch. In practice, total cost will reduce the number of blanks used in an experiment, in the hope that what you see in the negatives you do include reflects the whole experiment (often a big assumption).
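To see how far a design falls short of "every blank type in every batch", a small audit of the sample sheet can help. This is a sketch assuming a sheet of `(sample_id, batch, sample_type)` rows; the type labels `extraction_blank`, `library_blank` and `sequencing_blank` and the function `missing_blanks` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical labels for the three blank types discussed above.
REQUIRED_BLANKS = {"extraction_blank", "library_blank", "sequencing_blank"}


def missing_blanks(sample_sheet: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Return, per batch, the blank types that are absent (sorted)."""
    present: Dict[str, Set[str]] = defaultdict(set)
    for _sample_id, batch, sample_type in sample_sheet:
        present[batch].add(sample_type)
    return {
        batch: sorted(REQUIRED_BLANKS - types)
        for batch, types in present.items()
        if REQUIRED_BLANKS - types
    }


sheet = [
    ("S1", "batch1", "sample"),
    ("NEG1", "batch1", "extraction_blank"),
    ("NEG2", "batch1", "library_blank"),
    ("NEG3", "batch1", "sequencing_blank"),
    ("S2", "batch2", "sample"),
    ("NEG4", "batch2", "library_blank"),
]
print(missing_blanks(sheet))
```

Batches listed in the output are the ones where you are relying on the assumption that negatives from other batches are representative.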
In my case, when I look at a negative control, I like to see as low a sequence count as possible (ideally in the range of 1,000). I always perform its taxonomic assignment, as well as a quick beta diversity analysis without normalisation, to explore whether the bacteria in the controls are close to the samples or separate from them, as one would expect. If the negatives show lots of sequences but are not related to your samples, you could be happy to proceed and exclude them from the main analysis. Otherwise, you may suspect that something odd is going on. I am happy if the negatives do not pass the rarefaction threshold and are therefore excluded from the alpha- and beta-diversity analyses.
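A minimal sketch of the two checks above, assuming per-library taxon count tables. Bray-Curtis dissimilarity is used here as one common choice for the beta diversity comparison (the post does not name a metric), computed on raw, unnormalised counts; all function names, sample names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Counts = Dict[str, int]  # taxon -> read count


def bray_curtis(a: Counts, b: Counts) -> float:
    """Bray-Curtis dissimilarity on raw counts: 0 = identical, 1 = no shared taxa."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return num / den if den else 0.0


def nearest_samples(negative: Counts, samples: Dict[str, Counts]) -> List[Tuple[str, float]]:
    """Rank real samples by dissimilarity to a negative, closest first."""
    return sorted(
        ((name, bray_curtis(negative, counts)) for name, counts in samples.items()),
        key=lambda pair: pair[1],
    )


def passes_rarefaction(counts: Counts, depth: int) -> bool:
    """A library survives rarefaction only if it has at least `depth` reads."""
    return sum(counts.values()) >= depth


samples = {
    "S1": {"A": 500, "B": 300, "C": 200},
    "S2": {"A": 450, "B": 350, "D": 200},
}
negative = {"X": 80, "Y": 40, "A": 5}

for name, dist in nearest_samples(negative, samples):
    print(f"{name}: Bray-Curtis = {dist:.3f}")
print("negative passes rarefaction at 1000:", passes_rarefaction(negative, 1000))
```

Here even the closest sample is nearly maximally dissimilar and the negative falls below the assumed rarefaction depth, which is the reassuring pattern described above; a negative sitting close to real samples would instead warrant a closer look.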
However, if you have negatives with 40,000 sequences, they could look like real samples (what is the average sequence count per sample?). Could it be that the negative was swapped with a real sample, e.g. during the demultiplexing step, by switching them in the barcode-sample association list?
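Answering the question about average sequence count can be automated with a quick comparison of each negative's depth to the mean depth of the real samples. This is a sketch under assumptions: the 0.5 fraction is arbitrary, and `suspicious_negatives` and the library names are hypothetical. A flagged negative is not proof of a swap, only a prompt to recheck the barcode-sample list.

```python
from statistics import mean
from typing import Dict, List


def suspicious_negatives(
    read_counts: Dict[str, int], negatives: List[str], fraction: float = 0.5
) -> Dict[str, float]:
    """Return negatives whose depth is >= `fraction` of the mean sample depth,
    mapped to their depth relative to that mean."""
    neg_set = set(negatives)
    sample_depths = [n for name, n in read_counts.items() if name not in neg_set]
    avg = mean(sample_depths)
    return {
        neg: read_counts[neg] / avg
        for neg in negatives
        if read_counts[neg] >= fraction * avg
    }


counts = {"S1": 52000, "S2": 38000, "S3": 45000, "NEG_lib": 40000, "NEG_ext": 900}
for neg, ratio in suspicious_negatives(counts, ["NEG_lib", "NEG_ext"]).items():
    print(f"{neg}: {ratio:.2f} x mean sample depth, check barcode assignment")
```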