Core-metrics-results sequences vs demux sequences

Dear Admin,

I have 12 samples and 4 treatment groups.
demux.qzv reports 2291 sequences in total (min 18, max 930 per sample). When I run the core-metrics-results analysis (alpha/beta diversity) with 18 as the sampling depth, I get only 6 samples and 3 treatments in the Emperor beta diversity PCoA plot and in the alpha diversity plots.
Why is that so?

Attached are the demux output and the unweighted UniFrac Emperor plot.

Demux output

I have 4 treatments

Hi @drmusk,
The issue is that you are looking at the number of sequences that you have prior to denoising, not the number of sequences in your feature table after denoising. Summarize your feature table to determine the actual number of sequences detected per sample after denoising, and select a new sequence depth based on those results.

If you still have similar issues, could you please post your table summary results and the exact commands that you are using for core-metrics-results to help us figure out what’s going on? If you do not mind sharing your data, posting your data files would let us work this out with the exact same files (you can send these in a direct message if you want to share them only with the admins).

Note that 18 is an extremely low sequence depth. You should check out alpha-rarefaction to figure out a good sequence depth for your samples. A depth in the hundreds or thousands would be more typical, and you may want to filter out samples with fewer than 100 sequences, rather than setting a sequence depth that retains all samples.
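
To illustrate how the sampling depth decides which samples survive, here is a minimal Python sketch. It assumes that samples with fewer sequences than the chosen depth are dropped when the table is rarefied. The counts and the helper `retained_samples` are made up for illustration and are not taken from your data.

```python
# Hypothetical per-sample sequence counts after denoising (not real data).
feature_counts = {
    "sample-A": 1250,
    "sample-B": 430,
    "sample-C": 95,
    "sample-D": 18,
    "sample-E": 0,
}


def retained_samples(counts, depth):
    """Return the IDs of samples with at least `depth` sequences, sorted."""
    return sorted(s for s, n in counts.items() if n >= depth)


# Compare a few candidate depths to see the trade-off between
# depth per sample and number of samples kept.
for depth in (18, 100, 400):
    kept = retained_samples(feature_counts, depth)
    print(f"depth={depth}: {len(kept)} of {len(feature_counts)} samples kept -> {kept}")
```

A higher depth keeps more sequences per sample but drops more samples, which is the trade-off that alpha-rarefaction helps you judge.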

I hope that helps!

Dear Colleagues, thanks for the reply.

Your response to my question above was quite helpful.
But now I have come across a more complex issue.
As I mentioned, demux.qzv gave me 2291 sequences (min 18, max 930), but after quality filtering I get only about 900 sequences, and about 6 samples have zero sequences. I have posted a table with this information below.

I contacted my NGS vendor, who told me that I have several thousand sequences in that analysis and that their quality is above average, so the sequencing went well. Below are the sequence counts provided to me by the NGS vendor.

[image: table with columns Sample, Sequence]

OK, now let me show you my QIIME 2 demultiplexing output.
[image: table with columns Sample ID, Sequence Count, Quality Sequence Count]

As you can see, there is a big difference between the sequence counts from QIIME 2 and those provided by the vendor.

I repeated this analysis in QIIME 1 to see whether I would get the same result. QIIME 1 gave me almost the same result as QIIME 2, but I found something interesting in the QIIME 1 split_library_log file, which I copy below.
Quality filter results
Total number of input sequences: 1028678
Barcode not in mapping file: 1026387
Read too short after quality truncation: 26
Count of N characters exceeds limit: 0
Illumina quality digit = 0: 0
Barcode errors exceed max: 0

Result summary (after quality filtering)
Median sequence length: 245.00
HSB.11 922
HSB.10 352
HSB.4 342
HSB.8 253
HSB.1 165
HSB.2 50
HSB.3 49
HSB.5 40
HSB.6 30
HSB.12 24
HSB.9 21
HSB.7 17

Total number seqs written 2265

This shows that I lost more than 99% of my sequences (1026387 of 1028678) because their barcodes are not in the mapping file.
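
For reference, the proportions implied by the log above can be worked out with a few lines of Python. This is only arithmetic on the numbers printed in the log; the variable names are mine.

```python
# Figures copied from the split_library_log shown above.
total_input = 1028678
barcode_not_in_mapping = 1026387
too_short = 26

# Reads that survive both filters.
written = total_input - barcode_not_in_mapping - too_short

unmatched_pct = 100 * barcode_not_in_mapping / total_input
written_pct = 100 * written / total_input

print(f"sequences written: {written}")
print(f"barcode not in mapping file: {unmatched_pct:.2f}% of input")
print(f"sequences written: {written_pct:.2f}% of input")
```

The number of sequences written agrees with the total reported in the log (2265), so the barcode mismatch accounts for essentially all of the loss.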

My mapping file looks fine to me.
I am attaching a link to it here as well.


Please suggest a way out.

Hi @drmusk,
Good detective work. I suspect the issue is not with your mapping file (which looks fine), but rather with the sequences. As you have a separate fastq file containing the barcodes, you should examine the frequency of each barcode in that file. The frequency of barcodes that match the barcodes in your mapping file should be equal.

The fault is probably not with the vendor, but the vendor does not actually know what the sample composition of the run is; and hence if you only have a few samples on a very large run (i.e., with many other samples), then the share of reads that match your samples will be much lower. That would be consistent with the result that you are seeing: there are many reads but only a few thousand belong to your samples, as indicated by the barcodes.
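
As a rough illustration of the barcode check described above, here is a minimal Python sketch that tallies the barcodes in a FASTQ stream and reports how many reads carry a barcode from the mapping file. It assumes an unwrapped FASTQ with exactly four lines per record. The function names and the example barcodes and reads are hypothetical, not taken from your files.

```python
from collections import Counter
from itertools import islice


def count_barcodes(fastq_lines):
    """Count each barcode sequence in a FASTQ stream (4 lines per record)."""
    # The sequence is the second line of every 4-line record.
    return Counter(line.strip() for line in islice(fastq_lines, 1, None, 4))


def matched_fraction(counts, mapping_barcodes):
    """Fraction of reads whose barcode appears in the mapping file."""
    total = sum(counts.values())
    matched = sum(n for bc, n in counts.items() if bc in mapping_barcodes)
    return matched / total if total else 0.0


# Tiny in-memory example with made-up reads and barcodes.
example = [
    "@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
    "@read2\n", "TTTTCCCC\n", "+\n", "IIIIIIII\n",
    "@read3\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
    "@read4\n", "GGGGAAAA\n", "+\n", "IIIIIIII\n",
]
mapping_barcodes = {"ACGTACGT"}  # hypothetical barcode from a mapping file

counts = count_barcodes(example)
print(counts.most_common(3))
print(f"matched fraction: {matched_fraction(counts, mapping_barcodes):.2f}")

# With a real file (hypothetical name), pass the open handle instead:
# with open("barcodes.fastq") as handle:
#     counts = count_barcodes(handle)
```

If the most frequent barcodes in the file are not the ones in your mapping file, that points to reads from other samples on the run or to a problem in how the barcodes were extracted.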

I hope that helps!

Dear Colleague, thanks for the reply.

I believe that everything is right with the samples, the mapping file, and the vendor’s multiplexing.
What is wrong is my extract_barcodes.py command.
extract_barcodes.py is a QIIME 1 command. Is it OK to ask a question about it here?
If yes, may I proceed?
I am definitely using the wrong parameters for extract_barcodes.py.
I have R1.fastq and R2.fastq, and I need a barcodes.fastq to run qiime demux emp-paired, or split_libraries_fastq.py in QIIME 1.
So I used extract_barcodes.py in QIIME 1.
I did not know how to choose --bc1_len, --bc2_len, or the other options for that command,
so every time I select a different barcode length I get a different result.
My mapping file says that my barcode length is 8.
I do not know how to find the barcodes in the R1 and R2 files.
If I share a screenshot of R1, R2, and the mapping file with you, would you be able to construct the command for me?
Thank you for reading all of this. I know your time is valuable.

Hi @drmusk!

These types of questions would be better suited for the QIIME 1 Forum. Can you try asking over there? Thanks!
