Losing Rows of Data Upon Importation of Data

Hello!

I am new to QIIME 2 and was hoping to get some help importing my data. I am running QIIME 2 in VirtualBox, and my data consists of forward reads, reverse reads, and a barcode file (all FASTQ), plus a metadata CSV file. I have run through the Moving Pictures tutorial without any issues and am now trying to apply it to my own data, but when I view my demux files, it seems that most of my data is gone.

My dataset of over 160 samples returned the following upon importing and demuxing my data: demux.qzv (284.9 KB)

I have attempted both single and EMP paired methods of importing data, with no differences in the outcome.

My FASTQ files are unfortunately so large that I am unable to open them in VirtualBox to look at the formatting, but any advice on commands or changes that I can make to my files would be greatly appreciated.

Thank you!
Sirtaj Bir Singh

Hey @Sirtaj-Singh,

I took a peek at the provenance of your visualization (thanks for providing!) and your barcode map looks good, so I suspect one of two things is happening:

Either you need to reverse complement your barcodes in emp-single (with rev-comp-barcodes or rev-comp-mapping-barcodes), or there is enough sequencing error that the barcodes can't be mapped. We don't currently support error correction for barcodes (we have an open issue to fix that), but sequencing error seems unlikely to be the problem, as I would still expect more than 9 samples to be mapped correctly even with messy data.

I would try reverse complementing your barcodes and seeing if that gives you the expected number of samples.
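If you'd like to check the orientation before rerunning demux, here is a minimal Python sketch (not part of QIIME 2) that streams only the first records of a barcode FASTQ, so the file never has to be opened in full, and counts how many barcode reads match your mapping-file barcodes as written versus reverse complemented. The function names and the file name `barcodes.fastq.gz` are hypothetical, and it assumes plain 4-line FASTQ records.

```python
import gzip
from itertools import islice

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def first_sequences(lines, n):
    """Yield the sequence line of the first n FASTQ records.

    `lines` is any iterable of text lines (for example an open file).
    Assumes each record is exactly 4 lines, with the sequence on line 2.
    """
    for i, line in enumerate(islice(lines, 4 * n)):
        if i % 4 == 1:
            yield line.strip()


def count_matches(reads, mapping_barcodes):
    """Count exact matches against the barcodes as given and reverse complemented."""
    as_given = set(mapping_barcodes)
    flipped = {reverse_complement(b) for b in mapping_barcodes}
    reads = list(reads)
    return {
        "as_given": sum(r in as_given for r in reads),
        "reverse_complemented": sum(r in flipped for r in reads),
    }


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Hypothetical usage (file name and barcode list are placeholders):
# with open_fastq("barcodes.fastq.gz") as handle:
#     print(count_matches(first_sequences(handle, 10000), my_barcodes))
```

If the "reverse_complemented" count is much larger than the "as_given" count, that would suggest one of the rev-comp options is needed.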

Hope that helps!

Thank you for the reply!

I just ran the code using the rev-comp-mapping-barcodes parameter during demux, and it worked! I just have a few quick questions about the suggestion:

  1. Why did you recommend emp-single rather than emp-paired?
  2. Is there a major difference in using rev-comp-barcodes versus rev-comp-mapping-barcodes? Are both ever used together?

Thanks again for the responses!

Of course!

I just looked at the provenance of your artifact and it had used emp-single before, so I just assumed you had single-end data. If you have paired-end data, you can use emp-paired just fine as well (they are basically identical except for single/paired end support).

As for your second question, the distinction between rev-comp-barcodes and rev-comp-mapping-barcodes becomes more interesting with barcode correction strategies, but we don't support any at the moment, so there's not a practical difference right now.

Conceptually they could be used together. For example, suppose you had Golay barcodes in reverse-complemented form in both your mapping file and your sequenced output. You'd reverse complement the mapping-file barcodes so that they are in the right orientation for Golay, and you'd still need to reverse complement your sequenced output, as it's now in the opposite orientation.

There may be a clearer way to handle these situations, so this could change in the future, but to my understanding, that's why there are two different parameters in the first place.
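To make the two parameters concrete, here is a toy Python model of exact-match demultiplexing. It is only a sketch under my own assumptions, not how QIIME 2 is actually implemented; `toy_demux` and the sample barcodes are made up for illustration. One flag flips the sequenced barcode reads and the other flips the mapping-file barcodes.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def toy_demux(reads, mapping, rev_comp_barcodes=False,
              rev_comp_mapping_barcodes=False):
    """Assign each barcode read to a sample ID by exact match (None if unmatched).

    rev_comp_barcodes flips the sequenced barcode reads;
    rev_comp_mapping_barcodes flips the barcodes from the mapping file.
    """
    lookup = {}
    for sample_id, barcode in mapping.items():
        if rev_comp_mapping_barcodes:
            barcode = reverse_complement(barcode)
        lookup[barcode] = sample_id

    assigned = []
    for read in reads:
        if rev_comp_barcodes:
            read = reverse_complement(read)
        assigned.append(lookup.get(read))
    return assigned


# Made-up example: the sequencer reports barcodes in the opposite
# orientation from the mapping file.
mapping = {"s1": "AACC", "s2": "AGGT"}
reads = ["GGTT", "ACCT"]

neither = toy_demux(reads, mapping)
flip_reads = toy_demux(reads, mapping, rev_comp_barcodes=True)
flip_mapping = toy_demux(reads, mapping, rev_comp_mapping_barcodes=True)
both = toy_demux(reads, mapping, rev_comp_barcodes=True,
                 rev_comp_mapping_barcodes=True)
```

In this toy model, flipping either side alone recovers both samples, and flipping both sides is the same as flipping neither. That is the sense in which the two parameters have no practical difference under exact matching; they would only diverge once a correction step cared about the orientation of the barcodes themselves.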

This was extremely helpful, thank you!
