Export issues from dada2 in R to qiime2, sample metadata lost

Hello, quick query!

I have been processing data from a PacBio SMRT CCS experiment in dada2 with the intention of importing this data into qiime2 for the taxonomic assignments and taxonomy-based filtering.

I ran dada2 separately in R according to a couple of guides (primarily this guide, but also [this] (DADA2 + PacBio: Fecal Samples) one), as I read that the version of dada2 built into qiime2 will not handle the long-read CCS data. I then exported the sequence table and rep-seqs using the guidance from Jaroslaw [here] (Exporting DADA2 (R package) results to work with qiime2) (thanks!).

Upon importing into qiime2 I tried to use feature-table summarize to check the import worked okay. I got an error saying the ID V1 was not present in the metadata. After looking at the files, it appeared that they no longer match my metadata, and that the sample information has been lost. When I ran feature-table tabulate-seqs, the sequences had no FeatureID. Is there something further up the pipeline I should have done to maintain this information?
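For anyone hitting the same error, here is a rough sketch of the kind of mismatch it describes: the sample IDs in the table header are compared against the IDs in the metadata file. This is only an illustration, not how QIIME 2 implements the check. The two TSV strings, the sample names, the barcode labels and the helper function names are all made up for the example.

```python
import csv
import io

# Hypothetical feature-table export: the header row lists sample IDs.
# "V1" and "V2" mimic default column names left behind when sample names are dropped.
TABLE_TSV = "#OTU ID\tV1\tV2\nseq1\t10\t0\nseq2\t3\t7\n"

# Hypothetical metadata file holding the intended sample IDs.
METADATA_TSV = "sample-id\tbarcode\nsoil_A\tbc1001\nsoil_B\tbc1002\n"


def table_sample_ids(tsv_text):
    """Return the sample IDs from the header row of a feature-table TSV."""
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    return header[1:]  # the first column is the feature ID label


def metadata_ids(tsv_text):
    """Return the set of IDs found in the first column of a metadata TSV."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    return {row[0] for row in rows[1:] if row}


def missing_from_metadata(table_text, metadata_text):
    """List table sample IDs that have no matching row in the metadata."""
    known = metadata_ids(metadata_text)
    return [sid for sid in table_sample_ids(table_text) if sid not in known]


missing = missing_from_metadata(TABLE_TSV, METADATA_TSV)
print(missing)
```

If the list is non-empty, the table and the metadata disagree on sample IDs, which is the situation the error message points at.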

Thank you for any help you can offer!

Qiime2 version 2021.4
dada2 version 1.18.0 run through R version 4.0.3

Hi @Sam_Prudence - sorry no one has responded sooner, it looks like you recategorized to "Other Bioinformatics Tools," which is a category not monitored for support. I have moved this back over to Technical support, because this appears to be a QIIME 2 import issue. Someone will be with you soon, thanks!

Thank you!

I can also provide an update for this. It appears that I lose the metadata as I am running the dada2 pipeline through R. The input is a fastq file that I demultiplexed using lima, so the files contain some metadata, including the barcodes (see below). I then have a metadata tsv with these barcodes and the sample information.

@m54149_210619_221157/20579182/ccs bc=0,0 bl=CTCTTTCC bq=58 bt=C bx=8,1 cx=12 qe=1512 ql=~~~~~~~~ qs=8 qt=~
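To make the fields in that header easier to see, here is a small sketch that splits it into the read name and its key=value tags. The function name is hypothetical, and the code does not try to interpret what each tag means; it only separates them.

```python
def parse_header_tags(header):
    """Split a FASTQ header line into the read name and a dict of key=value tags."""
    name, *tags = header.lstrip("@").split()
    parsed = {}
    for tag in tags:
        # partition keeps everything after the first "=" as the value
        key, _, value = tag.partition("=")
        parsed[key] = value
    return name, parsed


HEADER = (
    "@m54149_210619_221157/20579182/ccs bc=0,0 bl=CTCTTTCC bq=58 bt=C "
    "bx=8,1 cx=12 qe=1512 ql=~~~~~~~~ qs=8 qt=~"
)

name, tags = parse_header_tags(HEADER)
print(name)
print(tags["bc"], tags["bl"])
```

Tags pulled out this way could then be matched by hand against the barcode column of a separate metadata tsv.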

In the guide I am using (see previous post) there is an "import metadata" section, which imports the metadata table. This, however, doesn't work, and when I check the data tables from after the dereplication step, they have become simple tables of the read sequences and their respective counts. I went back to look at the previous file and it seems that the metadata is lost after the primer trimming step. I am not sure if there is a step I am missing to carry the metadata through the pipeline or to split the samples before I run the subsequent steps, or if there is an issue with the formatting. Any advice would be greatly appreciated!

Hey there @Sam_Prudence - QIIME 2 has a strong notion of metadata, check out the official docs, here:

https://docs.qiime2.org/2021.4/tutorials/metadata/

So in general, we expect your sample/feature metadata to be broken out into its own file - this allows you to compose the metadata with all kinds of other output types, on the fly.
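As a minimal sketch of what "its own file" can look like, the snippet below builds a small tab-separated metadata table keyed on sample ID. The sample names, barcode labels and the `site` column are invented for illustration. `sample-id` is used as the ID column header on the assumption that it is one of the headers the linked metadata docs accept.

```python
import csv
import io

# Hypothetical samples: the IDs, barcode labels and the "site" column are made up.
SAMPLES = [
    {"sample-id": "soil_A", "barcode": "bc1001", "site": "field"},
    {"sample-id": "soil_B", "barcode": "bc1002", "site": "forest"},
]


def metadata_tsv(samples):
    """Render a list of per-sample dicts as tab-separated metadata text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(samples[0]),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(samples)
    return buffer.getvalue()


text = metadata_tsv(SAMPLES)
print(text)
```

The IDs in the first column would need to match the sample IDs in the feature table exactly for the two to be combined.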

As far as stashing information in the FASTQ record ID, we don't have any methods in QIIME 2 for manipulating or utilizing that information - what's your goal here? Perhaps if we had a bit more information we could point you towards another solution.

Thanks!
