Obtaining sequences per biological sample

rnasrah · December 12, 2017, 5:11pm

Hello!

I am wondering if there is a way I can obtain a list of the sequences PER biological sample.

For example, the rep-seqs and table.qzv visualizations from the QIIME tutorials (linked below) list all of the sequences, but for all samples mixed together. Is there a way to get them for each biological (i.e., stool) sample per individual?

https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fdocs.qiime2.org%2F2017.11%2Fdata%2Ftutorials%2Fmoving-pictures%2Frep-seqs.qzv

https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fdocs.qiime2.org%2F2017.11%2Fdata%2Ftutorials%2Fmoving-pictures%2Ftable.qzv

Thank you!!

Mehrbod_Estaki · December 12, 2017, 11:08pm

Hi @rnasrah,

I'm just a user and not on the development team, but here's my first attempt to contribute a little to this forum!

I recently asked about some additional sorting options regarding the rep-seqs visualization table, and it looks like it is something the team might develop in the future. See that post here. I thought it might be relevant to your question. If not...

Depending on what you are looking for exactly, here are a couple of options easily available:

1. If you are just looking for a quick visualization of your samples' content, the taxa barplot output gives a nice visual where you can look at the composition of each of your samples at various levels, plus it has some nice additional sorting options. There are also download tabs on the upper left-hand side of this output that allow you to download a .csv version of the data. In the official moving pictures tutorial, this is the taxa-bar-plots.qzv barplot, which comes after taxonomic assignment, so if you wanted to look at the content prior to this step see option 2.
2. If you are looking for the true sequence variants (prior to taxonomic assignment), you can retrieve them by exporting (or simply unzipping with your preferred tool) the denoised feature table. In the moving pictures tutorial this would be table-dada2.qza. Within it there is a .biom table under the data folder, which holds the sequence variant x sample table. This will look like the former OTU tables, except that instead of the taxonomy or OTU# it will have the unique SV ID, e.g., d1df10ad656760686c75a3884fa9fc2d
   With this option you will have to find an appropriate way to read or convert the .biom file, as it's not your typical text file.
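
As a rough illustration of the unzipping route, here is a minimal Python sketch using only the standard library. It assumes the .qza artifact is an ordinary zip archive whose .biom file sits in a data folder below a top-level directory, as described above. The function names and the output path are hypothetical, chosen just for this example.

```python
import zipfile


def find_biom_member(qza_path):
    """Return the archive path of the first .biom file inside a data/ folder, or None."""
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Assumed layout: <top-level-folder>/data/<something>.biom
            if len(parts) >= 3 and parts[-2] == "data" and name.endswith(".biom"):
                return name
    return None


def extract_biom(qza_path, out_path):
    """Copy the .biom file out of the artifact and return its path inside the archive."""
    member = find_biom_member(qza_path)
    if member is None:
        raise FileNotFoundError("no .biom file found under data/ in " + str(qza_path))
    with zipfile.ZipFile(qza_path) as archive, open(out_path, "wb") as out:
        out.write(archive.read(member))
    return member
```

Calling `extract_biom("table-dada2.qza", "feature-table.biom")` would then leave a standalone .biom file that still needs converting before it can be read as text.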

Hope this helps!

Nicholas_Bokulich · December 13, 2017, 2:28am

Hi @rnasrah,
Thanks for posting! I am not 100% sure what you are asking for, but @Mehrbod_Estaki might have provided your answer in point 2. Thanks @Mehrbod_Estaki for helping out!

The feature table is a matrix of sample x feature frequencies, so it lists the abundance of each feature in each sample. I'm not sure if this is actually what you want, though, since you mention sequences specifically. (If @Mehrbod_Estaki is correct and this is what you want, follow his steps and then use biom convert -i feature-table.biom -o feature-table.tsv --to-tsv to convert to a text file that you can open in Excel or a text editor.)
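
Once the table is in text form, splitting it up per sample is mostly bookkeeping. The sketch below is one way to do it in plain Python. It assumes the converted file is tab-separated, that the first row with more than one column is the header (feature ID column followed by sample IDs), and that any single-column lines before it are comments. The function name is hypothetical.

```python
import csv
import io


def features_per_sample(tsv_text):
    """Map each sample ID to {feature_id: count} for features with a count above zero."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = None
    per_sample = {}
    for row in rows:
        if not row:
            continue
        if header is None:
            # Assumption: single-column lines before the header are comments
            if len(row) == 1:
                continue
            header = row[1:]
            per_sample = {sample: {} for sample in header}
            continue
        feature_id, counts = row[0], row[1:]
        for sample, value in zip(header, counts):
            count = float(value)
            if count > 0:
                per_sample[sample][feature_id] = count
    return per_sample
```

Reading the file with `open("feature-table.tsv").read()` and passing the text to `features_per_sample` gives, for each sample, only the features actually observed in it.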

Do you want a list of features (and corresponding sequences), similar to the rep-seqs file you posted, but for a single sample? In a future release this will be much more straightforward but for now you can get a FeatureData[Sequence] artifact containing sequences found in a single sample by following the steps in this post. That's a bit contrived and only contains sequences corresponding to a single sample, so I'm not sure if that's what you want to do (and it would be a laborious process if you want such a file for every sample you have).

If neither of these describes what you are looking for, could you please clarify with the following:

1. Write out a toy example, e.g., the first couple of rows/columns of what your dream file should contain.
2. Let us know what you want to do with this file. Even if QIIME 2 does not yet output such a file, we may have an alternative approach if this file is an intermediate file in the approach you are attempting, or if it is input to another program.

I hope we can help!

rnasrah · December 13, 2017, 11:24pm

Hello guys,

Thanks so much for the replies. It solved what I was looking for.
I basically unzipped the folder as @Mehrbod_Estaki suggested and then used the biom convert command as @Nicholas_Bokulich suggested to get what I was looking for.

Thanks once again for the fast replies!

Much appreciated,

Rima

rnasrah · December 18, 2017, 5:57pm

Hi again Nicholas,

Is there a way I can basically do the same thing (matrix of sample x feature frequencies) but obtain the actual sequence instead of the feature ID?

So just to recap, my dream file would look something like in the picture below (i.e. instead of the feature ID, if I can get the actual sequence).

Thanks so much once again,

Rima

Nicholas_Bokulich · December 18, 2017, 6:16pm

Hi @rnasrah,

Yes — but not in a direct way. You have two options:

1. Use tabulate-seqs to obtain a list of the sequences that correspond to each feature ID. You cannot swap out the feature labels in QIIME 2, but since it sounds like you are exporting your data, e.g., to analyze in R, there should be ways to re-map your labels in R using that file (or, if worst comes to worst, just sort the files and replace the labels manually).
2. When running dada2, use the --p-no-hashed-feature-ids parameter so that feature IDs are listed as the full sequence. This has disadvantages (e.g., longer labels require more disk space and memory, and may run into size limits in some software) but does precisely what you are looking for.
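
The re-mapping mentioned in the first option does not have to happen in R. As a minimal sketch in Python, assuming the representative sequences are available as FASTA text whose record IDs match the feature IDs in the table, and that the table has already been read into rows with the feature ID in the first column (both function names are hypothetical):

```python
def read_fasta(fasta_text):
    """Parse FASTA text into {record_id: sequence}, joining wrapped sequence lines."""
    sequences = {}
    current_id = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token after ">" as the ID
            current_id = line[1:].split()[0]
            sequences[current_id] = ""
        elif current_id is not None:
            sequences[current_id] += line
    return sequences


def relabel_rows(table_rows, sequences):
    """Replace the feature ID in column 0 with its sequence; unknown IDs are left as-is."""
    return [[sequences.get(row[0], row[0])] + list(row[1:]) for row in table_rows]
```

Leaving unknown IDs untouched, rather than raising an error, makes it easy to spot any rows that failed to match after the relabeling.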

I hope that helps!

rnasrah · December 19, 2017, 5:45pm

Thanks so much!
Will try your suggestions.

Happy Holidays

thermokarst · December 22, 2017, 5:33pm

QIIME 2 2017.12 is now out, and it includes the ability to optionally filter sequences using a feature table (this is ID-based filtering). This significantly streamlines the steps @Nicholas_Bokulich illustrated above!
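
To make the idea of ID-based filtering concrete, here is a small plain-Python sketch. It is not QIIME 2's implementation, only an illustration of the concept: keep the sequences whose feature IDs have a non-zero count in a given sample. The data structures (a table as `{feature_id: {sample_id: count}}` and sequences as `{feature_id: sequence}`) and all function names are assumptions made for this example.

```python
def ids_in_sample(table, sample_id):
    """Return the set of feature IDs with a count above zero in the given sample."""
    return {fid for fid, counts in table.items() if counts.get(sample_id, 0) > 0}


def filter_sequences_by_ids(sequences, keep_ids):
    """Keep only the sequences whose feature ID is in keep_ids."""
    return {fid: seq for fid, seq in sequences.items() if fid in keep_ids}


def to_fasta(sequences):
    """Render {feature_id: sequence} as FASTA text, one record per feature."""
    return "".join(">{}\n{}\n".format(fid, seq) for fid, seq in sequences.items())
```

Looping over the sample IDs and writing one FASTA file per sample would give a list of sequences per biological sample, which is what the thread originally asked for.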