Merging seqs.fna from multiple projects Part 2

emescioglu · July 17, 2018, 6:03pm

Hi everyone,

I wasn't sure how to add to a closed topic, so I made a part 2.

My goal in Part 1 was to use SourceTracker to identify the likely sources of the organisms in my samples. To do so, I wanted to download data from different projects and merge them with my own data into one frequency table containing all of my samples plus samples from the different environments I want to compare mine to.

I thought I had solved this problem in the "Merging seqs.fna from multiple projects" topic. However, Qiita was trimming the sequences too short (150 max for some projects), and a majority of the sequences in my own samples weren't being identified to a specific enough level to be compared with the other data. As a result, I was getting a lot of 'unknowns' in my SourceTracker analysis. Did I say that in a confusing manner? Probably.

I'm trying something different now, so I just wanted to share it with people who might also want to use SourceTracker.

Step 1: Download the split library results (seqs.fna) of each project separately from Qiita and import them into QIIME 2

qiime tools import \
  --input-path seqs.fna \
  --output-path seqs.qza \
  --type 'SampleData[Sequences]'

Step 2: Dereplicate sequences into 100% OTUs

qiime vsearch dereplicate-sequences \
  --i-sequences seqs.qza \
  --o-dereplicated-table table.qza \
  --o-dereplicated-sequences repseqs.qza

Step 3a: Filter tables to only have samples I want (some projects come with 1000+ samples)

qiime feature-table filter-samples \
  --i-table table.qza \
  --m-metadata-file SampleIwant.tsv \
  --o-filtered-table table_selected.qza
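
For anyone wondering what SampleIwant.tsv could look like: as far as I understand, a metadata file with just an ID column is enough for filter-samples. Here is a minimal sketch that writes one. The sample IDs (sampleA, sampleB, sampleC) are made-up placeholders, and the `sample-id` header is an assumption about what your QIIME 2 release accepts, so check the metadata docs for your version.

```shell
# Write a one-column metadata file listing the samples to keep.
# NOTE: sampleA/sampleB/sampleC are hypothetical IDs; use the real IDs from your table.
# NOTE: the "sample-id" header is assumed to be accepted by your QIIME 2 release.
printf 'sample-id\n' > SampleIwant.tsv
for id in sampleA sampleB sampleC; do
  printf '%s\n' "$id" >> SampleIwant.tsv
done
```

The IDs have to match the sample IDs in table.qza exactly, otherwise nothing will be kept.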

Step 3b: Filter out features that no longer show up in any samples (3a removed samples, which may mean some features are no longer present in any of the remaining samples)

qiime feature-table filter-features \
  --i-table table_selected.qza \
  --p-min-samples 1 \
  --o-filtered-table table_selected2.qza

Step 4: Filter the representative sequences down to the features found in the samples I want. I couldn't do this with a metadata input, but using the table from Step 3b works great

qiime feature-table filter-seqs \
  --i-data repseqs.qza \
  --i-table table_selected2.qza \
  --o-filtered-data selected_seqs.qza

Step 5: Assign taxonomy

qiime feature-classifier classify-sklearn \
  --i-classifier 16S_classifier.qza \
  --i-reads selected_seqs.qza \
  --o-classification taxonomy.qza

Repeat steps 1-5 for all datasets you are interested in
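
If you have several projects, repeating steps 1-5 by hand gets tedious, so here is a rough sketch of a loop. It assumes a layout that is not part of my actual setup: one directory per project (project1 and project2 are placeholder names), each holding its own seqs.fna and SampleIwant.tsv, with a shared 16S_classifier.qza in the working directory. It only prints the commands by default, so you can check them before running anything.

```shell
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 in the environment to actually run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Steps 1-5 for one project directory.
# ASSUMPTION: "$1" is a directory containing seqs.fna and SampleIwant.tsv.
process_project() {
  d="$1"
  run qiime tools import --input-path "$d/seqs.fna" \
    --output-path "$d/seqs.qza" --type 'SampleData[Sequences]'
  run qiime vsearch dereplicate-sequences --i-sequences "$d/seqs.qza" \
    --o-dereplicated-table "$d/table.qza" \
    --o-dereplicated-sequences "$d/repseqs.qza"
  run qiime feature-table filter-samples --i-table "$d/table.qza" \
    --m-metadata-file "$d/SampleIwant.tsv" \
    --o-filtered-table "$d/table_selected.qza"
  run qiime feature-table filter-features --i-table "$d/table_selected.qza" \
    --p-min-samples 1 --o-filtered-table "$d/table_selected2.qza"
  run qiime feature-table filter-seqs --i-data "$d/repseqs.qza" \
    --i-table "$d/table_selected2.qza" \
    --o-filtered-data "$d/selected_seqs.qza"
  run qiime feature-classifier classify-sklearn \
    --i-classifier 16S_classifier.qza --i-reads "$d/selected_seqs.qza" \
    --o-classification "$d/taxonomy.qza"
}

# project1 and project2 are placeholder directory names.
for d in project1 project2; do
  process_project "$d"
done
```

In dry-run mode the printed lines lose the quotes around the type, so don't copy-paste them blindly; set DRY_RUN=0 once the paths look right.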

Step 6: Merge taxonomies
Step 7: Merge tables
Step 8: Format for SourceTracker
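
I haven't run Steps 6 and 7 yet, so treat this as an untested sketch of what they might look like. It assumes the `merge` and `merge-taxa` actions in q2-feature-table with repeatable `--i-tables` and `--i-data` inputs (flag names may differ between QIIME 2 releases), and the same hypothetical one-directory-per-project layout, where project1 and project2 are placeholder names. The helper `build_merge_cmds` is something I made up for illustration; it only prints the commands.

```shell
# Print (not run) the merge commands for a list of project directories.
# ASSUMPTION: each directory holds table_selected2.qza and taxonomy.qza from steps 1-5.
# ASSUMPTION: your QIIME 2 release uses --i-tables / --i-data as repeatable inputs.
build_merge_cmds() {
  tables=""
  taxa=""
  for d in "$@"; do
    tables="$tables --i-tables $d/table_selected2.qza"
    taxa="$taxa --i-data $d/taxonomy.qza"
  done
  # Step 6: merge taxonomies
  echo "qiime feature-table merge-taxa$taxa --o-merged-data merged_taxonomy.qza"
  # Step 7: merge tables
  echo "qiime feature-table merge$tables --o-merged-table merged_table.qza"
}

# project1 and project2 are placeholder directory names.
build_merge_cmds project1 project2
```

One thing to watch for: if two projects share a sample ID, the table merge may refuse to run unless you tell it how to handle the overlap.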

I'm still assigning taxonomy right now, and will try Steps 6 and 7 soon! I will add the commands for them when I'm done and post an update on how the SourceTracker outputs look!

Esra

thermokarst · July 19, 2018, 2:16pm

Hey there @emescioglu - It doesn't look like there is a question being asked here. Just to confirm, are you intending for this to serve as something like a tutorial? If so, let us know, we can move it over to the Community Tutorials category for you!

emescioglu · July 19, 2018, 9:09pm

Ooo yes! That sounds more appropriate.

I had asked a question before and others helped me come to a solution that I thought worked, when in fact it didn't. So I just wanted to make sure other people could read this before going down the same rabbit hole I went down. I'm sorry for the confusion!