Aligned sequences and RESCRIPt

Hello forum!

I’m trying to figure out how aligned sequences line up with RESCRIPt. Do I just download them from the SILVA 138 release and then… import and degap? Do I filter somehow? Sorry if this is super naive.

Thanks!
Justine

Hello @jwdebelius!

In general, the degap-seqs method is in there to handle aligned sequences, so you are correct the workflow would be something like: import aligned seqs --> degap --> do whatever.
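To make the degap step concrete, here is a minimal shell sketch of what degapping amounts to conceptually: stripping alignment gap characters from the sequence lines while leaving the headers alone. This is not RESCRIPt's implementation, just an illustration. The file names and toy records are made up, and it assumes gaps are written as `-` or `.`.

```shell
# Toy aligned FASTA (hypothetical records, not real SILVA data).
cat > toy-aligned.fasta <<'EOF'
>seq1
..AC-GU--A..
>seq2
.-ACGGU-CA-.
EOF

# Print header lines unchanged; strip '-' and '.' from sequence lines only,
# so dots inside accession-style headers are left intact.
awk '/^>/ { print; next } { gsub(/[-.]/, ""); print }' \
  toy-aligned.fasta > toy-degapped.fasta

cat toy-degapped.fasta
```

The degapped records are no longer the same length, which is why the result is an ordinary (unaligned) sequence set rather than an alignment.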

Unfortunately, that method only degaps aligned DNA sequences. Although RESCRIPt defines an RNASequence type, we still need to add an AlignedRNASequence format.

In the case of SILVA, you can just use the unaligned sequences, right? You can also use the get-silva-data pipeline to reduce your blood pressure: it downloads and imports/formats the SILVA sequences and taxonomy for you.

See "getting SILVA data the easy way", by @SoilRotifer:

As far as I know, there are aligned and unaligned versions for the various SILVA releases. It would be neat to get more support for aligned sequences in there for other datasets and to support streamlined integration with plugins that use aligned reference sequences. Any interest in contributing a "good first issue" to RESCRIPt? :wink:

Not at all... maybe what you'd call a documentation hole. :hole:
Besides, I'm just glad RESCRIPt is getting the attention! :nerd_face:

Thanks @Nicholas_Bokulich for the very detailed explanation.

I could have been clearer with my question… I missed the word “get” :woman_facepalming: I’m trying to figure out if I can get aligned sequences corresponding to my unaligned sequences from RESCRIPt on SILVA 138, or if I should download them from SILVA and then import them myself. Sorry for the lack of clarity, I may need more sleep. :zzz:

Best,
Justine

Hi @jwdebelius, you could follow some parts from my older prototype code here. See Step 6. Then import into QIIME.

I hope to devote some time very soon to adding a FeatureData[AlignedRNASequence] type. It’s as good a time as any to learn how to make a type. :slight_smile:

-Mike

@SoilRotifer,

Excellent! Thank you. I’ll play around with it and see if I can make an artifact to drop… somewhere. Because I cannot be the only person who wants/will want an aligned reference set of Silva sequences. At least I hope not :slight_smile: .

Let me know if you want help with types! I was surprised at how easy they were to… well, kind of fake.

Best,
Justine

I am very much needing this too. It just got pushed to the back burner to get other things out of the way. :slight_smile:

I am currently looking through the developer documentation, and the RESCRIPt repo, to see how to set this up.

Hi @jwdebelius & @Nicholas_Bokulich! :wave:

Ask, and you shall receive! Check out my PR here.

I was able to run the following:

qiime tools import \
  --input-path SILVA_138.1_SSURef_NR99_tax_silva_full_align_trunc.fasta  \
  --output-path silva-138-1-aln-rna.qza \
  --type 'FeatureData[AlignedRNASequence]'

and follow it up with:

qiime rescript degap-seqs \
  --i-aligned-sequences silva-138-1-aln-rna.qza \
  --o-degapped-sequences silva-138-1-degapped.qza 

Note: the output of degap-seqs will still be of type FeatureData[Sequence].
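Before running the import above on a large alignment, it can help to confirm that every record really has the same aligned length, since that is what makes the file an alignment. The following is only a rough sketch on a toy file; the file names and records are hypothetical, and the third record is deliberately too short to show what a problem looks like.

```shell
# Toy aligned RNA FASTA (hypothetical records); swap in your own file name.
cat > toy-aln.fasta <<'EOF'
>a
..ACGU--
>b
.-ACGUU-
>c
ACG-UU
EOF

# Sum the sequence-line lengths per record (this also handles wrapped lines)
# and write one "<id> <aligned length>" line per record.
awk '
  /^>/ { if (id != "") print id, len; id = substr($1, 2); len = 0; next }
       { len += length($0) }
  END  { if (id != "") print id, len }
' toy-aln.fasta > toy-aln-lengths.txt

# Count distinct lengths: a consistent alignment has exactly one.
n_lengths=$(awk '{ print $2 }' toy-aln-lengths.txt | sort -u | wc -l | tr -d ' ')
echo "distinct aligned lengths: ${n_lengths}"
```

If the count is anything other than 1, `toy-aln-lengths.txt` shows which records differ from the rest.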

-Cheers!
-Mike
