Demultiplexed fasta files that have been trimmed

Hello,

I am currently analyzing old data using qiime2. The experiment the sequences came from was performed a few years ago and was analyzed using RDP. From RDP we got demultiplexed, processed fasta files that have been trimmed of their barcodes. Unfortunately, we cannot use the original fastq files because many of them have been lost or are broken (the files are empty).

Given my situation, what would be the best way to import the data and create a metadata table? My attempts to import the data and use the metadata table have not worked because each sequence is read as if it were its own sample. Below is a sample of what the headers in the files look like:

SM_003_16S_ACS_pre_cn_bulk_pre_1101_16191_2317
GTGCCAGCCGCCGCGGTAATACGGAGGATCCAAGCGTTATCCGGAATCATTGGGGTTAAAGGGTCCGTAGGCGGCCCGATAAGTCAGTGGTGAAATCTCCCGGCTCAACCGGGAAATTGCCATTGATACTGTCGGGCTTGAATTATCAGGAAGTAACTAGAATATGTAGTGTAGCGGTGAAATGCTTAGAGATTACATGGAATACCAATTGCGAAGGCAGGTTACTACTGGTGGATTGACGCTGATGGACGAAAGCGTGGGTAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGGATACTAGCTGTTCGGTTGCAAGACTGAGTGGCTAAGCGAAAGTGCTAAGTATCCCACCTGGGGAGTACGCACGCAAGTTTGAAACTCAAATGAATTGACGG
SM_003_16S_ACS_pre_cn_bulk_pre_1101_6841_2711
GTGCCAGCAGCCGCGGTAATACGGAGGATCCAAGCGTTATCCGGAATCATTGGGTTTAAAGGGTCCGTAGGCGGTCTAGTAA

You are on the right track. Fixing the headers should allow these reads to be imported into Qiime2.

Here's more information about the format that works for Qiime2:
https://docs.qiime2.org/2023.9/tutorials/importing/#sequences-without-quality-information-i-e-fasta

Reformatting fasta headers is hard to solve generically because every dataset names its headers differently, so you will have to write a custom reformatting script that works for your files.
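As a starting point, here is a minimal Python sketch of such a script. It rests on several assumptions that are not confirmed in this thread: that header lines start with `>` as in standard fasta, that the last three underscore-separated fields of each header (for example `1101_16191_2317`) identify the read rather than the sample, and that Qiime2 treats everything before the final underscore as the sample ID, per the `<sample-id>_<seq-id>` format described at the link above. The function names `reformat_header` and `reformat_fasta` are hypothetical, not part of Qiime2.

```python
from collections import defaultdict


def reformat_header(header, counts, n_read_fields=3):
    """Rewrite '>SAMPLE_a_b_c' as '>SAMPLE_<n>', numbering reads per sample.

    n_read_fields is an assumption: how many trailing underscore-separated
    fields belong to the read rather than to the sample.
    """
    fields = header[1:].split()
    if not fields:
        raise ValueError("empty fasta header")
    # Split from the right so underscores inside the sample name are kept.
    parts = fields[0].rsplit("_", n_read_fields)
    if len(parts) != n_read_fields + 1:
        raise ValueError(f"unexpected header layout: {header!r}")
    sample_id = parts[0]
    counts[sample_id] += 1
    return f">{sample_id}_{counts[sample_id]}"


def reformat_fasta(lines):
    """Yield fasta lines with rewritten headers; sequence lines pass through."""
    counts = defaultdict(int)
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield reformat_header(line, counts)
        elif line:
            yield line
```

To use it, open your input file, pass the file object to `reformat_fasta`, and write each yielded line plus a newline to a new file. With the headers shown above, both reads would end up under the sample ID `SM_003_16S_ACS_pre_cn_bulk_pre`, which is then the ID to list in the first column of your metadata table. If your read fields differ between files, adjust `n_read_fields` or replace the split with a pattern that matches your naming scheme.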

