What can I do if NCBI SRA data lack information about region, primers, and barcodes?

fellora · June 1, 2021, 1:33am

Hi Everyone,

I’m working in qiime2-2020.8 and I want to analyze two raw datasets from NCBI BioProject PRJNA279124 (hot springs metagenome, ID 279124).
However, these data have no associated paper, and the submitted information is scarce.

So I don't know which region was amplified or what the primer sequences are. I don't even know whether the raw reads still contain barcode and primer sequences.

Is it possible to infer this information from the raw data, perhaps by BLASTing the reads or by searching them for common primer sequences? Any help would be greatly appreciated!

Nicholas_Bokulich · June 1, 2021, 5:21am

Dear @fellora ,
I recommend trying to track down the authors of the data — this can be easier and a safer bet than trying to determine this info on your own.

Yes, you could use BLAST to search for specific primer sequences, if they are still in the reads. You could also BLAST against a full-length reference 16S sequence to figure out which region is targeted (if you already know that the data are 16S). But you cannot recover barcodes if the authors have not provided this information in, e.g., a sample metadata file. Barcodes might be in the header line, but if sample metadata are not provided then there is not much that you can do with them...
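As a rough illustration of the primer search idea, here is a minimal Python sketch. It does not use BLAST; it swaps in a plain regex scan over the first few thousand reads of a FASTQ file. The primer list is an assumption on my part (a handful of commonly cited 16S primers, which you should verify against the literature), and the function names are hypothetical. It also assumes standard four-line FASTQ records.

```python
import gzip
import re
from collections import Counter
from itertools import islice

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# ASSUMPTION: a few commonly cited 16S primers, for illustration only.
# Check the sequences before relying on them; the list is not exhaustive.
CANDIDATE_PRIMERS = {
    "27F (V1 forward)": "AGAGTTTGATCMTGGCTCAG",
    "341F (V3 forward)": "CCTACGGGNGGCWGCAG",
    "515F (V4 forward)": "GTGCCAGCMGCCGCGGTAA",
    "806R (V4 reverse)": "GGACTACHVGGGTWTCTAAT",
}


def primer_to_regex(primer):
    """Turn a degenerate primer into a compiled regex, one class per base."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def read_sequences(path, limit=10000):
    """Yield up to `limit` read sequences from a four-line FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line record.
        for line in islice(handle, 1, limit * 4, 4):
            yield line.strip().upper()


def scan_primers(sequences, primers=CANDIDATE_PRIMERS, max_offset=40):
    """Count, per primer, the start offsets of matches near the read start."""
    patterns = {name: primer_to_regex(seq) for name, seq in primers.items()}
    hits = {name: Counter() for name in primers}
    total = 0
    for seq in sequences:
        total += 1
        for name, pattern in patterns.items():
            # Only accept matches that start within the first max_offset bases.
            match = pattern.search(seq, 0, max_offset + len(primers[name]))
            if match:
                hits[name][match.start()] += 1
    return total, hits
```

You would call it as `scan_primers(read_sequences("reads.fastq.gz"))`, where the file name is a placeholder. If one primer matches most reads at offset 0, the primers are probably still attached and no barcode precedes them. A consistent non-zero offset suggests that something (possibly a barcode or spacer) sits in front of the primer, although without the sample metadata you still cannot tell which barcode belongs to which sample. Reverse primers would normally be expected at the start of the reverse reads in paired-end data, so scan both files. If nothing matches, the primers may already have been trimmed, or the authors used primers that are not in this list.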

Good luck!
