I am very new to bioinformatics and I am analysing datasets from different studies. I am collecting these datasets from NCBI using the SRA Toolkit. My reads are paired-end and I have managed to split them, so I have the forward and reverse files.
I am going through the 'Importing files tutorial' and I am thinking I can just replicate the command given under the demultiplexed Casava data type. But, to be honest, I have no idea what to do.
Any suggestions on how I can go about this?
Also, I have created my metadata table in Excel and saved it as a TSV file. I became a bit confused when reading the validation part of the metadata tutorial. Is this the line I will need to use?
I think your best bet for importing the paired-end SRA reads is a manifest. Your file names do not match the specific Casava format, so rather than trying to make that work, you'll find the manifest easier.
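In case it helps, here is a rough Python sketch of how a manifest could be generated instead of typed by hand. It rests on a few assumptions that are not from your post: that your split files are named like `<run>_1.fastq` (forward) and `<run>_2.fastq` (reverse), optionally gzipped, and that your QIIME 2 release accepts the tab-separated paired-end manifest with the columns `sample-id`, `forward-absolute-filepath` and `reverse-absolute-filepath`. Older releases used a comma-separated layout instead, so please check the importing tutorial for your version. The function names are just illustrative.

```python
import csv
import re
from pathlib import Path

# ASSUMPTION: files are named <run>_1.fastq / <run>_2.fastq (optionally .gz).
# Adjust this pattern if your split files are named differently.
READ_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq(\.gz)?$")


def build_manifest_rows(fastq_dir):
    """Pair forward/reverse files in fastq_dir and return manifest rows.

    Each row is (sample_id, forward_absolute_path, reverse_absolute_path).
    Files that do not match READ_PATTERN are ignored.
    """
    pairs = {}
    for path in sorted(Path(fastq_dir).iterdir()):
        match = READ_PATTERN.match(path.name)
        if match is None:
            continue
        # The manifest needs absolute paths, so resolve each file.
        pairs.setdefault(match["sample"], {})[match["read"]] = str(path.resolve())

    rows = []
    for sample, reads in sorted(pairs.items()):
        if set(reads) != {"1", "2"}:
            raise ValueError(f"{sample}: expected both a _1 and a _2 file")
        rows.append((sample, reads["1"], reads["2"]))
    return rows


def write_manifest(rows, manifest_path):
    """Write rows as a tab-separated manifest.

    ASSUMPTION: the header below matches the paired-end manifest layout
    expected by your QIIME 2 release.
    """
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(
            ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
        )
        writer.writerows(rows)


# Example use (hypothetical directory and output names):
# write_manifest(build_manifest_rows("fastq"), "manifest.tsv")
```

The resulting file would then be passed to `qiime tools import` with `--type 'SampleData[PairedEndSequencesWithQuality]'` and the manifest input format that matches your release and quality-score encoding. The sample IDs written here are the run names, so they need to match the IDs in your metadata table.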
This is just a notice to let you know there's new functionality coming soon. It doesn't affect you now, but in the future you will be able to validate your metadata on the command line rather than having to use Keemei in Google Sheets, which is great for anyone who can't upload their metadata to Google.
Another piece of (entirely unsolicited) advice: if you replace the spaces ( ) in your file paths with dashes (-) or underscores (_), it will be easier to navigate on your Linux system. The shell treats an unquoted space as a separator between arguments rather than as part of a file path, so a path containing spaces has to be quoted or escaped every time you type it. It won't affect the import, but it will make things harder later.
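If you have a lot of files to rename, something like the following Python sketch could do it in one go. The function names and the choice of underscore as the default replacement are my own, not anything QIIME 2 requires. It only renames entries directly inside the given directory (not nested folders), and it refuses to overwrite an existing file. Rename before you build the manifest, since the manifest stores absolute paths.

```python
from pathlib import Path


def sanitized_name(name, replacement="_"):
    """Replace each run of whitespace in name with replacement.

    Leading and trailing whitespace is dropped rather than replaced.
    """
    return replacement.join(name.split())


def rename_without_spaces(directory, replacement="_"):
    """Rename entries directly inside directory so their names have no spaces.

    Returns a list of (old_name, new_name) pairs for the entries renamed.
    Nested directories are not descended into.
    """
    renamed = []
    for path in sorted(Path(directory).iterdir()):
        new_name = sanitized_name(path.name, replacement)
        if new_name == path.name:
            continue  # nothing to change
        target = path.with_name(new_name)
        if target.exists():
            # Do not silently overwrite a file that already has the clean name.
            raise FileExistsError(f"{target} already exists")
        path.rename(target)
        renamed.append((path.name, new_name))
    return renamed


# Example use (hypothetical directory name):
# for old, new in rename_without_spaces("my reads"):
#     print(old, "->", new)
```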