I was asked to carry out a project and make everything available on GitHub. I've never used GitHub or git. I have watched a few tutorials, but I don't understand the best way to save all my QIIME code on git/GitHub. I use WSL ubuntu and I write the code directly on the wsl terminal. I'd really appreciate it if anyone could share their ideas about which approach is best.
Can I keep using WSL the same way or should I run it on python with docker or something different?
Sorry about the confusion, I'm new to all of this.
Thank you all!
There are a lot of ways to use GitHub, so I'll offer one simple way to use it to share data and scripts that you may find helpful.
I'm not sure Docker or Python will help much, but adding an IPython / Jupyter Notebook or a Markdown document for notes would be very helpful!
I use WSL ubuntu and I write the code directly on the wsl terminal.
This is a great place to start. The only issue is that everything you run is only recorded in that WSL terminal, so once you exit or restart that terminal session, all your commands are lost.
Install VS Code, and open a WSL terminal in VS Code (Terminal > New Terminal). This is the same WSL terminal you are used to, just running within VS Code.
Add a keyboard shortcut so that you can press Shift + Enter to run a line of text in your terminal. Then write your code in the README.md document and run it in the terminal using that shortcut.
Here's what that might look like (it's going to look a little different on Windows)
This lets you record all of your important commands in the README.md document, which you can then post on GitHub so people, including future you, can easily rerun your commands and replicate your results!
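As a rough illustration of what such a README.md could contain, here is a minimal sketch that writes one from the terminal. The file names (manifest.tsv, demux.qza, demux.qzv) and the two QIIME 2 steps are placeholders chosen for the example, not commands from this thread, so swap in whatever you actually ran.

```shell
# Sketch only: file names and QIIME 2 steps are illustrative placeholders.
# Note: "cat >" overwrites any existing README.md in the current folder.
# Lines indented by four spaces render as code blocks on GitHub.
cat > README.md <<'EOF'
# My QIIME 2 project

## 1. Import the reads

    qiime tools import \
      --type 'SampleData[PairedEndSequencesWithQuality]' \
      --input-path manifest.tsv \
      --input-format PairedEndFastqManifestPhred33V2 \
      --output-path demux.qza

## 2. Summarize read quality

    qiime demux summarize \
      --i-data demux.qza \
      --o-visualization demux.qzv
EOF

# Print the file to check what was written
cat README.md
```

In practice you would more likely type these notes straight into README.md in the editor; the heredoc is only a way to show the layout in one copyable piece.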
Why call this document README.md? On GitHub, documents called README.md are automatically displayed on each web page, so this is an easy way for people to read your code before they clone the full repo. (Here's an example README.md from my own work.)
P.S. One more thing:
You are not alone! Brian Kernighan, the guy who helped develop UNIX and literally wrote the book on the C programming language, recently said this:
"I wish I understood git better, but in spite of your help, I still don't have a proper understanding, so this may take a while."
You are in good company. We are all learning together.
Hey Colin, thank you so much, that was very helpful!
Sorry about the deleted post - I had asked a question, but found the solution right after.
About the README.md, I had the impression it would be some kind of introduction or bio. Is it not the case or is it possible to have more than one?
If I start using github to save all my projects and save the markdown with names other than readme, what will happen to them? Will they still be available the same way?
If I just commit and push everything I do on the project, without writing the script as Markdown, will it be available on GitHub with all the steps that were made?
GitHub repos hold folders / directories just like your local computer. If there's a README.md file in a directory in a repo, then it's displayed when you open that folder on the website. You can put whatever you want into that README.md file.
You can have only one README.md file per directory (because you can't have two files with the same name in the same folder).
They will just be stored as files and people can still open them.
Yeah, everything saved as files can be pushed to GitHub. Once it's online, you can see how it looks and tidy it up later on, if you want to.
Well, almost everything: GitHub is not a good fit for raw data because the files are too big (GitHub blocks files larger than 100 MB). I would still put .fastq files on SRA or ENA, then store my downstream .qza and .qzv files on GitHub, along with notes on how I made them.
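For anyone who has not done the commit-and-push step before, it could look roughly like the sketch below. The folder name, the name and email, and the remote URL are all placeholders, and the push lines are commented out because they only work once you have created an empty repo on github.com.

```shell
# Sketch only: folder name, identity, and remote URL are placeholders.
mkdir -p my-qiime-project
cd my-qiime-project
git init

# Identify yourself for this repo (needed before the first commit)
git config user.name "Your Name"
git config user.email "you@example.com"

# Keep raw reads out of the repo; they belong on SRA or ENA
printf '%s\n' '*.fastq' '*.fastq.gz' > .gitignore

# Create (or edit) the notes file, then record a snapshot of both files
printf '# My QIIME 2 project\n\nNotes and commands go here.\n' > README.md
git add .gitignore README.md
git commit -m "Add README and ignore raw reads"

# After creating an empty repo on github.com, link it and upload:
# git remote add origin https://github.com/YOUR-USERNAME/YOUR-REPO.git
# git push -u origin HEAD
```

After the first push, the usual loop is just `git add`, `git commit`, and `git push` whenever you have changes worth saving.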
Take a look at this GitHub repo I published alongside a paper. It's got a lot of what you were asking about, including multiple README.md files, analysis scripts, .Rmd scripts, results and figures.
Here are a few examples of this that I've put together for papers:
Also of interest here is the new Provenance Replay functionality, and in particular the replay supplement. Putting a replay supplement on its own on GitHub is a good start for ensuring reproducibility, and adding key data files that aren't too large (like feature tables, but probably not raw sequence data, like @colinbrislawn mentioned) and metadata will make it even better.
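One way to check which files are "too large" before adding them is sketched below. The 50 MB threshold is the size at which GitHub starts warning about a file (it blocks files over 100 MB), and `list_large_files` is a made-up helper name for this example, not part of git or QIIME 2.

```shell
# list_large_files is a hypothetical helper name, not a git or QIIME 2 command.
# It prints files over 50 MB under a folder (default: current folder),
# skipping Git's own .git directory.
list_large_files() {
    find "${1:-.}" -type f -size +50M -not -path '*/.git/*'
}

# Check the current project folder; no output means nothing is over 50 MB
list_large_files .
```

Anything this prints is probably better shared through SRA, ENA, or another data archive, with a link in the README.md.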