Shruti Turner.

What a GIT.

Data Science, Machine Learning Engineer, ML Engineering, Machine Learning, Tutorial
Cover image: screenshot from a GitHub repo by Linus Torvalds.

First off, what even is git? No, I'm not insulting anyone with British slang. Git is a tool for tracking changes to code over time, which also makes it helpful for collaborating with others. There is plenty of speculation about what the name stands for and how it came about. They say that on a good day it is the "Global Information Tracker", with less polite interpretations reserved for a bad day. Like many, I will be right there in the queue to complain about my git problems, but that doesn't mean it isn't useful when it works.

There are a number of cloud-based hosting services (free and paid) where we can store code that is tracked with git. The most common one, in my experience, is GitHub, although there are others. These are great places to keep your code: if your local machine fails you don't lose your work, and you can access your code from anywhere.

This is less of a "how to" guide to working with git. What I want to do in this post is outline why using git matters, both for yourself and when working in teams. First off, I'll introduce a key feature of git...

Branching - a key git feature

Branching adds a safety net to the changes you make to your code. Before I go further, let's outline what branching is using the image below. Let's say you start your project on your local machine and what you have so far works well, so you push the first code for your project to your git repo because you don't want to lose working code. That's represented by the blue blob on the left. As you can see, it sits on a blue line, which represents your main branch. This branch is, as the name suggests, your main one. It is where your "final" code will be saved, like a timeline of the polished code that you know works.

Figure: simple branching. A blue line with a circle at the start and finish, and a curved purple line coming off the blue one with two circles in series before rejoining the blue line.

But now you want to add a new feature to your working code. This is where you create a branch, represented by the purple line in the image coming off the blue one. When you create a branch from main (or from any other branch), the new feature branch starts out with everything that is in main, so you can build on the scripts and file structure you've already set up. If we follow the purple branch along, we reach a purple circle, which represents a commit to the branch. Maybe you're stopping for the day, or maybe you've done part 1 of the feature. Whatever the reason, you've committed the changes to the branch, and maybe pushed the code to the online repo, to pick up later. When you have a chance, you pick up the rest of the work for the feature and, when you're happy with it, you commit that too, represented by the second purple circle. Great, you've completed the feature. Congratulations!

Now we have two timelines: the blue main one, which has been sitting unchanged whilst you've been working, and the purple feature one. We now need to synchronise our code so we are back to working on one timeline. This is where you merge your working feature code back into the main branch, represented by the blue blob on the right of the blue line.

That's it! If you've got to this point, you've got a handle on the basics of branching. You might be able to tell that this can get complicated as more feature branches are created, but for now this is enough of an explanation to get the gist of how branching can be helpful.
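For anyone who finds it easier to read code than diagrams, the picture above can be sketched as a toy model in Python. This is only an illustration of the idea and not how git itself is implemented: the `Commit` class, the `history` function and the `branches` dictionary are all made up for this example. One detail worth flagging: when main has not moved, a plain `git merge` fast-forwards by default rather than creating a separate merge commit, so the "blue blob on the right" matches what you get with `git merge --no-ff`.

```python
# Toy model of branching: NOT how git is implemented, just the idea.
# A commit remembers its parent(s); a branch is a name pointing at a commit.
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple = ()  # no parents for the first commit, two for a merge


def history(commit):
    """Return the message of every commit reachable from `commit`, each once."""
    seen, stack, messages = set(), [commit], []
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        messages.append(current.message)
        stack.extend(current.parents)
    return messages


branches = {}

# First working code goes onto main (the blue circle on the left).
branches["main"] = Commit("first working code")

# Creating a branch points a new name at the same commit.
# Roughly: git checkout -b feature
branches["feature"] = branches["main"]

# Two commits on the feature branch (the two purple circles).
branches["feature"] = Commit("feature part 1", (branches["feature"],))
branches["feature"] = Commit("feature part 2", (branches["feature"],))

# main has not moved while the feature was being built.
main_before_merge = history(branches["main"])

# Merging records a commit on main with both lines of work as parents
# (the blue circle on the right). Roughly: git merge --no-ff feature
branches["main"] = Commit(
    "merge feature into main", (branches["main"], branches["feature"])
)

print(main_before_merge)
print(sorted(history(branches["main"])))
```

The point to take away is that main only "sees" the feature work after the merge; until then, anything that goes wrong on the feature branch stays on the feature branch.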

If you're interested in more information on branching, the documentation is a handy place to start.

Let's talk about why it is useful.

For you personally

Tracking code over time is a key selling point of git, but why do we even want to do this? Git removes the need for a folder full of files named script_v1.py, script_v2.py, script_v3.py and so on, kept to track changes or to make sure you have the old code in case you want to go back to it. This saves space and makes writing your code easier, as you can stick to one file name rather than having to remember to update it wherever the script is used. By default you'll only see the latest version of your code, but it's easy enough to access previous versions and compare differences, using either your repository's user interface or the command line.
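To give a feel for what "compare differences" looks like, here is a small sketch using `difflib` from Python's standard library. It is not git, and the two versions of the script are invented for illustration, but the output is in the same unified diff style that git uses when it shows you what changed between versions.

```python
import difflib

# Two hypothetical versions of the same script, standing in for what
# would otherwise be script_v1.py and script_v2.py.
old_version = """def greet(name):
    print("Hello " + name)
""".splitlines(keepends=True)

new_version = """def greet(name):
    print(f"Hello, {name}!")
""".splitlines(keepends=True)

# Lines starting with "-" were removed, lines starting with "+" were added.
diff = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="script.py (before)",
        tofile="script.py (after)",
    )
)
print("".join(diff))
```

With git you keep a single script.py, and the tool works out this kind of comparison between any two points in its history for you.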

Branching is a helpful safeguard so you don't accidentally mess up code that you already know works. Whether or not that code is live in a production environment, it's handy to use branches even on a project you're working on alone. You never know when something might go wrong, so whilst it might feel like extra work, it's worth doing. (Bonus: it's great practice for when you start to collaborate on code.)

For collaborating

Hosting your code in a repository is an obvious solution to emailing scripts back and forth between the people working on the code, not knowing which version is the latest or who has changed what. Let's leave that in the past and use the tools we have at our fingertips. With an online repo that uses git, you can easily see the latest version of your code, and everyone has access to it. If we want to use the jargon, this is version control: knowing what changes have been made in each version of the scripts, by whom and when.
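As a rough picture of the "what, by whom and when" that version control records, here is a tiny sketch in Python. The `Change` class, the names and the dates are all made up for illustration; in git itself this information is stored with every commit and shown by `git log`.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Change:
    author: str
    when: date
    summary: str


# Hypothetical history for one shared script; names and dates are invented.
log = [
    Change("Asha", date(2024, 1, 8), "Add data loading"),
    Change("Ben", date(2024, 1, 9), "Fix column names"),
    Change("Asha", date(2024, 1, 12), "Add plotting function"),
]

# Which change is the latest, and what has one person contributed?
latest = max(log, key=lambda change: change.when)
by_asha = [change.summary for change in log if change.author == "Asha"]

print(f"Latest change: {latest.summary} ({latest.author}, {latest.when})")
print(f"Changes by Asha: {by_asha}")
```

No more guessing from email timestamps which attachment is the newest: the history answers that for the whole team.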

Branching is also helpful because each team member can work on a different feature without disrupting the code that is known to be working. It is handy for trying new things, but also for working in parallel without getting in the way of your collaborators' code. In many simple cases I find it helps to have one person per branch so that you don't get any unexpected issues, although in more complex cases this might not be how things work. Don't forget, though, that git is only as useful as the team who use it make it. Git, as a tool, can and should be able to manage your communications for you, BUT this is only going to happen if your team is on the same page, so it's important to agree your ways of working together before getting your head into code.

Hopefully this has given you an overview of some of the benefits of using git, both in your own personal projects and when working with a team. I have tried not to dive too deep into the technical aspects and to keep a light-touch approach to branching, which can get more complicated. For now I'll leave it there, but if there's appetite for a deep dive into git and branching, let me know.
