A lot of what we do involves writing code in some way. When you write code, especially in a group project but even when on your own, it is SUPER-useful to do some kind of source code management (SCM) -- basically, this means using a tool that keeps track of the various changes that different people are making to the code base. Even when only one person is writing the code, SCM helps you keep track of what you have changed since the last "stable" version of your code; anyone who has done a significant amount of programming will be aware of how crazy things can get when you lose track of what you have changed since the last time the code actually worked like it was supposed to.

Anyway, the SCM tool we mostly use is probably also the most common one nowadays, namely Git. What follows is a quickie guide to getting started with Git. It is a really powerful tool, which is a euphemism for "it can be really complicated to use", but 90% of the time you'll just be using a handful of commands. So here's the quick-start guide.

* For reference: If you need more detail than you get here, see the official Git documentation at https://git-scm.com/doc. Be forewarned that it goes into more detail than you probably need right now.

* For more tutorial-type content: You might try the Git course on Codecademy. I (Matt) haven't personally tried it but most of Codecademy's stuff is pretty decent. (If you, dear reader, do try it, feel free to edit this article and add your thoughts/notes.)

* On to getting started. First thing to do is make sure you have Git. If you are using a Mac, which you should be if you are in this lab, Git comes with Apple's command line developer tools; if those aren't installed yet, macOS will offer to install them the first time you run git in the Terminal. If you are using Linux, you probably have it too (or you are a big enough nerd that you already know how to install packages and can easily do so). If you are using Windows, may God have mercy on your soul. So... for right now we're going to assume you have it.
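
* A quick way to check whether you have it is to ask Git for its version number. This is only a minimal check, and the exact version printed will differ from machine to machine.

```shell
# Prints something like "git version X.Y.Z" if Git is installed.
# An error such as "command not found" means you need to install it first.
git --version
```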

* Second thing: You don't have to use a website or service to host your code, but there are many benefits to doing so. You may have heard of sites like GitHub or Bitbucket that host repositories of code for you -- we tend to use Bitbucket. To use Bitbucket, you need an account. And then someone in the lab will need to invite you to be a member of whatever project you are working on. So, you can either sign up for a Bitbucket account and then send Matt/Rafay/whoever your username so they can add you, or just send Matt/Rafay/whoever the email address you plan to use to register on Bitbucket. (If you just send them your email address, the invitation email from Bitbucket will then let you create an account with that email address.) Either way, at the end of this step, you will have a Bitbucket account, and someone in the lab will have added you to the relevant project(s).

* Next: You need to get a copy of the code on your machine. If you go into the project's overview on Bitbucket (it should have a URL like https://bitbucket.org/SomeBitbucketUserName/project_name, where SomeBitbucketUserName is whoever owns the project), you should see a little box near the upper right containing a link that looks something like this: https://MyBitbucketUserName@bitbucket.org/SomeBitbucketUserName/project_name.git . That is the URL you can use to get your own copy of the repository. (A repository is basically a copy of the current version of the code, along with some hidden metadata, kept in a folder named .git, containing the history of all previous changes made to the code.) So select the link and copy it to your clipboard.

* Now open your Terminal (assuming you are on a Mac). There are non-Terminal ways to use Git, but if you are doing enough programming to use Git, you should just be using the command line. Navigate to whatever directory you want to keep your copy of the code in. (For example, Matt keeps all his coding projects in ~/Desktop/dev but you can put it wherever your heart desires.) Then enter the following command: git clone https://MyBitbucketUserName@bitbucket.org/SomeBitbucketUserName/project_name.git where the link part is whatever you copied from Bitbucket. You will probably be asked for your Bitbucket password, and if everything goes according to plan, you will see some download progress information, and then at the end you'll have a new folder named project_name (for whatever the project is called) with the code in it.
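
* If you want to try the clone step before you have Bitbucket access, here is a rough sketch that works entirely offline. It rests on some assumptions that are not part of the real setup: it builds a small throwaway repository in a temporary folder to play the role of the Bitbucket copy, and clones from that local path instead of the https:// link. The folder names seed and dev and the "Demo User" identity are made up for illustration.

```shell
# Throwaway sandbox; nothing here touches your real projects.
demo=$(mktemp -d)

# Demo identity so the setup commit works even if Git has never been configured.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Setup: build a local stand-in for the shared repository on Bitbucket.
git init --quiet "$demo/seed"
cd "$demo/seed"
echo 'print("hello")' > codefile1.py
git add codefile1.py
git commit --quiet -m "Initial version"
git clone --quiet --bare "$demo/seed" "$demo/project_name.git"

# The actual step: go to wherever you keep your code, then clone.
mkdir "$demo/dev"
cd "$demo/dev"
git clone "$demo/project_name.git"

# The new folder is named after the repository and contains the code.
ls project_name
```

With the real project, the only line you need is the git clone one, with the link you copied from Bitbucket in place of the local path.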

* Now it's time to make some edits to the code. Let's assume you have edited a couple of the code files and are pretty happy with your changes. To see what has changed since the last commit, you can use the command git status (which needs to be executed from somewhere inside the project_name folder). That will list any files that have changed, along with any new files that have been created. If you enter git status before you have changed anything, Git will basically tell you that nothing has changed yet. (It will use slightly more complicated phrasing, but don't worry about the details yet.)
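
* To see both kinds of git status output side by side, here is a small sketch that runs in a throwaway folder. The setup (a fresh local repository with one committed file, plus the "Demo User" identity) is an assumption made purely so the example is self-contained; in real life you would already be inside your cloned project.

```shell
# Throwaway sandbox with a demo identity for the setup commit.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Setup: a project with one committed file.
git init --quiet "$demo/project_name"
cd "$demo/project_name"
echo 'print("version 1")' > codefile1.py
git add codefile1.py
git commit --quiet -m "Initial version"

# Nothing has changed since the last commit, so Git reports a clean working tree.
git status

# Edit a tracked file and create a brand-new one, then look again.
echo 'print("version 2")' > codefile1.py
echo 'my notes' > personal_notes.txt

# Now codefile1.py is listed as modified and personal_notes.txt as untracked.
git status
```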

* Let's assume you have made changes to two code files, which we'll call codefile1.py and codefile2.py, and that you have gotten to a stopping point you are happy with. In Git terminology we are ready to commit your changes -- i.e., we are ready to make a commitment to these changes as part of our code base. Don't worry, this is not a lifelong commitment -- you can always roll back to a previous code version later with minimal effort. So just think of a commit as a good stopping point that you can easily describe, that might be a good point to return to in the future if things get all mucked up with some later code changes, and that represents a meaningful -- though not necessarily large -- modification relative to the previous code version. For example, fixing a bug -- even if you just have to make a small tweak to a single line of code in order to fix it -- would be a perfectly reasonable commit. But if you work on the code for an hour and make several small modifications or bug fixes in the process, that would also be a perfectly reasonable occasion to make a single commit. What you consider to be a good stopping point is ultimately subjective.

* So to do a commit, we first need to tell Git which changed (or newly created) files to include. You don't necessarily have to include all changes you have made to the project in a commit. For example, let's assume that in addition to codefile1.py and codefile2.py, you have also created a file called personal_notes.txt in the project directory. These are your own little thoughts about what you are doing, but they don't need to be shared with everyone else or tracked for posterity. So we only want to add the changes made to codefile1.py and codefile2.py to our commit -- not anything related to personal_notes.txt. To add a file to an upcoming commit, we use the command git add, followed by the name(s) of any file(s) we want to add to the commit. In this example, this means we'd enter (again, assuming we are in the main project directory) git add codefile1.py and git add codefile2.py

* Or, more succinctly, we could also do it all on one line: git add codefile1.py codefile2.py

* Note that we use git add regardless of whether the file we are "adding" is a totally new file, or just a file that was already part of the project but has been changed in some way. Now if you do a git status command, you should get output indicating that the changes to codefile1.py and codefile2.py have been added to the upcoming commit (in Git terminology, they have been staged), but that personal_notes.txt will not be included in the upcoming commit (in Git terminology, a brand-new file that has never been added is untracked; a change to an already-tracked file that you haven't added yet is unstaged).
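
* Here is a sketch of the staging example above, again in a throwaway folder. The setup commit and the "Demo User" identity are assumptions added so the example runs on its own; the file names are the ones used in this article. It uses git status --short, a compact form of git status that prints one line per file.

```shell
# Throwaway sandbox with a demo identity for the setup commit.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Setup: a project with two committed code files.
git init --quiet "$demo/project_name"
cd "$demo/project_name"
echo 'print("one")' > codefile1.py
echo 'print("two")' > codefile2.py
git add codefile1.py codefile2.py
git commit --quiet -m "Initial version"

# Edit both code files and create a private notes file.
echo 'print("one, improved")' > codefile1.py
echo 'print("two, improved")' > codefile2.py
echo 'remember to ask about the bug' > personal_notes.txt

# Stage only the two code files.
git add codefile1.py codefile2.py

# Short-format status: "M" in the first column marks a staged change,
# and "??" marks a file Git is not tracking at all.
git status --short
```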

* At this point, the commit has been planned (staged) but not actually made. To make the commit happen, we will do this:
git commit -m "This is a short text description of the changes I made"

* Don't forget the -m flag in that command above -- that tells Git that you will enter your note about what this commit represents on the command line. If you don't include the -m plus a description in quotes, git commit will by default open up a text file in your system's default command-line code editor for you to enter your change description into. And that default editor is probably something horrible and hard-to-use like vi.
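
* Putting staging and committing together, a minimal sketch might look like the following. As before, the throwaway folder, the setup commit, and the "Demo User" identity are assumptions for illustration only. The git log --oneline command at the end is not covered elsewhere in this article; it simply lists past commits, one per line, so you can see that the new commit was recorded.

```shell
# Throwaway sandbox with a demo identity for the commits.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Setup: a project with one committed file, then an edit and a notes file.
git init --quiet "$demo/project_name"
cd "$demo/project_name"
echo 'print("one")' > codefile1.py
git add codefile1.py
git commit --quiet -m "Initial version"
echo 'print("one, fixed")' > codefile1.py
echo 'my notes' > personal_notes.txt

# Stage the change, then commit it with a description given via -m.
git add codefile1.py
git commit -m "Fix the output of codefile1"

# List the history: the new commit appears above the initial one.
git log --oneline

# The notes file was never added, so it is still reported as untracked.
git status --short
```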

* If all has gone to plan so far, your code changes in the current commit are now a permanent part of the code base! Well, sort of. You still haven't shared those changes with anyone else. git commit makes the changes a permanent part of the code base in your local copy, but it doesn't communicate those changes back to the communal copy on Bitbucket. This is useful if you don't have network access -- you might make multiple commit-worthy modifications to the code on a 6-hour plane flight, but not be able to send those changes back to Bitbucket until you land. So you can queue up multiple commits to your own personal copy of the code if you want, and then send them to the server all at once.

* Anyway, whether it's a single commit or multiple commits, when you are online and ready to send your changes to the server, the command is easy: git push (you may be asked for your Bitbucket password again). This pushes a copy of your commits to the group repository on Bitbucket. (Note that you didn't have to enter your username or a URL or anything. This is because those were all auto-saved into the repository's hidden metadata when you did the initial git clone.) Keep in mind that the only thing that gets pushed is what you have committed -- any changes that are not part of a commit will remain on your local computer, but won't get sent to the cloud copy on Bitbucket.
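
* The "queue up commits, then push them all at once" idea can be tried offline with the sketch below. It assumes a local stand-in for the Bitbucket repository (a folder named project_name.git in a temporary directory) and a made-up "Demo User" identity; neither is part of the real setup. The git rev-list --count '@{u}..HEAD' line is just a way to count commits that exist in your copy but not yet in the server copy.

```shell
# Throwaway sandbox with a demo identity for the commits.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Setup: a stand-in for the shared Bitbucket repository, plus your clone of it.
git init --quiet "$demo/seed"
cd "$demo/seed"
echo 'print("one")' > codefile1.py
git add codefile1.py
git commit --quiet -m "Initial version"
git clone --quiet --bare "$demo/seed" "$demo/project_name.git"
git clone --quiet "$demo/project_name.git" "$demo/project_name"
cd "$demo/project_name"

# Make two commits locally, as you might on a plane with no network.
echo 'print("one, improved")' > codefile1.py
git add codefile1.py
git commit --quiet -m "Improve codefile1"
echo 'print("two")' > codefile2.py
git add codefile2.py
git commit --quiet -m "Add codefile2"

# Count the commits that exist locally but not yet on the server copy: prints 2.
git rev-list --count '@{u}..HEAD'

# Send them all at once, then count again: prints 0.
git push --quiet
git rev-list --count '@{u}..HEAD'
```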

* If you have good network access, by the way, you PROBABLY want to go ahead and do a git push every time you make a commit. Not necessarily always, but usually this is a good idea, because the longer you wait to push your commits, the greater the possibility is that someone else will modify the code themselves with changes that conflict with yours. (If this happens, it's not the end of the world -- in fact, it is precisely the sort of thing that software like Git was built to help resolve. But managing conflicting changes is outside the scope of this wiki article, and it's kind of a pain. So better to push frequently than infrequently.)

* Finally: If you already have your local copy of the code, but it may not be up-to-date with what is on the Bitbucket server (either someone else may have committed and pushed changes since the last time you did, or you committed and pushed changes from a different machine), you can pull the latest version of the repository from Bitbucket with the simple command git pull (again, like most Git commands, this assumes you are already in the project's directory in your Terminal window). If there is a newer version of the code available on Bitbucket, Git will grab it and update your local copy (you might also have to enter your Bitbucket password again). If you have files on your machine that are not in the Bitbucket copy, they will be untouched. If you have changed the same parts of a file that someone else changed in the version you are trying to pull, then you have a conflict to resolve (when the changes don't overlap, Git can usually combine them on its own) -- which is why it is usually best to commit and push any changes you care about relatively frequently, and not to pull in the middle of making a bunch of changes to your local copy of the code.

* So, in short, it is best to do a git pull when you are ready to start a new chunk of work on the project but aren't 100% sure that the copy you have is the latest version that has been committed to Bitbucket. (If you do a git pull when your copy is already up-to-date, Git will just tell you that, so there is no harm done by doing a pull when you don't need to.) And then, as long as everyone working on the project is remembering to commit and push regularly, and as long as you have done some minimal amount of communication to make sure that multiple people aren't working on the same chunk of code at exactly the same time, there should be few to no conflicts to resolve.
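
* The pull workflow can also be tried offline. The sketch below assumes a local stand-in for the Bitbucket repository and two clones of it, one playing "you" (the folder mine) and one playing a labmate (the folder labmate); those folder names and the "Demo User" identity are made up for illustration.

```shell
# Throwaway sandbox with a demo identity for the commits.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Setup: a stand-in shared repository, your clone, and a labmate's clone.
git init --quiet "$demo/seed"
cd "$demo/seed"
echo 'print("one")' > codefile1.py
git add codefile1.py
git commit --quiet -m "Initial version"
git clone --quiet --bare "$demo/seed" "$demo/project_name.git"
git clone --quiet "$demo/project_name.git" "$demo/mine"
git clone --quiet "$demo/project_name.git" "$demo/labmate"

# The labmate commits and pushes a change while you are away.
cd "$demo/labmate"
echo 'print("two")' > codefile2.py
git add codefile2.py
git commit --quiet -m "Add codefile2"
git push --quiet

# Back in your copy, there is a file that exists only on your machine.
cd "$demo/mine"
echo 'my notes' > personal_notes.txt

# Pull before starting new work; this brings in the labmate's codefile2.py.
git pull

# Pulling again is harmless; Git just reports that you are already up to date.
git pull

# Your local-only notes file is still there, next to the pulled code.
ls
```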

* Lastly: If you are setting up a new project (not just cloning an existing project from Bitbucket), there is a little more to do in the way of setup -- but that is outside the scope of this article for now. Most people reading this will probably just be cloning and editing existing projects.

rapwiki: UsingGit (last edited 2016-05-18 16:16:31 by MattJohnson)