<!--- Don't be confused; the below text is Markdown. -->

# Git, Github, and You!

Scott Walls, 2012-09-11 (revised 2014 by Peter Chen)

##### *Abstract*

This document is a partial introduction to Git and Github in the context of ENGR 100. This is its only purpose, and I do not recommend it as a full introduction to Git, especially outside of the context of this class.

## Introduction

The purpose of this document is to help you, a bright-eyed ENGR 100 student, understand the framework we have in place for you to work on your labs. This framework has two main components: Git and Github. First, I'll talk a little about Git in a general sense. Then, I'll talk about Github, particularly in relation to the project infrastructure. Finally, I'll give a small list of Git commands useful for the project. Those of you who are already familiar with Git and Github can feel free to skim.

## Git - What?! Why?!

Git is a *distributed Version Control System*, or VCS. Let's break that definition down a bit. Better yet, let's do it backwards!

A *Version Control System* is software that tracks changes to files over time. Examples include CVS, SVN, and Perforce. Using a VCS is a very good idea, since it allows you to easily collaborate with other people, provides some help in the case of computer failure, and allows you to do nice things such as reverting and branching code.

Now, let's talk about what it means to be a *distributed* VCS. In the old days, the days of CVS and SVN, version control looked a little like this:

![Subversion repository topology][subversion]

with every programmer having a *working copy*, which is a folder containing the copy of their code that they were actually working on. They would make their changes, then send them to the central repository. They would also get changes that other users had made from the central repository. There are some problems with this centralized approach.

1. If the central repository goes down, you can't commit code to the repository, nor can you pull code from the repository.
2. If the central repository explodes, you have lost the record of your code. Any previous versions are lost.

So, rather than centralize, Git distributes! In Git, every user has their own copy of the entire repository (including its entire history). This solves the aforementioned problems. It does have the disadvantage of higher disk usage, but disks are cheap these days.

This local repository means that we no longer lose all of our data when the central repository goes down, but it also means that we are no longer necessarily tied to a centralized topology. Sometimes, Git projects look more like this:

![Git, decentralized topology][git-topology-decentralized]

with individual users pushing directly to one another. This is distribution taken to its logical extreme. It is possible to manage your project in this entirely distributed fashion in this class, but I will mostly cover using Git much like SVN, only with local repositories, which looks something like this:

![Git, centralized topology][git-topology-centralized]

This raises one glaring question: where will the central repository live?

## Enter Github (Way of Github? The Game of Github?)

Github is a great site that hosts off-site repositories for Git projects. It gives free public repositories to open source projects, and it has been nice enough to give us an `engr100` organization where we can host private repositories. We will make a repository for each student to learn Git, and a repository for each group to use for project work. In order to access these repositories, you need to:

1. Create a Github username and set up SSH keys, etc. Github has great [how-tos][github-ssh-keys] on this, and I've been told their GUI installer does it for you beautifully, although I haven't tried it.
2. Go to the ENGR 100 website and register your Github username with the "Register your github username" link. This will create a test repository for you to play with, called `engr100/YOUR_UNIQNAME` (where `YOUR_UNIQNAME` stands for your uniqname). I found it helpful to save the link to this repository.

The instructions below assume you're working with this test repository. If you're working with the project repository for your group, replace `YOUR_UNIQNAME` with a sorted, dot-separated list of your group members' uniqnames.

Now you're ready to start learning some Git commands.

## A Small Set of Git Commands

Here I'll talk about a few Git commands that I find integral to working with Git and Github *with relation to ENGR 100*. This is not a comprehensive list of Git commands. See the section "Conclusions and Further Reading" for places that may have such a list. Also, feel free to share any Git commands that you like with fellow students.

The first thing you'll want to do is get a working copy of your test repository set up on your machine, i.e. get what's already set up on Github onto your computer. This is called cloning the repository.

    git clone git@github.com:engr100/YOUR_UNIQNAME

The `clone` command sets up a working copy and repository on your computer that have the same files as the repository on Github. It also sets up a remote repository with the name `origin`, which will come into play in just a little bit. You should now have a folder called `YOUR_UNIQNAME` which holds your working copy and repository. Feel free to play around with it. This repository's sole purpose in life is to get you up to speed with Git. We will talk more later about `clone` and other commands for working with remote repositories.

So now, you work on your code and make a few files. You'd like to save some of your work. The next four commands are intimately related, so I'll talk about them all at once. The most important piece here is `commit`. That means taking the code and putting it into your local repository. Here's where Git is a bit different from most other Version Control Systems. In Git, `commit` takes things from your *staging area* and puts them into the local repository. The staging area is Git's name for the set of changes that will make it into your next commit. For a file, the commit is the big time, so "staging area" and its limelight connotations are appropriate! The staging area and how these commands interact with it are shown here:

![Local git commands and the staging area][local-git-commands]

There can be files in your Git directory that Git doesn't pay much attention to. These files are called *untracked*. Git will not commit untracked files! In order to have Git track a file, you first have to `add` it, which also puts it in the staging area. You can use the `rm` command to delete a tracked file from the working copy and stage that deletion for your next commit.

As previously stated, `commit` only deals with files in your *staging area*. To stage new changes to a file that is already tracked, you use the `add` command again. For those of you used to SVN, for example, the `-a` flag to `commit` may be useful; it stages the changes to all tracked files before committing. Finally, `git status` will show you the status of your files, viz. whether they have been modified, are in the staging area, or are untracked.

    git add FILE
    git rm FILE
    git commit
    git commit -a
    git commit -m "commit message"
    git status

So you've made some changes to your project. Great. What about your teammate? She's been coding away, too! Perhaps you two should share your changes. In order to do this, you'll need to `push` and `pull`.

Say you'd like to share your changes with your teammate. First, you commit to your local repository, as before. Then, you `push` your local repository to the remote repository (Github). Once everything's on Github, your partner can `pull`.

One snag in this process is that the repository on Github is initially empty. When I say empty, I mean *empty* - no files, no starting point, *nothing.* The first time that you push commits to a new Github repository, you must explicitly tell Git what to push:

    git push origin -u master

When you ran `git clone`, Git set up a reference to a remote repository called `origin` which points to the original Github repository. Your default branch is called `master`, so this command pushes the local `master` branch to the remote repository named by `origin`. Since `master` doesn't exist there yet, this command will create it. The `-u` flag links these local and remote branches together, so that in the future, you can just type `git push` or `git pull` without the extra verbiage.

Besides `git clone`, there is another way to create a Git repository, which you may have stumbled upon in your own reading, or which you may be more familiar with from SVN:

    git init

This command creates an empty repository in the current working folder. If you've done this, made some changes and commits, and now you want to push them to Github, this is no problem. You simply first tell Git where you're going to push to, and then you push:

    git remote add origin git@github.com:engr100/YOUR_UNIQNAME
    git push origin -u master

`git remote add` defines a new *remote* (a reference to a remote repository) called `origin`, pointing at a Github repository. The second command tells Git to push the local `master` branch to the `origin` remote repository, just as above.

Here is a quick visual representation of these remote commands:

![Git commands for working with remote repositories][git-push-pull]

You may want to be very sure that the files in the Github repository are the ones you expect. I would personally recommend using the Github web interface for this, although there are many other ways to do it.

That's most of what you'll use on the group projects.

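
If you'd like to watch these commands work together before touching Github, here is a rough sketch you can run anywhere. It is only an illustration under a few assumptions of mine: a local bare repository stands in for the remote on Github, and the scratch directory, the file name `notes.txt`, and the name and email are all made up. The `git symbolic-ref` lines are there only to pin the branch name to `master`, as in the text above, whatever your Git's defaults happen to be.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area; nothing in here talks to Github.
DEMO=$(mktemp -d)

# A local bare repository plays the part of the (initially empty) remote.
git init --quiet --bare "$DEMO/remote.git"
git --git-dir="$DEMO/remote.git" symbolic-ref HEAD refs/heads/master

# You: clone the empty remote. This also sets up the `origin` remote.
# (Git may warn that the repository is empty; that is expected.)
git clone --quiet "$DEMO/remote.git" "$DEMO/you"
git -C "$DEMO/you" symbolic-ref HEAD refs/heads/master
git -C "$DEMO/you" config user.name "Example Student"
git -C "$DEMO/you" config user.email "student@example.com"

# Create a file, stage it, and commit it to your local repository.
echo "hello, ENGR 100" > "$DEMO/you/notes.txt"
git -C "$DEMO/you" add notes.txt
git -C "$DEMO/you" commit --quiet -m "Add notes"

# First push to an empty remote: name the remote and the branch explicitly.
git -C "$DEMO/you" push --quiet origin -u master

# Your teammate: a fresh clone now contains your commit.
git clone --quiet "$DEMO/remote.git" "$DEMO/teammate"
cat "$DEMO/teammate/notes.txt"
```

When you're done experimenting, you can simply delete the scratch directory. With the real thing, the only difference is that the remote lives at `git@github.com:engr100/YOUR_UNIQNAME` instead of in a local folder.
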
## Conflict resolution; every group goes through it

While we hope that you find it easy to work together harmoniously with your group members, it is quite common that *conflicts* will arise in your code. This happens in any version control system when two people make changes to the same part of the code. Maybe you were cleaning up some spacing or fixing a typo while your compatriot was adding something new. You may notice this first when trying to push:

    $ git push
    To git@github.com:engr100/YOUR_UNIQNAME
     ! [rejected]        master -> master (non-fast-forward)

along with a helpful hint that you need to `pull` first before you can `push`. So you go to pull from Github and see something like this:

    remote: Counting objects: 5, done.
    remote: Total 3 (delta 0), reused 0 (delta 0)
    Unpacking objects: 100% (3/3), done.
    From git@github.com:engr100/YOUR_UNIQNAME
       642816d..028663c  master     -> origin/master
    Auto-merging file.txt
    CONFLICT (content): Merge conflict in file.txt
    Automatic merge failed; fix conflicts and then commit the result.

When you go open `file.txt` to see what on earth happened, you see this mess:

    <<<<<<< HEAD
    Text text and still more text.
    =======
    Text text and more text.
    >>>>>>> 028663c8fcfa3d08ff77fe60d48b4ce34a8db6c4

This shows you both versions of the file, and the reason for the conflict (how the versions differ). The part above the `=======` line is the local version, and the part below that line is the remote version. (That gobbledygook at the end is the hash of the remote commit, which uniquely identifies it. Don't worry about it.) Git has helpfully added those extra markers in your local copy to show you precisely where the conflict occurred.

To fix the conflict, you need only edit the conflicting file(s) so that they contain the text you actually want (with the markers removed), `add` them (which marks their conflicts as resolved), and then `commit`.

## Branches; or, breaking the code without breaking the code

One of Git's most useful features is the ability to easily create and work with *branches*. A *branch* is simply a sequence of revisions and a name to identify them. You've already been working with one branch in Git, called `master`. If you have no other branches in a Git repository, you always at least have `master`.

Suppose that you've just finished a major part of the project. All your own tests are passing, and you're ready to tackle the next component. This is a great time to create a branch, because the `master` branch is currently in a good state. Though perhaps not everything is implemented, the code that *is* finished does its job right. Having experimental changes on a separate branch allows you to go back to a stable version at any time, and it also allows you to easily see the changes you've made since you started working on the new components.

Often, developers will create a local branch just to try out some new approach. If it turns out not to work, they can just delete the branch and pretend it never happened.

The `branch` command with an argument will make a new branch with that name. It will NOT move you into that branch. For that, use the `checkout` command.

    git branch NEWBRANCHNAME
    git checkout NEWBRANCHNAME

Or equivalently:

    git checkout -b NEWBRANCHNAME

The `branch` command with no arguments will show you a list of all the branches in your local repository:

    git branch

If you're using branches - e.g. to work on some experimental revision to your project - you may want to share those branches with your teammates. You've actually already done this earlier in this tutorial, via `git push` and `git pull`. The only trick here is to remember that, if you created a branch in your local repository, Github doesn't know about that branch, and Git doesn't know where to push it. Just as before, you fix this by specifying the *remote* and the *branch* explicitly when you push:

    git push origin -u NEWBRANCHNAME

If your teammate has pushed a new branch to Github, you may notice its existence the next time you pull:

    $ git pull
    ...
     * [new branch]      NEWBRANCHNAME -> origin/NEWBRANCHNAME
    $

Just like any other branch, you can now switch to this branch with `checkout`:

    git checkout NEWBRANCHNAME

Lastly, every good branch must come to an end. Once the experimental changes on a branch are stable and your tests are passing, it's probably no longer experimental, and it's time to integrate those changes back into your main line of development. This is called *merging* a branch, and it's done simply as follows:

    git checkout master
    git merge NEWBRANCHNAME

This switches to the `master` branch and merges `NEWBRANCHNAME` back into it. Hopefully there are no conflicts, but if there are, see above for how to resolve them. The merge process should be familiar if you've already been `push`ing and `pull`ing; `git pull` is really just fetching commits from the remote branches and then `merge`-ing them.

## Excluding files

Files that are automatically generated should not be stored in a version control system. For example, you should not store Quartus' database or .sof files, or ase100's .labels and .mif files in a version control system. To ignore these files, download this <a href=".gitignore"><tt>.gitignore</tt></a> file into your repository. Click <a href="https://help.github.com/articles/ignoring-files/">here</a> for more information.

## Conclusions and Further Reading

Well, there you have it. That's my little spiel on Git and Github. The gist is that Git is a distributed VCS and we've given you some Github repositories. There's so much more to learn, though!

My two favorite references for those of you who plan to use Git are: _The Git Parable_ by Tom Preston-Werner, and the _Git Book_ by Scott Chacon. Both are on the web and just a Google away. _The Git Parable_ is, in my opinion, the best reference for understanding the philosophy of Git and wrapping your head around the Git Way. _The Git Book_ is more or less just a spec/tutorial, but much more complete than this little document.

Good luck, and remember that your instructors and your classmates are all very good resources!

[subversion]: images/svn_topology.png
[git-topology-decentralized]: images/git_topology_decentralized.png
[git-topology-centralized]: images/git_topology_centralized.png
[local-git-commands]: images/local_git_commands.png
[git-push-pull]: images/git_push_pull.png
[github-ssh-keys]: https://help.github.com/articles/generating-ssh-keys
[git-track]: https://raw.github.com/git/git/master/Documentation/RelNotes/1.6.6.txt