Git - a version control system

What is it?

Git is a version control system (VCS) first released in 2005. Version control means that every change is tracked and recorded, so that any specific version can be recalled after any number of modifications.

Linus Torvalds and other developers created Git to support the development of the Linux kernel.

Nowadays it is one of the most widely used version control systems: among other things, the well-known GitHub website is built around Git, as its name suggests.

You can find several guides to Git on the web, and most of them are probably much better than the one you are reading. A short list:

Concepts

Git is different from other version control systems. Subversion, for example, is based on a central server where all the versions are stored: it is a Centralized VCS. Git, like Mercurial, is instead a Distributed VCS: there is no need for a central server, and the data may be shared directly between the working nodes.

Git saves all the snapshots of your project inside a .git/ folder. Each time you commit changes, Git records the complete state of the project as a set of objects in its internal database. If a file has not changed, Git does not store it again but links to the copy already stored for the previous snapshot, so no space is wasted.

Git does not rely on the modification times of the files to detect changes, but on their contents. (Each commit does record an author and a date, but these are metadata about the commit, not about the files.) If you modify a committed file and then undo all the changes before a new commit, for Git the file has not been edited at all. Content, however, also includes spaces and blank lines: a good practice is to avoid modifying the spacing and indentation of existing files unless strictly required.
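A minimal sketch of this behaviour, assuming a throwaway repository in a temporary directory. The file name notes.txt and the identity are placeholders invented for the example:

```shell
set -e
repo=$(mktemp -d)                        # throwaway directory (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity, local to this repo
git config user.email "user@example.com"

printf 'first line\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

printf 'second line\n' >> notes.txt      # modify the committed file
changed=$(git status --porcelain)        # non-empty: notes.txt is reported as modified

printf 'first line\n' > notes.txt        # undo the edit by hand
restored=$(git status --porcelain)       # empty: the content matches the commit again

echo "after edit:   '$changed'"
echo "after revert: '$restored'"
```

The file was rewritten twice, so its modification time changed, but Git reports a clean working directory because the content is the same as in the last commit.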

What you commit in Git is very difficult to destroy: you add data to the Git database, even when you delete files in your project. Almost all operations can be undone. Git also checksums everything it stores, so if a file in the Git database is corrupted the system recognizes the problem.
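To get a feeling for the checksums, you can ask Git for the identifier it would assign to some content. This is only an illustration using git hash-object on made-up strings, and nothing is stored anywhere:

```shell
set -e
# Identical content always gets the same identifier...
a=$(printf 'same content\n' | git hash-object --stdin)
b=$(printf 'same content\n' | git hash-object --stdin)
# ...while a single extra space produces a completely different one.
c=$(printf 'same content \n' | git hash-object --stdin)

echo "$a"
echo "$b"
echo "$c"
```

Since the identifier is computed from the content, any corruption of a stored object makes the content and its identifier disagree, and Git can notice.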

Create a repository

The first thing to do when you want to create a new Git repository is to decide which folder is the main one. The whole project must be contained in this folder.

All Git operations go through the git command, which is the starting point of the Git command-line syntax. git takes a number of different arguments (subcommands) to do everything. One of these is init. Inside the folder, you can create a Git repository using

git init

This creates the .git/ folder and its basic content.
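As a quick check, here is a sketch that runs git init in an empty temporary folder (a hypothetical stand-in for your project folder) and looks at what appears:

```shell
set -e
project=$(mktemp -d)          # hypothetical project folder
cd "$project"
git init -q

ls .git                       # the repository data: HEAD, config, objects, refs, ...
git rev-parse --git-dir       # asks Git where the repository data lives
```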

There is another possibility: if you already have a Git repository somewhere, or if you want to work on software that has a public Git repository, you can clone it. Cloning means that Git copies all the data from the remote and stores everything in your local filesystem: for example, you can copy CosmoMC from GitHub using

git clone https://github.com/cmbant/CosmoMC.git


Before starting to commit, you should define your identity. You can do it globally, or for the current repository only (drop the --global option):

git config --global user.name "Stefano Gariazzo"
git config --global user.email gariazzo@to.infn.it

If you use the global option, you only have to do this the first time you use Git.
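The per-repository variant can be sketched as follows. The name and address are placeholders, and the repository is a temporary one so that your real configuration is not touched:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Without --global the values are written to .git/config of this repository only
git config user.name "Example User"
git config user.email "user@example.com"

name=$(git config --local --get user.name)
email=$(git config --local --get user.email)
echo "$name <$email>"
```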

Commit changes

After you have made some changes to the cloned repository, or created your first files, you have to save your changes. This is easily done in a few steps. To show the current status of the working directory with respect to the last Git snapshot, use

git status

With the -s option, the output is shortened to one line per file.

The status will include a list of untracked, modified, deleted or new files. You can select the files to include in the next commit (and start tracking new ones) with

git add filename1 filename2 ...

At this point, the modifications are "staged": they are ready to be saved. The staged files are stored in a new snapshot through a commit operation. Each commit must be accompanied by a non-empty description message. If you don't use the -m option, Git will open an editor where you are asked to write the message:

git commit -m "commit message"

You can also use the -a option to automatically stage all the changes to files that Git already tracks and commit them, skipping one step (new, untracked files still need git add):

git commit -a -m "commit message"
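The whole cycle can be sketched in a throwaway repository. The file names and the identity are invented for the example; note what happens to the file that was never added:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "v1" > tracked.txt
git add tracked.txt                      # stage the new file
git commit -q -m "Add tracked.txt"       # first snapshot

echo "v2" > tracked.txt                  # change a file Git already tracks
echo "new" > untracked.txt               # create a file Git has never seen
git commit -q -a -m "Update tracked.txt" # -a stages tracked files only

git status -s                            # untracked.txt is still listed
left=$(git status --porcelain)
commits=$(git rev-list --count HEAD)
```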

There are files that you may not want to include in a commit (temporary files, compiled files, logs or others). A simple way to tell Git to ignore these files is to save a .gitignore file containing the names or patterns of the files that should be ignored, for example

#exclude *.a and *.o:
*.[oa]
#exclude the build folder:
build/
#TODO file in the main directory:
/TODO
# ignore *.txt files in doc/, but not in its subfolders
doc/*.txt
# ignore all .pdf files in the doc/ directory and its subfolders
doc/**/*.pdf
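If you are not sure what a pattern matches, git check-ignore can tell you. The sketch below writes the same patterns into a temporary repository and tests a few made-up paths against them:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

cat > .gitignore <<'EOF'
*.[oa]
build/
/TODO
doc/*.txt
doc/**/*.pdf
EOF

# Hypothetical files, created only to test the patterns
mkdir -p build src doc/api
touch main.o build/out.bin TODO src/TODO doc/notes.txt doc/api/notes.txt doc/api/ref.pdf

for path in main.o build/out.bin TODO src/TODO doc/notes.txt doc/api/notes.txt doc/api/ref.pdf; do
  if git check-ignore -q "$path"; then
    echo "ignored: $path"
  else
    echo "kept:    $path"
  fi
done
```

With these patterns, src/TODO and doc/api/notes.txt are kept, while the other paths are ignored.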

You can show the changes made since the last commit, either the unstaged ones or only the staged ones:

git diff
git diff --staged
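The difference between the two forms is easier to see when a file has both kinds of changes. A small sketch, with an invented file and placeholder identity:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "Add file.txt"

echo "two" >> file.txt
git add file.txt                         # the line "two" is now staged
echo "three" >> file.txt                 # the line "three" is not staged

unstaged=$(git diff)                     # working directory vs staged content
staged=$(git diff --staged)              # staged content vs last commit

echo "--- unstaged ---"; echo "$unstaged"
echo "--- staged ---";   echo "$staged"
```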

Additional useful commands are:

Branches

One interesting concept is that of branches. A branch is just a pointer to a commit. Each commit, in turn, contains several pieces of information, including pointers to its parent commits. While the initial commit has no parents, all the following ones have at least one. You can use a branch to restore the state of a previous commit, but also to split the commit history. It is usual to create new branches to test new features (topic branches) without affecting the release version of the project, or to implement slightly different versions of the same code without having to copy all the files into a new directory. Because a branch is only a pointer, creating one in Git is cheap, while some other VCS require the working directory to be copied entirely to create a new branch.

The initial branch that Git creates is traditionally called "master" (recent versions let you configure a different default name), but there is nothing special about it. Every branch is treated in the same way.

A new branch "testing" is created and entered with

git branch testing
git checkout testing

or the equivalent shorthand:
git checkout -b testing

The new branch points to the commit you are currently on.
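You can verify this with git rev-parse, which prints the commit a name points to. In this sketch the repository, file and identity are placeholders, and the initial branch is explicitly named "master" in case your Git uses another default:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"

git checkout -q -b testing               # create the branch and switch to it

git rev-parse master testing             # the same commit hash, printed twice
git branch                               # the * marks the branch you are on
```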

If you now create a new commit from inside the "testing" branch, the "master" branch will not be affected. The histories of the two branches will stay separate until you merge them. If you want to edit some files in the "master" branch, you have to change the working branch:

git checkout master

This restores the files in the working directory to the commit that "master" points to and moves a special pointer, HEAD, to that branch. HEAD is how Git knows which commit corresponds to the current state of the working directory.

Once you are back in the "master" branch, you will notice that the modifications you committed in "testing" are not present, and the opposite holds for commits applied to "master". If you want to include in the "master" branch the commits that you made in "testing", you can use (from inside "master")

git merge testing
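A sketch of two branches that diverge and are then merged, in a temporary repository with invented file names and a placeholder identity:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"

git checkout -q -b testing
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature on testing"

git checkout -q master
seen_on_master=$(ls)                     # feature.txt is not here yet
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Add fix on master"

git merge -q --no-edit testing           # different files changed, so no conflict
ls                                       # now both feature.txt and fix.txt are present
git log --oneline --graph                # the two histories join in a merge commit
```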

If the merge does not encounter conflicts, Git completes it without any user intervention. Git takes the common ancestor of the two branch histories and computes the differences from that point to perform the merge. If the "master" version of a file conflicts with the "testing" version of the same file, however, Git will ask you to resolve the conflicts and to indicate the correct version of the file to keep after the merge. To do this, you can use

git mergetool

which uses an external program to let you see and resolve all the conflicts. As an example, I use meld. You can set it as the default mergetool with

git config --global merge.tool "meld"
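Since git mergetool is interactive, the sketch below provokes a conflict and resolves it by hand instead, simply by writing the wanted content and committing. File name, contents and identity are invented for the example:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "color = red" > settings.txt
git add settings.txt
git commit -q -m "Base settings"

git checkout -q -b testing
echo "color = green" > settings.txt
git commit -q -a -m "Use green on testing"

git checkout -q master
echo "color = blue" > settings.txt
git commit -q -a -m "Use blue on master"

if git merge --no-edit testing >/dev/null 2>&1; then
  echo "merged cleanly"
else
  git status -s                                        # settings.txt is marked as unmerged
  conflicted=$(git diff --name-only --diff-filter=U)   # list of files in conflict
  echo "color = green" > settings.txt                  # pick the version to keep
  git add settings.txt                                 # mark the conflict as resolved
  git commit -q --no-edit                              # conclude the merge
fi
```

Both branches changed the same line, so Git cannot decide on its own and stops until the file is staged again.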

There is another way to integrate two branches, called rebase:

git checkout testing
git rebase master

A rebase works differently from a merge. Instead of computing the differences with respect to the common ancestor and combining them in a new merge commit, Git takes the changes introduced by each commit of "testing" since the common ancestor and re-applies them, as patches, on top of the last commit in "master". This is useful if you want to keep the modifications you made in "testing" while also including in this branch the new features that have been committed to "master".

A shorthand for the two commands above is

git rebase master testing

which checks out "testing" and re-applies its commits on top of the last "master" commit. In this way, "testing" will also contain the latest commits made on "master".
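A sketch of a rebase in a temporary repository (invented files, placeholder identity). After it, the history of "testing" is a straight line that contains the "master" commit:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"

git checkout -q -b testing
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Work on testing"

git checkout -q master
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Work on master"

git rebase -q master testing             # switch to testing and replay it on master
git log --oneline                        # linear history, no merge commit
```

Note that "master" itself is not moved by the rebase: only "testing" gets new commits.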

As with git commit --amend, which also rewrites history, remember not to rebase commits that are already present in remote repositories, to avoid causing a mess for all the other people working on the project.

Additional useful commands are:

Working with remotes

Sharing code is very simple nowadays. Websites such as GitHub make it easy to share software with the whole world and collaborate on its development. All of this is made easy by the way Git manages remotes. Remotes are versions of the project that are hosted on the Internet or on some local network. You can list the remotes configured for your project using (add the -v option to also show the network addresses)

git remote

If you cloned a repository, you will have a remote called origin. You can add other remotes with

git remote add shortname url

The shortname is a name used in place of the URL in all operations.
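A remote does not have to be on the Internet. In this sketch a bare repository in a temporary folder plays the role of the server, and "backup" is an arbitrary shortname chosen for the example:

```shell
set -e
hub=$(mktemp -d)/shared.git              # hypothetical stand-in for a hosted repository
git init -q --bare "$hub"

work=$(mktemp -d)                        # your working repository
cd "$work"
git init -q

git remote add backup "$hub"             # "backup" is the shortname, "$hub" the url
git remote -v                            # lists the shortname with its fetch and push addresses
```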

To download data from a remote, use

git fetch shortname

A fetch downloads the content of the remote branches and saves it in your local repository, without touching your own branches. For the "origin" remote, for example, git branch -r or git branch --all will show some remote-tracking branches, named for example origin/master or origin/testing. Once the data are saved locally, you may want to merge the remote branch into the local one with the commands we already saw.

Git has a command that takes care of both fetching and merging:

git pull origin next

which merges the "next" branch fetched from "origin" into the current branch.
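The difference between fetch and pull can be sketched with two clones of a local bare repository. Everything here (paths, the names alice and bob, the file) is made up, and the branch is "master" rather than "next" to keep the setup short:

```shell
set -e
base=$(mktemp -d)                        # scratch area for all three repositories
git init -q --bare "$base/origin.git"    # local stand-in for a hosted repository
git --git-dir="$base/origin.git" symbolic-ref HEAD refs/heads/master

# "alice" publishes the first commit
git clone -q "$base/origin.git" "$base/alice" 2>/dev/null
cd "$base/alice"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice Example"     # placeholder identity
git config user.email "alice@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
git push -q origin master

# "bob" clones, then alice publishes a second commit
git clone -q "$base/origin.git" "$base/bob"
echo "two" >> file.txt
git commit -q -a -m "Second commit"
git push -q origin master

cd "$base/bob"
git fetch -q origin                                  # download only
git branch -r                                        # shows the remote-tracking branches
local_before=$(git rev-list --count master)          # local master has not moved
remote_after=$(git rev-list --count origin/master)   # origin/master has the new commit
git pull -q origin master                            # fetch and merge in one step
local_after=$(git rev-list --count master)
echo "$local_before $remote_after $local_after"
```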

You may want to set up tracking for some branches, so that pull automatically knows which remote branches must be merged into which local branches. To create a new local branch "localname" from a remote branch "branchname" on "origin", use

git checkout -b localname origin/branchname

and to set up tracking for the current branch, if it already exists, use

git branch -u origin/branchname

The command

git branch -vv

shows the list of the branches, with their last commits and the remote branches they are tracking.

Finally, let's introduce push. git push is used to send local data to a remote repository. To send your "master" branch to "origin", use

git push origin master

(add the -u option to make the local branch track the pushed one).
You can also push all the local branches at once:

git push origin --all

The push may be rejected. This happens if you don't have write permission on the remote, or if someone else pushed some modifications between your last clone/fetch/pull and your push attempt. In the latter case, before pushing you must fetch the remote content, merge the modifications locally and resolve the conflicts, if any. Then your push will be accepted.
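The second kind of rejection can be reproduced locally. In this sketch two clones (the hypothetical alice and bob) of a local bare repository both commit, and commit_file is a small helper written only for this example:

```shell
set -e
base=$(mktemp -d)
git init -q --bare "$base/origin.git"    # local stand-in for a hosted repository
git --git-dir="$base/origin.git" symbolic-ref HEAD refs/heads/master

# Helper for this example: commit_file <repo> <file> <message>
commit_file() {
  echo "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$3"
}

git clone -q "$base/origin.git" "$base/alice" 2>/dev/null
git -C "$base/alice" symbolic-ref HEAD refs/heads/master
git -C "$base/alice" config user.name "Alice Example"
git -C "$base/alice" config user.email "alice@example.com"
commit_file "$base/alice" a.txt "Initial commit"
git -C "$base/alice" push -q origin master

git clone -q "$base/origin.git" "$base/bob"
git -C "$base/bob" config user.name "Bob Example"
git -C "$base/bob" config user.email "bob@example.com"

# Both work in parallel; alice pushes first
commit_file "$base/alice" a2.txt "Change by alice"
git -C "$base/alice" push -q origin master
commit_file "$base/bob" b.txt "Change by bob"

if git -C "$base/bob" push -q origin master 2>/dev/null; then
  first_push=accepted
else
  first_push=rejected                    # origin has a commit that bob does not have
fi

git -C "$base/bob" fetch -q origin                     # get alice's commit
git -C "$base/bob" merge -q --no-edit origin/master    # merge it locally
git -C "$base/bob" push -q origin master               # now the push goes through
echo "first push: $first_push, second push: accepted"
```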

Stash temporary work

It may happen that you are working on a feature of your code, but you don't have time to finish it because you must work on something else. In these cases, it is not convenient to commit incomplete, temporary work.

Git helps you with the stash command. Stashing modifications means that Git saves the current state of the modified files and restores the version of the last commit. You will not lose your incomplete work, and you will be able to work on something else starting from a clean, already committed version of the project.
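A sketch of a stash round trip, using the standard stash, stash list and stash pop subcommands in a temporary repository (file name and identity are placeholders):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "stable" > file.txt
git add file.txt
git commit -q -m "Stable version"

echo "half-finished idea" >> file.txt    # incomplete work, not ready for a commit
git stash -q                             # save it aside and restore the committed version
clean=$(git status --porcelain)          # empty: the working directory is clean

git stash list                           # one stashed entry is listed

git stash pop -q                         # bring the incomplete work back
restored=$(git status --porcelain)       # file.txt is modified again
```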

These are some stash-related commands:

Clean

Something useful is the git clean command. It removes from the working directory the files that Git is not tracking, and optionally also those it is ignoring, to get rid of cruft.
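Because git clean deletes files for real, it is worth trying it first with the -n (dry run) option. The sketch below uses a temporary repository with one untracked file and one ignored file, both invented for the example:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "*.log" > .gitignore
git add .gitignore
git commit -q -m "Ignore log files"

touch scratch.tmp debug.log              # one untracked file, one ignored file

git clean -n                             # dry run: only reports what would be removed
git clean -f -q                          # removes untracked files, keeps ignored ones
after_plain=$(ls)

git clean -f -q -X                       # -X removes only the ignored files
after_ignored=$(ls)

echo "after clean -f:    '$after_plain'"
echo "after clean -f -X: '$after_ignored'"
```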

These are some useful clean options: