In the last two lectures we covered JavaScript and node.js, and we also went over the basics of how to publish a module on npm. The goal of this lecture is to discuss how to collaborate on modules efficiently using git, which is a powerful distributed version control system.
git is a tool that lets you maintain multiple distributed versions of a collection of files (for example, the source code for a large program). It was invented by Linus Torvalds, the creator of Linux, and today it is used in the day-to-day maintenance of the Linux kernel as well as many other software projects. Learning to use git is much easier than learning a new programming language or environment, but it does require a bit of patience to get started.
Installing git is a bit different depending on what system you are using. On Linux or Mac, the easiest way to do this is to install git using your package manager. For example, on Debian/Ubuntu you could do:
sudo apt-get install git
On OS X, you can install it using Homebrew or MacPorts. For example, with Homebrew (which should be run without sudo):
brew install git
You can also download and install git directly from git-scm.com.
On Windows, the easiest solution might be to use github's windows installer. You can find this here:
Once you have git set up on your system, you can use it to create a new repository. To create a new git repository in some directory, you just type git init:
> git init
Initialized empty Git repository in /Users/mikolalysenko/GitHub/git-demo/.git/
This will create a folder in the current directory called .git that contains all the data necessary to track the history of the repository. To add files to the repository, you type git add filename. For example, here is how you can add a README to your git repository:
> touch README.md
> git add README.md
Once you've marked the files you want to track in the current revision, you can commit your changes using git commit:
> git commit -m "initial project version"
[master (root-commit) 8eaaebf] initial project version
0 files changed
create mode 100644 README.md
To check the status of your repository, you can type git status:
> git status
# On branch master
nothing to commit (working directory clean)
This tells you which files have outstanding changes that you need to commit. If you want to make sure that some files are never tracked by git, you can create a file called .gitignore that lists them, which prevents you from accidentally adding them to version control. This is useful, for example, to keep git from indexing your node_modules/ folder. Here is an example .gitignore file:
# A comment line
node_modules/*
npm-debug.log
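As a rough sketch of how these patterns behave, the script below sets up a throwaway repository and asks git which paths it would ignore. The temporary directory and the file names some-module and index.js are made up for illustration; git check-ignore is a standard git subcommand in reasonably recent versions.

```shell
set -e
# Work in a scratch directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q

# Same patterns as the example .gitignore above.
printf 'node_modules/*\nnpm-debug.log\n' > .gitignore

# Hypothetical files: one dependency folder, one log, one source file.
mkdir -p node_modules/some-module
touch node_modules/some-module/index.js npm-debug.log index.js

# Show which pattern matches each ignored path.
git check-ignore -v node_modules/some-module npm-debug.log

# Untracked files that git would still offer to add.
git status --short
```

Only .gitignore and index.js should show up as untracked, since everything else matches one of the ignore patterns.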
You can also move/rename files using git mv and remove files using git rm. For example,
> touch test.txt
> git add test.txt
> git commit -m "test.txt"
[master fec59f0] test.txt
0 files changed
create mode 100644 test.txt
> git mv test.txt foo.txt
> git commit -m "renamed test.txt to foo.txt"
[master 1b8fa77] renamed test.txt to foo.txt
1 file changed, 0 insertions(+), 0 deletions(-)
rename test.txt => foo.txt (100%)
> git rm foo.txt
> git commit -m "removed foo.txt"
[master dfdcb07] removed foo.txt
0 files changed
delete mode 100644 foo.txt
If you accidentally add a file that you didn't mean to, you can unstage it using the reset command. Here is a simple example:
> touch test.txt
> git add test.txt
> git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: test.txt
#
> git reset HEAD test.txt
> git status
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# test.txt
nothing added to commit but untracked files present (use "git add" to track)
The above details are more-or-less standard in any competent version control system. However, what makes git stand out is its powerful and simple distributed branching model. At any time you can create a branch from the current HEAD commit by typing git branch followed by a name for the new branch:
> git branch test
> git branch
* master
test
The main branch for a repository is called master by default, and there can be any number of other branches. Typing git branch with no arguments shows you a list of all the branches. To switch the current branch you use the checkout command:
> git checkout test
Switched to branch 'test'
You can then use these branches as checkpoints to move around. For example, suppose that in the test branch we add some more files:
> touch testfile.txt
> git add testfile.txt
> git commit -m "added testfile"
[test 8312a30] added testfile
0 files changed
create mode 100644 testfile.txt
> ls
README.md testfile.txt
We can then switch back to the master branch again using git checkout:
> git checkout master
Switched to branch 'master'
> ls
README.md test.txt
And from here we could even make further changes:
> touch anotherfile.txt
> git add anotherfile.txt
> git commit -m "made another file"
[master cf08781] made another file
0 files changed
create mode 100644 anotherfile.txt
At this point the branches test and master have diverged. To apply the changes we made in test back to master, we can use the merge command, which combines the two branches:
> git merge test
Merge made by the 'recursive' strategy.
0 files changed
create mode 100644 testfile.txt
The changes made in test are now combined into master. If after doing this we don't want the old branch hanging around any more, we can remove it easily:
> git branch -d test
Deleted branch test (was 8312a30).
Sometimes when merging you will get merge conflicts. When this happens, git will print out a message like:
> git merge test
Auto-merging test.txt
CONFLICT (content): Merge conflict in test.txt
Automatic merge failed; fix conflicts and then commit the result.
To see which files are affected, you can check the status using git status and then manually go through and resolve the conflicts. Once this is done, git add and git commit the changes to complete the merge.
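The conflict workflow can be reproduced end to end with a sketch like the following. It is only an illustration: the scratch directory, the demo identity, and the file contents are invented, and the "resolution" here simply overwrites the file, where in practice you would edit the conflict markers by hand.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
# Throwaway identity so commits work on a fresh machine (assumed values).
git config user.email "demo@example.com"
git config user.name "Demo User"

echo "original line" > test.txt
git add test.txt
git commit -q -m "add test.txt"
base=$(git rev-parse --abbrev-ref HEAD)   # master (or main on newer setups)

# Change the same line on two branches so they cannot be merged automatically.
git checkout -q -b test
echo "line from test" > test.txt
git commit -q -am "edit on test"
git checkout -q "$base"
echo "line from $base" > test.txt
git commit -q -am "edit on $base"

# The merge stops with a conflict; '|| true' keeps the script going.
git merge test || true
git status --short            # test.txt is listed as unmerged

# Resolve by choosing the final contents, then add and commit.
echo "merged line" > test.txt
git add test.txt
git commit -q -m "merge test, resolving conflict in test.txt"
```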
One of the most important aspects of branching is that this is also how git handles undoing and reverting actions. Once you've made a commit, it stays in the repository's history; ordinary day-to-day commands add new commits rather than erase old ones, which greatly improves the robustness of the system. If you want to go back to a previous version of your code, you can use git checkout followed by the commit hash. To make changes starting from that commit, you can make a branch and update it just as before.
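A minimal sketch of going back to an earlier commit and branching from it might look like this. The file notes.txt, the branch name fix-from-first, and the demo identity are hypothetical; in a real repository you would copy the hash from git log instead of capturing it in a variable.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email "demo@example.com"   # assumed throwaway identity
git config user.name "Demo User"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "first version"
first=$(git rev-parse HEAD)                # hash of the first commit

echo "version 2" > notes.txt
git commit -q -am "second version"

git log --oneline                          # lists both commits with short hashes
git checkout -q "$first"                   # working tree now matches the first commit
cat notes.txt                              # prints "version 1"

# Start a branch at the old commit and continue from there.
git checkout -q -b fix-from-first
echo "version 1, patched" > notes.txt
git commit -q -am "patch the first version"
```

Checking out a bare hash leaves you on no branch at all (git calls this a detached HEAD), which is why the sketch creates a branch before committing.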
Git by itself is just a tool for maintaining multiple versions of collections of files in a distributed system. It is completely agnostic regarding how these repositories are distributed over the internet. However, in the node.js community the preferred way to share repositories is GitHub:
To get started on github, first make an account. As a student, you are entitled to a free small plan which allows you to have up to 5 private repositories. To sign up for this account, go to the following URL:
Once that is done, you will probably want to set up a shared RSA key so you don't have to keep typing your password each time you push something. To do this, you should follow the instructions on the following website:
Once you have an account, it is pretty easy to create and share remote repositories. To do this, you just click on the create new repository button in the upper right corner:
Once you do that, you will get a screen that looks something like this:
If you type in a name, and hit "Create repository" github will create a new empty remote repository for you. You can now use this remote repository to back up one of your local git repositories online. GitHub is pretty explicit in telling you how to do this. If you look at the page, notice the following box,
Typing the first line into your repository tells git to add a remote host called "origin" to the repository:
> git remote add origin git@github.com:mikolalysenko/test-repository.git
The second line,
> git push -u origin master
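pushes your master branch to the remote repository called origin. The -u flag also records origin/master as the upstream of your local master, so that later pushes and pulls on this branch need no extra arguments. In general, the syntax: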
> git push remote-repo local-branch
tells git to copy your local branch local-branch to the remote repository called remote-repo.
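To try the remote and push commands without a GitHub account, one option is to let a bare repository on the local disk stand in for the remote. This is only a sketch under that assumption: the paths are temporary, the identity is made up, and a real setup would use the GitHub URL in place of the local path.

```shell
set -e
work=$(mktemp -d)

# A bare repository plays the role of the remote host in this sketch.
git init -q --bare "$work/remote.git"

git init -q "$work/local"
cd "$work/local"
git config user.email "demo@example.com"   # assumed throwaway identity
git config user.name "Demo User"
touch README.md
git add README.md
git commit -q -m "initial project version"
branch=$(git rev-parse --abbrev-ref HEAD)  # master (or main on newer setups)

# Register the remote under the name "origin" and push the branch to it.
git remote add origin "$work/remote.git"
git push -q -u origin "$branch"

git remote -v      # shows the fetch and push URLs for origin
git branch -vv     # shows that the local branch now tracks origin's copy
```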
The main advantage of using GitHub to manage your repositories is that it makes it much easier to contribute to existing open source projects. For example, if you want to copy an existing repository from GitHub, you can clone it using the git clone command:
> git clone git@github.com:mikolalysenko/test-repository.git
You can also make copies of repositories directly on github using the fork feature. This lets you own your own copy of the repository so you can make modifications and test things out yourself.
Once you've made a change to a fork, you can also send a patch back upstream to the original using github's pull request feature. The owner of the original repository can view the pull requests for their project, make comments and review the patch and ultimately accept or reject your changes.
The primary function of GitHub is to offer you a convenient and simple way to store git repositories remotely. This makes it much easier to collaborate on projects and maintain backups of your work. However, GitHub also has many other useful features that make management of open source projects much easier. The following is a brief survey of some of the more useful parts of GitHub.
For documenting your projects, GitHub supports the markdown markup language. If you create a file called README.md in the root of your project, GitHub will by default render this file on the first page of your repository. This can be helpful in explaining and documenting how your code works. npm also supports markdown, and by default will display your README on your npm page if your package includes one.
GitHub has a built in bug tracker and wiki. You can use this to keep track of feature requests and bugs in your project, as well as open up issues on other projects. The wiki is again built on top of markdown and can be used to maintain more detailed documentation for your project. Finally, GitHub also has a detailed statistics tracking API which is useful for viewing the branch history and commits of your project.
npm is also smart enough to be able to load packages from sources other than npm. In particular, it understands git repositories and can load modules from a git repository url. There is also strong integration with github, and you can use it to directly load a module by specifying the username/repository as a dependency.
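As an illustration, the sketch below writes a package.json whose dependencies point at GitHub rather than the npm registry. The package names and the someuser account are placeholders, not real modules; the two forms shown (the username/repository shorthand and a git+https URL) are ones npm accepts as dependency values.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"

# Hypothetical module and user names; substitute a real GitHub repository.
cat > package.json <<'EOF'
{
  "name": "git-demo",
  "version": "0.0.0",
  "dependencies": {
    "some-module": "someuser/some-module",
    "other-module": "git+https://github.com/someuser/other-module.git"
  }
}
EOF

# With real repository names, running `npm install` here would fetch
# both dependencies from GitHub (this needs network access).
cat package.json
```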
There are a large number of tutorials online that talk about how to use git in great detail. One of the best of these is "Pro Git" which covers all the basics in great detail:
There are also a number of fun interactive tutorials to practice using git:
The above reference is highly recommended and can be played as something like an interactive puzzle game.
Finally, if you want to know more about the GitHub service, you can read their extensive online documentation and help here: