In this chapter you will learn how to use Git submodules to maintain dependencies on other Git-based software projects within your own, including:
-
When submodules are useful
-
How to add a submodule to a repository
-
How to view the status of the submodules in a repository
-
How to update and initialize all submodules in a repository
-
How to run a command in every submodule in a repository
Almost all software projects make use of other software projects as libraries or tools. For example, say you’re using Git and writing a desktop application in C++, and you want to communicate with a server that provides a JSON API. Rather than writing the JSON handling yourself, you find an open source project on GitHub that provides a library for accessing JSON APIs with C++. You want to include this open source library in your project and update it when its maintainers release new versions with bug fixes you need.
There are generally two approaches to handling other software projects (usually known as dependencies) and to tracking which of their versions work with your own software:
-
Write documentation for what other software projects are required, what versions they should be, and where they should be installed so other developers building the project know how to set it up correctly.
-
Include the dependencies in the project’s repository so they’re always available to anyone when cloning the repository. This is known as vendoring the dependencies.
There are pros and cons to both approaches. Adopting the first approach (documentation) means that the software’s source code repository doesn’t need to include other software projects. In the C++ application example, it would mean documenting for other developers where and how they should download the external JSON library’s Git repository, rather than storing anything related to it in the C++ application’s Git repository.
Adopting the second approach (vendoring) means that you always have the various dependencies available, but it can increase the space used by the version control system. In the C++ application example (without submodules), you might copy the source code of the external JSON library into the application’s Git repository. When you wanted to update the library version, you’d copy in the new code and commit it.
Because Git stores the complete history of a repository and downloads all of it when the repository is cloned, too many large dependencies can result in a repository that takes a long time to clone and that records nothing about which other repositories its vendored source code came from. This also makes updating the versions of things like external libraries a painful, manual process. Submodules were created to address these problems.
A Git repository can contain submodules. Submodules allow you to reference other Git repositories at specific revisions. This is most commonly used to reference external Git repositories that are dependencies for software projects. In the C++ application example, instead of documenting the location or copying the source code of the external JSON library into the application’s Git repository, you could use submodules to reference the external JSON library’s Git repository.
Each of Git’s submodules records a specific commit (identified by its SHA-1) in another remote Git repository, and that repository is checked out into a subdirectory of the current repository. All that is actually committed in the current repository is some small pieces of metadata, which the git submodule command uses to clone, update, and interact with the various submodules inside a Git repository.
Note: What is git subtree?
You may have heard about git subtree, which is an alternate method of managing Git subprojects inside a Git repository. Instead of just referencing other Git repositories, git subtree will store the contents of the remote Git repository. It’s a contributed command to Git, which means it’s not documented or supported to the same extent as git submodule, so I won’t be covering it in this book. If you want to read more, you can view the git subtree documentation on GitHub: https://github.com/git/git/blob/master/contrib/subtree/git-subtree.txt
Let’s start by creating a new repository that can be used as a submodule in our existing GitInPracticeRedux repository.
Create a new repository on GitHub that we can use as a submodule by following these steps:
-
Create a new repository with git init, passing the path to a new directory; for example, git init /Users/mike/GitInPracticeReduxSubmodule/
-
Change to the directory containing your new submodule repository; in this example, cd /Users/mike/GitInPracticeReduxSubmodule/
-
Create a file named TODO.md by running printf "# TODO\n1. Add something useful to this submodule.\n" > TODO.md (printf is used here rather than echo because not every shell’s echo expands \n into a newline).
-
Stage the new TODO.md by running git add TODO.md, then commit it as the initial commit by running git commit --message "Initial commit of submodule."
-
Create a new repository on GitHub (or another Git hosting provider).
-
Add the new remote reference to the GitHub repository by running git remote add origin https://github.com/MikeMcQuaid/GitInPracticeReduxSubmodule.git
-
Push the repository to GitHub by running git push --set-upstream origin master
The output of all these commands should resemble the following:
# git init /Users/mike/GitInPracticeReduxSubmodule/
Initialized empty Git repository in /Users/mike/GitInPracticeReduxSubmodule/.git/ (1)
# cd /Users/mike/GitInPracticeReduxSubmodule/
# printf "# TODO\n1. Add something useful to this submodule.\n" > TODO.md
# git add TODO.md
# git commit --message "Initial commit of submodule."
[master (root-commit) e95b4cd] Initial commit of submodule. (2)
1 file changed, 2 insertions(+)
create mode 100644 TODO.md
# git remote add origin https://github.com/MikeMcQuaid/GitInPracticeReduxSubmodule.git
# git push --set-upstream origin master
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 272 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/MikeMcQuaid/GitInPracticeReduxSubmodule.git
* [new branch] master -> master (3)
Branch master set up to track remote branch master from origin.
From the new submodule repository creation output:
-
"New repository (1)" shows the creation of a new Git repository on disk to be used as a new submodule repository for the GitInPracticeRedux repository. It has been created outside the GitInPracticeRedux directory, so it can be added later as if it were just another GitHub repository.
-
"Initial commit (2)" shows the first commit to the new submodule repository, which adds the TODO.md file.
-
"Push repository (3)" shows the push of the initial commit to the newly created GitHub repository.
The new submodule repository has been created and pushed to GitHub. Note that it’s not yet a submodule of the GitInPracticeRedux repository; this was just to create a new repository that could be added as a submodule afterward.
Now that the submodule repository has been pushed to GitHub, you can remove it from your local machine by changing to its parent directory (cd /Users/mike/ in this example) and running rm -rf GitInPracticeReduxSubmodule/ there. Don’t worry: a complete copy is stored on GitHub, which we will use next.
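If you’d rather try these steps without a GitHub account, here is a minimal sketch that runs them against a local bare repository standing in for GitHub. Everything happens in a temporary directory that also serves as HOME, so the git config --global calls don’t touch your real configuration. The Example User identity and the GitInPracticeReduxSubmodule.git remote path are assumptions for illustration.

```shell
#!/bin/sh
set -eu

# Sandbox: a temporary directory doubles as HOME, so the `git config --global`
# calls below never touch your real ~/.gitconfig (the identity is hypothetical).
WORK="$(mktemp -d)"
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Example User"
git config --global user.email "user@example.com"
git config --global init.defaultBranch master
cd "$WORK"

# A local bare repository stands in for the GitHub repository.
git init --bare GitInPracticeReduxSubmodule.git

# Create the submodule repository and make its initial commit.
git init GitInPracticeReduxSubmodule
cd GitInPracticeReduxSubmodule
printf '# TODO\n1. Add something useful to this submodule.\n' > TODO.md
git add TODO.md
git commit --message "Initial commit of submodule."

# Add the stand-in remote and push master, setting up tracking.
git remote add origin "$WORK/GitInPracticeReduxSubmodule.git"
git push --set-upstream origin master

# Show the commit that the remote's master branch now points to.
git ls-remote origin master
```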
Now that we’ve created a new submodule repository, let’s add it as a submodule to the existing repository.
You wish to add the GitInPracticeReduxSubmodule repository as a submodule of the GitInPracticeRedux repository in the master branch.
-
Change to the directory containing your repository; on my machine, cd /Users/mike/GitInPracticeRedux/
-
Run git checkout master
-
Run git submodule add https://github.com/MikeMcQuaid/GitInPracticeReduxSubmodule.git submodule
-
Commit the new submodule changes to the repository by running git commit --message "Add submodule."
The output of all these commands should resemble the following:
# git submodule add https://github.com/MikeMcQuaid/GitInPracticeReduxSubmodule.git submodule
Cloning into 'submodule'... (1)
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 3 (delta 0)
Unpacking objects: 100% (3/3), done.
Checking connectivity... done.
# git commit --message "Add submodule."
[master cc206b5] Add submodule.
2 files changed, 4 insertions(+)
create mode 100644 .gitmodules (2)
create mode 160000 submodule (3)
From the submodule addition output:
-
"Submodule clone (1)" shows the clone of the GitInPracticeReduxSubmodule repository into the directory named submodule in the local repository. After the clone, git submodule add also created a .gitmodules file in the root of the repository’s working directory.
-
".gitmodules file (2)" shows the file that contains the submodule metadata, such as the directory path and the URL.
-
"Submodule directory (3)" shows the new directory named submodule that was created to store the contents of the new submodule repository. Its mode, 160000, marks it as a special entry (a gitlink) that records a commit rather than ordinary file contents. You wouldn’t normally name it submodule; this name is used only for these examples.
You have successfully added the GitInPracticeReduxSubmodule submodule to the GitInPracticeRedux repository.
From now on, we’ll refer to GitInPracticeRedux as the superproject: the Git repository that contains the submodule.
The new directory named submodule behaves like any other Git repository. If you change into it, you can run tools like GitX or git log, and even make changes and push them to the GitInPracticeReduxSubmodule repository (provided you have commit access).
Git uses the .gitmodules file and special metadata for the directory named submodule to reference the submodule and its current commit. This ensures that anyone else cloning this repository can access the same submodules at the same versions after initializing them.
Initializing all submodules is done by running git submodule init, which copies all the submodule names and URLs from the .gitmodules file to the local repository’s Git configuration file (.git/config). Note that this was done for you when you ran git submodule add.
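The addition can also be sketched entirely locally. This sketch assumes hypothetical repositories in a temporary sandbox: upstream-submodule stands in for GitInPracticeReduxSubmodule, and superproject stands in for GitInPracticeRedux. Git 2.38.1 and later refuse to clone a submodule from a local path unless protocol.file.allow is set to always, so the sketch sets it in the sandbox’s configuration only. The final two commands print the submodule URL as recorded in the committed .gitmodules file and in the local .git/config.

```shell
#!/bin/sh
set -eu

# Sandbox: a temporary directory doubles as HOME, so the `git config --global`
# calls below never touch your real ~/.gitconfig (the identity is hypothetical).
WORK="$(mktemp -d)"
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Example User"
git config --global user.email "user@example.com"
git config --global init.defaultBranch master
# Git 2.38.1 and later refuse local-path submodule clones unless this is set.
git config --global protocol.file.allow always
cd "$WORK"

# Hypothetical upstream repository for the submodule, with one commit.
git init upstream-submodule
printf '# TODO\n' > upstream-submodule/TODO.md
git -C upstream-submodule add TODO.md
git -C upstream-submodule commit --message "Initial commit of submodule."

# Hypothetical superproject: add the submodule at the path "submodule" and commit.
git init superproject
cd superproject
git submodule add "$WORK/upstream-submodule" submodule
git commit --message "Add submodule."

# The URL as committed in .gitmodules, then as registered in .git/config.
git config --file .gitmodules --get submodule.submodule.url
git config --get submodule.submodule.url
```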
Let’s take a closer look at the last commit by running git show:
# git show
commit cc206b5c9b30eef23578e48dadfa3b194a50cfe7
Author: Mike McQuaid <[email protected]>
Date: Fri Apr 18 16:16:30 2014 +0100
Add submodule.
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..c63f995
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "submodule"] (1)
+ path = submodule (2)
+ url = https://github.com/MikeMcQuaid/GitInPracticeReduxS... (3)
diff --git a/submodule b/submodule
new file mode 160000
index 0000000..e95b4cd
--- /dev/null
+++ b/submodule
@@ -0,0 +1 @@
+Subproject commit e95b4cd02cafa486a7baec19ab26edec28e9eddc (4)
From the git show submodule output:
-
"Submodule name (1)" shows the name of the submodule that was created in the repository: submodule. This name is used to refer to this particular submodule in any additional submodule commands.
-
"Submodule path (2)" shows the directory the submodule is cloned into. This is where the submodule’s files are accessed.
-
"Submodule URL (3)" shows the remote repository location for the submodule that was added.
-
"Submodule commit (4)" shows the SHA-1 of the submodule commit recorded in this repository. Even if new commits are later made in the submodule’s repository, this is the commit that anyone using this submodule in this repository will check out, until a different commit is recorded here. This ensures that the submodule only uses a known, tested version, and that changes to the submodule’s Git repository (which may be something you don’t have any control over) don’t change anything in the current repository.
git submodule add can also take some parameters to affect its behavior:
-
The --quiet (or -q) flag makes git submodule add print only error messages and no status information.
-
The --force (or -f) flag allows adding a submodule path that would otherwise be ignored by .gitignore rules.
-
The --depth option is passed to the git clone of the submodule to create a shallow clone containing only the requested number of revisions. This can be used to shrink the size of the submodule on disk. This flag for git clone was mentioned previously in 02-RemoteGit.adoc and can be useful for reducing the clone time for very large repositories.
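Here is a sketch of --depth, assuming hypothetical local repositories in a temporary sandbox, with protocol.file.allow enabled there for Git 2.38.1 and later. The upstream repository has two commits, but the submodule is added with --depth 1, so only one commit ends up in its history. A file:// URL is used because git clone ignores --depth for plain local-path clones.

```shell
#!/bin/sh
set -eu

# Sandbox: a temporary directory doubles as HOME, so the `git config --global`
# calls below never touch your real ~/.gitconfig (the identity is hypothetical).
WORK="$(mktemp -d)"
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Example User"
git config --global user.email "user@example.com"
git config --global init.defaultBranch master
# Git 2.38.1 and later refuse local submodule clones unless this is set.
git config --global protocol.file.allow always
cd "$WORK"

# Hypothetical upstream repository with two commits.
git init upstream-submodule
printf '# TODO\n' > upstream-submodule/TODO.md
git -C upstream-submodule add TODO.md
git -C upstream-submodule commit --message "First commit."
printf '1. Something useful.\n' >> upstream-submodule/TODO.md
git -C upstream-submodule commit --all --message "Second commit."

# Add it as a shallow submodule. The file:// URL makes --depth take effect,
# because Git ignores --depth for plain local-path clones.
git init superproject
cd superproject
git submodule add --depth 1 "file://$WORK/upstream-submodule" submodule
git commit --message "Add shallow submodule."

# Only one commit of history was fetched into the submodule.
git -C submodule rev-list --count HEAD
```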
Now that we’ve added a submodule to the repository, it can be useful to query what submodules have been added and what their current status is. This can be done with the git submodule status command.
-
Change to the directory containing your repository; for example, cd /Users/mike/GitInPracticeRedux/
-
Run git submodule status; the output should resemble the following:
# git submodule status
e95b4cd02cafa486a7baec19ab26edec28e9eddc submodule (heads/master) (1)
From the submodule status output:
-
"Submodule status (1)" shows the SHA-1 of the commit the submodule is pinned to, the submodule’s path, and, in parentheses, a description of that commit in terms of the submodule’s refs (here, the tip of its master branch). This matches the SHA-1 you saw earlier in the submodule directory metadata.
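Here is a minimal local sketch of the same check. It assumes hypothetical repositories in a temporary sandbox, with protocol.file.allow enabled there so that Git 2.38.1 and later will clone a local-path submodule. git ls-tree shows that the superproject stores the submodule as a gitlink entry (mode 160000, type commit), and that SHA-1 is the one git submodule status reports. The first character of each status line also carries meaning:

- a space means the checked-out commit matches the recorded one
- - means the submodule isn’t initialized
- + means the checked-out commit differs from the recorded one
- U means there are merge conflicts

```shell
#!/bin/sh
set -eu

# Sandbox: a temporary directory doubles as HOME, so the `git config --global`
# calls below never touch your real ~/.gitconfig (the identity is hypothetical).
WORK="$(mktemp -d)"
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Example User"
git config --global user.email "user@example.com"
git config --global init.defaultBranch master
# Git 2.38.1 and later refuse local-path submodule clones unless this is set.
git config --global protocol.file.allow always
cd "$WORK"

# Hypothetical upstream repository and a superproject that pins it.
git init upstream-submodule
printf '# TODO\n' > upstream-submodule/TODO.md
git -C upstream-submodule add TODO.md
git -C upstream-submodule commit --message "Initial commit of submodule."
git init superproject
cd superproject
git submodule add "$WORK/upstream-submodule" submodule
git commit --message "Add submodule."

# The gitlink entry: mode 160000, type "commit", and the pinned SHA-1.
git ls-tree HEAD submodule

# Leading space: the checked-out commit matches the recorded one.
BEFORE="$(git submodule status)"
printf '%s\n' "$BEFORE"

# Commit inside the submodule without recording it in the superproject.
printf 'local change\n' >> submodule/TODO.md
git -C submodule commit --all --message "Local change in submodule."

# Leading "+": the checked-out commit now differs from the recorded one.
AFTER="$(git submodule status)"
printf '%s\n' "$AFTER"
```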
We initialized the submodule (copied the submodule names and URLs from .gitmodules to .git/config) when we ran git submodule add earlier. But initialization isn’t done automatically for anyone else with a clone of this repository: they must run git submodule init.
Let’s simulate this situation by making a new clone of the GitInPracticeRedux repository.
-
Change to the parent directory of the directory containing your repository; on my machine, cd /Users/mike/GitInPracticeRedux/..
-
Run git clone GitInPracticeRedux GitInPracticeReduxClone
You wish to initialize all submodules in your repository and populate their working trees according to the submodule commit recorded in the GitInPracticeRedux superproject.
-
Change to the directory containing your newly cloned repository; for example, cd /Users/mike/GitInPracticeReduxClone/
-
Run git submodule update --init; the output should resemble the following:
# git submodule update --init
Submodule 'submodule' (https://github.com/MikeMcQuaid/GitInPracticeReduxSubmodule.git) registered for path 'submodule' (1)
Cloning into 'submodule'...
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 3 (delta 0)
Unpacking objects: 100% (3/3), done.
Checking connectivity... done. (2)
Submodule path 'submodule': checked out 'e95b4cd02cafa486a7baec19ab26edec28e9eddc' (3)
From the submodule initialize and update output:
-
"Submodule init (1)" shows the submodule being registered in the local repository’s configuration.
-
"Submodule clone (2)" shows the submodule being cloned into the local Git repository.
-
"Submodule checkout (3)" shows the submodule contents being checked out into the submodule directory at the revision currently recorded in the superproject.
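The whole flow can be reproduced locally with a minimal sketch. It assumes hypothetical repositories in a temporary sandbox, with protocol.file.allow enabled there because Git 2.38.1 and later otherwise refuse local-path submodule clones. In the fresh clone, git submodule status first reports the submodule with a leading - (not initialized). After git submodule update --init, the submodule is checked out at the commit recorded in the superproject.

```shell
#!/bin/sh
set -eu

# Sandbox: a temporary directory doubles as HOME, so the `git config --global`
# calls below never touch your real ~/.gitconfig (the identity is hypothetical).
WORK="$(mktemp -d)"
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Example User"
git config --global user.email "user@example.com"
git config --global init.defaultBranch master
# Git 2.38.1 and later refuse local-path submodule clones unless this is set.
git config --global protocol.file.allow always
cd "$WORK"

# Hypothetical upstream repository and a superproject that pins it.
git init upstream-submodule
printf '# TODO\n' > upstream-submodule/TODO.md
git -C upstream-submodule add TODO.md
git -C upstream-submodule commit --message "Initial commit of submodule."
git init superproject
git -C superproject submodule add "$WORK/upstream-submodule" submodule
git -C superproject commit --message "Add submodule."

# Clone the superproject: the submodule directory exists but is empty.
git clone superproject superproject-clone
cd superproject-clone
# Leading "-": the submodule isn't initialized in this clone yet.
git submodule status

# Register, clone, and check out the submodule at the recorded commit.
git submodule update --init
git submodule status
```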
git submodule update can take some parameters to customize its behavior:
-
The --recursive flag performs the same update inside each submodule too, which is useful when there are nested submodules inside submodules; git submodule update --init --recursive initializes and updates all of them.
-
The --force (or -f) flag updates the submodules to the commit recorded in the superproject by running the equivalent of git checkout --force, discarding any uncommitted changes made in the submodules.
-
The --depth option is passed to the git clone of the submodule to create a shallow clone containing only the requested number of revisions. This can be used to shrink the size of the submodule on disk.
git clone can also take a --recurse-submodules (or --recursive) flag to automatically run git submodule update --init --recursive on the submodules within the repository after cloning. Typically, if you’re cloning a repository you know contains submodules, you’ll use git clone --recurse-submodules to clone it and all the necessary submodules (and the submodules of the submodules, if they exist).
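Here is a minimal sketch of git clone --recurse-submodules with a nested submodule. It assumes hypothetical local repositories in a temporary sandbox: nested is a submodule of upstream-submodule, which is in turn a submodule of superproject. A single clone command populates both levels. protocol.file.allow is enabled only in the sandbox, so that Git 2.38.1 and later will clone local-path submodules.

```shell
#!/bin/sh
set -eu

# Sandbox: a temporary directory doubles as HOME, so the `git config --global`
# calls below never touch your real ~/.gitconfig (the identity is hypothetical).
WORK="$(mktemp -d)"
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Example User"
git config --global user.email "user@example.com"
git config --global init.defaultBranch master
# Git 2.38.1 and later refuse local-path submodule clones unless this is set.
git config --global protocol.file.allow always
cd "$WORK"

# Innermost hypothetical repository.
git init nested
printf 'nested\n' > nested/README
git -C nested add README
git -C nested commit --message "Nested initial commit."

# Middle repository, which itself contains "nested" as a submodule.
git init upstream-submodule
cd upstream-submodule
printf '# TODO\n' > TODO.md
git add TODO.md
git submodule add "$WORK/nested" nested
git commit --message "Initial commit with nested submodule."
cd "$WORK"

# Superproject containing upstream-submodule at the path "submodule".
git init superproject
git -C superproject submodule add "$WORK/upstream-submodule" submodule
git -C superproject commit --message "Add submodule."

# One command clones the superproject and every submodule, nested ones included.
git clone --recurse-submodules superproject superproject-clone
```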
When you wish to update a submodule to the latest upstream revision, to incorporate changes that were made in the upstream submodule repository, you can use the following git submodule update parameters:
-
The --remote flag fetches and checks out the latest upstream revision in the local submodule repository. This then requires another commit to record the new submodule commit in the local GitInPracticeRedux repository, and a push to update the remote GitInPracticeRedux repository. This should only be done after testing that the changes made to the GitInPracticeReduxSubmodule repository remain compatible with the GitInPracticeRedux project.
-
The --no-fetch flag updates the submodule without running git fetch, so it can only update the submodule to a later revision that has already been fetched. This is useful if you want to fetch the changes to a submodule now, and then update to them and test them at a later point.
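Here is a minimal sketch of the --remote workflow. It assumes hypothetical local repositories in a temporary sandbox, with protocol.file.allow enabled there for Git 2.38.1 and later. It first shows that a new upstream commit leaves the superproject’s pinned commit unchanged. It then runs git submodule update --remote and records the new submodule commit in the superproject with git add and git commit. In a real project, you’d run your tests between those two steps.

```shell
#!/bin/sh
set -eu

# Sandbox: a temporary directory doubles as HOME, so the `git config --global`
# calls below never touch your real ~/.gitconfig (the identity is hypothetical).
WORK="$(mktemp -d)"
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Example User"
git config --global user.email "user@example.com"
git config --global init.defaultBranch master
# Git 2.38.1 and later refuse local-path submodule clones unless this is set.
git config --global protocol.file.allow always
cd "$WORK"

# Hypothetical upstream repository and a superproject that pins it.
git init upstream-submodule
printf '# TODO\n' > upstream-submodule/TODO.md
git -C upstream-submodule add TODO.md
git -C upstream-submodule commit --message "Initial commit of submodule."
git init superproject
cd superproject
git submodule add "$WORK/upstream-submodule" submodule
git commit --message "Add submodule."

# Upstream publishes a new commit; the superproject's pinned commit is unchanged.
printf '2. Another item.\n' >> "$WORK/upstream-submodule/TODO.md"
git -C "$WORK/upstream-submodule" commit --all --message "Add another TODO item."
echo "pinned:   $(git rev-parse HEAD:submodule)"
echo "upstream: $(git -C "$WORK/upstream-submodule" rev-parse HEAD)"

# Fetch the submodule's remote and check out the latest commit of its branch.
git submodule update --remote submodule

# The superproject sees the submodule as modified until the new commit is recorded.
git status --short
git add submodule
git commit --message "Update submodule to latest upstream revision."
```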
Sometimes you may wish to perform a command or query within every submodule. For example, you may want to iterate through all the submodules in a repository (and their nested submodules) and run a Git command to ensure they have all checked out the master branch, fetch the latest remote repository commits, or print status information. Git provides the git submodule foreach command for this case: it takes a command as an argument, then iterates through each checked-out submodule and runs that command in it. With the --recursive flag, it also descends into nested submodules.
You wish to output some status information for every submodule in the GitInPracticeRedux repository.
-
Change to the directory containing your repository; for example, cd /Users/mike/GitInPracticeRedux/
-
Run git submodule foreach 'echo $name: $toplevel:$path [$sha1]'; the output should resemble the following:
# git submodule foreach 'echo $name: $toplevel:$path [$sha1]'
Entering 'submodule' (1)
submodule: /Users/mike/Documents/GitInPracticeRedux:submodule (2)
[e95b4cd02cafa486a7baec19ab26edec28e9eddc] (3)
From the submodule loop output:
-
"Current submodule (1)" shows the message Git prints as it enters each submodule.
-
"Submodule name, path (2)" shows the use of the git submodule foreach variables $name, $toplevel, and $path to print the name of the submodule, the top-level repository it belongs to, and the path within that repository.
-
"Submodule SHA-1 (3)" shows the use of the git submodule foreach variable $sha1 to print the SHA-1 of the submodule commit recorded in the superproject.
You have successfully iterated through the submodules in the GitInPracticeRedux repository and used the git submodule foreach variables to print some status information.
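The same loop can be sketched against hypothetical local repositories in a temporary sandbox, with protocol.file.allow enabled there for Git 2.38.1 and later. It uses $sm_path, which the git submodule documentation lists as the synonym for the deprecated $path variable. It also uses the global --quiet option to suppress the Entering lines when capturing the output in a shell variable.

```shell
#!/bin/sh
set -eu

# Sandbox: a temporary directory doubles as HOME, so the `git config --global`
# calls below never touch your real ~/.gitconfig (the identity is hypothetical).
WORK="$(mktemp -d)"
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Example User"
git config --global user.email "user@example.com"
git config --global init.defaultBranch master
# Git 2.38.1 and later refuse local-path submodule clones unless this is set.
git config --global protocol.file.allow always
cd "$WORK"

# Hypothetical upstream repository and a superproject that pins it.
git init upstream-submodule
printf '# TODO\n' > upstream-submodule/TODO.md
git -C upstream-submodule add TODO.md
git -C upstream-submodule commit --message "Initial commit of submodule."
git init superproject
cd superproject
git submodule add "$WORK/upstream-submodule" submodule
git commit --message "Add submodule."

# Print each submodule's name, absolute path, and recorded SHA-1.
git submodule foreach 'echo "$name: $toplevel/$sm_path [$sha1]"'

# --quiet suppresses the "Entering ..." lines, which helps when capturing output.
SUMMARY="$(git submodule --quiet foreach 'echo "$name $sha1"')"
printf '%s\n' "$SUMMARY"
```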
In this chapter you hopefully learned:
-
How to use submodules to manage project dependencies
-
How to use git submodule add to add a submodule and commit its metadata
-
How to use git submodule status to view all submodules and their current revision
-
How to use git submodule update --init to initialize all submodules, clone them, and check out the revisions recorded in the superproject
-
How to use git submodule foreach and its variables to run commands and print metadata for every submodule in a repository