When starting a new side project, I always wonder whether parts of the codebase could be shared as an open-source project. A library extracted from the project might interest other developers, and it’s nice to have good stuff on your GitHub profile.

That possibility may compel you to spend a lot of time thinking about the repository structure. You may be tempted to create a git submodule for every library that doesn’t exist yet but might be useful to you in other projects or to other people, because you don’t want to lose the git history of that code. But predicting what will be shareable before anything is built is almost impossible, and at the start of a project we should keep that kind of metawork to a minimum. There’s a solution to this problem that doesn’t require much upfront planning.

Don’t introduce any first-party git submodules at the beginning. Keep things cleanly separated in directories and get started. Here is an excerpt of the history of chronos, the project used as the example in this post:

```text
[...]
* 493fcbc HAMT: Clone and operator=
* a33bf67 HAMT: Add destructor for the HAMT
* 9604360 HAMT: Start passing entries around instead of key and values
* 578fcbf HAMT: Move BitmatTrie and Node out of the HAMT class
* c8a6e1b HAMT: Start making it look like std::unordered_map
* 591df2a HAMT: Use KeyEqual to compare keys
* 16d3bb0 HAMT: Little pedantic tweaks
* fe8bec4 HAMT: Get rid of the fancy allocators
* 9295225 HAMT: Use std::{pair,hash} and move allocators out of the HAMT class
* df1a923 HAMT: Simplify alloc size calculation logic
* 07520de HAMT: Collect Fibonacci statistics correctly
* 9b3e627 HAMT: Make insert return the entry node pointer
* 960e5db HAMT: Rename some things
* 7d3993e HAMT: Make some stuff private
* bbde709 HAMT: Super wasteful allocation
* 429f897 HAMT: Fibonacci allocation and free lists
* 3799eca HAMT: Basic implementation
* 9777aa4 chronos: Fix lint errors in chronos.cpp
* e8d15f4 sqlite: Fix lint errors
* 4b796be chronos: Fix indentation
* ad7902f chronos: Use tuple for queries using the SQLite wrapper
* d04407e chronos: Add projects to the database through the UI
* 575ad64 chronos: Prototype with User and projects on the left sidebar
* 6bf5612 sqlite: Initial commit to the SQLite wrapper
* c67fe58 foc: SmallVector from LLVM (and tests!)
* fbecc32 cpplint.py
* 0750734 chronos: SQLite database schema
* 29eb494 sqlite: SQLite amalgamation source files
* 92c376a Copy examples/opengl3_example to chronos
* 3a369f6 Add googletest as a submodule
* bf1da9a Add imgui as a submodule
* bc3e92e Initial commit: .gitignore
```

Later, you may notice that one of those directories holds cool data structures you want to extract and publish as a library. With git filter-branch you can create a new repository that contains only that directory and the commits that touch it. Here’s how I did it for a project of mine.
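Before rewriting anything, it can help to preview which commits touch the directory, since those are the ones a directory-based rewrite keeps. The following is a minimal sketch on a throwaway repository; the layout, file names, and commit messages are hypothetical stand-ins for chronos, it assumes Git 2.28 or later (for git init -b), and the preview relies only on git log’s standard "-- <path>" limiting.

```shell
#!/usr/bin/env bash
# Sketch: preview which commits touch foc/ in a throwaway repository.
# Layout, file names, and messages are hypothetical stand-ins for chronos.
set -euo pipefail

# Throwaway identity so the commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q -b master "$repo"
cd "$repo"

mkdir chronos foc
echo 'int main() {}' > chronos/main.cpp
git add chronos
git commit -q -m "chronos: Prototype"
echo '// SmallVector' > foc/small_vector.h
git add foc
git commit -q -m "foc: SmallVector"
echo '// HAMT' > foc/hash_array_mapped_trie.h
git add foc
git commit -q -m "HAMT: Basic implementation"

# Path limiting: only commits that change something under foc/ are listed.
git log --oneline -- foc/
# Commits touching foc/ versus the whole history.
echo "foc/ commits: $(git rev-list --count HEAD -- foc/) of $(git rev-list --count HEAD)"
```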

To start, I copied the repository into a new directory called foc_libraries.

```console
~/code $ ls chronos/
CMakeLists.txt  chronos         cpplint.py      foc             sqlite_wrapper  vendor
~/code $ cp -r chronos foc_libraries
```

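This copy will become the repository of the library, built from the foc directory in chronos. I run git filter-branch on master to filter out everything that’s not in the original foc directory: --subdirectory-filter rewrites the history so that foc/ becomes the project root, and --prune-empty drops commits that would end up empty.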

```console
~/code/foc_libraries (master)$ git filter-branch --prune-empty --subdirectory-filter foc/ master
Rewrite 1299f7e3c0e2ab9f469a72e2880700a301386639 (51/51)
Ref 'refs/heads/master' was rewritten
```

git filter-branch removed all the commits that weren’t related to the foc directory, and foc_libraries now contains only the files from chronos’s foc directory, moved to the repository root.

```console
~/code/foc_libraries (master)$ git ls-files
allocator.h
array_ref.h
hash_array_mapped_trie.h
hash_array_mapped_trie_test.cpp
hash_array_mapped_trie_test_helpers.h
none.h
small_vector.h
small_vector_test.cpp
support.h
```

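The commit log of foc_libraries no longer contains any commits related to the original application. The hashes differ from the ones in chronos because filter-branch rewrote every commit it kept.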

```text
[...]
* 8c46e22 - HAMT: Move BitmatTrie and Node out of the HAMT class
* e2fd8cd - HAMT: Start making it look like std::unordered_map
* 67896cc - HAMT: Use KeyEqual to compare keys
* 1237a32 - HAMT: Little pedantic tweaks
* 1868375 - HAMT: Get rid of the fancy allocators
* 8259c1d - HAMT: Use std::{pair,hash} and move allocators out of the HAMT class
* 90d5771 - HAMT: Simplify alloc size calculation logic
* ae69ffa - HAMT: Collect Fibonacci statistics correctly
* adf5651 - HAMT: Make insert return the entry node pointer
* 2e8dfaa - HAMT: Rename some things
* 1addd89 - HAMT: Make some stuff private
* 11156b8 - HAMT: Super wasteful allocation
* 85b2d73 - HAMT: Fibonacci allocation and free lists
* 9c6dc21 - HAMT: Basic implementation
* 9f0a6c5 - foc: SmallVector from LLVM (and tests!)
```
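The copy-and-filter steps can be rehearsed on a throwaway repository before touching a real project. This is a minimal sketch with a hypothetical two-directory layout; it assumes Git 2.28 or later (for git init -b), uses the same filter-branch flags as above, and sets FILTER_BRANCH_SQUELCH_WARNING=1 only to silence the warning that recent Git versions print before running filter-branch.

```shell
#!/usr/bin/env bash
# Sketch: copy a repository and keep only the history of foc/.
# The layout and commit messages are hypothetical stand-ins for chronos.
set -euo pipefail

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q -b master "$work/chronos"
cd "$work/chronos"
mkdir chronos foc
echo 'int main() {}' > chronos/main.cpp
git add chronos
git commit -q -m "chronos: Prototype"
echo '// SmallVector' > foc/small_vector.h
git add foc
git commit -q -m "foc: SmallVector"
echo '// main loop' >> chronos/main.cpp
git commit -q -am "chronos: Main loop"

# Work on a full copy (history included) so the original stays untouched.
cp -r "$work/chronos" "$work/foc_libraries"
cd "$work/foc_libraries"

# Make foc/ the project root and drop commits that end up empty.
FILTER_BRANCH_SQUELCH_WARNING=1 \
  git filter-branch --prune-empty --subdirectory-filter foc/ master

git ls-files        # small_vector.h, now at the top level
git log --oneline   # only the foc commit is left
```

The original chronos repository keeps all three commits; only the copy is rewritten.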

Now I can upload foc_libraries to GitHub.

```console
~/code/foc_libraries (master)$ git remote add origin git@github.com:felipecrv/foc_libraries.git
~/code/foc_libraries (master)$ git push -u origin master
Counting objects: 186, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (86/86), done.
Writing objects: 100% (186/186), 67.43 KiB | 0 bytes/s, done.
Total 186 (delta 124), reused 106 (delta 100)
remote: Resolving deltas: 100% (124/124), done.
To git@github.com:felipecrv/foc_libraries.git
 * [new branch]      master -> master
Branch master set up to track remote branch master from origin
```
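Because foc_libraries began as a cp -r copy of chronos, its .git/config may still list the origin remote inherited from chronos, in which case git remote add origin refuses to run because the remote already exists. A hedged sketch that points origin at the new URL either way; the chronos URL below is a placeholder, and get-url, set-url, and add are standard git remote subcommands.

```shell
#!/usr/bin/env bash
# Sketch: point "origin" at the new library URL, whether or not the copied
# repository still carries the remote it inherited from chronos.
set -euo pipefail

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
# Simulate the inherited remote (placeholder URL).
git remote add origin https://example.com/chronos.git

new_url=git@github.com:felipecrv/foc_libraries.git
if git remote get-url origin >/dev/null 2>&1; then
  git remote set-url origin "$new_url"   # repoint the inherited remote
else
  git remote add origin "$new_url"       # no origin yet: create it
fi

git remote -v
```

After that, git push -u origin master works as in the session above.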

After publishing the foc_libraries repository, I can go back to chronos and replace the foc directory with a git submodule.

Remove the foc/ directory
```console
~/code/chronos (master)$ git rm -r foc
rm 'foc/allocator.h'
rm 'foc/array_ref.h'
rm 'foc/hash_array_mapped_trie.h'
rm 'foc/hash_array_mapped_trie_test.cpp'
rm 'foc/hash_array_mapped_trie_test_helpers.h'
rm 'foc/none.h'
rm 'foc/small_vector.h'
rm 'foc/small_vector_test.cpp'
rm 'foc/support.h'
```
Add the submodule as foc/
```console
~/code/chronos (master)$ git submodule add https://github.com/felipecrv/foc_libraries.git foc
Cloning into 'foc'...
remote: Counting objects: 186, done.
remote: Compressing objects: 100% (62/62), done.
remote: Total 186 (delta 124), reused 186 (delta 124), pack-reused 0
Receiving objects: 100% (186/186), 67.43 KiB | 0 bytes/s, done.
Resolving deltas: 100% (124/124), done.
Checking connectivity... done.
```
Review the changes
```console
~/code/chronos (master)$ git status
On branch master
Your branch and 'origin/master' have diverged,
and have 28 and 1 different commit each, respectively.
  (use "git pull" to merge the remote branch into yours)
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   .gitmodules
        new file:   foc
        deleted:    foc/allocator.h
        deleted:    foc/array_ref.h
        deleted:    foc/hash_array_mapped_trie.h
        deleted:    foc/hash_array_mapped_trie_test.cpp
        deleted:    foc/hash_array_mapped_trie_test_helpers.h
        deleted:    foc/none.h
        deleted:    foc/small_vector.h
        deleted:    foc/small_vector_test.cpp
        deleted:    foc/support.h
```
Commit the changes
```console
~/code/chronos (master)$ git commit -m "Extract foc/ as a git submodule"
[master b2d4ce0] Extract foc/ as a git submodule
 11 files changed, 4 insertions(+), 3925 deletions(-)
 create mode 160000 foc
 delete mode 100644 foc/allocator.h
 delete mode 100644 foc/array_ref.h
 delete mode 100644 foc/hash_array_mapped_trie.h
 delete mode 100644 foc/hash_array_mapped_trie_test.cpp
 delete mode 100644 foc/hash_array_mapped_trie_test_helpers.h
 delete mode 100644 foc/none.h
 delete mode 100644 foc/small_vector.h
 delete mode 100644 foc/small_vector_test.cpp
 delete mode 100644 foc/support.h
```
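The directory-to-submodule swap can also be rehearsed locally before touching the real repositories. In this minimal sketch both repositories are hypothetical stand-ins built from scratch, the submodule URL is a local path instead of GitHub, it assumes Git 2.28 or later (for git init -b), and -c protocol.file.allow=always is there only because Git 2.38.1 and later refuse local-path submodule clones by default. It also shows what a fresh clone of the superproject looks like: foc is stored as a gitlink (mode 160000, as in the commit output), so its files only appear after git submodule update --init.

```shell
#!/usr/bin/env bash
# Sketch: replace foc/ with a submodule, then clone the superproject.
# Both repositories are hypothetical stand-ins for chronos and foc_libraries.
set -euo pipefail

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for the published library repository.
git init -q -b master "$work/foc_libraries"
cd "$work/foc_libraries"
echo '// SmallVector' > small_vector.h
git add small_vector.h
git commit -q -m "foc: SmallVector"

# Stand-in for chronos, still holding its own copy of foc/.
git init -q -b master "$work/chronos"
cd "$work/chronos"
mkdir foc
echo '// SmallVector' > foc/small_vector.h
git add foc
git commit -q -m "foc: SmallVector"

# Same steps as above: remove the directory, add the submodule, commit.
# protocol.file.allow=always permits the local-path submodule URL.
git rm -q -r foc
git -c protocol.file.allow=always submodule add "$work/foc_libraries" foc
git commit -q -m "Extract foc/ as a git submodule"
git ls-tree HEAD foc   # mode 160000, type commit: a gitlink

# A plain clone records the gitlink but does not populate foc/...
git clone -q "$work/chronos" "$work/chronos_clone"
cd "$work/chronos_clone"
before=$(ls -A foc 2>/dev/null || true)
echo "foc/ before init: '${before}'"

# ...until the submodule is initialized and checked out.
git -c protocol.file.allow=always submodule update --init
ls foc
```

Cloning with git clone --recurse-submodules performs the same initialization in one step.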

Some adaptations might be needed in both repositories, but that’s pretty much all there is to it. Happy gitting!