How We Switched to a Monorepo with Git

November 30th, 2017

Why We Switched

Our team of developers was working across several services in separate repositories, which created a few challenges for our development and CI processes. We had an infrastructure repository containing the dependencies for several of our services, which caused issues whenever a dependency change required a lockstep update across repositories. The problem was amplified when we had to make breaking changes to the infrastructure repository.

Once the infrastructure changes went in, our master branches were left in a broken state while waiting for the corresponding changes in each service repository, often causing headaches for other developers. The same issue appeared when committing changes to our backend and frontend services that relied on each other. On top of the impact of these lockstep changesets, we were sinking a huge amount of time into managing several different repos, creating a Pull Request in each one, and trying to reduce code duplication between them.

Implementing the Monorepo

To create a more efficient process, we decided to switch to a Monorepo.

Since the actual creation of the Monorepo would interrupt everyone’s workflow, we decided to script the whole process so that it could be quickly done from a fresh checkout. That way, a Pull Request could be raised at a convenient time for the whole team.

The first step was merging all the repositories together. We started with our core repository because we wanted it to be the destination for our Monorepo. Then we added the infrastructure, frontend-service, and backend-service repositories. Since this was done on OS X, we used `gsed` (GNU sed); in a Linux environment, plain `sed` works in its place, since the stock `sed` there is already the GNU version.
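As an aside, a small portability shim like the following would let the same script run on either platform without editing it; the `SED` variable is our own naming for illustration, not part of the original script:

```shell
#!/usr/bin/env bash
# Pick GNU sed: macOS ships BSD sed, so GNU sed is typically installed
# separately as `gsed`, while on Linux the stock `sed` is already GNU sed.
if command -v gsed >/dev/null 2>&1; then
    SED=gsed
else
    SED=sed
fi

# The merge script relies on GNU-only features: -z (NUL-separated input)
# and \t in the pattern. Here we prefix the text after a tab, just as
# the history rewrite below prefixes each path.
printf 'path\tname' | "$SED" 's,\t,&module/,'
```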

Below is the initial merge script:

#!/usr/bin/env bash

root="${PWD}/src"
modules=(
    infrastructure
    frontend-service
    backend-service
)

echo "Cloning..."
mkdir -p "${root}"
cd "${root}"
git clone git@github.com:aioTV/core.git
for module in "${modules[@]}"; do
    echo "${module}"
    git clone "git@github.com:aioTV/${module}.git"
done

# Rewrite each module's history so that every file lives under a
# "<module>/" subdirectory. `git ls-files -s` prints one
# "<mode> <sha> <stage>\t<path>" entry per file, so prefixing the text
# after the tab moves each path without touching commit metadata.
echo "Updating root path..."
for module in "${modules[@]}"; do
    echo "${module}"
    cd "${root}/${module}"
    git filter-branch -f --index-filter '
        git ls-files -sz |
        gsed -z "s,\t,&'"$module"'/," |
        GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index -z --index-info &&
        mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
    ' HEAD
done

# Merge the rewritten histories into core. --allow-unrelated-histories
# (Git 2.9+) is required because the repositories share no common ancestor.
echo "Pulling modules into core..."
cd "${root}/core"
for module in "${modules[@]}"; do
    echo "${module}"
    git pull --allow-unrelated-histories --no-edit "${root}/${module}" master
done
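Before running the rewrite on real repositories, the history-rewriting step can be sanity-checked on a throwaway repo. The sketch below uses made-up repo and file names, and plain `sed` on the assumption of a Linux environment:

```shell
#!/usr/bin/env bash
# Throwaway demo of the --index-filter path rewrite (hypothetical names).
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1  # silence the warning on newer Git
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.email demo@example.com
git config user.name demo
echo hello > README.md
git add README.md
git commit -qm "initial commit"

# Same filter as the merge script, with sed in place of gsed:
# prefix the path after the tab in each "<mode> <sha> <stage>\t<path>" entry.
module=demo
git filter-branch -f --index-filter '
    git ls-files -sz |
    sed -z "s,\t,&'"$module"'/," |
    GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index -z --index-info &&
    mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
' HEAD

git ls-tree -r --name-only HEAD    # → demo/README.md
```

Every tracked path in the rewritten history now starts with the module prefix, so merging it into core cannot collide with another module's files.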

After the initial merge was complete, we made changes to the new core Monorepo to support building in our CI environment.

Once the required changes were committed, we created the following Git patch from the core repository directory (`HEAD^` selects only the newest commit, and `-o ../../` writes the patch file above the `src` directory):

git format-patch HEAD^ -o ../../

We then appended the following snippet to the Monorepo script so that the patch would be applied after the merge (the glob is left outside the quotes so the shell can expand it, since `git am` does not expand wildcards itself):

echo "Applying patch..."
cd "${root}/core"
git am "${root}"/../*.patch
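The format-patch/am pair can be exercised end to end on a scratch repository; the file and commit names below are illustrative, not our actual CI changes:

```shell
#!/usr/bin/env bash
# Round trip: export the newest commit as a patch, rewind, re-apply it
# with `git am` (hypothetical file and commit names).
set -eu
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p src
git init -q src/core
cd src/core
git config user.email demo@example.com
git config user.name demo
echo base > build.txt
git add build.txt
git commit -qm "base"
echo ci > Jenkinsfile
git add Jenkinsfile
git commit -qm "Add CI build changes"

# HEAD^ selects every commit after the previous one (just the CI commit);
# -o ../../ writes the .patch file above the src directory.
git format-patch HEAD^ -o ../../

# Rewind, then re-apply the patch the way the Monorepo script does.
git reset -q --hard HEAD^
git am ../../*.patch
git log --oneline -1    # the CI commit is back on top
```

Because `git am` replays the commit with its original message and author, the CI changes land on the merged history exactly as they were prepared.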

The Result

Now that all of our services and infrastructure are in a single location, our development process is much more efficient. The code is shared among the development team, which makes collaboration natural throughout the project.

How we approached building the Monorepo in Jenkins will be discussed in a future post.

Let us know if you have any comments or questions.

Source: https://stackoverflow.com/a/21495718