Foreword

Git is now an essential skill for programmers and can be used to manage code, documents, blogs, and even recipes. Commits to a personal private repository can be fairly casual, but team development calls for shared conventions. This article collects some commit-related practices for Git for your reference.

git commit

As shown in the figure above (taken from Angular commit 970a3b5), a commit contains the following information:

  • commit message - a description of what the commit changes
  • author & committer - who wrote the change and who applied it
  • changed files - the files modified by the commit
  • hash & parent - the hash of the commit content and the commit's position in the commit graph
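
To show how these pieces relate, here is a minimal Python sketch of how Git derives an object ID in a repository that uses the default SHA-1 object format: it hashes a short header followed by the raw object content. The function names and the way the commit text is assembled are illustrative, not Git's own code.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 ID Git assigns to an object of the given type."""
    # Header layout: "<type> <size in bytes>" followed by a NUL byte.
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def commit_content(tree: str, parents: list[str], author: str,
                   committer: str, message: str) -> bytes:
    """Build the raw text of a commit object from its fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return "\n".join(lines).encode()


# The empty blob has a well-known ID:
# git_object_id("blob", b"") == "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
```

Because the parent IDs, author, committer, and message are all part of the hashed text, changing any of them (for example by rebasing onto a different parent) produces a different commit hash.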

Commit Message

The commit message describes what the current commit does. It generally consists of a header, a body, and a footer:

<header>
<BLANK LINE>
<body>
<BLANK LINE>
<footer>

For industry best practices, refer to Angular’s commit standards: Commit Message Format

Among them, <header> is mandatory. The format recommended by Angular is as follows:

<type>(<scope>): <short summary>
  │       │             │
  │       │             └─⫸ Summary in present tense. Not capitalized. No period at the end.
  │       │
  │       └─⫸ Commit Scope: animations|bazel|benchpress|common|compiler|compiler-cli|core...
  │
  └─⫸ Commit Type: build|ci|docs|feat|fix|perf|refactor|test

In <header>, <type> and <summary> are mandatory, and <scope> is optional. It is recommended to keep <header> within 50 characters.

<type> indicates the type of this commit, generally including the following:

  • build: changes related to the build
  • ci: changes related to continuous integration
  • docs: documentation
  • feat: new features
  • fix: bug fixes
  • perf: performance-related changes
  • refactor: code changes that neither fix a bug nor add a feature
  • test: testing related, including adding tests or changing existing tests

<scope> indicates the scope of the change. In Angular, a commit may involve scopes such as form handling, animation handling, etc. In actual work, it can be determined according to the project.

<summary> is a brief description of the commit, using imperative mood and present tense. For example, use change instead of changed or changes.

<body> is a more detailed description of the commit message, also using imperative mood and present tense like <header>. <body> describes the motivation for the change, such as why the change was introduced, what the previous logic was, what the current logic is, and what impact the change has.

Finally, <footer> is optional. It generally describes breaking changes and deprecations, and references GitHub issues, Jira tickets, pull requests, and so on.
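
As a rough illustration of this three-part layout, the following Python sketch splits a message into header, body, and footer. The footer keywords it looks for are an assumption made for the example; a real project would define its own list.

```python
def split_commit_message(message: str) -> dict:
    """Split a commit message into header, body and footer."""
    # Parts are separated by blank lines; empty fragments are dropped.
    paragraphs = [p.strip() for p in message.strip().split("\n\n") if p.strip()]
    header = paragraphs[0] if paragraphs else ""
    rest = paragraphs[1:]
    footer = ""
    # Assumption: a footer starts with one of these illustrative keywords.
    footer_prefixes = ("BREAKING CHANGE:", "DEPRECATED:", "Fixes #", "Closes #")
    if rest and rest[-1].startswith(footer_prefixes):
        footer = rest.pop()
    return {"header": header, "body": "\n\n".join(rest), "footer": footer}


# Hypothetical message used only to demonstrate the function.
example = (
    "fix(core): handle empty input\n"
    "\n"
    "return early when the list is empty\n"
    "\n"
    "Fixes #123"
)
parts = split_commit_message(example)
```

Here `parts["header"]` holds the first line, `parts["body"]` the explanation, and `parts["footer"]` the issue reference.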

Standardized commit messages can be parsed by tools to generate documentation or release notes automatically. In large open-source projects, manually compiling release notes, interface changes, and compatibility impacts is time-consuming and labor-intensive, and a unified specification lets much of that work be automated. Different projects have different requirements and formats for commit messages, and open-source and company projects differ as well, so follow the conventions of the project you are contributing to. Mature open-source projects usually explain how to contribute in the README, or in a separate CONTRIBUTING.md that defines code style, submission process, and so on.
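
To give a feel for what such tooling does, here is a small Python sketch that groups commit headers by type and renders plain-text release notes. The section titles and the choice to list only feat, fix, and perf are assumptions for the example, not the behavior of any particular tool.

```python
import re
from collections import defaultdict

HEADER = re.compile(r"^(?P<type>\w+)(\((?P<scope>[\w-]+)\))?: (?P<summary>.+)$")


def release_notes(headers):
    """Group commit headers by type and render a plain-text changelog."""
    # Assumption: only these types are interesting to readers of the notes.
    titles = {"feat": "Features", "fix": "Bug Fixes", "perf": "Performance"}
    groups = defaultdict(list)
    for header in headers:
        match = HEADER.match(header)
        if match and match.group("type") in titles:
            scope = match.group("scope")
            prefix = f"{scope}: " if scope else ""
            groups[match.group("type")].append(prefix + match.group("summary"))
    lines = []
    for commit_type, title in titles.items():
        if groups[commit_type]:
            lines.append(title)
            lines += [f"  - {entry}" for entry in groups[commit_type]]
    return "\n".join(lines)


# Hypothetical commit headers used only to demonstrate the function.
notes = release_notes([
    "feat(core): add lazy loading",
    "fix: handle null input",
    "docs: update readme",
    "not a conventional header",
])
```

Headers that do not follow the format are skipped silently here, which is one reason to enforce the format at commit time.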

Automatic verification of commit message

Once a commit message specification exists, how can we make sure developers follow it? We can use Git Hooks, a feature built into Git, to verify the commit message. This article only gives a basic explanation of Git Hooks; for details, refer to the official documentation or the Atlassian documentation.

In a newly initialized Git repository, the .git/hooks folder contains the sample hooks that ship with Git:

ls -l .git/hooks
total 120
-rwxr-xr-x  1 tomo  staff   478B Nov 11 20:44 applypatch-msg.sample
-rwxr-xr-x  1 tomo  staff   896B Nov 11 20:44 commit-msg.sample
-rwxr-xr-x  1 tomo  staff   4.5K Nov 11 20:44 fsmonitor-watchman.sample
-rwxr-xr-x  1 tomo  staff   189B Nov 11 20:44 post-update.sample
-rwxr-xr-x  1 tomo  staff   424B Nov 11 20:44 pre-applypatch.sample
-rwxr-xr-x  1 tomo  staff   1.6K Nov 11 20:44 pre-commit.sample
-rwxr-xr-x  1 tomo  staff   416B Nov 11 20:44 pre-merge-commit.sample
-rwxr-xr-x  1 tomo  staff   1.3K Nov 11 20:44 pre-push.sample
-rwxr-xr-x  1 tomo  staff   4.8K Nov 11 20:44 pre-rebase.sample
-rwxr-xr-x  1 tomo  staff   544B Nov 11 20:44 pre-receive.sample
-rwxr-xr-x  1 tomo  staff   1.5K Nov 11 20:44 prepare-commit-msg.sample
-rwxr-xr-x  1 tomo  staff   2.7K Nov 11 20:44 push-to-checkout.sample
-rwxr-xr-x  1 tomo  staff   3.6K Nov 11 20:44 update.sample

The ones related to commits are the following four:

  • pre-commit - executed before Git generates the commit object
  • prepare-commit-msg - executed after pre-commit and before the editor opens, used to prepare the default commit message; the script receives up to three parameters:
    1. name of the temporary file containing the commit message
    2. source of the commit message, such as message, template, merge, squash, or commit
    3. SHA1 of the relevant commit, only provided when the -c, -C or --amend option is used
  • commit-msg - executed after the developer writes the commit message, with the temporary file name as its only parameter
  • post-commit - executed after the commit has been created; it cannot affect the outcome of the commit and is mainly used for notifications

We can use prepare-commit-msg to show the commit message specification to the developer and commit-msg to check that it was followed. If either script exits with a non-zero status, the current commit is aborted.

If we want to apply a simple format similar to Angular’s <header>, we can refer to the following implementation.

Here is an example of prepare-commit-msg:

#!/usr/bin/env python3

import sys

# Collect the parameters
commit_msg_filepath = sys.argv[1]
if len(sys.argv) > 2:
    commit_type = sys.argv[2]
else:
    commit_type = ''
if len(sys.argv) > 3:
    commit_hash = sys.argv[3]
else:
    commit_hash = ''

print("prepare-commit-msg: File: %s\nType: %s\nHash: %s" % (commit_msg_filepath, commit_type, commit_hash))

msg_spec = '''# Please use the following format
# <type>(<scope>): <short summary>
#  │       │             │
#  │       │             └─⫸ Summary in present tense. Not capitalized. No period at the end.
#  │       │
#  │       └─⫸ Commit Scope: animations|bazel|benchpress|common|compiler|compiler-cli|core
#  │
#  └─⫸ Commit Type: build|ci|docs|feat|fix|perf|refactor|test'''

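# Add the hint only when an editor will open (plain `git commit` or a
# template). In other cases, for example with -m, Git may not strip comment
# lines, so the hint would end up in the commit message itself.
if commit_type in ('', 'template'):
    # Mode 'a' appends, so the message Git prepared is left untouched.
    with open(commit_msg_filepath, 'a', encoding='utf-8') as f:
        f.write("\n" + msg_spec + "\n")

sys.exit(0)  # a non-zero exit status would abort the current commit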

Here is a simple example of commit-msg:

#!/usr/bin/env python3

import re
import sys
# Collect the parameters
commit_msg_filepath = sys.argv[1]
print("commit-msg: File: %s" % commit_msg_filepath)

# [\w-]+ lets hyphenated scopes such as compiler-cli match
header_pattern = re.compile(r'^(?P<type>\w+)(\((?P<scope>[\w-]+)\))?: .+$')
commit_types = 'build|ci|docs|feat|fix|perf|refactor|test'.split('|')
commit_scopes = 'animations|bazel|benchpress|common|compiler|compiler-cli|core'.split('|')

with open(commit_msg_filepath, 'r') as f:
    commit_msg_header = f.readline().rstrip('\n')  # header line
    print('<header>: %s' % commit_msg_header)
    match = header_pattern.match(commit_msg_header)
    if not match:
        print('commit message does not meet spec')
        sys.exit(1)
    commit_type = match.group('type')
    commit_scope = match.group('scope')
    if commit_type not in commit_types:
        print('invalid <type>')
        sys.exit(1)
    if commit_scope and commit_scope not in commit_scopes:  # scope is optional
        print('invalid <scope>')
        sys.exit(1)

sys.exit(0)

To use these hooks, create files named prepare-commit-msg and commit-msg in the .git/hooks directory and make them executable (for example with chmod +x). From then on, the scripts run whenever we execute git commit. The following figure shows such a run, in which non-compliant commits are aborted.

git hooks demo

The specific execution process is as follows (online version):

The .git directory is not part of what Git commits, so hooks placed there are not committed to the repository. Instead, we can create a .githooks folder in the root of the repository, put our hook scripts there, and then reference it either through configuration (core.hooksPath, available since Git 2.9) or through symbolic links:

# use config
git config core.hooksPath .githooks
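# OR use symbolic links (run from the repository root; absolute targets are
# needed because relative targets would be resolved from inside .git/hooks)
ln -sf "$PWD"/.githooks/* .git/hooks/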

Of course, all of these are client-side checks. Developers can bypass such hooks (for example with git commit --no-verify) and still introduce non-compliant commits. To prevent that, verify on the server side with server-side hooks, or run the check in a CI tool or GitHub Actions.
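
A CI check of this kind could look roughly like the following Python sketch, which validates the header of every commit in a revision range. The default range origin/main..HEAD and the function names are assumptions for the example; adjust them to the branch names and CI system of the project.

```python
import re
import subprocess

HEADER = re.compile(r"^(build|ci|docs|feat|fix|perf|refactor|test)(\([\w-]+\))?: .+$")


def invalid_headers(subjects):
    """Return the commit subjects that do not match the header format."""
    return [s for s in subjects if not HEADER.match(s)]


def subjects_in_range(rev_range):
    """List commit subjects for a revision range, skipping merge commits."""
    out = subprocess.run(
        ["git", "log", "--no-merges", "--format=%s", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()


def main(rev_range="origin/main..HEAD"):
    """Print every offending header and return a process exit status."""
    bad = invalid_headers(subjects_in_range(rev_range))
    for subject in bad:
        print(f"invalid commit header: {subject}")
    return 1 if bad else 0
```

In a CI job the script would end with `sys.exit(main())` (after `import sys`), so that a non-zero status fails the build.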

Author & Committer

In Git, the author is the person who originally wrote the change, and the committer is the person who applied it, such as the project maintainer who merged the pull request. If you are an individual developer or use only a single Git hosting service (such as GitHub or Bitbucket), you generally do not need any special author configuration. If you use multiple Git platforms or your company has its own requirements, you may need different user names and email addresses for different repositories. For example, you can set your personal GitHub account globally and your company email address in internal company repositories.

# Global default configuration
git config --global user.email "<github email>"
git config --global user.name "<github username>"
# Internal company repository
git config user.email "<enterprise email>"
git config user.name "<real name>"
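
If a company wants to catch commits made with the wrong identity, one option is a pre-commit hook along the lines of the Python sketch below. The required domain example.com and the function names are placeholders, not part of any standard tool.

```python
import subprocess


def email_matches_domain(email: str, domain: str) -> bool:
    """True when the address belongs to the given domain (case-insensitive)."""
    return email.lower().endswith("@" + domain.lower())


def configured_email() -> str:
    """Read the effective user.email for the current repository."""
    result = subprocess.run(
        ["git", "config", "user.email"],
        capture_output=True, text=True,
    )
    # Empty string when user.email is not configured at all.
    return result.stdout.strip()


def main(required_domain="example.com"):  # placeholder domain
    """Return 1 (abort the commit) when the configured email does not fit."""
    email = configured_email()
    if not email_matches_domain(email, required_domain):
        print(f"user.email '{email}' is not a {required_domain} address")
        return 1
    return 0
```

Saved as an executable pre-commit hook that ends with `sys.exit(main())` (after `import sys`), this would stop a commit made with a personal address in a company repository.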

Changed files

The core of every commit is the set of files it changes. A commit can touch many or few files, but the following principles generally apply:

  • Before committing, use git diff to review your changes, git add to stage the files you want to include, git status to check the state of the files, and finally git commit to commit
  • Each commit should contain only related changes; for example, fixes for two different bugs belong in two separate commits
  • Commit often, since this shares finished work sooner and reduces the risk of losing code
  • Do not commit unfinished work to the main branch or to shared feature branches, and test before committing
  • Do not commit build output, logs, intermediate artifacts, and the like. Use .gitignore to exclude them. Common exclusion rules for many languages and operating systems are collected in github/gitignore
  • Do not commit passwords, authorization credentials, keys, and the like. If files such as AWS credential .csv files or GCP Service Account files leak into a public repository, attackers can use your resources and cause losses. Because of how Git stores history, such files are also hard to remove from past commits; refer to the official GitHub documentation
  • For configuration files (such as database connection settings), commit a configuration template, let each developer maintain a local copy, and list that copy in .gitignore. Alternatively, use git update-index --[no-]assume-unchanged <file> to ignore changes to certain files
  • Other commonly used commands (use them only once you clearly understand what they do)
    • git reset <file> - unstage a file that was added (before committing); for other uses of reset, refer to the help documentation
    • git clean -f - delete untracked files, such as large numbers of intermediate files (run git clean -n first to preview what would be deleted)
    • git checkout <file> - discard uncommitted changes to a file (git restore <file> does the same in Git 2.23 and later)
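
The rule about credentials can be partly automated. The Python sketch below shows the idea as a pre-commit hook that scans the lines added in the staged diff. The two patterns are illustrative only and far from complete, so a hook like this complements careful review rather than replacing it.

```python
import re
import subprocess

# Illustrative patterns only; dedicated scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def added_lines(diff):
    """Keep only the lines a diff adds, without the leading '+'."""
    return "\n".join(
        line[1:] for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    )


def find_secrets(text):
    """Return the names of all patterns that occur in the given text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def staged_diff():
    """Return the diff of what is currently staged for commit."""
    return subprocess.run(
        ["git", "diff", "--cached"],
        check=True, capture_output=True, encoding="utf-8", errors="replace",
    ).stdout


def main():
    """Return 1 (abort the commit) when a staged line looks like a secret."""
    hits = find_secrets(added_lines(staged_diff()))
    for name in hits:
        print(f"possible secret in staged changes: {name}")
    return 1 if hits else 0
```

As with the other hooks, the script would be saved as .git/hooks/pre-commit (or in .githooks) and end with `sys.exit(main())` after `import sys`.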

Hash & Parent

In general, we don’t need to pay much attention to the commit hash and parent information, but in some scenarios we need to fix or otherwise rework commits. Then we need to understand the whole commit chain: the parent of each commit, the common ancestor of two branches, and the differences between the local and remote repositories, especially for rebase-related operations. Throughout, follow the workflow model used by the project and the operations that model recommends (for common workflow models, refer to the Atlassian documentation).
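
The notions of parent and common ancestor can be made concrete with a toy model. The Python sketch below represents a history as a mapping from each commit to its parents and finds the best common ancestor of two commits, which is what git merge-base reports for real repositories. The single-letter commit names and the function names are made up for the example.

```python
def ancestors(parents, commit):
    """Return the commit and every commit reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def merge_bases(parents, a, b):
    """Return the common ancestors that are not ancestors of another one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }


# Toy history:
#   A - B - C      (main)
#        \
#         D - E    (feature)
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}
base = merge_bases(history, "C", "E")
```

In this history `base` is the set containing only B: rebasing feature onto main would replay D and E on top of C, giving both commits new parents and therefore new hashes.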

Here are some scenarios involved in the actual development process:

  • On your own development branch, a feature may span multiple commits. Before merging into the main branch, you can use git rebase -i <commit> to squash or drop commits and reword commit messages. Note that if the commits have already been pushed to the remote, you need git push -f (or the safer git push --force-with-lease) to overwrite them, and only on personal development branches. Below is a simple example together with the command descriptions Git prints; common commands are pick, reword, fixup, and drop.
$ git rebase -i 8717c71fc
reword 27e67629b feat: some feature first commit
fixup 7a3f0cd25 feat: some feature second commit
fixup d9a9d7f04 feat: some feature third commit

# Rebase 8717c71fc..d9a9d7f04 onto 8717c71fc (3 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup [-C | -c] <commit> = like "squash" but keep only the previous
#                    commit's log message, unless -C is used, in which case
#                    keep only this commit's message; -c is same as -C but
#                    opens the editor
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
# .       create a merge commit using the original merge commit's
# .       message (or the oneline, if no original merge commit was
# .       specified); use -c <commit> to reword the commit message
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
  • In some Git workflow models, use git pull --rebase to replay local commits on top of the latest remote changes
  • As a rule, never run git push -f on the main branch or other shared branches. To roll back a change, use git revert <commit>
  • To carry individual commits over to another branch, use git cherry-pick <commit>