Foreword
Git is now an essential skill for programmers and can be used to manage code, documents, blogs, and even recipes. Commits to a personal private repository can be fairly casual, but team development calls for agreed conventions. This article collects some commit-related Git practices for your reference.
As shown in the figure above (taken from Angular commit 970a3b5), a commit contains the following information:
- commit message - Description related to the content of the commit
- author & committer - Author and committer
- changed files - Modified files
- hash & parent - Hash of the commit content and its position in the commit tree
Commit Message
The commit message describes what the current commit does. It generally consists of a header, a body, and a footer:
<header>
<BLANK LINE>
<body>
<BLANK LINE>
<footer>
For industry best practices, refer to Angular’s commit standards: Commit Message Format
Among them, <header> is mandatory. The format recommended by Angular is as follows:
<type>(<scope>): <short summary>
│ │ │
│ │ └─⫸ Summary in present tense. Not capitalized. No period at the end.
│ │
│ └─⫸ Commit Scope: animations|bazel|benchpress|common|compiler|compiler-cli|core...
│
└─⫸ Commit Type: build|ci|docs|feat|fix|perf|refactor|test
In <header>, <type> and <summary> are mandatory, and <scope> is optional. It is recommended to keep <header> within 50 characters.
<type> indicates the type of this commit, generally one of the following:
- build: changes related to the build
- ci: changes related to continuous integration
- docs: documentation
- feat: new features
- fix: bug fixes
- perf: performance-related changes
- refactor: refactoring (neither bug fixes nor new features)
- test: testing related, including adding tests or changing existing tests
<scope> indicates the scope of the change. In Angular, a commit may involve scopes such as form handling or animation handling. In practice, define the scopes to suit the project.
<summary> is a brief description of the commit, written in the imperative mood and present tense. For example, use change instead of changed or changes.
<body> is a more detailed description of the commit, written in the imperative mood and present tense like <header>. It describes the motivation for the change: why it was introduced, what the previous logic was, what the current logic is, and what impact the change has.
Finally, <footer> is optional. It generally covers breaking changes, feature deprecations, and references to GitHub issues, Jira tickets, PRs, etc.
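To make the three-part layout concrete, here is a minimal Python sketch that splits a commit message into header, body, and footer. The function name split_commit_message, the sample message, and the list of footer prefixes are assumptions made up for illustration; they are not part of Angular's tooling.

```python
# Footer markers assumed for this sketch; a real project defines its own.
FOOTER_PREFIXES = ("BREAKING CHANGE:", "DEPRECATED:", "Fixes #", "Closes #")


def split_commit_message(message):
    """Split a commit message into (header, body, footer).

    Paragraphs are separated by blank lines. The last paragraph counts as
    the footer only if it starts with one of FOOTER_PREFIXES.
    """
    paragraphs = [p.strip() for p in message.strip().split("\n\n") if p.strip()]
    header = paragraphs[0] if paragraphs else ""
    rest = paragraphs[1:]
    footer = ""
    if rest and rest[-1].startswith(FOOTER_PREFIXES):
        footer = rest.pop()
    body = "\n\n".join(rest)
    return header, body, footer


# A made-up message that follows the <header>/<body>/<footer> layout.
SAMPLE = """fix(core): handle empty input in parser

Return an empty result instead of raising when the input is empty.
The previous logic assumed at least one token.

Fixes #123
"""

header, body, footer = split_commit_message(SAMPLE)
print("header:", header)
print("footer:", footer)
```

A message that consists of a header only yields an empty body and an empty footer.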
Standardized commit messages can be parsed by tools to automatically generate documentation or release notes. In some large open-source projects, manually compiling version update documents, interface changes, and compatibility impacts is time-consuming and labor-intensive; a unified specification lets much of this work be automated. Of course, different projects have different requirements and format standards for commit messages, and the requirements of open-source projects and company projects also differ, so the general rule is to comply with the conventions of the project at hand. Mature open-source projects usually explain how to contribute in the README document, or provide a separate CONTRIBUTING.md document that defines code style, submission methods, etc.
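As a rough illustration of how such tooling can work, the following Python sketch groups commit headers by <type> and renders plain-text release notes. The function name build_release_notes, the section titles, and the choice to list only feat, fix, and perf are assumptions for this sketch; real release-note generators have their own rules.

```python
import re
from collections import defaultdict

# <type>(<scope>): <summary>, with an optional scope
HEADER_RE = re.compile(r"^(?P<type>\w+)(\((?P<scope>[\w-]+)\))?: (?P<summary>.+)$")

# Section titles are an assumption for this sketch.
SECTION_TITLES = {"feat": "Features", "fix": "Bug Fixes", "perf": "Performance"}


def build_release_notes(headers):
    """Group commit headers by type and render plain-text release notes."""
    sections = defaultdict(list)
    for header in headers:
        match = HEADER_RE.match(header)
        if not match or match.group("type") not in SECTION_TITLES:
            continue  # skip non-conforming headers and types not listed above
        scope = match.group("scope")
        summary = match.group("summary")
        entry = "%s: %s" % (scope, summary) if scope else summary
        sections[match.group("type")].append(entry)

    lines = []
    for commit_type, title in SECTION_TITLES.items():
        if sections[commit_type]:
            lines.append(title)
            lines.extend("- " + entry for entry in sections[commit_type])
            lines.append("")
    return "\n".join(lines).rstrip()
```

The headers could come from git log --format=%s for the range between two release tags; headers that do not follow the format are silently skipped here, which is one reason to enforce the format at commit time.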
Automatic verification of commit message
With commit message specifications in place, how can we ensure that developers comply with them? We can use the Git Hooks feature provided by Git to verify the commit message. This article does not go into the details of Git Hooks and only gives a basic explanation. For specifics, please refer to the official documentation or the Atlassian documentation.
In a newly initialized Git project, the .git/hooks folder contains the official sample hooks:
ls -l .git/hooks
total 120
-rwxr-xr-x 1 tomo staff 478B Nov 11 20:44 applypatch-msg.sample
-rwxr-xr-x 1 tomo staff 896B Nov 11 20:44 commit-msg.sample
-rwxr-xr-x 1 tomo staff 4.5K Nov 11 20:44 fsmonitor-watchman.sample
-rwxr-xr-x 1 tomo staff 189B Nov 11 20:44 post-update.sample
-rwxr-xr-x 1 tomo staff 424B Nov 11 20:44 pre-applypatch.sample
-rwxr-xr-x 1 tomo staff 1.6K Nov 11 20:44 pre-commit.sample
-rwxr-xr-x 1 tomo staff 416B Nov 11 20:44 pre-merge-commit.sample
-rwxr-xr-x 1 tomo staff 1.3K Nov 11 20:44 pre-push.sample
-rwxr-xr-x 1 tomo staff 4.8K Nov 11 20:44 pre-rebase.sample
-rwxr-xr-x 1 tomo staff 544B Nov 11 20:44 pre-receive.sample
-rwxr-xr-x 1 tomo staff 1.5K Nov 11 20:44 prepare-commit-msg.sample
-rwxr-xr-x 1 tomo staff 2.7K Nov 11 20:44 push-to-checkout.sample
-rwxr-xr-x 1 tomo staff 3.6K Nov 11 20:44 update.sample
The ones related to commits are the following four:
- pre-commit - executed before Git generates the commit object
- prepare-commit-msg - executed after pre-commit, used to generate the default commit message. The script receives three parameters:
  - the name of the temporary file containing the commit message
  - the source of the commit message, such as message, template, merge, squash, or commit
  - the SHA1 of the relevant commit, only provided when the -c, -C or --amend option is used
- commit-msg - executed after the developer writes the commit message, with only the temporary file name as a parameter
- post-commit - executed after the commit has been created, mostly for notification purposes
We can use prepare-commit-msg to explain the commit message specification and commit-msg to check that the specification is followed. If a script exits with a non-zero status, the current commit is aborted.
If we want to apply a simple format similar to Angular’s <header>, we can refer to the following implementation.
Here is an example of prepare-commit-msg:
#!/usr/bin/env python3
import sys

# Collect the parameters passed by Git
commit_msg_filepath = sys.argv[1]
commit_type = sys.argv[2] if len(sys.argv) > 2 else ''
commit_hash = sys.argv[3] if len(sys.argv) > 3 else ''
print("prepare-commit-msg: File: %s\nType: %s\nHash: %s" % (commit_msg_filepath, commit_type, commit_hash))

msg_spec = '''# Please use the following format
# <type>(<scope>): <short summary>
# │ │ │
# │ │ └─⫸ Summary in present tense. Not capitalized. No period at the end.
# │ │
# │ └─⫸ Commit Scope: animations|bazel|benchpress|common|compiler|compiler-cli|core
# │
# └─⫸ Commit Type: build|ci|docs|feat|fix|perf|refactor|test'''

# Only add the hint when the message is written in the editor; with -m, merge,
# etc. the comment lines could end up in the final commit message.
if commit_type in ('', 'template'):
    # Append: opening with 'r+' and writing would overwrite the start of the file
    with open(commit_msg_filepath, 'a', encoding='utf-8') as f:
        f.write("\n" + msg_spec + "\n")

sys.exit(0)  # a non-zero exit status would abort the current commit
Here is a simple example of commit-msg:
#!/usr/bin/env python3
import re
import sys

# Collect the parameters passed by Git
commit_msg_filepath = sys.argv[1]
print("commit-msg: File: %s" % commit_msg_filepath)

# <type>(<scope>): <summary>; the scope may contain hyphens (e.g. compiler-cli)
header_pattern = re.compile(r'^(?P<type>\w+)(\((?P<scope>[\w-]+)\))?: .+$')
commit_types = 'build|ci|docs|feat|fix|perf|refactor|test'.split('|')
commit_scopes = 'animations|bazel|benchpress|common|compiler|compiler-cli|core'.split('|')

with open(commit_msg_filepath, 'r', encoding='utf-8') as f:
    commit_msg_header = f.readline().rstrip('\n')  # header line
print('<header>: %s' % commit_msg_header)

match = header_pattern.match(commit_msg_header)
if not match:
    print('commit message does not meet spec')
    sys.exit(1)

commit_type = match.group('type')
commit_scope = match.group('scope')
if commit_type not in commit_types:
    print('invalid <type>')
    sys.exit(1)
if commit_scope and commit_scope not in commit_scopes:  # scope is optional
    print('invalid <scope>')
    sys.exit(1)

sys.exit(0)
To use these Git Hooks, create the files prepare-commit-msg and commit-msg in the .git/hooks directory and make them executable. The corresponding scripts then run whenever we execute git commit. The following figure is a schematic diagram of the execution, in which non-compliant commits are interrupted.
The specific execution process is as follows (online version):
Git commits do not include the .git directory, so changes to the hooks there are not committed to the repository. Instead, we can create a .githooks folder in the root directory of the repository, put our hook scripts there, and reference it either by changing the configuration or by using symbolic links:
# use config
git config core.hooksPath .githooks
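# OR use symbolic links (absolute targets; a relative target would be resolved from inside .git/hooks and dangle)
ln -sf "$PWD"/.githooks/* .git/hooks/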
Of course, these are all client-side checks. Developers can simply ignore such Git Hooks configurations and introduce non-compliant commits. In that case, we can verify on the server side, or in CI, for example with GitHub Actions.
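As a sketch of what such a CI or server-side check might look like, the following Python snippet validates the header of every commit in a revision range. The function names and the example range origin/main..HEAD are assumptions for illustration; the type list is the one used earlier in this article.

```python
import re
import subprocess

HEADER_RE = re.compile(r"^(build|ci|docs|feat|fix|perf|refactor|test)(\([\w-]+\))?: .+$")


def find_invalid_headers(headers):
    """Return the headers that do not match <type>(<scope>): <summary>."""
    return [h for h in headers if not HEADER_RE.match(h)]


def commit_headers(rev_range):
    """Return the subject line of each non-merge commit in rev_range."""
    out = subprocess.check_output(
        ["git", "log", "--no-merges", "--format=%s", rev_range], text=True
    )
    return out.splitlines()
```

A CI job could call find_invalid_headers(commit_headers("origin/main..HEAD")) and exit with a non-zero status when the returned list is not empty. Merge commits are skipped here because their default messages do not follow the format.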
Author & Committer
In Git, Author refers to the original author of the commit, and Committer refers to the person who applied the commit, such as the project administrator who merged the Pull Request. If you are an individual developer or only use a single Git platform service (such as GitHub, BitBucket, etc.), you generally do not need any special author configuration. However, if you use multiple Git platforms or your company has internal requirements, you may need to set different user names and email addresses for different repositories. For example, you can set your personal GitHub account globally and set your company email address in internal company repositories.
# Global default configuration
git config --global user.email "<github email>"
git config --global user.name "<github username>"
# Internal company repository
git config user.email "<enterprise email>"
git config user.name "<real name>"
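Because it is easy to forget the per-repository setting, a small check can help, for example from a pre-commit hook. The following Python sketch is only an illustration: the function names and the domain corp.example.com are hypothetical.

```python
import subprocess


def email_matches_domain(email, allowed_domains):
    """Return True if email ends with '@<domain>' for one of allowed_domains."""
    email = email.strip().lower()
    return any(email.endswith("@" + domain.lower()) for domain in allowed_domains)


def effective_git_email():
    """Return the user.email Git would use in the current repository, or ''."""
    result = subprocess.run(
        ["git", "config", "user.email"], capture_output=True, text=True
    )
    return result.stdout.strip() if result.returncode == 0 else ""
```

In a company repository, a hook could abort the commit when email_matches_domain(effective_git_email(), ["corp.example.com"]) is false.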
Changed files
The core of every commit is the set of files it changes. A commit can involve more or fewer files, but generally follows these principles:
- Before committing, use git diff to view the changes in the files, use git add to add the files you want to include in the commit, use git status to check the status of the files, and finally use git commit to commit
- Each commit should contain only related changes. For example, fixing two different bugs should use two separate commits
- Frequent commits are encouraged, as this allows for faster sharing of implemented features and reduces the risk of code loss
- Do not commit unfinished work to the main branch or a collaborative feature branch. Test before committing
- Do not include compilation output, logs, intermediate products, etc. in the commit. Use .gitignore to exclude such files. Common exclusion configurations for different languages and operating systems can be found in github/gitignore
- Do not commit passwords, authorization credentials, keys, etc. For example, if AWS credential .csv files (or their content) or GCP Service Account files leak to a public repository, malicious users can consume your resources and cause losses. Due to the nature of Git, it is also difficult to remove such files from historical commits; refer to the official GitHub documentation and description
- For configuration files (such as database connection information), generally commit a configuration template, maintain the real file locally, and list that file in .gitignore. Alternatively, use git update-index --[no-]assume-unchanged <file> to ignore changes to certain files
- Other commonly used commands (use them only after understanding exactly what they do):
  - git reset <file> - unstage added files (before committing); for other uses of the reset command, refer to the help documentation
  - git clean -f - remove a large number of untracked intermediate files
  - git checkout <file> - revert changes to a file (before committing)
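Some of these principles can be checked automatically. As a minimal sketch, the following Python code flags staged files whose names look like build output or credentials, and could be called from a pre-commit hook. The pattern list and function names are assumptions for illustration, and a name-based check is no substitute for .gitignore or a real secret scanner.

```python
import fnmatch
import subprocess

# Example patterns only; adjust them to the project.
FORBIDDEN_PATTERNS = ["*.pem", "*.key", "*.log", "*.pyc", "credentials*.csv", ".env"]


def find_forbidden(paths, patterns=FORBIDDEN_PATTERNS):
    """Return the paths whose file name matches one of the glob patterns."""
    hits = []
    for path in paths:
        name = path.rsplit("/", 1)[-1]  # compare against the file name only
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            hits.append(path)
    return hits


def staged_paths():
    """Return the paths staged for the next commit (added, copied, or modified)."""
    out = subprocess.check_output(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"], text=True
    )
    return out.splitlines()
```

A pre-commit hook could print find_forbidden(staged_paths()) and exit with a non-zero status when the list is not empty.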
Hash & Parent
In general, we don’t need to pay much attention to the commit hash and parent information. In certain scenarios, however, we need to fix up or otherwise rework commits. Then we need to understand the whole Git commit chain, the parent of each commit, the common ancestor between branches, and the differences between local and remote, especially for rebase-related operations. Throughout the process, we should also follow the workflow model used by the project and use the operations that model recommends (for common workflow models, refer to the Atlassian documentation).
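To illustrate parents and common ancestors, here is a toy Python model of a commit graph. The single-letter commit names and the function names are made up for this sketch, and it is a simplified illustration of what git merge-base reports, not Git's actual implementation.

```python
from collections import deque

# Toy history: each commit maps to the list of its parents.
#   A <- B <- C <- D   (main)
#         \
#          E <- F      (feature)
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["B"], "F": ["E"]}


def ancestors(parents, commit):
    """Return the commit itself plus every commit reachable via parent links."""
    seen = set()
    queue = deque([commit])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen


def merge_bases(parents, a, b):
    """Return the common ancestors of a and b that are not ancestors of
    another common ancestor (the "best" common ancestors)."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return sorted(
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    )


print(merge_bases(PARENTS, "D", "F"))  # prints ['B']
```

In this toy history, rebasing feature onto main would replay E and F on top of D, which gives them new parents and therefore new hashes.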
Here are some scenarios involved in the actual development process:
- In your own development branch, a feature may involve multiple commits. Before merging into the main branch, you can use git rebase -i <commit> to squash, drop, and reword commits. Note that if the commits have already been pushed to the remote, you need git push -f to overwrite them (only on personal development branches). Here is a simple example with the relevant command descriptions; common commands are pick, reword, fixup, drop, etc.
$ git rebase -i 8717c71fc
reword 27e67629b feat: some feature first commit
fixup 7a3f0cd25 feat: some feature second commit
fixup d9a9d7f04 feat: some feature third commit
# Rebase 8717c71fc..d9a9d7f04 onto 8717c71fc (3 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup [-C | -c] <commit> = like "squash" but keep only the previous
# commit's log message, unless -C is used, in which case
# keep only this commit's message; -c is same as -C but
# opens the editor
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
# . create a merge commit using the original merge commit's
# . message (or the oneline, if no original merge commit was
# . specified); use -c <commit> to reword the commit message
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
- In some Git workflow models, use git pull --rebase to update local commits
- In principle, git push -f is forbidden on the main branch and similar shared branches. If you need to roll back, use git revert <commit>
- For multi-branch code synchronization, you can use the git cherry-pick command