Git Internals

Understanding how Git works under the hood

Learning Objectives

  • Understand Git's object model and how it stores data
  • Learn about refs and how Git uses them to track branches and tags
  • Master the concept of packfiles and their role in Git storage
  • Understand Git's garbage collection process

Git's Object Model

At its core, Git is a content-addressable filesystem with a VCS user interface built on top. Understanding how Git stores and manages data helps you become a more effective Git user.

Core Objects

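  • Blobs: Store file contents only, with no filename or other metadata
  • Trees: Store directory listings that map filenames and modes to blobs and subtrees
  • Commits: Store metadata (author, committer, message, parents) and a pointer to the root tree of a project snapshot
  • Tags: Annotated tag objects store a name, tagger, and message pointing at another object, usually a commit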
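
To see how these object types fit together, the sketch below builds a blob, a tree, and a commit by hand using plumbing commands. The scratch repository, the file name greeting.txt, and the "Example" identity are placeholders chosen for illustration, not part of any real project.

```shell
set -eu
# Hypothetical scratch repository; every name below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Blob: the file's bytes only, with no filename attached.
blob=$(echo 'hello internals' | git hash-object -w --stdin)

# Tree: maps the name greeting.txt (mode 100644) to that blob.
git update-index --add --cacheinfo 100644 "$blob" greeting.txt
tree=$(git write-tree)

# Commit: points at the tree and records author, committer, and message.
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'
commit=$(git commit-tree "$tree" -m 'first commit')

blob_type=$(git cat-file -t "$blob")
tree_type=$(git cat-file -t "$tree")
commit_type=$(git cat-file -t "$commit")
echo "$blob_type $tree_type $commit_type"
```

Note that no branch points at the new commit yet: commit-tree only writes the object, and a ref has to be updated separately before the commit shows up in normal history.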

Object Properties

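  • Content-addressed: each object's ID is a hash of its type, size, and contents (SHA-1 by default)
  • Immutable once created, because changing the contents would change the ID
  • Compressed with zlib for storage
  • Linked together in a directed acyclic graph (DAG): commits point to trees and parent commits, trees point to blobs and subtrees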
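
A minimal sketch of content addressing, assuming a throwaway repository: identical bytes hash to the same ID, different bytes hash to a different one, and a freshly written (loose) object is stored under a path derived from its ID. The sample strings are arbitrary.

```shell
set -eu
# Hypothetical scratch repository used only for this demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Identical bytes always produce the identical object ID.
id1=$(echo 'test content' | git hash-object -w --stdin)
id2=$(echo 'test content' | git hash-object --stdin)

# Changing even one byte yields a different ID, so an object is never edited in place.
id3=$(echo 'test content!' | git hash-object --stdin)

# A loose object lives at .git/objects/<first 2 hex digits>/<remaining digits>.
dir=$(echo "$id1" | cut -c1-2)
file=$(echo "$id1" | cut -c3-)
path=".git/objects/$dir/$file"
ls -l "$path"
```

Only the first call passes -w, so only that object is actually written; the other two calls just compute the ID.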

Examining Git Objects

Git provides several low-level commands to examine its internal objects:

Inspecting Objects

# Show object type
$ git cat-file -t 1a410efbd13591db07496601ebc7a059dd55cfe9
# Show object content
$ git cat-file -p 1a410efbd13591db07496601ebc7a059dd55cfe9
# Show object size
$ git cat-file -s 1a410efbd13591db07496601ebc7a059dd55cfe9
# List objects in packfile
$ git verify-pack -v .git/objects/pack/pack-*.idx
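
The hash in the commands above is a placeholder, so it may help to run them against objects that exist. The sketch below makes one ordinary commit in a scratch repository and then walks from the commit to its tree to a blob. The file name notes.txt and the "Example" identity are assumptions made for illustration.

```shell
set -eu
# Hypothetical scratch repository with a single commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'v1' > notes.txt
git add notes.txt
git -c user.name='Example' -c user.email='example@example.com' \
    commit -q -m 'add notes'

# Resolve each link in the chain: commit -> tree -> blob.
commit=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')
blob=$(git rev-parse 'HEAD:notes.txt')

git cat-file -p "$commit"   # tree ID, author, committer, message
git cat-file -p "$tree"     # mode, type, ID, and name of each entry
content=$(git cat-file -p "$blob")
echo "$content"
```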

Git References

References (refs) are human-readable names that point to objects, usually commits. They're how Git keeps track of branches, tags, and other important points in your repository's history without requiring you to remember hashes.

Types of Refs

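  • HEAD: Points to the current branch, or directly to a commit when HEAD is detached
  • Branches: Movable pointers that advance to each new commit
  • Tags: Fixed pointers to specific commits
  • Remote refs: Record the state of a remote repository's branches as of the last fetch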

Working with Refs

# Show all refs
$ git show-ref
# Update ref
$ git update-ref refs/heads/main 1a410e
# Show where HEAD points
$ git symbolic-ref HEAD
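
As a rough illustration of how lightweight refs are, the sketch below inspects them directly. It assumes the default "files" ref storage, where a new branch is a small file under .git/refs/heads; the branch name experiment and the file notes.txt are made up for the example.

```shell
set -eu
# Hypothetical scratch repository with a single commit on main.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # use "main" whatever init.defaultBranch says
echo 'v1' > notes.txt
git add notes.txt
git -c user.name='Example' -c user.email='example@example.com' \
    commit -q -m 'add notes'

# HEAD is a symbolic ref: it names a branch rather than a commit.
head_target=$(git symbolic-ref HEAD)
cat .git/HEAD

# The branch itself resolves to a commit ID (stored as a file with the default backend).
branch_commit=$(git rev-parse refs/heads/main)
cat .git/refs/heads/main

# Creating a branch is just creating another ref that points at a commit.
git update-ref refs/heads/experiment "$branch_commit"
git show-ref
```

Preferring update-ref over writing the files by hand keeps the example valid if refs are later packed into .git/packed-refs.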

Packfiles and Storage Optimization

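Git first writes each new object as its own compressed "loose" file. Packfiles combine many objects into a single file with an accompanying index, which saves space and speeds up transfers. Understanding how packfiles work helps you optimize repository performance.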

Packfile Creation

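# Pack loose objects and clean up the repository
$ git gc
# Also delete unreachable loose objects immediately, skipping the default grace period
$ git gc --prune=now
# Pack the objects reachable from one branch into pack-<hash>.pack and pack-<hash>.idx
$ echo refs/heads/main | git pack-objects --revs pack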

Storage Optimizations

  • Delta compression between similar objects
  • Network transfer optimization
  • Automatic garbage collection
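
The effect of packing can be observed with git count-objects. The sketch below, run in a scratch repository, makes three commits that each change one line of the same file, then compares loose and packed object counts before and after git gc. The file name data.txt and the "Example" identity are illustrative assumptions.

```shell
set -eu
# Hypothetical scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'

# Three nearly identical versions of one file: good candidates for delta compression.
for i in 1 2 3; do
  printf 'line %s\n' a b c d e f g h > data.txt
  echo "revision $i" >> data.txt
  git add data.txt
  git commit -q -m "revision $i"
done

# Before: every blob, tree, and commit is a separate loose file.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

git gc -q

# After: the same objects live in a single packfile.
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
echo "loose before: $loose_before, loose after: $loose_after, packed: $packed_after"

# Each line shows an object's ID, type, and sizes; deltas also list their base object.
git verify-pack -v .git/objects/pack/pack-*.idx
```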

Garbage Collection

Git's garbage collection process helps maintain repository health by cleaning up unnecessary files and optimizing storage.

What GC Does

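  • Removes unreachable objects once they are older than the grace period (two weeks by default)
  • Packs loose objects and delta-compresses similar ones
  • Packs refs into a single file and expires old reflog entries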

GC Commands

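# Run garbage collection
$ git gc
# Spend more time recomputing deltas for a smaller pack (slow; rarely needed)
$ git gc --aggressive
# Delete unreachable loose objects (normally run for you by git gc)
$ git prune
# List what would be deleted without deleting anything
$ git prune --dry-run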
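
To make "unreachable" concrete, the sketch below writes a blob that no ref, index entry, or commit points at, previews its removal with a dry run, and then prunes it. It assumes a throwaway repository; unlike git gc, a plain git prune applies no grace period, so it is best tried only on scratch data.

```shell
set -eu
# Hypothetical scratch repository with no commits.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Written to the object store but referenced by nothing, so it is unreachable.
orphan=$(echo 'nobody points at me' | git hash-object -w --stdin)

# Dry run: lists the objects that would be deleted and leaves them in place.
would_remove=$(git prune --dry-run)
echo "$would_remove"

# Real run: the unreachable loose object is deleted.
git prune

if git cat-file -e "$orphan" 2>/dev/null; then
  still_there=yes
else
  still_there=no
fi
echo "orphan still present: $still_there"
```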

What's Next?

Now that you understand Git's internal workings, you're ready to learn about optimizing Git's performance. In the next lesson, you'll discover:

  • Large file handling and management
  • Git attributes and hooks for automation