Go-Git - Version Control System from Scratch

Go-Git is a Git implementation built from first principles in Go to understand how distributed version control actually works. No libraries for Git operations - just hash-based storage, tree structures, and commit graphs.

Why I Built This

Most developers use Git daily without understanding how it works internally. I built this to learn version control from the ground up - content-addressable storage, tree building algorithms, and commit graph traversal.

Core Features

  • Content-Addressable Storage: Files stored by SHA-256 hash with automatic deduplication
  • Staging Area: Git's three-tree architecture with index file
  • Tree Objects: Nested directory structures as tree objects
  • Commit History: Full commit chain with parent references
  • Compression: zlib compression for all objects (blobs, trees, commits)
  • Config Management: User name/email stored in .git/config

Commands in Action

Initialize Repository

go-git init

Creates the .git directory structure with objects, refs, and HEAD pointer.

What happens: Sets up .git/objects/ for content storage, .git/refs/heads/ for branch pointers, and a HEAD file pointing to the main branch.

Configure User Identity

go-git config

Stores your name and email in .git/config for commit authorship.

What happens: Prompts for user name and email, then writes them to .git/config in INI format. Every commit will include this information in the author field.

View Commit History

go-git log

Displays commit history by traversing parent references.

What happens: Follows HEAD to refs/heads/main, reads the commit hash stored there, loads that commit object, and displays its info. It then follows the parent pointer and repeats until it reaches a commit with no parent.
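The traversal itself is a simple loop over parent pointers. In this sketch an in-memory map stands in for reading and decompressing commit objects from .git/objects (an assumption to keep it self-contained); the `Commit` struct and `walkLog` are hypothetical names.

```go
package main

import "fmt"

// Commit holds the fields log needs; real commits live compressed on disk.
type Commit struct {
	Parent  string // empty for the root commit
	Author  string
	Message string
}

// walkLog starts at head and follows parent pointers until a commit has none.
func walkLog(head string, store map[string]Commit) ([]string, error) {
	var order []string
	for hash := head; hash != ""; {
		c, ok := store[hash]
		if !ok {
			return order, fmt.Errorf("missing object %s", hash)
		}
		order = append(order, hash)
		hash = c.Parent
	}
	return order, nil
}

func main() {
	// Hypothetical history: c3 -> c2 -> c1 (root).
	store := map[string]Commit{
		"c3": {Parent: "c2", Author: "Ada <ada@example.com>", Message: "add log command"},
		"c2": {Parent: "c1", Author: "Ada <ada@example.com>", Message: "add trees"},
		"c1": {Author: "Ada <ada@example.com>", Message: "initial commit"},
	}
	// "c3" stands in for the hash read from refs/heads/main.
	order, err := walkLog("c3", store)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, h := range order {
		c := store[h]
		fmt.Printf("commit %s\nAuthor: %s\n\n    %s\n\n", h, c.Author, c.Message)
	}
}
```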

Get Help

go-git help

Shows available commands and usage information.

What happens: Displays CLI help menu with all available commands and their descriptions.

Complete Command Reference

go-git init                    # Initialize repository
go-git config                  # Set user name and email
go-git add <files>             # Stage files (supports directories)
go-git commit -m "message"     # Create commit
go-git log                     # View commit history

How It Works

Content-Addressable Storage

Every object is stored by its SHA-256 hash at .git/objects/ab/c123... where the first 2 characters are the directory and the rest is the filename.

Same content = same hash = automatic deduplication. Store the same file 100 times and it still occupies disk space only once.

Tree Building (The Hard Part)

Files are stored as blobs. Directories are stored as trees.

Trees must be built bottom-up (deepest first) because parent trees need their children's hashes. This was the hardest part to implement - understanding why trees reference other trees by hash, not by content.

Commits

A commit is a pointer to a tree plus metadata (author, timestamp, message, parent). Commits form a directed acyclic graph where each commit references its parent, creating the history chain.

Technical Implementation

  • Object Format: <type> <size>\0<content>
  • Tree Format: <mode> <filename>\0<binary-hash> (mode: 100644 for files, 040000 for dirs)
  • Hash Storage: Tree objects store hashes as raw binary bytes, not hex strings (a critical detail that took hours to debug). By contrast, .git/refs/heads/main stores the hash as a plain-text hex string for simplicity
  • Compression: zlib for all objects
  • Branch Pointers: .git/refs/heads/main contains current commit hash

Performance

  • Deduplication: Identical files stored once
  • Compression: ~60-70% size reduction with zlib
  • Scalability: Linear time for add/commit operations

Not optimized for speed (built for learning) but functional for small-to-medium repos.

What I Learned

  • Content-addressable storage is elegant: hash = address, automatic deduplication
  • Trees are graphs, not nested structures: they reference their children by hash, not by content
  • Building bottom-up is necessary: can't hash parent without child hashes
  • Binary formats are tricky: working with null bytes and binary hash data
  • Compression matters: without zlib, .git/objects would be 3-4x larger

Project Structure

go-git/
├── cmd/cmd.go              # CLI with Cobra
├── internals/
│   ├── init.go             # Repository initialization
│   ├── config.go           # User configuration
│   ├── hash.go             # SHA-256 + zlib compression
│   ├── log.go              # Commit history traversal
│   ├── index/index.go      # Staging area management
│   └── objects/
│       ├── blobs.go        # File content hashing
│       ├── trees.go        # Directory tree building
│       └── commit.go       # Commit object creation
├── install.sh              # Installation script
└── main.go                 # Entry point

Installation

git clone https://github.com/codetesla51/go-git.git
cd go-git
./install.sh

# Or build manually:
go build -buildvcs=false -o go-git
mkdir -p ~/.local/bin
ln -s "$(pwd)/go-git" ~/.local/bin/go-git

Limitations

This is a learning project, not production software:

  • No branches (only main branch)
  • No merge operations
  • No diff functionality
  • No status command
  • No remote operations (push/pull/fetch)
  • No .gitignore support
  • No packed objects (each object is separate file)
  • No index optimization (linear scan)

Why these limitations exist: This project focuses on Git's core: the object model, staging, and commits. Adding branches, merging, and remotes would take roughly 2-3x more code and shift the focus from fundamentals to features.

Why This Matters

Building Git from scratch reveals:

  • Why commits are cheap (just pointers to trees)
  • How deduplication works (content-addressable storage)
  • Why branching is fast (just moving a pointer)
  • What "detached HEAD" actually means
  • How merge conflicts arise (competing tree references)

The best way to understand a tool is to build it yourself. This project taught me more about Git in a week than years of using it did.

Built With

  • Go 1.25 - Core language
  • Cobra - CLI framework
  • fatih/color - Terminal colors
  • Go standard library - Used for all Git logic (hashing, compression, object storage), which is hand-written

No Git libraries used. Everything from hashing to object storage is custom implementation.

Built by Uthman | @codetesla51

Learning project focused on understanding version control internals