Service growth: environment setup and storage // Work Dad Dev

Now that we established the service that we are going to create (an URL Shortener), let’s do the necessary environment setup for the implementation and create the first piece of code for the service.

Versioning the code

There are many alternatives to version and host your code, including self-hosting options. For this series, I’m choosing Github. It is a platform I’ve used in the past and currently use in my day job so it’s something I already know how to use.

Github has a lot of good documentation on creating the repository, which you can follow to create yourself. There are a couple of options I do set for my repos though:

Require a pull request before merging: I add this rule to the main branch. This means my workflow is to create a development branch, update the code there and once it’s stable, I create a pull request for main. main then is just a series of pull requests.
Allow squash merging and disallow merge commits: a common issue with using pull requests is that you get a commit for the branch plus a merge commit. Merge commits are useful because you get the full history of changes. But I prefer to have every pull request be a single commit so I disable them.
Automatically delete head branches: because I create a lot of branches in this workflow, once I’m done with it I no longer have a use for it. Deleting it automatically keeps the branch list clean.

Setting up the Go environment

Assuming you have downloaded and installed Go on your development machine and have cloned your newly created github repository, you can cd into the repository and type

go mod init github.com/avalonbits/shortener

This creates a Go Module for the project, creating a go.mod file with the module name.

Testing workflow

One last setup step before writing code. I want to make sure that every time I push code to the github repo, it is guaranteed to build and all tests are passing. To do that, I can create a github workflow that triggers on every push to a branch. The workflow is just a yaml file with that configures a linux environment with Go and all the dependencies needed to compile and run the tests for the code:

File: .github/workflows/repo_push.yaml

name: Go package

on: [push]
jobs:
  build:

    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Go
        uses: actions/setup-go@v3
        with:
          go-version: 1.19

       - name: Build
        run: go build -v ./...

      - name: Vet
        run: go vet ./...

      - name: Test
        run: go test -v ./...

This workflow creates a linux environment using ubuntu, installs go version 1.19 (current version as of the writing of this article), then runs three go commands: go build (compiles all go code), go vet (checks for suspicious constructs e.g Printf arguments that don’t match the types) and go test (runs all the tests).

If any of these steps fail, then the push request cannot be merged with main.

Show me the (storage) code!

Ok, enough setup! Let’s get coding.

The core functionality of an URL Shortener is to create a short name for a long url and then return that url given the short name. So we need somewhere to store this mapping between the name and the URL and some code to create and retrieve the mapping. As I said in the first article, I’m going to use SQLite for that.

Schema version 1

To store data in a sql database, we need to define tables and columns. Here is the schema that we need to store the mapping:

File: storage/schema.sql

CREATE TABLE IF NOT EXISTS ShortLong(
    Short TEXT NOT NULL,
    LongN TEXT NOT NULL
);

CREATE UNIQUE INDEX IF NOT EXISTS short_idx ON ShortLong(Short);

This schema has a table, ShortLong, which has two text columns and an unique index on the Short column of that table. Each row will therefore contain a pair of values (Short, LongN) and the unique index guarantees that we can’t have duplicates values in Short and that we can quickly retrieve the LongN value given the Short value.

Queries and sqlc

In order to read and write data, we need to create SQL queries and some Go code that can connect to the database and perform the queries. As I mentioned in the first article, an ORM is a typical solution for this, instead of writing your own code to handle it. But, instead of using an ORM, I’m going to use sqlc to generate what would be the manual Go code that I write.

To do that, I need the schema and two extra files. The first one is a query file that lists all the queries that I can perform:

File: storage/queries.sql

-- name: SetShort :exec
INSERT INTO ShortLong(short, longn) VALUES (?, ?);

-- name: GetLong :one
SELECT longn FROM ShortLong WHERE short = ?;

Notice that I wrote standard SQL for SQLite, no Go code at all. But there is a difference: each query has a comment that starts with –- name, followed by a name (SetShort / GetLong), followed by :exec or :one. These are called query annotations. We will see why they are important, shortly.

The second one is a sqlc.yaml config file that tells sqlc where the schema and query files are and where the code should be generated:

File: sqlc.yaml

version: 2
sql:
  - engine: "sqlite"
    schema: "storage/schema.sql"
    queries: "storage/queries.sql"
    gen:
      go:
        package: "storage"
        out: "storage"

Once I have these files (query, schema, sqlc.yaml), I can run sqlc generate and it will create a few Go files (db.go, models.go and queries.sql.go) in the storage directory/package.

The most important one is the queries.sql.go, which has the methods SetShort and GetLong that I defined in the queries sql file:

File: storage/queries.sql.go

// Code generated by sqlc. DO NOT EDIT.
// versions:
//   sqlc v1.16.0
// source: queries.sql

package storage

import (
	"context"
)

const getLong = `-- name: GetLong :one
SELECT longn FROM ShortLong WHERE short = ?
`

func (q *Queries) GetLong(ctx context.Context, short string) (string, error) {
	row := q.db.QueryRowContext(ctx, getLong, short)
	var longn string
	err := row.Scan(&longn)
	return longn, err
}

const setShort = `-- name: SetShort :exec
INSERT INTO ShortLong(short, longn) VALUES (?, ?)
`

type SetShortParams struct {
	Short string
	Longn string
}

func (q *Queries) SetShort(ctx context.Context, arg SetShortParams) error {
	_, err := q.db.ExecContext(ctx, setShort, arg.Short, arg.Longn)
	return err
}

The generated code is very straightforward and easy to read. For each query I added to the queries.sql file, it creates a method using the name in the -- name annotation. All methods expect a context.Context and parameters if the query itself uses parameters. The queries then use the QueryRowContext or ExecContext methods, depending on the :exec or :one annotation.

For GetLong, it expects a string for the short name since we are performing a select filtering by the short name. And since we used the :one annotation, we are telling sqlc that a single result should be returned, in this case the longn string column. And because this is Go, an error is always returned as well.

For SetShort, we actually need to pass two values, the short and long names. sqlc generate creates a struct, SetShortParams, that has both parameters and we need to fill them with the values we want. Because we used the :exec annotation, we are telling sqlc that no result should be returned and so the method only returns an error.

This looks simple; why generate it?

Indeed, for only these two methods I could have written this code myself. There is nothing particularly complicated about it. But, the thing is I don’t need to write this code. This is a boilerplate that needs to be created because we are translating code from one domain (database query language) to another (Go programming language). Every time I need to change or add a query I would need to update this mapping.

Having a tool to do it for me allows me to concentrate on each domain at a time and ensures that the Go code I write never knows about the SQL code. This means I create a very clear boundary between the database code and my service code. As the service grows and we need to add more tables and columns, update and create queries, this separation will make it much easier to update each domain.

Testing the database

One major advantage of using SQLite is that you can create an in-memory database. This makes it excellent for testing because I can create a database for every test, apply the schema, run the test and then it’s gone.

But how do I apply the schema?

We can use a nice facility from Go called embed. This allows us to embed files in our Go binary and then read them as strings or even as an embedded file system. For my purposes, I’m going to embed it as a string since the schema is a single file:

File: storage/storage.go

package storage

import (
	_ "embed"
)

//go:embed schema.sql
var Schema string

That’s all there is! Now whenever I compile the storage package, storage.Schema will contain the contents of the schema file. With that, I can now create an in memory database and apply the schema before returning it to the caller:

File: storage/storage_test.go

package storage_test

import (
	"context"
	"database/sql"
	"testing"

	"github.com/avalonbits/shortner/storage"

	_ "github.com/mattn/go-sqlite3"
)

// Skipping the test code
// ...


// Returns an in-memory database with the schema applied.
// In case of errors we panic because we wouldn't be able to run the
// test code anyway.
func testDB() *sql.DB {
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		panic(err)

	}
	if _, err := db.Exec(storage.Schema); err != nil {
		panic(err)
	}
	return db
}

github.com/mattn/go-sqlite3 is a Go sqlite3 driver, which we import with the "_" name because we just want the side-effects of importing it. One of those side-effects is to register the driver as “sqlite3” which can then be referenced in the sql.Open call.

After opening the database and checking for errors, I then apply the schema using the db.Exec method. If we have no errors, the schema was applied correctly and we can return it to the caller to run the tests with it.

Test code

We can finally test our code. For that we will create table driven tests. We want to test both creation of short -> longn mappings and then retrieve existing mappings.

You can read the test code in the github repository but I want to point out a few things that became obvious with the tests:

We can create short and longn mappings using the empty string “”. We could use the SQLite CHECKi constraint to prevent that from happening, but it wouldn’t block a string composed only of white space e.g “ “. Instead, we can define this as a business rule on the upper layers of our service.
While we do prevent duplicate short names, we don’t do the same for long names. This means that multiple different short names can map to the same long name, wasting both storage space and short names. This might be an issue in the future but there are multiple ways to address this and more to the point, we don’t want to have business rules that are subject to change in the lower layers of our service.

That was long

I know… There was a lot of setup work and I’m assuming most of you don’t know about sqlc, so I wanted to take the time to explain how it works in some detail. Hopefully I was able to show how useful it was, even at this early stage with just a single table and a couple of queries.

All in all, we now have a working storage API that fits the needs of our URL Shortener service.

In the next post we will go through the service layer and define the business rules that implement our service.