Semantically significant apis: a case study with sqlc
I’ve been developing an URL shortener in my spare time: shortn.cc. It works as expected, though the frontend UI definitely needs work (hey, backend engineer for most of my career! Cut me some slack).
It is built using a Go stack with sqlite. As I’ve discussed before, I’m not a fan of ORMs and instead use a named query library called sqlc to create queries and generate a storage API for my service. It’s been working quite well and it’s a relief to be able to write SQL queries that get typed checked at API generation time instead of at runtime.
shortn.cc
tracks clicks on its URLs without having to track users across the internet. This means I know nothing about
the user clicking the URL except what I can get from the IP (location data) and the HTTP headers they sent me. This
gives me less information and potentially false ones, if the caller is manually setting HTTP headers or using a VPN.
But, I think it’s a fine trade-off that preserves user anonymity as much as possible while performing the URL redirect.
sqlc
allows me to generate an API for my storage so that I can call queries.GetFullURL(shortURL)
in my service code.
But, the common storage API generated by sqlc
has a sparse interface for creating transactions and after a year of
experimenting, I evolved an API that seems obvious in retrospect but took me using the code in multiple projects, fixing
bugs from the original designs and settling on trade-offs that made sense for me, resulting in a semantically
significant API for transactions.
Let me take youi through my journey as we understand the sqlc
generated API, what makes it cumbersome for transactions and
the issues I had with each attempt that finally led me to the current API.
Generated code
sqlc
works by setting up a yaml config with:
- A reference to a schema file or migration files directory.
- A reference to a queries file.
- A setup for package names.
For our example, here is the sqlc.yaml
file I use in most of my projects:
version: 2
sql:
- engine: "sqlite"
schema: "storage/datastore/migrations"
queries: "storage/datastore/queries.sql"
gen:
go:
package: "datastore"
out: "storage/datastore"
In my projects I have a storage/datastore
package, which is where the SQL generated code will be placed. It also has a
queries.sql
file which is where I write all my datastore queries, and a migrations directory where I store my SQL
migration (ddl) files. When you call sqlc generate
, it goes through the files in the migration directory in file name
order, parses them, then parses the queries.sql
file and if the queries are correct (it checks for table and columns
names), it generates 3 files: db.go
, models.go
and queries.sql.go
.
queries.sql.go
has the API generated from queries.sql
and defines a Queries
struct with one method per query. models.go
has the Go structs that map to the tables defined in the SQL migration files. And db.go
is the db interface that will be
the same for all database drivers that are compatible with database/sql1.
Looking at the generated code for db.go:
package datastore
import (
"context"
"database/sql"
)
type DBTX interface {
ExecContext(context.Context, string, ...any) (sql.Result, error)
PrepareContext(context.Context, string) (*sql.Stmt, error)
QueryContext(context.Context, string, ...any) (*sql.Rows, error)
QueryRowContext(context.Context, string, ...any) *sql.Row
}
func New(db DBTX) *Queries {
return &Queries{db: db}
}
type Queries struct {
db DBTX
}
func (q *Queries) WithTx(tx *sql.Tx) *Queries {
return &Queries{
db: tx,
}
}
It has a New constructor that accepts a DBTX
interface and returns a *Queries
instance. The methods on the DBTX
interface are the same ones available on an *sql.DB
type, so you can open the database, create the *datastore.Queries
type and use it instead of *sql.DB
directly:
db, err := sql.Open(“sqlite”, “/tmp/datastore.db”)
if err != nil {
// Do something
}
defer db.Close()
queries := datastore.New(db)
svc := service.New(queries)
...
As you can see my service depends on *datastore.Queries
, not on *sql.DB
. Assuming we have Queries.GetFullURL
(returns
the full url for the short one), Queries.StoreShortURL
(stores a short URL in the database) and Queries.UseQuota
(uses
one URL quota if user still has any), we could have two service methods CreateShortURL
and ResolveShortURL
that uses the
storage API as such:
type Service struct {
queries *datastore.Queries
}
func (s *Service) ResolveShortURL(ctx context.Context, short string) (string, error) {
full, err := s.queries.GetFullURL(ctx, short)
if err != nil {
return “”, err
}
return full, nil
}
func (s *Service) CreateShortURL(ctx context.Context, full string) (string, error) {
if err := s.queries.UseQuota(ctx); err != nil {
return “”, fmt.Errorf(“quota exhausted: %w”, err)
}
short := s.generateShort()
err := s.queries.StoreShortURL(ctx, datastore.StoreShortURLParams{
Short: short,
Full: full,
})
if err != nil {
return “”, fmt.Errorf(“error storing short url: %w”, err)
}
return short, nil
}
Pretty straight forward. ResolveShortURL
just calls Queries.GetFullURL
and returns the full IRL if it exists.
CreateShortURL
tries to consume URL quota and if it succeeds, creates a short URL and calls Queries.StoreShortURL
to
store it in the database.
Most of you can see the issue here, right?
Where is my transaction?
While ResolveShortURL
is fine, CreateShortURL
has a big problem: its 2 database calls are not in the same transaction.
So if UseQuota
succeeds but StoreShortURL
doesn’t, we would have consumed quota without creating the URL and the user
would be very sad.
The solution, of course, is to call UseQuota
and StoreShortURL
in a transaction. That’s what the Queries.WithTx
method
is for: you pass it an *sql.Tx
and it returns a new *Queries
struct that is bound to the transaction. Except… where can
I get the *sql.Tx
from?
It should come from the *sql.DB.BeginTx
call. But, for that, I would need a reference to the *sql.DB
used by *Queries
because it does not expose the *sql.DB
for this purpose.
Moreover, the control of the transaction (calling Commit
/Rollback
when appropriate) is also external to calling the
*Queries
methods. Fundamentally, sqlc
gives you named SQL queries only. You are still in charge of managing the db
calls.
Towards a better API
The design decision from sqlc
makes sense: all it’s doing is wrapping SQL queries with methods, so
handling db transactions should still be in control of the caller.
Adding any transaction management to the library would bring it closer to being an ORM and I can appreciate the tradeoff, but it makes for a bad abstraction. From the perspective of my Go code, I’m not dealing with SQL queries, only with API calls. Having to handle transactions manually seems backwards.
A good abstraction would steer you away from trying to solve a common problem with semantically significant APIs. Being
able to express a transaction given you have a *Queries
type would make the code much easier to understand and I
experimented with multiple APIs over the course of a year until I landed on one that works well.
Attempt #1: keep a reference to the *sql.DB
Since I need the reference to *sql.DB
anyways, I just kept it side-by-side with the *Queries
instance. I can now rewrite
CreateShortURL
to use a transaction:
type Service struct {
queries *datastore.Queries
db *sql.DB
}
// Nothing changes here, no transaction needed.
func (s *Service) ResolveShortURL(ctx context.Context, short string) (string, error) {
full, err := s.queries.GetFullURL(ctx, short)
if err != nil {
return “”, err
}
return full, nil
}
func (s *Service) CreateShortURL(ctx context.Context, full string) (string, error) {
// Create the transaction.
tx, err := s.db.BeginTx(ctx, nil)
if err != nil {
return “”, err
}
// If tx.Commit is called successfully, Rollback will have no effect.
defer tx.Rollback()
queries := s.queries.WithTx(tx)
if err := queries.UseQuota(ctx); err != nil {
return “”, fmt.Errorf(“quota exhausted: %w”, err)
}
short := s.generateShort()
err = queries.StoreShortURL(ctx, datastore.StoreShortURLParams{
Short: short,
Full: full,
})
if err != nil {
return “”, fmt.Errorf(“error storing short url: %w”, err)
}
// Commit and return the short url.
return short, tx.Commit()
}
This now works as expected. If UseQuota
doesn’t fail but StoreShortURL
does, the quota will not have been consumed.
Notice the code looks pretty much the same, except that I’m creating and managing the transaction.
If you have a single or maybe two methods you want to run in a transaction, this is fine. But it gets old soon and at some point you want to hide the transaction mechanism because, while it is important to know when you are running in a transaction (semantically significant), the mechanics are a distraction.
Attempt #2: datastore.DB
with Transaction method.
Let’s make the transaction more obvious. If instead of keeping an explicit reference to *sql.DB
, we create a new type
datastore.DB
that exposes a Transaction
method, we can have it deal with the mechanics and use a function parameter that
gets called with the properly setup transaction:
package datastore
type DB struct {
// Embed *Queries, so all their methods are available in DB.
*Queries
rdbms *sql.DB
}
func (db *DB) Transaction(ctx context.Context, f func(*DB) error) (tErr error) {
defer func() {
// Rollback the transaction in case there is an error.
if tErr != nil {
rbErr := tx.Rollback()
if rbErr != nil {
tErr = errors.Join(tErr, rbErr)
}
}
}()
// Create the transaction
tx, err := db.rdbms.BeginTx(ctx, nil)
if err != nil {
return fmt.Errorf("error creating transaction: %w", err)
}
// Create the new DB instance that uses the transaction.
txdb := DB{
rdbms: db.rdbms,
Queries: db.Queries.WithTx(tx),
}
// Call the function parameter with it and handle the transaction.
if err := f(txdb); err != nil {
return fmt.Errorf("transaction error: %w", err)
}
return tx.Commit()
}
Now, reimplementing the service, we get:
type Service struct {
queries datastore.DB
}
// Nothing changes here, no transaction needed.
func (s *Service) ResolveShortURL(ctx context.Context, short string) (string, error) {
full, err := s.queries.GetFullURL(ctx, short)
if err != nil {
return “”, err
}
return full, nil
}
func (s *Service) CreateShortURL(ctx context.Context, full string) (string, error) {
var short string
return short, s.queries.Transaction(ctx, func(queries datastore.DB) error {
if err := queries.UseQuota(ctx); err != nil {
return fmt.Errorf(“quota exhausted: %w”, err)
}
// Make sure I don’t shadow the short variable.
short = s.generateShort()
// Hmm…
err := s.queries.StoreShortURL(ctx, datastore.StoreShortURLParams{
Short: short,
Full: full,
})
if err != nil {
return “”, fmt.Errorf(“error storing short url: %w”, err)
}
return nil
})
}
This is much better now. The Transaction
method with the closure defined within the call makes it obvious what is
happening and other than wrapping the code within the closure, it looks exactly the same as before minus the *sql.Tx
mechanics.
But, both attempts have a problem: there is nothing in the APIs that prevent me from using the original s.queries
instance that is not bound to the transaction. In fact, one of the first bugs I had (and I left it here in the example)
was that StoreShortURL
was using s.queries
instead of queries. The code semantics is incorrect but there is nothing
syntactically incorrect, so it compiled and worked, until it didn’t.
The design of both attempts had as a goal not changing code that didn’t need the transaction, so the *Queries
instance
should be available for that purpose. The trade-off was, unless you have good testing for concurrent access, you will
inevitably mix them and have unwanted behavior. You could also argue that naming the transaction-bound instance
differently would have made it easier to spot the problem in code reviews, but a better design would make it impossible
to happen in the first place.
Attempt #3: transactions all the way down.
This version make *Queries
an unexported field instead of embedding it in datastore.DB
and the function parameter
expects *Queries
instead datastore.DB
as the function argument:
type DB struct {
queries *Queries
rdbms *sql.DB
}
func (db *DB) Transaction(ctx context.Context, f func(*Queries) error) error {
defer func() {
// Rollback the transaction in case there is an error.
if tErr != nil {
rbErr := tx.Rollback()
if rbErr != nil {
tErr = errors.Join(tErr, rbErr)
}
}
}()
tx, err := db.rdbms.BeginTx(ctx, nil)
if err != nil {
return fmt.Errorf("error creating transaction: %w", err)
}
queries := db.queries.WithTx(tx)
if err := f(queries); err != nil {
return fmt.Errorf("transaction error: %w", err)
return err
}
return tx.Commit()
}
With this, we now require callers to always call Transaction
and the only way to use *Queries
methods is through the
function parameter, making it impossible to use incorrectly:
type Service struct {
db datastore.DB
}
// Rewrite to use the Transaction API.
func (s *Service) ResolveShortURL(ctx context.Context, short string) (string, error) {
var full string
return full, s.db.Transaction(ctx, func(queries *datastore.Queries) error {
var err error
full, err = queries.GetFullURL(ctx, short)
return err
})
}
func (s *Service) CreateShortURL(ctx context.Context, full string) (string, error) {
var short string
return short, s.db.Transaction(ctx, func(queries *datastore.Queries) error {
if err := queries.UseQuota(ctx); err != nil {
return fmt.Errorf(“quota exhausted: %w”, err)
}
// Make sure I don’t shadow the short variable.
short = s.generateShort()
// Calling s.db.queries would have resulted in a compiler error.
err := queries.StoreShortURL(ctx, datastore.StoreShortURLParams{
Short: short,
Full: full,
})
if err != nil {
return “”, fmt.Errorf(“error storing short url: %w”, err)
}
return nil
})
}
We now have a semantically significant API that prevents us from having unwanted transaction behaviors. The trade-off is that every call that involves the storage API must be wrapped in a function parameter.
Appendix: sqlite and repo example
I created a repo with a storage API. It is actually an echo golang service template that I use for all of my projects. It uses sqlc and SQLite and out of the box you get user authentication. You need to define a .env file with the correct environment flags (see config/config.go). You can install air which will run it in and live reload if you make changes.. Or, just call run.sh and it will start the service.
It has, however, a different implementation of the api from attempt #3: it uses generics so that I can use datatstore.DB
with different *Queries
types and instead of Transaction
it has Read
and Write
methods. The reason for that would
be a post of its own but in short, sqlite only allows a single writer at a time and by creating separate connections for the
readers and writers, I avoid dealing with user-level errors that come from having concurrent writes. Otherwise, the idea
is exactly the same.