Logo
Englika

I will never use an ORM again

I will never use an ORM again

At first, it seemed to me that ORM speeds up the development process by allowing you to work with the database as entities. Simple queries become faster to write, and complex ones can always be written manually (e.g. manager.query('SELECT * FROM users') in TypeORM). However, over time, problems which didn't seem relevant at the start, began to appear.

Some time ago I decided to use ORM (in particular TypeORM), because the following advantages seemed relevant, but after a while that proved incorrert.

Working with the database using entities

It may seem that designing a database using ORM is faster and easier. You create entities with references between them and work with them without even understanding which database schema was created and how each SQL query is generated.

After a while, it turns out that:

  • Many features of a particular database are not supported by ORM (for example, JOIN LATERAL in TypeORM). ORM supports many databases and its features are usually limited to those that exist in each of them.
  • If you cannot create a specific column in a table using ORM, then you will have to do it using migrations. At the same time, you should disable table synchronization so that the column created by migration is not deleted. From now on, any changes to this table will have to be made via migrations (even in the development environment).
  • SQL queries are not always generated as you expected and they start to run slowly as the data in the database grows. There is also a risk that the same query written using ORM will be one today and another tomorrow (after the update).
  • Some entity properties, by default, may use an incorrect type in the database (e.g. TypeORM stores Date as timestamp, not timestamptz). As a result, you double-check how each type is actually stored in the database or always specify it explicitly.

When you write SQL queries manually, there are no such problems, and queries are always executed exactly as they are written.

Creating indexes

Have you ever tried creating indexes using ORM? It's fun!

Usually, ORM supports creating only simple indexes. For example, I can't create an index in TypeORM to search thourgh text using trigrams. Such an index should be created using migrations.

CREATE INDEX companies_name_idx ON companies USING gin (name gin_trgm_ops)

The reason is clear: since ORM supports many databases, it is problematic to implement support for indexes that are unique for each database.

To prevent ORM from deleting an index created manually by migrations during synchronization, you also need to add a command that will inform ORM about the existence of this index and that it can not touch it. For example, this is how it looks in TypeORM:

@Index('companies_name_idx', { synchronize: false })
class Company extends BaseEntity {
}

As a result, to create an index that can be quickly written in raw SQL in 1 line, you should:

  • Check whether this type of index is supported by ORM. If so, create it using the ORM.
  • If the index is not supported, then add a new migration with SQL query that creates the index.
  • Add a command that tells ORM to ignore the index during synchronization. Otherwise, the index will be deleted.

It would be wrong to say this does not occur often. Probably the most popular index I use after b-tree is gin with trigrams for text search to find the desired rows in a table by substring using ILIKE. I wrote more about comparing indexes for text search here.

I have just opened my project, which I'm currently working on, and I see that about half of the indexes are crated via ORM, and another half via migrations.

Creating indexes in migrations is inconvenient. In this case, all indexes are scattered across different migrations. If you want to see which indexes exist in a particular table, you should either go into the database and list all indexes in the table (\d <table_name> in PostgreSQL), or search for them by prefix throughout the project (only if all indexes created in a particular table have the same prefix). If indexes in the same table can have different prefixes (e.g. GitLab names indexes that way), then you can view the list of all indexes only directly through the database. Very convenient, isn't it?

On the contrary, using declarative schema management, creating a new index is simply adding 1 row with an SQL query to the schema file.

Faster to write queries

This applies not only to the ORM, which usually has a built-in query builder, but also to any separate query builder (e.g. Knex).

In the beginning, it seems that it is more convenient to make almost all queries using query builder, rather than writing them in raw SQL. Sometimes, it really is more convenient, but definitely not faster.

This is how the simplest query will look like:

// TypeORM
manager.getRepository(Task).findOneBy({ id: 1 });

// Knex
knex('tasks').where('id', 1).select().first();

// Slonik
pool.maybeOne(sql.unsafe`SELECT * FROM tasks WHERE id = ${id}`);

// Raw SQL
client.query('SELECT * FROM tasks WHERE id = $1', [1]);

In this case, there is no difference.

Let's look at another example with multiple conditions.

// TypeORM
const query = ctx.manager
  .getRepository(Task)
  .createQueryBuilder()
  .where('completed');

if (id) query.andWhere(`id > :id`, { id });
if (searchTerm) query.andWhere(`name ILIKE :name`, { name: `%${searchTerm}%` });

const tasks = await query
  .orderBy('id', 'ASC')
  .limit(100)
  .getMany();
// Slonik
const tasks = await pool.any(sql.unsafe`
  SELECT * FROM tasks
  WHERE 
    completed AND
    ${id ? sql.fragment`id > ${id}` : true} AND
    ${searchTerm ? sql.fragment`name ILIKE ${`%${searchTerm}%`}` : true}
  ORDER BY id ASC
  LIMIT 100
`);

The second code with a raw SQL is definitely more readable.

When a query becomes a little more complicated, first, you need to figure out how to compose it using the query builder, and second, make sure that it is generated correctly.

If some query is generated incorrectly, then you can always write it manually, right? But tell me, at what point do you realize that the ORM generated your query incorrectly? It is unlikely that after composing each query, you turn on the logs and check how it is generated. Most likely, this happens already at the moment when some query began to be executed slowly. Of course, you will not immediately find out about it.

Knowing that queries are not always generated the way you would like, you should double-check them every time. So, the process of composing queries begins to boil down to the following 2 steps:

  • Figure out how to make an SQL query so that it executes quickly. Make queries to create the necessary indexes to speed up its execution.
  • Figure out how to construct the query using ORM so that it generates the most similar SQL query. Add indexes using ORM.

It would seem that ORM and query builders should speed up the development process, but they add extra work (a second step is added). In addition, the ORM may even start generating a query in a different way after a while (after the update).

Protection against SQL injections

Another advantage of ORM and query builders is that they protect you from SQL injections, and you don't even have to think about it. Or is it necessary?

Let's assume that we are faced with a situation where the query is generated by ORM suboptimally, and we decided to write this query in raw SQL. Let's use the manager.query method built into TypeORM.

// TypeORM
const input = '0; DROP TABLE tasks'
manager.query(`SELECT * FROM tasks WHERE id = ${input}`)

Congratulations, someone deleted your tasks table, despite the fact that you used ORM (same with query builders). To avoid this, you need to use a parameterized query, i.e. specify the parameters separately, and not insert them into the query itself. Read more here.

The query should be rewritten as follows:

// TypeORM
manager.query('SELECT * FROM tasks WHERE id = $1', [input])

In this case, ORM or query builder doesn't give you any advantage, because you can make parameterized queries without them.

// pg
client.query('SELECT * FROM tasks WHERE id = $1', [input])

SQL injections can occur not only in parameters, but also in the names of tables or columns if they are inserted into the query dynamically. In this case, you need to escape them.

As a result, firstly, if someone uses an ORM or query builder, this does not mean that he is protected from SQL injections, and secondly, this is not an advantage, because you can use parameterized queries without them.

Built-in migrations

While working in the development environment, everything is very convenient: if you add a new property to an entity, a new column in the table will be added automatically. However, in a production, someday you may lose part of your data, because the synchronization will be performed completely differently from how you planned it.

For example, if you change the property type from timestamp (in TypeORM it is used by default for the Date type) to timestamptz, the synchronization tool will execute the following SQL queries:

ALTER TABLE "tasks" DROP COLUMN "created_at";
ALTER TABLE "tasks" ADD "created_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now();

At first, the column will be deleted (the data will be lost), and then a new one will be created with the timestamptz type. That's why, it is extremely important to disable synchronization in production and use migrations to change the database schema.

In fact, this problem would not exist if the synchronization tool built into an ORM, instead of executing queries silently, would allow it to be called explicitly and by default it only output them, but didn't execute them. In this case, it would be possible to set up the following process of changing the schema in production:

  • Generate SQL queries that will update the database schema to the desired state. Do not execute queries.
  • Create a pull request. You should review these queries.
  • After approval, queries are executed and the database schema is updated.

As a result, you have to use migrations, but they are extremely inconvenient for changing the database schema. I wrote about the disadvantages of migrations here.

Some say that one of the advantages of ORM is that it allows you to observe the principle of DRY (Don't Repeat Yourself) in the sense that you don't need to write the database schema twice (in SQL and classes to which tables are mapped). However, this is only relevant until you have made the first release. After that, all schema changes must also be done twice: in migrations and classes.

On the contrary, declarative schema management allows you to:

  • Automatically synchronize the database schema (all changes go through the code review).
  • Store the entire schema in one file (or one file - one table for convenience).

It's easier to change the database

ORM or query builder allows you to migrate to another database with less effort, because they support many different databases. For example, you can easily migrate from MySQL to PostgreSQL and back, but on one condition: if you do not use the specifics of a particular database, which is almost impossible to do. At a minimum, you will use indexes that will be specific to a particular database, and at most you will use queries written in raw SQL using the features of the selected database.

If you really want to support multiple databases at once, then when composing each query, you need to check how it will be executed on each of the databases. Naturally, no one does this in advance, because it's just pointless and takes a lot of time, not to mention the fact that you need to have expertise in several databases at once. If there are advantages of a certain database, you will have to give them up in favor of preserving the possibility of switching to another database in some future.

Usually, when you really need to use several databases, you need to store one part of the data in one database (for example, in PostgreSQL), and the other part in another (for example, in ClickHouse). If the ORM or query builder supports both databases, then you can use one tool. This is the only advantage.

One person said about it here:

"Database-agnostic" is a pervasive and pernicious lie.
It almost always means "lowest common denominator amongst 
all supported databases".

Some databases are so far behind, they really shouldn't be 
abstracted by the same library. They have VERY different use
cases and abilities. The access model for a SQLite-based app
is quite different from a Postgres-based app.

It's like having an F-150, a Cybertruck, and a Prius while
hiring a driver that won't take either off-road or for more
than 100 miles at a time because he also has to be able to
drive a Nissan Leaf the exact same way.

But folks still hire him because he claims to handle anything
with a steering wheel and pedals. Technically true, but misses
the point of the different options.

That's an ORM.

ORM is simpler than SQL

There are 2 options here:

  1. You don't know SQL. In this case, most likely, ORM is just a black box for you. You have described the type of data that should be stored and call certain methods in order to perform CRUD operations. If something is running slowly or there is an error when executing an SQL query, you do not understand why this is happening and how to fix it. When you will use a different ORM, it will seem to you that you are working with a different database, because each ORM has its own set of methods. Learn SQL.
  2. You know SQL, but you have no idea how to set everything up so that it is convenient to work with the database. More info below.

Considering all the above problems, we can say that ORM gives advantages only at the very beginning (if you don't test queries), and in the long term it gives a lot of disadvantages:

  • You have to double-check how each query is generated.
  • A lot of extra work is added when creating indexes that are not supported by ORM (of which sometimes about half).
  • Inconvenient synchronization of the schema in production.
  • The use of features that are specific to a particular database is possible only through raw SQL queries. In this case, there is a risk of catching SQL injection, if you don't think about it.

ORM is just not for all projects

Understanding the disadvantages of ORM, someone will say that it is simply not for all projects. And here we come to the question: in which projects is it better to use ORM?

I've heard the following options:

  • ORM for simple projects.
  • ORM for projects with a large number of CRUD operations.
  • For MVP (Minimum Viable Product).
  • For small teams (< 30 people).

Imagine that you have created the MVP of your project and various companies have started using it. They tell you that you urgently need to add certain features, without which they cannot live. Are you really going to tell them to wait and you'll start rewriting almost all the API, because now you need to replace ORM with raw SQL? Of course, it won't be like that.

The project will become more complicated, the data will grow, it will be more and more difficult to leave ORM, problems will accumulate until at some point you get tired of the fact that a large number of queries are executed slowly and as a result you have to do double work: write an SQL query, and then figure out how to construct it using the query builder so, the final SQL query is generated as similar as possible to what you want.

It would make sense to use ORM if development would be accelerated several times, but this is not the case. Moreover, over time, everything happens just the opposite.

Switching to raw SQL

As soon as I made the decision to abandon ORM, a lot of questions immediately arose:

  • How can I synchronize the database schema in different environments?
  • How to perform migrations?
  • How do I add type support in SQL query results?

To answer these questions for myself, I decided to create a separate simple project and configure everything within it. This project is an API using TypeScript, Node.js, express, GraphQL with 5 operations (task, tasks, createTask, updateTask, deleteTask) and unit tests that check interaction with the database. It has 3 branches: typeorm, knex (query builder) and slonik (raw SQL).

Slonik has been selected as a helper for composing raw SQL queries. Someone may say that this is also a query builder, but using template literals, however, it is rather a set of utilities that allows you to safely execute raw SQL queries.

To manage the database schema in a declarative way, I have developed my own dsm tool that uses migra under the hood, because atlas does not support extensions/views/triggers/functions.

Synchronization of the database schema

The schema can be synchronized in 2 ways: either use migrations, which is extremely inconvenient for a number of reasons, or use a declarative schema management tool so that the schema is synchronized automatically.

In the development environment, synchronization is performed by the yarn sync command as follows:

  • The declarative schema management tool is launched.
  • The difference between the desired schema located in schema.sql and the current schema in the database is calculated. SQL queries are generated that transform the database schema into the desired state.
  • Generated SQL queries are output for manual confirmation.
  • After approval, queries are executed, converting the database schema to the desired one.
-- Planned Changes:
-- Modify "tasks" table
ALTER TABLE "public"."tasks" ADD COLUMN "description" text NOT NULL;
Use the arrow keys to navigate: ↓ ↑ → ← 
? Are you sure?: 
  ▸ Apply
    Abort

In the test environment, synchronization is also performed, only confirmation is not required. Even if there is a DROP instruction, nothing will happen, because there is no data in the tables.

In the production environment, synchronization should be performed as follows.

Migrations

Perform migrations only for DML (data manipulation). For DDL, use declarative schema management (see above). The following commands are available:

  • yarn m:new <migration_name> – creates a new migration with the specified name (use snake_case).
  • yarn m:apply – applies new migrations.

Typing

If you do not add typing, then the results of SQL queries in TypeScript will be of type any. You can specify the types in one of the following ways:

  1. Make type generation. The script should crawl all ts files in the project, find SQL queries, get the current schema from the database, determine the columns used in SELECT/RETURNING and generate the necessary types (similar to relay-compiler). Ready-made libraries: pgTyped, zapatos, @slonik/typegen.
  2. Create types manually. You can use zod (or alternatives) so that, in addition to interfaces, it is also possible to check the type of SQL query responses in runtime (TypeORM, by the way, does not do this). The responses of SQL queries in runtime may change if someone modifies the database schema manually (for example, makes a hot fix).

I decided to use the second option so that I could check the form of SQL query responses in runtime.

Articles and discussions on this topic

Related posts

How to get N rows per group in SQL

Let's assume that you are developing the home page in an online store. This page should display product categories with 10 products in each. How do you make a query to the database? What indexes do you need to create to speed up query execution?

How to get N rows per group in SQL