I wrote about Json support in Postgres recently, following my talk on the subject at Symfony Con Vienna. This follow-up article looks into the Json support of the PHP library Doctrine ORM.
You can find some sample code in my repository at github.com/dbu/postgres-json.
Doctrine
The Doctrine Object-Relational Mapper (ORM) is a library to use relational databases with model classes in PHP applications. It is the default database abstraction layer of the Symfony framework.
The model classes that map to database tables are by convention called Entities.
Declaring JSON Structures in Entities
In Doctrine, there is only one Json type. As I explained in my Postgres article, you should prefer JSONB over JSON with Postgres. To make Doctrine generate JSONB columns, set the jsonb option to true in the mapping configuration:
/**
* Json data stored in a postgres JSON column.
*/
#[ORM\Column(type: Types::JSON)]
private array $item;

/**
* Json data stored in a postgres JSONB column.
*/
#[ORM\Column(type: Types::JSON, options: ['jsonb' => true])]
private array $item;
Indexes on Json columns cannot be defined in Doctrine mapping, because there is no way to specify custom SQL for indexes. Instead, you can add a migration that creates the index with an SQL statement you write yourself. If we have a table called auction_json_indexed with a JSON(B) column item, we can create an index on the author key of the Json object like this:
CREATE INDEX json_author
ON auction_json_indexed ((item->>'author'));
Keep in mind that table and column names in the migration file are the names used in the database, not the PHP class names and property names.
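In a Symfony application using Doctrine Migrations, such an index can be created in a hand-written migration. A minimal sketch, assuming the doctrine/migrations conventions; the class name is a placeholder and the table and index names match the example above:

```php
<?php

declare(strict_types=1);

namespace DoctrineMigrations;

use Doctrine\DBAL\Schema\Schema;
use Doctrine\Migrations\AbstractMigration;

final class Version20240101120000 extends AbstractMigration
{
    public function up(Schema $schema): void
    {
        // Expression index on the "author" key of the Json column
        $this->addSql("CREATE INDEX json_author ON auction_json_indexed ((item->>'author'))");
    }

    public function down(Schema $schema): void
    {
        $this->addSql('DROP INDEX json_author');
    }
}
```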
Generated Columns
As discussed in the previous post, we can extract a value from a Json object into its own generated column to allow Postgres to work efficiently with that value.
It is possible to declare generated columns in the entity mapping:
#[ORM\Column(
length: 255,
nullable: true,
insertable: false,
updatable: false,
columnDefinition: "VARCHAR(255) generated always as (item->>'author') stored",
generated: "ALWAYS"
)]
private ?string $author = null;
The columnDefinition is a native SQL statement, not a Doctrine Query Language (DQL) statement. Therefore we need to use the column name of the database, not the property name of the PHP class.

We need to set insertable and updatable to false to prevent Doctrine from trying to write directly to the field. An SQL statement that tries to write to a generated field leads to an error.
Querying Json with Doctrine
DQL does not support the SQL Json syntax out of the box.
Doctrine ORM builds on top of the Doctrine Database Abstraction Layer (DBAL). The DBAL accepts native SQL queries that are forwarded to the database without Doctrine parsing them. We can use those to run Json queries. We can tell Doctrine to hydrate the results of a native query to entities.
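For example, a native query can be hydrated to entities with a ResultSetMappingBuilder. A minimal sketch, assuming an App\Entity\Auction entity mapped to the auction_json table (the names are assumptions; adapt them to your application):

```php
use Doctrine\ORM\Query\ResultSetMappingBuilder;

$em = $this->getEntityManager();

// Map all columns of the entity to the SQL alias "a"
$rsm = new ResultSetMappingBuilder($em);
$rsm->addRootEntityFromClassMetadata(\App\Entity\Auction::class, 'a');

// Native Postgres Json syntax, forwarded to the database as-is
$sql = 'SELECT ' . $rsm->generateSelectClause() . "
    FROM auction_json a
    WHERE a.item->>'author' = :author";

// Returns an array of hydrated Auction entities
$auctions = $em->createNativeQuery($sql, $rsm)
    ->setParameter('author', $authorName)
    ->getResult();
```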
Sometimes, DQL and especially the query builder are really useful tools. Luckily, there is scienta/doctrine-json-functions, which provides most of the Json functions for Postgres, MySQL and MariaDB. Once added to an application, we need to register the functions we want to use:
# config/packages/doctrine.yaml
doctrine:
orm:
dql:
string_functions:
JSON_GET_TEXT: Scienta\DoctrineJsonFunctions\Query\AST\Functions\Postgresql\JsonGetText
JSONB_CONTAINS: Scienta\DoctrineJsonFunctions\Query\AST\Functions\Postgresql\JsonbContains
DQL is not extensible to the point that an extension could add support for the ->> syntax. The scienta extension provides the functionality as a function JSON_GET_TEXT, which is converted to ->> when translating to SQL.
$qb = $this->createQueryBuilder('a');
$qb->select('COUNT(a)')
// ->> operator as DQL function
// Need to qualify the source of the item with a.
->where("JSON_GET_TEXT(a.item, 'author') = :author")
->setParameter('author', $authorName)
;
return $qb->getQuery()->getSingleScalarResult();
JSONB_CONTAINS is the function for doing a @> query. Doctrine does not know about boolean functions. The workaround is to register JSONB_CONTAINS as a string function. Doctrine expects a comparison when using this function in a predicate; the trick is to compare the result with = true to make the parser happy.

For the contains function, I unfortunately could not figure out how to pass the criteria as a parameter, so I had to resort to string manipulation. If there is a better way, I would be glad to learn it:
// @> as DQL function
$qb = $this->createQueryBuilder('a');
$connection = $this->getEntityManager()->getConnection();
// If you have direct user input for the $authorName, please validate whether this code really is safe, to prevent SQL injections
$author = str_replace('"', '\"', $authorName);
$author = trim($connection->quote($author), "'");
$qb->select('COUNT(a)')
->where("JSONB_CONTAINS(a.item, '{\"author\": \"$author\"}') = true")
;
return $qb->getQuery()->getSingleScalarResult();
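When the query does not need DQL, one way to avoid the string manipulation is to drop down to a native DBAL query, where the Json criteria can be bound as a regular parameter. A sketch, assuming the same auction_json table as above:

```php
$connection = $this->getEntityManager()->getConnection();

// Build the containment criteria in PHP and bind it as one parameter;
// the explicit CAST turns the bound string into jsonb for the @> operator.
$criteria = json_encode(['author' => $authorName], JSON_THROW_ON_ERROR);

$count = $connection->fetchOne(
    'SELECT COUNT(*) FROM auction_json WHERE item @> CAST(:criteria AS jsonb)',
    ['criteria' => $criteria]
);
```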
Measuring the Doctrine/PHP Overhead
Let's compare inserting many rows. Query times in milliseconds:
Rows | JSONB | DBAL | DBAL, batch 1k | Entities
---|---|---|---|---
10k | 60 | 4'519 | 163 | 6'290
100k | 550 | 49'964 | 1'587 | 53'823
The JSONB column measures a dummy script generating data. The realistic scenario would be a database dump that contains the data.
Using a prepared query to insert single rows is quite inefficient. Performance gets a lot better when building the query so that it inserts many rows at the same time. To batch insert, we generate a query with lots of parameters, and bind the values to those parameters. If you want to do this, have a look at the FixturesGenerateDbalCommand in my demo repository.
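The batched insert can be sketched like this, as a simplified version of that command (table and column names follow the earlier examples, $rows is an assumed iterable of Json-serializable items):

```php
$connection = $this->getEntityManager()->getConnection();

$batchSize = 1000;
$placeholders = [];
$parameters = [];

foreach ($rows as $item) {
    $placeholders[] = '(?)';
    $parameters[] = json_encode($item, JSON_THROW_ON_ERROR);

    if (count($placeholders) === $batchSize) {
        // One INSERT statement with 1000 value tuples
        // instead of 1000 separate statements
        $connection->executeStatement(
            'INSERT INTO auction_json (item) VALUES ' . implode(', ', $placeholders),
            $parameters
        );
        $placeholders = [];
        $parameters = [];
    }
}

// Insert the remaining rows of an incomplete last batch
if ($placeholders) {
    $connection->executeStatement(
        'INSERT INTO auction_json (item) VALUES ' . implode(', ', $placeholders),
        $parameters
    );
}
```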
The last option is to create Entity instances and persist them with the Doctrine Entity Manager. This allows the application to fully work with objects. The performance above is measured with flushing and clearing for every 5 entities. Inserting entities becomes even slower when accumulating more objects before flushing.
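The entity-based variant with periodic flushing and clearing can be sketched like this (the Auction entity and its setter are assumptions based on the mapping examples above):

```php
$em = $this->getEntityManager();

$batchSize = 5;
foreach ($rows as $i => $item) {
    $auction = new Auction();
    $auction->setItem($item);
    $em->persist($auction);

    if (($i + 1) % $batchSize === 0) {
        $em->flush(); // write the pending entities to the database
        $em->clear(); // detach them so the unit of work stays small
    }
}
$em->flush(); // write any remaining entities
```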
In everyday operations, the entities add a layer of convenience and stability to work with your data. You can for example use events or validation frameworks. While there is some overhead, that cost is still small in comparison to web request overheads.
If you have to import large batches of data, however, the overhead adds up. For such imports, it can be worthwhile to investigate shortcuts that avoid creating and persisting entities. With prepared statements that insert a whole batch of rows at once, this approach is viable.
Conclusions
- You can use Json in Postgres and still profit from Doctrine.
- There are some workarounds and things that are not as elegant.
- The overhead when inserting lots of data is significant. Batch DBAL prepared statements are the most efficient tool available.
Have a look at the experiments and sample code in my repository github.com/dbu/postgres-json.