Neues vom PostgreSQL Planet
As I started writing this post, I realized that to properly show how to evaluate, clean, and transform data in the database (also known as data munging), I needed to focus on each step individually. This blog post will show you exactly how to use TimescaleDB and PostgreSQL to perform your data evaluation tasks that you may have previously done in Excel, R, or Python.
This article was a bit harder to write than it would seem on the surface. Amid all the hustle and bustle of the current development cycle, it was hard to find a particular focus of what this effort was all about. Then it finally dawned on me. There wasn’t one.
That may sound a bit harsh for the developers working on it, but please let me explain. PostgreSQL is caught in the middle of a maelstrom. As it continually tries to grow up, the meaning of being a grown up keeps changing.
Today saw the launch of PostgreSQL14. For me, the important area to focus on is what the community has done to improve performance for heavy transactional workloads and additional support for distributed data.
The amount of data that enterprises have to process continues to grow exponentially. Being able to scale up and out to support these workloads on PostgreSQL 14 through better handling of concurrent connections and additional features for query parallelism makes a lot of sense for performance, while the expansion of logical replication will also help.
© Laurenz Albe 2021
Most database tables have an artificial numeric primary key, and that number is usually generated automatically using a sequence. I wrote about auto-generated primary keys in some detail in a previous article. Occasionally, gaps in these primary key sequences can occur – which might come as a surprise to you.
Installing PostgreSQL via GNU Guix.
It the the time for everybody using PostgreSQL 9.6 to start planning an upgrade to the latest supported PostgreSQL version. PostgreSQL Global Development Group (PGDG) supports any major version for 5 years after its initial release. PostgreSQL 9.6 with its initial release on September 29, 2016 is about to become an unsupported version. Its final minor version will be released on November 11, 2021, upon which no security fixes, bug fixes or patches are supported by PGDG.
Nowadays it is becoming more frequent for systems and applications to require querying data from outside the main database. PostgreSQL supports querying external postgres data using two core extensions dblink and postgres-fdw , the last one is a Foreign Data Wrapper (FDW), that is an implementation of SQL/MED standard, which is part of ANSI SQL 2003 standard specification.
Fast load into a database table is a feature we need on any database. Datawarehouses use it daily. OLTP requires it regularly. And probably from the beginning of their existence to migrate data to it. With Python,the psycopg2 client is commonly used and I'll start there (this is the first post in a series). The main motivation is that psycopg2 doesn't have prepared statements, and parsing each INSERT, even with a list of rows, is not efficient for loading million of rows. But it has a nice alternative as it can call the COPY command.
I have seen a people comparing YugabyteDB and PostgreSQL, and surprised by the different throughput when running a simple test on a from a single session. The purpose of a distributed database is to scale out. When running on a single node without the need for High-Availability-without-data-loss (this is a tautology), a monolith database will always perform with lower latency. Because a distributed DB is designed to ensure the persistence (the D in ACID) though RPC (remote procedure calls) rather than local writes.
Here is a simple workload:
In the last installment of the pipeline blog series, we explored writing streaming JSON data into PostgreSQL using Kafka Connect. In this blog, it’s time to find out if our fears of encountering mutant monsters in the gloom of an abandoned uranium mine were warranted or not.
Cover photo: Emily Morter
If you spend any amount of time working with Postgres, it’s likely that at some point you’re going to wonder why it isn’t using an index that you think it should be.
Unlike some databases, you can’t force PostgreSQL to use a specific index, but there are several things you can do to work out what’s going on.
© Laurenz Albe 2021
In this article, I will try to point out good and bad uses of JSON in PostgreSQL, and provide you with guidelines that you can follow.
How hard could it be to reset a sequence?Restarting a sequence: how hard could it be? (PostgreSQL and Oracle)
One reason I like PostgreSQL so much is that it makes me feel at home: it has a very consistent and coherent interface to its objects. An example of this, is the management of sequences: ALTER SEQUENCE allows you to modify pretty much every detail about a sequence, in particular to restart it from its initial value.
Let’s see this in action: