News from Planet PostgreSQL
Nils Dijk: Diary of an Engineer: Delivering 45x faster percentiles using Postgres, Citus, & t-digest
When working on the internals of Citus, an open source extension to Postgres that transforms Postgres into a distributed database, we often get to talk with customers who have interesting challenges you won't find everywhere. Just a few months back, I encountered an analytics workload that was a really good fit for Citus.
But we had one problem: the percentile calculations on their data (over 300 TB) could not meet their SLA of 30 seconds.
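The gain the title refers to comes from replacing the exact ordered-set aggregate with the approximate t-digest one, which can be computed in parallel across Citus worker nodes and merged on the coordinator. A minimal sketch, assuming a hypothetical measurements(response_time) table and Tomas Vondra's open source tdigest extension:

```sql
-- Exact percentile: percentile_cont must sort the entire input.
SELECT percentile_cont(0.99) WITHIN GROUP (ORDER BY response_time)
FROM measurements;

-- Approximate percentile with t-digest: each shard can build a small
-- digest of its rows, and the digests are merged afterwards.
CREATE EXTENSION IF NOT EXISTS tdigest;

-- The second argument is the compression parameter: larger values give
-- more accuracy at the cost of a bigger digest.
SELECT tdigest_percentile(response_time, 100, 0.99)
FROM measurements;
```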
I recently wrote a presentation, Postgres and the Artificial Intelligence Landscape, which covers the basics of artificial intelligence and shows how Postgres can be used for this purpose. This week I presented it at the Chicago PostgreSQL Meetup Group, so I am now publishing the slides.
I recently gave a talk about the Citus extension to Postgres at the Warsaw PostgreSQL Users Group. Unfortunately, I did not get to go in person to beautiful Warsaw, but it was still a nice way to interact with the global Postgres community and talk about what Citus is, how it works, and what it can do for you.
Postgres has made great strides in adding features to match proprietary databases, and it has many complex features that other databases don't have. However, that doesn't mean it is the best fit for every organization. There are still reasons not to use Postgres:
HOT updates are not a new feature. They were introduced by commit 282d2a03dd in 2007 and first appeared in PostgreSQL 8.3.
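Whether updates actually take the HOT path can be observed in pg_stat_user_tables. A quick sketch (the table name is made up for illustration):

```sql
-- A lowered fillfactor leaves free space on each page, which HOT needs
-- to place the new tuple version on the same page as the old one.
CREATE TABLE hot_demo (id integer PRIMARY KEY, payload text)
WITH (fillfactor = 70);

INSERT INTO hot_demo
SELECT g, 'row ' || g FROM generate_series(1, 10000) g;

-- Updating only columns that are not part of any index allows HOT.
UPDATE hot_demo SET payload = payload || '!';

-- n_tup_hot_upd counts the updates that took the HOT path.
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'hot_demo';
```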
PostgreSQL creates catalog caches and other caches for each connection, storing queried data locally to speed up queries. If one process performs a DDL operation that changes the catalog, the other running processes must be notified that some of their cached data is now invalid. This notification mechanism is called the SI (Shared Invalidation) mechanism.
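The internal message passing is not directly visible from SQL, but its effect can be illustrated with two sessions and a prepared statement (the object names below are made up):

```sql
-- Session A:
CREATE TABLE si_demo (id integer);

-- Session B: preparing the statement caches catalog data about si_demo
-- and a plan built from it.
PREPARE p AS SELECT id FROM si_demo;
EXECUTE p;

-- Session A: DDL changes the catalog and queues an invalidation message.
CREATE INDEX si_demo_id_idx ON si_demo (id);

-- Session B: before the next execution, the backend processes the pending
-- invalidation, discards the stale cache entries, and replans the
-- statement, which may now use the new index.
EXECUTE p;
```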
Database applications are initially written using the simplest queries possible. During testing and in production, some application tasks might have unacceptable performance. This is where re-architecting happens, and where simple queries and schema layouts can become complex. They might become complex because that is what the task requires, or because of limitations in how the database software handles certain queries. Database and tooling upgrades can require further complex additions.
I tested Fedora 33 on my old Lenovo T510 (upgraded to 16 GB RAM). It looks pretty good. Interactive processes stay quick even under high load (which was sometimes a problem in older releases). It is very nice and comfortable to use.
Some time ago I stumbled on this interesting question on Stack Overflow (the one stated in the title). I could not find a satisfactory answer at the time, so I came up with this solution. I think the problem is interesting enough to turn into a blog post.
The following table shows the expected results. The x column contains the original data, whereas mdn_x contains the median computed over the current row and up to 3 preceding rows.
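Since PostgreSQL does not allow ordered-set aggregates such as percentile_cont to be used as window functions, one straightforward workaround is a correlated subquery over the desired frame. A sketch, assuming a hypothetical table tbl with a gap-free ordering column id and a value column x (the post's actual solution may differ):

```sql
-- For each row, compute the median of that row and up to 3 preceding
-- rows (by id) with an explicitly correlated frame.
SELECT t.x,
       (SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY t2.x)
        FROM tbl t2
        WHERE t2.id BETWEEN t.id - 3 AND t.id) AS mdn_x
FROM tbl t
ORDER BY t.id;
```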