ClickHouse vs TimescaleDB
Last updated: Nov 2, 2021
Special thanks to Ilya for extending this blog by adding the remaining ClickHouse equivalents of the queries (window functions, joins, and geo queries) from the TimescaleDB docs. Your feedback and help are much appreciated.
Recently, TimescaleDB published a blog post comparing ClickHouse and TimescaleDB using timescale/tsbs, a time-series benchmarking framework. I have some experience with PostgreSQL and ClickHouse but never got the chance to play with TimescaleDB. Some of the claims made about TimescaleDB in their post are quite bold, which made me even more curious. It seemed like a great opportunity to try it out and see whether those claims hold up.
To make sure that I was doing an apples-to-apples comparison and that the results wouldn’t be biased by my lack of knowledge of one particular system, I decided to pick a dataset that both TimescaleDB and ClickHouse have documented on their websites: the NYC Taxi dataset.
The TimescaleDB tutorial analyzes a subset of the NYC taxi dataset (~11M rows: one month), so I downloaded that subset from them and ran the queries mentioned there. Let’s see how ClickHouse and TimescaleDB perform.
To make sure I captured representative results, I ran each query mentioned in this blog multiple times and picked the fastest run.
Following are the specs of the physical machine used for the benchmarks.
$ rg processor /proc/cpuinfo | wc -l
32
$ free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi       2.9Gi        99Gi       5.9Gi        23Gi       114Gi
Swap:             0B          0B          0B
To make the setup easily reproducible, I ran both databases in Docker containers.
TimescaleDB
$ docker run -d -e POSTGRES_PASSWORD=timescaledb timescale/timescaledb:2.4.0-pg13
As mentioned in their blog, I used the timescaledb-tune utility to tune the configuration, and triggered a config reload to make sure the updated config was picked up.
# timescaledb-tune
Using postgresql.conf at this path:
/var/lib/postgresql/data/postgresql.conf
Is this correct? [(y)es/(n)o]: y
Writing backup to:
/tmp/timescaledb_tune.backup202110290606
success: shared_preload_libraries is set correctly
Tune memory/parallelism/WAL and other settings? [(y)es/(n)o]: y
Recommendations based on 125.79 GB of available memory and 32 CPUs for PostgreSQL 13
Memory settings recommendations
success: memory settings are already tuned
Parallelism settings recommendations
success: parallelism settings are already tuned
WAL settings recommendations
success: WAL settings are already tuned
Miscellaneous settings recommendations
success: miscellaneous settings are already tuned
Saving changes to: /var/lib/postgresql/data/postgresql.conf
# psql -U postgres
psql (13.3)
Type "help" for help.
postgres=# SELECT pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
postgres=# select version();
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 13.3 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20210424) 10.3.1 20210424, 64-bit
(1 row)
ClickHouse
$ docker run -d --ulimit nofile=262144:262144 yandex/clickhouse-server:21.8
SELECT version()
┌─version()──┐
│ 21.8.10.19 │
└────────────┘
TimescaleDB
To verify that the timescaledb extension is installed:
postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
-------------+---------+------------+-------------------------------------------------------------------
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
timescaledb | 2.4.0 | public | Enables scalable inserts and complex queries for time-series data
(2 rows)
Table creation
postgres=# CREATE TABLE "rides"(
vendor_id TEXT,
pickup_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
dropoff_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
passenger_count NUMERIC,
trip_distance NUMERIC,
pickup_longitude NUMERIC,
pickup_latitude NUMERIC,
rate_code INTEGER,
dropoff_longitude NUMERIC,
dropoff_latitude NUMERIC,
payment_type INTEGER,
fare_amount NUMERIC,
extra NUMERIC,
mta_tax NUMERIC,
tip_amount NUMERIC,
tolls_amount NUMERIC,
improvement_surcharge NUMERIC,
total_amount NUMERIC
);
CREATE TABLE
postgres=# SELECT create_hypertable(
'rides',
'pickup_datetime',
'payment_type',
2,
create_default_indexes=>FALSE
);
create_hypertable
--------------------
(1,public,rides,t)
(1 row)
postgres=# CREATE TABLE IF NOT EXISTS "payment_types"(
payment_type INTEGER,
description TEXT
);
CREATE TABLE
postgres=# INSERT INTO payment_types(payment_type, description) VALUES
(1, 'credit card'),
(2, 'cash'),
(3, 'no charge'),
(4, 'dispute'),
(5, 'unknown'),
(6, 'voided trip');
INSERT 0 6
postgres=# CREATE TABLE IF NOT EXISTS "rates"(
rate_code INTEGER,
description TEXT
);
CREATE TABLE
postgres=# INSERT INTO rates(rate_code, description) VALUES
(1, 'standard rate'),
(2, 'JFK'),
(3, 'Newark'),
(4, 'Nassau or Westchester'),
(5, 'negotiated fare'),
(6, 'group ride');
INSERT 0 6
Index creation
postgres=# CREATE INDEX ON rides (vendor_id, pickup_datetime desc);
CREATE INDEX
postgres=# CREATE INDEX ON rides (pickup_datetime desc, vendor_id);
CREATE INDEX
postgres=# CREATE INDEX ON rides (rate_code, pickup_datetime DESC);
CREATE INDEX
postgres=# CREATE INDEX ON rides (passenger_count, pickup_datetime desc);
CREATE INDEX
ClickHouse
Table creation
CREATE TABLE rides
(
`vendor_id` String,
`pickup_datetime` DateTime,
`dropoff_datetime` Nullable(DateTime),
`passenger_count` Nullable(UInt8),
`trip_distance` Nullable(Float64),
`pickup_longitude` Nullable(Float64),
`pickup_latitude` Nullable(Float64),
`rate_code` Nullable(UInt8),
`dropoff_longitude` Nullable(Float64),
`dropoff_latitude` Nullable(Float64),
`payment_type` Nullable(String),
`fare_amount` Nullable(Float32),
`extra` Nullable(Float32),
`mta_tax` Nullable(Float32),
`tip_amount` Nullable(Float32),
`tolls_amount` Nullable(Float32),
`improvement_surcharge` Nullable(Float32),
`total_amount` Nullable(Float32)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(pickup_datetime)
ORDER BY pickup_datetime
SETTINGS index_granularity = 8192;
CREATE TABLE payment_types
(
`payment_type` UInt8,
`description` String
)
ENGINE = MergeTree
ORDER BY payment_type;
INSERT INTO payment_types(payment_type, description) VALUES
(1, 'credit card'),
(2, 'cash'),
(3, 'no charge'),
(4, 'dispute'),
(5, 'unknown'),
(6, 'voided trip');
CREATE TABLE rates
(
`rate_code` UInt8,
`description` String
)
ENGINE = MergeTree
ORDER BY rate_code;
INSERT INTO rates(rate_code, description) VALUES
(1, 'standard rate'),
(2, 'JFK'),
(3, 'Newark'),
(4, 'Nassau or Westchester'),
(5, 'negotiated fare'),
(6, 'group ride');
Let’s load the dataset. For TimescaleDB we first need to turn timing on; ClickHouse shows the query duration by default.
postgres=# \timing on
Timing is on.
TimescaleDB
postgres=# \COPY rides FROM nyc_data_rides.csv CSV;
COPY 10906860
Time: 220695.563 ms (03:40.696)
Since the Postgres COPY command is single-threaded, this was expected. I then tried timescaledb-parallel-copy, increasing the workers and batch size from the defaults.
# timescaledb-parallel-copy --db-name postgres --table rides --file nyc_data_rides.csv --workers 10 --batch-size 10000 --reporting-period 10s
at 10s, row rate 187998.43/sec (period), row rate 187998.43/sec (overall), 1.880000E+06 total rows
at 20s, row rate 201998.60/sec (period), row rate 194998.51/sec (overall), 3.900000E+06 total rows
at 30s, row rate 192001.22/sec (period), row rate 193999.42/sec (overall), 5.820000E+06 total rows
at 40s, row rate 181001.56/sec (period), row rate 190749.99/sec (overall), 7.630000E+06 total rows
at 50s, row rate 194998.37/sec (period), row rate 191599.67/sec (overall), 9.580000E+06 total rows
COPY 10906860
ClickHouse
# cat nyc_data_rides.csv | clickhouse-client --time --query="INSERT INTO rides FORMAT CSV"
6.702
Database | Mechanism | Duration (s) |
---|---|---|
TimescaleDB | COPY | 220 |
TimescaleDB | timescaledb-parallel-copy | 50 |
ClickHouse | INSERT | 7 |
TimescaleDB
# du -sh /var/lib/postgresql/data
4.2G /var/lib/postgresql/data
ClickHouse
# du -sh /var/lib/clickhouse
453M /var/lib/clickhouse
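The on-disk footprint can also be checked from inside ClickHouse via its system tables; a quick sketch (system.parts only covers MergeTree-family tables, and only active parts should be counted):
SELECT
    table,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active AND (table = 'rides')
GROUP BY table;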
Counting all the rides that happened in 1 month.
TimescaleDB
postgres=# select count(*) from rides;
count
----------
10906860
(1 row)
Time: 434.663 ms
ClickHouse
SELECT count(*)
FROM rides
Query id: f2ec17cc-bf45-42a9-8181-e77ba5097416
┌──count()─┐
│ 10906860 │
└──────────┘
1 rows in set. Elapsed: 0.003 sec.
Database | Query Duration (ms) |
---|---|
TimescaleDB | 435 |
ClickHouse | 3 |
Counting the number of rides on a daily basis.
TimescaleDB
postgres=# SELECT date_trunc('day', pickup_datetime)
AS day, COUNT(*)
FROM rides
GROUP BY day
ORDER BY day;
day | count
---------------------+--------
2016-01-01 00:00:00 | 345037
2016-01-02 00:00:00 | 312831
2016-01-03 00:00:00 | 302878
2016-01-04 00:00:00 | 316171
2016-01-05 00:00:00 | 343251
2016-01-06 00:00:00 | 348516
2016-01-07 00:00:00 | 364894
2016-01-08 00:00:00 | 392070
2016-01-09 00:00:00 | 405825
2016-01-10 00:00:00 | 351788
2016-01-11 00:00:00 | 342651
2016-01-12 00:00:00 | 367390
2016-01-13 00:00:00 | 395090
2016-01-14 00:00:00 | 396473
2016-01-15 00:00:00 | 401289
2016-01-16 00:00:00 | 411899
2016-01-17 00:00:00 | 379156
2016-01-18 00:00:00 | 341481
2016-01-19 00:00:00 | 385187
2016-01-20 00:00:00 | 382105
2016-01-21 00:00:00 | 399654
2016-01-22 00:00:00 | 420162
2016-01-23 00:00:00 | 78133
2016-01-24 00:00:00 | 159766
2016-01-25 00:00:00 | 282087
2016-01-26 00:00:00 | 327655
2016-01-27 00:00:00 | 359180
2016-01-28 00:00:00 | 383326
2016-01-29 00:00:00 | 414039
2016-01-30 00:00:00 | 435369
2016-01-31 00:00:00 | 361505
2017-11-17 00:00:00 | 2
(32 rows)
Time: 1054.827 ms (00:01.055)
ClickHouse
SELECT
toDate(pickup_datetime) AS day,
COUNT(*)
FROM rides
GROUP BY day
ORDER BY day ASC
Query id: 97033310-0219-47d9-850c-405cae2e66a2
┌────────day─┬─count()─┐
│ 2016-01-01 │ 345037 │
│ 2016-01-02 │ 312831 │
│ 2016-01-03 │ 302878 │
│ 2016-01-04 │ 316171 │
│ 2016-01-05 │ 343251 │
│ 2016-01-06 │ 348516 │
│ 2016-01-07 │ 364894 │
│ 2016-01-08 │ 392070 │
│ 2016-01-09 │ 405825 │
│ 2016-01-10 │ 351788 │
│ 2016-01-11 │ 342651 │
│ 2016-01-12 │ 367390 │
│ 2016-01-13 │ 395090 │
│ 2016-01-14 │ 396473 │
│ 2016-01-15 │ 401289 │
│ 2016-01-16 │ 411899 │
│ 2016-01-17 │ 379156 │
│ 2016-01-18 │ 341481 │
│ 2016-01-19 │ 385187 │
│ 2016-01-20 │ 382105 │
│ 2016-01-21 │ 399654 │
│ 2016-01-22 │ 420162 │
│ 2016-01-23 │ 78133 │
│ 2016-01-24 │ 159766 │
│ 2016-01-25 │ 282087 │
│ 2016-01-26 │ 327655 │
│ 2016-01-27 │ 359180 │
│ 2016-01-28 │ 383326 │
│ 2016-01-29 │ 414039 │
│ 2016-01-30 │ 435369 │
│ 2016-01-31 │ 361505 │
│ 2017-11-17 │ 2 │
└────────────┴─────────┘
32 rows in set. Elapsed: 0.021 sec. Processed 10.91 million rows, 43.63 MB (516.78 million rows/s., 2.07 GB/s.)
Database | Query Duration (ms) |
---|---|
TimescaleDB | 1055 |
ClickHouse | 21 |
Calculating the daily average fare amount for rides with a single passenger.
TimescaleDB
postgres=# SELECT date_trunc('day', pickup_datetime)
AS day, avg(fare_amount)
FROM rides
WHERE passenger_count = 1
GROUP BY day
ORDER BY day;
day | avg
---------------------+---------------------
2016-01-01 00:00:00 | 12.5464748850129787
2016-01-02 00:00:00 | 12.1129878886746750
2016-01-03 00:00:00 | 12.8262352076841150
2016-01-04 00:00:00 | 11.9116533573721472
2016-01-05 00:00:00 | 11.7534235580737452
2016-01-06 00:00:00 | 11.7824805635293235
2016-01-07 00:00:00 | 11.9498961299166930
2016-01-08 00:00:00 | 11.8738271415577235
2016-01-09 00:00:00 | 11.4878104510668158
2016-01-10 00:00:00 | 12.0888246569772508
2016-01-11 00:00:00 | 12.1343557409576630
2016-01-12 00:00:00 | 11.8598489409162272
2016-01-13 00:00:00 | 11.8031804213844504
2016-01-14 00:00:00 | 12.3123409782144971
2016-01-15 00:00:00 | 12.5613139967255876
2016-01-16 00:00:00 | 11.8351508976680107
2016-01-17 00:00:00 | 12.0295602006362590
2016-01-18 00:00:00 | 11.8623529387202913
2016-01-19 00:00:00 | 12.3472510415266765
2016-01-20 00:00:00 | 12.4609413534048953
2016-01-21 00:00:00 | 12.6083024358000252
2016-01-22 00:00:00 | 12.0812572192038226
2016-01-23 00:00:00 | 11.8564034256627468
2016-01-24 00:00:00 | 12.1723178660326188
2016-01-25 00:00:00 | 14.6872562077357373
2016-01-26 00:00:00 | 14.2577452509214630
2016-01-27 00:00:00 | 13.2432283566172145
2016-01-28 00:00:00 | 12.9854551087781026
2016-01-29 00:00:00 | 12.5911266525638913
2016-01-30 00:00:00 | 12.2759556743269683
2016-01-31 00:00:00 | 12.5867816634207905
2017-11-17 00:00:00 | 9.7500000000000000
(32 rows)
Time: 1499.333 ms (00:01.499)
ClickHouse
SELECT
toDate(pickup_datetime) AS day,
avg(fare_amount)
FROM rides
WHERE passenger_count = 1
GROUP BY day
ORDER BY day ASC
Query id: 92abcc32-f516-42d9-99dd-3b0faa7103fb
┌────────day─┬───avg(fare_amount)─┐
│ 2016-01-01 │ 12.546474885017933 │
│ 2016-01-02 │ 12.112987888606051 │
│ 2016-01-03 │ 12.826235207666542 │
│ 2016-01-04 │ 11.911653357328007 │
│ 2016-01-05 │ 11.753423558134099 │
│ 2016-01-06 │ 11.78248056342069 │
│ 2016-01-07 │ 11.949896129834796 │
│ 2016-01-08 │ 11.873827141488679 │
│ 2016-01-09 │ 11.487810451064721 │
│ 2016-01-10 │ 12.088824656929884 │
│ 2016-01-11 │ 12.134355740954636 │
│ 2016-01-12 │ 11.859848940902094 │
│ 2016-01-13 │ 11.803180421366301 │
│ 2016-01-14 │ 12.312340978159439 │
│ 2016-01-15 │ 12.561313996603548 │
│ 2016-01-16 │ 11.835150897615112 │
│ 2016-01-17 │ 12.02956020059271 │
│ 2016-01-18 │ 11.862352938597022 │
│ 2016-01-19 │ 12.347251041579934 │
│ 2016-01-20 │ 12.460941353486442 │
│ 2016-01-21 │ 12.608302435857224 │
│ 2016-01-22 │ 12.081257219198683 │
│ 2016-01-23 │ 11.856403425394397 │
│ 2016-01-24 │ 12.17231786602557 │
│ 2016-01-25 │ 14.687256207078526 │
│ 2016-01-26 │ 14.257745250869808 │
│ 2016-01-27 │ 13.243228356662698 │
│ 2016-01-28 │ 12.985455108728543 │
│ 2016-01-29 │ 12.591126652543593 │
│ 2016-01-30 │ 12.275955679633002 │
│ 2016-01-31 │ 12.586781663432406 │
│ 2017-11-17 │ 9.75 │
└────────────┴────────────────────┘
32 rows in set. Elapsed: 0.054 sec. Processed 10.91 million rows, 119.98 MB (202.67 million rows/s., 2.23 GB/s.)
Database | Query Duration (ms) |
---|---|
TimescaleDB | 1500 |
ClickHouse | 54 |
Calculating the number of rides for each rate type.
TimescaleDB
postgres=# SELECT rate_code, COUNT(vendor_id) AS num_trips
FROM rides
WHERE pickup_datetime < '2016-02-01'
GROUP BY rate_code
ORDER BY rate_code;
rate_code | num_trips
-----------+-----------
1 | 10626315
2 | 225019
3 | 16822
4 | 4696
5 | 33688
6 | 102
99 | 216
(7 rows)
Time: 1014.495 ms (00:01.014)
ClickHouse
SELECT
rate_code,
COUNT(vendor_id) AS num_trips
FROM rides
WHERE pickup_datetime < '2016-02-01'
GROUP BY rate_code
ORDER BY rate_code ASC
Query id: 288415a8-316b-42c8-8c65-ec85d476bb1e
┌─rate_code─┬─num_trips─┐
│ 1 │ 10626315 │
│ 2 │ 225019 │
│ 3 │ 16822 │
│ 4 │ 4696 │
│ 5 │ 33688 │
│ 6 │ 102 │
│ 99 │ 216 │
└───────────┴───────────┘
7 rows in set. Elapsed: 0.053 sec. Processed 10.91 million rows, 174.51 MB (205.99 million rows/s., 3.30 GB/s.)
Database | Query Duration (ms) |
---|---|
TimescaleDB | 1014 |
ClickHouse | 53 |
Analysis of rides to JFK and EWR.
TimescaleDB
postgres=# SELECT rates.description, COUNT(vendor_id) AS num_trips,
AVG(dropoff_datetime - pickup_datetime) AS avg_trip_duration, AVG(total_amount) AS avg_total,
AVG(tip_amount) AS avg_tip, MIN(trip_distance) AS min_distance, AVG (trip_distance) AS avg_distance, MAX(trip_distance) AS max_distance,
AVG(passenger_count) AS avg_passengers
FROM rides
JOIN rates ON rides.rate_code = rates.rate_code
WHERE rides.rate_code IN (2,3) AND pickup_datetime < '2016-02-01'
GROUP BY rates.description
ORDER BY rates.description;
description | num_trips | avg_trip_duration | avg_total | avg_tip | min_distance | avg_distance | max_distance | avg_passengers
-------------+-----------+-------------------+---------------------+--------------------+--------------+---------------------+--------------+--------------------
JFK | 225019 | 00:45:46.822517 | 64.3278115181384683 | 7.3334228220728027 | 0.00 | 17.2602816651038357 | 221.00 | 1.7333869584346211
Newark | 16822 | 00:35:16.157472 | 86.4633688027582927 | 9.5461657353465700 | 0.00 | 16.2706122934252764 | 177.23 | 1.7435501129473309
(2 rows)
Time: 5132.768 ms (00:05.133)
ClickHouse
SELECT
rates.description,
COUNT(vendor_id) AS num_trips,
AVG(dropoff_datetime - pickup_datetime) AS avg_trip_duration,
AVG(total_amount) AS avg_total,
AVG(tip_amount) AS avg_tip,
MIN(trip_distance) AS min_distance,
AVG(trip_distance) AS avg_distance,
MAX(trip_distance) AS max_distance,
AVG(passenger_count) AS avg_passengers
FROM rides
INNER JOIN rates ON rides.rate_code = rates.rate_code
WHERE (rides.rate_code IN (2, 3)) AND (pickup_datetime < '2016-02-01')
GROUP BY rates.description
ORDER BY rates.description ASC
Query id: 303edbf1-73f8-4426-86b2-66fbecc57f4c
┌─description─┬─num_trips─┬──avg_trip_duration─┬─────────avg_total─┬───────────avg_tip─┬─min_distance─┬───────avg_distance─┬─max_distance─┬────avg_passengers─┐
│ JFK │ 225019 │ 2746.8225172096577 │ 64.32781102360629 │ 7.333422817827029 │ 0 │ 17.26028166510384 │ 221 │ 1.733386958434621 │
│ Newark │ 16822 │ 2116.157472357627 │ 86.46336993224668 │ 9.546165732798725 │ 0 │ 16.270612293425287 │ 177.23 │ 1.743550112947331 │
└─────────────┴───────────┴────────────────────┴───────────────────┴───────────────────┴──────────────┴────────────────────┴──────────────┴───────────────────┘
2 rows in set. Elapsed: 0.116 sec. Processed 10.91 million rows, 458.09 MB (94.31 million rows/s., 3.96 GB/s.)
Database | Query Duration (ms) |
---|---|
TimescaleDB | 5132 |
ClickHouse | 116 |
Calculating the number of rides in every 5-minute interval on the first day of 2016.
TimescaleDB
postgres=# SELECT
EXTRACT(hour from pickup_datetime) as hours,
trunc(EXTRACT(minute from pickup_datetime) / 5)*5 AS five_mins,
COUNT(*)
FROM rides
WHERE pickup_datetime < '2016-01-02 00:00'
GROUP BY hours, five_mins;
hours | five_mins | count
-------+-----------+-------
0 | 0 | 703
0 | 5 | 1482
0 | 10 | 1959
0 | 15 | 2200
0 | 20 | 2285
0 | 25 | 2291
0 | 30 | 2349
0 | 35 | 2328
0 | 40 | 2440
0 | 45 | 2372
0 | 50 | 2388
0 | 55 | 2473
1 | 0 | 2395
1 | 5 | 2510
1 | 10 | 2412
1 | 15 | 2482
1 | 20 | 2428
1 | 25 | 2433
1 | 30 | 2337
1 | 35 | 2366
1 | 40 | 2325
1 | 45 | 2257
1 | 50 | 2316
1 | 55 | 2250
2 | 0 | 2303
2 | 5 | 2259
2 | 10 | 2253
2 | 15 | 2176
2 | 20 | 2138
2 | 25 | 2131
2 | 30 | 2135
2 | 35 | 2047
2 | 40 | 2086
2 | 45 | 2049
2 | 50 | 1938
2 | 55 | 1975
3 | 0 | 1966
3 | 5 | 1927
3 | 10 | 1896
3 | 15 | 1840
3 | 20 | 1917
3 | 25 | 1816
3 | 30 | 1784
3 | 35 | 1607
3 | 40 | 1652
3 | 45 | 1619
3 | 50 | 1655
3 | 55 | 1598
4 | 0 | 1684
4 | 5 | 1608
4 | 10 | 1533
4 | 15 | 1468
4 | 20 | 1430
4 | 25 | 1303
4 | 30 | 1230
4 | 35 | 1171
4 | 40 | 1092
4 | 45 | 1023
4 | 50 | 998
4 | 55 | 879
5 | 0 | 837
5 | 5 | 858
5 | 10 | 733
5 | 15 | 776
5 | 20 | 662
5 | 25 | 703
5 | 30 | 629
5 | 35 | 624
5 | 40 | 611
5 | 45 | 560
5 | 50 | 555
5 | 55 | 554
6 | 0 | 551
6 | 5 | 519
6 | 10 | 516
6 | 15 | 501
6 | 20 | 500
6 | 25 | 510
6 | 30 | 497
6 | 35 | 524
6 | 40 | 508
6 | 45 | 463
6 | 50 | 479
6 | 55 | 483
7 | 0 | 476
7 | 5 | 456
7 | 10 | 447
7 | 15 | 459
7 | 20 | 451
7 | 25 | 450
7 | 30 | 440
7 | 35 | 418
7 | 40 | 417
7 | 45 | 426
7 | 50 | 413
7 | 55 | 401
8 | 0 | 396
8 | 5 | 405
8 | 10 | 409
8 | 15 | 447
8 | 20 | 408
8 | 25 | 404
8 | 30 | 396
8 | 35 | 421
8 | 40 | 441
8 | 45 | 374
8 | 50 | 427
8 | 55 | 412
9 | 0 | 410
9 | 5 | 454
9 | 10 | 487
9 | 15 | 510
9 | 20 | 488
9 | 25 | 456
9 | 30 | 456
9 | 35 | 539
9 | 40 | 542
9 | 45 | 564
9 | 50 | 568
9 | 55 | 581
10 | 0 | 596
10 | 5 | 592
10 | 10 | 669
10 | 15 | 656
10 | 20 | 703
10 | 25 | 667
10 | 30 | 706
10 | 35 | 662
10 | 40 | 739
10 | 45 | 788
10 | 50 | 814
10 | 55 | 843
11 | 0 | 779
11 | 5 | 861
11 | 10 | 798
11 | 15 | 881
11 | 20 | 893
11 | 25 | 977
11 | 30 | 1006
11 | 35 | 1025
11 | 40 | 1114
11 | 45 | 1120
11 | 50 | 1165
11 | 55 | 1133
12 | 0 | 1132
12 | 5 | 1156
12 | 10 | 1157
12 | 15 | 1166
12 | 20 | 1203
12 | 25 | 1122
12 | 30 | 1119
12 | 35 | 1181
12 | 40 | 1194
12 | 45 | 1228
12 | 50 | 1174
12 | 55 | 1195
13 | 0 | 1234
13 | 5 | 1204
13 | 10 | 1276
13 | 15 | 1285
13 | 20 | 1272
13 | 25 | 1280
13 | 30 | 1282
13 | 35 | 1252
13 | 40 | 1420
13 | 45 | 1370
13 | 50 | 1430
13 | 55 | 1389
14 | 0 | 1423
14 | 5 | 1346
14 | 10 | 1293
14 | 15 | 1328
14 | 20 | 1355
14 | 25 | 1333
14 | 30 | 1367
14 | 35 | 1475
14 | 40 | 1397
14 | 45 | 1439
14 | 50 | 1444
14 | 55 | 1352
15 | 0 | 1315
15 | 5 | 1327
15 | 10 | 1330
15 | 15 | 1380
15 | 20 | 1351
15 | 25 | 1322
15 | 30 | 1411
15 | 35 | 1346
15 | 40 | 1355
15 | 45 | 1353
15 | 50 | 1386
15 | 55 | 1263
16 | 0 | 1308
16 | 5 | 1278
16 | 10 | 1185
16 | 15 | 1256
16 | 20 | 1223
16 | 25 | 1223
16 | 30 | 1204
16 | 35 | 1269
16 | 40 | 1184
16 | 45 | 1208
16 | 50 | 1182
16 | 55 | 1266
17 | 0 | 1228
17 | 5 | 1323
17 | 10 | 1285
17 | 15 | 1242
17 | 20 | 1303
17 | 25 | 1272
17 | 30 | 1335
17 | 35 | 1297
17 | 40 | 1328
17 | 45 | 1444
17 | 50 | 1423
17 | 55 | 1367
18 | 0 | 1365
18 | 5 | 1377
18 | 10 | 1327
18 | 15 | 1352
18 | 20 | 1370
18 | 25 | 1316
18 | 30 | 1448
18 | 35 | 1341
18 | 40 | 1475
18 | 45 | 1508
18 | 50 | 1543
18 | 55 | 1358
19 | 0 | 1308
19 | 5 | 1368
19 | 10 | 1426
19 | 15 | 1372
19 | 20 | 1365
19 | 25 | 1315
19 | 30 | 1321
19 | 35 | 1260
19 | 40 | 1280
19 | 45 | 1246
19 | 50 | 1216
19 | 55 | 1192
20 | 0 | 1154
20 | 5 | 1091
20 | 10 | 1100
20 | 15 | 1118
20 | 20 | 1105
20 | 25 | 1060
20 | 30 | 1135
20 | 35 | 1126
20 | 40 | 1098
20 | 45 | 1136
20 | 50 | 1039
20 | 55 | 1087
21 | 0 | 1042
21 | 5 | 1025
21 | 10 | 1113
21 | 15 | 1127
21 | 20 | 1102
21 | 25 | 1068
21 | 30 | 1104
21 | 35 | 1083
21 | 40 | 1101
21 | 45 | 1127
21 | 50 | 1086
21 | 55 | 1095
22 | 0 | 1103
22 | 5 | 1199
22 | 10 | 1126
22 | 15 | 1130
22 | 20 | 1110
22 | 25 | 1098
22 | 30 | 1197
22 | 35 | 1189
22 | 40 | 1167
22 | 45 | 1247
22 | 50 | 1146
22 | 55 | 1030
23 | 0 | 1139
23 | 5 | 1082
23 | 10 | 1065
23 | 15 | 986
23 | 20 | 1032
23 | 25 | 988
23 | 30 | 1032
23 | 35 | 1100
23 | 40 | 1142
23 | 45 | 1145
23 | 50 | 1149
23 | 55 | 1063
(288 rows)
Time: 170.031 ms
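As an aside, TimescaleDB’s own time_bucket function can express the same five-minute bucketing more directly; a sketch (this is not the tutorial query, and it is not what was timed above):
SELECT time_bucket('5 minutes', pickup_datetime) AS five_mins,
       COUNT(*)
FROM rides
WHERE pickup_datetime < '2016-01-02 00:00'
GROUP BY five_mins
ORDER BY five_mins;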
ClickHouse
SELECT
toHour(pickup_datetime) AS hours,
trunc(toMinute(pickup_datetime) / 5) * 5 AS five_mins,
COUNT(*)
FROM rides
WHERE pickup_datetime < '2016-01-02 00:00:00'
GROUP BY
hours,
five_mins
ORDER BY
hours ASC,
five_mins ASC
Query id: f9606d18-3852-41a3-9b21-7b58dc23c9da
┌─hours─┬─five_mins─┬─count()─┐
│ 0 │ 0 │ 703 │
│ 0 │ 5 │ 1482 │
│ 0 │ 10 │ 1959 │
│ 0 │ 15 │ 2200 │
│ 0 │ 20 │ 2285 │
│ 0 │ 25 │ 2291 │
│ 0 │ 30 │ 2349 │
│ 0 │ 35 │ 2328 │
│ 0 │ 40 │ 2440 │
│ 0 │ 45 │ 2372 │
│ 0 │ 50 │ 2388 │
│ 0 │ 55 │ 2473 │
│ 1 │ 0 │ 2395 │
│ 1 │ 5 │ 2510 │
│ 1 │ 10 │ 2412 │
│ 1 │ 15 │ 2482 │
│ 1 │ 20 │ 2428 │
│ 1 │ 25 │ 2433 │
│ 1 │ 30 │ 2337 │
│ 1 │ 35 │ 2366 │
│ 1 │ 40 │ 2325 │
│ 1 │ 45 │ 2257 │
│ 1 │ 50 │ 2316 │
│ 1 │ 55 │ 2250 │
│ 2 │ 0 │ 2303 │
│ 2 │ 5 │ 2259 │
│ 2 │ 10 │ 2253 │
│ 2 │ 15 │ 2176 │
│ 2 │ 20 │ 2138 │
│ 2 │ 25 │ 2131 │
│ 2 │ 30 │ 2135 │
│ 2 │ 35 │ 2047 │
│ 2 │ 40 │ 2086 │
│ 2 │ 45 │ 2049 │
│ 2 │ 50 │ 1938 │
│ 2 │ 55 │ 1975 │
│ 3 │ 0 │ 1966 │
│ 3 │ 5 │ 1927 │
│ 3 │ 10 │ 1896 │
│ 3 │ 15 │ 1840 │
│ 3 │ 20 │ 1917 │
│ 3 │ 25 │ 1816 │
│ 3 │ 30 │ 1784 │
│ 3 │ 35 │ 1607 │
│ 3 │ 40 │ 1652 │
│ 3 │ 45 │ 1619 │
│ 3 │ 50 │ 1655 │
│ 3 │ 55 │ 1598 │
│ 4 │ 0 │ 1684 │
│ 4 │ 5 │ 1608 │
│ 4 │ 10 │ 1533 │
│ 4 │ 15 │ 1468 │
│ 4 │ 20 │ 1430 │
│ 4 │ 25 │ 1303 │
│ 4 │ 30 │ 1230 │
│ 4 │ 35 │ 1171 │
│ 4 │ 40 │ 1092 │
│ 4 │ 45 │ 1023 │
│ 4 │ 50 │ 998 │
│ 4 │ 55 │ 879 │
│ 5 │ 0 │ 837 │
│ 5 │ 5 │ 858 │
│ 5 │ 10 │ 733 │
│ 5 │ 15 │ 776 │
│ 5 │ 20 │ 662 │
│ 5 │ 25 │ 703 │
│ 5 │ 30 │ 629 │
│ 5 │ 35 │ 624 │
│ 5 │ 40 │ 611 │
│ 5 │ 45 │ 560 │
│ 5 │ 50 │ 555 │
│ 5 │ 55 │ 554 │
│ 6 │ 0 │ 551 │
│ 6 │ 5 │ 519 │
│ 6 │ 10 │ 516 │
│ 6 │ 15 │ 501 │
│ 6 │ 20 │ 500 │
│ 6 │ 25 │ 510 │
│ 6 │ 30 │ 497 │
│ 6 │ 35 │ 524 │
│ 6 │ 40 │ 508 │
│ 6 │ 45 │ 463 │
│ 6 │ 50 │ 479 │
│ 6 │ 55 │ 483 │
│ 7 │ 0 │ 476 │
│ 7 │ 5 │ 456 │
│ 7 │ 10 │ 447 │
│ 7 │ 15 │ 459 │
│ 7 │ 20 │ 451 │
│ 7 │ 25 │ 450 │
│ 7 │ 30 │ 440 │
│ 7 │ 35 │ 418 │
│ 7 │ 40 │ 417 │
│ 7 │ 45 │ 426 │
│ 7 │ 50 │ 413 │
│ 7 │ 55 │ 401 │
│ 8 │ 0 │ 396 │
│ 8 │ 5 │ 405 │
│ 8 │ 10 │ 409 │
│ 8 │ 15 │ 447 │
│ 8 │ 20 │ 408 │
│ 8 │ 25 │ 404 │
│ 8 │ 30 │ 396 │
│ 8 │ 35 │ 421 │
│ 8 │ 40 │ 441 │
│ 8 │ 45 │ 374 │
│ 8 │ 50 │ 427 │
│ 8 │ 55 │ 412 │
│ 9 │ 0 │ 410 │
│ 9 │ 5 │ 454 │
│ 9 │ 10 │ 487 │
│ 9 │ 15 │ 510 │
│ 9 │ 20 │ 488 │
│ 9 │ 25 │ 456 │
│ 9 │ 30 │ 456 │
│ 9 │ 35 │ 539 │
│ 9 │ 40 │ 542 │
│ 9 │ 45 │ 564 │
│ 9 │ 50 │ 568 │
│ 9 │ 55 │ 581 │
│ 10 │ 0 │ 596 │
│ 10 │ 5 │ 592 │
│ 10 │ 10 │ 669 │
│ 10 │ 15 │ 656 │
│ 10 │ 20 │ 703 │
│ 10 │ 25 │ 667 │
│ 10 │ 30 │ 706 │
│ 10 │ 35 │ 662 │
│ 10 │ 40 │ 739 │
│ 10 │ 45 │ 788 │
│ 10 │ 50 │ 814 │
│ 10 │ 55 │ 843 │
│ 11 │ 0 │ 779 │
│ 11 │ 5 │ 861 │
│ 11 │ 10 │ 798 │
│ 11 │ 15 │ 881 │
│ 11 │ 20 │ 893 │
│ 11 │ 25 │ 977 │
│ 11 │ 30 │ 1006 │
│ 11 │ 35 │ 1025 │
│ 11 │ 40 │ 1114 │
│ 11 │ 45 │ 1120 │
│ 11 │ 50 │ 1165 │
│ 11 │ 55 │ 1133 │
│ 12 │ 0 │ 1132 │
│ 12 │ 5 │ 1156 │
│ 12 │ 10 │ 1157 │
│ 12 │ 15 │ 1166 │
│ 12 │ 20 │ 1203 │
│ 12 │ 25 │ 1122 │
│ 12 │ 30 │ 1119 │
│ 12 │ 35 │ 1181 │
│ 12 │ 40 │ 1194 │
│ 12 │ 45 │ 1228 │
│ 12 │ 50 │ 1174 │
│ 12 │ 55 │ 1195 │
│ 13 │ 0 │ 1234 │
│ 13 │ 5 │ 1204 │
│ 13 │ 10 │ 1276 │
│ 13 │ 15 │ 1285 │
│ 13 │ 20 │ 1272 │
│ 13 │ 25 │ 1280 │
│ 13 │ 30 │ 1282 │
│ 13 │ 35 │ 1252 │
│ 13 │ 40 │ 1420 │
│ 13 │ 45 │ 1370 │
│ 13 │ 50 │ 1430 │
│ 13 │ 55 │ 1389 │
│ 14 │ 0 │ 1423 │
│ 14 │ 5 │ 1346 │
│ 14 │ 10 │ 1293 │
│ 14 │ 15 │ 1328 │
│ 14 │ 20 │ 1355 │
│ 14 │ 25 │ 1333 │
│ 14 │ 30 │ 1367 │
│ 14 │ 35 │ 1475 │
│ 14 │ 40 │ 1397 │
│ 14 │ 45 │ 1439 │
│ 14 │ 50 │ 1444 │
│ 14 │ 55 │ 1352 │
│ 15 │ 0 │ 1315 │
│ 15 │ 5 │ 1327 │
│ 15 │ 10 │ 1330 │
│ 15 │ 15 │ 1380 │
│ 15 │ 20 │ 1351 │
│ 15 │ 25 │ 1322 │
│ 15 │ 30 │ 1411 │
│ 15 │ 35 │ 1346 │
│ 15 │ 40 │ 1355 │
│ 15 │ 45 │ 1353 │
│ 15 │ 50 │ 1386 │
│ 15 │ 55 │ 1263 │
│ 16 │ 0 │ 1308 │
│ 16 │ 5 │ 1278 │
│ 16 │ 10 │ 1185 │
│ 16 │ 15 │ 1256 │
│ 16 │ 20 │ 1223 │
│ 16 │ 25 │ 1223 │
│ 16 │ 30 │ 1204 │
│ 16 │ 35 │ 1269 │
│ 16 │ 40 │ 1184 │
│ 16 │ 45 │ 1208 │
│ 16 │ 50 │ 1182 │
│ 16 │ 55 │ 1266 │
│ 17 │ 0 │ 1228 │
│ 17 │ 5 │ 1323 │
│ 17 │ 10 │ 1285 │
│ 17 │ 15 │ 1242 │
│ 17 │ 20 │ 1303 │
│ 17 │ 25 │ 1272 │
│ 17 │ 30 │ 1335 │
│ 17 │ 35 │ 1297 │
│ 17 │ 40 │ 1328 │
│ 17 │ 45 │ 1444 │
│ 17 │ 50 │ 1423 │
│ 17 │ 55 │ 1367 │
│ 18 │ 0 │ 1365 │
│ 18 │ 5 │ 1377 │
│ 18 │ 10 │ 1327 │
│ 18 │ 15 │ 1352 │
│ 18 │ 20 │ 1370 │
│ 18 │ 25 │ 1316 │
│ 18 │ 30 │ 1448 │
│ 18 │ 35 │ 1341 │
│ 18 │ 40 │ 1475 │
│ 18 │ 45 │ 1508 │
│ 18 │ 50 │ 1543 │
│ 18 │ 55 │ 1358 │
│ 19 │ 0 │ 1308 │
│ 19 │ 5 │ 1368 │
│ 19 │ 10 │ 1426 │
│ 19 │ 15 │ 1372 │
│ 19 │ 20 │ 1365 │
│ 19 │ 25 │ 1315 │
│ 19 │ 30 │ 1321 │
│ 19 │ 35 │ 1260 │
│ 19 │ 40 │ 1280 │
│ 19 │ 45 │ 1246 │
│ 19 │ 50 │ 1216 │
│ 19 │ 55 │ 1192 │
│ 20 │ 0 │ 1154 │
│ 20 │ 5 │ 1091 │
│ 20 │ 10 │ 1100 │
│ 20 │ 15 │ 1118 │
│ 20 │ 20 │ 1105 │
│ 20 │ 25 │ 1060 │
│ 20 │ 30 │ 1135 │
│ 20 │ 35 │ 1126 │
│ 20 │ 40 │ 1098 │
│ 20 │ 45 │ 1136 │
│ 20 │ 50 │ 1039 │
│ 20 │ 55 │ 1087 │
│ 21 │ 0 │ 1042 │
│ 21 │ 5 │ 1025 │
│ 21 │ 10 │ 1113 │
│ 21 │ 15 │ 1127 │
│ 21 │ 20 │ 1102 │
│ 21 │ 25 │ 1068 │
│ 21 │ 30 │ 1104 │
│ 21 │ 35 │ 1083 │
│ 21 │ 40 │ 1101 │
│ 21 │ 45 │ 1127 │
│ 21 │ 50 │ 1086 │
│ 21 │ 55 │ 1095 │
│ 22 │ 0 │ 1103 │
│ 22 │ 5 │ 1199 │
│ 22 │ 10 │ 1126 │
│ 22 │ 15 │ 1130 │
│ 22 │ 20 │ 1110 │
│ 22 │ 25 │ 1098 │
│ 22 │ 30 │ 1197 │
│ 22 │ 35 │ 1189 │
│ 22 │ 40 │ 1167 │
│ 22 │ 45 │ 1247 │
│ 22 │ 50 │ 1146 │
│ 22 │ 55 │ 1030 │
│ 23 │ 0 │ 1139 │
│ 23 │ 5 │ 1082 │
│ 23 │ 10 │ 1065 │
│ 23 │ 15 │ 986 │
│ 23 │ 20 │ 1032 │
│ 23 │ 25 │ 988 │
│ 23 │ 30 │ 1032 │
│ 23 │ 35 │ 1100 │
│ 23 │ 40 │ 1142 │
│ 23 │ 45 │ 1145 │
│ 23 │ 50 │ 1149 │
│ 23 │ 55 │ 1063 │
└───────┴───────────┴─────────┘
288 rows in set. Elapsed: 0.015 sec. Processed 352.26 thousand rows, 1.41 MB (23.05 million rows/s., 92.19 MB/s.)
Database | Query Duration (ms) |
---|---|
TimescaleDB | 170 |
ClickHouse | 15 |
Calculating the number of rides on New Year’s morning that originated within 400m of Times Square, in 30-minute buckets.
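Note that pickup_geom is not part of the rides schema created earlier; the TimescaleDB tutorial adds it as a PostGIS geometry column, roughly like this (a sketch, assuming the postgis extension is installed):
ALTER TABLE rides ADD COLUMN pickup_geom geometry(POINT, 2163);
UPDATE rides SET pickup_geom = ST_Transform(
    ST_SetSRID(ST_MakePoint(pickup_longitude, pickup_latitude), 4326), 2163);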
TimescaleDB
postgres=# SELECT time_bucket('30 minutes', pickup_datetime) AS thirty_min, COUNT(*) AS near_times_sq
FROM rides
WHERE ST_Distance(pickup_geom, ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)) < 400
AND pickup_datetime < '2016-01-01 14:00'
GROUP BY thirty_min ORDER BY thirty_min;
thirty_min | near_times_sq
---------------------+---------------
2016-01-01 00:00:00 | 74
2016-01-01 00:30:00 | 102
2016-01-01 01:00:00 | 120
2016-01-01 01:30:00 | 98
2016-01-01 02:00:00 | 112
2016-01-01 02:30:00 | 109
2016-01-01 03:00:00 | 163
2016-01-01 03:30:00 | 181
2016-01-01 04:00:00 | 214
2016-01-01 04:30:00 | 185
2016-01-01 05:00:00 | 158
2016-01-01 05:30:00 | 113
2016-01-01 06:00:00 | 102
2016-01-01 06:30:00 | 91
2016-01-01 07:00:00 | 88
2016-01-01 07:30:00 | 58
2016-01-01 08:00:00 | 72
2016-01-01 08:30:00 | 94
2016-01-01 09:00:00 | 115
2016-01-01 09:30:00 | 118
2016-01-01 10:00:00 | 135
2016-01-01 10:30:00 | 160
2016-01-01 11:00:00 | 212
2016-01-01 11:30:00 | 229
2016-01-01 12:00:00 | 244
2016-01-01 12:30:00 | 230
2016-01-01 13:00:00 | 235
2016-01-01 13:30:00 | 238
(28 rows)
Time: 167.335 ms
ClickHouse
SELECT
toStartOfInterval(pickup_datetime, toIntervalMinute(30)) AS thirty_min,
COUNT(*) AS near_times_sq
FROM rides
WHERE (greatCircleDistance(-73.9851, 40.7589, pickup_longitude, pickup_latitude) < 400) AND (pickup_datetime < '2016-01-01 14:00:00')
GROUP BY thirty_min
ORDER BY thirty_min ASC
Query id: 047dfcbf-8a49-47b1-a2b6-2a44600bed30
┌──────────thirty_min─┬─near_times_sq─┐
│ 2016-01-01 00:00:00 │ 73 │
│ 2016-01-01 00:30:00 │ 101 │
│ 2016-01-01 01:00:00 │ 120 │
│ 2016-01-01 01:30:00 │ 98 │
│ 2016-01-01 02:00:00 │ 111 │
│ 2016-01-01 02:30:00 │ 109 │
│ 2016-01-01 03:00:00 │ 163 │
│ 2016-01-01 03:30:00 │ 181 │
│ 2016-01-01 04:00:00 │ 215 │
│ 2016-01-01 04:30:00 │ 186 │
│ 2016-01-01 05:00:00 │ 158 │
│ 2016-01-01 05:30:00 │ 114 │
│ 2016-01-01 06:00:00 │ 101 │
│ 2016-01-01 06:30:00 │ 92 │
│ 2016-01-01 07:00:00 │ 88 │
│ 2016-01-01 07:30:00 │ 60 │
│ 2016-01-01 08:00:00 │ 72 │
│ 2016-01-01 08:30:00 │ 95 │
│ 2016-01-01 09:00:00 │ 114 │
│ 2016-01-01 09:30:00 │ 118 │
│ 2016-01-01 10:00:00 │ 136 │
│ 2016-01-01 10:30:00 │ 160 │
│ 2016-01-01 11:00:00 │ 212 │
│ 2016-01-01 11:30:00 │ 230 │
│ 2016-01-01 12:00:00 │ 245 │
│ 2016-01-01 12:30:00 │ 228 │
│ 2016-01-01 13:00:00 │ 237 │
│ 2016-01-01 13:30:00 │ 239 │
└─────────────────────┴───────────────┘
28 rows in set. Elapsed: 0.016 sec. Processed 196.61 thousand rows, 4.33 MB (12.11 million rows/s., 266.38 MB/s.)
Database | Query Duration (ms) |
---|---|
TimescaleDB | 167 |
ClickHouse | 16 |
After running the above benchmarks, I felt I was missing something important about TimescaleDB and went through their docs again, until I realised that TimescaleDB doesn’t enable compression by default, which is quite strange. I re-ran the benchmark queries after enabling compression.
postgres=# ALTER TABLE rides SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'vendor_id'
);
ALTER TABLE
postgres=# SELECT add_compression_policy('rides', INTERVAL '0 days');
add_compression_policy
------------------------
1000
(1 row)
After enabling compression, I verified from the logs that the chunks had been compressed.
2021-10-31 02:16:48.675 UTC [252] LOG: completed compressing chunk _timescaledb_internal._hyper_1_1_chunk
2021-10-31 02:16:57.931 UTC [255] LOG: completed compressing chunk _timescaledb_internal._hyper_1_2_chunk
2021-10-31 02:17:04.964 UTC [258] LOG: completed compressing chunk _timescaledb_internal._hyper_1_5_chunk
2021-10-31 02:17:18.686 UTC [260] LOG: completed compressing chunk _timescaledb_internal._hyper_1_6_chunk
2021-10-31 02:17:25.894 UTC [264] LOG: completed compressing chunk _timescaledb_internal._hyper_1_7_chunk
2021-10-31 02:17:39.172 UTC [266] LOG: completed compressing chunk _timescaledb_internal._hyper_1_8_chunk
2021-10-31 02:17:49.474 UTC [270] LOG: completed compressing chunk _timescaledb_internal._hyper_1_9_chunk
2021-10-31 02:17:55.110 UTC [273] LOG: completed compressing chunk _timescaledb_internal._hyper_1_10_chunk
2021-10-31 02:18:03.536 UTC [275] LOG: completed compressing chunk _timescaledb_internal._hyper_1_3_chunk
2021-10-31 02:18:08.258 UTC [278] LOG: completed compressing chunk _timescaledb_internal._hyper_1_4_chunk
2021-10-31 02:18:08.403 UTC [280] LOG: completed compressing chunk _timescaledb_internal._hyper_1_11_chunk
2021-10-31 02:18:08.510 UTC [281] LOG: completed compressing chunk _timescaledb_internal._hyper_1_12_chunk
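Besides the logs, the effect of compression can also be inspected from SQL; a sketch using TimescaleDB’s informational functions (available in 2.x):
SELECT pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes) AS after
FROM hypertable_compression_stats('rides');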
This definitely reduced the disk usage by TimescaleDB.
# du -sh /var/lib/postgresql/data/
1.6G /var/lib/postgresql/data/
Let’s run the above queries again.
Counting all the rides that happened in 1 month.
TimescaleDB (with compression)
postgres=# select count(*) from rides;
count
----------
10906860
(1 row)
Time: 282.369 ms
Counting the number of rides on a daily basis.
TimescaleDB (with compression)
postgres=# SELECT date_trunc('day', pickup_datetime)
AS day, COUNT(*)
FROM rides
GROUP BY day
ORDER BY day;
day | count
---------------------+--------
2016-01-01 00:00:00 | 345037
2016-01-02 00:00:00 | 312831
2016-01-03 00:00:00 | 302878
2016-01-04 00:00:00 | 316171
2016-01-05 00:00:00 | 343251
2016-01-06 00:00:00 | 348516
2016-01-07 00:00:00 | 364894
2016-01-08 00:00:00 | 392070
2016-01-09 00:00:00 | 405825
2016-01-10 00:00:00 | 351788
2016-01-11 00:00:00 | 342651
2016-01-12 00:00:00 | 367390
2016-01-13 00:00:00 | 395090
2016-01-14 00:00:00 | 396473
2016-01-15 00:00:00 | 401289
2016-01-16 00:00:00 | 411899
2016-01-17 00:00:00 | 379156
2016-01-18 00:00:00 | 341481
2016-01-19 00:00:00 | 385187
2016-01-20 00:00:00 | 382105
2016-01-21 00:00:00 | 399654
2016-01-22 00:00:00 | 420162
2016-01-23 00:00:00 | 78133
2016-01-24 00:00:00 | 159766
2016-01-25 00:00:00 | 282087
2016-01-26 00:00:00 | 327655
2016-01-27 00:00:00 | 359180
2016-01-28 00:00:00 | 383326
2016-01-29 00:00:00 | 414039
2016-01-30 00:00:00 | 435369
2016-01-31 00:00:00 | 361505
2017-11-17 00:00:00 | 2
(32 rows)
Time: 1033.075 ms (00:01.033)
Calculating the daily average fare amount for rides with a single passenger.
TimescaleDB (with compression)
postgres=# SELECT date_trunc('day', pickup_datetime)
AS day, avg(fare_amount)
FROM rides
WHERE passenger_count = 1
GROUP BY day
ORDER BY day;
day | avg
---------------------+---------------------
2016-01-01 00:00:00 | 12.5464748850129787
2016-01-02 00:00:00 | 12.1129878886746750
2016-01-03 00:00:00 | 12.8262352076841150
2016-01-04 00:00:00 | 11.9116533573721472
2016-01-05 00:00:00 | 11.7534235580737452
2016-01-06 00:00:00 | 11.7824805635293235
2016-01-07 00:00:00 | 11.9498961299166930
2016-01-08 00:00:00 | 11.8738271415577235
2016-01-09 00:00:00 | 11.4878104510668158
2016-01-10 00:00:00 | 12.0888246569772508
2016-01-11 00:00:00 | 12.1343557409576630
2016-01-12 00:00:00 | 11.8598489409162272
2016-01-13 00:00:00 | 11.8031804213844504
2016-01-14 00:00:00 | 12.3123409782144971
2016-01-15 00:00:00 | 12.5613139967255876
2016-01-16 00:00:00 | 11.8351508976680107
2016-01-17 00:00:00 | 12.0295602006362590
2016-01-18 00:00:00 | 11.8623529387202913
2016-01-19 00:00:00 | 12.3472510415266765
2016-01-20 00:00:00 | 12.4609413534048953
2016-01-21 00:00:00 | 12.6083024358000252
2016-01-22 00:00:00 | 12.0812572192038226
2016-01-23 00:00:00 | 11.8564034256627468
2016-01-24 00:00:00 | 12.1723178660326188
2016-01-25 00:00:00 | 14.6872562077357373
2016-01-26 00:00:00 | 14.2577452509214630
2016-01-27 00:00:00 | 13.2432283566172145
2016-01-28 00:00:00 | 12.9854551087781026
2016-01-29 00:00:00 | 12.5911266525638913
2016-01-30 00:00:00 | 12.2759556743269683
2016-01-31 00:00:00 | 12.5867816634207905
2017-11-17 00:00:00 | 9.7500000000000000
(32 rows)
Time: 1599.145 ms (00:01.599)
Calculating the number of rides for each rate type.
TimescaleDB (with compression)
postgres=# SELECT rate_code, COUNT(vendor_id) AS num_trips
FROM rides
WHERE pickup_datetime < '2016-02-01'
GROUP BY rate_code
ORDER BY rate_code;
rate_code | num_trips
-----------+-----------
1 | 10626315
2 | 225019
3 | 16822
4 | 4696
5 | 33688
6 | 102
99 | 216
(7 rows)
Time: 974.998 ms
Overall, some TimescaleDB queries became faster with compression enabled, but many others became a bit slower, probably due to decompression overhead. This may be why TimescaleDB disables compression by default.
I wanted to understand how queries are executed in TimescaleDB, so I tried EXPLAIN (ANALYZE, FORMAT JSON, VERBOSE, BUFFERS) <query>.
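For example, on the daily ride count query (the JSON plan output is long, so it is omitted here):
EXPLAIN (ANALYZE, FORMAT JSON, VERBOSE, BUFFERS)
SELECT date_trunc('day', pickup_datetime) AS day, COUNT(*)
FROM rides
GROUP BY day
ORDER BY day;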
The plan shows that TimescaleDB stores the data in chunks (12 in our case), runs the query over those chunks in parallel, and aggregates the results.
postgres=# SELECT show_chunks('rides');
show_chunks
-----------------------------------------
_timescaledb_internal._hyper_1_1_chunk
_timescaledb_internal._hyper_1_2_chunk
...
...
_timescaledb_internal._hyper_1_11_chunk
_timescaledb_internal._hyper_1_12_chunk
(12 rows)
I was curious whether changing the chunk size (and hence the chunk count) would impact query performance.
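The current chunk interval can be read from TimescaleDB’s informational views; a sketch (the dimensions view is part of TimescaleDB 2.x):
SELECT hypertable_name, column_name, time_interval
FROM timescaledb_information.dimensions
WHERE hypertable_name = 'rides';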
Reduce chunk count
Let’s first try to reduce the number of chunks. Note that set_chunk_time_interval only applies to chunks created after the call, so the data needs to be reloaded for existing rows to be re-chunked; the new chunk IDs below reflect that.
postgres=# SELECT set_chunk_time_interval('rides', INTERVAL '1 year');
set_chunk_time_interval
-------------------------
(1 row)
postgres=# SELECT show_chunks('rides');
show_chunks
-------------------------------------------
_timescaledb_internal._hyper_1_1565_chunk
_timescaledb_internal._hyper_1_1566_chunk
_timescaledb_internal._hyper_1_1567_chunk
_timescaledb_internal._hyper_1_1568_chunk
(4 rows)
Let’s run the query to calculate the number of rides on a daily basis.
TimescaleDB
postgres=# SELECT date_trunc('day', pickup_datetime)
AS day, COUNT(*)
FROM rides
GROUP BY day
ORDER BY day;
day | count
---------------------+--------
2016-01-01 00:00:00 | 345037
2016-01-02 00:00:00 | 312831
2016-01-03 00:00:00 | 302878
2016-01-04 00:00:00 | 316171
2016-01-05 00:00:00 | 343251
2016-01-06 00:00:00 | 348516
2016-01-07 00:00:00 | 364894
2016-01-08 00:00:00 | 392070
2016-01-09 00:00:00 | 405825
2016-01-10 00:00:00 | 351788
2016-01-11 00:00:00 | 342651
2016-01-12 00:00:00 | 367390
2016-01-13 00:00:00 | 395090
2016-01-14 00:00:00 | 396473
2016-01-15 00:00:00 | 401289
2016-01-16 00:00:00 | 411899
2016-01-17 00:00:00 | 379156
2016-01-18 00:00:00 | 341481
2016-01-19 00:00:00 | 385187
2016-01-20 00:00:00 | 382105
2016-01-21 00:00:00 | 399654
2016-01-22 00:00:00 | 420162
2016-01-23 00:00:00 | 78133
2016-01-24 00:00:00 | 159766
2016-01-25 00:00:00 | 282087
2016-01-26 00:00:00 | 327655
2016-01-27 00:00:00 | 359180
2016-01-28 00:00:00 | 383326
2016-01-29 00:00:00 | 414039
2016-01-30 00:00:00 | 435369
2016-01-31 00:00:00 | 361505
2017-11-17 00:00:00 | 2
(32 rows)
Time: 1184.163 ms (00:01.184)
Query performance got worse with this change.
Increase chunk count
Let’s now try to increase the number of chunks.
postgres=# SELECT set_chunk_time_interval('rides', INTERVAL '1 day');
set_chunk_time_interval
-------------------------
(1 row)
postgres=# SELECT show_chunks('rides');
show_chunks
-------------------------------------------
_timescaledb_internal._hyper_1_1501_chunk
_timescaledb_internal._hyper_1_1502_chunk
...
...
_timescaledb_internal._hyper_1_1563_chunk
_timescaledb_internal._hyper_1_1564_chunk
(64 rows)
This created 64 chunks.
Let’s run the query to calculate the number of rides on a daily basis.
TimescaleDB
postgres=# SELECT date_trunc('day', pickup_datetime)
AS day, COUNT(*)
FROM rides
GROUP BY day
ORDER BY day;
day | count
---------------------+--------
2016-01-01 00:00:00 | 345037
2016-01-02 00:00:00 | 312831
2016-01-03 00:00:00 | 302878
2016-01-04 00:00:00 | 316171
2016-01-05 00:00:00 | 343251
2016-01-06 00:00:00 | 348516
2016-01-07 00:00:00 | 364894
2016-01-08 00:00:00 | 392070
2016-01-09 00:00:00 | 405825
2016-01-10 00:00:00 | 351788
2016-01-11 00:00:00 | 342651
2016-01-12 00:00:00 | 367390
2016-01-13 00:00:00 | 395090
2016-01-14 00:00:00 | 396473
2016-01-15 00:00:00 | 401289
2016-01-16 00:00:00 | 411899
2016-01-17 00:00:00 | 379156
2016-01-18 00:00:00 | 341481
2016-01-19 00:00:00 | 385187
2016-01-20 00:00:00 | 382105
2016-01-21 00:00:00 | 399654
2016-01-22 00:00:00 | 420162
2016-01-23 00:00:00 | 78133
2016-01-24 00:00:00 | 159766
2016-01-25 00:00:00 | 282087
2016-01-26 00:00:00 | 327655
2016-01-27 00:00:00 | 359180
2016-01-28 00:00:00 | 383326
2016-01-29 00:00:00 | 414039
2016-01-30 00:00:00 | 435369
2016-01-31 00:00:00 | 361505
2017-11-17 00:00:00 | 2
(32 rows)
Time: 691.523 ms
Query performance improved with this change.
Database | Chunk Count | Query Duration (ms) |
---|---|---|
TimescaleDB | 4 | 1184 |
TimescaleDB | 12 | 1055 |
TimescaleDB | 64 | 692 |
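If the query patterns are known up front, the chunk interval can also be set when the hypertable is created, instead of changing it afterwards; a sketch with a hypothetical one-day interval:
SELECT create_hypertable(
    'rides',
    'pickup_datetime',
    chunk_time_interval => INTERVAL '1 day'
);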
Conclusion
- ClickHouse did better on storage footprint, read performance, and write performance.
- ClickHouse’s defaults are pretty good and don’t require any server-side tuning, though tuning the table schema definitely helps. TimescaleDB requires tuning to perform faster.
Last but not least, I may be completely wrong in my evaluation; feel free to point it out. We are all doing our best to write good software.