Friday, December 31, 2021

Happy New Database Year everyone

This end-of-2021 post by Dr Andy Pavlo

https://ottertune.com/blog/2021-databases-retrospective/

is generally good news for DBAs and DB engineering, and Andy writes with a sense of humour.


Andy is well regarded in the industry (I'm curious about OtterTune too: an AI project to tune databases and do my job for me!?):


Andy Pavlo is an Associate Professor in the Computer Science Department at Carnegie Mellon University (CMU), where his research focuses on database management systems like self-driving databases, transaction processing systems and large-scale data analytics.


It is easy to get dazzled by some of the big names, big numbers and apparently scandalous industry stories. I'm not sure I would believe all the hype, but I liked this summary from Andy:


Foremost, it is a good thing that a relational database system has become the first choice in greenfield applications. This shows the staying power of Ted Codd’s relational model from the 1970s. Second, PostgreSQL is a great database system. Yes, it has known issues and dark corners, as does every DBMS. But with so much attention and energy focused on it, PostgreSQL is only going to get better over the years.


I think it would be good to shine some more light on the Postgres dark corners: if you understand what can go wrong, then you can avoid it.


Plus I totally agree with Andy's point above i.e. "every DBMS has its dark corners".


Happy New Year everyone and looking forward to 2022 and maybe a bit more activity on the blog side :thinking:


Cheers

Dave


Tuesday, May 5, 2020

Setting up vscode-bash-debug - Only bash versions 4.* or 5.* are supported.


While working through the vscode-bash-debug setup, I hit this error: "Only bash versions 4.* or 5.* are supported."

For macOS users (we appear to be the ones hitting this issue), the solution is simply to update the bash version, which can be done in about 30 seconds with a single brew command:


brew install bash

After this, restart VS Code and it should pick up bash version 5.x :)
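If you want to confirm what the debugger will see, here is a minimal sketch of the version check (using bash's built-in BASH_VERSINFO array; the echo messages are my own, not the extension's):

```shell
#!/usr/bin/env bash
# vscode-bash-debug needs bash >= 4; BASH_VERSINFO[0] holds the major version
if [ "${BASH_VERSINFO[0]}" -ge 4 ]; then
  echo "bash $BASH_VERSION is supported"
else
  echo "bash $BASH_VERSION is too old - try: brew install bash"
fi
```

Note the stock /bin/bash on macOS is still 3.2, while Homebrew installs into /usr/local/bin (or /opt/homebrew/bin on Apple Silicon), so make sure VS Code picks up the newer one.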

More details in my bash-by-example repo (my notes and scripts around common command-line bash tasks):

https://github.com/dgapitts/bash-by-example/blob/master/example_003_vscode-bash-debug_setup_and_brew/README.md


NB This is part of a wider project to make handy notes on the various bash CLI/scripting tools I typically use

- https://github.com/dgapitts/bash-by-example


Saturday, April 18, 2020

VirtualBox VM was created with a user that doesn't match the current user running Vagrant.

Recently I moved some project files from my old laptop to my new one and hit this "vagrant up" error

~/projects/vagrant-postgres9.6 $ vagrant up
The VirtualBox VM was created with a user that doesn't match the
current user running Vagrant. VirtualBox requires that the same user
be used to manage the VM that was created. Please re-run Vagrant with
that user. This is not a Vagrant issue.


The UID used to create the VM was: 303684054
Your UID is: 501


Googling, I soon found this GitHub issue:


"The VirtualBox VM was created with a user that doesn't match the current user running Vagrant." #8630
https://github.com/hashicorp/vagrant/issues/8630


and TL;DR the fix is a simple mv of the project's .vagrant directory:


~/projects/vagrant-postgres9.6 $ mv .vagrant .vagrant.bak
~/projects/vagrant-postgres9.6 $ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Box 'centos/7' could not be found. Attempting to find and install...
    default: Box Provider: virtualbox
    default: Box Version: >= 0
==> default: Loading metadata for box 'centos/7'
    default: URL: https://vagrantcloud.com/centos/7

==> default: Adding box 'centos/7' (v1905.1) for provider: virtualbox
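Why the mv works: Vagrant keeps per-project machine state, including the creating user's UID, under the .vagrant directory, so renaming it makes Vagrant treat the project as fresh. A minimal sketch of where that UID lives (the exact path can vary by Vagrant version, and the demo directory here is made up):

```shell
# simulate the state Vagrant keeps: creator_uid is a plain-text file
mkdir -p demo/.vagrant/machines/default/virtualbox
echo 303684054 > demo/.vagrant/machines/default/virtualbox/creator_uid
cat demo/.vagrant/machines/default/virtualbox/creator_uid   # prints 303684054
```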


NB In case you're curious about my Vagrant Postgres setup, I also presented it to the Amsterdam Postgres UserGroup last year.


Sunday, October 20, 2019

awk stddev (standard deviation) and avg (average) per file


Input files


~/Downloads/logs $ ls
application-2019-10-19-21.log.gz application-2019-10-20-02.log.gz application-2019-10-20-07.log.gz application-2019-10-20-12.log.gz
application-2019-10-19-22.log.gz application-2019-10-20-03.log.gz application-2019-10-20-08.log.gz application-2019-10-20-13.log.gz
application-2019-10-19-23.log.gz application-2019-10-20-04.log.gz application-2019-10-20-09.log.gz application-2019-10-20-14.log.gz
application-2019-10-20-00.log.gz application-2019-10-20-05.log.gz application-2019-10-20-10.log.gz application-2019-10-20-15.log.gz
application-2019-10-20-01.log.gz application-2019-10-20-06.log.gz application-2019-10-20-11.log.gz application-2019-10-20-16.log


sample data rows


~/Downloads/logs $ zgrep GameLoop application-2019-10-19-21.log.gz | head -3
info: GameLoop execution time: 271938 nanoseconds. {"timestamp":"2019-10-19 21:00:35"}
info: GameLoop execution time: 92681 nanoseconds. {"timestamp":"2019-10-19 21:00:45"}
info: GameLoop execution time: 125291 nanoseconds. {"timestamp":"2019-10-19 21:01:47"}


awk based script to calculate average and standard deviation 


~/Downloads/logs $ cat calculate_avg_and_stddev_per_file.sh
# https://stackoverflow.com/questions/18786073/compute-average-and-standard-deviation-with-awk
for i in application-2019-10-*.gz; do
  echo "$i"
  zgrep GameLoop "$i" | awk '{x+=$5; y+=$5^2} END {print x/NR " " sqrt(y/NR-(x/NR)^2)}'
done


output:


~/Downloads/logs $ ./calculate_avg_and_stddev_per_file.sh
application-2019-10-19-21.log.gz
94922.9 54338.7
application-2019-10-19-22.log.gz
82895.5 30873.3
application-2019-10-19-23.log.gz
84054.2 29225
application-2019-10-20-00.log.gz
86505.4 26829.8
application-2019-10-20-01.log.gz
86339.9 29854.3
application-2019-10-20-02.log.gz

...
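The awk one-liner computes the population standard deviation as sqrt(E[x²] − E[x]²). A quick sanity check on a tiny known dataset (using field $1 here instead of the log's $5):

```shell
# values 2 4 4 4 5 5 7 9 have mean 5 and population stddev 2
printf '%s\n' 2 4 4 4 5 5 7 9 |
  awk '{x+=$1; y+=$1^2} END {print x/NR " " sqrt(y/NR-(x/NR)^2)}'
# prints: 5 2
```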

Saturday, October 31, 2015

Web Performance - Case for HTTP/2 with Andy Davies


Summary

Interesting meetup : http://www.meetup.com/Dutch-Web-Operations-Meetup/events/224787669/

Not really my area of expertise but still interesting.

Andy Davies covered many of the topics in 

Software Engineering Radio Episode 232: Mark Nottingham on HTTP/2
http://www.se-radio.net/2015/07/episode-232-mark-nottingham-on-http2/ 


Notes

Web Performance - Case for HTTP/2 with Andy Davies
@AndyDavies
AWS


1999 RFC 2616 HTTP/1.1
HTTP/1.x doesn't use the network efficiently
- good for transferring large files
- but most webpages are made up of lots of small resources
- each TCP connection only supports one request at a time
- HTTP pipelining
- splitting resources over multiple hosts (e.g. bbc.co.uk)
- headers sent on every request ... Mark Nottingham found huge duplication of content (200K of cookies with every page load .. big UK online retailer)
- reduce requests ... CSS and JavaScript bundles,
      image sprites : browser has to decode whole image (expensive on smaller mobiles)
- use gulp or grunt to automate working around the limitations of HTTP/1.x


- test with a page of little square images ... skewed to show HTTP/2's strength


- header frames and data frames
- insanely complex 'prioritise weights and dependencies'
- headers deduplicated into a dictionary
- long latency test (Ireland-Singapore): HTTP/2 wins ...
- but the only benefit in the low latency test is security (over TLS)
- what happens if you have packet loss (over a session on a single connection)?


- server push ... currently we have "network idle" time while the "server builds page"
  - now push critical css


 John Mellor (Google)
 - parallel image loading looks good after 15% and very good at 25%
 - do people dislike partial image loading (needs a bigger study)


 Good browser support for http2
 - especially firefox and chrome


 Limited server support for HTTP/2
 - h2o server
 - nginx should have something later this year
 - http2 killing off apache? nope http2 code is now being ported
 - haproxy doesn't support http2 yet


 Problems with SPDY
 - issues with prioritisation rules not being ...


 h2spec
 - Japanese-built tool for checking if the server conforms to the HTTP/2 spec


 h2i
 - interactive console for debugging
 - tricky to use

 reducing sharding
 - two shard domains but using a single TCP connection


 3rd parties still growing
  - A/B testing
  - advertising
  - tag management


 w3c resource hints <link rel='...'>
 - dns-prefetch (all browsers already provide this)
 - preconnect (Chrome only at the moment)
 - preload


 testing
 - Chrome dev tools .. network testing
 - WebPageTest .. Firefox identifies resources correctly (unlike Chrome)
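Another quick check (my own addition, not from the talk): curl can tell you whether your local build supports HTTP/2 before you point it at a server.

```shell
# a curl built with nghttp2 lists HTTP2 among its features
curl --version | head -2
```

With an HTTP/2-capable build, `curl -sI --http2 https://example.com` should then show an `HTTP/2 200` status line when the server negotiates it.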

 F5, Akamai, IIS

MariaDB meetup at eBay (Oct 2015)

MariaDB Ab (Oct 2015 - eBay)

Summary

  • A large number of MariaDB developers and engineers (including Monty)
  • Big MariaDB gathering at Booking.com this week
  • I've never attended a meetup before with 16 presentations in one evening ;)
  • Lots of interesting developments; the introduction of histograms was particularly interesting




Notes from the various presenters

1) Rasmus
MariaDB, MariaDB Galera, MaxScale << key products with multiple releases per year
mariadb.org/jira << nearly 10,000 issues
Weekly sprint ... finishes every Tues at 4pm (Swedish time) / 3pm Netherlands ... weekly call


2) Daniel Bartholomew ... release manager
MariaDB tree on GitHub
Buildbot ... test and re-test
Sergei Golubchik (Chief Architect) raises JIRA issues for actual releases
Push to mirrors in the US (Oregon) and Europe (Netherlands)
Announcement to the world via the normal geek social networks: Google+ group ...


3) Sergei Golubchik (Chief Architect)
Data Encryption at Rest
Encrypting keys?
What is encrypted: tablespace, logs, temp/swap?
encryption plugins - file_key_management plugins, simple APIs?

Apropos ... XtraDB/InnoDB 'page compression' and 'data scrubbing'

Benchmark ... ro (order of 1% slower), rw/filesort (order of 10% slower), bin log encryption (< 4%)
temp files (ie sorts) are only encrypted if they are written to disc
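As a reference point, the file_key_management setup Sergei described looks roughly like this in my.cnf (a sketch from memory of the MariaDB 10.1 option names; the key file path is made up, so verify the details against the docs):

```ini
# my.cnf sketch: data-at-rest encryption via the file_key_management plugin
[mysqld]
plugin_load_add = file_key_management
file_key_management_filename = /etc/mysql/encryption/keyfile   # hypothetical path
innodb_encrypt_tables = ON
innodb_encrypt_log = ON
innodb_encryption_threads = 4
```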


5) Sergei Petrunia
optimizer troubleshooting
- EXPLAIN your query .. are the stats true, where is the time spent
- slow query log ... rows_examined
- PERFORMANCE_SCHEMA .. table_io_waits_summary_by_table

- ANALYZE ... optimize the query and trace the execution
 rows (estimate) vs r_rows (real)
 filtered (estimate) vs r_filtered (real) e.g. do we expect 50% or 5% of rows to match

- EXPLAIN/ANALYZE FORMAT=JSON ... both look hard to read compared to
r_total_time_ms, r_buffer_size
r_indexes

6) Colin Charles
- SHOW PLUGINS
- INFORMATION_SCHEMA.ALL_PLUGINS
- AUTH_SOCKET
- PAM AUTHENTICATION (Google authentication on phone?)
- Password validation << cracklib_password
- Server_audit << required by regulators
- query_cache_info <<
- information_schema.query_response_time << response-time distribution bar/count chart


7) Otto Kekalainen
- passwordless authentication
- pain with automation
- /etc/mysql/debian.cnf << clear text root password !?
- unix_socket to the rescue
- ONLY debian/ubuntu NOT ON Centos/RPM

8) Alexander Barkov
- REGEXP
- old library - Henry Spencer?
- modern PCRE (MDEV-4424) ... Google Summer of Code?
- several new REGEXP functions: REGEXP_INSTR, REGEXP_REPLACE
- performance questions? the new library is generally faster


9) Jan Lindstrom
- InnoDB in MariaDB 10.1 (both XtraDB from Percona + InnoDB from Oracle)
- Galera integration
  - wsrep_on=1 and specify the library
- Page compression for SSD
  - innodb-file-format=Barracuda
  - hole punching
  - create table t1(...) page_compressed=1;
  - zip, lz4, lzo, bzip2, snappy
- Defragmentation
 - duplicates are removed from the page
 - nearly empty pages are merged
 - does not return totally empty pages


10) Massimiliano Pinto - What is MaxScale?
 Classic proxy between the db server and the client
 Key parts: Router, Monitor, Protocol, Authentication, Filters
 Binlog Router via MaxScale:
    offload network load from the master
    easier to promote a new master
       the remaining slaves are unaffected as they are still interfacing with MaxScale
    blr_distribute_binlog_record
      events written to local binlog before distributing << 'this is important' (although not sure why?)
 Enhancements/bugs/projects recorded under: https://mariadb.atlassian.net/projects/MXS
 Semi-sync to be added to MaxScale (in the future)


11) ??? MariaDB Java connector - failover handling
 driver initialization to include failover details
 transparent to the application i.e. query execution
 how it works depends on: (a) Galera cluster or (b) classic master-slaves
 examples of connection details for (a) classic Java and (b) Spring framework
 fail slave
  - database ping
  - blacklist ... don't use the slave
 fail master
  - database ping
  - blacklist
  - connect to another master (if possible)
  - if unsure whether the query has been executed then raise an Exception
  connection pool:
  - initialise
  - validate

12) Daniel (no slides)
 - community member
 - systemd patches
 - in 10.1 proper systemd support, with notify handling
 - in theory you don't need mysqld_safe anymore
 - Galera and systemd ... specifying the initial primary at startup?
 - socket activation ... provides the ability to listen over multiple interfaces


13) Axel Schwenke - Performance Engineer
 - Using Compiler Profiling
 - modern CPUs use a pipeline
 - CPUs are much faster than memory access
 - optimized for linear execution patterns (not branching)
 - superscalar architecture, branch prediction, speculative execution
 - attributes/hints for the compiler: this is a LIKELY branch or an UNLIKELY branch
 - gcc -fprofile-generate << records how often a function is called >> gives you an instrumented binary
 - gcc -fprofile-use << recompile the binary using the collected profiles
 - this two-phase compile with an instrumented binary can be automated
 - pain points:
   - does your workload match the end-user workloads?
   - good results with standard benchmark tools (10% to 15%) .. less good with end-user tests (? %)
 - in most cases the error-checking branches don't get executed so don't need to be in the linear (hot) path
 - in theory it can perform worse (e.g. lots of errors in code / error paths) but they have never seen this

14) MariaDB in Docker (Kolbe@mariadb.com)
- existing image issues : root password as an environment variable
   - linked containers have access
   - container metadata
- 3 volumes
 - data directory on faster storage
 - unix socket
 - load data
- helper scripts
 - UID and GID of mysql user ... easy chmod

15) Sergei Petrunia
- Engine Independent Table Statistics (EITS)
- Traditional MySQL: (1) #rows, (2) #rows in index range, (3) 'index statistics' (imprecise)
   some issues
   - index statistics (imprecise)
   - not enough stats
   - joins need column statistics (join order on customer and supplier with moderately complicated where rules)
- Histograms (CBO)
- Engine Independent ... beyond the scope of the talk
- mysql.column_stats

set histogram_size=200; set use_stat_tables='preferably'; analyze table lineitem, orders
- height-balanced and width-balanced histograms


16) Monty Q&A
- instance upgrades for MariaDB?
- atomic writes standard for modern SSDs (not just FusionIO)
- one disc seek is now 100 executions of the ...
- 10.2 tuning: optimizer parameters for the hardware, like gathering system stats on Oracle
- plugins are potentially ... could they be sanitised by using a separate namespace
- hybrid storage, for example: 0.5T RAM, 100T of SSD and the rest on spinning disk
    ZFS has this sort of storage model built in?
    working on a prototype with SanDisk
    memory mapping can handle 'atomic writes'?
    no easy APIs exist at the moment for this sort of call ... Monty is working with vendors
- ScaleDB good for parallel execution ... MariaDB looking to leverage this sort of technology