Saturday, October 31, 2015

Web Performance - Case for HTTP/2 with Andy Davies


Summary

Interesting meetup : http://www.meetup.com/Dutch-Web-Operations-Meetup/events/224787669/

Not really my area of expertise, but still interesting.

Andy Davies covered many of the topics in 

Software Engineering Radio Episode 232: Mark Nottingham on HTTP/2
http://www.se-radio.net/2015/07/episode-232-mark-nottingham-on-http2/ 


Notes

Web Performance - Case for HTTP/2 with Andy Davies
@AndyDavies
AWS


1999 RFC 2616 HTTP/1.1
HTTP/1.x doesn't use the network efficiently
- good for transferring large files
- but most webpages are made up of lots of small resources
- each TCP connection only supports one request at a time
- HTTP pipelining
- splitting resources over multiple hosts (e.g. bbc.co.uk)
- headers sent on every request ... Mark Nottingham found huge duplication of content (200K of cookies with every page load .. big UK online retailer)
- reduce requests ... CSS and JavaScript bundles,
      image sprites : browser has to decode the whole image (expensive on smaller mobile devices)
- use gulp or grunt to automate working around the limitations of HTTP/1.x


- test page with lots of little square images ... skewed to show HTTP/2's strengths


- header frames and data frames
- insanely complex 'prioritisation via weights and dependencies'
- headers deduplicated into a dictionary
- long-latency test (Ireland-Singapore): HTTP/2 wins ...
- but for the low-latency test the only benefit is security (over TLS)
- what happens if you have packet loss (over a session on a single connection)?
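The header-deduplication idea above can be sketched as a small indexed table (a much-simplified Python illustration of the HPACK approach; real HPACK also has a static table, eviction, and Huffman coding):

```python
# Simplified sketch of HTTP/2-style header deduplication: once a
# (name, value) pair has been sent, later requests can reference it
# by index instead of re-sending the bytes. Illustration only.
class HeaderTable:
    def __init__(self):
        self.table = []  # dynamic table of (name, value) pairs seen so far

    def encode(self, headers):
        out = []
        for pair in headers:
            if pair in self.table:
                # already sent once: emit a tiny index reference
                out.append(("index", self.table.index(pair)))
            else:
                # first time: send the literal and remember it
                self.table.append(pair)
                out.append(("literal", pair))
        return out

t = HeaderTable()
req1 = [("cookie", "session=abc"), (":path", "/")]
req2 = [("cookie", "session=abc"), (":path", "/style.css")]
print(t.encode(req1))  # both headers sent as literals on the first request
print(t.encode(req2))  # the repeated cookie is now just an index
```

This is why the "200K of cookies on every page load" problem largely disappears: repeated headers cost an index, not the full bytes.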


- server push ... currently we have "network idle" time while the server builds the page
  - now the server can push critical CSS during that time
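As a sketch of how this is triggered in practice (the path is made up): some HTTP/2 servers, h2o among them, turn a preload Link header on the HTML response into a push of that resource:

```http
Link: </css/critical.css>; rel=preload; as=style
```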


 John Mellor (Google)
 - parallel image loading looks good after 15% and very good at 25%
 - do people dislike partial image loading? (needs a bigger study)


 Good browser support for HTTP/2
 - especially Firefox and Chrome


 Limited server support for HTTP/2
 - h2o server
 - nginx should have something later this year
 - is HTTP/2 killing off Apache? no, the HTTP/2 code is now being ported
 - haproxy doesn't support HTTP/2 yet


 Problems with SPDY
 - issues with prioritisation rules not being ...


 h2spec
 - Japanese conformance-testing tool for checking whether a server implements the HTTP/2 spec correctly


 h2i
 - interactive console for debugging
 - tricky to use

 reducing sharding
 - two sharded domains but using a single TCP connection


 3rd parties still growing
  - A/B testing
  - advertising
  - tag management


 W3C resource hints <link rel='...'>
 - dns-prefetch (all browsers already provide this)
 - preconnect (Chrome only at the moment)
 - preload
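As a sketch, the hints look like this in the page head (the CDN host and file path are made up):

```html
<link rel="dns-prefetch" href="//cdn.example.com">
<link rel="preconnect" href="https://cdn.example.com">
<link rel="preload" href="/css/critical.css" as="style">
```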


 testing
 - Chrome dev tools .. network testing
 - WebPageTest .. Firefox identifies resources correctly (unlike Chrome)

 F5, Akamai, IIS

MariaDB meetup at eBay (Oct 2015)


Summary

  • A large number of MariaDB developers and engineers (including Monty)
  • Big MariaDB gathering at Booking.com this week
  • I've never attended a meetup before with 16 presentations in one evening ;)
  • Lots of interesting developments; the introduction of histograms was particularly interesting




Notes from various presenters

1) Rasmus
MariaDB, MariaDB Galera, MaxScale << key products with multiple releases per year  
mariadb.org/jira << nearly 10,000 issues
Weekly sprint ... finishes every Tues at 4pm (Swedish time) / 3pm Netherlands ... weekly call


2) Daniel Bartholomew ... release manager
MariaDB tree on GitHub
Buildbot ... test and re-test
Sergei Golubchik (Chief Architect) raises a JIRA for actual releases
Push to mirrors in the US (Oregon) and Europe (Netherlands)
Announcement to the world via the normal geek social networks: Google+ group ...


3) Sergei Golubchik (Chief Architect)
Data Encryption at Rest
Encrypting keys?
What is encrypted: tablespace, logs, temp/swap?
encryption plugins - file_key_management plugin, simple APIs?
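A sketch of what the file_key_management configuration might look like (the key file path is made up; option names per the MariaDB 10.1 encryption docs):

```ini
[mysqld]
# load the encryption-key plugin and point it at the key file
plugin_load_add = file_key_management
file_key_management_filename = /etc/mysql/encryption/keyfile.enc
# encrypt InnoDB/XtraDB tablespaces and the redo log
innodb-encrypt-tables = ON
innodb-encrypt-log = ON
```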

Apropos ... XtraDB/InnoDB 'page compression' and 'data scrubbing'

Benchmark ... ro (order of 1% slower), rw/filesort (order of 10% slower), binlog encryption (< 4%)
temp files (i.e. sorts) are only encrypted if they are written to disc


5) Sergei Petrunia
optimizer troubleshooting
- EXPLAIN your query .. are the stats true? where is the time spent?
- slow query log ... rows_examined
- PERFORMANCE_SCHEMA .. table_io_waits_summary_by_table

- ANALYZE ... runs the query and traces the execution
 rows (estimate) vs r_rows (real)
 filtered (estimate) vs r_filtered (real) e.g. do we expect 50% or 5% of rows to match?

- EXPLAIN/ANALYZE FORMAT=JSON ... both look hard to read compared to ...
r_total_time_ms, r_buffer_size
r_indexes
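The rows vs r_rows comparison can be sketched like this (the table and column names are made up; the ANALYZE statement is MariaDB 10.1 syntax):

```sql
-- runs the query for real and annotates the plan with runtime
-- counters (r_rows, r_filtered, r_total_time_ms) alongside the
-- optimizer's estimates (rows, filtered)
ANALYZE FORMAT=JSON
SELECT * FROM orders WHERE customer_id = 42;
```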

6) Colin Charles
- SHOW PLUGINS
- INFORMATION_SCHEMA.ALL_PLUGINS
- AUTH_SOCKET
- PAM AUTHENTICATION (Google Authenticator on phone?)
- Password validation << cracklib_password_check
- Server_audit << required by regulators
- query_cache_info <<
- information_schema.QUERY_RESPONSE_TIME << response-time distribution bar/count chart


7) Otto Kekäläinen
- passwordless authentication
- pain with automation
- /etc/mysql/debian.cnf << clear-text root password !?
- unix_socket to the rescue
- ONLY Debian/Ubuntu, NOT on CentOS/RPM
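A sketch of the unix_socket approach (statement shape per MariaDB's unix_socket authentication plugin docs of the time; adjust for your version):

```sql
-- the plugin authenticates via the Unix socket's peer credentials,
-- so the local OS root user maps to the database root user and no
-- password needs to be stored anywhere
INSTALL PLUGIN unix_socket SONAME 'auth_socket';
GRANT USAGE ON *.* TO 'root'@'localhost' IDENTIFIED VIA unix_socket;
```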

8) Alexander Barkov
- REGEXP
- old library - Henry Spencer?
- modern PCRE (MDEV-4424) ... Google Summer of Code?
- several new REGEXP functions: REGEXP_INSTR, REGEXP_REPLACE
- performance questions? the new library is generally faster
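A quick sketch of the new functions (names as introduced in MariaDB 10.0.5, i.e. spelled REGEXP_*):

```sql
SELECT REGEXP_REPLACE('ab12cd', '[0-9]+', '-');  -- 'ab-cd'
SELECT REGEXP_INSTR('ab12cd', '[0-9]');          -- 3 (1-based position)
```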


9) Jan Lindstrom
- InnoDB in MariaDB 10.1 (both XtraDB from Percona + InnoDB from Oracle)
- Galera integration
  - wsrep_on=1 and specify the library
- Page compression for SSD
  - innodb-file-format=Barracuda
  - hole punching
  - create table t1(...) page_compressed=1;
  - zip, lz4, lzo, bzip2, snappy
- Defragment
 - duplicate records are removed from the page
 - nearly empty pages are merged
 - does not return totally empty pages


10) Massimiliano Pinto - What is MaxScale?
 Classic proxy between the db server and the client
 Key Parts: Router, Monitor, Protocol, Authentication, Filters
 Binlog Router via MaxScale:
    offload network load from the master
    easier to promote a new master
       the remaining slaves are unaffected as they are still interfacing with MaxScale
    blr_distribute_binlog_record
      events written to the local binlog before distributing << 'this is important' (although not sure why?)
 Enhancements/bugs/projects recorded under : https://mariadb.atlassian.net/projects/MXS
 Semi-sync to be added to MaxScale (in the future)


11) ??? MariaDB Java connector - failover handling
 driver initialization to include failover details
 transparent to the application i.e. query execution
 how it works depends on : (a) Galera cluster or (b) classic master-slaves
 examples of connection details for (a) classic Java and (b) Spring framework
 failed slave
  - database ping
  - blacklist ... don't use the slave
 failed master
  - database ping
  - blacklist
  - connect to another master (if possible)
  - if unsure whether the query has been executed then raise an Exception
  connection pool:
  - initialise
  - validate
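A sketch of what the driver initialisation looks like for the master-slaves case (host names and database are made up; exact mode keywords per the MariaDB Connector/J docs of the time):

```
jdbc:mariadb:replication://master1:3306,slave1:3306,slave2:3306/mydb
```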

12) Daniel (no slides)
 - community member
 - systemd patches
 - in 10.1 proper systemd support, with notify handling
 - in theory you don't need mysqld_safe anymore
 - Galera and systemd ... specifying the initial primary at startup?
 - socket activation ... provides the ability to listen on multiple interfaces


13) Axel Schwenke - Performance Engineer
 - Using Compiler Profiling
 - modern CPUs use a pipeline
 - CPUs are much faster than memory access
 - optimized for linear execution patterns (not branching)
 - superscalar architecture, branch prediction, speculative execution
 - attributes/hints for the compiler: this is a LIKELY branch or an UNLIKELY branch
 - gcc -fprofile-generate << records how often a function is called >> gives you an instrumented binary
 - gcc -fprofile-use << recompiles the binary using the collected profiles
 - this two-phase compile with an instrumented binary can be automated
 - pain points:
   - does your workload match the end-user workloads?
   - good results with standard benchmark tools (10% to 15%) .. less good with end-user tests (? %)
 - in most cases the error-checking branches don't get executed, so they don't need to be in the linear (hot) path
 - in theory it can perform worse (e.g. lots of errors in code / error paths) ... but they have never seen this

14) MariaDB in Docker (Kolbe@mariadb.com)
- existing image issues : root password as an environment variable
   - linked containers have access
   - container metadata
- 3 volumes
 - data directory on faster storage
 - unix socket
 - load data
- helper scripts
 - UID and GID of the mysql user ... easy chmod

15) Sergei Petrunia
- Engine Independent Table Statistics (EITS)
- Traditional MySQL: (1) #rows, (2) #rows in an index range, (3) 'index statistics' (imprecise)
   some issues
   - index statistics (imprecise)
   - not enough stats
   - joins need column statistics (join order on customer and supplier with moderately complicated where rules)
- Histograms (CBO)
- Engine Independent ... beyond the scope of this talk
- mysql.column_stats

set histogram_size=200; set use_stat_tables='preferably'; analyze table lineitem, orders;
- height- and width-balanced histograms


16) Monty Q&A
- instance upgrades for MariaDB?
- atomic writes standard for modern SSDs (not just FusionIO)
- one disc seek is now 100 executions of the ...
- 10.2 tuning: optimizing parameters for the hardware, like gathering system stats on Oracle
- plugins are potentially ... could they be sanitised by using a separate namespace?
- hybrid storage, for example : 0.5T RAM, 100T of SSD and the rest on spinning disk
    ZFS has this sort of storage model built in?
    working on a prototype with SanDisk
    memory mapping can handle 'atomic writes'?
    no easy APIs exist at the moment for this sort of call ... Monty is working with vendors
- ScaleDB is good for parallel execution... MariaDB is looking to leverage this sort of technology

Scaling Postgres Meetup at IBM



Summary

This was a *really interesting* meetup with two presentations:
  • one of the Postgres committers (Andres Freund) presented on the internals of Postgres, recent changes ("major recent scalability improvements (9.2, 9.5, 9.6) that have improved postgres' scalability massively") and some of the ongoing challenges
  • the second presentation, by Marco Slot (Citus Data), was on pg_shard ("seamlessly distribute a table across many servers for horizontal scale and replicate it for high availability")

Some of the highlights from the presentations
  • Discussion of UMA and NUMA architectures and changes to Postgres internals to work better with large modern Intel servers (multiple cores and sockets) … the key here is that you want to ensure memory access is local (i.e. the memory buffer is directly attached to that core/socket). The Postgres database is significantly different from Oracle, e.g. in Oracle the users belong to the database, whereas on a Postgres server you might be running multiple databases but with common users.
  • Improvements to the locking implementation to better match the NUMA architecture. Andres started with a quick overview of the different levels of locking within Postgres : level-one (internal spin lock .. very fast but minimal functionality), level-two (internal light-weight lock - some basic functionality: can be acquired in read mode by multiple threads, with error recovery) and level-three (heavy-weight lock - full functionality … the sort of locks a DBA inspects via pg_locks). Andres then went on to explain how moving the heavy-weight end-user locks from the global Postgres server level to the more local database level was a better fit for modern NUMA architecture. He also went through some results from pgbench tests he had run … basically his tests scaled pretty well up to several hundred concurrent sessions, and under extreme load the total throughput didn't significantly drop (as it did in earlier Postgres releases).
  • The pg_shard presentation was also interesting; this is a relatively young open-source project, but there was a nice demo running over multiple (4?) AWS instances… the sharding principles looked similar to me to Mongo's sharding.


There was also some good Q&A after the presentation
  • One of the current scalability limitations of Postgres is the buffer cache, which is still using the older "clock-sweep algorithm". Interestingly, there was a release where this algorithm was replaced with something more modern/efficient (the clock-sweep algorithm dates from 1979). Unfortunately this release had to be pulled as it turned out that it infringed on IBM patents :)
  • There was an interesting discussion during Q&A regarding whether you should have a buffer cache bigger than 8G (presumably due to the performance limitations of the clock-sweep algorithm). The response from Andres was that this is highly dependent on workload.
  • After the meetup I did a bit of googling regarding the clock-sweep algorithm and found some more details: "Inside the Buffer Cache" (see link below)
  • There was also some discussion of whether pgbouncer (connection pooling) should be moved into Postgres core

Links:




Lastly this is a photo of me making notes at this meetup:

A couple of useful looking resources I found after the presentation:

Notes (from presentation … so not necessarily 100% accurate)


Introduction


SplendidData - PostgresPURE: (i) own Postgres Distro and (ii) Advanced Migration (Oracle>Postgres)
Reiner Peterke (performance analysis tooling) … presentation on hold (as we have two speakers)

Andres Freund

Vertical Scaling
- 2005 - 2 cores (x86)
- 2015 - 18 cores + multiple sockets
- Why scale back on a single machine?

Architecture considerations:
a) UMA
- single bus between memory and CPU(s)
- simpler but doesn't scale nicely for modern x86 servers with multiple sockets, each with multiple cores
b) NUMA (Non-Uniform Memory Access)
- each CPU has own memory
- accessing local memory is cheap but accessing remote memory is more expensive

Postgres lock primer
l1 - spin lock … very fast to acquire, exclusive mode only, no queuing (cpu spins), …
l2 - light-weight lock … can be acquired in read mode by multiple threads, with error recovery
l3 - heavy-weight lock … as per pg_locks, only locks seen by enduser DBAs, more complex, error recovery, dynamic identities, deadlock checks



Acquiring locks
- can be a bottleneck in earlier versions of Postgres
- most locks don't conflict, so store lock details locally, which is better for NUMA architecture (aka "fastpath")

testing concurrent clients
- read-only pgbench, scale 300
- EC2 m4.8xlarge
- the new locking method is better for over 8 clients

perf top -az
- s_lock (spinlock acquire) taking 90% of CPU time

atomic operations
- https://en.wikipedia.org/wiki/Linearizability : ‘In concurrent programming, an operation (or set of operations) is atomic, linearizable, indivisible or uninterruptible if it appears to the rest of the system to occur instantaneously. Atomicity is a guarantee of isolation from concurrent processes.’)
- add & subtract, compare exchange
- 20 "+1" operations, don't want to lose any updates
- lwlock pgbench now scales better up to 64 clients / 400,000 TPS
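The "don't want to lose any updates" point can be illustrated with a compare-exchange retry loop (a Python sketch of the idea only; Postgres uses real CPU atomics, and the lock below merely stands in for the hardware cmpxchg instruction):

```python
import threading

class AtomicInt:
    """Toy atomic integer; the lock simulates an atomic
    compare-exchange instruction."""
    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def compare_exchange(self, expected, new):
        # atomically: if value is still what we read, install new value
        with self._lock:
            if self.value == expected:
                self.value = new
                return True
            return False

    def add(self, delta):
        # lock-free style retry loop: re-read and retry whenever
        # another writer got in between our read and our write
        while True:
            cur = self.value
            if self.compare_exchange(cur, cur + delta):
                return

counter = AtomicInt()

def worker():
    for _ in range(1000):
        counter.add(1)

threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads: t.start()
for t in threads: t.join()
print(counter.value)  # 20000: no updates lost
```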

levels of caching
- traditional discs are super slow
- even SSDs are much slower than main memory
- 8k pages by default
- Postgres has inefficient buffer replacement (the 'clock-sweep algorithm', based on a paper from 1979?)
- struct Buffer … usage count
- patent-infringing algorithm (IBM contacted them and it had to be pulled from a postgres release)
- the atomic buffer algorithm scales well up to 500 clients

not-yet-fixed issues
- extension locks … problematic for bulk-write workloads
- buffer algorithm … so far only made a bad algorithm better
- transaction isolation … maxes out at about 300 active concurrent sessions

q&a
- introducing pgbouncer into postgres (core)
- pgbouncer is a connection pooling utility that can be plugged on top of a PostgreSQL server. It can be used to limit the maximum number of connections on server side by managing a pool of idle connections that can be used by any applications. (http://michael.otacoo.com/postgresql-2/first-steps-with-pgbouncer-how-to-set-and-run-it/)


Scaling out … sharding
- Citus Data … pg_shard
- CloudFlare (CDN) … massive event logs
- PostgreSQL 9.3+
- pg_shard is good for nosql-type work patterns i.e. access path by PK
- shards are regular postgres tables

create extension pg_shard;
create table customer_reviews (
select master_create_distributed_table

AWS example: 4 worker nodes … each with multiple table shards

psql -h master-node-1   << sharding/hashtext details  (SPOF? can have multi-master config e.g. )
\d shows a single table

psql -h worker-node-1  << actual data
\d shows the local shards for the table

select * from pgs_distribution_metadata.shard << shard identifier
pg_shard uses hash partitioning by default

INSERT - shared lock on the shard
UPDATE/DELETE - exclusive lock

select avg(rating) from reviews
- as much as possible, push the compute down to the individual node/shard … which returns its local avg + a row count (weight)
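The push-down can be sketched as combining per-shard partial aggregates (the numbers are made up):

```python
# Each shard returns only its local (avg, row_count); the coordinator
# combines these into the exact global average without moving row data.
def combine_avgs(partials):
    total = sum(avg * n for avg, n in partials)  # reconstruct per-shard sums
    count = sum(n for _, n in partials)
    return total / count

# e.g. shard A: avg 4.0 over 2 rows, shard B: avg 2.0 over 6 rows
print(combine_avgs([(4.0, 2), (2.0, 6)]))  # 2.5
```

Only two numbers per shard cross the network, which is why pushing the compute down matters.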

\timing … inserts taking 2.5ms not 0.1ms
selecting on a range of PKs
- can be expensive (queries all shards)
- you can use range partitioning (typically by time)

No EXPLAIN for pg_shard queries
a small number of customers are using pg_shard