Sunday, October 20, 2019

awk stddev (standard deviation) and avg (average) per file


Input files


~/Downloads/logs $ ls
application-2019-10-19-21.log.gz application-2019-10-20-02.log.gz application-2019-10-20-07.log.gz application-2019-10-20-12.log.gz
application-2019-10-19-22.log.gz application-2019-10-20-03.log.gz application-2019-10-20-08.log.gz application-2019-10-20-13.log.gz
application-2019-10-19-23.log.gz application-2019-10-20-04.log.gz application-2019-10-20-09.log.gz application-2019-10-20-14.log.gz
application-2019-10-20-00.log.gz application-2019-10-20-05.log.gz application-2019-10-20-10.log.gz application-2019-10-20-15.log.gz
application-2019-10-20-01.log.gz application-2019-10-20-06.log.gz application-2019-10-20-11.log.gz application-2019-10-20-16.log


Sample data rows


~/Downloads/logs $ zgrep GameLoop application-2019-10-19-21.log.gz | head -3
info: GameLoop execution time: 271938 nanoseconds. {"timestamp":"2019-10-19 21:00:35"}
info: GameLoop execution time: 92681 nanoseconds. {"timestamp":"2019-10-19 21:00:45"}
info: GameLoop execution time: 125291 nanoseconds. {"timestamp":"2019-10-19 21:01:47"}
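
Each row is whitespace-delimited, so with awk's default field splitting the execution time is the fifth field (`$5`). A minimal check of that assumption, feeding the three sample rows above as inline input; the helper name `extract_ns` is made up for this illustration and is not part of the original script:

```shell
# Hypothetical helper: print only the 5th whitespace-separated field,
# which is the nanosecond value in the sample rows.
extract_ns() {
  awk '{ print $5 }'
}

# Feed the three sample rows through the helper.
printf '%s\n' \
  'info: GameLoop execution time: 271938 nanoseconds. {"timestamp":"2019-10-19 21:00:35"}' \
  'info: GameLoop execution time: 92681 nanoseconds. {"timestamp":"2019-10-19 21:00:45"}' \
  'info: GameLoop execution time: 125291 nanoseconds. {"timestamp":"2019-10-19 21:01:47"}' \
  | extract_ns
```

If the log format ever gains or loses a word before the number, the field index would need to change accordingly.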


awk-based script to calculate the average and standard deviation per file


~/Downloads/logs $ cat calculate_avg_and_stddev_per_file.sh
# https://stackoverflow.com/questions/18786073/compute-average-and-standard-deviation-with-awk
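# x = sum of $5 (nanoseconds), y = sum of squares; prints the mean and the population stddev
for i in application-2019-10-*.gz; do echo "$i"; zgrep GameLoop "$i" | awk '{x+=$5; y+=$5^2} END {if (NR) print x/NR " " sqrt(y/NR-(x/NR)^2)}'; done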
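
The one-liner uses the identity stddev = sqrt(mean of squares - square of mean), dividing by the number of rows, so it reports the population standard deviation rather than the sample one. The sketch below runs the same accumulation on a tiny made-up data set, next to Welford's one-pass update, a different technique that avoids subtracting two large, nearly equal numbers. The function names and the input values are assumptions for illustration only. For values of the size seen in these logs the simple formula is probably fine.

```shell
# Same accumulation as the script, reading the value from field 1:
# x = running sum, y = running sum of squares (population formula).
naive_stats() {
  awk '{ x += $1; y += $1^2 } END { if (NR) print x/NR, sqrt(y/NR - (x/NR)^2) }'
}

# Welford's one-pass update (an alternative, not the script's method):
# keeps a running mean and a running sum of squared deviations (m2).
welford_stats() {
  awk '{ d = $1 - mean; mean += d/NR; m2 += d*($1 - mean) }
       END { if (NR) print mean, sqrt(m2/NR) }'
}

# Made-up input: eight values with mean 5 and population stddev 2.
printf '%s\n' 2 4 4 4 5 5 7 9 | naive_stats
printf '%s\n' 2 4 4 4 5 5 7 9 | welford_stats
```

Both calls should print the same mean and standard deviation for this input. To get the sample standard deviation instead, the final division in `welford_stats` would be by `NR-1` (guarded for `NR > 1`).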


output:


~/Downloads/logs $ ./calculate_avg_and_stddev_per_file.sh
application-2019-10-19-21.log.gz
94922.9 54338.7
application-2019-10-19-22.log.gz
82895.5 30873.3
application-2019-10-19-23.log.gz
84054.2 29225
application-2019-10-20-00.log.gz
86505.4 26829.8
application-2019-10-20-01.log.gz
86339.9 29854.3
application-2019-10-20-02.log.gz

...