04 Mar 2017
The theme of the week has been geography. I have been listening to Startup CEO, in which Matt Blumberg argues strongly that keeping a central office for as long as you can is important for human connection, serendipitous conversations and culture. This is in contrast to virtual or home working. How connected are we to physical geography while, at the same time, being connected to a virtual world?
(Read more...)
25 Feb 2017
What sort of question is that? I don’t know about you, but I feel privileged and happy that I enjoy what I do at work, and would even go as far as to call it fun.
I want to be remembered for making work fun for everyone.
(Read more...)
18 Feb 2017
I ran into a couple of problems this week involving security groups.
A security group acts like a virtual firewall on your instance. It controls what traffic enters and leaves and is attached to an instance on start.
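To make the firewall idea concrete, here is a minimal sketch that models a security group as a list of allow rules, with anything unmatched being dropped. The Rule class, the is_allowed helper, and the example ports and address ranges are all hypothetical and only illustrate the matching logic; this is not how any cloud provider implements it.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    """One hypothetical allow rule: protocol, inclusive port range, source CIDR."""
    protocol: str
    from_port: int
    to_port: int
    cidr: str


def is_allowed(rules, protocol, port, source_ip):
    """Return True if any rule permits the traffic; unmatched traffic is denied."""
    addr = ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.from_port <= port <= rule.to_port
        and addr in ip_network(rule.cidr)
        for rule in rules
    )


# Illustrative inbound rules: SSH from anywhere, MongoDB only from a private range.
inbound = [
    Rule("tcp", 22, 22, "0.0.0.0/0"),
    Rule("tcp", 27017, 27017, "10.0.0.0/16"),
]

print(is_allowed(inbound, "tcp", 27017, "10.0.3.7"))     # True
print(is_allowed(inbound, "tcp", 27017, "203.0.113.9"))  # False
```

The point of the sketch is that there is no explicit deny rule: traffic gets through only when some rule matches it, which is why a missing rule shows up as a silent connection failure.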
(Read more...)
11 Feb 2017
The challenge this week was to find out why authentication appeared to be broken on the automated MongoDB build.
Several weeks ago I had written a Puppet module to build a MongoDB cluster using a number of arguments, such as the number of nodes, node names, certificates, etc.
Despite the certificates being generated from a CA (Certificate Authority), and the client presenting its certificate to log on, this user could do anything, and .auth() was not needed.
mongo admin --ssl --sslCAFile /etc/mongodb/ssl/mongoCA.pem \
--sslPEMKeyFile /etc/mongodb/ssl/mongo1.pem \
-u mongoReadony -p mongotest --host mongo1
In the /etc/mongod.conf file, security.clusterAuthMode: x509 was set, but security.authorization was disabled.
It was assumed that specifying net.ssl.mode was enough and the security.authorization setting would be ignored.
Sorry, false assumption.
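As a rough sketch, the relevant part of /etc/mongod.conf would look something like the fragment below once authorization is switched on. The certificate paths are the ones from the mongo command above; the net.ssl.mode value of requireSSL is an assumption, since the excerpt does not say which mode was used.

```yaml
# /etc/mongod.conf (sketch only)
net:
  ssl:
    mode: requireSSL                          # assumed value; this only secures the transport
    PEMKeyFile: /etc/mongodb/ssl/mongo1.pem
    CAFile: /etc/mongodb/ssl/mongoCA.pem
security:
  clusterAuthMode: x509                       # authentication between cluster members
  authorization: enabled                      # enforces role-based access control for clients
```

The SSL settings and clusterAuthMode cover encryption and member-to-member authentication; it is security.authorization that limits what a logged-on user is allowed to do.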
(Read more...)
02 Feb 2017
Another incident of a tired admin fixing an outage only to cause a bigger one isn't news as such. However, I have to hand it to GitLab for their open honesty about this week's incident.
After a spam storm created serious (4GB) replication lag on the firm's PostgreSQL database cluster, a very tired on-call team member, trying to fix the replication, deleted the data folder on the active server rather than on the replicating one.
The full incident is documented here
I embrace the honesty that they have shown, as it enables the whole community to learn from the incident and offer better services to our clients. This is very much the message in Black Box Thinking by Matthew Syed.
Matthew describes the difference between closed cultures, where mistakes are hidden, and open, honest cultures, where mistakes are shared and much learning and prevention occurs as a result.
As shown by the support on Twitter, DevOps and cloud reliability engineers agree.
Lessons so far? Test your backups; you never know when you will really need them.
My ethos is that servers are disposable, so I love destroying and rebuilding them to prove that the service can be restored in any Disaster Recovery situation.
This relies on well-designed recovery processes and code, and it shifts the focus from avoiding failure to embracing failure and reducing the mean time to recovery.
(Read more...)