Programming in the pandemic - Coralogix: Don't run with scissors, do run with a runbook
With only a proportion of developers classified as key workers (where their responsibilities perhaps included the operations side of keeping mission-critical and life-critical systems up and online), the majority of programmers will have been forced to work remotely, often in solitude.
So how have the fallout effects of this played out?
This post is written by Ariel Assaraf, the CEO of Coralogix – the company is known for its platform that provides teams with metrics, log monitoring, and security so that they can build, run, and secure their applications and infrastructures without any coverage, storage, or cost limitations.
Assaraf writes as follows…
Transitioning to working from home while simultaneously growing our engineering team has introduced many new challenges for us. Back in February 2020, we understood the direction things were headed and started researching and trying different things that could help us quickly adjust.
With feedback from our teams, we learned that one of the biggest challenges we faced was how to support communication and collaboration in a natural format that doesn’t require too much extra effort from our engineers.
Day-to-day communication
We’ve found that voice channels are the most natural way to communicate. Adding Discord to our tool stack, so that people can ‘walk into a room’ and talk as if we were in the office, was exactly what we needed.
It’s helped us maintain a sense of community in our day-to-day work. Just like in the office, where we never locked doors or closed the blinds during meetings, anyone can see who is meeting in our virtual conference rooms.
We set up specific channels for communication on different topics like production issues and support calls. This way everyone knows exactly where to go to discuss different tasks, projects, and issues. Our pair programming and code reviews are also done on Discord using the live screen share. Without the casual in-office encounters, having processes in place and regular check-ins is crucial. We also set up a virtual ‘break room’ to encourage those informal meetings between people.
Still, it’s really important to us that the leaders and managers in the company are touching base with their team daily and that everyone in the company is in sync regarding vision and goals.
Updated tech stack
The part of team collaboration most damaged by WFH is probably the feedback loop for developers. With less interaction and daily contact, we initially saw significantly longer cycles and more bottlenecks between R&D and operations.
To mitigate this issue, we made some updates to how we collaborate as well as to our tech stack.
First, we expanded our monitoring coverage, adding logs, metrics, and security coverage to increase our observability. Then, we added more development and staging environments to allow for shorter, faster feedback loops and testing, and to reduce bottlenecks and dependency on the platform teams.
Finally, we doubled down on documentation. We have always used Confluence, but usage increased dramatically. We use it to manage feature documentation, both internal and external, and to create detailed runbooks for every incident.
Along with our well-defined on-call rotation and regular knowledge transfers on handover, this has dramatically improved our entire workflow.
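To make the idea of an on-call rotation with an explicit handover concrete, here is a minimal Python sketch. It is an illustration only, not a description of the tooling used at Coralogix: the names `Shift`, `build_rotation`, and `on_call`, the round-robin ordering, and the sample engineers are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Shift:
    engineer: str      # who is on call (hypothetical name)
    start: date        # first day of the shift
    end: date          # last day of the shift, inclusive
    handover_to: str   # who receives the knowledge transfer afterwards


def build_rotation(engineers, first_day, shift_days, shifts):
    """Assign engineers to consecutive shifts in round-robin order."""
    if not engineers:
        raise ValueError("at least one engineer is required")
    schedule = []
    for i in range(shifts):
        start = first_day + timedelta(days=i * shift_days)
        end = start + timedelta(days=shift_days - 1)
        schedule.append(
            Shift(
                engineer=engineers[i % len(engineers)],
                start=start,
                end=end,
                # The next engineer in the list takes the handover.
                handover_to=engineers[(i + 1) % len(engineers)],
            )
        )
    return schedule


def on_call(schedule, day):
    """Return the shift covering `day`, or None if no shift covers it."""
    for shift in schedule:
        if shift.start <= day <= shift.end:
            return shift
    return None


# Example: three engineers, weekly shifts, four shifts in total.
rota = build_rotation(["dana", "eli", "noa"], date(2020, 3, 2), 7, 4)
current = on_call(rota, date(2020, 3, 10))
print(current.engineer, "hands over to", current.handover_to)
```

Recording the handover target alongside each shift is one simple way to make the knowledge transfer a visible step in the schedule and not an afterthought.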
Improved security
WFH increases risks in several areas, whether it’s a potential weakness in a feature we release, or the exposure we get due to the usage of many new tools. To make sure we are in control, we have added several new layers of security such as:
- The Coralogix SIEM/IDS solution for our cloud environments – To get full security observability, monitoring, and forensics.
- Panorays for 3rd party vulnerability scans – To manage communication with 3rd parties and scan for possible vulnerabilities introduced by 3rd parties.
- Salt Security for API protection – To get automatic alerts whenever one of our APIs is being attacked or exploited.
- Teleport for production access – To keep access fully audited and encrypted.
- More frequent penetration tests and security design reviews.
People and hiring
With the accelerated digitalisation during Covid-19, we had to scale our production and more than double our engineering teams. To support that growth, and the team leaders who are tasked with hiring new engineers, we started using a tool called Comeet. Using Comeet, we can view all relevant information in a single dashboard, from the moment a candidate is found or leaves their details on our career page up to the moment they receive an offer.
Of course, the other side to that is making sure that we’re living up to our commitment to our employees’ happiness and wellbeing throughout these challenging times. The first part of that was to continue holding happy hours when possible, to give people an opportunity for socializing and community even when we can’t meet in person. We also provided a Headspace subscription to all employees and encouraged them to participate in their own communities, with volunteering enabled through Vee.