The value of data and of rapid updates
I don't think anyone would question the statement that "data is valuable" or that "it is valuable to be able to update often". But I think that, as developers, we often fail to act in a way that reflects how valuable these things are. And I am guilty of this as well.
I recently spoke about how I had written a system to keep a small business application in sync with a remote database. And I failed at the first part (valuing the data), but succeeded at the second (being able to update often).
The failure was caused by a few things. First, I was deleting the data I was being sent immediately after processing it. Second, while I had thoroughly tested the "base" case, I hadn't thoroughly tested an additional case I had added at the last minute: an older version of the software was still in use, which I had updated to write to the same sync table, but it was serializing things differently. And because I was deleting the data after processing, I could fix the issue, but the data that had already come in was gone, and I was reliant on new data coming in to validate the fix.
So, I quickly rewrote the processing end to preserve the data, flagging each record as processed instead of deleting it.
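To make that concrete, here is a minimal sketch of the flag-instead-of-delete approach, assuming a SQLite-backed sync table; the table, columns, and `apply_record` stub are hypothetical stand-ins rather than the project's actual code.

```python
import json
import sqlite3

# Minimal sketch: mark sync rows as processed instead of deleting them,
# so a bad deserializer can be fixed and the rows re-processed later.
conn = sqlite3.connect("sync.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS sync_queue (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,          -- serialized record from the client
        processed INTEGER DEFAULT 0,    -- 0 = pending, 1 = done, -1 = failed
        error TEXT
    )
""")

def process_pending():
    rows = conn.execute(
        "SELECT id, payload FROM sync_queue WHERE processed = 0"
    ).fetchall()
    for row_id, payload in rows:
        try:
            record = json.loads(payload)   # stand-in for the real deserialization
            apply_record(record)           # stand-in for the real processing step
            conn.execute(
                "UPDATE sync_queue SET processed = 1 WHERE id = ?", (row_id,)
            )
        except Exception as exc:
            # Keep the payload; just flag it so it can be replayed after a fix.
            conn.execute(
                "UPDATE sync_queue SET processed = -1, error = ? WHERE id = ?",
                (str(exc), row_id),
            )
    conn.commit()

def apply_record(record):
    ...  # whatever the application actually does with a synced record
```

The point is simply that the raw payload outlives the processing step, so a serialization bug costs a re-run rather than the data itself.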
Thankfully, I had deployed the solution with Docker and Watchtower. So I was able to quickly build a new version, push the new image and have the server pick it up and re-send the missing data. Now I have a wealth of data. Of course, from the beginning I should have been both logging more data and preserving it until I could be certain that the solution operated as expected.
Now, this seems like a rookie mistake. But then, most problems seem that way after the root cause is discovered, and it is the sort of oversight that wouldn't be noteworthy had the process been working. There are likely many other places in my code where things are working just fine at the moment, but aren't logging nearly enough to recover the data and intended state should something go wrong.
Now, to take this off on a different tangent, this is something that draws me to event-driven architecture. In event-driven architecture it is quite easy to define what is valuable to capture and log: it is the events. And the events tend to be serialized already, so you simply store that data. From there it is typically trivial to "replay" the data in the future.
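As a rough illustration of what that storage can look like, here is a small append-only event log with a replay helper, again using SQLite purely as an example; the `event_log` schema and function names are assumptions for the sketch, not anything from the real project.

```python
import json
import sqlite3
import time

# Sketch of an append-only event log: every event is stored as-is when it is
# published, which makes replaying it later a simple SELECT.
conn = sqlite3.connect("events.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS event_log (
        id INTEGER PRIMARY KEY,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,      -- the already-serialized event
        created_at REAL NOT NULL
    )
""")

def publish(event_type, payload: dict):
    # Persist first, then hand off to whatever handles the event.
    conn.execute(
        "INSERT INTO event_log (event_type, payload, created_at) VALUES (?, ?, ?)",
        (event_type, json.dumps(payload), time.time()),
    )
    conn.commit()

def replay(handler, event_type=None):
    # Re-run every stored event (optionally filtered by type) through a handler.
    query = "SELECT event_type, payload FROM event_log"
    params = ()
    if event_type is not None:
        query += " WHERE event_type = ?"
        params = (event_type,)
    for etype, payload in conn.execute(query + " ORDER BY id", params):
        handler(etype, json.loads(payload))

publish("customer.updated", {"id": 42})
replay(lambda etype, data: print(etype, data))
```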
This isn't a bullet-proof solution. The logging itself can be a weak point, and you can always run into things later that were missing from the event data you were sending. The problem with logging is always what you aren't already logging or what fails to get logged. But in a fully event-driven system you are generally forced to put a lot of thought into what is published.
My next iteration of this project will likely be to convert it into more of an event-driven system.
And yes, it seems like I am simply taking this down another rabbit hole instead of veering back to the original topic, but that is because event-driven architecture can actually resolve a lot of the underlying problems in a much better fashion.
For instance, when I discovered the need to sync data from a legacy system, rather than trying to shoehorn it into an existing solution I would have simply published those events to a new queue backed by some persistent storage. Then I would have had the data synced and been able to deal with it at my leisure. If I discovered I was missing something, I could simply publish an event back to the host telling it to give me more/better data. If the host didn't know how to process that event? Fine, push out a new build. Watchtower will update it, and then it will start handling them.
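A sketch of that last idea, with made-up event types and handlers: the consumer dispatches the events it recognizes and parks the ones it doesn't, so a newer build that understands them can pick them up later.

```python
import json
import queue

# Events the current build knows how to handle; a future build may know more.
HANDLERS = {
    "customer.updated": lambda data: print("sync customer", data),
    # "inventory.adjusted" might only exist in the next build.
}

def consume(incoming: "queue.Queue", deferred: "queue.Queue"):
    while not incoming.empty():
        raw = incoming.get()
        event = json.loads(raw)
        handler = HANDLERS.get(event["type"])
        if handler is None:
            # Don't drop it; park it until a build that understands it is deployed.
            deferred.put(raw)
        else:
            handler(event["data"])

incoming, deferred = queue.Queue(), queue.Queue()
incoming.put(json.dumps({"type": "customer.updated", "data": {"id": 42}}))
incoming.put(json.dumps({"type": "inventory.adjusted", "data": {"sku": "A1"}}))
consume(incoming, deferred)
print(deferred.qsize(), "event(s) waiting for a newer build")
```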
Basically, events are a built-in extension point for spawning off new features. They are also a logical place to inject logging or telemetry gathering.
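For example, something as simple as wrapping the publish function gives you a single choke point for counting and timing events without touching any handlers; the names here are illustrative, not from the actual project.

```python
import collections
import functools
import time

# Use the event boundary as a telemetry hook: every published event is
# counted and timed in one place.
event_counts = collections.Counter()

def with_telemetry(publish_fn):
    @functools.wraps(publish_fn)
    def wrapper(event_type, payload):
        start = time.perf_counter()
        result = publish_fn(event_type, payload)
        event_counts[event_type] += 1
        print(f"{event_type} published in {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

@with_telemetry
def publish(event_type, payload):
    ...  # forward to the queue / event log

publish("customer.updated", {"id": 42})
```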
Like anything, you need to be careful that you're not diving off the deep end and integrating it everywhere without evaluating the costs. Take, for instance, the Amazon example where they swore off microservices. If you simply replaced microservices with event-driven design in that situation, it would not have been much better, even if the result were otherwise a monolith. That is because funneling everything through message queues adds the same sort of complexity and overhead as funneling everything through serverless functions.
But along boundaries where you are already doing asynchronous work and the overhead is acceptable, passing data to a message queue rather than straight to a DB or a consumer makes more sense. You can introduce more consumers, or even re-route to different consumers if some translation/transformation needs to be done first. And there are a lot of very quick and powerful systems out there to manage message queues (or you can build one yourself).
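A toy in-process version of that routing, just to show the shape of it; a real deployment would sit behind a broker, and every name below is made up.

```python
from collections import defaultdict

# Consumers can be added or swapped without changing the producer.
subscribers = defaultdict(list)

def subscribe(topic, consumer):
    subscribers[topic].append(consumer)

def publish(topic, message):
    for consumer in subscribers[topic]:
        consumer(message)

# Original consumer writes straight to the database.
subscribe("orders", lambda msg: print("write to DB:", msg))

# Later, a transformation step can be added without touching the producer:
# normalize the legacy format, then re-route it to the existing topic.
subscribe("legacy-orders", lambda msg: publish("orders", {"id": msg["order_id"]}))

publish("legacy-orders", {"order_id": 7})
```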