Architecture Design: Cheap Edge Computing

This is just a post to spitball some thoughts.

I want to move this application to the Edge. I have a hosted DB solution and a local DB solution. The local one is MS SQL and the hosted one is Postgres, so I have a lot more testing to do. I'm using NHibernate, which supports both, so, assuming the testing goes well, I will have a fairly simple interface for talking to either of them. If not, I'll likely need to create my own abstraction, but that is no big deal.

I also did some quick performance testing. The hosted solution isn't bad. A cold start of the endpoint plus a cold start of the Session Factory and then a query takes a bit over 500ms, and that gets about 100ms quicker once the endpoint is warmed up. A simple query takes about 50ms once the session factory is warmed up.

This means a query is about 10x slower (roughly 500ms versus 50ms) when nothing is pre-warmed.
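As a quick sanity check on that ratio, here is the arithmetic with the rough figures above. The numbers are approximations from a quick test, not a benchmark, and the "warm endpoint, cold factory" case assumes the 100ms saving applies while the session factory is still cold.

```python
# Rough timings from the quick test, in milliseconds (approximate).
cold_everything_ms = 500                     # cold endpoint + cold session factory + query
warm_endpoint_ms = cold_everything_ms - 100  # assumed: endpoint warm, factory still cold
warm_query_ms = 50                           # simple query once the factory is warm

# How many times slower each case is compared with a fully warmed query.
slowdown_cold = cold_everything_ms / warm_query_ms
slowdown_warm_endpoint = warm_endpoint_ms / warm_query_ms

print(f"fully cold: {slowdown_cold:.0f}x slower")
print(f"warm endpoint, cold factory: {slowdown_warm_endpoint:.0f}x slower")
```

Most of the penalty comes from the session factory rather than the endpoint, which is why it matters that the factory's cold start can't really be avoided in a serverless setup.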

That is "bad" for the serverless approach. I mean, those numbers are reasonable enough for this particular use case, but the cold start for the session factory wouldn't really be avoidable in a serverless scenario. Also, making it serverless would be a bit of a pain in and of itself.

Another thought is to run the API Gateway in the cloud and have it run regular health checks. If the primary server goes down, then it could make an API call to a cloud host to spin up the normal, full-blown service and have it service requests until the primary comes back online. Then sync the data and take down the backup.
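A rough sketch of that failover logic, in Python for brevity. Everything here is hypothetical: the class name, the three-failure threshold, and the injected `probe`, `start_backup` and `stop_backup` callables, which stand in for a real HTTP health check and the cloud host's API. The data sync that has to happen before failing back is left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FailoverRouter:
    """Picks the upstream for the gateway based on health-check results."""

    primary_url: str
    backup_url: str
    probe: Callable[[str], bool]        # returns True if the URL is healthy
    start_backup: Callable[[], None]    # stand-in for the cloud host API call
    stop_backup: Callable[[], None]
    failure_threshold: int = 3          # assumed: consecutive failures before failover
    using_backup: bool = field(default=False, init=False)
    _failures: int = field(default=0, init=False)

    def check(self) -> None:
        """Run one health check and fail over or fail back if needed."""
        if self.probe(self.primary_url):
            self._failures = 0
            if self.using_backup:
                # Primary is back. A real version would sync data first.
                self.stop_backup()
                self.using_backup = False
            return
        self._failures += 1
        if not self.using_backup and self._failures >= self.failure_threshold:
            self.start_backup()
            self.using_backup = True

    def upstream(self) -> str:
        """URL the gateway should forward requests to right now."""
        return self.backup_url if self.using_backup else self.primary_url
```

Requiring several consecutive failures before spinning anything up keeps a single dropped health check from triggering a failover.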

The data sync is the ugly part.

One possibility there would be to create a connection broker. When the primary service starts up, it could check whether the replica has generated records that it has not yet synced. If so, it could connect to the replica's database until it is back in sync and then switch to its own connection once ready.
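A minimal sketch of such a broker, in Python for brevity. The names are hypothetical, plain strings stand in for what would be NHibernate session factories in the real service, and `pending_replica_syncs` is an assumed callable that reports how many replica-generated records are still unsynced.

```python
from typing import Callable


class ConnectionBroker:
    """Hands out the replica's connection until the local DB has caught up."""

    def __init__(
        self,
        local_conn: str,
        replica_conn: str,
        pending_replica_syncs: Callable[[], int],
    ) -> None:
        self._local = local_conn
        self._replica = replica_conn
        self._pending = pending_replica_syncs
        self._caught_up = False

    def refresh(self) -> None:
        """Meant to be called by a background task after each sync pass."""
        if self._pending() == 0:
            # Sticky on purpose: once caught up, stay on the local connection.
            self._caught_up = True

    def connection(self) -> str:
        """Connection callers should use for their next unit of work."""
        return self._local if self._caught_up else self._replica
```

Callers would ask the broker for a connection per unit of work rather than holding on to one, so the switch takes effect without a restart.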

The existing sync process would need to know to ignore Sync records generated from its own "cluster" of services. Shouldn't be too much work. And then all other services run on the primary until a fault is detected.
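That filter could be as small as the following sketch. The `origin_cluster` field is an assumption; the existing Sync records may identify their source differently.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SyncRecord:
    record_id: int
    origin_cluster: str   # hypothetical field naming the cluster that wrote it
    payload: str


def records_to_apply(records: Iterable[SyncRecord], own_cluster: str) -> List[SyncRecord]:
    """Drop records this cluster produced itself, keeping the original order."""
    return [r for r in records if r.origin_cluster != own_cluster]
```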

Perhaps a better option for the Sync would be to have it sync to a persistent topic automatically distributed to separate queues for each system requiring a sync. Then, when the primary comes back online it doesn't need to filter. It just needs to keep consuming from its Sync Queue until it is ready. This could use the current Mongo Atlas DB perhaps. Having a separate table/DB for each system, separate from their internal queues, would be a lot cleaner than the current implementation.
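An in-memory sketch of that fan-out, to show why the consumer no longer needs a filter. The class and method names are hypothetical, and a real version would persist each queue (for example as a collection per system) instead of using a `deque`. Skipping the publisher's own queue at publish time is an assumption about how the topic would be wired.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator


class SyncTopic:
    """Copies each published record into one queue per subscribed system."""

    def __init__(self, subscribers: Iterable[str]) -> None:
        self._queues: Dict[str, Deque[str]] = {name: deque() for name in subscribers}

    def publish(self, origin: str, record: str) -> None:
        # The origin never receives its own record, so consumers need no filter.
        for name, queue in self._queues.items():
            if name != origin:
                queue.append(record)

    def drain(self, name: str) -> Iterator[str]:
        """Yield and remove everything waiting for one system, oldest first."""
        queue = self._queues[name]
        while queue:
            yield queue.popleft()
```

On startup the primary would keep calling `drain` on its own queue until it comes back empty, then consider itself in sync.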

Luckily the API Gateway is the lightest weight component in the whole thing and should remain that way even after adding this logic. As such, it should be able to get away with one of those fly.io free tier machines.

So, I think that the plan is:

  • Sign up for fly.io and figure out how to deploy a docker image from my private registry.
  • Add functionality to the API Gateway to perform health checks against pre-defined endpoints.
  • Add functionality to the API Gateway to be able to switch between endpoints based on health.
  • Determine where/how I will log these events as I will definitely want to know when it fails over to the cloud so that I can be sure that everything went as expected.
  • Write a library to help manage fly.io (or another vendor) to spin up containers when the primary goes down.
  • Ensure that new containers are only spun up during business hours and that they are taken down at night and back up in the morning if the primary is still unavailable.
  • Replace direct access to the ISessionFactory with a broker which will return the replica's DB connection when there are unsynced records generated by the replica to process.
    • This will likely consist of both a broker and a background task
    • Modify Sync logic to write to a MongoDB instance specific to the sync targets for that system.
    • Modify logic reading syncs to process from that external queue
  • Validate this list of tasks again later
  • Test changes with multiple local deployments
  • Deploy test instance

A bigger list of tasks than I wanted, but there are some interesting challenges here. I'll need to think on this a bit and see if anything can be better architected.
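The business-hours rule from the plan is easy to pin down in code. This is a sketch under assumptions: the 08:00 to 18:00 window and the weekday-only rule are placeholders, and `now` is expected to already be in the business's local time.

```python
from datetime import datetime, time

# Assumed business hours; adjust to the real schedule.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def backup_should_run(now: datetime, primary_healthy: bool) -> bool:
    """True if the cloud backup container should be up at this moment."""
    if primary_healthy:
        return False
    is_weekday = now.weekday() < 5   # Monday is 0, Saturday is 5
    return is_weekday and BUSINESS_START <= now.time() < BUSINESS_END
```

Evaluating this on every health-check tick gives the "down at night, back up in the morning" behaviour without a separate scheduler.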

Also, the broader objective here is in line with what I think is the future of the industry: leverage the cloud only as far as is necessary to provide redundancy and to help maintain state. Run as few services as possible in the public cloud, as cheaply and efficiently as possible; use it for safe, encrypted storage of data that can safely live there; and run everything else on-prem or in a private cloud.

It also means leveraging a lot of different solutions for some of the same problems, another thing I have advocated for. For instance, I have hosted Mongo and Postgres databases in addition to a local MS SQL instance.

Why? It's a matter of cost and what suits the particular use cases the best. Mongo is great for my Syncs and Logging/Event data. The document storage is more flexible, reliable and performant and suits those needs. Postgres is a relational database that works with NHibernate, and I primarily need it for redundancy. Since it is much cheaper to find a host for Postgres than for MS SQL, it is my current choice for that. In fact, both hosted DB solutions should be free at my scale.

But I don't want to get too deep in the weeds here; this post was primarily to collect my thoughts.
