SmartHome Journal: July 7th 2020 - And MotionEye is back out. Again!

Well, it isn't back out yet. But, it is getting there.

While I loved the GPU based object detection, it did pose a real problem; it was too "heavy". It required a dedicated GPU. And, it even required an Nvidia GPU specifically (and technically a specific architecture if I didn't want to spend 2 hours rebuilding my Docker images). On top of which, I can't say I ever liked the notion of trying to run object detection on every single frame.

So, after I moved most of my SmartHome stuff over to a machine without a dedicated GPU I also switched back to an idea I had before. Mix motion and object detection. The objective is to run this in a stock Python container just by adding the opencv contrib libraries and my code for a small, efficient container.

The general idea is to be as frugal on processing power as possible. Right now it is just two passes: motion detection using contours and then, if that passes a threshold, the frame goes through a MobileNet-SSD. And, unless it detects something, it only processes the motion detection every 1.5s.

I still need to add in video recording. And, I would also like to add a third pass. I'd like to do an even simpler motion detection pass first which just looks for the amount of difference in subsequent frames. The contour based detection is good, but even at 1/3 FPS it still puts more drain on the CPU than I'd like. And I think the reason is all of the calculation it is doing.
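The simpler first pass could be as small as counting how many pixels changed between two frames. This is a rough sketch using only NumPy; the function names and the default thresholds are made up for illustration and would need tuning.

```python
import numpy as np


def changed_pixel_count(prev_gray, curr_gray, pixel_delta=25):
    """Count pixels whose brightness changed by more than pixel_delta."""
    # Widen to int16 first so uint8 subtraction cannot wrap around.
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return int(np.count_nonzero(diff > pixel_delta))


def worth_a_closer_look(prev_gray, curr_gray, min_changed=50, pixel_delta=25):
    """Cheap gate: only escalate to the contour pass if enough pixels changed."""
    return changed_pixel_count(prev_gray, curr_gray, pixel_delta) >= min_changed
```

There is no blur and no contour search here, just one subtraction and one count per frame pair, which is why it should be noticeably cheaper than the contour pass.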

Basically, processing every single frame, and especially attempting to do that in real time, is a fool's errand. Most events we want to capture move neither that fast, nor that slow. Take a person breaking into your house for example. They don't move so quickly that you should need to capture every single frame to capture their movement (incidentally, using 2 frames further apart is actually MORE likely to detect the movement). Nor do they move so slowly that you're generally worried about them evading motion detection.

So, we can save a LOT of energy, CPU cycles and other resources simply by ignoring most of the frames, most of the time. We can also save a lot of wasted effort by effectively sieving out the bad results through progressively more accurate passes.

Early results are promising though. With just the contour detection filtering frames out before object detection, my 8 year old i5 mobile processor, which is a weak, low power variant, bounces between just 10-20% CPU so long as there isn't anything relevant to detect.

I'd like to see it hover at 5% CPU when running @ 2 FPS during normal times. But I'm OK with spending the processing power on valid detections. But, when very little is actually transpiring I should be able to set a reasonable threshold and filter out most results with pre-filtering. Getting there would represent a 6-12x increase. So, I'm not sure if I can get there. But, I'm hopeful.

MotionEye puts my new media server at about 20-30% increased load perpetually. My media server should be more powerful than my laptop (but not by tons) and I'm OK with dropping the FPS to the 1/3 - 1/2 range. If I can keep the average below MotionEye's numbers at those values I'll still be happy.

And I really see no reason why I shouldn't be able to do so. I believe MotionEye is just doing a single pass on every single frame. And despite being reasonably simplistic, this one pass has a lot of logic baked in. So, I think that it ends up being costlier than it really needs to be while not yielding much in terms of accuracy.

And that is the problem I've had with every single detection system. They are riddled with either too many false positives, false negatives or in some cases... both somehow. And, this largely boils down to relying on a single pass.

Basic motion detection for instance needs a LOT of tweaking in anything but a totally controlled environment (lighting, background, etc...).

Contour detection starts to break down if the area being tracked covers a large area in 3 dimensions. For instance, a car in the driveway (close to the camera) and a car on the road (further away) are far enough apart that the sizes of the contours generated are drastically different. As are things like dogs or humans even at the same distance as a car. So it can be really difficult to choose a practical threshold which doesn't either filter everything out or assume everything is what you want.

And, object detection is simply costly while also not being perfect. But, most well established pre-trained nets are reasonable in terms of accuracy given reasonable conditions.

And from that, we can kind of arrive at a set of things we want to accomplish. We want to use object detectors because they have better accuracy. But, we want to rely on them as infrequently as possible.

My plan for getting there:
  1. Mask: ALL of my cameras contain at least some amount of the shot which I don't want to bother processing. Changes in these areas are purely noise and throw off other calculations needlessly. A mask is cheap to apply and should be included at all levels of image processing.
  2. Low FPS: When you're not in the midst of detection don't bother processing every frame.
  3. Delta: The quickest thing we can do, if we're not already in the process of recording motion, is to check if there is a significant difference between the last two frames. This will just be something simple like the total number of pixels changed being over a threshold and perhaps below some maximum. The values here should be low enough to detect the smallest thing you want to detect reliably.
  4. Contour: If you have enough motion to justify a deeper check, then do a more advanced comparison incorporating a Gaussian Blur and looking for contours above a certain size. This is basically looking for clusters of changes. Basic motion detection will throw a lot more false positives because a large number of isolated pixels can change due to lighting, etc... but by looking for contours you are less likely to be fooled this way. Again, you want a fairly aggressive threshold here. You don't want to miss valid results.
  5. Detect: Lastly, if you get this far, see if any of the objects that you're interested in from your pre-trained net can be detected in the image.
  6. Scale Back: I'll have to do some tests here, but once a detection is found, I can probably fall back to just using contours for detection rather than continuing to rely on object detection. At least, until the contour detection stops finding relevant data, and then it might be wise to reconfirm with the object detector.
With #6 in the mix, it could actually be possible to run object detection only twice for an entire detection event. Once to confirm it at the start and again to confirm there is nothing in the shot any longer.
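
Steps 1 and 3 through 5 boil down to a cascade: apply the mask, then run the checks cheapest first and bail out at the first one that says no. Here is a rough sketch of that shape. Every name in it (apply_mask, make_delta_stage, run_cascade) is hypothetical, the thresholds are placeholders, and the contour and detect stages are stand-in lambdas rather than real implementations.

```python
import numpy as np


def apply_mask(gray, mask):
    """Zero out every pixel where mask is False so it can never count as change."""
    return np.where(mask, gray, 0).astype(gray.dtype)


def make_delta_stage(min_changed, max_changed, pixel_delta=25):
    """Build the cheap first stage: is the changed-pixel count within bounds?"""
    def delta_stage(prev_gray, curr_gray):
        diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
        changed = int(np.count_nonzero(diff > pixel_delta))
        return min_changed <= changed <= max_changed
    return delta_stage


def run_cascade(prev_gray, curr_gray, mask, stages):
    """Run (name, check) pairs cheapest first and stop at the first rejection.

    Returns (names_of_stages_that_ran, passed_every_stage).
    """
    prev_masked = apply_mask(prev_gray, mask)
    curr_masked = apply_mask(curr_gray, mask)
    ran = []
    for name, check in stages:
        ran.append(name)
        if not check(prev_masked, curr_masked):
            return ran, False
    return ran, True


# Stand-ins for the expensive stages; a real setup would plug in a contour
# check and an object detector here.
stages = [
    ("delta", make_delta_stage(min_changed=50, max_changed=5000)),
    ("contour", lambda prev, curr: True),
    ("detect", lambda prev, curr: True),
]
```

The list of stage names that comes back is only there to make it easy to see how often the expensive stages actually get reached on a given camera.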

However, the worst case could be worse than just running object detection without any of these other measures. For instance, say you're detecting people and a person enters the frame and then stands perfectly still. The motion detectors would stop registering motion, so we would fall back to object detection for every frame on top of the motion detection.

But, a little bit of smarts and I think this can be salvaged. For instance, if the camera is in detection mode and we try to save cycles by checking for motion, find none, fall back to object detection and still find something detected, then, if we detect no new motion, we may be able to assume that whatever was there is still in frame and thus don't need to perform object detection again. Combine that with a sanity check every X frames and we can still avoid the costly object detection most of the time.
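
That logic is easier to reason about as a tiny state machine. The sketch below is one possible reading of it, not tested against real footage: EventTracker, recheck_every and confirmed_still are names invented here, the motion flag is assumed to come from the contour pass, and detect is any callable that returns True when the object is in frame. One simplification to be aware of: the every X frames sanity check only runs during quiet stretches, not while motion is ongoing.

```python
from enum import Enum


class Mode(Enum):
    IDLE = "idle"
    ACTIVE = "active"


class EventTracker:
    def __init__(self, detect, recheck_every=10):
        self.detect = detect                # expensive detector: frame -> bool
        self.recheck_every = recheck_every  # the "X frames" sanity interval
        self.mode = Mode.IDLE
        self.confirmed_still = False        # detector saw it after motion stopped
        self.frames_since_check = 0
        self.detector_calls = 0

    def _run_detector(self, frame):
        self.detector_calls += 1
        self.frames_since_check = 0
        return self.detect(frame)

    def step(self, frame, motion):
        """Feed one frame plus the motion flag; returns the resulting mode."""
        if self.mode is Mode.IDLE:
            # Only spend detector time when the motion pass fired.
            if motion and self._run_detector(frame):
                self.mode = Mode.ACTIVE
                self.confirmed_still = False
            return self.mode

        if motion:
            # Contours still see movement, so trust them and skip the detector.
            self.confirmed_still = False
            self.frames_since_check = 0
            return self.mode

        # No motion: reconfirm once, then only every recheck_every frames.
        self.frames_since_check += 1
        due = self.frames_since_check >= self.recheck_every
        if not self.confirmed_still or due:
            if self._run_detector(frame):
                self.confirmed_still = True
            else:
                self.mode = Mode.IDLE
                self.confirmed_still = False
        return self.mode
```

With this shape, a person who walks in and then stands still costs one detector call at the start, one when the motion stops, and then one per recheck_every frames until they leave.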

Anyway, just tossing out some ideas. I think it will work and work well. But, only time will tell.
