The Neuralink Challenge - Or a Lesson in How to Make Everyone Wrong

So, my first thought on this was: "that is stupid", but I try to remain open-minded, and when I saw this video from Theo I went down a rabbit hole.

Everyone is wrong. Neuralink is wrong. The comp-sci community is wrong. Theo and this poster on X are wrong. To be fair... everyone is also right about certain things. That isn't as interesting to write about though.

Let's start with Neuralink and this challenge. The challenge is, as provided, unsolvable. 

And that challenge is to losslessly encode data, similar to the supplied data, to 1/200th of its original size, and to do so quickly and efficiently. Lossless compression relies on finding patterns and then being able to represent those patterns, and where they occur, in less space than the original input.

For example, if I had a file containing XXXYYYZZZ, it takes up 9 characters. I notice a pattern: each character repeats 3 times. If this pattern holds for all such files, I can simply have the encoder output this as XYZ and have the decoder write back each character it sees 3 times. This, of course, only works if every file we encode follows this absurd convention. Some schemes get around such simple static rules by encoding a legend of sorts within the headers of the file.
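To make the toy concrete, here's a minimal sketch of that convention in Python. The encode/decode pair only works on inputs that follow the "every character repeats three times" rule, which is exactly the point.

```python
# Toy "compressor" for the XXXYYYZZZ example above: the rule is baked into the
# algorithm rather than discovered from the data, so it only works on inputs
# that follow the convention.

def encode(text: str) -> str:
    # Assumes every character repeats exactly 3 times; keep one copy of each run.
    assert len(text) % 3 == 0 and all(
        text[i] == text[i + 1] == text[i + 2] for i in range(0, len(text), 3)
    ), "input does not follow the 'repeat 3 times' convention"
    return text[::3]

def decode(encoded: str) -> str:
    # Reverse the rule: write each character back out three times.
    return "".join(ch * 3 for ch in encoded)

original = "XXXYYYZZZ"
packed = encode(original)          # "XYZ" -- a third of the size
assert decode(packed) == original  # lossless, but only for inputs like this
```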

The point is, for lossless compression you're looking for different types of patterns in the data, because patterns can be replaced with a shorthand for that pattern and make the overall file shorter.

The problem with the Neuralink challenge is that the data appears to be mostly static. Which is to say it is closer to random than it is to well-structured data. Random data tends not to exhibit large blocks of repeating data, which means the patterns that do exist are shorter and repeat less often. When that happens, it can be impossible to reliably come up with alternate notations for the data which are also shorter than the original data.
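If you want a rough feel for this, one crude check is the byte-level Shannon entropy of a file, which puts a ceiling on what a simple byte-by-byte coder can achieve. The file name below is a placeholder, and real compressors can exploit longer-range structure, so treat this as an indicator rather than a proof.

```python
# Rough sketch: estimate the byte-level Shannon entropy of a file as a crude
# indicator of how compressible it is with generic, pattern-based methods.
import math
from collections import Counter

def byte_entropy_bits(path: str) -> float:
    data = open(path, "rb").read()
    counts = Counter(data)
    total = len(data)
    # Shannon entropy in bits per byte: -sum(p * log2(p))
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

h = byte_entropy_bits("sample_recording.wav")  # hypothetical file name
# 8 / h is (very roughly) the best ratio a simple byte-level coder could hope for.
# Noise-like data sits near 8 bits/byte, i.e. a ratio near 1x -- nowhere near 200x.
print(f"{h:.2f} bits per byte, naive ceiling ~{8 / h:.1f}x")
```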

BUT... even if you COULD detect sufficient patterns, there is also a problem with the nature of the data. The data all comes from a single patient, in a single session, performing a single task. Any patterns which exist in the data could be biases from that patient's brain, the type of activity, or even just the way the brain was responding that day. In short, while incredibly verbose, it is still a sample set of 1. Which is statistical rubbish. If anyone completed the challenge, their algorithm would likely fail on the very next data set.

To make my point, a trivial solution would be to write an algorithm which uses the original files as references. When encoding, it checks whether the input is one of those files and writes the index of that file, followed by a delimiter, to the output. On decoding, it chunks the file based on the delimiter and uses the reference files to reproduce the originals. It would be an absolutely useless compression algorithm, as it would only work on this exact set of files. The entire individual files ARE the pattern it is looking for. And while it is obviously the worst case possible for specialized compression algorithms, I'm not sure anything which could reach 200x compression for this data would actually perform much better.
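For illustration only, a sketch of that useless algorithm might look something like this; the file names are hypothetical stand-ins for the challenge data.

```python
# The facetious "compressor" described above: the original files ARE the dictionary.
# It achieves an absurd ratio on exactly these files and fails on anything else.
REFERENCE_FILES = ["recording_01.wav", "recording_02.wav", "recording_03.wav"]
REFERENCE_DATA = [open(path, "rb").read() for path in REFERENCE_FILES]

def encode(blobs: list[bytes]) -> bytes:
    # Each known file collapses to the index of its reference, comma-delimited.
    # Anything not in the reference set raises ValueError -- i.e. it's useless.
    return b",".join(str(REFERENCE_DATA.index(blob)).encode() for blob in blobs)

def decode(packed: bytes) -> list[bytes]:
    # Chunk on the delimiter and look each index back up to rebuild the originals.
    return [REFERENCE_DATA[int(token)] for token in packed.split(b",")]
```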

I'm not saying that their task [as stated] is impossible; it's that the framing is wrong. What they are actually implying they want, on the other hand, may well be impossible.

Now, because their ask is so absurd... I cannot fault the claims that the comp-sci community is wrong for sticking to the rigid definitions. After all, my examples and explanation above are a fairly rudimentary comp-sci take on the challenge. I would expect most people with a background in comp-sci to come to the same conclusion. So, in a sense, you're wrong as well.

Next, we move on to Theo and friends. I think that their argument is the next logical path: choose an alternate definition for lossless. And, I agree, the data they are producing is likely MUCH more usable in general. I have two problems though. First, it fails the verification. Semantics? Absolutely, given the state of this shit show, but it should still make you a bit less cocky about your response. Second... even taking this broader definition of "lossless"... it is STILL being used incorrectly. No amount of context can fix it.

Now, let's dive deeper into why this alternate use of the word "lossless" is still incorrect. This is the realm of signal processing, but also statistics and probability. In a mathematical sense, noise is any data which doesn't belong in the set. This is normally caused by faulty instrumentation or readings. Scientists often omit extreme values as anomalies, or because it makes the data more readable, but it is always important to understand the rationale and the impact on data reliability. Realistically, ANY noise reduction POTENTIALLY eliminates valid data points.
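As a tiny illustration of that trade-off, here is what a naive "drop the extreme values" step might look like. The 3-sigma threshold is an arbitrary choice on my part, and the genuine spike gets discarded right along with any glitches.

```python
# Minimal illustration of "omitting extreme values": a z-score cut-off.
# Anything more than 3 standard deviations from the mean is dropped -- but a
# genuine, rare event gets dropped right along with the instrumentation glitches.
import numpy as np

def trim_outliers(x: np.ndarray, z_max: float = 3.0) -> np.ndarray:
    z = (x - x.mean()) / x.std()
    return x[np.abs(z) <= z_max]

samples = np.concatenate([np.random.normal(0, 1, 10_000), [9.5]])  # one real extreme event
kept = trim_outliers(samples)
print(len(samples) - len(kept), "points discarded")  # the 9.5 is gone, legitimate or not
```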

Let's take audio processing. I have a microphone which SHOULD record sound in the range of 80Hz to 35KHz, but my data contains some data points below 80Hz and some above 35KHz. Is it completely safe to remove this data as noise? Not really. Especially if you don't know how it got there. It could be electronic or some other form of interference (AKA legitimate noise), but it could also be a variation in the manufacturing of the microphone. The manufacturer may only guarantee the fidelity of the audio signal within a certain range, but each unit is likely sensitive, to varying degrees, outside of those ranges. That is not noise. At least, not in the strictest sense. And, similarly, speaking of interference, there is a chance that some of the noise within the product's specified range is caused by interference, depending on your environment. That noise would be harder (though perhaps not impossible) to detect and remove.
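To make that concrete, here is roughly what the "keep only the guaranteed range" approach looks like as a band-pass filter. Only the 80Hz-35KHz range comes from the example above; the sample rate and filter order are my assumptions.

```python
# A sketch of drawing a box around the guaranteed range: a band-pass filter
# from 80 Hz to 35 kHz. Sample rate and filter order are assumptions here.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 96_000  # Hz, assumed sample rate (must exceed 2 x 35 kHz)

def band_limit(x: np.ndarray, low_hz: float = 80.0, high_hz: float = 35_000.0) -> np.ndarray:
    b, a = butter(4, [low_hz, high_hz], btype="band", fs=FS)
    return filtfilt(b, a, x)  # zero-phase filtering

noisy = np.random.randn(FS)   # one second of stand-in signal
in_band = band_limit(noisy)
# Everything outside 80 Hz - 35 kHz is attenuated, whether it was interference
# or a unit-to-unit quirk of the microphone that still carried real signal.
```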

My point is this... simple noise reduction via filtering is an attempt to draw safe boundaries around the data. If successful, almost all of what you eliminate will be bad and almost all of what you keep will be good. Even when the problem is well understood, it is almost impossible to filter out 100% of the noise and keep 100% of the good data. I doubt, very strongly, that brain wave data is something we understand well enough to do this with, when we already can't do it with things like images or audio.

Other techniques beyond filtering are almost always lossy in every sense of the word. They usually come into play when filtering can't remove enough noise because there is too much overlap. In these cases data is often smoothed, averaged, or run through some matrix transforms. At this point you're basically molding the data to your opinion/expectations. You're admitting that the data isn't great, but is good enough to infer what it should roughly look like, or what rules it should follow, and then forcing it to do so. Once done, the data is forever altered.
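A moving average is the simplest version of that molding. Here's a sketch showing how a sharp, legitimate event gets smeared into its neighbours and cannot be recovered afterwards.

```python
# Smoothing as described above: a simple moving average. Once applied, sharp,
# legitimate transients are blurred into their neighbours and the original
# samples cannot be recovered -- lossy in every sense of the word.
import numpy as np

def moving_average(x: np.ndarray, window: int = 5) -> np.ndarray:
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

spiky = np.zeros(100)
spiky[50] = 1.0                      # one genuine, sharp event
smoothed = moving_average(spiky)     # the event is now five samples of 0.2
```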

If you're reading this and pulling your hair out because I'm focusing on the digital signal processing and it sounds like I'm ignoring the analog, then you've probably read too much into Theo's explanation. If your digital transform eliminates or alters LEGITIMATE data points from the analog data and this skews the data set, then you won't be able to convert the digital data back into an accurate analog or other output at the end.

We're talking brain waves here, so think of traumatic events or chemical changes that affect the amplitude of brain waves/activity... maybe caffeine or other drugs a patient might be taking.

I'm not a scientist in the field of brain waves, but I have a working knowledge of statistics and enough knowledge of just how complex the human brain is to suspect, with a high degree of confidence, that any quick and simple filter or noise reduction technique is likely to discard some amount of legitimate data. I'm not even going to bother theorizing about amounts; I'm confident enough to say that I don't believe the amount of loss would ever be zero in any strategy aggressive enough to cut the payload down to 1/200th the size.

And so, I'm going to say that even under an alternate definition of lossless, the claim from the X post is still rubbish.


Now, one thing still nagged me. All of the stuff shown in Theo's video is just fairly standard signal processing. And the data from Neuralink was in WAV files. An audio file format. A format that would make a lot of sense in either audio processing or signal processing in general, but not a lot of sense otherwise.

Here is my chance to be wrong as well!

Originally I had assumed a bespoke solution like Neuralink's would use a custom binary format. I would only expect WAV files if they had already drawn the parallels between their work and audio signal processing. So, it wouldn't be a stretch to assume that at least some people working on this project have a working knowledge of audio signal processing. Further, if they have people who understand audio signal processing, it also seems plausible that a large part of the reason for handling this data in WAV format is to apply similar techniques.

And, if we can accept all of that... it also seems fair to accept that they know enough to ask for this sort of solution if it would meet their needs. Which implies that, for some reason, they need the data to remain in this format and for it to remain lossless.

So the next question I have is: who, or what, is consuming the data? My best guess: at some stage there is an AI model trained on the supplied data format, and the desire or need for compression came after training. I'll also leave it as an outside possibility that the most senior person on the team processing the data refuses to accept any form of data loss, but my money is on the AI model.

Why? Musk may be a bit unhinged at times, but he definitely doesn't strike me as an idiot. Especially not where software is concerned. He has a working understanding of programming. So it is less of a stretch to assume the limitation is not something as simple as a stubborn developer or dev team. AI fits well because it could be that their current model consumes the data in precisely this format, is very good at the job, but could also have been quite expensive to train.

If this were the case, accepting any manipulation of the data would come with an inherent need to train a new model. And that new model would need to outperform the old model. Since better data doesn't guarantee a better model, it could be a very expensive unknown that they want to avoid. For instance, it apparently cost OpenAI $5M worth of GPUs to train ChatGPT. Neuralink may be leasing compute time, so you can imagine how expensive this sort of thing can be.


At the end of the day, I agree in part with the linked video. Which is to say, while they are wrong to claim it is lossless in any sense of the word, I think that the sort of losses suggested are required (in combination with other approaches) to get to the level of compression they're asking for. Neuralink needs to address whatever internal issues are imposing this lossless requirement, or they'll run out of money before they solve anything.

Ultimately, we are all wrong for entertaining this nonsense. If this challenge were posed by anyone other than Elon or one of the major tech companies, we'd all rightly write it off for the nonsense it is. Our biggest mistake is wasting our collective breath debating this.
