Why we over-estimate Generative AI
I was going to write another rant post about how mad I am that everyone keeps saying that AI will render software developers obsolete. And I got sick of the idea of re-hashing that again. So, instead I felt it more constructive to tackle WHY people continue to believe this.
The biggest problem I can see is that we judge AI as we might judge a junior developer going through an interview. We choose challenges for AI on foundational topics.
The problem? This is exactly what AI is trained on. Not interviews specifically, but these models are trained on large amounts of data, and we fail to realize that a LOT of that training data covers foundational programming topics, either directly or indirectly.
Two examples I see a LOT are asking GenAI to make a web server or a snake game. Now... go search "GitHub snake game" or "GitHub web server". GitHub is one of the primary sources for training AI models and there are far too many examples of these. And there are tons under permissive licenses or even no license at all.
Why so many examples of these sorts? Well, outside of huge successful FOSS projects... a lot of companies doing hiring will require applicants to post their solutions on GitHub. So will a lot of colleges, universities and coding boot camps.
Yes, I would be impressed if someone came in for an interview, I asked them to code a snake game, and they spit out a fully functioning game in the language of my choosing in under 10 seconds. But ChatGPT is a computer, so it is fast, and its training data likely contained hundreds of examples of snake games.
When evaluating something like ChatGPT you need to give it real world problems. No one is going to hire you to write a web server or a snake game. You also need to test it on other real world scenarios like tackling design requests, feature requests and scope creep. And once again, those need to be tested within the scope of real world problems.
Here is a summary of a real-world example I ran into the other day. I was working on a product with a component that runs as a Visual Studio 2015 extension. To deal with some compatibility issues with newer versions of another project it interacted with, I needed the extension to update all of the projects in my solution from one known version of .NET Framework to another. I had already achieved this, but I still needed code to reload all of the affected projects.
Rather than spend all day researching I asked ChatGPT to write a function which would do what I wanted in this context.
How did it go? Well, it succeeded. However, it did so by removing the projects from the solution and re-adding them. This A) broke all of the inter-project dependencies and B) added the projects back in the root of the solution.
I told ChatGPT about the outcome and asked it to fix the issue. Its revised answer no longer compiled, and it didn't respect the folder hierarchy, only attempting to fix projects in the root. I asked it again. It spit out the exact same code, but proclaimed to have fixed the issue.
At this point, I took over. While it hadn't solved the issue, it had provided enough of a solution to narrow down my searches, and I was able to adapt the ChatGPT code to get a working solution pretty quickly after that. It is entirely possible that by starting over or tweaking my request I could eventually have arrived at a working solution produced by ChatGPT. But not with any certainty, and certainly not with any certainty over how long it might take.
This is not an isolated incident. Virtually every practical application of ChatGPT has gone this way for me. It has saved me time by narrowing down APIs or basic approaches, but it has never been able to fully resolve even a single non-trivial requirement (AKA, solve a problem I didn't already know the answer to).
If you think you can make a successful company on the backs of code which something like ChatGPT can reliably produce, then by all means, see if you can skip the part where you hire developers. I'd be interested to see if you are able to succeed in that business and stay in business over time on that model. I don't think you will, as any kid in their basement can replicate your products and any kid with programming chops can replicate and improve upon them.
As seen above, GenAI excels at the sorts of things that would impress us in an interview or the sorts of tests we might think of on the spot. And that is because those problems are over-represented in AI training data. These models' ability to answer those particular queries is not representative of their overall value and skill.