
Yet another opinion on LLMs


Opinion

After being exposed to LLMs more frequently at work doing Haskell, a couple of things have crystallized for me:

  1. they are exceptionally poor at reasoning
  2. they lie more often than not
  3. they are gigantic guessing/templating machines

What is good use of LLMs

The issue with using LLMs isn’t just the well-known hallucination problem, which is, by the way, massively understated. It’s also that an LLM will not, by default, go out of its way to find anything but the most average answer.

If you’re dealing with complicated problems, you’ll end up with messy prompts and lots of back and forth, because the agent will just shuffle through possible answers, starting with the most direct one. You keep pointing out flaws or saying it’s plain wrong, and it just tries the next possibility.

After all, it doesn’t really reason; it’s just guessing. But once we know that, that’s fine.

The consequence of that is something I feel many people easily overlook:

  • the only reasonable way to use LLMs is when you can trivially verify the output, or…
  • when you have superior domain knowledge and it becomes a sparring partner

This can be tricky. When dealing with, e.g., bugs, there’s often a way to simply try the suggested solution and see if it solves the issue. When you don’t have a way to easily verify the output via trial and error, you’ll need extensive knowledge of the domain, so you can guide the LLM and tell it where it’s wrong. But once you don’t know the domain very well, you’re screwed. And that’s unfortunately one of the main use cases: I found myself asking Claude a lot of questions about linking and FreeBSD issues. I got lied to and sent off course so many times that I’m confident in saying it made me less productive. But when I used it in domains where I have extensive knowledge myself, I could spot the lies and expose them. Then, after a while, it would get things right. Whether that actually saved me any time, I’m also not sure, but I’m sure there are cases where it did. Even then, I rarely enjoyed the interactions.
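To make “trivially verify” concrete: in Haskell, a property test is often the cheapest possible check of LLM output. Here is a minimal sketch using QuickCheck, where llmSuggestedReverse is a hypothetical stand-in for whatever code the model produced:

    import Test.QuickCheck

    -- Hypothetical stand-in for code an LLM suggested;
    -- we don't trust it until it's been checked.
    llmSuggestedReverse :: [a] -> [a]
    llmSuggestedReverse = foldl (flip (:)) []

    -- Trivial verification: the suggestion must agree with the
    -- known-good Prelude implementation on arbitrary inputs.
    prop_matchesReference :: [Int] -> Bool
    prop_matchesReference xs = llmSuggestedReverse xs == reverse xs

    main :: IO ()
    main = quickCheck prop_matchesReference

When such a cheap oracle exists, a lie is caught in seconds. When it doesn’t, you’re back to needing the domain knowledge described above.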

Another implication is that using LLMs for summaries or to broadly explain a codebase is improper use as well, because you have no idea how accurate the output is. On the other hand, using it for navigating a codebase is much less risky, because if it sends you in the wrong direction, the worst thing that can happen is that you read more code.

Using LLMs to build things

I’m not into vibe coding at all. My opinion is that it isn’t much different from copy-pasting code from StackOverflow. Back in the day, we used to call those people poor engineers. Today, with the use of AI, it’s suddenly fancy.

There are many reasons why I think it’s inappropriate use:

  • it’s not good at novel problems (as in: things outside of the training set)
  • it still produces average code
  • it doesn’t really design anything organically

I think the best comparison is the blog post Be aware of the Makefile effect, which argues that most people copy-paste a known good solution into a new context and then make small adjustments. I think vibe coding takes this to a whole new level. You start with a piece of code that Claude throws out and then adjust it here and there until it works.

This is not how a good programmer approaches a new problem, though: you think about the issue holistically, develop a mental model and then decide how to express it in code, weighing competing concerns like extensibility, performance, simplicity and so on.

When vibe coding, code is produced too fast and in too large a volume for these fundamental mental processes to take place. You’re busy navigating the pieces, composing them and interacting with the prompt. But you’re not really designing anything. You’re guiding a copy-paste machine. I think it’s highly unlikely that the code would look the same if you had written it from scratch. And I think that’s a useful metric to have.

But you could say it excels at prototyping, and maybe that’s true… but prototypes have a tendency to become the first released version. You’d have to actually delete the code, start from scratch and keep only the experience you gained from the prototype. But did you actually gain any experience there?

The social impact

I don’t want to talk about the energy consumption problem, the AI funding bubble, etc., but rather about the social impact on us programmers.

I’ve had cases where:

  • people commit AI generated files/patches and ask for a review
  • I ask someone to explain something to me and they run my question through an LLM and paste the output to me
  • during an online argument, someone suddenly pastes AI slop into the chat, maybe because they got bored of the conversation

I find all of these cases incredibly frustrating. If you want me to review the output of your Claude conversation, you’re essentially asking me to do your job. It’s not a review. If I ask for your help, I’m asking you. I can run my questions through an AI prompt myself, I don’t need you for that. It’s almost a case of “Let Me Google That For You” and I find it disrespectful.

Apart from that, there’s already growing concern that AI is actually bad for learning. The video Veritasium: What Everyone Gets Wrong About AI explores that question as well, but also something more fundamental: the difference between two systems of thought:

  • system one: fast, partly subconscious thinking (also utilizing experience/memory to come to an answer more quickly)
  • system two: slow, conscious, effortful, methodical thinking

Neither of them is good or bad, but we have a tendency to over-utilize system one, because it’s good at dealing with high throughput and lots of stimuli… filtering through data quickly. But sometimes it leads us off course, especially when we underestimate the problem.

When we’re using LLMs to support us in the quest to maximize our productivity, I’d argue we’re losing our ability to make use of system two even more, because instead of taking a step back and thinking deeply about the problem, I can just have an LLM shuffle through possible answers, try them out and move on. But I won’t remember the interaction, nor the answer! There were no “aha” moments.

Beyond the impact on individuals or teams, there are also more concerning ecosystem effects. A study argues that Vibe Coding Kills Open Source and that it reduces welfare despite higher productivity.

I personally think it also makes people engage less directly with maintainers and projects, because instead of filing bugs, they can just ask AI to find an answer or a workaround.

I have many more concerns: will people even bother writing documentation? Will they even read this blog post, or just skim an AI summary? How will all of this shape the way we collaborate? Right now, I am mostly pessimistic.

LLM use outside of tech


I’ve had more success using LLMs outside of tech, as a web search on steroids. Instead of going through 30 different reddit threads, I can get an approximate answer in seconds and also have it list all the sources. I use that when researching things related to my road bike, new wheel sets, etc. In the end, all the information is verifiable, and I use it more as a gateway to more information (as in: I actually go to the reddit threads and read them). The same goes for searching for products or trying to find local shops that carry a specific inventory.

On the other hand, the last time I searched for information about banking fees on international transfers, Gemini lied to me four times in a row, which I only discovered after verifying all the information manually.

What now?

I like programming. And I like collaboration, pair programming, design discussions and so on.

I don’t think that AI use is making any of that more fun. To me, it’s mostly a productivity gold rush. There are some nice use cases, but most of what has been promised didn’t actually happen. The caveats are huge, and there are many negative side effects of people relying more on these tools.

Although I don’t really want to go deep into the political side of the topic, some notable Haskellers like Audrey Tang have expressed that “AI is a parasite that fosters polarization”.

Managers seem to push for AI use because that’s what is expected of everyone now. You’re not on top of technology or productivity if you’re not neck-deep in AI subscriptions.

I find it a bit comical at times, because if you had a work colleague who confidently lies 20% of the time, you’d fire them. But with the new LLM technology, we seem fine with that. Maybe that’s because it’s much cheaper than an employee, and maybe because many people don’t even notice that they’re consuming false information, since they don’t adhere to what I call “good use of LLMs”.

There’s also some evidence that it may not actually boost productivity.

I’ve also noticed that when you bring up these opinions, AI advocates often blame the user and say “you just don’t know how to use AI correctly”. It starts to feel a bit like a religious war at times.

It certainly is not just another technology. It’s quite different from previous technological advancements.