Hasufell's blog: haskell and tech, mostly

Yet another opinion on LLMs


Opinion

After being exposed to LLMs more frequently at work doing Haskell, a couple of things have crystallized for me:

  1. they are exceptionally poor at reasoning
  2. they lie more often than not
  3. they are gigantic guessing/templating machines

What is good use of LLMs

The issue with using LLMs isn’t just the well-known hallucination problem (which is, by the way, massively understated); it’s also that they will not, by default, go out of their way to find anything but the most average answer.

If you’re dealing with complicated problems you’ll end up with messy prompts and lots of back and forth, because the agent will just shuffle through possible answers, starting with the most direct one. You’ll keep pointing out flaws or saying it’s plain wrong and it will just try the next possibility.

After all, it doesn’t really reason, it’s just guessing. But once we know that, that’s fine.

The consequence of that is something I feel many people easily overlook:

  • the only reasonable way to use LLMs is when you can trivially verify the output, or…
  • when you have superior domain knowledge and it becomes a sparring partner

This can be tricky. When dealing with e.g. bugs, there’s often a way to simply try the suggested solution and see if it solves the issue. When you don’t have a way to easily verify the output via trial and error, you’ll need extensive knowledge of the domain, so you can guide the LLM and tell it where it’s wrong. But when you don’t know the domain very well, you’re screwed. And that’s unfortunately one of the main use cases: I found myself asking Claude a lot of questions about linking and FreeBSD issues. I got lied to and went off-course so many times that I’m confident it made me less productive. But when I used it in domains where I have extensive knowledge myself, I could spot the lies and expose them. Then, after a while, it would get things right. Whether that actually saved me any time, I’m not sure either, but I’m sure there are cases where it did. Even then, I rarely enjoyed the interactions.

Another implication is that using LLMs for summaries or to broadly explain a codebase is improper use as well, because you have no idea how accurate the result is. On the other hand, using it for navigating a codebase is much less risky, because if it sends you in a wrong direction, the worst thing that can happen is that you read more code.

Using LLMs to build things

I’m not into vibe coding at all. My opinion is that it isn’t much different from copy-pasting code from StackOverflow. Back in the day, we used to call those people poor engineers. Today with the use of AI it’s suddenly fancy.

There are many reasons why I think it’s inappropriate use:

  • it’s not good at novel problems (as in: things outside of the training set)
  • it still produces average code
  • it doesn’t really design anything organically

I think the best comparison is the blog post Be aware of the Makefile effect, which argues that most people copy-paste a known good solution into a new context and then make small adjustments. I think vibe coding takes this to a whole new level. You start with a piece of code that Claude throws out and then adjust it here and there until it works.

This is not how a good programmer approaches a new problem, though: you think about the issue holistically, develop a mental model and then decide how to express it in code, weighing competing concerns like extensibility, performance, simplicity and so on.

When vibe coding, code is produced in too large a volume and too quickly for these fundamental mental processes to have enough time to take place. You’re busy navigating the pieces and composing them and interacting with the prompt. But you’re not really designing anything. You’re guiding a copy-paste machine. I think it’s highly unlikely that the code would look the same way if you had written it from scratch. And I think that’s a useful metric to have.

But you could say it excels in prototyping and maybe that’s true… but prototypes have the tendency to become the first released version. You’d have to actually delete the code and start from scratch and only use the experience you gained from the prototype. But did you actually gain experience there?

The social impact

I don’t want to talk about the energy consumption problem, the AI funding bubble, etc., but about the social impact on us programmers.

I’ve had cases where:

  • people commit AI generated files/patches and ask for a review
  • I ask someone to explain something to me and they run my question through an LLM and paste the output to me
  • during an online argument, someone suddenly pastes AI-slop into the chat, because maybe they got bored of the conversation

I find all of these cases incredibly frustrating. If you want me to review the output of your Claude conversation, you’re essentially asking me to do your job. It’s not a review. If I ask for your help, I’m asking you. I can run my questions through an AI prompt myself, I don’t need you for that. It’s almost a case of “Let Me Google That For You” and I find it disrespectful.

Apart from that, there’s already growing concern that AI is actually bad for learning and the video Veritasium: What Everyone Gets Wrong About AI explores that question as well, but also something more fundamental: the difference between two systems of thought:

  • system one: fast, partly subconscious thinking (also utilizing experience/memory to come to an answer more quickly)
  • system two: slow, conscious, effortful, methodical thinking

Neither of them is good or bad, but we have a tendency to over-utilize system one, because it’s good when dealing with high throughput and lots of stimuli… filtering through data quickly. But sometimes it leads us off-course, especially when we underestimate the problem.

When we’re using LLMs to support us in the quest to maximize our productivity, I’d argue we’re losing our ability to make use of system two even more, because instead of taking a step back and thinking deeply about the problem, I can just have an LLM shuffle through possible answers, try them out and move on. But I’ll also not remember the interaction, nor the answer! There were no “aha” moments.

Beyond the impact on individuals or teams, there are also more concerning ecosystem effects. A study argues that Vibe Coding Kills Open Source and that it reduces welfare despite higher productivity.

I personally think it also makes people engage less directly with maintainers and projects, because instead of filing bugs, they can just ask AI to find an answer or a workaround.

I have many more concerns: will people even bother writing documentation, will they even read this blog post or just skim through an AI summary? How will all of this shape the way we collaborate? Right now, I am mostly pessimistic.

LLM use outside of tech


I’ve had more success using LLMs outside of tech as a websearch on steroids. Instead of going through 30 different reddit threads, I can get an approximate answer in seconds and also have it list all the sources. I use that when researching things related to my road bike, new wheel sets, etc. In the end, all the information is verifiable and I use it more as a gateway to more information (as in: I actually go to the reddit threads and read them). Similarly when I search for products or try to find local shops that carry a specific inventory.

On the other hand, last time I searched for information regarding banking fees on international transfers, I got lied to by Gemini four times in a row, as I found out after verifying all the information manually.

What now?

I like programming. And I like collaboration, pair programming, design discussions and so on.

I don’t think that AI use is making any of that more fun. To me it’s mostly a productivity gold rush. There are some nice use cases, but most of what has been promised didn’t actually happen. The caveats are huge and there are many negative side effects of people relying more on these tools.

Although I don’t really want to go deep into the political side of the topic, some notable Haskellers like Audrey Tang have expressed that “AI is a parasite that fosters polarization”.

Managers seem to push for AI use, because that’s what is expected of everyone now. You’re not on top of technology or productivity if you’re not neck deep in AI subscriptions.

I find it a bit comical at times, because if you had a work colleague who lies 20% of the time with confidence, you’d fire them. But with the new LLM technology, we seem fine with that. Maybe because it’s much cheaper than an employee and maybe because many people don’t even notice that they’re consuming false information, because they don’t adhere to what I call “good use of LLM”.

There’s also some evidence that it may not actually boost productivity:

I’ve also noticed that when you bring up these opinions, AI advocates often blame the user and say “you just don’t know how to use AI correctly”. It starts to feel a bit like a religious war at times.

It certainly is not just another technology. It’s quite different from previous technological advancements.

Dynamic GHC matrix in GitHub CI


How to

When new GHC versions are released, I often have to go through all my library CI setups and update the GHC version matrix in my GitHub Actions workflows. You could argue a CI generation system like haskell-CI makes this somewhat simpler, but I’m a strong opponent of generating CI configurations, because:

  • it makes debugging harder (where does the line in my 800 LOC config come from?)
  • it requires frequent manual updates (e.g. when new GHC versions are released)
  • it requires learning the generation script/format and keeping up with its updates
  • the output is often awful to read, verbose and lacks commentary

Instead, you can dynamically get the latest, say, 5 major GHC versions in GHCup via:

ghcup -s GHCupURL list -r -t ghc -c available | awk '{ print $2 }' | awk -F '.' '{ print $1 "." $2}' | sort -Vu | tail -5
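To see what each stage of that pipeline does, here’s the same transformation run step by step on a small, fabricated snippet of `ghcup list` output (the versions and tags below are made up for illustration; the real output has more entries):

```shell
# Fabricated sample in the same shape as the relevant columns of
# `ghcup list -r` output: "<tool> <version> <tags>"
sample='ghc 9.4.8 base-4.17.2.1
ghc 9.6.7 recommended
ghc 9.8.4 base-4.19.2.0
ghc 9.10.3 base-4.20.1.0
ghc 9.12.2 latest'

# 1. keep the version column
versions=$(echo "$sample" | awk '{ print $2 }')
# 2. strip each version down to its major component (X.Y)
majors=$(echo "$versions" | awk -F '.' '{ print $1 "." $2 }')
# 3. version-sort, deduplicate, keep the newest 3
echo "$majors" | sort -Vu | tail -3
# prints:
# 9.8
# 9.10
# 9.12
```

The `sort -V` flag is the important bit: it sorts by version semantics, so 9.10 correctly lands after 9.8 instead of before it, as a plain lexicographic sort would have it.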

That’s nice… we could just be done now, but I also want to make sure GHCup’s recommended and latest versions are always tested. That requires us to resolve the major versions (X.Y) back to the latest full version (X.Y.Z) and also avoid duplicates due to adding recommended and latest.

Here’s the full code:

# store all available ghc versions
all="$(ghcup -s GHCupURL list -r -t ghc -c available)"
# get the recommended version
rec=$(echo "${all}"         | grep recommended | awk '{ print $2 }')
# get the latest version
latest=$(echo "${all}"      | grep latest      | awk '{ print $2 }')
# get the last 5 major versions
other_major=$(echo "${all}"                    | awk '{ print $2 }' | awk -F '.' '{ print $1 "." $2}' | sort -Vu | tail -5)
# resolve the major versions back to their respective latest version (e.g. 9.6 -> 9.6.7)
other=$(for v in $other_major ; do point_releases=$(echo "$all" | awk '{ print $2 }' | grep --color=never "^$v.") ; echo "${point_releases}" | tail -n1  ; done)
# sort and deduplicate
selected=$(echo -n ${rec} ${latest} ${other} | tr " " "\n"  | sort -Vu)

At the time of writing, we’ll get:

$ echo "$selected"
9.6.7
9.8.4
9.10.3
9.12.2
9.14.1

This looks fine. Now the remaining question is how to make this work with GitHub CI matrix. We can easily utilize the GITHUB_OUTPUT feature like so:

jobs:
  tool-output:
    runs-on: ubuntu-latest
    outputs:
      ghc_versions: ${{ steps.gen_output.outputs.ghc_versions }}

    steps:

    - uses: haskell/ghcup-setup@v1

    - name: Generate output
      id: gen_output
      run: |
        all="$(ghcup -s GHCupURL list -r -t ghc -c available)"
        rec=$(echo "${all}"         | grep recommended | awk '{ print $2 }')
        latest=$(echo "${all}"      | grep latest      | awk '{ print $2 }')

        other_major=$(echo "${all}" | awk '{ print $2 }' | awk -F '.' '{ print $1 "." $2}' | sort -Vu | tail -5)
        other=$(for v in $other_major ; do point_releases=$(echo "$all" | awk '{ print $2 }' | grep --color=never "^$v.") ; echo "${point_releases}" | tail -n1  ; done)

        selected=$(echo -n ${rec} ${latest} ${other} | tr " " "\n"  | sort -Vu)
        selected_json=$(echo -n $selected | jq -c -r -R 'split(" ") | [ .[] | if length > 0 then . else empty end ]')

        echo "${selected}"
        echo "${selected_json}"

        echo ghc_versions="${selected_json}" >> "$GITHUB_OUTPUT"

  build:
    runs-on: ${{ matrix.os }}
    needs: ["tool-output"]
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, ubuntu-24.04-arm, macOS-15-intel, macOS-latest, windows-latest]
        ghc: ${{ fromJSON(needs.tool-output.outputs.ghc_versions) }}
    steps:
    - uses: actions/checkout@v4

    - name: Install dependencies (Ubuntu)
      if: runner.os == 'Linux'
      run: |
        sudo apt-get -y update
        sudo apt-get -y install build-essential curl libffi-dev libgmp-dev libncurses-dev pkg-config

    - uses: haskell/ghcup-setup@v1
      with:
        ghc: ${{ matrix.ghc }}
        cabal: latest

    - name: Build
      run: |
        cabal update
        cabal test --test-show-details=direct all

Done. Now we never have to update our CI manually just because a new GHC version arrived.
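As an aside, the jq invocation that builds the JSON for the matrix can be tried in isolation. It reads the space-separated version list as one raw string, splits it into an array and drops any empty fields:

```shell
# -R reads the raw input line as a JSON string, split(" ") turns it into
# an array, the filter drops empty elements, -c emits compact JSON
echo -n "9.6.7 9.8.4 9.10.3" \
  | jq -c -r -R 'split(" ") | [ .[] | if length > 0 then . else empty end ]'
# prints: ["9.6.7","9.8.4","9.10.3"]
```

That compact one-line array is exactly the shape `fromJSON` expects on the consuming side of the job output.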

We can also make use of repository variables to change the number of GHC versions tested on the fly without changing CI configuration, e.g.:

    - name: Generate output
      id: gen_output
      run: |
        # ... snip ...

        other_major=$(echo "${all}" | awk '{ print $2 }' | awk -F '.' '{ print $1 "." $2}' | sort -Vu | tail -${GHC_TEST_NUM:=5})

        # ... snip ...
      env:
        GHC_TEST_NUM: ${{ vars.GHC_TEST_NUM }}
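The `${GHC_TEST_NUM:=5}` expansion is what makes the repository variable optional: if the variable is unset or empty (which is what GitHub hands the script when no such repository variable exists), the `:=` operator assigns and expands the default. A minimal sketch of the behavior:

```shell
unset GHC_TEST_NUM
# unset or empty: := assigns the default 5 and expands to it
echo "tail -${GHC_TEST_NUM:=5}"   # prints: tail -5

GHC_TEST_NUM=3
# set and non-empty: the variable's own value wins
echo "tail -${GHC_TEST_NUM:=5}"   # prints: tail -3
```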

Reacting to failures

So what does this mean for a regular project? Your CI will likely suddenly fail some day, because a new GHC release dropped. But that’s exactly what we want. It requires our attention.

To make this even more immediate, it’s good to also have your CI run every night, like so:

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]
  schedule:
    - cron: '0 0 * * *'

But we might not want constant CI failures while we’re waiting for some of our dear Hackage colleagues to update our dependencies so that our package starts working again. So we want some sort of exclusion list. We can do this dynamically too, again via repository variables. We’ll end up with something like:

    - name: Generate output
      id: gen_output
      run: |
        all="$(ghcup -s GHCupURL list -r -t ghc -c available)"
        rec=$(echo "${all}"         | grep recommended | awk '{ print $2 }')
        latest=$(echo "${all}"      | grep latest      | awk '{ print $2 }')

        other_major=$(echo "${all}" | awk '{ print $2 }' | awk -F '.' '{ print $1 "." $2}' | sort -Vu | tail -${GHC_TEST_NUM:=5})
        other=$(for v in $other_major ; do point_releases=$(echo "$all" | awk '{ print $2 }' | grep --color=never "^$v.") ; echo "${point_releases}" | tail -n1  ; done)

        selected=$(echo -n ${rec} ${latest} ${other} | tr " " "\n"  | sort -Vu)
        ghc_exclude=( $GHC_EXCLUDE )
        selected_filtered=()
        for ghc in $selected ; do
          printf '%s\0' "${ghc_exclude[@]}" | grep --quiet --color=never -F -x -z -- $ghc || selected_filtered+=( $ghc )
        done
        unset ghc
        selected_json=$(echo -n ${selected_filtered[@]} | jq -c -r -R 'split(" ") | [ .[] | if length > 0 then . else empty end ]')

        echo "${selected}"
        echo "${selected_filtered}"
        echo "${selected_json}"

        echo ghc_versions="${selected_json}" >> "$GITHUB_OUTPUT"
      shell: bash
      env:
        GHC_TEST_NUM: ${{ vars.GHC_TEST_NUM }}
        GHC_EXCLUDE: ${{ vars.GHC_EXCLUDE }}
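The exclusion test in that loop is a NUL-delimited membership check: printf emits each excluded version followed by a NUL byte, and GNU grep with `-F -x -z` looks for an exact, whole-entry match (so 9.1 never accidentally matches 9.10.3). A standalone sketch, with a made-up exclude list:

```shell
# hypothetical exclude list, as it would arrive via the GHC_EXCLUDE variable
ghc_exclude=( 9.8.4 9.12.2 )

is_excluded() {
  # -F: fixed string, -x: the whole entry must match, -z: NUL-separated entries
  printf '%s\0' "${ghc_exclude[@]}" | grep --quiet -F -x -z -- "$1"
}

is_excluded 9.8.4 && echo "9.8.4 is excluded"
is_excluded 9.6.7 || echo "9.6.7 is kept"
```

Note that `-z` is a GNU grep extension, which is one reason the step pins `shell: bash` and runs on ubuntu-latest.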

All put together

At last, all put together, we get:

name: Haskell CI

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]
  schedule:
    - cron: '0 0 * * *'

jobs:
  tool-output:
    runs-on: ubuntu-latest
    outputs:
      ghc_versions: ${{ steps.gen_output.outputs.ghc_versions }}

    steps:

    - uses: haskell/ghcup-setup@v1

    - name: Generate output
      id: gen_output
      run: |
        all="$(ghcup -s GHCupURL list -r -t ghc -c available)"
        rec=$(echo "${all}"         | grep recommended | awk '{ print $2 }')
        latest=$(echo "${all}"      | grep latest      | awk '{ print $2 }')

        other_major=$(echo "${all}" | awk '{ print $2 }' | awk -F '.' '{ print $1 "." $2}' | sort -Vu | tail -${GHC_TEST_NUM:=5})
        other=$(for v in $other_major ; do point_releases=$(echo "$all" | awk '{ print $2 }' | grep --color=never "^$v.") ; echo "${point_releases}" | tail -n1  ; done)

        selected=$(echo -n ${rec} ${latest} ${other} | tr " " "\n"  | sort -Vu)
        ghc_exclude=( $GHC_EXCLUDE )
        selected_filtered=()
        for ghc in $selected ; do
          printf '%s\0' "${ghc_exclude[@]}" | grep --quiet --color=never -F -x -z -- $ghc || selected_filtered+=( $ghc )
        done
        unset ghc
        selected_json=$(echo -n ${selected_filtered[@]} | jq -c -r -R 'split(" ") | [ .[] | if length > 0 then . else empty end ]')

        echo "${selected}"
        echo "${selected_filtered}"
        echo "${selected_json}"

        echo ghc_versions="${selected_json}" >> "$GITHUB_OUTPUT"
      shell: bash
      env:
        GHC_TEST_NUM: ${{ vars.GHC_TEST_NUM }}
        GHC_EXCLUDE: ${{ vars.GHC_EXCLUDE }}

  build:

    runs-on: ${{ matrix.os }}
    needs: ["tool-output"]
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, ubuntu-24.04-arm, macOS-15-intel, macOS-latest, windows-latest]
        ghc: ${{ fromJSON(needs.tool-output.outputs.ghc_versions) }}
    steps:
    - uses: actions/checkout@v4

    - name: Install dependencies (Ubuntu)
      if: runner.os == 'Linux'
      run: |
        sudo apt-get -y update
        sudo apt-get -y install build-essential curl libffi-dev libffi8 libgmp-dev libgmp10 libncurses-dev pkg-config libtinfo6 libncurses6

    - uses: haskell/ghcup-setup@v1
      with:
        ghc: ${{ matrix.ghc }}
        cabal: latest

    - name: Build
      run: |
        cabal update
        cabal test --test-show-details=direct all

You can see these dynamic GHC matrices in action here:

A few notes

You may also notice that we don’t use haskell-actions/setup here, but the ghcup-setup one, which is simpler, more predictable and maintained by GHCup developers.

Prior Art

GitHub runner-images do something similar when preinstalling GHCup.

GHCup discontinued (or why you should use Nix)

After working on GHCup for 6 years, I’ve realized that it’s obsolete. Everyone should just use nix.

Nix is a purely functional package manager with declarative configuration, reproducible builds and high quality packages. In fact, it is so simple that you just have to understand:

Here’s an example of a nix flake:

{
  nixConfig.allow-import-from-derivation = true;
  description = "cabal-audit's flake";
  inputs = {
    # flake inputs
    nixpkgs.url = "github:nixos/nixpkgs/nixpkgs-unstable";

    # flake parts
    parts.url = "github:hercules-ci/flake-parts";
    pre-commit-hooks.url = "github:cachix/git-hooks.nix";
    devshell.url = "github:numtide/devshell";
    # end flake parts
    # end flake inputs
  };
  outputs = inputs:
    inputs.parts.lib.mkFlake {inherit inputs;} {
      systems = ["x86_64-linux" "aarch64-linux" "x86_64-darwin" "aarch64-darwin"];
      imports = [inputs.pre-commit-hooks.flakeModule inputs.devshell.flakeModule];

      perSystem = {
        config,
        pkgs,
        lib,
        ...
      }: let
        hlib = pkgs.haskell.lib.compose;
        hspkgs = pkgs.haskell.packages.ghc98.override {
          overrides = import ./nix/haskell-overlay.nix {inherit hlib;};
        };
      in {
        # this flake module adds two things
        # 1. the pre-commit script which is automatically run when committing
        #    which checks formatting and lints of both Haskell and nix files
        #    the automatically run check can be bypassed with -n or --no-verify
        # 2. an attribute in the checks.<system> attrset which can be run with
        #    nix flake check which checks the same lints as the pre-commit hook
        pre-commit = {
          check.enable = true;
          settings.hooks = {
            cabal-fmt.enable = true;
            fourmolu.enable = true;
            hlint.enable = true;

            alejandra.enable = true;
            statix.enable = true;
            deadnix.enable = true;
          };
        };

        devShells.plain-haskell = hspkgs.callPackage ./nix/haskell-shell.nix {
          inherit (pkgs.haskell.packages.ghc98) haskell-language-server fourmolu;
        };

        # https://flake.parts/options/devshell for more information; one of the advantages is
        # the beautiful menu this provides where one can add commands that are offered and loaded
        # as part of the devShell
        devshells.default = {
          commands = [
            {
              name = "lint";
              help = "run formatting and linting of haskell and nix files in the entire repository";
              command = "pre-commit run --all";
            }
            {
              name = "regen-nix";
              help = "regenerate nix derivations for haskell packages";
              command =
                builtins.readFile (lib.getExe config.packages.regen-nix);
            }
          ];
          devshell = {
            name = "cabal-audit";
            packagesFrom = [config.devShells.plain-haskell];
            packages = [pkgs.cabal2nix pkgs.alejandra];
            startup.pre-commit.text = config.pre-commit.installationScript;
          };
        };

        packages = {
          inherit (hspkgs) cabal-audit;
          inherit (pkgs) groff;
          default = config.packages.cabal-audit;
          cabal-audit-static = pkgs.pkgsStatic.callPackage ./nix/static.nix {};
          regen-nix = pkgs.writeShellApplication {
            name = "regen-cabal-audit-nix";
            runtimeInputs = [pkgs.cabal2nix pkgs.alejandra];
            text = let
              v = "ef73a3748f31d8df1557546b26d2d587cdacf459";
              cmd = pkg: ''
                cabal2nix https://github.com/haskell/security-advisories.git \
                  --revision ${v} \
                  --subpath code/${pkg}/ > ./${pkg}.nix
              '';
            in ''
              pushd "$PRJ_ROOT"/nix
              ${lib.concatStrings (map cmd ["osv" "cvss" "hsec-core" "hsec-tools"])}
              cabal2nix ../. > ./cabal-audit.nix
              alejandra ./.
              popd
            '';
          };
        };
      };
    };
}

I’m sure you’ll get it after going through the documentation above.

To summarize, the benefits are:

  • concise documentation
  • easy to understand concepts
  • ecosystem always moves in one direction
  • space efficient
  • great error messages and ways to debug
  • follows KISS
  • follows unix (do one thing and do it well)

As such, GHCup will now be on life support for at least 6 months before we pull the plug. Adjust your CI accordingly.