Is Generative AI Actually Useful? | Hendrik Erz

Abstract: Yes. Generative AI has plenty of good uses. And by now, setting it up locally with models you control and output that remains confidential is easier than ever. In this article, I share some use-cases I have found to be particularly useful, and some caveats I discovered along the way.


It has been almost two years since OpenAI launched the first version of “ChatGPT,” a Large Language Model (LLM) capable of generating text in a chat-like manner. That was on November 30, 2022. Now, generative AI – particularly instruction-tuned models that can “chat” with people – is everywhere. Just a few days ago, Apple finally released its first batch of “Apple Intelligence” features that enhance macOS and iOS on a few select devices with the capabilities of generative AI. But even the open source landscape is now in a much better place, with tools such as “GPT4All” and “LocalAI” offering ready-made solutions to run these models locally, without compromising confidentiality or costing money (except for a computer that is at least somewhat capable of running them).

It has never been easier to use generative AI models to “chat,” summarize, reformulate, and simply generate text. And after initial doubts and criticism, I have finally integrated generative AI models into my daily work. Thus, I can confidently say that generative AI is actually useful.*

* but: Naturally, all of this comes with some caveats and limitations. In this article, I share with you the ways in which I am using generative AI in my day-to-day work, how you can get started, too, and where you should be cautious.

How I Got Here

First, a few words on how I, personally, got to where I am today. When OpenAI launched its own GPT model in November 2022, I did not use it. In fact, it took quite some time until I finally created an account (for work) and loaded their app onto my phone. To this day, I only use it for very generic things that don’t give OpenAI any confidential or personal information about me, and only if I don’t have my computer with me.

One of my biggest concerns was and is handing over data and telemetry to a company that – no surprise here – turned out to be more “CloseAI” than “OpenAI.” Back then I knew that running these models locally would become possible, and I was right. Very soon after the launch of ChatGPT, the llama.cpp project was started, which did two things: first, it provided a relatively simple (albeit terminal-based) interface to run LLMs locally, and, second, it provided a mechanism to make the models smaller and less resource-intensive (quantization). However, it took about half a year before that solution became feasible, and more than a year until better solutions (that is, software with proper graphical interfaces) arrived.
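To give a rough idea of what quantization does, here is a minimal sketch of naive symmetric 8-bit quantization in Python. llama.cpp’s actual quantization schemes are considerably more refined, so treat this as an illustration of the principle only: the weights are stored as small integers plus a scale factor, trading a little precision for a fraction of the memory.

    import numpy as np

    # Naive symmetric 8-bit quantization: store int8 weights plus a single
    # float scale factor instead of full 32-bit floats.
    def quantize_int8(weights: np.ndarray):
        scale = np.abs(weights).max() / 127.0  # map the largest weight to 127
        q = np.round(weights / scale).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale  # approximate the original weights

    w = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int8(w)
    print("max error:", np.abs(w - dequantize(q, scale)).max())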

In fact, at the beginning of 2024, the state of running generative AI locally was still so dire that I decided to create my own app for running GPT models locally, Local Chat. I have since stopped working on it and instead settled on GPT4All from Nomic AI for the same purpose, as that app is slowly reaching a point where I think it’s usable.

All in all, I haven’t really changed my position on generative AI. I always knew there are some good uses for it, and I remained critical of proprietary solutions that involved sending my own confidential or personal information over to a company that has about the same accountability as an orange cat. What it took was three ingredients: good and small open models that can run on my hardware; a proper platform to interact with these models (a command-line interface was not going to make the cut); and simply time to try out various ways of using these models.

About one and a half years ago, I suggested two use-cases that were already crystallizing at the time: text reformulation and simple guidance for common tasks. These have remained two use-cases I can recommend, but now with more clarity. So let’s dive in!

Generative AI as a Writing Tool

The largest use-case for generative AI at the moment is as a versatile writing tool. People use LLMs to rewrite text, summarize it, and generate ideas.

Indeed, telling a model to work with existing text usually works well, even with less performant local models. What is crucial here is that you as the user can easily validate the model output, since you already have the original text. Should a model confabulate things that did not occur in the source text, you will have an easy time spotting these instances. This makes this use relatively safe and ergonomic, since validating the model’s output costs you barely any extra time. I personally don’t use models to reformulate text for me very often, but only because I am already a good writer (one of the reasons I maintain this blog is to become a better one). Those who don’t yet trust their writing as much will certainly benefit from this use-case.

A second use-case that I turn to more regularly, however, is generating ideas. There are always instances where I have some coarse strands of ideas floating around in my head, but they aren’t yet all that concrete. In those situations, a little back-and-forth with a model can help clarify ideas and develop arguments. It is basically a fancy version of rubber duck debugging. This use-case, again, is comparatively safe, since at this stage you usually know the basics of the topic already and can validate what the model says, selectively ignoring false information. That being said, you can also use it to develop ideas about something you know little about, but then you will have to verify anything you’d like to keep separately afterwards.
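As a sketch of what such a back-and-forth can look like in code, here is a minimal chat loop using the gpt4all Python bindings. The model file name is just an example – use whatever model you have actually downloaded:

    from gpt4all import GPT4All  # pip install gpt4all

    # Example model file; substitute the one you have installed locally.
    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")

    # chat_session() keeps the conversation history between turns, which is
    # what makes the back-and-forth idea development work.
    with model.chat_session():
        while True:
            prompt = input("you> ")
            if not prompt:
                break
            print("model>", model.generate(prompt, max_tokens=400))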

Generative AI for Information Retrieval

LLMs can also be used for basic information retrieval – emphasis on basic. Something I noticed is that these things excel at providing base recipes and, as I already mentioned one and a half years ago, basic guidance. They will horribly fail from time to time, but more often than not, basic recipes for pancakes or waffles are quite useful. I regularly ask my local model for such base recipes, simply to verify, e.g., the ratio of flour to water. Again, however, as with the rewriting task, this only works because I already know how to cook. These simple information-retrieval tasks work because the information is “at the tip of my tongue,” but I can’t quite recall it. If you’re not a good cook, you should most definitely double-check any recipes it gives you, or turn to Google directly. The same holds true for basic guidance for, e.g., building a computer or troubleshooting your TV. These models can give you the broad strokes of what you need to do, which you can then follow after having read the corresponding manuals.

This leads me to another surprising use-case that I have found for these models: LLMs as a pre-search stage. We all know that Google is getting worse rather than better, but while experimenting with LLMs I found something interesting: often it’s not that Google won’t tell us the answer to our questions; it’s that we don’t even know what to ask. In the past months, I’ve stumbled upon quite a few situations in which a Google search returned only garbage. But posing the same question to a local model (sometimes with a bit of back and forth) will typically make it generate some specific keywords. When I then punch these into Google, I frequently get the actual result I was looking for. This has worked surprisingly well for me, and it might for you, too. Again, validation of the model’s output is already built in here, as the process ends with you actually googling something.
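A minimal sketch of this pre-search stage, again assuming the gpt4all Python bindings and an example model file; the prompt is just one way of phrasing the request:

    import webbrowser
    from urllib.parse import quote_plus

    from gpt4all import GPT4All

    # Example model file; substitute your own.
    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")

    question = "My TV turns on but only shows a black screen while the sound works."
    # Let the model turn the fuzzy question into search keywords ...
    keywords = model.generate(
        f"Suggest a short web search query (keywords only) for this problem: {question}",
        max_tokens=40,
    ).strip()
    # ... and hand those keywords to a regular search engine.
    webbrowser.open(f"https://www.google.com/search?q={quote_plus(keywords)}")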

Lastly, the elephant in the library: Retrieval Augmented Generation, or RAG. I have experimented a bit with it, since this is very easy to do with GPT4All, but in my experience it is a bit hit-or-miss. It will usually find and use relevant documents, but there are often irrelevant matches as well, so I’ll frequently still have to do a plain old search through my documents to find what I need. I would count this under the “idea generation” task rather than under getting actually useful results, but like the Google-search preparation above, it makes finding things a tad easier.
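For the curious, here is a toy sketch of the retrieval step that RAG adds before generation. Real setups (such as GPT4All’s LocalDocs feature) use proper embedding models; the word-count “embedding” below is only a stand-in to keep the example self-contained:

    import numpy as np

    docs = [
        "Waffle batter uses more fat and sugar than pancake batter.",
        "If the TV shows a black screen, check the backlight first.",
        "Llama 3 8B needs roughly 8 GB of VRAM in quantized form.",
    ]

    # Stand-in embedding: a normalized word-count vector. A real RAG system
    # would use a trained embedding model here.
    def embed(text: str) -> np.ndarray:
        vocab = sorted({w for d in docs for w in d.lower().split()})
        v = np.array([text.lower().split().count(w) for w in vocab], float)
        return v / (np.linalg.norm(v) or 1.0)

    def retrieve(query: str, k: int = 1) -> list[str]:
        scores = [float(embed(d) @ embed(query)) for d in docs]
        return [docs[i] for i in np.argsort(scores)[::-1][:k]]

    # The retrieved chunk is pasted into the prompt as context.
    question = "How much VRAM does Llama 3 8B need?"
    context = "\n".join(retrieve(question))
    print(f"Answer using this context:\n{context}\n\nQuestion: {question}")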

Generative AI for Writing Code

Many people have started using generative AI models to write code. I have also done some experiments, and it does produce quite good results – none that would run, however, and I recommend you never copy and paste code verbatim from the model output. But I have already had numerous situations in which the generated code used a proper algorithm that I could adapt. Or it referenced a library I didn’t yet know. Or it provided hints for the general structure of the solution. Again, I would count this under the idea-generation and information-retrieval tasks, but code generation is already quite good with many models.

Caveats: Validation and Time Efficiency

Before ending this article with a few recommendations for what I use, let me summarize the two primary caveats I have found while experimenting with LLMs: validation and time efficiency.

As you can see, every use-case I talked about involves some form of validation. LLMs are still horribly good at making things up, so you can never really trust model outputs. This means that you should always have validation built into any task you use generative AI for. My use-cases all do that: either I ask the model to take something that already exists and do something with it, which makes it easy to verify the quality of the result; or I use it as a first stage to generate some starting point, after which I work with its output and have plenty of opportunities for weeding out misinformation.

The second caveat is time efficiency. As you can see, validation is a natural part of the process in all of my use-cases. I don’t have to stop and specifically ask “Is this right?” – it becomes apparent almost (!) by itself, with little conscious thinking required on my side. And that is the point: I only use LLMs in instances where they give me a time benefit. If there is no time benefit, there is no advantage to using generative AI in the first place. For example, turning an outline of arguments into a text is something I would never let an AI do, because there are just too many opportunities for the model to confabulate things. Rather, I write everything out first and then, if necessary, let a model reformulate individual paragraphs. If I let some model turn an entire outline into a text, I would have to read it twice: first for factuality, then for style. And then probably for factuality again, because after a few pages our concentration usually goes down the drain.

A Quick Review of Tools and Models I Use

But now it’s time to give you a few recommendations for tools and models that I have found useful – and you may, too.

Open Source Tool: GPT4All

The primary tool I use for interacting with models at the moment is GPT4All. Although it is developed by a company, Nomic AI, it is open source and freely available for all operating systems. It comes with a curated list of good models and is easy to use – you don’t have to write any code. The app is still somewhat clunky and the update process doesn’t work automatically yet, but of all the solutions I tried, this one is the best. In addition to a simple setup, it also offers Retrieval Augmented Generation out of the box and a few other goodies that make working with the app and the models excellent. The developers appear to be relatively on top of things, so I expect the remaining teething issues to be fixed in the coming months.
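One of those goodies, if you like to script things, is an OpenAI-compatible local API server that you can enable in the app’s settings. A minimal sketch, assuming the default port 4891 and a model name as it appears in the app (check your settings for both):

    import requests

    # GPT4All's local server speaks the OpenAI chat-completions dialect.
    resp = requests.post(
        "http://localhost:4891/v1/chat/completions",
        json={
            "model": "Llama 3 8B Instruct",  # name as shown in the app
            "messages": [
                {"role": "user", "content": "Give me a base waffle recipe."}
            ],
            "max_tokens": 300,
        },
    )
    print(resp.json()["choices"][0]["message"]["content"])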

Apple Intelligence

I can’t say too much about this yet, since it only runs on my MacBook and not my iPhone (how dare I use an iPhone 13 in 2024!), and it’s still a far cry from what Apple promised and others (such as GPT4All) already provide. One can really feel how afraid Apple is of letting something loose on people that confabulates things or says lewd words. (By the way, Apple has still not added a Swedish translation model to iOS, and I’m furious that I still have to use Google Translate for that.) On the one hand, I can understand this and believe that erring on the side of caution is always the smart choice. On the other hand, I get the feeling Apple is taking the caution a bit too far. Two things I have already found to be quite nice: the message summaries, which make working with emails a breeze, and the summarization tools. As for the rest, we’ll have to wait until it ships, and give it some time to see how well it really works. I literally installed it yesterday, and the results so far seem … underwhelming.

My Favorite Model: Llama 3 8B

After experimenting with a ton of different models, I have settled on the instruct version of Llama 3 8B (requires 8 GB of VRAM) for most of my tasks. I don’t trust generative AI enough for the slight differences in the training of various models to justify keeping multiple ones around (especially since these things are absolute behemoths and resource hogs), so for simplicity’s sake, I settled on a single good one.

I feel that Llama 3 8B works very well for all the tasks I have outlined here. However, that model – even in its quantized form – is still very heavy, so unless you have a decent GPU (or a MacBook Pro, sorry Windows & Linux users), you probably won’t be happy with it.

There are some other models that I also found great, especially in case Llama models are too heavy for your computer. These are:

  • Microsoft’s Phi models (currently Phi-3), which are tiny and run on even the worst potato computer out there with decent results
  • The OpenOrca fine-tunes of Mistral’s models, which seem to be good all-rounders that work decently but are not as heavy as the Llama models
  • WizardCoder, a model that I found to be good at code generation, although it tends to be overly verbose, generating every possible solution to a coding problem

Concluding Thoughts

After two years of AI hype, we are in a much better place than when OpenAI was the only player in town. Day by day, use-cases and models are being refined, the open source community keeps improving the usability of these models, and we as a society seem to be learning ways of coexisting with large language models.

We are most certainly not at the end of the road, and this won’t be the last article dedicated to working with generative AI, but I’m glad that it is becoming more and more usable. I am happy with how the generative AI world is evolving (albeit I remain skeptical, especially of scientific applications of GPT models) and am eagerly looking forward to more developments.

If you have any additional questions or want to discuss this article with me, please don’t hesitate to message me on Bluesky!

Suggested Citation

Erz, Hendrik (2024). “Is Generative AI Actually Useful?”. hendrik-erz.de, 31 Oct 2024, https://www.hendrik-erz.de/post/is-generative-ai-actually-useful.
