Seven years ago, I worked with a large dataset for the first time. Now I'm back at the drawing board, trying to massage a huge chunk of data into something I can work with, and it got me thinking about the old days.
For the past decade, the world has gotten used to the famous triad of social media: Facebook, Twitter, and Reddit. But after Facebook's long, slow decline, Twitter and Reddit have turned into awful experiences over the course of just a few months. Time for some personal reflection on the glorious 2010s of social media.
A recent paper has shown that a compression algorithm – gzip – outperforms some large language models (LLMs) on certain tasks, which has the NLP community in an uproar. In this article, I dissect what happened, and what it means for language modeling at large.
You may have already stumbled upon them: "longtermists" who believe that sentient AI is basically around the corner and will kill us all. In today's piece, I want to portray this belief as messianist – a variation of the apocalyptic Christian perspective. Viewed as such, these people's insistence on warning about impending doom makes sense, as does critics' inability to get through to them.
Every country has its own approach to the ethical vetting of research. While many countries have no real prescriptions in this regard, Sweden decided to do it properly and wrote ethical vetting requirements into its legal code. This comes with drawbacks, however, and right now Swedish researchers are rising up against the sometimes detrimental effects the law can have.
More than two years ago, Apple began its transition away from Intel processors to its own chips: Apple Silicon. The transition has at times been a bumpy ride, but after years of waiting, I feel the ride is finally coming to an end. In this article, I reflect on the journey behind us.
Yes, I'm still talking about large AI models. But today I want to highlight an aspect that has many people worried: what effects could these models have going forward? Luckily, there is already a debate underway that focuses on these issues.
In a recent open letter, AI scientists and entrepreneurs demand a moratorium on the training of large AI models. In this article, I argue that the letter is full of straw-man arguments and fails to engage with the actual dangers emanating from AI.
Today's article is about a (relatively) new technique called active learning, which aims to annotate large corpora with as little effort as possible. Since it utilizes machine learning, decisions regarding metrics are of utmost importance. In this article, I dive into choosing a metric for resampling documents. It turns out that, depending on the situation you face, there are multiple valid options.