Do language models have a real understanding of language? No. But are they consequently useless? Also no. In this article, I shed light on a decades-old linguistic debate and on what the state of the art of language models looks like. Brace yourself for a fierce discussion between Claude Shannon, John Firth, Noam Chomsky, and Ludwig Wittgenstein!
After almost a year of Corona-induced remote work, it's time to return to Sweden.
Last week I wrote about OCR error correction and the three general approaches to it. Over the week, I got practical and fixed OCR errors in my dataset. However, I used a fourth approach, which is much more efficient and fixed about 80% of all OCR errors. Read more on my progress!
If you have ever worked with text, you will know the huge pain that OCR (Optical Character Recognition) errors can pose. OCR has one job: Detect all the letters and numbers on an image and spit out a text file that contains the text that has been recognized. However, OCR errors still occur frequently. Here I outline what I've learned so far in this regard.
Every generation has its own catastrophic crisis to bear. Be it the French Revolution, the two World Wars, Vietnam, the collapse of the Soviet Union or the 2008 financial crisis. For me and my generation, it was Afghanistan. And now, after twenty years – my whole adult life – all of this is over. Afghanistan is lost, and with it all hope that, after more than forty years, Afghanistan may return to its pre-war peace.
One of the biggest concerns I had over the past four years of developing Zettlr was that I'm not a designer by trade and, apart from some obvious rules, I know nothing about cool design stuff. In fact, all my furniture is from Ikea and the only art I have in my flat is a few mass-produced band posters. That's how non-designery I am. But recently, I have come to the conclusion that this might not be as detrimental for Zettlr as I always thought.
I'm attending a summer school right now, so I don't have that much time to write up something awesome. But yesterday I finally resolved a very quiet error that had almost cost me my sanity. It first occurred almost a year ago, and I wasn't able to fix it until yesterday. So, come and read a short story about what can go wrong if you forget what an INI file is supposed to look like.
A few days ago, a scandal erupted in the open source community: After being acquired by the Muse Group, the free audio editor Audacity was supposed to receive an update that would include telemetry in the app. And users didn't like that. So today I want to talk about telemetry, what it is, and how the Audacity debacle prompted me to finally purge any form of data transmission from Zettlr myself.
A few months ago I received my first MacBook with an M1 chip – that is, it uses the ARM architecture instead of the Intel architecture that most modern computers use. Back then I told you that installing PyTorch is a pain; and back then it was, because the ecosystem wasn't fully adapted to the fact that suddenly MacBooks could also have ARM instruction sets. However, in the past six months, a lot has changed. As I'm getting more and more emails from people asking whether or not my old instructions still hold true, I've decided it's time for an update.
After a short discussion on the Zettlr issue tracker about ways to make the app portable, I began a quick Google search about how to actually do that. Instead of finding ways of creating a portable version of an Electron app, I found a rabbit hole. And it was a rabbit hole I know by heart, because I've followed it down many, many times. It's a story about the bloat of the web and why Electron is bad. Here I ponder the idea of efficiency, and why certain trade-offs are sometimes beneficial.