How I Work, Part IX: RSS Readers
This part of my How I Work series focuses on a somewhat archaic technology that has lost some of its relevance in recent years. However, this technology, RSS (Really Simple Syndication), has a lot to offer in terms of delivering relevant content in an age when your Twitter feed mostly consists of random people disseminating funny but ultimately uninformative news. In this article I walk you through what RSS is, where it came from, and what it has to offer in 2022, when social networks are ubiquitous.
Why I don’t use Jupyter Notebooks anymore
Jupyter Notebooks are an awesome way to explain code to people, but for the past year I've tried to use them as a convenient way of prototyping data analyses in a layered fashion. My aim was to avoid rerunning expensive calculations every time some later part of my code raised an exception. But in the end, these notebooks hampered my progress and I've since switched back to running plain Python scripts from the terminal. Here's why I don't use notebooks anymore.
Sparse Matrices, or: How to store 10 GB of data in 100 MB
Working with large datasets requires you to constantly improve your memory management skills. While I figured out how to read in a lot of data efficiently a year ago, in today's article I show you how I learned to store information efficiently once I had read it in.
Open Source has a Sustainability Crisis
I have already written about the somewhat desolate status of Open Source. However, given the recent incident involving colors.js and GitHub user Marak, I argue that these dumpster fires are being fueled by what I call a Sustainability Crisis of Open Source. There is no easy way out, but we cannot forsake Open Source either.
Death by Proxy
Happy New Year, everyone! Let me kick off this year on the blog with a piece on something that has recently caused me some headaches. This new thing is called “Proxy” and, while it is a pretty fancy new way of handling your data in JavaScript, it can make you sad real quick. It is one of those things that can cause very exotic and disturbing-looking errors which don’t make sense at first.
Analysis: The Coalition Agreement between SPD, Bündnis 90/Die Grünen, and the FDP
On November 24, 2021, the prospective coalition partners SPD, Bündnis 90/Die Grünen, and the FDP published their "Ampel" (traffic light) coalition agreement for the upcoming legislative period. The FDP in particular appeared to have been very successful in the coalition negotiations. In this article, I test this hypothesis.
Why You Shouldn't Use SQLite
While SQLite databases are very convenient for storing application data, you should not store any significant amount of research data in them. In this article, I explain why.
Do Colorless Green Ideas Really Sleep Furiously?
Do language models have a real understanding of language? No. But are they therefore useless? Also no. In this article, I shed light on a decades-old linguistic debate and what the state of the art of language models looks like. Brace yourself for a fierce discussion between Claude Shannon, John Firth, Noam Chomsky, and Ludwig Wittgenstein!
Back to Sweden
After almost a year of Corona-induced remote work, it's time to return to Sweden.
OCR Error Correction, Take 2
Last week I wrote about OCR error correction and the three general approaches to it. Over the week, I got practical and fixed OCR errors in my dataset. However, I used a fourth approach, which is much more efficient and fixed about 80% of all OCR errors. Read more on my progress!