Blog | Hendrik Erz

Blog

Why I don’t use Jupyter Notebooks anymore

Jupyter Notebooks are an awesome way to explain code to people, but for the past year I've tried to use them as a convenient way of prototyping data analyses in a layered fashion. My aim was to avoid having to rerun expensive calculations every time some later part of my code raised an exception. But in the end, these notebooks hampered my progress and I've since switched back to running plain Python scripts from the terminal. Here's why I don't use notebooks anymore.


Sparse Matrices, or: How to store 10 GB of data in 100 MB

Working with large datasets requires you to constantly improve your memory management skills. While I figured out how to read in a lot of data efficiently a year ago, in today's article I show you how I learned to store information efficiently once I had read it in.


Open Source has a Sustainability Crisis

I have already written about the somewhat desolate state of Open Source. However, given the recent incident involving colors.js and GitHub user Marak, I argue that these dumpster fires are being fueled by what I call a Sustainability Crisis of Open Source. There is no easy way out, but we cannot forsake Open Source either.


Death by Proxy

Happy New Year, everyone! Let me kick off this year on the blog with a piece on something that has recently caused me some headaches. This new thing is called “Proxy” and, while it is a pretty fancy new way of handling your data in JavaScript, it can make you sad real quick. It is one of those things that can cause very exotic and disturbing-looking errors which don’t make sense at first.


Analysis: The Coalition Agreement between SPD, Bündnis 90/Die Grünen, and the FDP

On November 24, 2021, the prospective coalition partners SPD, Bündnis 90/Die Grünen, and the FDP published their "traffic light" ("Ampel") coalition agreement for the upcoming legislative period. The FDP in particular seemed to have had a lot of success in the coalition negotiations. In this article, I test this hypothesis.


Why You Shouldn't Use SQLite

While SQLite databases are very convenient for storing application data, you should not store any significant amount of research data in them. In this article, I explain why.


Do Colorless Green Ideas Really Sleep Furiously?

Do language models have a real understanding of language? No. But are they consequently useless? Also no. In this article, I shed light on a decades-old linguistic debate and what the state of the art of language models looks like. Brace yourself for a fierce discussion between Claude Shannon, John Firth, Noam Chomsky, and Ludwig Wittgenstein!


Back to Sweden

After almost a year of Corona-induced remote work, it's time to return to Sweden.


OCR Error Correction, Take 2

Last week I wrote about OCR error correction and the three general approaches to it. Over the week, I got practical and fixed OCR errors in my dataset. However, I used a fourth approach, which is much more efficient and fixed about 80% of all OCR errors. Read more on my progress!


On the State of OCR Correction

If you have ever worked with text, you will know what a huge pain OCR (Optical Character Recognition) errors can be. OCR has one job: Detect all the letters and numbers on an image and spit out a text file that contains the text that has been recognized. However, OCR errors still occur frequently. Here I outline what I've learned so far in this regard.

