The horrifying beauty of JavaScript

Hendrik Erz

Abstract: When I started programming, I hated JavaScript. It was inconsistent and had weird quirks. Now I can look back on five years of heavy JavaScript coding and have learned to live with the weirdness. However, the language never ceases to amaze, and so today I share an innocent-looking code snippet and dive into the horrifying beauty of it.

Published on Thursday, September 1st, 2022 by Hendrik | 12 min reading time

A few days ago, I wrote a piece of JavaScript code and it made me wonder. Wonder because it looked absolutely amazing, perfectly clean, and aesthetically pleasing. But, at the same time, something seemed off. I then posted the snippet to Reddit to ask people with more formal programming education than me what they thought. And they confirmed my fears: It was both an absolute beauty and the material of nightmares. The snippet I gave them was the following:

let lastSeenIndex = 0
return text
  .replace(/^\`{1,3}.+?^\`{1,3}$/gsm, '')
  .replace(/^-{3}.+?^(?:-{3}|\.{3})$/gsm, '')
  .replace(/^#{1,6}\s+/gm, '')
  .replace(/^\s*[-+*]\s\[[x\s]\]\s/gmi, '')
  .split(/[\.:!?]\s+|\n/ig)
  .map(sentence => {
    const idx = text.indexOf(sentence, lastSeenIndex)
    lastSeenIndex = idx + sentence.length

    let rangeEnd = lastSeenIndex
    if ('.:!?'.includes(text.charAt(rangeEnd))) {
      rangeEnd++
    }

    return {
      from: offset + idx,
      to: offset + rangeEnd,
      sentence: sentence
        .replace(/[*_]{1,3}[^_*]+[_*]{1,3}/g, '')
        .replace(/\[\[[^\]]+\]\]/g, '')
        .replace(/!\[[^\]]+\]\([^)]+\)/g, '')
        .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')
        .replace(/\[[^[\]]*@[^[\]]+\]/, '')
    }
  })
  .filter(v => v.sentence.length >= 2)
  .map(v => {
    const words = v.sentence.trim().split(' ').filter(word => word !== '')
    const score = readabilityAlgorithms[algorithm](words)
    return { from: v.from, to: v.to, score }
  })
  .map(v => scoreDecorations[v.score].range(v.from, v.to))

What is happening?

So, first of all, what is actually happening here? Well, the primary aim of this piece of code (which is 100% of the body of a function) is to take a string of Markdown-formatted text, convert it to pseudo-plain-text, split it into sentences, score those sentences using a readability algorithm, and use these scores to tell the editor in which the text is displayed how to color each sentence.

Of course, for more suspense and also because it looked cleaner, I stripped all the comments I made, so here’s what the different parts do (line by line):

Remove fenced code blocks
Remove YAML metadata blocks
Remove heading indicators
Remove list indicators
Split the text roughly at (Latin-script) sentence ends
Determine the offset where the sentences occur in the original text (so that I can tell the editor where a given color starts and where it ends) and return an object with that information and the sentence, from which I additionally strip inline markup:
1. Remove bold and italic formatting
2. Remove images
3. Remove the URLs of Markdown links, keeping only the title
4. Remove citations (Pandoc-style)
Remove all sentences deemed too short (they must be at least two characters long)
Tokenize the sentences
Score the tokens and return all that (basically replacing the sentence with its score)
Create a set of “decorations” that the editor understands

This is in and of itself a very beautiful piece of code because it takes a string and gradually converts that string into a set of colors to be applied on a code editor. There are almost no variables involved, and, as you may already have noticed, all of that happens within the return statement of the function.
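The offset bookkeeping from the sixth step can be looked at in isolation. This is a minimal sketch, assuming a hypothetical helper called locateSentences and a made-up sample text; it leaves out the editor offset and the inline-markup stripping:

```javascript
// Hypothetical helper: finds where each sentence sits in the original text
function locateSentences (text, sentences) {
  let lastSeenIndex = 0
  return sentences.map(sentence => {
    // Search only from where the previous sentence ended, so that a repeated
    // sentence is matched at its own position, not at its first occurrence
    const idx = text.indexOf(sentence, lastSeenIndex)
    lastSeenIndex = idx + sentence.length
    return { from: idx, to: lastSeenIndex, sentence }
  })
}

// Made-up sample input with a repeated sentence
const text = 'Hi there. Hi there. Bye.'
const sentences = text.split(/[.:!?]\s+|\n/g)
const ranges = locateSentences(text, sentences)
```

Without the second argument to indexOf, both occurrences of "Hi there" would be reported at index 0.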

Why does it work?

Time to look closer at what the code is doing, to gradually unearth the monstrosity at the heart of this piece of code. There are two types of values and four methods in the piece which are central to disentangling what is happening here. First, the whole endeavor begins with a string until the split-method, which returns an Array of strings for the first time. Afterwards, the Array contains three different types of data: First simple string primitives, then custom-made objects with the range properties, and lastly decorations (which are themselves objects, but because they have a well-defined interface, I distinguish them from my “I need that kind of information now, so let’s just create an object to contain it”).

The four methods that are important here are:

replace(what, how): Searches for what in a string, replaces the matches with how and finally returns the new string. If what is a regular expression with the g flag, every occurrence is replaced; otherwise only the first one. I am using regular expressions because I need to replace patterns instead of fixed character sequences. If you are curious, give a tutorial on Regular Expressions a read.
split(where): Takes either a string or (again, in this instance) a pattern and divides the string into a list of strings wherever it matches.
map(callback): Takes an array and creates a new array made from whatever you return from the callback you pass to the function. The callback is expected to take the original value and transform it (read: map it) to whatever you need.
filter(callback): Similar to map, but this time the callback is expected to return a Boolean indicating whether the element should be retained in the new array, or removed.

The first two functions are methods that only work with strings; the latter two only work with Arrays. replace takes a string as input and returns a string. split takes a string, but instead returns an array of strings. map and filter both take arrays as input and spit out new, modified arrays.
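To see how the types change along such a chain, here is a small sketch with a made-up input string. The regular expressions are borrowed from the snippet above; the length threshold of 3 is chosen only for this example:

```javascript
const input = '# Hello world. This is fine! Ok'

const kept = input
  .replace(/^#{1,6}\s+/gm, '')              // string in, string out
  .split(/[.:!?]\s+|\n/g)                   // string in, array of strings out
  .map(sentence => sentence.trim())         // array in, array of the same length out
  .filter(sentence => sentence.length >= 3) // array in, possibly shorter array out

const inputType = typeof input        // still a string
const keptIsArray = Array.isArray(kept)
```

Here kept ends up holding the two longer sentences, while the two-character "Ok" is filtered out.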

But wait!, you may now say. Where are the inputs coming from?! And here we’re getting closer to the monstrosity.

Is this Functional? Is this OOP?

Programmers want to be consistent in what they do. Additionally, since writing code requires a high level of abstract thinking, it is advisable to follow certain patterns when writing code that help you not make a mistake somewhere down the line. As a programmer, you need to always think abstractly because every computer program is just a set of functions, and functions are defined as reusable pieces of code that take some input and transform it. Think about it: You write computer programs not because you already have some kind of table that assigns every possible input one output, but rather because you want to simplify what you are doing: Instead of creating long lists of what output should follow which input, you try to find general rules to convert any input into an appropriate output.

There are, broadly speaking, two “big” types of programming paradigms out there: functional programming and object-oriented programming (or OOP for short). Both have pros and cons, and the important insight here is that it’s always advisable to use only one of those.

Functional programming follows the idea that you have some state that describes your program, but the state should be immutable. Instead of modifying (= mutating) some parts of your state, functional programming always takes one state and returns a new state. OOP, on the other hand, treats the state itself as an entity that should do whatever it pleases with itself. That means that you create a single state and then call methods on that state that mutate it. Here’s a quick example.

Functional programming

function startState () {
    return {
        someVariable: 'Hello, world!',
        step: 0
    }
}

function newState (oldState) {
    return {
        someVariable: doSomething(oldState.someVariable),
        step: oldState.step + 1
    }
}

let state = startState() // Returns the start state
state = newState(state) // Takes the state and returns a modified version

Object-Oriented Programming

class MyState {
    constructor () {
        this.someVariable = 'Hello, world!'
        this.step = 0
    }

    doSomething () {
        /* Do something with someVariable here */
        this.step++
    }
}

const state = new MyState() // Equivalent to startState() above
state.doSomething() // Equivalent to newState() above

I am telling you all of this because the code snippet I presented to you initially is actually a very weird mixture of both paradigms! It is on the one hand functional because there are no side effects¹ involved (the only thing that gets mutated is the local lastSeenIndex counter), but it also looks like OOP, because you are calling methods on an object. So now we can see more of the monstrosity of the code.

Functional OOP?!

In my Reddit post, I already hinted at the fact that this code is a very weird mix of functional and OOP styles. Strictly speaking, the code is functional, but, because the methods are implemented on the string objects, we can get rid of local variable assignments during the transformations.

This is something that is just awful once you know about the functional and OOP paradigms: Strings are primitives in JavaScript, yet you can call methods on them as if they were objects, because the engine temporarily wraps them in a String object for the call. They are also immutable, i.e. you cannot change them. Whichever method you call, it will always return a new string. The code I’ve written above cleverly makes use of that in order to chain the transformations: Since replace() returns the transformed string, I immediately call another replace() on whatever the previous call has returned, i.e. the new string.
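A small sketch with a made-up string shows both points at once: the original string is never modified, and chaining gives the same result as reassigning a variable after every step.

```javascript
const original = '**Hello**, world!'

// Variant 1: reassign a variable after every transformation
let step = original
step = step.replace(/\*\*/g, '')
step = step.replace('world', 'reader')

// Variant 2: call the next method directly on the returned string
const chained = original
  .replace(/\*\*/g, '')
  .replace('world', 'reader')

// Both variants produce the same new string, and original is untouched
const sameResult = step === chained
const untouched = original === '**Hello**, world!'
```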

The sole benefit of this is literally that I save myself the hassle of adding text = in front of the .replace-calls, and similarly returnValue = in front of the .map and .filter-calls. In fact, the code is much closer to functional code than it might seem at first. It is a perfect example of one benefit that functional programming has over OOP: compositionality.

In fact, many commenters suggested that I fully commit to the functional style in this case, reformulate the code so that it becomes clearer what it does, and migrate all the regular expression madness from there into their own respective and – most importantly – descriptive functions. Turned fully functional, the code could look like this:

  text = removeBlocks(text)
  text = removeBlockFormatting(text)
  const sentences = text2sentences(text)
  // sentences2ranges would, after extracting the range, call another
  // function for each sentence: sentence = removeInlineMarkup(sentence)
  const sentencesWithRanges = sentences2ranges(sentences, text)
  // scoreSentences would also call a function per sentence:
  // words = tokenize(sentence)
  const scoresWithRanges = scoreSentences(sentencesWithRanges)
  return makeDecorations(scoresWithRanges)

The benefits of this over my snake-chain-monstrosity are clear:

The code is fully compositional, so whenever I needed to remove code blocks at another point in my program, I could reuse the same function
If, at some point, I needed to apply another transformation, it would be much simpler to add it to the improved code
The improved code is fully testable, i.e. I can write an automatic check for every function that tests whether the function actually does what I expect it to do
It is more declarative, i.e. hides away the actual implementation. This means that I can, in the background, adapt the actual removal process should the syntax change, so that the function removeBlocks will always remove blocks, no matter what they look like.
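To make these benefits a bit more concrete, here is a minimal sketch of how the first three of those functions might look. The function names come from the listing above; which regular expression ends up in which function is an assumption made for this example, and the sample input is made up:

```javascript
// Assumption: the two block-level patterns from the original chain live here
function removeBlocks (text) {
  return text
    .replace(/^\`{1,3}.+?^\`{1,3}$/gsm, '') // fenced code blocks
    .replace(/^-{3}.+?^(?:-{3}|\.{3})$/gsm, '') // YAML metadata blocks
}

// Assumption: heading and task-list indicators count as "block formatting"
function removeBlockFormatting (text) {
  return text
    .replace(/^#{1,6}\s+/gm, '')
    .replace(/^\s*[-+*]\s\[[x\s]\]\s/gmi, '')
}

function text2sentences (text) {
  return text.split(/[.:!?]\s+|\n/g)
}

// Made-up sample document
const markdown = '---\ntitle: Test\n---\n# Heading\n- [x] Done. Next one'
const withoutBlocks = removeBlocks(markdown)
const sentences = text2sentences(removeBlockFormatting(withoutBlocks))
```

Each function can now be checked on its own, for example by asserting that removeBlocks strips the YAML block but leaves the heading alone.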

Why is JavaScript Like This?

At this point, one last question remains: why does JavaScript do this to us?

We all know the quirks of the language, such as the fact that you can actually write programs using nothing but empty arrays and a handful of operators, and other weirdness. The main cause of this is often identified as the fact that JavaScript was developed in an insanely short time frame of less than a month. Due to this, many quirks remained and now have to be maintained for backwards compatibility.
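For a taste of the empty-array trick: type coercion lets you build numbers and letters out of nothing but [], ! and +. The variable names below are only there for illustration.

```javascript
const zero = +[]                // [] becomes '', which becomes 0
const one = +!![]               // ![] is false, !![] is true, +true is 1
const falseText = ![] + []      // false + '' is the string 'false'
const letterF = (![] + [])[+[]] // character 0 of 'false'
```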

The “Everything is an object”-mentality of the language, however, is probably due to preference, which has now stuck with the language forever. But there is an actual benefit to this weirdness. One commenter made a very interesting note:

This is why JS needs a pipeline operator!

To quickly explain: A pipe(line) operator basically takes the output from one function and immediately passes it to the next one. One example is R, where the magrittr package provides the %>% operator:

myVariable %>%
    transform1(additionalParameter) %>%
    transform2() %>%
    transform3()

What this quirky %>% does is it takes myVariable, inserts it as the first argument to transform1, then takes the return value from that and inserts it as the first argument to transform2, whose output finally is passed into transform3. To a certain degree, the %>% -operator fulfills exactly the same function as the .-operator in JavaScript.

There is a difference, though: While R’s pipe-operator only requires pure functions², in JavaScript you have to define those functions on the so-called “prototype chain” of the Object you want to apply them to because the first argument for functions such as reduce is not an actual argument, but this. So in the end these functions are, after all, bound to the type of the object. You can’t just say “Hey, but now I want to split my array” — because arrays don’t have a split-method.
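You can approximate the R behavior in today's JavaScript with a small helper. The sketch below is not a built-in: pipe and the three transformation functions are hypothetical names I made up for this example. The point is that the functions receive their input as an ordinary argument instead of this, so they are not tied to any prototype.

```javascript
// Hypothetical helper: passes a value through a list of free functions
const pipe = (value, ...fns) => fns.reduce((acc, fn) => fn(acc), value)

// Free functions: the input is a normal argument, not `this`
const stripHeadings = text => text.replace(/^#{1,6}\s+/gm, '')
const toSentences = text => text.split(/[.:!?]\s+|\n/g)
const dropShort = sentences => sentences.filter(s => s.length >= 2)

const result = pipe(
  '# Title\nFirst one. Second one',
  stripHeadings,
  toSentences,
  dropShort
)

// Methods, by contrast, are bound to the type they are defined on
const stringHasSplit = typeof ''.split === 'function' // true
const arrayHasSplit = typeof [].split === 'function' // false
```

Inside the helper functions I still have to use the method syntax, of course, because that is the only form in which replace, split and filter exist.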

It is this that will forever prevent pure functional programming styles in the context of JavaScript, but also purely object-oriented styles (because, oh my, you can do so many stupid things with objects), so the language will remain some weird mixture between the two.

And that’s the horrifying beauty of JavaScript.

¹ A side effect happens when you call a function and, as a result of calling that function, some part of your state at the very other end of your program suddenly has changed. A side effect is a result of a function call that is not obvious. OOP is full of side effects, because you call a function that does not return anything, but rather opaquely modifies the object itself.
² Apparently other definitions of "pure" are more strict than this, but I simply refer to a function being "pure" if it just takes some input, transforms it, returns something and does not touch any other part of your program, i.e. the no-side-effect thing.