Understanding Files and Folders: A Primer | Hendrik Erz

Abstract: In the past years, educators have noticed that younger generations began having difficulties understanding the traditional file-centric model of computers. In this article, I provide a primer in what the file-centric model is and how it works, targeted at people who grew up learning only the app-centric model of smartphones.


In today’s article, I offer a simple, straightforward primer on what files and folders are, and how you can use them properly. The target audience for this article are people who have interacted with digital content mostly through smartphones and tablets, and less with (“real”) computers. The article is especially targeted at social science students in the quantitative subfields that need to work with data, R, Python, or anything in this realm. If you’re an instructor, feel free to pass this article around so that everyone is on the same page before going into R and data analysis.

So, if you’re one of the people who have not been navigating file trees as you grew up, read on to learn what they are, where they come from, why we need them, and how you can properly utilize them! After reading this article, you’ll know probably more about files and folders than any of your friends (unless they study computer science, that is)!

Aside: Motivation

First, let me devote a few paragraphs on why I am writing this article, since this exact topic has occupied my mind for years by now. If you’re not interested, simply jump to the next section.

Long ago I watched a video by Tom Scott, titled “A Christmas Computer Bug, and the Future of Files”. I was reminded of this video by a conversation that I had a few days ago. During ASA 2024 in Montréal I was at a bar talking to a colleague. We were chatting about teaching coding to social science students, and he mentioned that in a recent course, he had to devote an entire day to explaining to his students what files were. To this, I replied, “No way!”, and he continued to explain that now a generation of students is coming of age that grew up not with computers, but with smartphones and tablets. Those students have never consciously interacted with files and folder structures, and that apparently led, among other things, to some students dumping everything they ever did on their computer onto the Desktop.

Looking back at the upload date of that video (eight years ago!), I am honestly surprised at how true Scott’s predictions were. It’s not that any of this was unexpected. But I disagree with him and believe that files and folders are here to stay. I am not going into the depth of the discussion on whether it is good or bad for people never to “see” files and folders here, however. Here’s one take on that.

Hiding away the files and folders on our smartphones makes sense for mobile devices and I think it would be weird to add Finder or Windows Explorer to a smartphone. But if the result of the proliferation of mobile devices is that people never even learn what files and folders are, then this is a problem, because files and folders will never go away.

More so, if you want to work with quantitative data, you need to know what files and folders are. And I don’t say this to give you chores. I’m saying this because understanding how that works helps you directly. In data science, there is not a single app that could do everything you must do to your data before you submit your paper. If computers used the same model as mobile devices, you would constantly be “sharing” files back and forth between a set of apps. Having one file and opening this from various programs is much faster and more ergonomic.

And that’s why this article exists. I believe nobody should be spending an entire day explaining what files are. The topic is so simple, you can do it in a few minutes and some practice. So let’s understand files and folders together! I’ll add some more depth to the explanations, because I personally like to comprehensively understand concepts.

I: What are Files and Folders?

First the “big picture” on what files and folders are meant to be. The reason they’re called “file” and “folder” is that they are metaphors that are intended to help people understand what they are meant for. They stem from an analog past when people still used (paper) files and folders. Technically, a file is a resource, and a folder is called a directory. But that’s very abstract, so thinking about them in terms of files and folders helps tremendously.

A folder is straightforward to understand: It’s a container for several files and works exactly like folders you may remember from school (if you were still using them). Whenever you need to group a set of files thematically, you should create a new folder. For example, one folder per course at university. You should also create entire hierarchies of folders, such as one “big” folder for your entire university program, then one per course within that big folder.

Files are somewhat more difficult to understand, however. “File” is simply the gerund of “to file,” an act in which someone takes a set of documents to some institution, such as a court. This already alludes to the fact that a “file” is not just a “document,” as many (including me until now) believe. Here, the German terminology is actually helpful (finally!): The German word for file is “Datei,” which is etymologically related to “Akte.” An “Akte” is a collection of closely related pieces of data (including, but not exclusively text), making the description of what a file is less vague than in English.

One way to phrase it, then, might be the following: A file is a collection of pieces of data (images, text, audio, etc.) that forms a coherent whole, and in which no part can be separated from the others without losing its meaning. A scientific research paper, for example, is a file: The plots in the paper don’t make much sense without the textual description, and the text would be harder to comprehend without plots. Thus, we usually collect both text and plots in a single file – e.g., a Word document.

A file is still somewhat vague, however, so it makes sense to take some time to go deeper and understand what a file is, technically.

II: Your Computer has no Concept of Files

Let me start with a surprising fun fact: Your computer has no concept of files. Okay, more specifically, the hardware doesn’t.

On a hardware level, all data is just lists of zeroes and ones. So if your computer or smartphone has “256 GB of Storage” (usually a Solid State Drive, or SSD), this means that it contains a list of 256,000,000,000 consecutive bytes, or numbers. This list has a fixed length. If you need more storage, you need to buy, e.g., an external hard drive. And that will be another (fixed-length) list of numbers that you can fill with data. You can really think of your computer/smartphone storage as a single, very long list of numbers, because that’s essentially what it is. There is no “file” on your storage. Just some zeros and ones.

How your hardware sees files

So, where do files then come from? Well, the file system. A file system is a piece of software that sits between you and your storage. When you create a new file and save it to your computer, whichever app you use for this will tell the operating system (that is not yet the file system!) to create a file. The operating system itself, however, has no idea where it can put the file. Instead, it will tell the file system to create the file. Think of the file system as just another application that runs on your computer and whose sole purpose is to read and write files.

The file system is the only part of your computer that actually talks to your storage. It’s the only part that knows where in that very long list of numbers there is some space for the file. And it’s the only part that knows where your “Term Paper_final_final_really_final.docx” is when you need to open it once more for a round of revisions. To do so, the file system maintains a list of all the files and where to find them.

Why is the file system something different from the operating system? Well, because there are a bunch of different ones. There are many ways to organize data in a list of numbers. Some are better for very large files, others are better for very tiny files. But all of them can manage long lists of numbers. By separating the file system from your operating system, your computer is able to work with more than one file system. Windows, macOS, and Linux traditionally all use different file systems, but they can also work together. This is the reason you can transfer files between computers and MacBooks using USB keys: Because your computer supports several file systems.

Common file systems that you will encounter are NTFS (for Windows), APFS (for Macs), ext4 (for Linux), and FAT 32 (for USB keys; understood by all operating systems). Just so you know, for future reference.

III: What are Files and Why are They Necessary?

Files are necessary because depending on how the zeros and ones should be interpreted differ. There is no way for your computer to just look at the raw zeros and ones and know “Ah, this is a Word document!”1 There are plain text files, Word documents, images, movies, tabular data, R data files, and so on. On the storage, all of these files look the same: different arrangements of zeroes and ones. Let’s say you want to look at a picture on your phone. In order to display it, the phone needs to load in all the zeroes and ones that comprise that image. For that, it will ask the file system where that image starts and ends, and only load these numbers. Each file must follow a very strict layout of the zeros and ones. Images, for example, must contain the information on each pixel in red, green, and blue (RGB) in a very specific order. The photo app understands this order and can thus turn the zeros and ones into an image that we can view.

If your file system loaded a few additional numbers after the end of the image from your storage, then your phone couldn’t display the photo, because it doesn’t know what these other numbers mean. They would not conform to the strict format that your photos app expects, and so it will likely show you an error instead of the image. This also works the other way around, however: If you, for example, crop the photo, you make it smaller. This has led to a serious security issue in Android phones some time ago. What Android essentially did when you cropped a photo was that it simply told your file system to move the “This file ends at number X” signpost back by a few numbers. It left the cropped-away part of the image data untouched. Therefore, if someone could tell the file system to indeed load a few more numbers after the official “end” of the photo, you could restore the cropped part. This worked because the zeros and ones after the end of the image still conformed to the correct data structure, so a photo app can interpret them correctly.

So files are necessary because your hardware doesn’t know where a photo ends and your thesis starts. The concept of files is basically all about siloing different types of data. Both images and Word documents are just numbers, but by interpreting one as image data, and the other as text data, you get something you recognize — as a photo of your cat, or your thesis.

To combine this insight with the more metaphorical meaning of a file as “Akte,” we are very free in what we call a “file.” You could come up with your very own file format, if you wanted to! Just give it a filename extension of your choice, and write a program that can open it, that is: Correctly interpret whatever the data structure in there is.

Not everything makes sense being stored together, however. Audio and video often go together, and text and images. But images and tabular data less so. This is why you’ll often only see a very small set of file types in your work:

  • Rich formatted text documents that combine text and images and that end in .docx (or .odt or .pages)
  • Images of various types (i.e., various ways of ordering the colored pixels in an image) that end in .png, .jpeg, or .gif
  • Movies/videos that contain both image data and audio data and which end in, e.g., .mp4 or .mov
  • Plain-text files which only include text and that end in .txt
  • Code that contains plain text using a specific syntax that R or Python can understand and that ends in, e.g., .R or .py
  • Jupyter notebooks (.ipynb) that combine text, code, and images

All of these files are just zeros and ones to your computer, but a program that can open them knows what structure the data in these files has, and can interpret it correctly.

Filename Extensions

In this section, I have frequently spoken of “if a file ends with XYZ.” This file ending is officially called filename extension. I want to say a bit more about this concept, because a lot of this builds on conventions, rather than technical limitations.

A filename extension is always separated from the rest of the filename by a period, or dot (.). Anything before the last dot is considered the filename, while everything after it is considered the filename extension. The filename extension is used by your operating system to know which program to open a file with. Any file that ends in .jpg, for example, is treated as a JPEG image. If you double-click such a file in the file browser (i.e., Finder or Explorer), the computer will open the associated app and tell it to show the file. Likewise, programs will look at the filename extensions to determine what file it actually is. This is especially useful for apps that can work with multiple types of files.

You can, of course, add multiple periods to a filename. But what will be considered the extension is only the text after the last period in the filename. So the files cat.photo.jpg and cat.jpg will both be considered to be JPEG photos. You can use this to your advantage: Sometimes, when regenerating a data file, I rename the previous data file from, say, data.csv to data.backup.csv to indicate that it is a backup.

Many operating systems hide these extensions, because you really should not change them in most cases. If you renamed a file from .jpeg to .docx, for example, the operating system would try to open the file with Word instead of your photo app, and this would lead Word to throw an error, because it can’t open photos. This can also be used to deceive you: Scammers will often send emails which have an executable attached, but the scammer might name it cat.jpg.exe so that what you see is only cat.jpg and might wrongfully assume it’s just a photo. But as soon as you open it, the computer will run the .exe-file and thus potentially infect your computer with a virus. Therefore, whenever you get a new computer, I recommend you to always make sure it displays all file extensions, everywhere.

There are three special cases that are good to know: Omitting filename extensions, hidden files, and app bundles.

First, you can omit the filename extension. If your filename does not contain any period, the file is considered not to have a filename extension. This frequently happens with system files on Unix/Linux systems, including macOS. When there are files that only the operating system itself will open (that is, you can be sure that only a specific app will work with a file), you can omit the filename extension. Remember: Any app can read any file, regardless of filename extension, but ensuring that the filename extension conforms to the data contained therein ensures consistency. Even on smartphones, where you don’t see any filenames, apps will add extensions to files. When you share a file with another app, it can look at the filename extension and infer what type of file it is.

Second, you can make a filename “extension only” by letting it start with a period. These files are called “dot files.” Unix computers including Linux and macOS will treat these files as “hidden files.” Hidden files are frequently used by apps to store some configuration, and your file browser won’t show these files by default. However, there is a setting to show them. In case you’re on macOS, try opening a finder window, navigate to your user folder, and then press Cmd+Shift+.. This will then show all hidden files and folders in it, which can be quite a few depending on how long you’ve been using the computer. You can safely hide those files and folders, as they’re only managed by apps. This will reduce clutter when you yourself have to find some files.

Thirdly, a smart but also odd feature of macOS: App bundles. Nobody has ever said that only files can end with filename extensions. On macOS, you can also add extensions to folders. This will then make macOS treat them as bundles, i.e., as files. For example, apps on macOS are actually just folders with all the data the app needs inside them. By naming the folder something.app, you make Finder treat the folder as a file, and double-clicking will execute them instead of opening the folder. I personally think it stretches the metaphors a bit, but when it comes to application setups, this is certainly the more comfortable way as opposed to the clumsy setup process that Windows uses.

IV: Locating Files Using Paths

But how does your file system know where a file starts, and where it ends? Well, that’s the main part of your file system: Remembering those positions. It does so using paths.

Essentially, a file system can work by literally maintaining the computer scientific equivalent of an Excel spreadsheet with three columns: The start and end positions of each file and some unique name (that is, the filename). A filename, therefore is just a description for the data stored between those two positions. Now you know why you should “give your files meaningful names.” ;)

How your computer remembers where files are.

However, modern computers and smartphones have billions of files, and anyone who has had the pleasure to work even with a moderately sized Excel spreadsheet knows that, at a certain size, it will just become unbearably slow. A similar thing happens on your computer’s storage. If such a list of files becomes too long, then your computer will take ages to open some file, because the file system has to go through its Excel spreadsheet from the first line to the last line. So if your file happens to be the last entry in that table, you’re in for a treat.

This is why such flat file systems have long been abandoned. Instead, computers nowadays use hierarchical file systems. A hierarchical file system has what is known as a directory tree. That’s not an actual tree, but rather a tree data structure. Let’s quickly unwrap this.

First, let’s again imagine our file system Excel spreadsheet. Let’s assume you have a thousand files in that list, and let’s assume it takes a very long time for your computer to actually find out where the file number 1,000 is on your storage. So you want to make that faster. Remember that your computer basically just goes through this list from start to finish until it has found what you are looking for. So what can we do to make that faster? We simply make the list shorter!

Let’s say we split up that 1,000 file table into ten, each with only 100 entries. We give each of these ten new tables a name, let’s say letters from “A” through “J.” Then, we create a new table and in that we only store where each of the ten additional tables is. Now finding files on your computer will be much faster. Let’s say you search for the file “C/thesis.docx”. Then, your computer first looks through the “main” table and finds the location of the table “C.” Then it looks into that one to find the location of the file “thesis.docx”. This is much faster! Let’s do the math: If you store all 1,000 files in a single table and need to find the location of the very last entry of that table, your computer needs to look at 999 irrelevant entries before it finds the correct one. In the new tree structure, it can completely ignore most of these files, because they are in different tables. In the new system, the worst case is that your computer needs to check ten entries for the “sub-tables” and then 99 of the entries in that table before it finds your file. That’s just 109 entries versus 999 – we could skip a whopping 890 entries!

Now, let’s say you create an intermediary layer of additional tables which you give names like “Documents,” “Pictures,” “Videos,” and “Downloads.” You can hopefully see where this leads to: Instead of “C/thesis.docx” we usually write “C:\Documents\thesis.docx” — which you hopefully already recognize!

To locate a file, you simply use these table-names and separate them with a backslash (or forward slash if you use a proper operating system). Et voilà: You have a path.

V: Folders, or: Directories

This leads us to the question of what folders really are. Folders are essentially only names for such intermediary tables. Whenever you create a new folder, you essentially tell your computer to create a new empty file table. This also explains why folders don’t take up any space: They are just names for these tables. The storage space that these tables need is negligible, because people much smarter than me have optimized this well.

This also means that you should always create folders — it’s better to have too many folders than too few. Let’s get quickly back to the example from the start: There are many people who throw all their stuff onto the Desktop, or into the Downloads folder and never sort the files away. This can actually make your computer slower, because even though modern computers are very fast, they still have to sift through lists of files, and if these become very long, it will take a noticeable time to open files.2

Of course, you shouldn’t go overboard with folders; at some point there are diminishing returns. Before I introduce you to starting points of how to start organizing your files on your computer, let’s quickly build the bridge between smartphones and computers so that you understand what the major differences are. With the knowledge about files and folders you have now, it should be straight forward.

VI: Smartphones vs. Computers

The first step to understanding the differences between how your smartphone and your computer handles files is to understand that it’s all about abstractions. As I alluded to earlier, even “file” and “folder” are abstract metaphorical concepts meant to make it easier to grasp. The same also applies to the way smartphones and computers handle how you work with files.

On a very basic level, your smartphone is app-centric, while your computer is file-centric. What do I mean by that? Well, on your smartphone, the first action you do is always open an application. Files are secondary: You can share them, but only from within some app. Your computer, however, has files first. You do need a program to navigate the files – Windows Explorer or Apple’s Finder, for example – but then you double-click a file which will then open it with the default application.

Both of these abstractions (this is called UX design, by the way) have their benefits and drawbacks.

Smartphones have not been designed to work with files. Instead, they have been designed around apps. Usually, you have one app for a single use-case — messengers, for example, or the already mentioned photos app. These apps often have everything they need included, so you don’t need to work with files — everything you do, you do from within that one app. For example, a messenger you just open, and then you have your text messages. There are no files anywhere to be seen. This makes working with a smartphone simple: If you want to message someone, you open that app. If you want to check your emails, you do it in the Mail app.

However, this design reaches its limits once you actually do need files. For example, we frequently add photos to our messages, so we give our messenger access to our photos. If we don’t give our messenger access to photos, you can still share them, but you would have to go into the photos app and “share” the photo with the messenger. That would be cumbersome, so smartphone makers have made it easy for applications to just directly access your folder full of photos. In other words, the smartphone design reaches its limits once you need to work with one file from several apps. Here’s where the app-centric concept starts to break down.

On the computer, apps having access to the same files is commonplace. For example, if you have a dataset in CSV format, you probably have some CSV-viewer installed so that you can preview the file. But RStudio or whatever data analysis software you use also needs to access this. The star of the show here is the file, not the app, which is a mere tool to do something useful with your file. This also has its limits, of course, as this now requires you to actually think about the folder structure you use. If you use a bad folder structure, it will be difficult to find your files again.

What smartphones have done is essentially to hide the full (programmers would say “qualifying”) path name to your files. On your smartphone, all your apps have a piece of storage in which they can store their files. Your photos app, for example, has access to a piece of storage where it can store all your photos. Your note-taking app likewise has a storage space where it can store your notes. For your smartphone, these are just folders somewhere on its storage. To you, they are not presented as files in a folder.

Your photos app likely chooses a grid or mosaic layout where you can see all photos in a big, colorful wall. Your notes app may show you all those notes in a list (which is very reminiscent of a file list), but many present them to you as cards, and even sort them. For example, they may display your two most opened notes always at the top, even if they’re really at the very end of the list of files in that folder.

In essence, when it comes to understanding files and folders, a lot about what matters to you is in what way these files are presented to you by your application. So if you have trouble understanding how files and folders on your computer work, it’s not that you are uneducated or anything; it’s rather that, to understand files and folders on your computer, you have to get accustomed to the way it displays them to you.

Both Finder (macOS) and the Explorer (Windows) display files very generically. These are file browsers, so their purpose is to allow you to navigate your entire computer, and not just a single folder where every file is of the same type. They cannot by default show you preview pictures of your photos, because there may very well be some text files in between on your computer.

That’s why, on your computer, everything is organized around the metaphors of files and folders. Because that’s the most efficient way to display files in a filetype-agnostic way.

VII: What all those Folders on Your Computer are for

Before I leave you with some final tips on organizing your files and folders, let us have a quick rundown of some default folders on your computer. Understanding this will already give you a good understanding of where you can put your files.

First, your computer contains two broad categories of files: Files it needs to work (i.e., stuff the operating system itself needs), and your files. For that purpose, every computer has a “home” folder that is exclusively meant for your personal files. On Windows, that folder is located at C:\Users\you, on macOS under /users/you and on Linux under /home/you, where you stands for your username.

That folder is never empty, however. Every computer offers a set of pre-defined folders: Documents, Pictures, Videos, and Music – maybe some others. These should be self-explanatory. If you use software to back up your data to the cloud, you may also have a folder for OneDrive, Nextcloud, or Dropbox.

VIII: Ways to Structure Folders

Finally, some advice on what you can do with your files. I’m going to share some experience from myself to give you some pointers, but ultimately you’ll have to find out what works for you. This process takes some trial and error, and especially time! And these structures will change as you switch careers.

First, you should always store any of your own files inside your user folder (i.e., for me at /Users/hendrik, on my Windows computer at C:\Users\hendrik). This makes it easy to back it up (which you should definitely do from time to time), and ensures they don’t get mixed up with system data that you don’t have to back up. In addition, a lot of software is built around assuming that your own files are somewhere in this folder. So by sticking to this simple rule, you’ll already prevent a lot of headache for yourself.

Second, try to be smart when it comes to using folders. I personally don’t use all the default folders my computer comes with. I don’t make videos, for example.3 However, I have a Nextcloud account where I back up all (and I don’t say this lightly) of my files, which amount to about 20 GB. This folder is in my home folder and is constantly synchronized to my server so that I have two backups of my data at all times.

I also use the “Documents” folder, but only for my coding work. Why? Because I organize those using git repositories, so they are backed up via GitHub already. There is no need to stuff these (sometimes very large) repositories into my Nextcloud.

Below these two “big” locations, I have a more complex file structure. The “Documents” folder is organized simply, and usually I have my repositories (which I also delete from time to time because, again, these are backed up) in one single folder because I usually have no more than 30 or 40.

The Nextcloud structure is more sophisticated. First, I make use of the PARA principle for very coarse sorting of my non-Uni-related stuff. My university folder (which is just called “PhD”) then has folders for my papers, this blog, notes, my courses, teaching, and conferences.

Basically, the more important something is, the higher up it’s in the hierarchy: I don’t have a “Papers” subfolder, because I don’t write that many of them, and I need to access them daily, so I left out that (possible and reasonable) additional folder. The conferences, courses, and teaching folders have subfolders in them for each conference/course, including a year where applicable (I take courses only once, so no need for year numbers, but I teach and visit conferences each year).

Inside almost every of these folders I have one file called §§ Etherpad.md which is essentially a scratchpad for some notes that I need to jot down in relation to the folder (=event or course). The name is arbitrary; I came up with it thanks to a friend and a software called Etherpad. The §§ at the beginning are purely used for instances in which files are sorted alphabetically, so that these files show up at the top of the list. The “Etherpad” is just my own mental model for what a scratchpad looks like, because this is how I’ve always used Etherpad. For you, it very likely is a different word.

Apart from these more rigid rules, I do allow myself some flexibility in adding or removing additional folders. Thinking about what I do, I have found that I often tend to delete subfolders if I realize that I access files very often to make it less cumbersome to find them. Likewise, I tend to add subfolders if I realize that I have two distinct categories of files in the same folder that my brain wants to distinguish.

The most important part when finding a good solution for you is: Try to find a balance between folders and ease of access. Also, never work against your brain. If your brain somehow always associates cat pictures with taxes, then by all means put the “cat pictures” folder in the same folder as your tax reports, because this will make you ten times faster in finding either of those.

Conclusion

I hope this explainer has helped you understand what files and folders are, and why they are so important. Smartphones have invented a different model of how to store files, and modern searches tend to get slowly better at finding relevant files for you. It is important to understand the distinction between the app-centric model of the smartphone and the file-centric model of the computer. Both have their clear benefits, but it is paramount to understand both of them.

Especially if you do data work, you’ll need to have those pieces of data organized in files and folders out of two reasons. First, these tend to be either large or many. Large files are difficult for the operating system to handle (including search), while many will reduce your computer’s speed if you keep all of them in the same folder. And second, the app-centric model of the smartphone only works because we don’t do complex tasks on it. On the computer, there is no single app to do all data work. The file-centric model ensures you can keep multiple apps open and work on the data from all of them, without having to “share” a multi-GB-file around several times.

Update (Sept 19, 2024): I have added a section on filename extensions. Thanks to Dustin Stoltz for suggesting it!


1 Before someone now complains: Yes, there are so-called “magic numbers” that can identify a file type from the raw bitstream, but I’m already getting very complex, so I’m leaving this out.
2 In case you now say “Hah, but that’s no longer a problem with up-to-date, fast computers”, you are wrong. I have a folder full of downloaded U.S. Congressional Speech transcripts, one file per day. If I open that folder in my file explorer, it will hang — on a 2023 MacBook Pro M2 Pro with 16 GB RAM. So yes, having too many files in a single folder is a bad idea, even today.
3 Okay, that was a blatant lie. But I don’t store those in this folder.

Suggested Citation

Erz, Hendrik (2024). “Understanding Files and Folders: A Primer”. hendrik-erz.de, 18 Aug 2024, https://www.hendrik-erz.de/post/understanding-files-and-folders-a-primer.

Ko-Fi Logo
Send a Tip on Ko-Fi

Did you enjoy this article? Leave a tip on Ko-Fi!

← Return to the post list