
How to avoid filename errors when using yt-dlp

So, you’re downloading a video. Or a list of videos, to make it interesting. You’ve got everything set up, you fire up yt-dlp, and you set it to download.

And all of a sudden, you start getting filename errors: can’t write file, invalid filename. It turns out yt-dlp is building the filename from the video’s title, and that title has foreign characters, smileys, emoticons, or other unusual symbols in it. Linux, your favorite Pi distro, or even Windows doesn’t like some of those, and the file simply doesn’t get saved.

So, what do you do? Is there a way to solve this problem?

yt-dlp -o

Let’s start with the -o (or --output) option in yt-dlp. This one lets you define a custom name for your downloaded files. It’s particularly useful when you want a quick-and-dirty fix for a messed-up filename and don’t really care what the file ends up being called.

So if I’m downloading a video and yt-dlp chokes on the filename, I simply fix it with -o:

yt-dlp -o myvideo.mp4 https://myvideourl.whatever

That simply downloads the video at myvideourl.whatever and saves it as myvideo.mp4.

If your filename has spaces in it, or your URL contains special characters such as ? or & (typical on video sites), remember to use quotes:

yt-dlp -o "My really long name.mp4" "https://myvideourl.whatever?videoref=4&index=9&param=vbkRGSDf4583SS"

The -o option also accepts tokens, which get replaced with information about the video when the file is saved. For example, if for some strange reason I wanted to name the file after the video’s creator:

yt-dlp -o "%(creators)s.%(ext)s" https://myurl.whatever

In this case, %(creators)s gets replaced with the names of the video’s creators (if that information is available; if it isn’t, yt-dlp fills in NA), and %(ext)s gets replaced with the file extension of the format you end up downloading. The “s” at the end of each token tells yt-dlp to format the value as a string. It comes from Python’s %-style string formatting and isn’t really worth diving into here. Just don’t forget to include it.
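If the token idea feels a bit abstract, here’s a toy bash sketch of the substitution step: each %(field)s placeholder in the template is swapped for a value. The fill_template function and the values passed to it are made up for illustration. yt-dlp does the real work internally with Python’s formatting, plus a lot of extra handling this sketch ignores.

```bash
#!/usr/bin/env bash
# Toy illustration of output-template substitution.
# NOT yt-dlp's real code: fill_template and the values passed to it are made up.

fill_template() {
  local out=$1 pair key value token
  shift
  # Remaining arguments are key=value pairs, e.g. ext=mp4
  for pair in "$@"; do
    key=${pair%%=*}       # text before the first "="
    value=${pair#*=}      # text after the first "="
    token="%(${key})s"    # e.g. %(ext)s
    # Quoting "$token" makes bash match it literally instead of as a pattern;
    # quoting "$value" keeps characters like & in the value literal too
    out=${out//"$token"/"$value"}
  done
  printf '%s\n' "$out"
}

fill_template '%(creators)s.%(ext)s' 'creators=Some Channel' ext=mp4
# prints: Some Channel.mp4
```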

There’s a whole bunch of tokens you can use to name your video files whatever you might imagine in yt-dlp. You can find the full list in the yt-dlp documentation, in the “Output Template” section. If you’re not familiar with the GitHub interface, open the project page and simply scroll down: the README comes right after the file list. If you want to know more about the %(var)s format and what it means, check the Python documentation on printf-style string formatting.

Let’s autonumber with yt-dlp

So now, let’s take it a step further. We know how to specify output filenames. But -o doesn’t seem that useful if we’re doing a huge list of videos, right? What are we gonna do, take our daily list of 38 news shorts and issue a yt-dlp command individually for each one?

Of course not.

The -o option has an autonumber feature. If you’re downloading multiple videos and don’t care about their filenames, this lets you sidestep filename errors altogether.

The token, as you probably guessed, is %(autonumber)s.

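Let’s say I have a list of video URLs in a file called myvids.txt, one URL per line. I want to feed that into yt-dlp with the -a (--batch-file) option and have it autonumber my files, so in the end I get 00001.mp4, 00002.mp4, and so on (yt-dlp pads the number to five digits by default). The command goes like this: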

yt-dlp -a myvids.txt -o "%(autonumber)s.%(ext)s"

Using autonumber solves the invalid character problem in your filenames pretty well, especially when you’re downloading from a list.
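To preview what a batch run will call your files, here’s a small bash sketch that walks through a URL list and prints the names %(autonumber)s.%(ext)s would produce. It assumes yt-dlp’s default five-digit zero padding and pretends every video comes back as mp4. preview_autonumber is a hypothetical helper, and nothing gets downloaded.

```bash
#!/usr/bin/env bash
# Sketch: preview the names "%(autonumber)s.%(ext)s" would produce for a URL list.
# Assumptions: 5-digit zero padding (yt-dlp's default) and one fixed extension
# for every video. In reality the extension depends on the format you get.

preview_autonumber() {
  local list=$1 ext=${2:-mp4}
  local n=0 url
  # "|| [ -n ... ]" also handles a last line with no trailing newline
  while IFS= read -r url || [ -n "$url" ]; do
    case $url in
      # Skip blank lines and comment lines (yt-dlp's docs say lines starting
      # with #, ; or ] in a batch file are ignored)
      ''|'#'*|';'*|']'*) continue ;;
    esac
    n=$((n + 1))
    printf '%05d.%s\n' "$n" "$ext"
  done < "$list"
}

# Usage: preview_autonumber myvids.txt        (assumes mp4)
#        preview_autonumber myvids.txt webm
```

For a list with two URLs and a comment line in between, it prints 00001.mp4 and 00002.mp4.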

By the way, the extension might change depending on the video format. mp4 is pretty common nowadays, but on occasion you might get a file with a .webm or even a .flv extension.

It’s worth mentioning that any time you’re not supervising your downloads, you should use the --no-playlist option on yt-dlp. If you point yt-dlp at a YouTube link that refers to both a video and a playlist (the kind you get when you click a video inside a playlist), it’ll download the whole playlist by default. And on YouTube, it’s not that easy to tell a video link from a playlist link when you’re quickly clicking and copying URLs.

If you’re not careful, you might come back an hour later to find 3 GB of videos on your computer, with numerical filenames going from 1 to 496. And good luck sorting that one out.

So the complete command for yt-dlp to download videos using a list, with autonumbered filenames, and no playlist downloading would be this:

yt-dlp -a myvids.txt -o "%(autonumber)s.%(ext)s" --no-playlist
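Since the whole problem is spotting playlist links in a hurry, a quick pre-flight check on myvids.txt can help. This bash sketch flags lines containing list=, which is how YouTube playlist links usually look (watch?v=...&list=... or playlist?list=...). That’s an assumption about YouTube’s URLs, so treat it as a heuristic rather than a guarantee; find_playlist_links is a made-up helper. Keep in mind that --no-playlist only helps with links that point to both a video and a playlist. A pure playlist link still downloads as a playlist.

```bash
#!/usr/bin/env bash
# Heuristic pre-flight check: list lines in a URL file that look like playlist links.
# Assumption: YouTube playlist URLs carry a "list=" query parameter.
# Other sites may use completely different URL shapes.

find_playlist_links() {
  local list=$1
  # -n: prefix each match with its line number; -F: match the text literally
  grep -nF 'list=' "$list"
}

# Usage:
# if find_playlist_links myvids.txt; then
#   echo "Check the lines above before leaving yt-dlp unattended."
# fi
```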

Characters not allowed in filenames

So, why is it such a pain to get filenames to work correctly in cases like the ones I described? Well, the reason is that your operating system, be it Linux, Windows, macOS or another, reserves certain characters that it can’t (or won’t) accept in filenames.

These characters are used by the operating system and the shell to separate directory names, for wildcard substitutions, to redirect output, and for a whole lot of other purposes. If you put them into a filename, the OS or the programs handling that file could misread the name and do something unpredictable. So instead of lighting that dumpster fire, your OS tells you invalid filename and that’s that.

The characters that will usually cause problems on your operating system if you try to put them in filenames are the following:

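Not a good idea in Linux

  • Forward slash (/)
  • Pipe or vertical bar (|)
  • Ampersand (&)
  • Question mark and asterisk (? *)
  • NUL (zero) byte

Strictly speaking, only the forward slash and the NUL byte are outright forbidden in Linux filenames. The others are legal, but they have special meanings in the shell, which is exactly where they’ll bite you.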

Not a good idea in Windows

  • Less than and greater than: < >
  • Slashes, both forward and back: / \
  • Colon (:)
  • Double quotes ("), although you can use them on the command line to wrap filenames that contain spaces
  • Pipe (|), also called a vertical bar
  • Ampersand (&)
  • Question mark and asterisk (? *)
  • Control characters (ASCII codes 0 to 31)

Also, in Windows you can’t name your files CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9.

In general, do yourself a favor and avoid using anything in your filenames that’s not a letter, a number, a dash (-) or an underscore (_). Just because your OS says it supports a gazillion characters in filenames doesn’t mean you have to get creative.
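If you’d rather clean up names yourself (say, for files you already have), here’s a conservative bash sketch that keeps letters, digits, dashes and underscores, plus the dot so extensions survive, and turns everything else into an underscore. sanitize_name is a hypothetical helper, not part of yt-dlp; yt-dlp also has a built-in --restrict-filenames option that does a similar job at download time. Note that this sketch works byte by byte, so an emoji turns into several underscores, and it doesn’t guard against names that start with a dot.

```bash
#!/usr/bin/env bash
# Sketch of a conservative filename sanitizer (hypothetical helper, not yt-dlp's).
# Allowed: A-Z a-z 0-9 . _ -   Everything else becomes "_".

sanitize_name() {
  local name=$1
  # tr -c replaces every byte NOT in the allowed set with "_".
  # LC_ALL=C makes tr work on plain bytes, so the ranges mean ASCII only.
  # printf '%s' (no trailing newline) so tr doesn't add an extra "_".
  name=$(printf '%s' "$name" | LC_ALL=C tr -c 'A-Za-z0-9._-' '_')
  printf '%s\n' "$name"
}

# Usage: sanitize_name 'Tom&Jerry: Best of? *2024*.mp4'
```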

But what if I use quotes?

Sure. You can probably get away with some weird filenames if you use quotes. For example, in Linux (specifically in the Bash shell), if you try to create a file called “Tom&Jerry.mp4” like this, you get an error:

touch Tom&Jerry.mp4

But if you do it like this, it works:

touch "Tom&Jerry.mp4"

I can use quotes to get something even more sacrilegious on Linux, for example "Tom*Jerry.mp4".
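You can see the quoting difference safely in a throwaway directory. Unquoted, bash reads touch Tom&Jerry.mp4 as two commands: touch Tom, sent to the background by the &, followed by an attempt to run a program called Jerry.mp4. Quoted, the & and * are just characters. The temp directory is only there so this sketch doesn’t litter your home folder.

```bash
#!/usr/bin/env bash
# Create oddly named files inside a throwaway temp directory.
demo_dir=$(mktemp -d)

# Quoted: & and * are taken literally, so both files get created.
touch "$demo_dir/Tom&Jerry.mp4"
touch "$demo_dir/Tom*Jerry.mp4"

# Unquoted, the same name would split into "touch Tom &" plus a command
# named "Jerry.mp4", so it's left commented out here:
# touch Tom&Jerry.mp4

ls -1 "$demo_dir"
```

When you’re done looking, rm -r "$demo_dir" removes the whole thing.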

Now, why you would want to do that, aside from plain and simple curiosity, is beyond me. If you have files named like that, although the OS might allow them, managing them is going to be a royal pain. In the best of cases, your filesystem will encode the names right and you won’t notice much. In the worst of cases, you’ll run your favorite program on the file, your shell will interpret and expand the special characters… and you’ll probably regret naming your file the way you did.
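Here’s the kind of thing that goes wrong later. This bash sketch creates a file literally named Tom*Jerry.mp4 next to an innocent Tom and Jerry.mp4 (both made-up names, in a throwaway directory) and shows that an unquoted Tom*Jerry.mp4 is treated as a wildcard that matches both.

```bash
#!/usr/bin/env bash
glob_dir=$(mktemp -d)
touch "$glob_dir/Tom*Jerry.mp4" "$glob_dir/Tom and Jerry.mp4"

# Unquoted, * is a wildcard: the pattern expands to BOTH files.
unquoted=( "$glob_dir"/Tom*Jerry.mp4 )

# Quoted, it's one literal name: only the file with the real asterisk.
quoted=( "$glob_dir/Tom*Jerry.mp4" )

echo "unquoted: ${#unquoted[@]} files"   # unquoted: 2 files
echo "quoted:   ${#quoted[@]} file"      # quoted:   1 file

# Now picture the unquoted form after "rm" in a script written in a hurry.
```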

And don’t even get me started on what will happen if you need to code a script in a hurry, to do something with those files.

So even though you might be able to create forbidden filenames using quotes, don’t. Doing that is a very good way to mess things up and render your files inaccessible when you least expect it and most need them.

What are CON, LPT, COM for?

You might have noticed I mentioned that Windows doesn’t allow files named CON, PRN, AUX, NUL, COM, LPT, and so forth. Why?

Well, back when dinosaurs roamed the earth, computer developers had a problem. They needed an easy way to give users access to the computer’s ports, and other devices. An easy way that didn’t involve coding an entire program just to get something sent to the printer. And back then, USB and universal driver standards weren’t exactly easy to come by.

So programmers in the DOS era (and maybe even before that) created special “files” that were hard coded into the operating system, that allowed communication with the computer’s ports and devices. When you tried to read or write from those “files”, you’d be talking to whatever hardware device was connected to the port. No drivers needed (well, sort of).

Since then, certain file names have been forbidden in DOS, and since Windows inherited much of its design from DOS, it kept the forbidden file names too. Some of the more common restricted filenames you’ll find:

  • LPT. LPT means “Line printer terminal”. Yeah… get that: line printer. It’s from the time when printers understood a page not by pixels or dots, but by lines and characters. LPT1 through LPT9 referred to the ports usually reserved for printers; way back when, these were known as “parallel ports”.
  • COM. COM usually meant serial ports, and you normally had COM1 through COM3 or COM4. You could have up to COM9 and beyond, but since each port usually required its own controller card and a dedicated device to use it (a modem, mouse, whatever), it was pretty strange to see anything beyond COM4 installed on your computer. USB (the Universal Serial Bus) eventually replaced the old serial ports, but the name stuck around: in Windows, USB devices that emulate a serial connection, such as USB-to-serial adapters, still show up as COM ports in Device Manager.
  • NUL. NUL is pretty much a software black hole. Whatever you throw in there is gone. Usually when you have a program that outputs a ton of text you don’t want, you send that text to NUL so you don’t have to worry about it.
  • CON. CON is a device that usually refers to your screen and keyboard (“console”). If you send something to CON, it gets displayed on the screen. If you read from CON, you get whatever is waiting at the keyboard.

There are a lot more devices that work like this, but these are the main ones you’ll find on DOS/Windows machines. On modern Linux, you usually have a ton of software devices, and they’re located in the /dev directory. They work much like what I described above.
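On Linux, the everyday counterpart of DOS’s NUL is /dev/null. This bash sketch sends the output of a chatty command there; noisy_command is a made-up stand-in for any program that prints more than you want.

```bash
#!/usr/bin/env bash
# /dev/null is the Linux counterpart of NUL: anything written to it disappears.
noisy_command() {
  echo "progress line 1"
  echo "progress line 2"
  echo "something went wrong" >&2   # error message goes to stderr
}

# Discard normal output (stdout) but keep errors (stderr) visible.
noisy_command > /dev/null

# Discard everything, stdout and stderr alike.
noisy_command > /dev/null 2>&1

# /dev/null is a device file, not a regular file.
[ -c /dev/null ] && echo "/dev/null is a character device"
```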

If you put those reserved words into a filename, your operating system can’t tell whether you want to perform a command on the corresponding device, or on a file called that. And to avoid dealing with the confusion, it doesn’t allow you to name files with reserved words.

One important thing to note is that you can’t use a reserved word as a filename on its own (even with an extension added), but you can use it as part of a longer name, so that the name is different from the reserved device name. For example, you can’t name a file “COM1.txt”, but it’s totally valid and harmless to name it “COM1-.txt”, “COM1 files.txt” or “aCOM1.txt”.
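Here’s a bash sketch of that rule: take the part of the name before the first dot, ignore case, and compare it against the reserved words listed earlier. is_reserved_windows_name is a hypothetical helper, and Windows has more edge cases than this covers, so treat it as a rough check.

```bash
#!/usr/bin/env bash
# Sketch: would Windows reject this name as a reserved device name?
# Rule used here: the part before the first dot, uppercased, must not be
# CON, PRN, AUX, NUL, COM1-COM9 or LPT1-LPT9. Other edge cases are ignored.

is_reserved_windows_name() {
  local base=${1%%.*}   # "COM1.txt" -> "COM1", "COM1 files.txt" -> "COM1 files"
  base=$(printf '%s' "$base" | tr '[:lower:]' '[:upper:]')
  case $base in
    CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9]) return 0 ;;   # reserved
    *) return 1 ;;                                    # fine
  esac
}

for name in COM1.txt com1.txt COM1-.txt "COM1 files.txt" aCOM1.txt nul; do
  if is_reserved_windows_name "$name"; then
    echo "reserved: $name"
  else
    echo "ok:       $name"
  fi
done
```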
