Sunday, May 24, 2020

Unicode Filename Support in TAR Files in Windows 10

No Unicode Filename Support in ZIP Files in Windows 10

I recently discovered that the built-in ZIP file support in Windows 10 does not support Unicode filenames. For example, I had a file named “みんな一列に.jpg” and tried to add it to a ZIP file, but got the following error:

No Unicode Filename Support in TAR Files in Windows 10

Windows 10 has added built-in support for TAR files through a command line tool. Since I was unable to put files with Unicode filenames into ZIP files in Windows, I tried using this tool to put them into a TAR file. I used the following command to add the “みんな一列に.jpg” file to a new TAR file called test.tar:
D:\Temp>dir /b *.jpg
みんな一列に.jpg

D:\Temp>tar -cvf test.tar *.jpg
a ??????.jpg
Unfortunately, it replaced each Unicode character with a question mark (?). The same replacement can be seen when listing the contents of the archive with the following command:
D:\Temp>tar -tvf test.tar
-rw-rw-rw-  0 0      0       37044 Nov 10  2004 ??????.jpg
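The question marks are consistent with the filename being converted to a legacy code page that has no Japanese characters. The following Python sketch reproduces that effect. It is only an illustration: the choice of cp437 is an assumption about the console code page, and the helper name to_code_page is hypothetical and not part of the Windows tar tool.

```python
def to_code_page(name: str, code_page: str = "cp437") -> str:
    """Round-trip a filename through a legacy code page.

    Characters the code page cannot represent are replaced with '?',
    which is what errors="replace" does when encoding.
    The cp437 default is an assumption, not a confirmed detail of tar.exe.
    """
    raw = name.encode(code_page, errors="replace")
    return raw.decode(code_page)


if __name__ == "__main__":
    print(to_code_page("みんな一列に.jpg"))
```

The six Japanese characters each become one question mark, while the ASCII extension survives unchanged.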

WSL to the Rescue

Windows 10 can run UNIX command line tools using the Windows Subsystem for Linux (WSL). See https://docs.microsoft.com/en-us/windows/wsl/install-win10 for instructions on how to install the Windows Subsystem for Linux. Linux also includes a command line tool for creating TAR files. I attempted to create a TAR file containing the “みんな一列に.jpg” file using the following command in an Ubuntu bash shell:
matthew@KIKI2015:/mnt/d/Temp$ ls *.jpg
みんな一列に.jpg
matthew@KIKI2015:/mnt/d/Temp$ tar -cvf test.tar *.jpg
みんな一列に.jpg
I first verified the filename was preserved using the following command:
matthew@KIKI2015:/mnt/d/Temp$ tar -tvf test.tar
-rwxrwxrwx matthew/matthew 37044 2004-11-10 00:54 みんな一列に.jpg
I then extracted the file to a new folder using the following command:
matthew@KIKI2015:/mnt/d/Temp$ cd test
matthew@KIKI2015:/mnt/d/Temp/test$ tar -xvf ../test.tar
みんな一列に.jpg
It even appeared correctly in Windows File Explorer:

Back to Windows 10

I was curious to see how the Windows 10 tar program would react to the TAR file created in Linux, so I tried listing the contents of the TAR file using a Windows 10 command prompt window:
D:\Temp>tar -tvf test.tar
-rwxrwxrwx  0 matthew matthew 37044 Nov 10  2004 pü+péôpü¬S+Çsêùpü½.jpg
Unsurprisingly, it did not interpret the filename correctly. Interestingly, extracting the file resulted in yet a different filename: みんな一列に.jpg.
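The garbled name in the listing looks like the UTF-8 bytes written by Linux tar being read as a single-byte code page. The Python sketch below illustrates this under the assumption that the code page is cp437. The helper names misread_utf8 and repair are hypothetical. The console output above shows p, S, s and + where this sketch produces π, Σ, σ and box-drawing characters, which is probably a further look-alike substitution by the console, though that part is a guess.

```python
def misread_utf8(name: str, wrong_encoding: str = "cp437") -> str:
    """Encode a name as UTF-8, then decode the bytes with the wrong code page."""
    return name.encode("utf-8").decode(wrong_encoding)


def repair(garbled: str, wrong_encoding: str = "cp437") -> str:
    """Undo misread_utf8, provided no characters were lost along the way."""
    return garbled.encode(wrong_encoding).decode("utf-8")


if __name__ == "__main__":
    garbled = misread_utf8("みんな一列に.jpg")
    print(garbled)          # three mojibake characters per Japanese character
    print(repair(garbled))  # the original name again
```

Each Japanese character occupies three bytes in UTF-8, which is why the six-character name turns into eighteen unrelated characters before the extension.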

Hopefully Microsoft will add support for Unicode filenames in ZIP or TAR files in a future update to Windows, but until then, WSL can be used.
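Where installing WSL is not an option, another possibility is Python's standard tarfile module. The following is a minimal sketch and was not part of the experiment above: the function names build_tar and list_names are hypothetical, the archive is built in memory with placeholder data, and it assumes the PAX format, which records non-ASCII names as UTF-8.

```python
import io
import tarfile
from typing import List


def build_tar(name: str, payload: bytes) -> bytes:
    """Create an in-memory TAR archive holding one member called `name`."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT) as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)  # the size must match the data that follows
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def list_names(tar_bytes: bytes) -> List[str]:
    """Return the member names stored in an in-memory TAR archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as archive:
        return archive.getnames()


if __name__ == "__main__":
    data = build_tar("みんな一列に.jpg", b"placeholder image bytes")
    print(list_names(data))
```

To archive a real file on disk, tarfile.open("test.tar", "w") followed by archive.add(path) would be the usual route. Whether the Windows 10 tar tool then lists such an archive correctly is not something this sketch verifies.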