Sunday, May 24, 2020

Unicode Filename Support in TAR Files in Windows 10

No Unicode Filename Support in ZIP Files in Windows 10

I recently discovered that the built-in ZIP file support in Windows 10 does not support Unicode filenames. For example, I had a file named “みんな一列に.jpg” and tried to add it to a ZIP file, but got the following error:

No Unicode Filename Support in TAR Files in Windows 10

Windows 10 has added built-in support for TAR files through a command line tool. Since I was unable to put files with Unicode filenames into ZIP files in Windows, I tried using this tool to put them into a TAR file. I used the following command to add the “みんな一列に.jpg” file to a new TAR file called test.tar:
D:\Temp>dir /b *.jpg

D:\Temp>tar -cvf test.tar *.jpg
a ??????.jpg
Unfortunately, it replaced the Unicode characters with question marks (?). The same replacement can be seen when listing the contents of the TAR file with the following command:
D:\Temp>tar -tvf test.tar
-rw-rw-rw-  0 0      0       37044 Nov 10  2004 ??????.jpg

WSL to the Rescue

Windows 10 can run UNIX command line tools using the Windows Subsystem for Linux (WSL). Microsoft provides instructions on how to install the Windows Subsystem for Linux. Linux also contains a command line tool for creating TAR files. I attempted to create a TAR file containing the “みんな一列に.jpg” file using the following commands in an Ubuntu bash shell:
matthew@KIKI2015:/mnt/d/Temp$ ls *.jpg
matthew@KIKI2015:/mnt/d/Temp$ tar -cvf test.tar *.jpg
I first verified the filename was preserved using the following command:
matthew@KIKI2015:/mnt/d/Temp$ tar -tvf test.tar
-rwxrwxrwx matthew/matthew 37044 2004-11-10 00:54 みんな一列に.jpg
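The same round trip can be sketched without WSL using Python's standard tarfile module. This is not what I ran above, only a minimal illustration that a TAR archive can carry the name intact when it is stored as UTF-8 (here via the PAX format). The payload bytes are a made-up placeholder, and the archive is built in memory rather than on disk.

```python
import io
import tarfile

name = "みんな一列に.jpg"
payload = b"placeholder bytes, not a real JPEG"  # assumption: stand-in content

# Build the archive in memory so the sketch touches no files on disk.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT) as archive:
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

# Read the archive back and list its member names.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    listed_names = archive.getnames()

print(listed_names)
```

If the name survives, the printed list contains the original Japanese filename unchanged.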
I then extracted the file to a new folder using the following command:
matthew@KIKI2015:/mnt/d/Temp$ cd test
matthew@KIKI2015:/mnt/d/Temp/test$ tar -xvf ../test.tar
It even appeared correctly in Windows File Explorer:

Back to Windows 10

I was curious to see how the Windows 10 tar program would react to the TAR file created in Linux, so I tried listing the contents of the TAR file using a Windows 10 command prompt window:
D:\Temp>tar -tvf test.tar
-rwxrwxrwx  0 matthew matthew 37044 Nov 10  2004 pü+péôpü¬S+Çsêùpü½.jpg
Unsurprisingly, it did not interpret the filename correctly. Interestingly, extracting the file resulted in yet a different filename: みんな一列に.jpg.
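The garbled listing is consistent with the UTF-8 bytes of the filename being decoded as the legacy OEM code page 437. The Python sketch below reproduces it under that assumption. The BEST_FIT table is hypothetical: it only mimics a console replacing characters it cannot show (π, Σ, σ and two box-drawing characters) with look-alikes, and it is not taken from any Windows documentation.

```python
name = "みんな一列に.jpg"

# Step 1: take the UTF-8 bytes of the name and decode them as code page 437.
as_cp437 = name.encode("utf-8").decode("cp437")

# Step 2 (assumption): hypothetical best-fit substitutions for characters
# the console could not display; chosen to illustrate the effect only.
BEST_FIT = {"π": "p", "Σ": "S", "σ": "s", "┐": "+", "╕": "+"}
as_listed = "".join(BEST_FIT.get(ch, ch) for ch in as_cp437)

print(as_cp437)
print(as_listed)
```

The second printed line matches the name shown by tar -tvf in the Windows command prompt, which suggests that the archive itself is intact and only the interpretation of its bytes differs.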

Hopefully Microsoft will add support for Unicode filenames in ZIP or TAR files in a future update to Windows, but until then, WSL can be used.