HPFS and FAT filename characters
0. Contents of filechar.zip:
   FILECHAR.ABS   this text
   FILECHAR.CMD   OS/2 REXX script to create FILECHAR.nnn
   FILECHAR.437   file name characters for codepage 437
   FILECOLD.850   ditto codepage 850 (old 850 without euro)
   FILECHAR.850   ditto codepage 858 (new 850 with euro)
   FILECHAR.004   ditto codepage 1004
   FILECHAR.W2K   ditto codepage 1252 among others on W2K
   FILECHAR.REX   NT ooREXX script to create FILECHAR.W2K
1. Introduction
You don't need this file (FILECHAR.ABS) to use FILECHAR.CMD.
FILECHAR.CMD is a trivial OS/2 REXX script used to determine
all legal filename characters and their AKAs on HPFS and FAT.
If you are only interested in my results see the five files
   FILECHAR.437   (new result after CHCP 437)
   FILECHAR.850   (new result after CHCP 850)
   FILECOLD.850   (old result after CHCP 850)
   FILECHAR.004   (new result after CHCP 1004)
   FILECHAR.W2K   (see below, result for NTFS)
These files have been created on my system by commands like
CHCP 437 & FILECHAR > FILECHAR.437
CHCP 850 & FILECHAR > FILECHAR.850
The old FILECOLD.850 reflects results before installing the
new "Euro-codepage" (codepage 850 with Euro-symbol hex. D5).
On my WARP 3 system "old" is fixpack 17, and "new" is e.g.
fixpack 40. I never intended to publish FILECHAR.CMD, but
the different results with and without the Euro symbol are
IMHO quite alarming.
2. Configuration
If all legal filename characters depending on file system
(FAT vs. HPFS etc.), codepage (437 vs. 850 etc.), and even
installed fixpack are documented somewhere, then please
tell me where... Until then FILECHAR.CMD works by trial
and error. You have to "configure" FILECHAR.CMD for your
system by editing two lines, replace...
HPFS.. = 'D:\TMP\' /* HPFS directory */
OFAT.. = 'F:\TMP\' /* FAT directory */
... by existing HPFS- and FAT-directories on your system.
You may use root-directories, e.g. OFAT.. = 'C:', or even
other file systems, as long as you have write access and
know how to interpret the results.
Hint: FILECHAR.CMD deletes all created temporary files
---?---$, and this works faster on drives without "DELDIR".
3. Operation
FILECHAR.CMD simply tries to create 255 files ---?---$ in
both directories, where ? is hex. 01 .. hex. FF (255), by
appending the character ? to the contents of ---?---$. For
some characters like # the file ---#---$ ends up containing
only # on both FAT and HPFS in codepage 437 or 850.
The file ---Z---$ probably contains Z and z for HPFS and
FAT: OS/2 would treat files zzz, ZZZ, zZz, etc. as the same
file, although HPFS supports mixed case filenames. If you
have write access on a *NIX-filesystem then zzz, ZZZ, and
zZz would be three different files. The most (in)famous
examples are makefile, Makefile, MAKEFILE, etc. ;-)
The file ---E---$ may contain 6 characters in codepage 437:
E, e, é, ê, ë, and è are treated as identical in file names.
Of course FILECHAR.CMD does not only create ---?---$ files,
it also evaluates and finally deletes these files.
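
The trial-and-error loop described above can be sketched in
Python. This is not FILECHAR.CMD itself, only an illustration
under assumptions: the function name probe() is hypothetical,
the scratch directory must be empty, and file contents are
stored as UTF-8 text instead of raw codepage bytes.

```python
import os
import tempfile


def probe(directory, candidates):
    """Try to create ---?---$ for each candidate character.

    Returns (groups, rejected): groups maps each file name found on
    disk to the characters that ended up in that file, rejected lists
    the candidates that did not end up in any file.
    The directory is assumed to be an empty scratch directory.
    """
    for ch in candidates:
        path = os.path.join(directory, "---" + ch + "---$")
        try:
            # Append mode: aliases of one name land in the same file.
            with open(path, "a", encoding="utf-8", newline="") as f:
                f.write(ch)
        except (OSError, ValueError):
            pass  # the file system (or the OS) refused this name

    groups = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        with open(path, encoding="utf-8", newline="") as f:
            content = f.read()
        os.remove(path)  # clean up the temporary file
        if content:
            groups[name] = content

    seen = set("".join(groups.values()))
    rejected = [ch for ch in candidates if ch not in seen]
    return groups, rejected


if __name__ == "__main__":
    # Hypothetical run: bytes 01..FF interpreted as codepage 437.
    chars = bytes(range(0x01, 0x100)).decode("cp437")
    with tempfile.TemporaryDirectory() as scratch:
        groups, rejected = probe(scratch, chars)
    print(len(groups), "files,", len(rejected), "characters rejected")
```

On a case-insensitive file system the group for ---Z---$ is
expected to hold both Z and z; on most *NIX file systems each
accepted character gets its own file.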
4. Usage
   FILECHAR -h    usage info (ditto -u, -?, etc.)
   FILECHAR --    long result lines (up to 255 columns)
   FILECHAR       short result lines (up to 79 columns)
The short format skips 0 .. 9 (known unique legal characters)
to get fewer result lines. The long format contains all valid
filename characters, about 164 columns in "new" codepage 850.
The output format should be obvious. Characters in a line
marked by HPFS (or FAT) are valid in HPFS (FAT) filenames.
Characters in a line marked by "aka" are treated as identical
with the character(s) in the same column, namely the nearest
HPFS- or FAT-line character above it. Short format only:
If there is no aka-line below a HPFS- or FAT-line, then
these characters are legal and unique.
In long format you get exactly one long HPFS-line and one
long FAT-line with as many aka-lines as needed in the worst
case. In the "new" codepage 850 there is only one aka-line,
i.e. at most lower and upper case are treated as identical.
Characters in a line marked by "not" are not supported on
HPFS (or FAT). HPFS does not support "/:<>\| in addition
to anything below hex. 20 (32, space). FAT does not support
"+,./:;<=>[\]|. Often programs have difficulties with the
characters +,.;=[], which work on HPFS but not on FAT.
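
The two "not" sets quoted above can be written down as a small
Python check. This is only a sketch: the names HPFS_ILLEGAL,
FAT_ILLEGAL and illegal_chars() are hypothetical, the sets are
copied from the text above rather than from any official
documentation, and applying the "below hex. 20" rule to FAT as
well is an assumption.

```python
# Character sets as reported in the text (results of the test, not a spec).
HPFS_ILLEGAL = set('"/:<>\\|')
FAT_ILLEGAL = set('"+,./:;<=>[\\]|')


def illegal_chars(name, illegal):
    """Return the characters of name that are in the given "not" set
    or below hex. 20 (assumption: control characters fail everywhere)."""
    return [ch for ch in name if ch in illegal or ord(ch) < 0x20]


if __name__ == "__main__":
    # Characters that work on HPFS but not on FAT.
    print("".join(sorted(FAT_ILLEGAL - HPFS_ILLEGAL)))
    print(illegal_chars("a+b.c", FAT_ILLEGAL))
    print(illegal_chars("a+b.c", HPFS_ILLEGAL))
```

Like FILECHAR.CMD this ignores the position of a character in
the name, so it says nothing about leading spaces or trailing
dots.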
5. Caveats
FILECHAR.CMD only tests ---?---$. So if characters depend
on the position within a filename, then FILECHAR.CMD cannot
detect it. Examples: leading or trailing spaces generally
don't work, but spaces within a name are okay (even in a FAT,
compare "WP ROOT. SF" etc.). Trailing dots don't work on
HPFS, a leading dot may have a special meaning (*NIX), many
programs treat the last dot as THE DOT, and in a FAT dots are
not supported (except for the implicit 8+3 dot).
For "FAT" read "good old DOS FAT", all I know about FAT32 is
that it exists.
6. W2K and ooREXX
The text above was written in 2002. Six years later I repeated
this test on W2K using ooREXX FILECHAR.REX. It is in essence
the same old script: I only renamed FILECHAR.CMD to
FILECHAR.REX, replaced all "HPFS" by "NTFS", and used
directories C:\TMP for the FAT16-tests and D:\TMP for the
NTFS-tests.
For the result see FILECHAR.W2K. Good news, apparently the
supported characters do NOT depend on the actual codepage.
In other words the results after CHCP 850 and CHCP 1252 were
identical. It is still interesting to read FILECHAR.W2K after
a CHCP 1252 (or 1004 on OS/2); this shows the simple logic:
All windows-1252 letters are treated as case-insensitive, but
on a FAT four are mapped to similar US-ASCII characters. The
four special pairs are umlauted Y, Scaron, Zcaron, and OElig,
i.e. all pairs with "ANSI" letters in the range 0x80 to 0x9F.
Ten non-letters in the range 0x80 to 0x9F are also mapped to
similar US-ASCII characters on a FAT, permille to percent is
an example.
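
The four special letter pairs can be reproduced from the
windows-1252 table alone. The following Python sketch only
looks at the code page and at Unicode case mapping; it does
not touch a FAT or NTFS volume and says nothing about the
mapping to US-ASCII. The function name cp1252_case_pairs() is
hypothetical.

```python
def cp1252_case_pairs(lo=0x80, hi=0x9F):
    """Return sorted (upper, lower) byte pairs of windows-1252 letters
    where at least one member lies in the range lo..hi."""
    pairs = set()
    for b in range(lo, hi + 1):
        try:
            ch = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # byte is undefined in windows-1252
        for other in (ch.upper(), ch.lower()):
            if other == ch or len(other) != 1:
                continue  # no case partner
            try:
                ob = other.encode("cp1252")[0]
            except UnicodeEncodeError:
                continue  # partner exists in Unicode, but not in 1252
            pairs.add((b, ob) if ch.isupper() else (ob, b))
    return sorted(pairs)


if __name__ == "__main__":
    print(" ".join(f"0x{u:02X}/0x{l:02X}" for u, l in cp1252_case_pairs()))
    # prints: 0x8A/0x9A 0x8C/0x9C 0x8E/0x9E 0x9F/0xFF
```

These are Scaron, OElig, Zcaron, and the umlauted Y, whose
lower case partner sits at 0xFF outside the range.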
On a FAT 0x85 (hellip, three dots) is an oddity; I'm not sure
how to interpret the result. Creating ---?---$ for ? := 0x85
fails in the SysFileTree() existence test, therefore 0x85 is
noted as "not permitted" on a FAT. But a file with long name
---.---$ was created for the ordinary 0x2E dot, 0x85 ended up
in this ---.---$ file, and was counted as an alias for 0x2E.
Unless you know what you are doing, better stay away from
using 0x85 in NT file names on a FAT… :-)
Last update: 26 Sep 2008 12:00 by F.Ellermann