22nd chunk of `less.man`
 Selects the UTF‐8 encoding of the ISO 10646 character set.  UTF‐8 is special in that it supports multi‐byte characters in the input file.  It is the only character set that supports multi‐byte characters.

       windows
              Selects a character set appropriate for Microsoft Windows (cp 1251).

       In rare cases, it may be desired to tailor less to use a character set other than the ones definable by LESSCHARSET.  In this case, the environment variable LESSCHARDEF can be used to define a character set.  It
       should be set to a string where each character in the string represents one character in the character set.  The character "." is used for a normal character, "c" for control, and "b" for binary.  A decimal number
       placed before one of these characters repeats it that many times.  For example, "bccc4b." would mean character 0 is binary, 1, 2 and 3 are control, 4, 5, 6 and 7 are binary, and 8 is normal.  All characters after
       the last are taken to be the same as the last, so characters 9 through 255 would be normal.  (This is an example, and does not necessarily represent any real character set.)
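
       The expansion rule described above can be sketched in a few lines of Python.  This is only an illustration built from the description in this section, not the code less itself uses; the function name expand_chardef and its size parameter are hypothetical, and treating a missing repeat count as 1 is an assumption.

```python
def expand_chardef(chardef, size=256):
    """Expand a LESSCHARDEF-style string into one class per character code.

    Returns a list of `size` one-letter classes: "." normal, "c" control,
    "b" binary. Hypothetical helper; it only mirrors the prose description.
    """
    classes = []
    count = 0  # decimal repetition count collected so far
    for ch in chardef:
        if ch in "0123456789":
            count = count * 10 + int(ch)
        elif ch in ".cb":
            # Assumption: with no preceding number the class applies once.
            classes.extend(ch * max(count, 1))
            count = 0
        else:
            raise ValueError(f"unexpected character {ch!r} in chardef")
    if not classes:
        raise ValueError("chardef defines no characters")
    # All characters after the last take the same class as the last.
    classes.extend(classes[-1] * (size - len(classes)))
    return classes[:size]


if __name__ == "__main__":
    table = expand_chardef("bccc4b.")
    print("".join(table[:9]))  # classes of characters 0 through 8
```

       With the example string "bccc4b." the first nine classes come out as b, c, c, c, b, b, b, b and ".", and every later character is normal, which matches the walk-through in the paragraph above.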

       This table shows the value of LESSCHARDEF which is equivalent to each of the possible values for LESSCHARSET:
            ascii      8bcccbcc18b95.b
            dos        8bcccbcc12bc5b95.b.
            ebcdic     5bc6bcc7bcc41b.9b7.9b5.b..8b6.10b6.b9.7b
                       9.8b8.17b3.3b9.7b9.8b8.6b10.b.b.b.
            IBM‐1047   4cbcbc3b9cbccbccbb4c6bcc5b3cbbc4bc4bccbc
                       191.b
            iso8859    8bcccbcc18b95.33b.
            koi8‐r     8bcccbcc18b95.b128.
            latin1     8bcccbcc18b95.33b.
            next       8bcccbcc18b95.bb125.bb

       If neither LESSCHARSET nor LESSCHARDEF is set, but any of the strings "UTF‐8", "UTF8", "utf‐8" or "utf8" is found in the LC_ALL, LC_CTYPE or LANG environment variables, then the default character set is utf‐8.

       If that string is not found, but your system supports the setlocale interface, less will use setlocale to determine the character set.  setlocale is controlled by setting the LANG or LC_CTYPE environment variables.

       Finally, if the setlocale interface is also not available, the default character set is latin1.
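
       The fallback order in the three paragraphs above can be summarized as a small decision function.  The following Python sketch is illustrative only: default_charset, its env argument and the have_setlocale flag are hypothetical names, checking LESSCHARSET before LESSCHARDEF is an assumption (this section does not state their relative order), and an empty variable is treated as unset.  The setlocale step is represented by a placeholder string rather than a real locale query.

```python
UTF8_MARKERS = ("UTF-8", "UTF8", "utf-8", "utf8")


def default_charset(env, have_setlocale=True):
    """Return a label describing which rule would pick the character set.

    `env` is a plain dict standing in for the process environment.
    """
    # Assumption: an explicit LESSCHARSET is consulted before LESSCHARDEF.
    if env.get("LESSCHARSET"):
        return env["LESSCHARSET"]
    if env.get("LESSCHARDEF"):
        return "custom (LESSCHARDEF)"
    # Look for any of the UTF-8 spellings in the locale variables.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if any(marker in value for marker in UTF8_MARKERS):
            return "utf-8"
    # Placeholder: the real choice would come from the setlocale interface.
    if have_setlocale:
        return "from setlocale"
    return "latin1"


if __name__ == "__main__":
    print(default_charset({"LANG": "en_US.UTF-8"}))
    print(default_charset({}, have_setlocale=False))
```

       Passing a dict instead of reading os.environ directly keeps the sketch easy to try with different combinations of variables.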

       Control and binary characters are displayed in standout (reverse video).  Each such character is displayed in caret notation if possible (e.g. ^A for control‐A).  Caret notation is used only if inverting the 0100 bit
       results in a normal printable character.  Otherwise, the character is displayed as a hex number in angle brackets.  This format can be changed by setting the LESSBINFMT environment variable.  LESSBINFMT may begin with
       a "*" and one character to select the display attribute: "*k" is blinking, "*d" is bold, "*u" is underlined, "*s" is standout, and "*n" is normal.  If LESSBINFMT does not begin with a "*", normal attribute is assumed.
       The remainder of LESSBINFMT is a string which may include one printf‐style escape sequence (a % followed by x, X, o, d, etc.).  For example, if LESSBINFMT is "*u[%x]", binary characters are displayed in underlined
       hexadecimal surrounded by brackets.  The default if no LESSBINFMT is specified is "*s<%02X>".  Warning: the result of expanding the character via LESSBINFMT must be less than 31 characters.
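
       The display rules above (caret notation when flipping the 0100 bit gives a printable character, otherwise the LESSBINFMT format) can be sketched as follows.  This Python fragment is an approximation written from the description only: parse_binfmt and render_char are hypothetical names, "normal printable" is assumed to mean the ASCII range 0x20 to 0x7E, and rejecting an unknown attribute letter is a choice made for the sketch because the text does not describe that case.  Python's % operator stands in for printf.

```python
ATTRIBUTES = {
    "k": "blinking",
    "d": "bold",
    "u": "underlined",
    "s": "standout",
    "n": "normal",
}
DEFAULT_BINFMT = "*s<%02X>"  # default stated in the text


def parse_binfmt(value=None):
    """Split a LESSBINFMT value into (attribute name, printf-style format)."""
    if not value:
        value = DEFAULT_BINFMT
    if value.startswith("*") and len(value) >= 2:
        letter = value[1]
        if letter not in ATTRIBUTES:
            # Not covered by the text; the sketch simply refuses it.
            raise ValueError(f"unknown attribute letter {letter!r}")
        return ATTRIBUTES[letter], value[2:]
    # No leading "*": normal attribute is assumed.
    return "normal", value


def render_char(code, fmt):
    """Render one control or binary character code (0-255) as text."""
    flipped = code ^ 0o100  # invert the 0100 bit
    if 0x20 <= flipped <= 0x7E:  # assumption: ASCII printable range
        return "^" + chr(flipped)
    text = fmt % code
    if len(text) >= 31:
        raise ValueError("expanded format must be less than 31 characters")
    return text


if __name__ == "__main__":
    attr, fmt = parse_binfmt("*u[%x]")
    print(attr, render_char(0x01, fmt), render_char(0x80, fmt))
```

       render_char is meant to be called only for characters already classified as control or binary; it does not decide that classification itself.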

       When the character set is utf‐8, the LESSUTFBINFMT environment variable acts similarly to LESSBINFMT but it applies to Unicode code points that were successfully decoded but are unsuitable for display (e.g., unassigned
       code points).  Its default value is "<U+%04lX>".  Note that LESSUTFBINFMT and LESSBINFMT share their display attribute setting ("*x") so specifying one will affect both; LESSUTFBINFMT is read after LESSBINFMT
       so its setting, if any, will have priority.  Problematic octets in a UTF‐8 file (octets of a truncated sequence, octets of a complete but non‐shortest form sequence, invalid octets, and stray trailing

Title: Less: Character Set Customization and Control/Binary Character Display
Summary
This section details how to customize character sets in `less` using the `LESSCHARDEF` environment variable, and lists the `LESSCHARDEF` strings equivalent to predefined character sets such as ascii, dos, and ebcdic. It outlines the fallback used when neither `LESSCHARSET` nor `LESSCHARDEF` is set: `less` first looks for a UTF-8 marker in `LC_ALL`, `LC_CTYPE`, or `LANG`, then uses the `setlocale` interface if available, and otherwise defaults to latin1. It explains how control and binary characters are displayed, in caret notation where possible and as a hexadecimal number otherwise, and how the `LESSBINFMT` environment variable customizes that format. Finally, it describes the `LESSUTFBINFMT` variable, which applies to decoded Unicode code points that are unsuitable for display in UTF-8 mode and shares its display attribute setting with `LESSBINFMT`.