% Ncdu JSON Export File Format

This document describes the file format that ncdu 1.9 and later use for the
export/import feature (the `-o` and `-f` options). See the [ncdu
manual](/ncdu/man) for a description of how to use that feature.

## Top-level object

Ncdu uses [JSON](http://json.org/) notation as its data format. The top-level
object is an array:

    [
      <majorver>,
      <minorver>,
      <metadata>,
      <directory>
    ]

## Versioning

The `<majorver>` and `<minorver>` elements indicate the version of
the file format. These are numbers with accepted values in the range of
`0 <= version <= 10000`. The major version must be `1`. The minor version is
`0` for ncdu 1.9 through 1.12, `1` for ncdu 1.13 through 1.15.2 (which added
extended mode), and `2` since ncdu 1.16 (which added the `nlink` field).
The major version should increase if backwards-incompatible changes are made
(preferably never); the minor version can be increased to indicate additions to
the existing format.
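
The checks described above can be sketched in a few lines of Python. This is only an illustration: `check_header` is a hypothetical helper name, and rejecting any array that does not have exactly four elements is an assumption of this sketch, not documented ncdu behaviour.

```python
import json

def check_header(export):
    """Validate the top-level array and return (major, minor, metadata, root)."""
    if not isinstance(export, list) or len(export) != 4:
        raise ValueError("top-level value must be a 4-element array")
    major, minor, metadata, root = export
    for version in (major, minor):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(version, bool) or not isinstance(version, int):
            raise ValueError(f"version is not an integer: {version!r}")
        if not 0 <= version <= 10000:
            raise ValueError(f"version out of range: {version!r}")
    if major != 1:
        raise ValueError(f"unsupported major version: {major}")
    return major, minor, metadata, root

major, minor, metadata, root = check_header(
    json.loads('[1, 2, {"progname": "ncdu"}, [{"name": "/tmp"}]]')
)
```

A reader written this way accepts any minor version, on the understanding that higher minor versions only add fields.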

## Metadata

The `<metadata>` element is a JSON object holding whatever (short) metadata
you'd want. Ncdu currently (1.9-1.16) ignores this block when importing, but
writes out the following keys when exporting:

progname
:   String, name of the program that generated the file, i.e. `"ncdu"`.

progver
:   String, version of the program that generated the file, e.g. `"1.10"`.

timestamp
:   Number, UNIX timestamp as returned by the POSIX `time()` function at the time
    the file was generated. Note that this is not necessarily equivalent to the
    time at which the directory was scanned.
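
For a program that writes this format, the top-level array and metadata block could be assembled roughly as follows. The function name `build_export` and the default `progname`/`progver` values are placeholders invented for this sketch.

```python
import json
import time

def build_export(root_directory, progname="my-exporter", progver="0.1", minor=0):
    """Wrap a <directory> array in the top-level [major, minor, metadata, dir] array."""
    metadata = {
        "progname": progname,            # hypothetical program name
        "progver": progver,              # hypothetical program version
        "timestamp": int(time.time()),   # time of export, not of the scan
    }
    return [1, minor, metadata, root_directory]

text = json.dumps(build_export([{"name": "/media/harddrive"}]))
```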

## Directory Info

A `<directory>` is represented with a JSON array:

    [
      <infoblock>,
      <directory>, <directory>, <infoblock>, ...
    ]

That is, the first element of the array must be an `<infoblock>`. If the
directory is empty, that will be its only element. If it isn't, its
subdirectories and files are listed in the remaining elements. Each
subdirectory is represented as a `<directory>` array again, and each file
is represented as just an `<infoblock>` object.
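
Because directories are arrays and files are plain objects, a reader can tell them apart by JSON type alone. A minimal recursive walker, assuming the export has already been parsed into Python lists and dicts (the name `walk` is illustrative):

```python
def walk(directory, parent_path=None):
    """Yield (path, infoblock, is_dir) for every item in a <directory> array."""
    info = directory[0]  # the first element is always the directory's own info
    if parent_path is None:
        path = info["name"]  # top-level directory: the name is the full path
    else:
        path = parent_path.rstrip("/") + "/" + info["name"]
    yield path, info, True
    for item in directory[1:]:
        if isinstance(item, list):   # nested <directory>
            yield from walk(item, path)
        else:                        # plain <infoblock> for a file
            yield path.rstrip("/") + "/" + item["name"], item, False

tree = [{"name": "/media/harddrive"},
        {"name": "SomeFile"},
        [{"name": "EmptyDir"}]]
paths = [path for path, info, is_dir in walk(tree)]
```

Recursion keeps the sketch short; for very deep trees an explicit stack would avoid Python's recursion limit.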

## The Info Object

An `<infoblock>` is a JSON object holding information about a file or
directory.  The following fields are supported:

name
:   String _(required)_. Name of the file/dir. For the top-level directory (that
    is, the `<directory>` item in the top-level JSON array), this should be
    the full absolute filesystem path, e.g. `"/media/harddrive"`. For any items
    below the top-level directory, the name should be just the name of the item.

    The name will be in the same encoding as reported by the filesystem (i.e.
    [readdir()](http://manned.org/readdir.3)). The name may not exceed 32768 bytes.

asize
:   Number. The apparent file size, as reported by `lstat().st_size`. If absent, 0
    is assumed. Accepted values are in the range of `0 <= asize < 2^63`.

dsize
:   Number. Size of the file, as consumed on the disk. This is obtained through
    `lstat().st_blocks*S_BLKSIZE`. If absent, 0 is assumed. Accepted values are in
    the range of `0 <= dsize < 2^63`.

dev
:   Number. The device ID. It has to be unique within the context of the exported
    dump, but need not have any meaning outside of it. That is, it can be a
    serialization of `lstat().st_dev`, but also a randomly generated number only
    used within this file, as long as it uniquely identifies the device/filesystem
    on which this file is stored. This field may be absent, in which case it is
    equivalent to that of the parent directory. If no ancestor directory
    specifies it either, a value of 0 is assumed. Accepted values are in the
    range of `0 <= dev < 2^64`.

ino
:   Number. Inode number as reported by `lstat().st_ino`. Together with the device
    ID this uniquely identifies a file in this dump. In the case of hard links, two
    objects may appear with the same (`dev`,`ino`) combination. As of ncdu
    1.16, this field is only exported if `st_nlink > 1`. A value of 0 is
    assumed if this field is absent. That is fine as long as the `hlnkc` field
    is false and `nlink` is 1; otherwise, all items with the same `dev` and an
    absent `ino` will be considered a single hardlinked file.
    Accepted values are in the range of `0 <= ino < 2^64`.

hlnkc
:   Boolean. `true` if this is a file with `lstat().st_nlink > 1`. This field is
    redundant if the `nlink` field is also set, but is still included in new
    dumps for backwards compatibility with ncdu versions prior to 1.16. If both
    this and the `nlink` fields are absent, `false` is assumed.

read\_error
:   Boolean. `true` if something went wrong while reading this entry. I.e. the
    information in this entry may not be complete. For files, this indicates that
    the `lstat()` call failed. For directories, this means that an error occurred
    while obtaining the file listing, and some items may be missing. Note that if
    `lstat()` failed, ncdu has no way of knowing whether an item is a file or a
    directory, so a file with `read_error` set may in fact be a directory. If
    absent, `false` is assumed.

excluded
:   String. Set if this file or directory is to be excluded from calculation for
    some reason. The following values are recognized:

    `"pattern"`
    :   If the path matched an exclude pattern.

    `"otherfs"` or `"othfs"`
:   If the item is on a different device/filesystem. Every version of ncdu
        recognizes `"otherfs"` when importing, but versions 1.20 or
        2.4 and earlier wrote `"othfs"` when exporting. Later versions
        recognize both strings and output `"otherfs"`.

    `"kernfs"`
    :   If the item has been excluded with `--exclude-kernfs` (since ncdu 1.15).

    `"frmlink"`
    :   If the item is a firmlink and hasn't been followed with
        `--follow-firmlinks` (since ncdu 1.15).

    Excluded items may still be included in the export, but only by name. `dsize`,
    `asize` and other information may be absent. If this item was excluded by a
    pattern, ncdu will not do an `lstat()` on it, and may thus report this item as
    a file even if it is a directory.

    Values other than those mentioned above are accepted by ncdu, but are currently
    interpreted as equivalent to `"pattern"`. This field should be absent if the
    item has not been excluded from the calculation.

nlink
:   (since ncdu 1.16) Number, the value of `lstat().st_nlink`. If this field is
    present and has a value larger than 1, this file is considered for hardlink
    counting.  Accepted values are in the range `1 <= nlink < 2^32`. If absent,
    `1` is assumed.

notreg
:   Boolean. This is `true` if neither S\_ISREG() nor S\_ISDIR() evaluates to true.
    I.e. this is a symlink, character device, block device, FIFO, socket, or
    whatever else your system may support. If absent, `false` is assumed.
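
The defaults for `dev`, `ino`, `hlnkc` and `nlink` interact when totals are computed. The sketch below sums `dsize` over a parsed `<directory>`, inheriting `dev` from the parent and counting each hardlinked (`dev`, `ino`) pair once. Counting a hardlinked file only on its first occurrence is one plausible policy chosen for illustration; it is not necessarily how ncdu itself accounts for hard links, and `total_dsize` is a hypothetical name.

```python
def total_dsize(directory, parent_dev=0, seen=None):
    """Sum dsize over a <directory>, counting each hardlinked file once."""
    if seen is None:
        seen = set()                      # (dev, ino) pairs already counted
    info = directory[0]
    dev = info.get("dev", parent_dev)     # absent dev: inherit from the parent
    total = info.get("dsize", 0)          # absent dsize: 0 is assumed
    for item in directory[1:]:
        if isinstance(item, list):        # subdirectory
            total += total_dsize(item, dev, seen)
            continue
        hardlinked = item.get("nlink", 1) > 1 or item.get("hlnkc", False)
        if hardlinked:
            key = (item.get("dev", dev), item.get("ino", 0))
            if key in seen:
                continue                  # another link to a file already counted
            seen.add(key)
        total += item.get("dsize", 0)
    return total
```

Excluded items need no special handling here, since their sizes are normally absent and therefore default to 0.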

### Extended information

In addition, the following fields are exported when _extended information_ mode
is enabled (available since ncdu 1.13). See the `-e` flag in
[ncdu(1)](/ncdu/man) for details.

uid
:   Number, ID of the user that owns the file. Accepted values are in the range
    `0 <= uid < 2^31`.

gid
:   Number, ID of the group that owns the file. Accepted values are in the range
    `0 <= gid < 2^31`.

mode
:   Number, the raw file mode as returned by
    [lstat(3)](https://manned.org/lstat.3). For Linux systems, see
    [inode(7)](https://manned.org/inode.7) for the interpretation of this
    field.  Accepted range: `0 <= mode < 2^16`.

mtime
:   Number, last modification time as a UNIX timestamp. Accepted range:
    `0 <= mtime < 2^64`. As of ncdu 1.16, this number may also include an
    (infinite precision) decimal part for fractional seconds, though the
    decimal part is (currently) discarded during import.
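
As an illustration of how these fields might be consumed, the following sketch renders `mode` and `mtime` in human-readable form using Python's standard `stat` and `datetime` modules. It assumes the mode bits follow the common Unix layout (as on Linux), and it drops the fractional part of `mtime`, as ncdu does on import. `describe_extended` is a hypothetical helper.

```python
import stat
from datetime import datetime, timezone

def describe_extended(info):
    """Return printable forms of the extended fields present in an info object."""
    out = {}
    if "mode" in info:
        # e.g. 0o100644 becomes "-rw-r--r--"
        out["mode"] = stat.filemode(info["mode"])
    if "mtime" in info:
        seconds = int(info["mtime"])  # discard fractional seconds
        out["mtime"] = datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()
    return out

described = describe_extended({"name": "f", "mode": 0o100644, "mtime": 1354477149.5})
```

Timestamps near the upper end of the accepted range cannot be represented by `datetime`, so a robust reader would need to guard against overflow.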

## Miscellaneous notes

As mentioned above, file/directory names are **not** converted to any specific
encoding when exporting. If you want the exported info dump to be valid JSON
(and thus valid UTF-8), you'll have to either ensure that your filesystem
contains no non-UTF-8 filenames, or process the dump through a
conversion utility such as `iconv`. When browsing an imported file with ncdu,
you'll usually want to ensure that the filenames are in the same encoding as
what your terminal is expecting. The browsing interface may look garbled or
otherwise ugly if that's not the case.
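
A reader that wants to tolerate non-UTF-8 names without a separate conversion step could, in Python, decode the raw bytes with the `surrogateescape` error handler before parsing. Undecodable bytes are then carried through as lone surrogate code points and can be turned back into the original bytes later. This is a sketch of one approach, with hypothetical helper names:

```python
import json

def load_dump_bytes(raw):
    """Parse an export given as bytes, tolerating invalid UTF-8 in names."""
    # json.loads() on bytes would raise UnicodeDecodeError for invalid UTF-8,
    # so decode manually and smuggle bad bytes through as surrogates.
    return json.loads(raw.decode("utf-8", errors="surrogateescape"))

def name_bytes(name):
    """Recover the original on-disk bytes of a name loaded by load_dump_bytes."""
    return name.encode("utf-8", errors="surrogateescape")

raw = b'[1,0,{},[{"name":"/tmp/caf\xe9"}]]'   # 0xE9 is Latin-1, not valid UTF-8
dump = load_dump_bytes(raw)
original = name_bytes(dump[3][0]["name"])
```

Names loaded this way cannot be printed or re-encoded as strict UTF-8 without first deciding how to replace the escaped bytes.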

Another important thing to keep in mind is that an export can be fairly large.
If you write a program that reads a file in this format and you care about
handling directories with several million files, make sure to optimize for
that. For example, prefer a stream-based JSON parser over a JSON
library that reads the entire file into a single generic data structure, and
keep only the minimum amount of data that you care about in memory.
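
Python's standard library has no streaming JSON parser, so the sketch below illustrates only the second half of that advice: it uses the `object_hook` parameter of `json.load` to shrink every info object to a `(name, dsize)` tuple as soon as it is decoded, so that no per-file dictionary is retained. The whole text is still read into memory, which makes this a partial measure rather than true streaming. It also assumes the metadata object has no `name` key.

```python
import io
import json

def slim(obj):
    """object_hook: keep only (name, dsize) from each decoded info object."""
    if "name" in obj:
        return (obj["name"], obj.get("dsize", 0))
    return obj  # metadata block: leave untouched

def load_slim(fp):
    """Load an export; directories stay lists, info objects become tuples."""
    return json.load(fp, object_hook=slim)

doc = ('[1,0,{"progname":"ncdu"},'
       '[{"name":"/r","dsize":1,"asize":2},{"name":"f","dsize":3},[{"name":"d"}]]]')
data = load_slim(io.StringIO(doc))
```

Since JSON arrays decode to lists and the hook returns tuples, a directory (list) can still be told apart from a file (tuple) in the slimmed result.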

## Example Export

Here's a simple example export that illustrates the basic structure of the format.

    [
      1,
      0,
      {
        "progname"  : "ncdu",
        "progver"   : "1.9",
        "timestamp" : 1354477149
      },
      [
        { "name"   : "/media/harddrive",
          "dsize"  : 4096,
          "asize"  : 422,
          "dev"    : 39123423,
          "ino"    : 29342345
        },
        { "name"   : "SomeFile",
          "dsize"  : 32768,
          "asize"  : 32414,
          "ino"    : 91245479284
        },
        [
          { "name"   : "EmptyDir",
            "dsize"  : 4096,
            "asize"  : 10,
            "ino"    : 3924
          }
        ]
      ]
    ]

The directory described above has the following structure:

    /media/harddrive
    ├── SomeFile
    └── EmptyDir
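
Going the other way, a `<directory>` array shaped like the example could be produced from a real filesystem roughly as follows. This is a simplified sketch, not ncdu's scanner: it assumes `st_blocks` counts 512-byte units (true on Linux), sets `dev` only on the top-level directory, does not detect filesystem boundaries, and omits `read_error` handling. The names `info_for` and `scan` are hypothetical.

```python
import os
import stat

S_BLKSIZE = 512  # assumption: st_blocks counts 512-byte units, as on Linux

def info_for(path, name):
    """Build an info object for one item; returns (info, is_directory)."""
    st = os.lstat(path)
    is_dir = stat.S_ISDIR(st.st_mode)
    info = {
        "name": name,
        "asize": st.st_size,
        "dsize": getattr(st, "st_blocks", 0) * S_BLKSIZE,
    }
    if st.st_nlink > 1 and not is_dir:
        # Hard link candidate: record ino plus both hlnkc and nlink.
        info.update(ino=st.st_ino, hlnkc=True, nlink=st.st_nlink)
    if not (is_dir or stat.S_ISREG(st.st_mode)):
        info["notreg"] = True
    return info, is_dir

def scan(path, name=None):
    """Return a <directory> array for path; the root is named by absolute path."""
    root = name is None
    info, _ = info_for(path, os.path.abspath(path) if root else name)
    if root:
        info["dev"] = os.lstat(path).st_dev
    directory = [info]
    with os.scandir(path) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            child, is_dir = info_for(entry.path, entry.name)
            directory.append(scan(entry.path, entry.name) if is_dir else child)
    return directory
```

Because the sketch may write `nlink`, an export built from its output would use minor version `2`.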