% Ncdu JSON Export File Format

This document describes the file format that ncdu 1.9 and later use for the
export/import feature (the `-o` and `-f` options). Check the [ncdu
manual](/ncdu/man) for a description of how to use that feature.

## Top-level object

Ncdu uses [JSON](http://json.org/) notation as its data format. The top-level
object is an array:

    [
      <majorver>,
      <minorver>,
      <metadata>,
      <directory>
    ]

## Versioning

The `<majorver>` and `<minorver>` elements indicate the version of
the file format. These are numbers with accepted values in the range of
`0 <= version <= 10000`. The major version must be `1`. The minor version is
`0` for ncdu 1.9 through 1.12, `1` for ncdu 1.13 through 1.15.2 (which added
the extended mode), and `2` since ncdu 1.16 (which added the `nlink` field).
The major version should increase if backwards-incompatible changes are made
(preferably never); the minor version can be increased to indicate additions to
the existing format.

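
The version rules above can be sketched as a small header check. This is an illustration only, not ncdu's own import code: the function name `check_header`, the constant `KNOWN_MINOR`, and the choice to accept (rather than reject) a newer minor version are assumptions of this sketch.

```python
import json

# Highest minor version this hypothetical reader knows about (2 = `nlink` field).
KNOWN_MINOR = 2

def check_header(dump):
    """Validate the top-level array.

    Returns (major, minor, metadata, root, newer), where `newer` is True
    when the minor version is higher than KNOWN_MINOR.
    """
    # Assumption: anything shorter than the four documented elements is invalid.
    if not isinstance(dump, list) or len(dump) < 4:
        raise ValueError("top-level value must be an array of four elements")
    major, minor, metadata, root = dump[:4]
    for label, ver in (("major", major), ("minor", minor)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(ver, bool) or not isinstance(ver, int) or not 0 <= ver <= 10000:
            raise ValueError(f"{label} version out of range: {ver!r}")
    if major != 1:
        raise ValueError(f"unsupported major version: {major}")
    # A higher minor version only signals additions, so the file stays readable.
    return major, minor, metadata, root, minor > KNOWN_MINOR
```

A reader built this way would refuse a dump with major version `2` but still read one with minor version `3`, ignoring any fields it does not know.
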
## Metadata

The `<metadata>` element is a JSON object holding whatever (short) metadata
you'd want. This block is currently (1.9-1.16) ignored by ncdu when
importing, but ncdu writes out the following keys when exporting:

progname
:   String, name of the program that generated the file, i.e. `"ncdu"`.

progver
:   String, version of the program that generated the file, e.g. `"1.10"`.

timestamp
:   Number, UNIX timestamp as returned by the POSIX `time()` function at the time
    the file was generated. Note that this is not necessarily the time at which
    the directory was scanned.

## Directory Info

A `<directory>` is represented with a JSON array:

    [
      <infoblock>,
      <directory>, <directory>, <infoblock>, ...
    ]

That is, the first element of the array must be an `<infoblock>`. If the
directory is empty, that will be its only element. If it isn't, its
subdirectories and files are listed in the remaining elements. Each
subdirectory is represented as a `<directory>` array again, and each file
is represented as just an `<infoblock>` object.

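
A minimal sketch of how a reader might traverse this structure, assuming the dump has already been parsed into Python lists and dicts. The generator name `walk` is hypothetical, and joining names with `/` is an assumption that fits POSIX paths only.

```python
def walk(directory, parent=""):
    """Yield (path, infoblock, is_dir) for a <directory> array and all items below it."""
    info = directory[0]  # the first element is always the directory's own <infoblock>
    if parent:
        path = parent.rstrip("/") + "/" + info["name"]
    else:
        path = info["name"]  # the top-level name is already an absolute path
    yield path, info, True
    for item in directory[1:]:
        if isinstance(item, list):
            # A nested array is a subdirectory: recurse into it.
            yield from walk(item, path)
        else:
            # A bare object is a file (or other non-directory item).
            yield path.rstrip("/") + "/" + item["name"], item, False
```

The recursion depth equals the directory nesting depth, so a very deep tree could hit Python's recursion limit; an explicit stack would avoid that.
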
## The Info Object

An `<infoblock>` is a JSON object holding information about a file or
directory. The following fields are supported:

name
:   String _(required)_. Name of the file/dir. For the top-level directory (that
    is, the `<directory>` item in the top-level JSON array), this should be
    the full absolute filesystem path, e.g. `"/media/harddrive"`. For any items
    below the top-level directory, the name should be just the name of the item.

    The name will be in the same encoding as reported by the filesystem (i.e.
    [readdir()](http://manned.org/readdir.3)). The name may not exceed 32768 bytes.

asize
:   Number. The apparent file size, as reported by `lstat().st_size`. If absent, 0
    is assumed. Accepted values are in the range of `0 <= asize < 2^63`.

dsize
:   Number. Size of the file, as consumed on the disk. This is obtained through
    `lstat().st_blocks*S_BLKSIZE`. If absent, 0 is assumed. Accepted values are in
    the range of `0 <= dsize < 2^63`.

dev
:   Number. The device ID. This has to be a unique ID within the context of the
    exported dump, but need not have any meaning outside of it. For example,
    this can be a serialization of `lstat().st_dev`, but also a randomly
    generated number only used within this file, as long as it uniquely
    identifies the device/filesystem on which this file is stored. This field
    may be absent, in which case it is equivalent to that of the parent
    directory. If this field is absent for the parent directory, a value of 0 is
    assumed. Accepted values are in the range of `0 <= dev < 2^64`.

ino
:   Number. Inode number as reported by `lstat().st_ino`. Together with the Device
    ID this uniquely identifies a file in this dump. In the case of hard links, two
    objects may appear with the same (`dev`,`ino`) combination. As of ncdu
    1.16, this field is only exported if `st_nlink > 1`. A value of 0 is
    assumed if this field is absent, which is fine as long as the `hlnkc` field
    is false and `nlink` is 1; otherwise, all items with the same `dev` and an
    absent `ino` will be considered a single hardlinked file.
    Accepted values are in the range of `0 <= ino < 2^64`.

hlnkc
:   Boolean. `true` if this is a file with `lstat().st_nlink > 1`. This field is
    redundant if the `nlink` field is also set, but is still included in new
    dumps for backwards compatibility with ncdu versions prior to 1.16. If both
    this and the `nlink` fields are absent, `false` is assumed.

read\_error
:   Boolean. `true` if something went wrong while reading this entry, i.e. the
    information in this entry may not be complete. For files, this indicates that
    the `lstat()` call failed. For directories, this means that an error occurred
    while obtaining the file listing, and some items may be missing. Note that if
    `lstat()` failed, ncdu has no way of knowing whether an item is a file or a
    directory, so a file with `read_error` set may in fact be a directory. If
    absent, `false` is assumed.

excluded
:   String. Set if this file or directory is to be excluded from calculation for
    some reason. The following values are recognized:

    `"pattern"`
    :   If the path matched an exclude pattern.

    `"otherfs"` or `"othfs"`
    :   If the item is on a different device/filesystem. Every version of ncdu
        recognizes `"otherfs"` when importing, but versions 1.20 and earlier
        (1.x) and 2.4 and earlier (2.x) wrote `"othfs"` when exporting. Later
        versions recognize both strings and output `"otherfs"`.

    `"kernfs"`
    :   If the item has been excluded with `--exclude-kernfs` (since ncdu 1.15).

    `"frmlink"`
    :   If the item is a firmlink and hasn't been followed with
        `--follow-firmlinks` (since ncdu 1.15).

    Excluded items may still be included in the export, but only by name. `dsize`,
    `asize` and other information may be absent. If this item was excluded by a
    pattern, ncdu will not do an `lstat()` on it, and may thus report this item as
    a file even if it is a directory.

    Values other than those listed above are accepted by ncdu, but are currently
    interpreted to be equivalent to `"pattern"`. This field should be absent if the
    item has not been excluded from the calculation.

nlink
:   (since ncdu 1.16) Number, the value of `lstat().st_nlink`. If this field is
    present and has a value larger than 1, this file is considered for hardlink
    counting. Accepted values are in the range `1 <= nlink < 2^32`. If absent,
    `1` is assumed.

notreg
:   Boolean. This is `true` if neither S\_ISREG() nor S\_ISDIR() evaluates to true.
    I.e. this is a symlink, character device, block device, FIFO, socket, or
    whatever else your system may support. If absent, `false` is assumed.

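
The defaults listed for the fields above can be applied in one place when reading a dump. The following is a sketch under assumptions, not ncdu's import logic: the helper name `normalize`, its `parent_dev` parameter, and the derived `is_hardlink` key are hypothetical names introduced here.

```python
def normalize(info, parent_dev=0):
    """Return a copy of an <infoblock> with the documented defaults filled in.

    `parent_dev` is the (already normalized) `dev` of the parent directory;
    it defaults to 0 for the top-level directory.
    """
    out = dict(info)
    out.setdefault("asize", 0)
    out.setdefault("dsize", 0)
    out.setdefault("dev", parent_dev)  # an absent dev is inherited from the parent
    out.setdefault("ino", 0)
    out.setdefault("nlink", 1)
    out.setdefault("hlnkc", False)
    out.setdefault("read_error", False)
    out.setdefault("notreg", False)
    # `excluded` has no default: it stays absent for items that were not excluded.
    # Hypothetical convenience key: either signal marks a hard link candidate.
    out["is_hardlink"] = out["nlink"] > 1 or out["hlnkc"]
    return out
```

Passing the parent's normalized `dev` down while traversing keeps the inheritance rule for `dev` in a single spot.
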
### Extended information

In addition, the following fields are exported when _extended information_ mode
is enabled (available since ncdu 1.13). See the `-e` flag in
[ncdu(1)](/ncdu/man) for details.

uid
:   Number, ID of the user that owns the file. Accepted values are in the range
    `0 <= uid < 2^31`.

gid
:   Number, ID of the group that owns the file. Accepted values are in the range
    `0 <= gid < 2^31`.

mode
:   Number, the raw file mode as returned by
    [lstat(3)](https://manned.org/lstat.3). For Linux systems, see
    [inode(7)](https://manned.org/inode.7) for the interpretation of this
    field. Accepted range: `0 <= mode < 2^16`.

mtime
:   Number, last modification time as a UNIX timestamp. Accepted range:
    `0 <= mtime < 2^64`. As of ncdu 1.16, this number may also include an
    (infinite precision) decimal part for fractional seconds, though the
    decimal part is (currently) discarded during import.

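
As an illustration of how the `mode` and `mtime` fields might be interpreted by a reader, the sketch below uses Python's standard `stat` and `datetime` modules. The function name `describe_extended` and the returned keys are hypothetical; truncating `mtime` to whole seconds is a choice of this sketch that mirrors the import behaviour described above.

```python
import stat
from datetime import datetime, timezone

def describe_extended(info):
    """Return human-readable forms of the extended fields, or None where absent."""
    mode = info.get("mode")
    mtime = info.get("mtime")
    return {
        # stat.filemode() renders the raw mode in `ls -l` style.
        "perms": stat.filemode(mode) if mode is not None else None,
        # Drop any fractional seconds. Values near the top of the accepted
        # range exceed what datetime can represent; that case is not handled.
        "modified": (
            datetime.fromtimestamp(int(mtime), tz=timezone.utc)
            if mtime is not None else None
        ),
    }
```

For a regular file with mode `33188` (octal `100644`), `perms` comes out as `-rw-r--r--`.
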
## Miscellaneous notes

As mentioned above, file/directory names are **not** converted to any specific
encoding when exporting. If you want the exported info dump to be valid JSON
(and thus valid UTF-8), you'll have to either ensure that there are no
non-UTF-8 filenames in your filesystem, or process the dump through a
conversion utility such as `iconv`. When browsing an imported file with ncdu,
you'll usually want to ensure that the filenames are in the same encoding as
what your terminal is expecting. The browsing interface may look garbled or
otherwise ugly if that's not the case.

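
A program reading a dump can also tolerate non-UTF-8 names without converting the file first. The sketch below is one possible approach in Python, relying on the standard `surrogateescape` error handler; the helper names `load_dump` and `name_bytes` are hypothetical. It loads the whole dump at once, so it only suits exports of modest size.

```python
import json

def load_dump(path):
    """Parse a dump whose names may contain bytes that are not valid UTF-8."""
    # surrogateescape maps each undecodable byte to a lone surrogate code point
    # instead of raising UnicodeDecodeError.
    with open(path, "r", encoding="utf-8", errors="surrogateescape") as fh:
        return json.load(fh)

def name_bytes(name):
    """Recover the original on-disk bytes of a name returned by load_dump()."""
    return name.encode("utf-8", errors="surrogateescape")
```

Names decoded this way round-trip back to their exact bytes, but printing them directly may fail, so convert or escape them before display.
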
Another important thing to keep in mind is that an export can be fairly large.
If you write a program that reads a file in this format and you care about
handling directories with several million files, make sure to optimize for
that. For example, prefer the use of a stream-based JSON parser over a JSON
library that reads the entire file in a single generic data structure, and only
keep the minimum amount of data that you care about in memory.

## Example Export

Here's a simple example export that displays the basic structure of the format.

    [
      1,
      0,
      {
        "progname" : "ncdu",
        "progver" : "1.9",
        "timestamp" : 1354477149
      },
      [
        { "name" : "/media/harddrive",
          "dsize" : 4096,
          "asize" : 422,
          "dev" : 39123423,
          "ino" : 29342345
        },
        { "name" : "SomeFile",
          "dsize" : 32768,
          "asize" : 32414,
          "ino" : 91245479284
        },
        [
          { "name" : "EmptyDir",
            "dsize" : 4096,
            "asize" : 10,
            "ino" : 3924
          }
        ]
      ]
    ]

The directory described above has the following structure:

    /media/harddrive
    ├── SomeFile
    └── EmptyDir