ncdu: Added export file format documentation
parent 62b3ed6ff1
commit 77755d18cc
2 changed files with 230 additions and 2 deletions
224
dat/ncdu-jsonfmt
Normal file
@ -0,0 +1,224 @@
=pod

This document describes the file format that ncdu 1.9 uses for its
export/import feature (the C<-o> and C<-f> options). Check the L<ncdu
manual|http://dev.yorhel.nl/ncdu/man> for a description of how to use that
feature.

=head2 Top-level object

Ncdu uses L<JSON|http://json.org/> notation as its data format. The top-level
object is an array:

  [
    <majorver>,
    <minorver>,
    <metadata>,
    <directory>
  ]

=head2 Versioning

The C<< <majorver> >> and C<< <minorver> >> elements indicate the version of
the file format. These are numbers with accepted values in the range of
C<< 0 <= version <= 10000 >>. The major version must be C<1>; the minor version
is currently C<0>. The major version should increase if backwards-incompatible
changes are made (preferably never); the minor version can be increased to
indicate additions to the existing format.
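
A reader could apply the rules above roughly as follows. This is only a sketch: the function name C<read_export> is hypothetical, and treating the version numbers as integers and tolerating extra trailing top-level elements are assumptions that go beyond what this document states.

```python
import json

SUPPORTED_MAJOR = 1  # the only major version described in this document


def read_export(text):
    """Parse an export and split it into its top-level elements."""
    top = json.loads(text)
    if not isinstance(top, list) or len(top) < 4:
        raise ValueError("top-level value must be an array of four elements")
    majorver, minorver, metadata, directory = top[:4]
    for ver in (majorver, minorver):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(ver, bool) or not isinstance(ver, int):
            raise ValueError("version numbers must be integers (assumption)")
        if not 0 <= ver <= 10000:
            raise ValueError("version numbers must be in the range 0..10000")
    if majorver != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported major version {majorver}")
    # A higher minor version only signals additions, so it is accepted.
    return minorver, metadata, directory
```

Rejecting an unknown major version while accepting any minor version mirrors the compatibility rule described in this section.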

=head2 Metadata

The C<< <metadata> >> element is a JSON object holding whatever (short)
metadata you'd want. This block is currently (1.9) ignored by ncdu when
importing, but ncdu writes out the following keys when exporting:

=over

=item progname

String, name of the program that generated the file, i.e. C<"ncdu">.

=item progver

String, version of the program that generated the file, e.g. C<"1.9">.

=item timestamp

Number, UNIX timestamp as returned by the POSIX C<time()> function at the time
the file was generated. Note that this is not necessarily equivalent to the
time at which the directory was scanned.

=back
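
A third-party exporter might build the same three keys like this. The sketch is illustrative only: C<make_metadata> and the default program name and version are made-up placeholders, not anything ncdu defines.

```python
import json
import time


def make_metadata(progname="example-exporter", progver="0.1"):
    """Build a <metadata> object with the three keys that ncdu writes."""
    return {
        "progname": progname,
        "progver": progver,
        # Time at which the file is generated, not when the scan happened.
        "timestamp": int(time.time()),
    }


metadata_json = json.dumps(make_metadata())
```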

=head2 Directory Info

A C<< <directory> >> is represented with a JSON array:

  [
    <infoblock>,
    <directory>, <directory>, <infoblock>, ...
  ]

That is, the first element of the array must be an C<< <infoblock> >>. If the
directory is empty, that will be its only element. If it isn't, its
subdirectories and files are listed in the remaining elements. Each
subdirectory is represented as a C<< <directory> >> array again, and each file
is represented as just an C<< <infoblock> >> object.
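
Because a subdirectory is an array and a file is an object, a reader can tell them apart by JSON type alone. The following sketch walks an already-parsed C<< <directory> >> that way. The name C<walk> is hypothetical, and joining path components with C</> is an assumption made for the sake of readable output.

```python
def walk(directory, parent_path=""):
    """Yield (path, infoblock, is_dir) for a parsed <directory>, depth first."""
    info = directory[0]  # the first element is the directory's own infoblock
    if parent_path:
        path = parent_path.rstrip("/") + "/" + info["name"]
    else:
        path = info["name"]  # top-level directory: already a full path
    yield path, info, True
    for child in directory[1:]:
        if isinstance(child, list):
            # A nested array is a subdirectory.
            yield from walk(child, path)
        else:
            # A plain object is a file (or other non-directory item).
            yield path.rstrip("/") + "/" + child["name"], child, False
```

For very deep trees an explicit stack may be preferable to recursion, since Python limits recursion depth.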

=head2 The Info Object

An C<< <infoblock> >> is a JSON object holding information about a file or
directory. The following fields are supported:

=over

=item name

String I<(required)>. Name of the file/dir. For the top-level directory (that
is, the C<< <directory> >> item in the top-level JSON array), this should be
the full absolute filesystem path, e.g. C<"/media/harddrive">. For any items
below the top-level directory, the name should be just the name of the item.

The name will be in the same encoding as reported by the filesystem (i.e.
L<readdir()|http://manned.org/readdir.3>). The name may not exceed 32768
bytes.

=item asize

Number. The apparent file size, as reported by C<lstat().st_size>. If absent, 0
is assumed. Accepted values are in the range of C<< 0 <= asize < 2^63 >>.

=item dsize

Number. Size of the file, as consumed on the disk. This is obtained through
C<lstat().st_blocks*S_BLKSIZE>. If absent, 0 is assumed. Accepted values are in
the range of C<< 0 <= dsize < 2^63 >>.

=item dev

Number. The device ID. This has to be a unique ID within the context of the
exported dump, but need not have any meaning outside of it. I.e. this can be a
serialization of C<lstat().st_dev>, but also a randomly generated number only
used within this file, as long as it uniquely identifies the device/filesystem
on which this file is stored. This field may be absent, in which case it is
equivalent to that of the parent directory. If this field is also absent for
the parent directory, a value of 0 is assumed. Accepted values are in the range
of C<< 0 <= dev < 2^64 >>.

=item ino

Number. Inode number as reported by C<lstat().st_ino>. Together with the device
ID this uniquely identifies a file in this dump. In the case of hard links, two
objects may appear with the same (C<dev>,C<ino>) combination. A value of 0 is
assumed if this field is absent. This is currently (ncdu 1.9) not a problem as
long as the C<hlnkc> field is false; otherwise ncdu will consider everything
with the same C<dev> and an absent C<ino> value as a single hardlinked file.
Accepted values are in the range of C<< 0 <= ino < 2^64 >>.

=item hlnkc

Boolean. C<true> if this is a file with C<< lstat().st_nlink > 1 >>. If absent,
C<false> is assumed.

=item read_error

Boolean. C<true> if something went wrong while reading this entry, i.e. the
information in this entry may not be complete. For files, this indicates that
the C<lstat()> call failed. For directories, this means that an error occurred
while obtaining the file listing, and some items may be missing. Note that if
C<lstat()> failed, ncdu has no way of knowing whether an item is a file or a
directory, so a file with C<read_error> set might in fact be a directory. If
absent, C<false> is assumed.

=item excluded

String. Set if this file or directory is to be excluded from calculation for
some reason. The following values are recognized:

=over

=item C<"pattern">

If the path matched an exclude pattern.

=item C<"otherfs">

If the item is on a different device/filesystem.

=back

Excluded items may still be included in the export, but only by name. C<dsize>,
C<asize> and other information may be absent. If this item was excluded by a
pattern, ncdu will not do an C<lstat()> on it, and may thus report this item as
a file even if it is a directory.

Values other than those mentioned above are accepted by ncdu, but are currently
interpreted to be equivalent to C<"pattern">. This field should be absent if the
item has not been excluded from the calculation.

=item notreg

Boolean. This is C<true> if neither C<S_ISREG()> nor C<S_ISDIR()> evaluates to
true. I.e. this is a symlink, character device, block device, FIFO, socket, or
whatever else your system may support. If absent, C<false> is assumed.

=back
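
To illustrate how the defaults for absent fields combine, here is a sketch that sums C<dsize> over a parsed C<< <directory> >>. It inherits C<dev> from the parent directory, skips excluded items, and counts each hard-linked (C<dev>,C<ino>) pair once. The function C<total_disk_usage> is hypothetical and reflects one reading of the field descriptions above, not necessarily how ncdu itself computes totals.

```python
def total_disk_usage(directory, parent_dev=0, seen=None):
    """Sum dsize over a <directory>, counting each hard-linked file once."""
    if seen is None:
        seen = set()  # (dev, ino) pairs of hard-linked files already counted
    total = 0
    # A directory without "dev" inherits it from its parent (0 at the top).
    dir_dev = directory[0].get("dev", parent_dev)
    for index, item in enumerate(directory):
        if index > 0 and isinstance(item, list):
            total += total_disk_usage(item, dir_dev, seen)
            continue
        if "excluded" in item:
            continue  # excluded items take no part in the calculation
        dev = item.get("dev", dir_dev)   # absent: same as parent directory
        ino = item.get("ino", 0)         # absent: 0
        if item.get("hlnkc", False):     # absent: false
            if (dev, ino) in seen:
                continue
            seen.add((dev, ino))
        total += item.get("dsize", 0)    # absent: 0
    return total
```

Note how two items with C<hlnkc> set and no C<ino> would collapse into one, which is the pitfall described under C<ino>.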

=head2 Miscellaneous notes

As mentioned above, file/directory names are B<not> converted to any specific
encoding when exporting. If you want the exported info dump to be valid JSON
(and thus valid UTF-8), you'll have to either ensure that there are no
non-UTF-8 filenames in your filesystem, or process the dump through a
conversion utility such as C<iconv>. When browsing an imported file with ncdu,
you'll usually want to ensure that the filenames are in the same encoding as
what your terminal is expecting. The browsing interface may look garbled or
otherwise ugly if that's not the case.
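
A reader that wants to tolerate non-UTF-8 names, rather than convert the dump first, could decode leniently. The sketch below uses Python's C<surrogateescape> error handler for that. The helper names are hypothetical, and this is just one possible approach, not something ncdu prescribes.

```python
import json


def load_export_bytes(raw):
    """Parse an export whose file names may not be valid UTF-8."""
    # surrogateescape maps each undecodable byte to a lone surrogate code
    # point instead of raising, so the original bytes can be restored later.
    return json.loads(raw.decode("utf-8", errors="surrogateescape"))


def name_bytes(name):
    """Recover the on-disk bytes of a name loaded by load_export_bytes()."""
    return name.encode("utf-8", errors="surrogateescape")
```

Names loaded this way should be passed through C<name_bytes()> before being written to a terminal or file, since lone surrogates cannot be encoded as strict UTF-8.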

Another important thing to keep in mind is that an export can be fairly large.
If you write a program that reads a file in this format and you care about
handling directories with several million files, make sure to optimize for
that. For example, prefer a stream-based JSON parser over a JSON library that
reads the entire file into a single generic data structure, and only keep the
minimum amount of data that you care about in memory.
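
As a rough illustration of the idea, the sketch below scans the nested arrays by hand and decodes one small value at a time with C<json.JSONDecoder.raw_decode>, keeping only two counters. It is a simplification under stated assumptions: the raw text is still held in memory (a true streaming parser would read incrementally), array syntax is not strictly validated, and the names C<scan_events> and C<summarize> are hypothetical.

```python
import json

_decoder = json.JSONDecoder()


def scan_events(text):
    """Yield ("start",), ("end",) or ("value", obj) for nested arrays."""
    pos, end = 0, len(text)
    while pos < end:
        ch = text[pos]
        if ch in " \t\r\n,":
            pos += 1
        elif ch == "[":
            yield ("start",)
            pos += 1
        elif ch == "]":
            yield ("end",)
            pos += 1
        else:
            # Numbers and {...} objects are small; decode them one at a time.
            obj, pos = _decoder.raw_decode(text, pos)
            yield ("value", obj)


def summarize(text):
    """Return (item_count, total_dsize) without building the whole tree."""
    count = total = depth = 0
    for event in scan_events(text):
        if event[0] == "start":
            depth += 1
        elif event[0] == "end":
            depth -= 1
        elif depth >= 2 and isinstance(event[1], dict):
            # Depth 1 holds the versions and metadata; deeper objects are
            # infoblocks.
            count += 1
            total += event[1].get("dsize", 0)
    return count, total
```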

=head2 Example Export

Here's a simple example export that shows the basic structure of the format.

  [
    1,
    0,
    {
      "progname" : "ncdu",
      "progver" : "1.9",
      "timestamp" : 1354477149
    },
    [
      { "name" : "/media/harddrive",
        "dsize" : 4096,
        "asize" : 422,
        "dev" : 39123423,
        "ino" : 29342345
      },
      { "name" : "SomeFile",
        "dsize" : 32768,
        "asize" : 32414,
        "ino" : 91245479284
      },
      [
        { "name" : "EmptyDir",
          "dsize" : 4096,
          "asize" : 10,
          "ino" : 3924
        }
      ]
    ]
  ]

The directory described above has the following structure:

  /media/harddrive
  ├── SomeFile
  └── EmptyDir
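
For completeness, an export with this shape could be produced by a small script along the following lines. This is a sketch for POSIX systems only. The function names and the C<example-exporter> program name are hypothetical, a block size of 512 bytes is assumed for C<S_BLKSIZE>, and it does not reflect how ncdu itself scans a filesystem.

```python
import json
import os
import stat
import time

BLOCK_SIZE = 512  # assumed value of S_BLKSIZE


def infoblock(path, name):
    """Return (infoblock, is_dir) for one filesystem item."""
    try:
        st = os.lstat(path)
    except OSError:
        return {"name": name, "read_error": True}, False
    is_dir = stat.S_ISDIR(st.st_mode)
    info = {"name": name, "asize": st.st_size,
            "dsize": st.st_blocks * BLOCK_SIZE,
            "dev": st.st_dev, "ino": st.st_ino}
    if not is_dir and st.st_nlink > 1:
        info["hlnkc"] = True
    if not is_dir and not stat.S_ISREG(st.st_mode):
        info["notreg"] = True
    return info, is_dir


def scan(path, name):
    """Return an infoblock for a file, or a <directory> array for a dir."""
    info, is_dir = infoblock(path, name)
    if not is_dir:
        return info
    entry = [info]
    try:
        children = sorted(os.listdir(path))
    except OSError:
        info["read_error"] = True  # listing failed; items may be missing
        return entry
    for child in children:
        entry.append(scan(os.path.join(path, child), child))
    return entry


def export(root):
    """Serialize the tree below the directory root in the format above."""
    root = os.path.abspath(root)  # the top-level name is the full path
    meta = {"progname": "example-exporter", "progver": "0.1",
            "timestamp": int(time.time())}
    return json.dumps([1, 0, meta, scan(root, root)])
```

The sketch always writes C<dev>, which the format permits, although omitting it for items on the same device as their parent would produce a smaller file.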