diff --git a/.gitignore b/.gitignore index b95af43..2c5acd9 100644 --- a/.gitignore +++ b/.gitignore @@ -70,7 +70,9 @@ dat/ncdu/man/2_2.pod dat/ncdu/man/2_3.md dat/ncdu/man/2_3.pod dat/ncdu/man/2_4.md -dat/ncdu/man/2_4.pod +dat/ncdu/man/2_4.mdoc +dat/ncdu/man/2_5.md +dat/ncdu/man/2_5.mdoc dat/nginx-confgen/changes.log dat/nginx-confgen/changes.md dat/nginx-confgen/man.md @@ -150,6 +152,7 @@ pub/ncdc/install.html pub/ncdc/man.html pub/ncdc/scr.html pub/ncdu.html +pub/ncdu/binfmt.html pub/ncdu/changes.html pub/ncdu/changes2.html pub/ncdu/feed.atom @@ -181,6 +184,7 @@ pub/ncdu/man/2_1.html pub/ncdu/man/2_2.html pub/ncdu/man/2_3.html pub/ncdu/man/2_4.html +pub/ncdu/man/2_5.html pub/ncdu/scr.html pub/nginx-confgen.html pub/nginx-confgen/changes.html diff --git a/Makefile b/Makefile index 8283044..5d054a9 100644 --- a/Makefile +++ b/Makefile @@ -51,9 +51,11 @@ PAGES=\ "ncdu.md"\ "ncdu/changes.log https://g.blicky.net/ncdu.git/plain/ChangeLog?h=master Ncdu 1.x Release History"\ "ncdu/changes2.log https://g.blicky.net/ncdu.git/plain/ChangeLog?h=zig Ncdu 2.x Release History"\ + "ncdu/binfmt.md"\ "ncdu/jsonfmt.md"\ - "ncdu/man.mdoc https://g.blicky.net/ncdu.git/plain/ncdu.1?h=v2.5 Ncdu 2.5 Manual"\ - "ncdu/man/2_4.pod https://g.blicky.net/ncdu.git/plain/ncdu.pod?h=v2.4 Ncdu 2.4 Manual"\ + "ncdu/man.mdoc https://g.blicky.net/ncdu.git/plain/ncdu.1?h=v2.6 Ncdu 2.6 Manual"\ + "ncdu/man/2_5.mdoc https://g.blicky.net/ncdu.git/plain/ncdu.1?h=v2.5 Ncdu 2.5 Manual"\ + "ncdu/man/2_4.mdoc https://g.blicky.net/ncdu.git/plain/ncdu.1?h=v2.4 Ncdu 2.4 Manual"\ "ncdu/man/2_3.pod https://g.blicky.net/ncdu.git/plain/ncdu.pod?h=v2.3 Ncdu 2.3 Manual"\ "ncdu/man/2_2.pod https://g.blicky.net/ncdu.git/plain/ncdu.pod?h=v2.2 Ncdu 2.2 Manual"\ "ncdu/man/2_1.pod https://g.blicky.net/ncdu.git/plain/ncdu.pod?h=v2.1 Ncdu 2.1 Manual"\ diff --git a/dat/guestbook.md b/dat/guestbook.md index 20a30e8..0aeebfc 100644 --- a/dat/guestbook.md +++ b/dat/guestbook.md @@ -17,6 +17,12 @@ respective issue tracker or 
send a mail to # Entries +`2024-08-17` - Young Lee +: hey, thanks for the nice app! + +`2024-08-08` - raksO +: Thanks! It is really helpful! ❤️ + +`2024-07-12` - DIANA PUNKY : ncdu rocks! diff --git a/dat/index.md b/dat/index.md index f1b4210..72ac586 100644 --- a/dat/index.md +++ b/dat/index.md @@ -10,9 +10,14 @@ crap I've written over the years. :) ## Announcements Atom feed -`2024-07-24` - 2.5 released +`2024-09-27` - ncdu 2.6 released +: Adds a new binary export format that works better with parallel scanning, + offers built-in compression and supports browsing directory trees that are + too large to fit in memory. [Homepage](/ncdu) - [Changelog](/ncdu/changes2). + +`2024-07-24` - ncdu 2.5 released : Adds support for parallel scanning, improves import/export performance and - fixes a number of bugs. [Ncdu homepage](/ncdu) - [Changelog](/ncdu/changes). + fixes a number of bugs. [Homepage](/ncdu) - [Changelog](/ncdu/changes2). `2024-07-18` - ncdc 1.24.1 released : Just fixes a build error. [Homepage](/ncdc) - [Changelog](/ncdc/changes). diff --git a/dat/ncdu.md b/dat/ncdu.md index 9ecd669..e06c7d4 100644 --- a/dat/ncdu.md +++ b/dat/ncdu.md @@ -1,34 +1,49 @@ % NCurses Disk Usage -Ncdu is a disk usage analyzer with an ncurses interface. It is designed to find -space hogs on a remote server where you don't have an entire graphical setup -available, but it is a useful tool even on regular desktop systems. Ncdu aims -to be fast, simple and easy to use, and should be able to run in any minimal -POSIX-like environment with ncurses installed. +Ncdu is a disk usage analyzer with a text-mode user interface. It is designed +to find space hogs on a remote server where you don't have an entire graphical +setup available, but it is a useful tool even on regular desktop systems. Ncdu +aims to be fast, simple, easy to use, and should be able to run on any +POSIX-like system. -**NEWS FLASH!** Ncdu 2.5 adds support for parallel scanning, but it's not -(yet?) enabled by default. 
To give it a try, run with `-t8` to scan with 8 -threads. If you're running an unusual setup, such as networked storage, odd -filesystems, complex RAID configurations, etc, I'd love to hear about the -performance impact of this new feature. Feedback is welcome on the [issue -tracker](https://code.blicky.net/yorhel/ncdu/issues) or through mail @ -[projects@yorhel.nl](mailto:projects@yorhel.nl). -
-If you want to run benchmarks, `-0 --quit-after-scan` can be useful to disable -the browser interface, or run with `-0o/dev/null` to benchmark JSON export. +## Notable updates + +Parallel scanning +: Ncdu 2.5 adds support for parallel scanning, but it's not enabled by + default. To give it a try, run with `-t8` to scan with 8 threads. If you're + running an unusual setup, such as networked storage, odd filesystems, + complex RAID configurations, etc., I'd love to hear about the performance + impact of this new feature. Feedback is welcome on the [issue + tracker](https://code.blicky.net/yorhel/ncdu/issues) or by mail to + [projects@yorhel.nl](mailto:projects@yorhel.nl).[^1] + +Binary export +: Ncdu 2.6 adds a new binary export format that works better with parallel + scanning, offers built-in compression and supports browsing directory + trees that are too large to fit in memory. To give it a try, use the `-O` + flag instead of `-o`. + +Colors +: Ncdu has had color support since version 1.13. Colors were enabled by + default in 1.17 and 2.0, and then later disabled again in 1.20 and 2.4 + because the text was not legible in all terminal configurations. + + If you do prefer the colors, add `--color=dark` to your [config + file](/ncdu/man#configuration). Maybe at some point in the future we'll + have colors that *are* readable in every setup. ## Download Atom feed Static binaries : Convenient static binaries for Linux. Download, extract and run; no compilation or installation necessary: - [x86](/download/ncdu-2.5-linux-x86.tar.gz) - - [x86_64](/download/ncdu-2.5-linux-x86_64.tar.gz) - - [ARM](/download/ncdu-2.5-linux-arm.tar.gz) - - [AArch64](/download/ncdu-2.5-linux-aarch64.tar.gz). + [x86](/download/ncdu-2.6-linux-x86.tar.gz) - + [x86_64](/download/ncdu-2.6-linux-x86_64.tar.gz) - + [ARM](/download/ncdu-2.6-linux-arm.tar.gz) - + [AArch64](/download/ncdu-2.6-linux-aarch64.tar.gz). 
Zig version (stable) -: 2.5 (2024-07-24 - [ncdu-2.5.tar.gz](/download/ncdu-2.5.tar.gz) - [changes](/ncdu/changes2)) +: 2.6 (2024-09-27 - [ncdu-2.6.tar.gz](/download/ncdu-2.6.tar.gz) - [changes](/ncdu/changes2)) Requires Zig 0.12 or 0.13. @@ -106,3 +121,8 @@ There's no shortage of alternatives to ncdu nowadays. In no particular order: - [K4DirStat](https://github.com/jeromerobert/k4dirstat) - Qt, treemap. - [xdiskusage](http://xdiskusage.sourceforge.net/) - FLTK, with a treemap display. - [fsv](http://fsv.sourceforge.net/) - 3D visualization. + + +[^1]: If you want to run benchmarks, `-0 --quit-after-scan` can be useful to + disable the browser interface, or run with `-0o/dev/null` to benchmark JSON + export. diff --git a/dat/ncdu/binfmt.md b/dat/ncdu/binfmt.md new file mode 100644 index 0000000..9964728 --- /dev/null +++ b/dat/ncdu/binfmt.md @@ -0,0 +1,364 @@ +% Ncdu Binary Export File Format + +This document describes the new binary file format added in ncdu 2.6. This +format offers the following advantages compared to the [JSON export file +format](/ncdu/jsonfmt): + +- Support for exporting data from a multithreaded filesystem scan with minimal + thread-local buffering and minimal synchronisation between threads. +- Support for reading the directory tree in depth-first, breadth-first and + mixed iteration order, thus permitting interactive browsing through the tree + without reading the entire file. +- Cumulative directory sizes are included in the exported data, allowing + readers to display this data without walking through the entire tree. +- Built-in support for compression. + +These features come at the cost of increased complexity. The JSON format is +generally easier to work with and therefore still the recommended approach for +external tooling to interact with ncdu's export/import functionality. + +A binary export can be created with the `-O` option to ncdu. 
It is also +possible to convert to and from the JSON format: + +``` +ncdu -O export.ncdu / # Scan root, write to 'export.ncdu' +ncdu -f in.json -O out.ncdu # Convert from JSON to binary +ncdu -f in.ncdu -o out.json # Convert from binary to JSON +``` + +# Format description + +## File signature + +An exported file starts with the following file signature (in hex): + +``` +bf 6e 63 64 75 45 58 31 +``` + +Formatted as a C string, that is `"\xbfncduEX1"`. + +Non-backwards compatible changes to the export format should use a different +file signature. N.B. A different compression algorithm is a non-backwards +compatible change. + +## Block format + +The file signature is followed by one or more *blocks*. A block has the +following format: + +------- ------- --------------- +TypeLen 4 bytes Big-endian type + length of this block +Content n bytes n = Length - 8 +TypeLen 4 bytes Repeat of TypeLen +------- ------- --------------- + +The high 4 bits of the *TypeLen* indicate the block type, the lower 28 bits +encode the length of the block, including the header and footer. + +The *TypeLen* is repeated at the end of the block to allow the file to be read +in both the forward and backward directions. + +The block type determines how the *Content* should be interpreted. There are +currently two block types: + +Type Meaning +---- -- + 0 Data block + 1 Index block +---- -- + +Parsers should ignore blocks with an unknown type. + +A valid file must have at least one data block and exactly one index block. The +index block must be the last block in the file. + +## Data blocks + +Data blocks have the following contents: + +---------------- ------- ------ +Number 4 bytes Big-endian unsigned block number +Compressed\_data n bytes +---------------- ------- ------ + +Every data block must have a unique number, starting from zero and ideally (but +not necessarily) allocated without gaps. Data blocks may appear in a different +order than their numbering. 
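As a concrete (unofficial) illustration of the *TypeLen* encoding described above, the following Python sketch packs and unpacks the 4-byte field; the helper names are hypothetical and not part of any ncdu API:

```python
import struct

BLOCK_DATA, BLOCK_INDEX = 0, 1  # block types defined by the format

def make_typelen(block_type: int, length: int) -> bytes:
    """Pack a block type (high 4 bits) and total block length
    (low 28 bits) into a 4-byte big-endian TypeLen."""
    assert 0 <= block_type < 16 and 0 <= length < 1 << 28
    return struct.pack(">I", (block_type << 28) | length)

def parse_typelen(raw: bytes) -> tuple:
    """Split a 4-byte big-endian TypeLen into (type, length)."""
    (word,) = struct.unpack(">I", raw)
    return word >> 28, word & 0x0FFFFFFF

# A data block whose total length (header + content + footer) is 100 bytes:
header = make_typelen(BLOCK_DATA, 100)
assert parse_typelen(header) == (0, 100)
```

Because the same TypeLen appears as both header and footer, a reader can also walk blocks backwards from the end of the file, e.g. to locate the index block.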
+ +Data is compressed with [Zstandard](http://www.zstd.net/). Data must be +compressed in a single frame and the uncompressed size must be available +through `ZSTD_getFrameContentSize()`, so that readers can pre-allocate a +properly-sized buffer for decompression. + +The total length of a data block, including block header and footer, must not +exceed 16 MiB minus one byte. The total size of the decompressed data must also +not exceed 16 MiB minus one byte. + +The decompressed data consists of a stream of one or more *Items* (see below). + +## Index block + +The index block provides a lookup table for data blocks and a reference to the +root item: + +--------------- ---------- +Block\_pointers n\*8 bytes +Root\_itemref 8 bytes +--------------- ---------- + +*Block\_pointers* is an array containing an 8-byte pointer for each data block +in the file. Pointers are indexed by block number, so the first pointer is for +block number 0, the second pointer for block number 1, etc. Each pointer is +interpreted as a 64-bit big-endian unsigned integer. The higher 40 bits indicate +the byte offset of the data block header, relative to the start of the file. +The lower 24 bits indicate the block length and must be equal to the +length in the *TypeLen* of the corresponding data block. An all-zero value +indicates that there is no block with this number in the file. + +The last 8 bytes of the index block represent an unsigned big-endian integer +that refers to the root item of the directory tree. See *Itemref* below. + +## Itemref + +An *Itemref* encodes a reference to an *Item*. There are two types: + +Absolute +: An absolute *Itemref* is a 64-bit unsigned integer that encodes a block + number in the higher 40 bits and a byte offset of the start of the item + within the block in the lower 24 bits. Every item in the file has exactly + one absolute *Itemref* value. The *Root\_itemref* in the index block must + be absolute. 
+ +Relative +: A relative *Itemref* is a negative integer that represents the byte offset + of the referenced item relative to the start of the item containing the + reference. Relative references can only reference a previously written item + within the same block. + +## Item + +An *Item* represents a file or directory entry, encoded as a +[CBOR](https://cbor.io/) map. Key/value pairs may be encoded in any order and +unknown keys are ignored. Summary of keys recognized by ncdu: + + Key Field Value +---- -------- -------- + 0 type i32 + 1 name String + 2 prev Itemref + 3 asize u64 + 4 dsize u64 + 5 dev u64 + 6 rderr bool + 7 cumasize u64 + 8 cumdsize u64 + 9 shrasize u64 + 10 shrdsize u64 + 11 items u64 + 12 sub Itemref + 13 ino u64 + 14 nlink u32 + 15 uid u32 + 16 gid u32 + 17 mode u16 + 18 mtime u64 +---- -------- -------- + +**Common fields for all items** + +type +: Mandatory. A negative value indicates that the item has been excluded + from the size calculations for some reason; non-negative values are used + for the different item types: + + --- -- + -4 Excluded with `--exclude-kernfs` + -3 Excluded with `-x` + -2 Excluded by pattern match + -1 Error while reading this entry + 0 Directory + 1 Regular file + 2 Non-regular file (symlink, device, etc) + 3 Hardlink candidate (i.e. stat().st_nlink > 1) + --- -- + + Unrecognized negative values are treated as equivalent to -2, unrecognized + positive values are treated as a non-regular file (type=2). + +name +: Mandatory. Ncdu always encodes the name as a byte string, but also accepts + UTF-8 text strings. Ncdu does not support indefinite-length CBOR strings; + the name must be encoded with a known length. + +prev +: Reference to the previous item in the same directory. This field must be + absent if this is the first item in a directory. Together, the *prev* + fields form a singly-linked list of all items in a directory. + +**Fields for type >= 0** + +asize +: Apparent size of this file/directory as reported by `stat().st_size`. 
+ Optional, defaults to 0. + +dsize +: Disk usage of this file/directory as reported by `stat().st_blocks` + multiplied by the block size. Optional, defaults to 0. + +**Fields for type = 0** + +dev +: Device number. Optional, defaults to the same device number as the parent + directory, or 0 if this is the root item. + +rderr +: Whether an error occurred while reading this directory. When *true*, an + error occurred while reading the directory list itself and the list may + therefore be incomplete. When *false*, an error occurred while reading a + child item. This implies that somewhere in this sub-tree there must be at + least one item of `type=-1` or a directory with `rderr=true`. + +cumasize +: Cumulative apparent size of this directory. Optional, defaults to 0. + +cumdsize +: Cumulative disk usage of this directory. Optional, defaults to 0. + +shrasize +: Shared apparent size. Optional, defaults to 0. + +shrdsize +: Shared disk usage. Optional, defaults to 0. + +items +: Cumulative number of items in this directory. Ncdu currently caps this + number to `2^32-1` when reading, but supports larger numbers when + exporting. Optional, defaults to 0. + +sub +: Reference to the last item in this directory, or absent if the directory is + empty. + +**Fields for type=3** + +ino +: Inode number. + +nlink +: Number of links to this inode. + +**Extended information** + +These fields are only exported when the `-e` flag is passed to ncdu. They are +relevant to all items with type >= 0. + +uid +: User id. + +gid +: Group id. + +mode +: File mode. + +mtime +: Last modification time as a UNIX timestamp. + +# Limitations + +Compressed data block size +: 16 MiB minus 1 byte. This limit comes from *Block\_pointers* in the index + block using 24 bits to encode the block length. + +Uncompressed data block size +: 16 MiB minus 1 byte. This limit comes from *Itemref* encoding item offset + in 24 bits. + +Largest data block number +: 33,554,428. 
The size of the index block is limited by the 28-bit length in + the block's *TypeLen* header, which limits the number of *Block\_pointers* + it can hold to `((2^28 - 1) - 16) / 8` (subtract one to get the maximum + block number because counting starts at 0). + +Compressed data size +: Excluding block overhead, the total amount of compressed data is limited to + about 1 TiB. This is limited by *Block\_pointers* using 40 bits to encode + the data block offset within the file. + +Uncompressed data size +: Limited by either the maximum number of data blocks or the compressed data + size, depending on compression ratio and the chosen data block size. + Assuming the number of data blocks is the limit, about 512 TiB of + uncompressed data can be stored with the maximum data block size of 16 MiB. + Ncdu's adaptive block size selection has a limit of about 40 TiB. + +The real question is how many items an export can hold with the above limits in +place. This will heavily depend on the average encoded item size and the +compression ratio, both of which can vary wildly from one directory structure +to another. + +I've had one report with ~1.4 billion files resulting in a ~21 GiB file. +Extrapolating from that and assuming the compressed data size is the limiting +factor, this format could hold ~68 billion items. Increasing the compression +level and using larger data block sizes to further improve compression ratio, +one could perhaps store about 100 billion items. On the one hand, that sounds +like an insane number nobody will ever reach. On the other hand, a decade ago I +couldn't imagine people having more than 100 million files, yet here we are. + +On the upside, all the major limitations can be attributed to the maximum size +and format of the index block. 
It's possible to implement an alternative index +format in the future that can be automatically switched to whenever any of the +above limits are exceeded, thus providing a seamless upgrade path without +breaking compatibility for the existing exports that do fit within the limits. + +# Security considerations + +Directory trees can get very large and you can easily exceed available RAM when +attempting to read everything into memory. Reading only small parts of the tree +can help cut down on memory use, but it's still a good idea to implement limits +or detect and handle when you're about to run out of memory. + +There are several places in the format where byte offsets are used to refer to +blocks or items. These offsets must be validated to ensure that they stay within +the bounds of the respective file or block. In particular, itemref offsets +could potentially refer to memory before (in the case of a relative itemref) or +after (absolute itemref) the decompressed data, and pointers in the index block +could refer to offsets beyond the end of the file. + +The CBOR encoding used for items is self-delimiting, but a badly formatted item +may not be properly terminated before the end of the decompressed block +contents. Readers should take care that this does not lead to reading past the +allocated buffer. + +In a well-formed directory tree, each item is referenced exactly once by either +the *Root\_itemref* or a *prev* or *sub* field. However, it is also possible to +construct a file where this is not the case, and implementers should be aware +that itemref loops are possible. + +# Implementation notes + +Data block size +: It is up to the file writer to choose a suitable data block size. This is a + compromise between compression efficiency and memory use: larger blocks + compress better but also require more memory, both for reading and writing. + Ncdu currently keeps 8 uncompressed blocks in memory when reading and one + block per thread when writing. 
Ncdu starts with blocks of 64 KiB, but + gradually increases the size to 2 MiB for very large directory trees in + order to not bloat the index size too much and to prevent running into the + maximum data block number limit. + +Testing +: If you're implementing a custom writer for this format, make sure to check + out the + [ncdubinexp.pl](https://code.blicky.net/yorhel/ncdu/src/ncdubinexp.pl) + script in the git repository. Ncdu only reads the parts of a file that it + actually needs, so passing a file to ncdu is no guarantee that it is + well-formed. The ncdubinexp.pl script is more thorough in validating file + correctness but misses a few invariants that ncdu does check for, so + the best way to verify a file is to run both: + + ``` + ncdu -f file.ncdu -o/dev/null # Read entire tree and export to /dev/null + ncdubinexp.pl <file.ncdu # Validate the file structure + ```
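To round off the implementation notes, here is a small stdlib-only Python sketch of decoding the index block's 64-bit block pointers and absolute itemrefs described earlier; the function names are my own invention, not taken from the ncdu sources:

```python
import struct

def decode_block_pointer(raw: bytes):
    """Decode one 8-byte Block_pointers entry from the index block.

    Returns (file_offset, block_length) taken from the high 40 and
    low 24 bits, or None for an all-zero entry (no such block)."""
    (word,) = struct.unpack(">Q", raw)
    if word == 0:
        return None
    return word >> 24, word & 0xFFFFFF

def decode_absolute_itemref(ref: int):
    """Split an absolute itemref into (block_number, byte_offset)."""
    # Negative values would be relative references; only absolute
    # refs (such as Root_itemref) can be decoded this way.
    assert ref >= 0
    return ref >> 24, ref & 0xFFFFFF

# A pointer to a 512-byte data block at file offset 4096:
ptr = struct.pack(">Q", (4096 << 24) | 512)
assert decode_block_pointer(ptr) == (4096, 512)

# The root item at byte offset 16 inside data block 3:
assert decode_absolute_itemref((3 << 24) | 16) == (3, 16)
```

A reader would decode *Root\_itemref* to find the starting block number, look up that block's pointer in the index, decompress the block, and then decode the CBOR item at the given offset.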