Ncdu 2 announcement + beta releases
This commit is contained in:
parent
ecb76f9797
commit
38b35f30b4
51 changed files with 553 additions and 15 deletions
|
|
@ -6,6 +6,8 @@ rare occasions are published on this page.
|
|||
|
||||
## Articles That May As Well Be Considered Blog Posts
|
||||
|
||||
`2021-07-22` - [Ncdu 2: Less hungry and more Ziggy](/doc/ncdu2)
|
||||
|
||||
`2019-08-13` - [From SQL to Nested Data Structures](/doc/sqlobject)
|
||||
: How to easily fetch complex nested data structures from a normalized
|
||||
relational database.
|
||||
|
|
|
|||
328
dat/doc/ncdu2.md
Normal file
|
|
@ -0,0 +1,328 @@
|
|||
% Ncdu 2: Less hungry and more Ziggy
|
||||
|
||||
(Published on **2021-07-22**)
|
||||
|
||||
{.right}
|
||||
|
||||
[Ncdu (NCurses Disk Usage)](https://dev.yorhel.nl/ncdu) is a terminal-based
|
||||
disk usage analyzer for Linux and other POSIX-y systems, (formerly) written in
|
||||
C and available in the package repositories of most distributions.
|
||||
|
||||
Over the past months I have been working on a complete rewrite of ncdu and just
|
||||
released 2.0-beta1. Ncdu 2 is a full replacement of ncdu 1.x, keeping all the
|
||||
same features, the same UI, keybindings and command-line flags, all to make it
|
||||
a proper drop-in replacement. There is, after all, nothing more annoying than
|
||||
having to get re-acquainted with every piece of software you've installed every
|
||||
time you decide to update.
|
||||
|
||||
|
||||
# Why rewrite, then?
|
||||
|
||||
Of course, there wouldn't really be a point in rewriting ncdu if the new
|
||||
version doesn't improve at least *something*. This rewrite is, in fact, the
|
||||
result of me agonizing for a long time about several major shortcomings of ncdu
|
||||
1.x:
|
||||
|
||||
1. It has always been a bit of a memory hog. Part of this is inherent to what
|
||||
it does: analyzing a full directory tree and making it smoothly browsable
|
||||
requires keeping track of every node in that tree, and with large
|
||||
directories containing millions of files... yeah, that'll require some
|
||||
memory. I had, of course, already implemented all the low-hanging fruit to
|
||||
ensure that ncdu wasn't *overly* inefficient with memory, but there was
|
||||
certainly room for improvements.
|
||||
2. Ncdu 1.x does not handle hard link counting very efficiently and can in some
|
||||
(fortunately rare) cases [get stuck in an `O(n²)`
|
||||
loop](https://code.blicky.net/yorhel/ncdu/issues/121). This one is
|
||||
particularly nasty to fix without increasing memory use.
|
||||
3. Another hard link-related problem: hard links are counted only once in a
|
||||
directory's cumulative size, as is consistent with what most other
|
||||
hard link-supporting disk usage analyzers do. This is useful, since a file
|
||||
that has been "duplicated" by means of hard links only really occurs on disk
|
||||
once; the file's data is not actually duplicated. But at the same time this
|
||||
feature can be misleading: deleting a directory does not necessarily reclaim
|
||||
as much disk space as was indicated by ncdu, since it's possible for
|
||||
hard links *outside* of that directory to point to the same file data, and
|
||||
hence for that data to remain on disk. I've had a good idea on how to better
|
||||
present such scenarios, but an efficient implementation always eluded me.
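To make the "counted only once" behaviour from point 3 concrete, here is a minimal sketch, not ncdu's actual code. The function name `cumulative_size` is made up for illustration, and it sums apparent file sizes rather than disk blocks. Each hard-linked inode is identified by its device and inode number and added to the total only the first time it is seen.

```python
import os

def cumulative_size(root):
    """Sum file sizes below root, counting each hard-linked inode once."""
    seen = set()   # (st_dev, st_ino) pairs of hard-linked files already counted
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue   # another name for data that was already counted
                seen.add(key)
            total += st.st_size
    return total
```

The total says nothing about how much space deleting `root` would reclaim: a file with links outside of `root` is still counted in full, which is exactly the misleading case described above.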
|
||||
|
||||
None of the above issues are easy, or even possible, to solve within the data
|
||||
model of ncdu 1.x, so major changes to the core data model were necessary.
|
||||
Since each and every aspect of ncdu - both the UI and all algorithms - is
|
||||
strongly tied to the data model, this effectively comes down to a full rewrite.
|
||||
It was during a hike through the local forests that I finally came to a
|
||||
promising solution that addresses all three points.
|
||||
|
||||
|
||||
# But first: an apology
|
||||
|
||||
I'm sorry. I was anxious to try out that "promising solution" of mine and C
|
||||
isn't the kind of language that makes quick prototyping very easy or
|
||||
pleasurable. So I ended up prototyping with [Zig](https://ziglang.org/) instead
|
||||
and it ended up being more than just a prototype.
|
||||
|
||||
Zig is an excellent programming language and uniquely well suited for little
|
||||
tools like ncdu, but it currently has a **major** flaw: It's not even close to
|
||||
stable. There's no stable version of the language, of the standard library or of
|
||||
the compiler, and each new release comes with major breakage left and right. It
|
||||
also has a fair amount of (known) bugs and its portability story, while
|
||||
impressive for the stage the language is in, is not yet on par with C. The
|
||||
language is looking very promising and I have no doubt that Zig will eventually
|
||||
reach the level of stability and portability to make it a good target for ncdu.
|
||||
But, judging from an outsider's perspective, that's likely to take a few more
|
||||
years. And that's okay; after all, every language needs time to mature.
|
||||
|
||||
But what does that mean for ncdu? For regular users, probably not that much. I
|
||||
provide static binaries for Linux as I've always done, so you can just grab
|
||||
those and run the fancy new ncdu as usual. If you want to compile from source,
|
||||
you only need to grab the right Zig version and run `zig build`. Assuming you
|
||||
don't run into bugs, that is, but for the most part things tend to work just
|
||||
fine out of the box. For distributions, the Zig situation is rather more
|
||||
problematic, primarily because the version of ncdu is now strongly tied to the
|
||||
version of Zig, which in turn means that distributions are unable to upgrade
|
||||
these packages independently from each other. If they package Zig 0.8, then
|
||||
they may also have to package a version of ncdu that can be compiled with Zig
|
||||
0.8. If they want to upgrade to a newer version of Zig, they may not be able
|
||||
to do so without waiting for me to release a new version that works with that
|
||||
particular Zig, or maintain their own local patches for ncdu. The alternative
|
||||
is that distributions will have to support multiple versions of Zig at the same
|
||||
time, but few have the time and infrastructure to do that. Either solution is
|
||||
messy, and for that I apologize.
|
||||
|
||||
Considering the above, I will continue to maintain the C version of ncdu for as
|
||||
long as there are people who use it. *Maintenance* meaning pretty much what
|
||||
I've been doing for the past few years: not particularly active in terms of
|
||||
development, just occasional improvements and fixes here and there. I may also
|
||||
backport some additions from future 2.x versions to the C version,
|
||||
especially with regard to visible interfaces (CLI flags, keybindings, UI,
|
||||
etc), to dampen the inevitable agony that arises when switching between systems
|
||||
that happen to have different versions installed, but features that seem like a
|
||||
pain to implement in the 1.x codebase will likely remain exclusive to the Zig
|
||||
version.
|
||||
|
||||
As long as there is no stable version of Zig yet, I will *try* to keep ncdu 2.x
|
||||
current with the Zig version that [most distributions have packages
|
||||
for](https://repology.org/project/zig/versions), which in practice generally
|
||||
means the latest tagged release.
|
||||
|
||||
<!--In the unlikely event that Zig turns out to be a miserable failure - never
|
||||
gets in a stable state or ends up never gaining sufficient adoption - I may
|
||||
also decide to do another rewrite in a more acceptable language in the future,
|
||||
be that C or Rust or whatever else does the trick. Or maybe I'll just give up
|
||||
and stick with the not-optimal-but-perfectly-usable C version. Who knows.-->
|
||||
|
||||
On the upside, ncdu 2.0 only requires the Zig compiler (plus the standard
|
||||
library that comes with it) and ncurses. There are no other external dependencies
|
||||
and none of that vendoring, bundling and insane package management stuff that
|
||||
haunts projects written in other fancy new languages.[^1]
|
||||
|
||||
|
||||
# Less hungry
|
||||
|
||||
Ncdu 2 uses less than half the memory in common scenarios, but may in some
|
||||
cases use *more* memory if there are a lot of hard links. Quick comparison
|
||||
against a few real-world directories:
|
||||
|
||||
| Test            | #files | ncdu 1.16 | ncdu 2.0-beta1 |
|:----------------|-------:|----------:|---------------:|
| -x /            |  3.8 M |     429 M |        _162 M_ |
| -ex /           |      ~ |     501 M |        _230 M_ |
| backup dir      | 38.9 M |   3,969 M |      _1,686 M_ |
| backup dir -e   |      ~ |   4,985 M |      _2,370 M_ |
| many hard links |  1.3 M |   _155 M_ |          194 M |
|
||||
|
||||
I have to put a disclaimer here that both my desktop's root filesystem and my
|
||||
backups play into the strengths of ncdu 2.0: relatively many files per
|
||||
directory and, with ~10 bytes on average, fairly short file names. Nonetheless,
|
||||
you should still see significant improvements if your directory tree follows a
|
||||
different distribution. The big exception here is when you have a lot of
|
||||
hard links. The "many hard links" directory I tested above represents a
|
||||
hard link-based incremental backup of ~43k files, "duplicated" 30 times.
|
||||
|
||||
The reason for these differences in memory use is clear when you look at how
|
||||
many bytes are needed to represent each node in the tree:
|
||||
|
||||
| Node type    | ncdu 1.16 (bytes)         | ncdu 2.0-beta1 (bytes)                 |
|--------------|---------------------------|----------------------------------------|
| Regular file | 78                        | 25                                     |
| Directory    | 78                        | 56                                     |
| Hardlink     | 78 + 8 per unique dev+ino | 36 + 20 per ino\*directory combination |
|
||||
|
||||
(These numbers assume 64-bit pointers and exclude storage for file names and
|
||||
overhead from memory allocation and hash tables. Extended mode (-e) uses an
|
||||
extra 18 bytes per node in both versions, and both versions use the same memory
|
||||
allocation strategy for the file names.)
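As a rough worked example of what these per-node sizes add up to, the sketch below multiplies them by node counts. The function name `node_bytes` and any file and directory counts you pass in are illustrative assumptions; only the byte sizes come from the table above. Hard links, file names, allocator overhead and hash tables are left out, so the result is a lower bound and not a prediction of the totals in the first table.

```python
# Per-node sizes in bytes, taken from the table above (64-bit pointers).
NODE_BYTES = {
    "ncdu 1.16": {"file": 78, "dir": 78},
    "ncdu 2.0-beta1": {"file": 25, "dir": 56},
}
EXTENDED_EXTRA = 18  # extra bytes per node in extended mode (-e), both versions

def node_bytes(version, files, dirs, extended=False):
    """Bytes needed for the tree nodes alone; names and overhead are excluded."""
    extra = EXTENDED_EXTRA if extended else 0
    sizes = NODE_BYTES[version]
    return files * (sizes["file"] + extra) + dirs * (sizes["dir"] + extra)
```

For a hypothetical tree of 1,000 regular files in 100 directories this gives 85,800 bytes for ncdu 1.16 and 30,600 bytes for ncdu 2.0-beta1. The more files there are per directory, the closer the ratio gets to 78:25, which is why trees with many files per directory benefit the most.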
|
||||
|
||||
While there is room for improvements in the hard link situation, the
|
||||
performance issue in ncdu 1.x that I mentioned earlier isn't really fixable
|
||||
without a memory increase. I've always been cautious with accepting an option
|
||||
to disable hard link detection altogether as the results may not be very
|
||||
useful, but maybe I'll reconsider that for a future release. A directory that
|
||||
can't be analyzed at all because you've run out of memory isn't very useful,
|
||||
either.
|
||||
|
||||
Another difference that is worth mentioning: when refreshing a directory from
|
||||
within the browser, ncdu 1.x will allocate a fresh structure for the new tree
|
||||
and then, after the scan is complete, free the old structure. This may cause
|
||||
ncdu 1.x to temporarily use twice as much memory if you refresh the top-most
|
||||
directory. Ncdu 2.0 instead does an in-place update of the existing in-memory
|
||||
tree and thereby avoids this duplication. On the other hand, ncdu 2.0 is
|
||||
(currently) unable to re-use tree nodes that have been renamed or deleted, so
|
||||
frequently refreshing a directory that has many renames or deletions will
|
||||
increase memory use over time. I don't think this is a very common scenario,
|
||||
but should it become a problem, it *can* be fixed.
|
||||
|
||||
|
||||
# Shared links
|
||||
|
||||
As I mentioned in the introduction, counting hard links can be very confusing
|
||||
because they cause data to be shared between directories. So rather than try
|
||||
and display a directory's cumulative size as a single number, these cases are
|
||||
better represented by a separate column. Here's what that looks like in ncdu
|
||||
2.0:
|
||||
|
||||
Amount of data shared between directories in `/usr`:
|
||||
|
||||

|
||||
|
||||
And the amount of unique data in each incremental backup[^2]:
|
||||
|
||||

|
||||
|
||||
To my knowledge, no other disk usage analyzer has this feature (but please do
|
||||
correct me if I'm wrong!)
|
||||
|
||||
You can, for the time being, switch between the two views by pressing 'u'. But
|
||||
if I keep assigning keys to each new feature I may be running out of available
|
||||
keys rather soon, so maybe I'll reclaim that key before the stable 2.0 release
|
||||
and implement a quick-configuration menu instead.
|
||||
|
||||
This feature does come with a large disclaimer: the displayed shared/unique
|
||||
sizes will be incorrect if the link count for a particular file changes during
|
||||
a scan, or if a directory refresh or deletion causes the cached link counts to
|
||||
change. The only way to get correct sizes when this happens is to quit ncdu and
|
||||
start a new scan; refreshing from the browser isn't going to fix it. There is
|
||||
currently no indicator or warning when this happens; that'll need to be fixed
|
||||
before I do a stable release.
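To illustrate why the link count matters so much here, this is a minimal sketch of one way to derive shared and unique sizes. It is an assumption about the general idea, not a description of ncdu's data model, and the function name and the `(path, inode, nlink, size)` tuples are made up for the example. An inode counts as unique to a directory when all of its links were found inside that directory, and as shared otherwise.

```python
from collections import Counter

def shared_and_unique(links, directory):
    """Return (shared, unique) bytes of hard-linked data below directory.

    links: iterable of (path, inode, nlink, size) tuples, one per hard link
    found during the scan; nlink is the link count reported by the filesystem.
    """
    prefix = directory.rstrip("/") + "/"
    inside = Counter()   # inode -> number of links found below directory
    info = {}            # inode -> (nlink, size)
    for path, inode, nlink, size in links:
        if path.startswith(prefix):
            inside[inode] += 1
            info[inode] = (nlink, size)
    shared = sum(size for ino, (nlink, size) in info.items() if inside[ino] < nlink)
    unique = sum(size for ino, (nlink, size) in info.items() if inside[ino] >= nlink)
    return shared, unique
```

With a scheme like this, a stale `nlink` value directly moves data between the two buckets, which matches the disclaimer above: if the link count changes during or after the scan, the comparison no longer reflects what is on disk.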
|
||||
|
||||
|
||||
# Other changes
|
||||
|
||||
There's a bunch of other changes in ncdu 2 that came naturally as part of the
|
||||
rewrite. Some changes are good, others perhaps less so.
|
||||
|
||||
The good:
|
||||
|
||||
- Improved handling of Unicode filenames. It still doesn't handle Unicode
|
||||
combining marks, but at least it can now recognize full-width characters and
|
||||
it won't cut off filenames in the middle of a UTF-8 sequence.
|
||||
- Improved performance when using `--exclude-kernfs` thanks to caching the
|
||||
result of `statfs()` calls. I tried to measure it and only noticed a ~2%
|
||||
improvement at best, but it's something.
|
||||
- In the 'Links' tab in the info window for hard links, it is now possible to
|
||||
jump directly to the selected path.
|
||||
- The file browser now does a better job at remembering the position of the
|
||||
selected item on your screen when switching directories.
|
||||
|
||||
The ambiguous:
|
||||
|
||||
- Ncdu 2.0 doesn't work well with non-UTF-8 locales anymore, but I don't expect
|
||||
this to be a problem nowadays. It can still deal with non-UTF-8 filenames
|
||||
just fine, but these will be escaped before output rather than directly
|
||||
thrown at your terminal as ncdu 1.x does.
|
||||
- The item information window organization is a little bit different. Just a
|
||||
tiny little bit, I promise.
|
||||
- Ncdu 2 now uses the `openat()` family of system calls to scan directories.
|
||||
This is generally an improvement over the `chdir()` and `opendir()` approach
|
||||
of ncdu 1.x, but does require a few more file descriptors (big deal) and is
|
||||
less portable to ancient systems (would Zig even work on those?).
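For readers unfamiliar with the `openat()` style of scanning, here is a minimal sketch in Python, whose `dir_fd` parameters wrap the same family of system calls. It is not how ncdu is implemented; the function name `scan_size` is hypothetical and it only sums the apparent size of regular files. Every lookup is made relative to an open directory descriptor, so the process never changes its working directory.

```python
import os
import stat

def scan_size(dir_fd):
    """Sum the size of regular files below an already opened directory."""
    total = 0
    with os.scandir(dir_fd) as entries:
        for entry in entries:
            # fstatat()-style lookup relative to dir_fd; symlinks are not followed.
            st = os.stat(entry.name, dir_fd=dir_fd, follow_symlinks=False)
            if stat.S_ISDIR(st.st_mode):
                # openat()-style open of the subdirectory: this is where the
                # extra file descriptors go, one per level of nesting.
                sub_fd = os.open(
                    entry.name,
                    os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW,
                    dir_fd=dir_fd,
                )
                try:
                    total += scan_size(sub_fd)
                finally:
                    os.close(sub_fd)
            elif stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total
```

To use it, open the starting directory with `os.open(path, os.O_RDONLY | os.O_DIRECTORY)`, pass the descriptor to `scan_size()` and close it afterwards. Passing a descriptor to `os.scandir()` requires Python 3.7 or later on a Unix-like system.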
|
||||
|
||||
The less good:
|
||||
|
||||
- Opening the 'Links' tab in the info window for hard links now requires a scan
|
||||
through the in-memory tree, so it's noticeably slower. To my surprise,
|
||||
though, a full scan through a tree with 30+ million files takes less than a
|
||||
second on my system, so in practice this probably isn't going to be a problem
|
||||
(and who uses that 'Links' tab, anyway?).
|
||||
- Refreshing a directory may leak memory (as discussed earlier).
|
||||
- The browser UI is not visible anymore when refreshing or deleting a
|
||||
directory. The problem is that the browser keeps cached information about
|
||||
the opened directory, and this cache may be invalidated while the
|
||||
refresh/deletion is running. This is also kind-of a problem in ncdu 1.x, but
|
||||
it's less pronounced. There have been requests for allowing interactive
|
||||
browsing while ncdu is still scanning, so this won't be a problem if I ever get
|
||||
around to implementing that, but it's not much of a priority on my end.
|
||||
Updating the browser's cache on each UI draw is going to be too expensive, so
|
||||
I'm not yet sure how to handle it.
|
||||
- Lots of new bugs, no doubt.
|
||||
|
||||
|
||||
# Next steps
|
||||
|
||||
[Grab yourself an ncdu-2.0-beta1](/ncdu) and test! The source code is available
|
||||
in the ['zig' branch in the
|
||||
repository](https://code.blicky.net/yorhel/ncdu/src/branch/zig).
|
||||
|
||||
My first priority is to get 2.0 ready for a "stable" release, which means it
|
||||
needs to get some serious testing in the wild to evaluate how well it works and
|
||||
to flush out the inevitable bugs. It's still a bit unclear to me if it even
|
||||
makes sense to release a stable version when the foundation it's built on is
|
||||
inherently unstable, but let's just see how things go.
|
||||
|
||||
On the slightly longer term, the rewrite to Zig opened up the possibility for a
|
||||
few more features that I've been wanting to see for a while, but that seemed
|
||||
tricky to implement in 1.x.
|
||||
|
||||
- Multithreaded scanning. This will be useless for old-fashioned rotating
|
||||
hard drives, but for SSDs and especially NVMe, scanning performance can be
|
||||
greatly improved by distributing the work across multiple threads. While
|
||||
rewriting the code I came up with a promising idea on how to implement this,
|
||||
so I'd love to experiment with that in future versions (io\_uring is also an
|
||||
interesting target, but potentially even more complex).
|
||||
- Faster `--exclude-pattern` matching. Honestly, this feature is currently so
|
||||
slow in both versions that I'm surprised nobody has ever complained about it
|
||||
(not to me, in any case). It's possible to slow ncdu's scanning performance
|
||||
down to a crawl with just a few patterns; a more clever matching
|
||||
implementation could provide major improvements.
|
||||
- Exporting an in-memory tree to a file. Ncdu already has export/import
|
||||
options, but exporting requires a separate command invocation - it's not
|
||||
currently possible to write an export while you're in the directory browser.
|
||||
The new data model *could* support this feature, but I'm still unsure how to
|
||||
make it available in the UI.
|
||||
- Transparent export compression. The export function dumps uncompressed JSON
|
||||
data and is designed to be piped through `gzip` or similar commands. While
|
||||
this is documented in the manual page, I still see many people writing the
|
||||
export to disk without any sort of compression. That's a pretty big waste of
|
||||
space, so it would be nice if ncdu could transparently run the exported data
|
||||
through external (de)compression tools to make this easier and more
|
||||
discoverable.
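As an illustration of what "more clever" pattern matching could look like, the sketch below compiles all exclude patterns into a single regular expression up front, so that each file name is tested once instead of once per pattern. This is a generic technique and purely an assumption about one possible direction, not something planned for or present in ncdu. The helper names are hypothetical, Python's `fnmatch.translate` stands in for whatever glob semantics the real option uses, and no speedup has been measured.

```python
import fnmatch
import re

def compile_excludes(patterns):
    """Combine shell-style patterns into one compiled regex (None if empty)."""
    if not patterns:
        return None
    # fnmatch.translate() turns each glob into an anchored regex fragment;
    # joining the fragments with "|" matches a name if any pattern matches.
    combined = "|".join(fnmatch.translate(p) for p in patterns)
    return re.compile(combined)

def is_excluded(matcher, name):
    """True if name matches any of the patterns behind matcher."""
    return matcher is not None and matcher.match(name) is not None
```

The matcher is built once before the scan, for example with `compile_excludes(["*.o", ".git"])`, and then reused for every directory entry.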
|
||||
|
||||
These features are in addition to a long list of other possible improvements
|
||||
that I've been meaning to work on for the past decade, so don't expect too
|
||||
much. :)
|
||||
|
||||
|
||||
<!--
|
||||
TODO: Explain data model?
|
||||
- less pointers
|
||||
- struct splitting
|
||||
- arena allocation (and how this affects refreshing)
|
||||
|
||||
TODO: Rant about C openat annoyance?
|
||||
-->
|
||||
|
||||
|
||||
[^1]: That's not to say Zig is immune to the problem of projects using hundreds
|
||||
of tiny little dependencies, but that development style isn't strongly
|
||||
encouraged in its current state: the standard library already covers a lot of
|
||||
ground, package management solutions are still being worked on and it's easy
|
||||
enough to just use existing C libraries instead.
|
||||
[^2]: I would absolutely *love* for directory-level reports like these to be
|
||||
available for other forms of data sharing, such as reflinks or btrfs/ZFS
|
||||
snapshots. But alas, I doubt I'll ever be able to implement that in ncdu.
|
||||
Even if I could somehow grab and untangle the underlying data, keeping track
|
||||
of every block in a large filesystem is no doubt going to be very costly in
|
||||
both CPU and memory. I did write a [little tool](/dump/btrfssize) some time
|
||||
ago to generate such reports for quota-enabled btrfs subvolumes, but I ended
|
||||
up disabling the quota feature later on because even that is pretty costly.
|
||||
There's also [btdu](https://github.com/CyberShadow/btdu), which takes a very
|
||||
interesting approach to analyze btrfs filesystems.
|
||||
|
|
@ -20,6 +20,11 @@ the incidental article on this site. Enjoy your stay!
|
|||
<!-- These announcements are parsed by mkfeed.pl, see that file for formatting -->
|
||||
## Announcements <a href="/feed.atom"><img src="/img/feed_icon.png" alt="Atom feed"></a>
|
||||
|
||||
`2021-07-22` - ncdu 2.0-beta1 released <!-- tags: ncdu, link: /doc/ncdu2 -->
|
||||
: This marks the initial beta version of a complete rewrite of ncdu, written
|
||||
in Zig. This version significantly reduces memory usage and improves hard
|
||||
link counting. [Full release announcement](/doc/ncdu2) - [Ncdu homepage](/ncdu).
|
||||
|
||||
`2021-07-02` - ncdu 1.16 released <!-- tags: ncdu, link: /ncdu -->
|
||||
: A minor feature & bugfix release. This adds dynamic sizing of the file size
|
||||
bar, an `$NCDU_LEVEL` environment variable when spawning a subshell, more
|
||||
|
|
|
|||
31
dat/ncdu.md
|
|
@ -8,30 +8,39 @@ POSIX-like environment with ncurses installed.
|
|||
|
||||
## Download
|
||||
|
||||
Latest version
|
||||
C version (stable)
|
||||
: 1.16 ([ncdu-1.16.tar.gz](/download/ncdu-1.16.tar.gz) - [changes](/ncdu/changes))
|
||||
|
||||
I also have convenient static binaries for Linux
|
||||
[i486](/download/ncdu-linux-i486-1.16.tar.gz),
|
||||
[x86_64](/download/ncdu-linux-x86_64-1.16.tar.gz),
|
||||
[ARM](/download/ncdu-linux-arm-1.16.tar.gz) and
|
||||
[AArch64](/download/ncdu-linux-aarch64-1.16.tar.gz). Download, extract
|
||||
and run; no compilation or installation necessary (uses
|
||||
[AArch64](/download/ncdu-linux-aarch64-1.16.tar.gz).
|
||||
Download, extract and run; no compilation or installation necessary (uses
|
||||
[musl](http://www.musl-libc.org/)).
|
||||
|
||||
Project status
|
||||
: *Maintenance mode*: I consider ncdu to be mostly complete. I'm still here
|
||||
to keep it alive and to fix issues as they come along, but I don't actively
|
||||
work on adding new features. Bug reports are still very welcome. Feature
|
||||
requests are welcome too, but don't expect much from that. Patches and pull
|
||||
requests for new features will likely end up getting ignored.
|
||||
Zig version (beta)
|
||||
: 2.0-beta2 ([ncdu-2.0-beta2.tar.gz](/download/ncdu-2.0-beta2.tar.gz) - [changes](/ncdu/changes) - requires Zig 0.8)
|
||||
|
||||
See the [release announcement](/doc/ncdu2) for information about the
|
||||
differences with the C version.
|
||||
|
||||
Static binaries for Linux:
|
||||
[i486](/download/ncdu-2.0-beta2-linux-i386.tar.gz),
|
||||
[x86_64](/download/ncdu-2.0-beta2-linux-x86_64.tar.gz),
|
||||
[ARM](/download/ncdu-2.0-beta2-linux-arm.tar.gz) and
|
||||
[AArch64](/download/ncdu-2.0-beta2-linux-aarch64.tar.gz).
|
||||
|
||||
Development version
|
||||
: The most recent code is available on a git repository and can be cloned
|
||||
with `git clone git://g.blicky.net/ncdu.git/`. The repository is also
|
||||
available for [online browsing](https://g.blicky.net/ncdu.git/).
|
||||
available for [online browsing](https://code.blicky.net/yorhel/ncdu/) (and
|
||||
[through cgit](https://g.blicky.net/ncdu.git/) if you prefer that). The
|
||||
master branch represents the C version, the Zig version can be found in the
|
||||
'zig' branch.
|
||||
|
||||
Ncdu is entirely written in C and available under a liberal MIT license.
|
||||
License
|
||||
: MIT.
|
||||
|
||||
## Packages and ports
|
||||
|
||||
|
|
|
|||