- this flake is hardened specifically for my hardware sets. currently that is a second-generation ryzen (zen+) with no TPM and a Samsung EVO SSD with flawed hardware encryption. you may need to edit these files to add more things for what your hardware supports. always feel free to upstream your changes!
- the `gaming` specialization is specifically configured to run overwatch. i'm not kidding. with this nixos install, overwatch becomes my biggest security threat.
- there is a `config.hostprofile` option called `noCompromises`. this option is not recommended for day-to-day use. you can expect a 50-75% slowdown on any modern cpu by using this, as it disables SMT and enables SLUB debug consistency checks, each of which typically costs up to 50% of your performance, and it enables a plethora of other settings that sacrifice performance for security.
there are two pieces of hardware required to use the higher-security
`paranoid.nix` luks configuration:
- a FIDO2 security key. i'm looking at the Token2 Pin+ Dual Release 3.3, it has higher security specs than yubikeys and is far cheaper.
- a removable storage medium you trust with your life. i'm looking at a Samsung PRO Endurance 32GB MicroSD card and a Cotchear MINI
high-endurance microsd cards are typically spec'ed better for longevity than consumer usb drives. this alone makes them a desirable target for this kind of paranoid setup, but they have the added benefit of being easier to take out of an adapter and eat if shit really hits the fan. this is your last resort.
important: this page follows the boot process used in luks/paranoid.nix
the boot process for most linux installs is highly insecure. most configurations today skip secure boot, have unencrypted kernel files, or are otherwise susceptible to evil maid attacks, kernel swaps, or fake modules. we solve this using a variety of tools that are meant to make breaking the boot process as close to impossible as is feasible.
as far as standard mitigations go, lanzaboote is used for secure boot setup,
sbctl keys are properly enrolled if you follow INSTALL.md,
and LUKS2 encrypts your drive, requiring various verification methods in order
to access your files. kernel verification is assured by secure boot in order to
prevent tampering.
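
for reference, a lanzaboote setup usually looks something like the sketch below. it follows lanzaboote's upstream quick-start rather than this flake's actual module, it assumes the lanzaboote nixos module is already imported, and the `pkiBundle` path is an assumption that depends on where your sbctl version keeps its keys.

```nix
{ lib, ... }:
{
  # lanzaboote takes over from the systemd-boot module, so that one is forced off.
  boot.loader.systemd-boot.enable = lib.mkForce false;

  boot.lanzaboote = {
    enable = true;
    # assumption: sbctl stores its keys here; older sbctl versions used /etc/secureboot.
    pkiBundle = "/var/lib/sbctl";
  };
}
```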
when using the luks/paranoid.nix configuration, your LUKS2 header is actually
not stored on your disk. that's that "removable storage medium you trust with
your life" part of the hardware requirements. your header is stored on an ext4
partition on this external storage, meaning you need to have it plugged in to
boot. this means your decryption keys move with you (or more, with the external
storage, but keeping it on a keychain is good practice!).
after first boot setup (see INSTALL.md#First Boot), booting requires the external storage with the LUKS header, a passphrase, and a FIDO2 security key to decrypt your hard drive and boot. this key should also live on your keychain. you might want to invest in a trustworthy usb hub!
we also implement a hard lockout after 10 failed FDE decryption attempts.
we store the count of attempts in a plaintext file within the same partition
which holds your primary LUKS2 header. quite a simple script, but inconvenient
if not genuinely difficult to crack. gradual backoff of 2 ** (COUNT - 3) secs
starts after 3 failed attempts. once you fail the 10th time, the boot script
locks you out, and you have to mount the external storage to another computer
and manually remove the count file to continue attempting to unlock the
computer.
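
to make the backoff concrete, here is the delay schedule written out as a small nix expression. this is only an illustration of the arithmetic, not the boot script itself. it assumes `count` is the number of failures so far and that no delay applies until the count exceeds 3; `pow` and `delay` are hypothetical helper names.

```nix
# evaluate with: nix-instantiate --eval --strict backoff.nix
let
  # nix has no exponent operator, so this is a tiny recursive power helper.
  pow = base: exp: if exp == 0 then 1 else base * pow base (exp - 1);

  # assumption: no delay for the first 3 failures, 2 ** (count - 3) seconds after that.
  delay = count: if count <= 3 then 0 else pow 2 (count - 3);
in
map delay [ 3 4 5 6 7 8 9 ]
# evaluates to [ 0 2 4 8 16 32 64 ]
```

under those assumptions an attacker sits through 126 seconds of enforced waiting before the 10th failure triggers the lockout.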
your LUKS2 header is stored as a binary file on an ext4 partition on the external storage. technically, anyone with physical access could easily clone this and have infinite tries to unlock it. we solve this issue by LUKS2 encrypting the ext4 partition with a temporary passcode, then on first boot we remove the passcode in favor of the FIDO2 key, which cannot be cloned. this may seem like it degrades UX significantly, but the decryption should be seamless if you have your FIDO2 key inserted when you boot. this also protects your lockout counter with that same encryption, as it sits on the same partition right beside your LUKS2 header.
- UEFI -> Bootloader
  the bootloader is verified using Secure Boot, and subsequently loaded.
- Bootloader loads available boot options
  the available boot options (for instance, the standard `hardened` mode vs the `gaming` specialization) are displayed using `lanzaboote`. you are given the option to select either of these, or one of the nixos-given rollbacks (in case an update breaks your system in some way).
- Bootloader -> Kernel
  the bootloader hands off control to the kernel. thus, a beautiful baby linux is born.
- Decrypting your LUKS header
  the kernel uses your FIDO2 key to decrypt the LUKS2 partition holding your main install's LUKS2 header.
- Decrypting your main drive
  your FIDO2 key and a unique, secure passphrase entered at runtime are used to decrypt your main drive, allowing the boot to progress.
optionally, on first boot, a 3/5 shamir share can be generated as a last-ditch recovery effort. only generate these keys and share them if you have ride-or-die people you would trust with every single file on this hard drive. without proper assurances, these could be used against you. store them in different jurisdictions, different countries if possible.
to make your setup easier, we have implemented a small config option within the
flake called hostprofile. this allows you to select a kernel flavor, config,
as well as various other things. here's what kernel flavors and configs are
available:
```nix
hostprofile.kernel = {
  # the kernel flavor to use.
  # - "common" pulls the latest mainline kernel from nixos' repos.
  # - "hardened" locally compiles the latest release of linux-hardened.
  # - "lts" pulls the latest LTS kernel from nixos' repos.
  flavor = "common" | "hardened" | "lts";
  # the kernel config set to use.
  # - "common" is the secure daily-driver configuration. previously split into
  #   "loose" and "hardened" tiers; those have been merged, so common now
  #   includes everything up to and including the former "hardened" baseline.
  config = "common" | "fortress";
};
```

the "common" config is the secure daily-driver baseline. it incorporates
everything that was previously split across the "loose" and "hardened" tiers,
so there is no longer a softer option. if you need ptrace or user namespaces,
apparmor handles per-application delegation instead of lowering the
system-wide floor.
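
as a usage sketch, picking the locally compiled hardened kernel together with the daily-driver config would look roughly like this in a host's configuration. the option names come from the listing above; where exactly you set them in your own host file is an assumption.

```nix
{
  hostprofile.kernel = {
    flavor = "hardened"; # locally compiled linux-hardened
    config = "common";   # the secure daily-driver baseline
  };
}
```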
- `init_on_alloc=1` / `init_on_free=1` - zero out memory when it's allocated or freed. this mitigates use-after-free bugs wrt leaking old data.
- `slab_nomerge` - the kernel normally merges "slab caches" with similar sizes to save memory, and merging creates possible exploitation paths. this disables that feature.
- `slab_debug=ZP` - enables red zones and poisoning for the kernel slab allocator, catching heap corruption at the cost of a small performance hit.
- `randomize_kstack_offset=on` - randomizes the kernel stack offset on every syscall. this makes stack-based kernel exploits much harder to aim.
- `page_alloc.shuffle=1` - randomizes the free page list, making some classes of heap attacks significantly harder.
- `pti=on` - page table isolation, the meltdown fix. keeps kernel page tables out of userspace.
- `spectre_v2=on` - spectre variant 2 mitigation.
- `spec_store_bypass_disable=on` - spectre v4 mitigation.
- `mds=full` - mitigates microarchitectural data sampling, preventing a class of bugs that leak data across cpu boundaries.
- `kvm.nx_huge_pages=force` - forces NX bits on KVM huge pages, mitigating the iTLB multihit vulnerability.
- `amd_iommu=on`, `iommu.strict=1`, `iommu.passthrough=0` - enables amd iommu in strict mode, so devices can only DMA to memory they're explicitly allowed to access. prevents a compromised/malicious device from reading arbitrary memory.
- `efi=disable_early_pci_dma` - disables pci dma prior to iommu initialization, preventing early-boot malicious/compromised devices from reading arbitrary memory.
- `mem_encrypt=on` - enables memory encryption. for my amd machine, this is SME; my bios supports TSME so this is redundant but it's nice to explicitly opt-in.
- `random.trust_cpu=off` - we refuse to exclusively trust the cpu for entropy, opting to include `jitterentropy_rng` as an initrd module to help.
- `random.trust_bootloader=off` - we also refuse to trust the bootloader for entropy.
- `vsyscall=none` - removes the vulnerable legacy vsyscall mechanism.
- `debugfs=off` - explicitly disables debugfs, which exposes internal kernel information.
- `module.sig_enforce` - block all unsigned modules.
- `lockdown=confidentiality` - kernel lockdown mode; blocks `/dev/mem`, raw disk access, hibernation, and other paths that could leak or overwrite kernel memory. this is the highest lockdown level.
- `oops=panic` - if the kernel hits an oops, panic instead of continuing. a kernel that has oops'd is likely in a vulnerable state.
- `apparmor=1` - enables the apparmor LSM.
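
in nixos terms, parameters like these end up in `boot.kernelParams`. the sketch below shows a handful of them. how this flake actually wires them in (for example behind the `hostprofile` option) is an assumption, and the list is deliberately partial.

```nix
{
  # a partial list; the full set is described in the bullets above.
  boot.kernelParams = [
    "init_on_alloc=1"
    "init_on_free=1"
    "slab_nomerge"
    "randomize_kstack_offset=on"
    "pti=on"
    "lockdown=confidentiality"
    "oops=panic"
  ];
}
```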
- `vm.mmap_rnd_bits=32` / `vm.mmap_rnd_compat_bits=16` - maximum ASLR entropy for memory mappings.
- `vm.mmap_min_addr=65536` - prevents mapping the zero page, eliminating null pointer dereference exploits.
- `kernel.kptr_restrict=2` - hides kernel pointers from all users incl root.
- `kernel.dmesg_restrict=1` - only root can read `dmesg`.
- `kernel.printk="3 3 3 3"` - limits what kernel messages get printed to the console.
- `kernel.kexec_load_disabled=1` - disables `kexec`, which would normally allow runtime kernel-swapping.
- `kernel.core_pattern="|/bin/false"` - core dumps get silently discarded.
- `fs.suid_dumpable=0` - setuid programs don't produce core dumps to begin with.
- `net.core.bpf_jit_harden=2` - hardens the BPF JIT compiler, reducing its attack surface.
- `kernel.unprivileged_bpf_disabled=1` - only root can load BPF programs.
- `vm.unprivileged_userfaultfd=0` - restricts the `userfaultfd` syscall to root, mitigating some heap exploit techniques.
- `fs.protected_hardlinks/symlinks=1` - prevents hard/symlink-based TOCTOU attacks.
- `fs.protected_regular/fifos=2` - extends the above protection to regular/fifo files, and prevents privilege escalation via O_CREAT.
- `dev.tty.ldisc_autoload=0` - disables automatic TTY line discipline module loading.
- `kernel.sysrq=4` - only allows the keyboard-control sysrqs (secure attention key and unraw). in the sysrq bitmask, 4 is keyboard control; sync would be 16.
- `kernel.randomize_va_space=2` - full ASLR.
- `kernel.perf_event_paranoid=3` - `perf` events restricted to root only.
- `kernel.yama.ptrace_scope=1` - a process may only ptrace its own children. scope 3 (global disable) was considered but rejected: debuggers and proton-battleye have legitimate reasons to ptrace child processes. instead, apparmor profiles grant the `ptrace` permission explicitly to the specific applications that need it; everything else is blocked at the apparmor layer.
- `kernel.unprivileged_userns_clone=1` with `kernel.apparmor_restrict_unprivileged_userns=1` and `kernel.apparmor_restrict_unprivileged_unconfined=1` - user namespaces are enabled at the kernel level but apparmor gates them per-application. a process must have an apparmor profile that explicitly grants userns before creation succeeds; unconfined processes are also blocked. this lets containers and chromium-sandboxed browsers work without opening userns to everything.
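
sysctls like these map onto the `boot.kernel.sysctl` attribute set in nixos. the following is a partial sketch using values from the list above; whether this flake sets them in one place or spreads them across modules is an assumption.

```nix
{
  boot.kernel.sysctl = {
    "kernel.kptr_restrict" = 2;
    "kernel.dmesg_restrict" = 1;
    "kernel.kexec_load_disabled" = 1;
    "kernel.yama.ptrace_scope" = 1;
    "fs.suid_dumpable" = 0;
    "vm.unprivileged_userfaultfd" = 0;
  };
}
```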
network-specific sysctl params, these often apply to both ipv4 and ipv6
- `tcp_rfc1337=1` - protects against TIME-WAIT assassination attacks.
- `tcp_syncookies=1` - prevents SYN floods.
- `tcp_timestamps=0` - disables TCP timestamps, which can aid fingerprinting.
- `accept/secure/send_redirects=0` - block all ICMP redirects, preventing routing hijack attacks.
- `accept_source_route=0` - disables source routing.
- `rp_filter=2` - verifies that incoming packets could've come from where they claim.
- `log_martians=1` - log packets with impossible source addresses.
- `net.ipv6.use_tempaddr=2` - uses temporary randomized addresses instead of mac-derived addresses.
- `net.ipv6.accept_ra=0` - don't accept router advertisements, which can redirect all traffic.
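
the names above are shorthand. spelled out as full sysctl keys they would look roughly like this. the sketch only covers the `all` interface scope; applying the same values to the `default` scope as well is a common choice, but whether this flake does so is an assumption.

```nix
{
  boot.kernel.sysctl = {
    "net.ipv4.tcp_rfc1337" = 1;
    "net.ipv4.tcp_syncookies" = 1;
    "net.ipv4.tcp_timestamps" = 0;
    "net.ipv4.conf.all.accept_redirects" = 0;
    "net.ipv4.conf.all.secure_redirects" = 0;
    "net.ipv4.conf.all.send_redirects" = 0;
    "net.ipv6.conf.all.accept_redirects" = 0;
    "net.ipv4.conf.all.accept_source_route" = 0;
    "net.ipv6.conf.all.accept_source_route" = 0;
    "net.ipv4.conf.all.rp_filter" = 2;
    "net.ipv4.conf.all.log_martians" = 1;
    "net.ipv6.conf.all.use_tempaddr" = 2;
    "net.ipv6.conf.all.accept_ra" = 0;
  };
}
```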
note: expect a system slowdown of 50-75% while using this kernel configuration. disabling SMT and enabling full slab debug each carry 30-50% performance hits individually.
everything in "common", and
- `slab_debug=FZP` - upgrades common's `ZP` to `FZP`, adding full consistency checks on every slab allocation. do not pass go, immediately lose 50-70% of your system performance.
- `nosmt` - completely disable simultaneous multithreading. do not pass go, immediately lose 30-50% of your cpu performance.
- `mds=full,nosmt` - mitigates MDS with explicit buffer flushes on every kernel -> user transition. the `nosmt` part disables SMT on affected cpus, since buffer flushes alone cannot stop a sibling thread from sampling the same buffers.
- `l1tf=full,nosmt` - mitigates L1 terminal fault (Foreshadow) by flushing the L1 data cache before entering a virtual machine, with SMT disabled so a sibling thread cannot read the shared L1 cache.
- `kernel.yama.ptrace_scope=3` - nobody can ptrace anything, overriding common's apparmor-delegated scope 1. no exceptions.
- `kernel.unprivileged_userns_clone=0` - fully disables user namespace creation, overriding common's apparmor-gated approach. no exceptions.
the following modules have been blacklisted. they've fallen out of mainstream use, have large attack surfaces, and are often rife with CVEs:
```nix
"dccp" "sctp" "rds" "tipc"
"n-hdlc" "ax25" "netrom" "x25"
"rose" "decnet" "econet" "af_802154"
"ipx" "appletalk" "atm" "can"
"rxrpc" "algif_aead"
"esp4" "esp6"
```
we also put a lot of work into hardening userspace. this section & userspace hardening overall is a work in progress, but we've made tremendous progress so far.
fairly simple, really: it's probably best that your passkeys are stored on an uncloneable device.
firstly, a udev rule monitors the device which holds your nixos partition's
header, a block device named NIXHEADER. if that device is physically removed,
your system immediately discards the decryption keys and forces a shutdown.
we also have usbguard enabled in block-first mode. currently, all usb devices other than human-interface devices such as a keyboard and mouse are blocked immediately. this stops various attempts at physical access based attacks.
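
a block-first usbguard policy that only lets HID devices through could be sketched like this. the rule is an assumption, not a copy of this flake's ruleset: `match-all` is used so that a device is only allowed when every one of its interfaces is HID (usb class 03), which keeps out composite devices that pair a keyboard with, say, mass storage.

```nix
{
  services.usbguard = {
    enable = true;
    # anything not matched by a rule below is blocked.
    implicitPolicyTarget = "block";
    rules = ''
      allow with-interface match-all { 03:*:* }
    '';
  };
}
```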
it's not exactly a drop-in replacement, but completely eliminating the need for
setuid binaries is very good practice for high-security environments. we're
working on a small wrapper script that converts sudo calls to run0 calls.
this prevents any data leak that would otherwise come from reading the data contained in a core dump.
this makes the system a little less stable, but it is well worth it. an attacker cannot simply gain a root shell on your system by just making the boot process crash anymore, however this does mean that system recovery becomes a severe headache if worst comes to worst.
we are going to take a fresh install, look at all the systemd services, and add a basic hardening template to each of them. this means that those services are less vulnerable, but you should always check this for yourself and add hardening configurations to any services you install.
we enable nix.settings.sandbox in order to assure that no derivations can
peek into the currently running system. this ensures that no derivation can
do things such as read files outside the sandbox or access the network during
compilation.
all packages downloaded from the nixos cache have their signature verified on-device, this prevents threat actors from simply replacing the package we get.
only members of the wheel group can connect to the nix daemon or perform
builds.
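
taken together, the nix daemon settings described above correspond to something like the following. these are standard `nix.settings` keys, but treating them as this flake's exact configuration is an assumption.

```nix
{
  nix.settings = {
    # build derivations in an isolated sandbox (no network, no host files).
    sandbox = true;
    # refuse substitutes that are not signed by a trusted key.
    require-sigs = true;
    # only members of the wheel group may talk to the nix daemon.
    allowed-users = [ "@wheel" ];
  };
}
```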
this wipes any non-persisted areas of your disk, preventing any bad actor from simply slipping in some malicious binary in a way which persists on reboot.
this seems like it would be common, and generally it is, but it is nice to
point out that /tmp is mounted as a tmpfs.
/proc is mounted with hidepid=2, so users can only see their own processes,
in case the other mitigations we have against this all fail.
logs, machine identity, networkmanager connections, etc. are the only things preserved in their entirety.
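
with the upstream impermanence module, a persistence list like that is usually declared along these lines. this is a sketch that assumes the impermanence nixos module is imported and that `/persist` is the persistent mount; the exact paths are guesses that match the description, not this flake's real list.

```nix
{
  environment.persistence."/persist" = {
    directories = [
      "/var/log"                               # logs
      "/etc/NetworkManager/system-connections" # networkmanager connections
    ];
    files = [
      "/etc/machine-id" # machine identity
    ];
  };
}
```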
note: this is a WIP feature
each user's persistent data (such as browser profiles, ssh keys, etc) lives
inside a LUKS-encrypted container stored on the persistence drive. that
container is automatically unlocked by pam_mount when the user logs in, using
their normal password (and optionally FIDO2 key). inside, only the files and
directories explicitly listed within home-manager's impermanence configuration
are kept. the remainder of the home directory is a tmpfs that vanishes on
reboot. this means a user's secrets are never written to disk in plaintext,
and even an attacker who compromises the running system can't read another
user's persistent data without knowing their password.
we also have file protection policies set up so that no regular user can see any of the container files on `/persist`, reducing the attack surface for exfiltration and store-now-decrypt-later attacks.
this also means that having a separate user for any loosened configuration is absolutely essential. even if that kernel is compromised there is no way to actually open any sensitive user data, as it is still LUKS-backed.
to ration storage, we also create all per-user home containers and nested vaults as sparse files, so they only use the disk space actually written. the user container's virtual size acts as a per-user quota.
we use a simple systemd-resolved configuration to require all DNS lookups go
through a DoT tunnel. by default, this uses dns.quad9.net. plaintext dns is
completely disabled for non-tailscale DNS lookups.
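
a strict DoT setup through systemd-resolved can be sketched as below. the option names are the ones nixos has shipped for `services.resolved`, though newer releases may restructure them, and the second quad9 address is an assumption about what the flake configures.

```nix
{
  # the "#dns.quad9.net" suffix tells resolved which hostname to verify over TLS.
  networking.nameservers = [
    "9.9.9.9#dns.quad9.net"
    "149.112.112.112#dns.quad9.net"
  ];

  services.resolved = {
    enable = true;
    # "true" means strict: lookups fail rather than fall back to plaintext.
    dnsovertls = "true";
    # no fallback resolvers, so nothing can bypass the DoT servers.
    fallbackDns = [ ];
  };
}
```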
we have tailscale enabled with --accept-dns=false, disabling tailscale's
typical system-level dns override. we instead do this ourselves, adding a
systemd network configuration to the `tailscale0` network specifying that
magicdns is only to be used for names ending in `ts.net`. this prevents it
from poisoning the entire system's dns.
we plan to add opensnitch to our network stack, but until then we rely
entirely on nftables. right now, we disable all incoming TCP and UDP ports,
this is only altered by tailscale adding its own UDP port to the allowlist.
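
a deny-all inbound firewall with the tailscale exception can be expressed roughly like this. it is a sketch built from stock nixos options; `services.tailscale.openFirewall` is one way to get tailscale's UDP port onto the allowlist, and whether this flake uses it is an assumption.

```nix
{
  networking.nftables.enable = true;

  networking.firewall = {
    enable = true;
    # nothing is reachable from outside by default.
    allowedTCPPorts = [ ];
    allowedUDPPorts = [ ];
  };

  services.tailscale = {
    enable = true;
    # adds tailscale's UDP port to the firewall allowlist.
    openFirewall = true;
  };
}
```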
we replace systemd-timesyncd with chrony and force it to use NTS, which
encrypts the synchronisation data and authenticates the server, preventing
some large threat actor from feeding your system false time data, which could
break certificate verification in a plethora of places.
two servers are configured, with the option minsources 2 set to ensure a
consensus.
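
as a sketch, an NTS-only chrony setup with a two-source minimum might look like this. the two server names are assumptions chosen because they are public NTS-capable servers; they are not necessarily the ones this flake ships.

```nix
{
  services.timesyncd.enable = false;

  services.chrony = {
    enable = true;
    # authenticate and encrypt time sync via network time security.
    enableNTS = true;
    # assumption: example NTS-capable servers, swap in your own.
    servers = [ "time.cloudflare.com" "nts.netnod.se" ];
    extraConfig = ''
      minsources 2
    '';
  };
}
```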
we disabled CUPS, avahi, and bluetooth in order to reduce the attack
surface experienced from local network discoverability.
failed login, sudo, and greetd attempts are delayed by 4 seconds, making brute-force attacks extremely slow.
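
one way to get such a delay on nixos is the per-service `failDelay` pam option, sketched below. the delay is given in microseconds. whether this flake uses this option or configures `pam_faildelay` by hand is an assumption.

```nix
{
  security.pam.services = {
    # 4000000 microseconds = 4 seconds
    login.failDelay = { enable = true; delay = 4000000; };
    sudo.failDelay = { enable = true; delay = 4000000; };
    greetd.failDelay = { enable = true; delay = 4000000; };
  };
}
```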
each user has a soft limit of 1024 processes and a hard limit of 8192 processes
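
expressed through nixos' pam limits, those process caps would look something like this. applying them to the `*` domain (every user) is an assumption.

```nix
{
  security.pam.loginLimits = [
    { domain = "*"; type = "soft"; item = "nproc"; value = "1024"; }
    { domain = "*"; type = "hard"; item = "nproc"; value = "8192"; }
  ];
}
```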
this prevents even a root-privileged user from adding any module to the kernel after the boot has completed. this blocks a plethora of attacks on otherwise vulnerable machines or machines with compromised credentials.
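
nixos exposes this kind of lock as a single option. whether this flake uses that option or sets `kernel.modules_disabled` itself is an assumption.

```nix
{
  # disables kernel module loading once the system has finished booting.
  security.lockKernelModules = true;
}
```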
hardened_malloc is far more hardened against various attack vectors than
glibc, and so we've swapped it in as the system-wide allocator. this may break
some programs.
known programs that are affected include
- any Electron/CEF program (Steam, Discord, Signal, Chromium, etc.)
  - note that for many of these, our default configuration bubblewraps them, and our settings for bubblewrapping prevent the `LD_PRELOAD` environment variable from propagating from the host. this prevents issues while running these programs.
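
for comparison, stock nixos can swap the system-wide allocator with one option, as sketched here. this flake may instead preload hardened_malloc through `LD_PRELOAD`, as the note on bubblewrapping suggests, so treat this as an assumption about the mechanism.

```nix
{
  # "graphene-hardened" is the nixos name for hardened_malloc.
  environment.memoryAllocator.provider = "graphene-hardened";
}
```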
dbus-broker is widely regarded as a faster, more reliable, and more secure
dbus implementation. for that reason we have selected it over other options.
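
on nixos this choice comes down to one option. the sketch assumes a release that has `services.dbus.implementation`.

```nix
{
  services.dbus.implementation = "broker";
}
```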
we have auditd enabled, and we're logging just about everything we can get
our hands on. from any attempts to run module adding/removing binaries, to any
opened/closed connections, to every privilege change.
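
a rough idea of what such audit rules look like is sketched below. these rules are illustrative assumptions that watch the relevant syscalls on 64-bit systems; they are not this flake's real ruleset, which also covers the module binaries themselves.

```nix
{
  security.auditd.enable = true;

  security.audit = {
    enable = true;
    rules = [
      # kernel module loading and unloading
      "-a always,exit -F arch=b64 -S init_module,finit_module,delete_module -k modules"
      # outbound connections
      "-a always,exit -F arch=b64 -S connect -k network"
      # uid/gid changes
      "-a always,exit -F arch=b64 -S setuid,setgid,setreuid,setregid -k privilege"
    ];
  };
}
```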
this clears out most of the remaining attack surface of sibling threads. SMT
is only disabled when `kernel.config` is set to "fortress".
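
gating SMT on the selected config can be sketched with `lib.mkIf`. the option path `config.hostprofile.kernel.config` follows the `hostprofile` listing earlier in this document; the module structure around it is an assumption.

```nix
{ config, lib, ... }:
{
  # only add nosmt when the fortress config is selected.
  boot.kernelParams =
    lib.mkIf (config.hostprofile.kernel.config == "fortress") [ "nosmt" ];
}
```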
we have deployed an experimental custom derivation involving the apparmor.d
project that aims to force their build process to output apparmor profiles
which are compatible with nix's /nix/store layout. right now it is untested,
but the build succeeds in isolation from the rest of nixos. if this works, we
ship with 1500+ pre-built apparmor profiles for many common applications from
day one.