Next-Gen NiXium #131

Open
Kreyren opened this issue Aug 17, 2024 · 6 comments
@Kreyren
Member

Kreyren commented Aug 17, 2024

This issue is a work in progress.

This issue ignores the content relevant to Kreyren/kreyren#111, as that is still ongoing and currently can't be used for an objective review.


The idea behind this meta-issue is to permanently address the unfixable issues in both GNU Guix and NixOS by implementing a nix-based distribution that addresses the observed issues:

GNU Guix GNU/Linux:

  • Organization-wise the project is badly mismanaged, resulting in a very buggy and unreliable operating system that in production requires more babysitting than is feasible.

  • The Lisp implementation used is GNU Guile, which as observed is very over-complicated and limited in functionality in comparison to NixOS.

  • Their community management is insufficient: they expect everyone to use IRC and submit patches via e-mail, while being very toxic towards any mention of proprietary software, including software that can't be managed in a reasonable time, e.g. Intel microcode, Wi-Fi drivers, etc.

  • Their source code management is outdated: they lack any kind of CI/CD implementation and use GNU Savannah as the git forge, which is very minimal and not friendly to new developers.

NixOS

The Nix language is a terrible implementation for its designed workflow: it constantly runs into infinite recursion issues once the code gets more complex, and these take an unreasonable amount of human resources to manage, e.g. https://discourse.nixos.org/t/how-to-correctly-implement-release-flexible-nixos-modules/49869 encountered at #124 (comment). As a result, the current source code is still affected by it: https://github.com/NiXium-org/NiXium/blob/35dc1a258134234f1601c6124bd4881ef1ba7567/src/nixos/machines/tupac/config/disks.nix#L29-L30.
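To illustrate the failure mode outside of Nix, here is a toy Python model of a lazily evaluated configuration. It is only a sketch and does not reproduce Nix semantics or the linked NiXium code: the attribute names `release` and `disk`, their values, and the helper `force` are all hypothetical. It shows how a value that is (indirectly) needed to compute itself can never be resolved, which is roughly the situation a module ends up in when an option depends on the configuration it helps define.

```python
class InfiniteRecursion(Exception):
    """Raised when an attribute is needed to compute itself."""


def force(config, key, _in_progress=None):
    """Evaluate config[key] lazily, tracking attributes under evaluation."""
    in_progress = set() if _in_progress is None else _in_progress
    if key in in_progress:
        raise InfiniteRecursion(f"infinite recursion while evaluating {key!r}")
    in_progress.add(key)
    try:
        # Each attribute is a function that receives a getter for other attributes.
        return config[key](lambda other: force(config, other, in_progress))
    finally:
        in_progress.discard(key)


# Well-founded: "disk" depends on "release", which depends on nothing.
ok = {
    "release": lambda get: "24.05",
    "disk": lambda get: "/dev/sda" if get("release") == "24.05" else "/dev/vda",
}

# Cyclic: "release" now also depends on "disk", so neither can be computed.
cyclic = dict(
    ok,
    release=lambda get: "24.05" if get("disk") == "/dev/sda" else "unstable",
)

print(force(ok, "disk"))
try:
    force(cyclic, "disk")
except InfiniteRecursion as err:
    print(err)
```

In this toy the cycle is two attributes long and easy to spot; in a real module system the dependency chain can pass through many files, which is what makes such errors expensive to track down.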

Additionally, the Nix language is nowhere near the flexibility and functionality of Scheme-based languages, resulting in code that is more complex than it has to be and often very difficult to make work for the desired workload. For example, getting a different package version often requires either writing a custom package definition, because the derivation providing the package is inherently incompatible with other versions of the package, or pinning a nixpkgs commit that provides the version.

Their community management forces cultures that hate each other into collaborative work without acting as the required impartial judge to manage it. This is prone to constantly cause conflicts, which are handled insufficiently, as evident from the Anduril crisis, with the current NixOS management refusing to learn from these mistakes and causing an exodus of maintainers and developers.

That said, there are things done right which this issue aims to learn from and take inspiration from, namely:

Deep integration of GNU Guile with GNU Guix

The use of a proper framework language enables the distribution to function as a de facto Borg that can be infinitely expanded with built-in functionality, instead of trying to make two components from two different worlds work with each other.

Namely the integration of the init/service manager, which is GNU Shepherd on Guix and systemd on NixOS:

As observed, GNU Shepherd can integrate without having to duplicate functions, and in a way that can directly use features from either Guix or Shepherd seamlessly, resulting in a significantly smaller footprint and a more functional integration that doesn't limit innovation like e.g. NixOS/nixpkgs#324911 (comment).

In comparison, the systemd used in NixOS is bound through an API and is very overcomplicated, with features that NixOS will likely never use and that only pose an increased risk of security vulnerabilities and make it harder to adapt to the workload.

Guix's implementation also extends to 3rd party services such as gitile, a gitea-inspired forge, de facto giving the package manager a built-in git forge, or even an authoritative DNS server.

NixOS has better management

NixOS is funded by the community through Open Collective, which is done in a transparent way, and thus it does not have major issues with funding and resources like Guix does.

Nix also has CI/CD to catch common issues and is overall in another league in terms of reliability, making it an acceptable solution for production and mission-critical environments and thus the objectively better solution despite making worse design decisions than Guix.


Considered Lisp implementations

Common Lisp ("CL")

This was already attempted by community member infinisil: https://github.com/infinisil/nixlisp, but CL is not a good fit for a functional language, as its expressions are aimed more at object-oriented use and have to be integrated in a very complex way in comparison to Scheme.

GNU Guile

Overall not a bad option, but it is lacking in integration with IDEs and Emacs in terms of documentation, making the language painfully difficult to learn and use correctly. It also lacks functionality needed for this task, e.g. it's not possible to attach a docstring with a default value to a variable, and upstream devs do not seem to be interested in making the documentation better either.

So for our use case it would have to be forked, and it would be a lot of work to re-integrate the documentation to at least elisp levels.

Steel

TBD mattwparas/steel#259


Tvix is a Nix rewrite in Rust - https://github.com/tvlfyi/tvix

The state and usability of the project are unknown.

The current major problem is that they use the GPLv3 license, which might limit our use.


Infrastructure Management

Required roles

This is a writeup of the required roles; some systems will be able to fill multiple roles at the same time.

Compute Server

NiXium is currently oriented around thin clients that prioritize battery life over performance, with the minimal set of features required to be usable as thin clients that rely on a remotely accessible server for compute: compilation and other related tasks (Blender rendering, etc.).

This server is expected to be very power hungry so we need something that can suspend itself when it's not in use.
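One way such self-suspension could work is a small idle check that runs periodically on the compute server. The sketch below is only an assumption about how this might be done, not the NiXium implementation: the thresholds, the function names, and the choice of `systemctl suspend` (which presumes a systemd-based host) are all hypothetical, and how `active_sessions` and `idle_seconds` are measured is left to the caller.

```python
import os
import subprocess


def should_suspend(load_1m, active_sessions, idle_seconds,
                   max_load=0.2, min_idle_seconds=900):
    """Return True when the server looks idle enough to suspend."""
    return (
        active_sessions == 0
        and load_1m < max_load
        and idle_seconds >= min_idle_seconds
    )


def suspend_if_idle(active_sessions, idle_seconds):
    """Check the current 1-minute load average and suspend when idle."""
    load_1m, _, _ = os.getloadavg()
    if should_suspend(load_1m, active_sessions, idle_seconds):
        # Assumes a systemd-based host where `systemctl suspend` is available.
        subprocess.run(["systemctl", "suspend"], check=True)
        return True
    return False
```

Keeping the decision in the pure function `should_suspend` makes the policy easy to test without actually suspending a machine.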

Current Research Device: Morph.

Control Server

A power-efficient always-on server that is used to send commands to the other devices on its local network, e.g. wake commands to the compute server.
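If the wake commands are implemented with Wake-on-LAN (an assumption, the issue does not say which mechanism is used), the control server only needs to broadcast a "magic packet" on the local network. A minimal sketch follows; the function names and the MAC address in the usage note are placeholders, and the target's firmware and network card must have Wake-on-LAN enabled.

```python
import socket


def build_magic_packet(mac):
    """Build a Wake-on-LAN magic packet for a MAC such as 'aa:bb:cc:dd:ee:ff'."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    # Six 0xFF bytes followed by the target MAC repeated sixteen times.
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet over UDP on the local network."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling `wake("aa:bb:cc:dd:ee:ff")` from the control server would then request the machine with that (placeholder) address to power on.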

Additionally, the control server can be used to handle power-efficient tasks like Home Assistant.

Current Research Device: Mracek

Storage Server

A system with expandable storage connected to the local network that provides this storage to all relevant systems, and remotely if need be.

Kreyren's Personal Hardened Thin Client

The super administrator's device, which is hardened and used to control the infrastructure.

Current Research Device: Tsvetan (Not yet submitted in central branch)

Tsvetan, aka the OLIMEX Teres-I, was selected as it is an open-source hardware device that runs open-source software and firmware, making it very flexible for various implementations. The issue is that it has very slow storage and a low amount of RAM, making it sub-optimal for this use.

The current plan is to implement the System on Module standard, which was finished on 8th November 2024, to make it economical and efficient to fabricate in a hackerspace environment, with the ability to change the BGA chips.

OLIMEX open-sourced its iMX8MP SOM (https://www.olimex.com/Products/SOM/NXP-iMX8/iMX8MP-SOM-4GB-IND/open-source-hardware), which seems to be sufficient for our use, but consult with manufacturers for options.

The ideal solution would be a Getac-like rugged system (https://youtu.be/7-ikjUWJ4Vs) with hot-swappable batteries of 2x99Wh, two MXM slots for dedicated GPUs, and an ARM CPU (RISC-V is considered too much of a liability rn).

Ideally dual RTX4090m that are underclocked unless the device is connected to the external water cooler, or maybe dual Intel A380M configured as multi-GPU.

To be moved into separate tracking.

AI Server

TBD

@Kreyren Kreyren self-assigned this Aug 17, 2024
@Kreyren Kreyren changed the title from "Consider migrating on Tvix" to "Next-Gen NiXium" Aug 17, 2024
@Kreyren
Member Author

Kreyren commented Aug 18, 2024

Hey @flokli, can you please elaborate on the state of tvix? Thanks! <3

@flokli

flokli commented Aug 19, 2024

Hey @Kreyren, you're welcome to follow the state of Tvix on the various blog posts, changes in the repository, and other public information - I assume you understand I cannot provide individual status updates in various issues elsewhere. Thanks!

@Kreyren
Member Author

Kreyren commented Aug 20, 2024

@flokli will read through them, thanks for info!

@Kreyren
Member Author

Kreyren commented Nov 10, 2024

@TanvirOnGH Anything relevant on the state of tvix and its functional implementation is welcome. If it's just a drop-in replacement for the current Nix daemon, with at worst a few unimportant features missing, then it's actionable.

Note that the backend has higher priority rn, as I am working on a solution to replace the Amazon-hosted cache used by Nix with decentralized distribution via BitTorrent or similar, so that when a user requests the cache it would be fetched via what is basically a mesh network that distributes the files bit by bit.
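The core of piece-wise distribution is that every piece fetched from an untrusted peer can be checked against a list of hashes obtained from a trusted source. The following is a rough sketch of that idea only, not of the BitTorrent protocol or of any existing Nix cache format: the piece size, the use of SHA-256, and all function names are assumptions made for illustration.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # hypothetical piece length in bytes


def piece_hashes(data, piece_size=PIECE_SIZE):
    """Split data into fixed-size pieces and return the SHA-256 of each."""
    return [
        hashlib.sha256(data[offset:offset + piece_size]).hexdigest()
        for offset in range(0, len(data), piece_size)
    ]


def verify_piece(index, piece, expected_hashes):
    """Check a piece received from an untrusted peer against the manifest."""
    return hashlib.sha256(piece).hexdigest() == expected_hashes[index]


def assemble(pieces, expected_hashes):
    """Join pieces in order, refusing any piece that fails verification."""
    if len(pieces) != len(expected_hashes):
        raise ValueError("missing pieces")
    for index, piece in enumerate(pieces):
        if not verify_piece(index, piece, expected_hashes):
            raise ValueError(f"piece {index} failed verification")
    return b"".join(pieces)
```

With this split, only the small hash manifest has to come from a trusted (for example signed) source, while the pieces themselves can be fetched from any peer.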

relevant: NixOS/nix#859

Ideally BitTorrent over I2P or a similar solution for security and privacy?

@Kreyren Kreyren pinned this issue Nov 29, 2024
@nix-config-storage

Tsvetan aka the OLIMEX Teres-I was selected as it's Open-Source Hardware Device that runs Open-Source Software and Firmware making it very flexible for various implementations. The issue is that it has a very slow storage and low amount of RAM making it sub-optimal for this use.

I suggest checking out this company. They're open sourced as well, and by default devices are set up with Dasharo/coreboot as the uefi.

Though you can also choose heads, whether or not you'd like the Intel management engine disabled, and a bunch of other things too. Plus you can buy replacement parts down the line, if needed.

https://novacustom.com/

@Kreyren
Member Author

Kreyren commented Dec 28, 2024

I suggest checking out this company. They're open sourced as well, and by default devices are set up with Dasharo/coreboot as the uefi. -- @nix-config-storage (#131 (comment))

I am aware of them, and of system76 and others who sell rebranded Tongfang devices, and I have one for testing right now, but they are open-source only in software: I have to beg them to provide me with schematics, and they won't provide gerbers at all. Even if they did, the device is made by the MONSTER brand (http://support-monsternotebook-com.translate.goog/tr/product/tulpar-t5-v232-156-oyun-bilgisayari/drivers-and-downloads), which uses a slightly different configuration, so I basically have to remove a lot of the components from the device and then probe the EC pads to see where they go in order to port it, which is so far a real nightmare -- Kreyren/firmware-open#1.

I asked system76 multiple times for schematics, as their device (gaze17) is very similar in design (just has RTX4060 instead of RTX3000 series) to one of their laptops, and they refused to provide them unless I showed proof of purchase of the device. They then say that if I own the device I can share the schematics, but they are kinda weird about it, so I am worried that if I made KiCAD files for it I would be subject to a lawsuit. I asked novacustom before as well with a similar request, but they said they do not sell that device and then never replied to me iirc. @wessel-novacustom if you actually provide these files (schematics and/or gerbers, including schematics without gerbers, and won't sue me for implementing them in KiCAD as open-source hardware) then feel free to correct me, as I would like to support novacustom devices.

It's currently being discussed internally whether Arcanyx (our own WIP Nix-based distribution that NiXium is migrating to from NixOS) will support non-OSHW devices and how that should be done. GNU FSDG (which it's planned to be compatible with, while being stricter in implementation, as they are not very OSHW-friendly) is not concerned about these, but I probably want Arcanyx to be stricter.

Technically, if Tongfang or Clevo build these devices for us and let us make KiCAD files to be released, then we could support Tongfang devices by selling them without markup (since Arcanyx is non-profit, pending registration..) and releasing designs for them. Afaik Intel threatens to sue anyone who releases the hardware reference files for their chips without signing their soul and firstborn over to them, which afaik is why https://github.com/system76/virgo is stalled. @jackpot51 sorry for the ping, but feel free to correct me, as I would be curious about a fork if I had design references to know how to design gerbers for it.

But another problem is that x86 systems are still kinda problematic, as they always depend on some black-box binary, as explained by Dasharo's openness score in https://docs.dasharo.com/variants/msi_z690/openness_score/#v090-heads: binaries like fspm.bin and fsps.bin seem to be black boxes whose behavior is impossible to review. The cybersec community seems to assume that they are the gateway that makes stuff like Pegasus so effective, though I'm unsure if that's true, and if so, how much of that could be mitigated with me-cleaner to be kinda acceptable? Given that it's currently the most popular architecture, if the code can be reviewed and confirmed not to be malicious, or if it is at least manageable to disable all the backdoors, then I guess it would be okay to support for the time being?

CC @corna @mkopec for context, thanks for anything relevant 🙏

Though I would prefer to give anything that's not open-source in both software and hardware the proprietary treatment, like Guix does with nonguix (a 3rd party repository that provides all the needed proprietary pieces like microcode updates, blobs, etc.), and just put it in a separate repository that is not part of Arcanyx and treat it as proprietary even if it runs open-source software.

So currently I'm mostly orienting the development around OLIMEX's Teres-I, and I am nearly done with https://git.dotya.ml/kreyren/OSHW-System-On-Module to bring its PCB fabrication under 25 EUR (a small 8-14 layer PCB, which most fabricators will make 5 of for 1 EUR each, spreads out the pins into a standardized array and is then placed on a 2-4 layer PCB that it slots into) and https://git.dotya.ml/kreyren/OSHW-DEIMOS to be able to provide an updated version that can be fabricated very economically with tools found in a common hackerspace (the plan is to publish a detailed guide on how to do that through fibre laser https://youtu.be/wZiBThGrhqs as afaik the maker/hackerspace in Vienna made a lot of progress making it reliable, CNC https://www.youtube.com/watch?v=cCm-UL-dCEc and process 01 https://www.youtube.com/watch?v=Kn92cLf69iw).

Any relevant opinions are appreciated though. I'm currently trying to figure out how to handle the compute server: if Dasharo's firmware is sane beyond a reasonable doubt, it would help me migrate from 6 devices to one with an MSI PRO Z790-P, which would speed up the development a lot, as I rely on the infra provided to me by the hackerspace and I can't adapt it for the needed development rn.
