LANL HPC Security Pillars
LANL systems face significant security threats from both insiders and outsiders. I'm hoping to state the basic considerations for a shared HPC resource and allow others to revise and extend them.
Management/Compute isolation
Ideally, compute nodes will never need to initiate communication with the management plane (the image servers, service nodes, fabric managers, orchestration, etc. that allow the HPC system to work as an ensemble). If such communication is unavoidable, it must be strictly separated from any endpoint that could allow an attacker to escalate privileges and act as a domain administrator.
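One way to audit this isolation rule is to scan connection records for flows that a compute node opened toward the management plane. The following is a minimal sketch, not a description of any LANL tooling: the address ranges, the `find_violations` helper, and the flow tuple format are all hypothetical and chosen only for illustration.

```python
from ipaddress import ip_address, ip_network

# Hypothetical address ranges, assumed for this sketch; real sites will differ.
COMPUTE_NET = ip_network("10.10.0.0/16")
MANAGEMENT_NET = ip_network("10.0.0.0/24")


def find_violations(flows):
    """Return flows in which a compute node initiated a connection
    to a management-plane address.

    Each flow is a (source_ip, destination_ip, destination_port) tuple,
    where source_ip is the side that opened the connection.
    """
    violations = []
    for src, dst, port in flows:
        if ip_address(src) in COMPUTE_NET and ip_address(dst) in MANAGEMENT_NET:
            violations.append((src, dst, port))
    return violations


# Illustrative records only; these are not real observations.
flows = [
    ("10.0.0.5", "10.10.3.7", 22),      # management -> compute: not flagged
    ("10.10.3.7", "10.0.0.5", 443),     # compute -> management: flagged
    ("10.10.3.7", "10.10.4.1", 5000),   # compute -> compute: not flagged
]
violations = find_violations(flows)
```

A check like this only reports what has already happened. Enforcement would still belong in the network itself, for example in firewall or fabric configuration.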
Rapid Updates
Updating regularly for security hygiene, and rapidly when vulnerabilities are detected, is central to good security in general and to LANL operations in particular. Any aspect of the HPC environment that prevents rapid testing and rollout of patches to packages or kernel components should be minimized, noted, and eliminated as soon as possible.
Verification
Specialized (small-community) management software, networks, filesystem projection software, and other components that don't get attention from large vendors or communities should be regularly red-teamed by knowledgeable internal developers, working from an assumption of root access on user-facing nodes.
What else do we need in here?