Skip to content

src/save: Avoid writing CopyFile whenever possible#229

Open
mmirate wants to merge 9 commits intoCyberShadow:masterfrom
mmirate:mmirate-patch-1
Open

src/save: Avoid writing CopyFile whenever possible#229
mmirate wants to merge 9 commits intoCyberShadow:masterfrom
mmirate:mmirate-patch-1

Conversation

@mmirate
Copy link
Contributor

@mmirate mmirate commented Nov 15, 2025

If the file isn't plaintext, CopyFile remains mandatory.

If there are changes, write an inline unified-patch into a heredoc. If it's a new file, write its contents into a heredoc. If the new file has one or fewer lines, write an "echo" command (possibly with the -e flag in case of a file containing plaintext but no newlines).

This is essentially #46 but built into aconfmgr save.

Testing

This branch spent years on my backburner due to some frustrating bugs. Recently I fixed those, so that I have been using this branch on my collection of Arch computers. Doing so I've seen good effect at dredging up cruft, propagating good ideas, separating out things that are supposed to be different; all while keeping my intent clear and simple. I believe I've adapted the existing test cases to pass; but I did not yet try shoehorning multiline files into any of the existing test cases with which to exercise the full breadth of emitted heredocs; and I suspect that multiline text files who do not end with a newline, will still not be accurately captured.

Errata

  • At one point during development I was using the -, as in <<-'EOF', to prepend tabs to the files or patches, to allow better indenting in the typical case where these fragments go inside if-statements etc; but I found it could not then accurately capture any file which was itself indented with tabs, so I dropped that feature, but it might be possible to add it back by checking for leading tab characters.

  • With great difficulty I can fathom that inlining file contents into the config may be undesirable, especially as the line count of the inline content increases. The number of lines to tolerate looks like a bikeshed color. At the moment I've set no limit and no way for the user to add one. I know that adding a new command-line switch for this is suboptimal because it clutters the program's API surface. What about a new helper that manipulates a global variable?

Also Included

  • One read into a global variable. I believe it would have had no adverse effect upon the status quo had it been allowed to persist. The new feature in this PR, by contrast, is broken unless either this variable is localized or the overwritten variable is renamed.

  • The "Trim redundant blank parameters" loop would try to index into an empty args array, so I added a second loop-condition to prevent it from doing that. Maybe the fact that it tried to do that at all is evidence of something going wrong? I'm not sure.

  • There are a few branches which have to write their contents to the $config_save_target themselves, leaving no $func etc to be written. Handling this contingency is what necessitates the nonemptiness check on $func.

If the file isn't plaintext, CopyFile remains mandatory.

If there are changes, write an inline unified-patch into a heredoc. If
it's a new file, write its contents into a heredoc. If the new file
has one or fewer lines, write an "echo" command (possibly with the `-e`
flag in case of a file containing plaintext but no newlines).

This is essentially CyberShadow#46 but built into `aconfmgr save`.

Testing
-------

I've been using this branch on my collection of Arch computers, to good
effect at dredging up cruft, propagating good ideas, separating out
things that are supposed to be different, all while keeping my intent
clear and simple; and I believe I've adapted the existing test cases to
pass. But I did not yet try shoehorning multiline files into any of the
existing test cases with which to exercise the full breadth of emitted
heredocs; and I suspect that multiline text files who do not end with a
newline, will not be accurately captured.

Errata
------

* At one point during development I was using the `-`, as in `<<-'EOF'`,
to prepend tabs to the files or patches, to allow better indenting in
the typical case where these fragments go inside if-statements etc; but
I found it could not then accurately capture any file which was itself
indented with tabs, so I dropped that feature, but it might be possible
to add it back by checking for leading tab characters.

* With great difficulty I can fathom that inlining file contents into
the config is undesirable, especially as the line count of the inline
content increases. The number of lines to tolerate looks like a bikeshed
color, so at the moment there's no limit and no way for the user to add
one. I know that adding a new command-line switch is suboptimal because
it clutters the program's API surface. What about a new helper that
manipulates a global variable?

Also Included
-------------

* One `read` into a global variable. I believe it would have had no
adverse effect upon the status quo had it been allowed to persist. The
new feature in this PR, by contrast, is broken unless either this
variable is localized or the overwritten variable is renamed.

* The "Trim redundant blank parameters" loop would try to index into an
empty `args` array, so I added a second loop-condition to prevent it
from doing that. Maybe the fact that it tried to do that at all is
evidence of something going wrong? I'm not sure.

* There are a few branches which have to write their contents to the
`$config_save_target` themselves, leaving no `$func` etc to be written.
Handling this contingency is what necessitates the nonemptiness check on
`$func`.
Copy link
Owner

@CyberShadow CyberShadow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

I think it would be a valuable improvement if we can make this work. However, there's definitely a number of somewhat hairy aspects.

I know that adding a new command-line switch for this is suboptimal because it clutters the program's API surface. What about a new helper that manipulates a global variable?

Agreed; I would also imagine that some control on a path-by-path basis would be useful. Perhaps something in the shape of SetSaveMethod '*' auto?

src/save.bash Outdated
fi
elif
AconfNeedProgram file file n &&
file "$system_file" | grep -Fw text | grep -qvFw -- 'with escape sequences' &&
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LC_ALL=C is needed if we're going to grep for English strings; however, file is a rather heavy dependency for detecting a binary file. Perhaps grep -qI . will work to do the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grep -qI . almost works; here's the kind of file that makes it misbehave:

sh-5.3$ if grep -qI . /etc/motd; then file /etc/motd; fi
/etc/motd: ASCII text, with escape sequences

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, maybe something like this then:

tr -d '\t\n\r\040-\176\200-\376' < file.txt | head -c 1 | grep -q .

This should cover more cases than either approach. with escape sequences only checks for one additional control character.

I would extract this to a new top-level function, also so that it can be overridden from configurations if needed.

Comment on lines -223 to +288
while [[ -z "${args[-1]}" ]]
while [[ "${#args[*]}" -gt 0 && -z "${args[-1]}" ]]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense; alternatively, it could also be moved inside the if [[ -n $func ]] block.

@mmirate
Copy link
Contributor Author

mmirate commented Nov 17, 2025

I know that adding a new command-line switch for this is suboptimal because it clutters the program's API surface. What about a new helper that manipulates a global variable?

Agreed; I would also imagine that some control on a path-by-path basis would be useful. Perhaps something in the shape of SetSaveMethod '*' auto?

I like it! It will take me a bit more time but it seems worthwhile to me. Bikeshed colors: the status quo would be called "copy" and my new functionality would be called "auto"?

@CyberShadow
Copy link
Owner

Bikeshed colors: the status quo would be called "copy" and my new functionality would be called "auto"?

Works for me!

Milo Mirate and others added 3 commits November 17, 2025 21:49
Making `grep` accept \xFF style escape sequences requires using its PCRE
engine, which for all I know is less efficient. `awk`, by contrast,
can process these entirely within POSIX-compliant form.

Errata: maybe I should have `grep -qI` take a second pass over the file
in case it contains broken UTF-8.
@CyberShadow
Copy link
Owner

It will take me a bit more time but it seems worthwhile to me.

Happy to pick this up from here, BTW.

@mmirate
Copy link
Contributor Author

mmirate commented Nov 20, 2025

It seems that this feature is somehow behaving differently under make ci versus under make testsuite-integration (which, unfortunately, I am unable to run on my machine - it hits a permission error trying to mkdir). In test/t/t-1_save-2_files-2_modified-2_skipchecksums_hitdiffsize.sh, make ci saves what I wrote - a printf to a CreateFile - but make testsuite-integration saves a patch of a GetPackageOriginalFile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants