dscho
Contributor

@dscho dscho commented Mar 10, 2025

Since I offered to help with rebasing git/git-scm.com#1943, I figured I should give it a quick try, to see how hard it would be.

Narrator's voice: It was hard. Very.

When I saw just how involved it would be, I didn't want to just look on from the peanut gallery but to offer my assistance. And while I have literally no knowledge whatsoever of Farsi, I know other languages, such as JavaScript. So I wrote a Node.js script to help rebase the patches.

This is the script I used.
const fs = require('fs')
const child_process = require('child_process')

const git = (...arguments) => {
  const result = child_process.spawnSync('git', arguments)
  if (result.error) throw result.error
  if (result.status !== 0) {
    const quotedArgs = arguments.map(
      arg => arg.match(/[ ']/)
      ? `'${arg.replace(/'/g, "'\\''")}'`
      : arg
    )
    throw new Error(`\`git ${quotedArgs.join(' ')}\` failed(${result.status}): ${result.stderr}`)
  }
  return result.stdout.toString('utf8').trim()
}

const guessFile = (needle) => {
  const file = git('grep', '-F', '-l', needle)
  if (file.includes('\n')) throw new Error(`Looking for ${needle} turned up multiple files:\n${file}`)
  return file
}

const rebasePatch = async (patch) => {
  fs.writeFileSync('a1.patch', patch)
  const lines = patch.split('\n').filter(line => line !== '\\ No newline at end of file')
  if (!lines[0].match(/^From [0-9a-f]{40}/)) throw new Error(`Not a Git patch: starts with: '${lines[0]}'`)
  let author
  let date
  let message = []
  let i
  for (i = 1; i < lines.length && !lines[i].startsWith('diff'); i++) {
    if (lines[i].startsWith('From: ')) author = lines[i].slice(6)
    else if (lines[i].startsWith('Date: ')) date = lines[i].slice(6)
    else if (lines[i].startsWith('Subject: ')) {
      let subject = lines[i].slice(9).replace(/^\[PATCH\] */, '')
      while (i + 1 < lines.length && lines[i + 1].startsWith(' ')) subject += lines[++i]
      message.push(subject)
    } else if (lines[i] === '') {
      message.push('')
      break
    } else console.error(`warning: unrecognized header line '${lines[i]}'`)
  }

  console.error(`Parsing ${message[0]}`)
  while (i < lines.length && lines[i] !== '---' && !lines[i].startsWith('diff')) message.push(lines[i++])
  while (i < lines.length && !lines[i].startsWith('diff')) i++

  const targetFiles = []
  let targetFile
  let targetContent

  while (i < lines.length) {
    if (i < lines.length && lines[i].startsWith('diff ')) {
      i++
      // skip ---/+++ lines
      while (i < lines.length && !lines[i].startsWith('@')) i++
    }
    while (i < lines.length && !lines[i].match(/^[-+]/)) i++

    const minus = []
    const plus = []

    while (i < lines.length && lines[i].startsWith('-')) minus.push(lines[i++].slice(1))
    while (i < lines.length && lines[i].startsWith('+')) plus.push(lines[i++].slice(1))

    const joinLinesAndSplitAtTags =
      array => array
        .join('\n')
        .replace(/<code>([^<]*)<\/code>/g, "`$1`")
        .replace(/<pre class="highlight"><code class="language-([^"]*)"[ \n]+data-lang="\1">/g, '[source,$1]\n----\n')
        .replace(/<\/code><\/pre>/g, '\n----')
        .replace(/<a[ \n]+href="{{< relurl " *book[^"]*\/([^"#]*)"[ \n]*>}}">[^<]*<\/a>/g, "<<$1#$1>>")
        .replace(/(\[remote rejected\] master -)&gt;/g, "$1>")
        .replace(/<a[\n ]+href/g, '<a href')
        .replace(/<img src="{{< relurl " *book[^"]*\/(images\/[^"]*)" >}}" alt="([^>]*)">[ \n]*<\/div>[ \n]*<div class="title">[^.<]*\. ([^<]*)/g, '.$3\nimage::$1[$2]')
        .replace(/ &gt; (LAST_COMMIT)/, ' > $1')
        .replace(/<em>([^<]*)<\/em>/g, '_$1_')
        .replace(/\u2014/g, ' -- ') // matched character lost in extraction; an em dash (U+2014) is assumed
        .replace('"git reset HEAD &lt;file&gt;..."', '"git reset HEAD <file>..."')
        .replace(/(mergetool\.)&lt;(tool)&gt;\./g, '$1<$2>.')
        .replace(/(--tool=)&lt;(tool)&gt;/, '$1<$2>')
        .replace(/<div id="nav"><a href="{{< previous-section >}}">[^>]*<\/a> | <a href="{{< next-section >}}">[^>]*<\/a><\/div>/g, '')
        .replace(/(<a href="[^">]*")\n *(class="bare")/g, '$1 $2')
        .replace(/(Author: [^>\n]+ )&lt;([^&]*)&gt;/g, '$1<$2>')
        .split(/(\s*<(?!schacon)[^>]*(?!<>)>(?!>)\s*)/)
    const en = joinLinesAndSplitAtTags(minus)
    const fa = joinLinesAndSplitAtTags(plus)

    const sanitize = (line) => line
      .trim()
      // .replace(/({{< relurl ") *([^"]*")\s*(>)+/, '$1$2$3')
      .replace(/\s+(data-lang=")/, ' $1')
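      // The matched characters below were lost in extraction; typographic
      // quotes (U+2019, U+201C, U+201D) are assumed from the replacements.
      .replace(/\u2019/g, "'")
      .replace(/\u201c/g, '``')
      .replace(/\u201d/g, "''")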

    let a = 0
    let b = 0
    while (a < en.length && b < fa.length) {
      let enLine = sanitize(en[a])
      let faLine = sanitize(fa[b])

      if (enLine === faLine) {
        a++
        b++
        continue
      }
      if (faLine.match(/^\s*<div dir="rtl">\s*$/) && b + 1 < fa.length && sanitize(fa[b + 1]) === '') {
        b += 2
        continue
      }
      if ((a % 2) === 0 && (b % 2) === 0) {
        /* if (
          b + 3 < fa.length
          && sanitize(fa[b + 1]) === '<code>'
          && (a + 1 >= en.length || sanitize(en[a + 1]) !== '<code>')
          && sanitize(fa[b + 3]) === '</code>'
        ) {
          faLine = sanitize(fa.slice(b, b + 5).join(''))
          b += 4
        } */
        if (enLine !== '' || faLine !== '') {
          // translate
          const needle = enLine.replace(/^\[source,console\]\n----\n/, '').replace(/\n[^]*/, '')
          if (targetFile === undefined) {
            if (enLine === 'Summary' && minus.join('\n').includes('covered most of the major ways')) {
	      targetFile = 'ch08-customizing-git.asc'
            } else targetFile = guessFile(needle)
            targetContent = fs.readFileSync(targetFile, 'utf8')
          }
          let found = targetContent.indexOf(enLine)
          if (found < 0) {
            for (const e of [{
              pattern: '_an_example_git_enforced_policy#_an_example_git_enforced_policy',
              replacement: 'ch08-customizing-git#_an_example_git_enforced_policy',
            }, {
              pattern: 'filters_a#filters_a',
              regex: /(filters_[ab])#\1/g,
              replacement: '$1',
            }, {
              pattern: '_signing#_signing',
              replacement: 'ch07-git-tools#_signing',
            }, {
              pattern: '_ignoring#_ignoring',
              replacement: 'ch02-git-basics-chapter#_ignoring',
            }, {
              pattern: '"&lt;input&gt;"="&lt;output&gt;"',
	      regex: /&lt;((in|out)put)&gt;/g,
	      replacement: '<$1>',
            }, {
              pattern: '_p4_git_fusion#_p4_git_fusion',
	      replacement: '_p4_git_fusion',
            }, {
              pattern: '_git_p4_branches#_git_p4_branches',
	      replacement: '_git_p4_branches',
	    }, {
	      pattern: '\nYou can use `git filter-branch` to remove',
	      replacement: '(((git commands, filter-branch)))\nYou can use `git filter-branch` to remove',
	    }, {
	      pattern: 'the _User_ column (the 2nd one)',
	      replacement: "the 'User' column (the 2nd one)",
	    }, {
	      pattern: '-unified=&lt;n&gt;',
	      regex: /(-u(nified=)?)&lt;n&gt;/g,
	      replacement: '$1<n>',
	    }, {
	      pattern: 'It is invoked like `$GIT_SSH',
	      regex: /&lt;([^&]*)&gt;/g,
	      replacement: '<$1>',
	    }, {
	      pattern: '_revision_selection#_revision_selection',
	      replacement: 'ch07-git-tools#_revision_selection',
	    }, {
	      pattern: '_credential_caching#_credential_caching',
	      replacement: 'ch07-git-tools#_credential_caching',
            }]) {
              if (!enLine.includes(e.pattern)) continue
              const candidate = enLine.replace(e.regex || e.pattern, e.replacement)
              found = targetContent.indexOf(candidate)
              if (found >= 0) {
                enLine = candidate
                break
              }
            }
          }
          if (found < 0) {
            console.error(`Could not find ${enLine} in ${targetFile}; looking harder`)
            fs.writeFileSync(targetFile, targetContent)
            targetFiles.push(targetFile)
	    if (enLine === 'Subversion' && en[a - 1] === '<h3 id="_subversion">') {
	      targetFile = 'book/09-git-and-other-scms/sections/import-svn.asc'
	    } else if (enLine === 'Mercurial' && en[a - 1] === '<h3 id="_mercurial">') {
	      targetFile = 'book/09-git-and-other-scms/sections/client-hg.asc'
	    } else if (enLine === 'Bazaar' && en[a - 1] === '<h3 id="_bazaar">') {
	      targetFile = 'book/09-git-and-other-scms/sections/import-bzr.asc'
	    } else if (enLine === 'Perforce' && en[a - 1] === '<h3 id="_perforce_import">') {
	      targetFile = 'book/09-git-and-other-scms/sections/import-p4.asc'
            } else targetFile = guessFile(needle)
            targetContent = fs.readFileSync(targetFile, 'utf8')
            found = targetContent.indexOf(enLine)
          }
          if (found < 0) throw new Error(`Could not find '${enLine}'`)
          targetContent = `${targetContent.slice(0, found)}${faLine}${targetContent.slice(found + enLine.length)}`
        }

        a++
        b++
        continue
      }
      throw new Error(`Stopped at a: ${a}, b: ${b}\n'${en.slice(a, a + 10).join('')}'\nvs\n'${fa.slice(b, b + 10).join('')}'`)
    }
  }

  if (!targetFile) throw new Error(`Could not find any edits in ${patch}`)
  fs.writeFileSync(targetFile, targetContent)
  targetFiles.push(targetFile)

  git('commit', '-m', message.join('\n'), `--author=${author}`, `--date=${date}`, '--', ...targetFiles)
  console.log(`Committed ${message[0]}`)
}

(async () => {
  for (let i = 7; i >= 0; i--) {
    const patch = await fetch(`https://github.com/git/git-scm.com/commit/dfd9553ba76c1b11aa978ef99a7dfc944bfb36c7~${i}.patch`)
    await rebasePatch(await patch.text())
  }
})().catch(e => { throw e })

True to form, as a one-time hack, it lacks pretty much any documentation.

As one might guess, I started out with something straightforward: parse the diffs, ignore the HTML tags, and let the script figure out automatically which text snippets should be replaced with which other text snippets.
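To illustrate that first step, here is a minimal sketch of pulling the removed and added lines out of a unified diff. The helper name `extractChanges` and the sample diff are made up for illustration, and unlike the script above it lumps all hunks together instead of handling one change group at a time.

```javascript
// Hypothetical helper: collect the removed ("-") and added ("+") lines of a
// unified diff, skipping file headers ("---"/"+++") and context lines.
const extractChanges = (diffText) => {
  const minus = []
  const plus = []
  let inHunk = false
  for (const line of diffText.split('\n')) {
    if (line.startsWith('diff ')) inHunk = false    // a new file starts; headers follow
    else if (line.startsWith('@@')) inHunk = true   // hunk header; the hunk body follows
    else if (!inHunk) continue                      // skip index/---/+++ header lines
    else if (line.startsWith('-')) minus.push(line.slice(1))
    else if (line.startsWith('+')) plus.push(line.slice(1))
  }
  return { minus, plus }
}

// Made-up sample; "Salam" is only a placeholder for translated text.
const sample = [
  'diff --git a/page.html b/page.html',
  '--- a/page.html',
  '+++ b/page.html',
  '@@ -1,3 +1,3 @@',
  ' <h2>Title</h2>',
  '-<p>Hello</p>',
  '+<p>Salam</p>',
  ' <hr>',
].join('\n')

const changes = extractChanges(sample)
console.log(changes)
```

The `inHunk` flag is what keeps the `---`/`+++` file header lines from being mistaken for removed or added content.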

However, some of the HTML -- even between HTML tags -- needed to be "back-converted" to AsciiDoc. So I added that.
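As a rough sketch of what such a back-conversion looks like, the following applies two of the replacements from the script (inline code and emphasis) to a made-up sentence. The helper name `backConvert` and the input are assumptions for illustration; the real script chains many more rules.

```javascript
// Hypothetical helper: turn two inline HTML constructs back into AsciiDoc.
const backConvert = (html) => html
  .replace(/<code>([^<]*)<\/code>/g, '`$1`') // <code>x</code> -> `x`
  .replace(/<em>([^<]*)<\/em>/g, '_$1_')     // <em>x</em>     -> _x_

const converted = backConvert('Run <code>git status</code> to see <em>staged</em> files.')
console.log(converted)
```

Because the patterns use `[^<]*`, they only match elements without nested tags, which is part of why exceptions pile up quickly.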

From there, I worked my way through the exceptions to that rule, and there were tons.

The high-level overview of the script is that the loop at the end of the script tries to fetch the commits as patches, then calls rebasePatch(), which parses first the header (to learn the metadata that will later be used to create the commit), then parses the diff to obtain minimal pre-/post-images, then transforms those to look a lot more like AsciiDoc than HTML, then splits by HTML tags, then iterates over the parts between the HTML tags (verifying that the HTML tags are identical between English and Farsi). For the parts between the HTML tags that differ between English and Farsi, the script uses git grep -F to figure out which file needs to be edited, then finds the respective location where the English text (= "pre-image") is located, and replaces it with the Farsi text. In this part, there are quite a few hacks related to my reluctance to replace &lt;/&gt; wholesale, and there are quite a few hacks due to the {{< relurl ... >}} links no longer necessarily having all the information to recreate the AsciiDoc <<...>> references.
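The core of that loop can be sketched in a few lines. This is a simplified illustration, not the script itself: the helper names `splitAtTags`, `pairTranslations` and `applyPairs` are made up, the `[fa] ...` strings are placeholders for translated text, and the target is an in-memory string rather than a file located via `git grep -F`.

```javascript
// Splitting with a capturing group keeps the separators in the result:
// text ends up at even indices, HTML tags at odd indices.
const splitAtTags = (html) => html.split(/(\s*<[^>]*>\s*)/)

// Hypothetical helper: walk both arrays in lockstep, insist that the tags
// are identical, and collect the text parts that differ.
const pairTranslations = (enHtml, faHtml) => {
  const en = splitAtTags(enHtml)
  const fa = splitAtTags(faHtml)
  if (en.length !== fa.length) throw new Error('Tag structure differs')
  const pairs = []
  for (let i = 0; i < en.length; i++) {
    if (en[i].trim() === fa[i].trim()) continue
    if (i % 2 === 1) throw new Error(`Tags differ: '${en[i]}' vs '${fa[i]}'`)
    pairs.push({ from: en[i], to: fa[i] })
  }
  return pairs
}

// Hypothetical helper: replace each English snippet with its counterpart.
const applyPairs = (content, pairs) => {
  for (const { from, to } of pairs) {
    const found = content.indexOf(from)
    if (found < 0) throw new Error(`Could not find '${from}'`)
    content = content.slice(0, found) + to + content.slice(found + from.length)
  }
  return content
}

const pairs = pairTranslations(
  '<h2>Summary</h2><p>Git is fast.</p>',
  '<h2>[fa] Summary</h2><p>[fa] Git is fast.</p>'
)
const result = applyPairs('== Summary\n\nGit is fast.\n', pairs)
console.log(result)
```

The even/odd layout produced by `split()` with a capturing group is the reason the script checks `(a % 2) === 0 && (b % 2) === 0` before treating a difference as a translation.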

The last commit of git/git-scm.com#1943, git/git-scm.com@dfd9553, is not even applicable because the AsciiDoc references do not have the link text, and neither do they have full links.

Now, @YasinDehfuli I hope that this here PR is useful in some shape or form and does not cause more work than it took to craft.

@YasinDehfuli
Collaborator

Of course, dear @dscho.

This was a very professional and interesting move, and I truly appreciate it.

On my initial review, your translation looked quite good. However, the Persian translation had some structural issues that needed correction. I'll edit them to ensure a flawless and accurate translation.

The Iranian open-source community will be grateful to you.

@dscho
Contributor Author

dscho commented Mar 11, 2025

the Persian translation had some structural issues that needed correction. I'll edit them to ensure a flawless and accurate translation.

As long as I did not cause more work for you, I'm happy!

Contributor Author

@dscho dscho left a comment

I cannot claim to understand Farsi, and I lack permissions to approve the PR, but I'd say: ship it!

@YasinDehfuli
Collaborator

Dear friends, the revisions and translation of this pull request have been completed.

@jnavila Please check it and let me know if there are no structural issues and the code is working fine, so we can merge the pull request. Or you can merge it yourself.

@dscho, you can correct the untranslated sections or those with translation issues in the same way we proceeded. Let's resolve the issues and apply the changes.

I am available, and you can tag me for translations of any pull requests.

@YasinDehfuli
Collaborator

YasinDehfuli commented Apr 19, 2025

Dear @dscho ,
I’ve noticed an issue in the Persian translation that can be easily fixed, and resolving it could significantly improve the fluency and quality of the Persian text.

As we know, Persian—like Arabic—is a right-to-left (RTL) language, but currently on the GitHub site, the layout is left-to-right (LTR).

This problem can be solved quite easily, and we can also handle the RTL formatting in our future translations.

We just need to change the direction of #content from this
image

to dir="rtl"

image

Or we can handle it in our files.

@YasinDehfuli YasinDehfuli merged commit ce1c1d8 into progit:master May 11, 2025
@dscho
Contributor Author

dscho commented May 12, 2025

Just needs to change #content direction from this image

to dir="rtl"

Hmm. I don't know how we would do that... I guess you mean to add that dir="rtl" to

https://github.com/git/git-scm.com/blob/caa90239ea0a597e3f2a0a65d2f9181b7a1a5a34/layouts/_default/baseof.html#L96

for Persian, e.g. like this?

-        <div id="content">
+        <div id="content"{{ if eq "fa" .Params.book.language_code }} dir="rtl"{{ end }}>

@dscho dscho deleted the rebase-git-scm.com-pr-1943 branch May 12, 2025 13:05
dscho added a commit to dscho/git-scm.com that referenced this pull request May 12, 2025
The Persian translation needs to be rendered right-to-left, as suggested
in progit/progit2-fa#1 (comment).

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho
Contributor Author

dscho commented May 12, 2025

I guess you mean to add that dir="rtl" to

https://github.com/git/git-scm.com/blob/caa90239ea0a597e3f2a0a65d2f9181b7a1a5a34/layouts/_default/baseof.html#L96

for Persian, e.g. like this?

-        <div id="content">
+        <div id="content"{{ if eq "fa" .Params.book.language_code }} dir="rtl"{{ end }}>

Draft PR for this: git/git-scm.com#2003. I also pushed it to my fork, so that https://dscho.github.io/git-scm.com/book/fa/v2 should show the effect once the deployment finishes.

@YasinDehfuli
Collaborator

Great.

image

It works successfully.

Now we just need to translate the other sections...

image

dscho added a commit to git/git-scm.com that referenced this pull request May 13, 2025
The Persian translation needs to be rendered right-to-left, as suggested
in progit/progit2-fa#1 (comment).

Signed-off-by: Johannes Schindelin <[email protected]>