Collaborator

@eemeli eemeli commented Sep 8, 2025

Adds an initial set of expression, markup, and message attribute definitions.

The proposed attributes are drawn from:

As noted in the text, this is not intended as a final list, but as a starting point. The text is not currently being proposed as normative, but we could change that later.

@eemeli eemeli added the Agenda+ Requested for upcoming teleconference label Sep 8, 2025
Member

@aphillips aphillips left a comment

Good start. Lots of nit-picky comments.

Maybe a good question is: should these be directly incorporated? Or should all of these XLIFFy things be namespaced? Some of what XLIFF does doesn't apply to UMF messages and some of it would be much better on a message resource level (instead of cluttering up the message itself).


#### @translate

_Value:_ `yes` or `no`.
Member

Indicate that yes is default?

Is there a reason attributes don't follow a similar structure to functions and their options here?

Collaborator Author

I don't think we've agreement that yes is the default. In fact, for expressions, I would think that the general default might in fact be no to indicate that a translator is not expected to make any changes to the expression.

Considering this a bit more, maybe something like translate=input or translate=|input,minimumFractionDigits| would be better? That would indicate which parts are expected to be translatable.
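
As a rough illustration of that idea, here is a minimal Python sketch of how a tool might read such a value. The function name `translatable_parts` is hypothetical, and the comma-separated format is only the one floated in this comment; nothing here is defined by the proposed text.

```python
def translatable_parts(value: str) -> set[str]:
    """Split a hypothetical @translate value such as
    'input,minimumFractionDigits' into the set of expression parts
    that a translator is expected to be able to change."""
    # Assumption: the |...| quoting around the literal has already been
    # removed by the message parser, so only the inner text arrives here.
    return {part.strip() for part in value.split(",") if part.strip()}
```

With this shape, `translate=input` and `translate=|input,minimumFractionDigits|` would both reduce to a set lookup on the tooling side.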

Member

@aphillips aphillips Sep 9, 2025

The default value is no when the attribute is not present, but yes when the attribute is present and has no value, right?
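
To make that reading concrete, a minimal Python sketch follows. It assumes the interpretation asked about in this comment (absent means `no`, present without a value means `yes`), which the thread has not agreed on. `resolve_translate` and the `MISSING` sentinel are hypothetical names.

```python
MISSING = object()  # sentinel: the attribute does not appear at all


def resolve_translate(attr=MISSING) -> str:
    """Return 'yes' or 'no' for a @translate attribute.

    attr is MISSING when the attribute is absent, None when it is
    present without a value, or the literal string value otherwise.
    """
    if attr is MISSING:
        return "no"   # assumed default when the attribute is absent
    if attr is None:
        return "yes"  # bare @translate with no value
    if attr in ("yes", "no"):
        return attr
    raise ValueError(f"unsupported @translate value: {attr!r}")
```

Requiring explicit values, as suggested further down the thread, would amount to dropping the `None` branch.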

I don't like the values yes/no, but they are inherited from XLIFF (and its friends, such as ITS) and we should probably remain consistent with them (for portability at least)

Collaborator Author

Ah, that's a slightly different understanding of "default" than I'd had -- as in, the value that's applied if the attribute is not present at all.

I don't hate the yes/no as they're relatively legible and are perhaps easier to extend with other enum values than e.g. true/false would be. But as they're already in use by XLIFF, we should use the same values.

Collaborator

I think that requiring explicit values is cleaner.
How hard is it to type =no (3 characters)?


> translate=|input,minimumFractionDigits| would be better? That would indicate which parts are expected to be translatable.

I think that such info does not belong here, it belongs in the function registry.

A while ago I even provided a list of l10n attributes to use for each function option (something like hide, read-only, enum, free-form). I can even think of more options.
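
A minimal Python sketch of what such registry-level information could look like. The four kinds are the ones named above; the registry layout, the `may_edit` helper, and the classification assigned to each option are assumptions made up for illustration, not taken from any existing registry.

```python
from enum import Enum


class L10nKind(Enum):
    HIDE = "hide"            # not shown to translators at all
    READ_ONLY = "read-only"  # shown, but not editable
    ENUM = "enum"            # editable, chosen from a fixed set of values
    FREE_FORM = "free-form"  # editable without restriction


# Hypothetical entries: the kind chosen for each option is illustrative only.
REGISTRY = {
    "number": {
        "select": L10nKind.HIDE,
        "minimumFractionDigits": L10nKind.READ_ONLY,
        "useGrouping": L10nKind.ENUM,
    },
}


def may_edit(function: str, option: str) -> bool:
    """Report whether a translator may change this option's value.
    Unknown functions and options are treated as hidden (an assumption)."""
    kind = REGISTRY.get(function, {}).get(option, L10nKind.HIDE)
    return kind in (L10nKind.ENUM, L10nKind.FREE_FORM)
```

Keeping this in the registry means the message itself needs no per-option annotation.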


Indicates whether or not the _markup_ and its contents can be re-ordered.

#### @comment
Member

Why not just permit the "global" attributes on markup?

Collaborator Author

I don't understand what this means.

Member

You're repeating attributes defined above. Why not make those like @comment global to both expressions and markup?

Collaborator Author

That seems like an editorial fix we could apply later, if it does hold that the annotations continue to match on expressions and markup.

Member

It would be a bad idea for identically-named attributes to diverge. The sets aren't identical, of course.


#### @max-length

_Value:_ A strictly positive integer, followed by a space, followed by one of the following:
Member

digit size option?

Collaborator Author

That's limited to max 99, and we need to allow for limits greater than that.

_Value:_ A strictly positive integer, followed by a space, followed by one of the following:
- `chars`
- `bytes`
- `lines`
Member

Good luck with this one.

Collaborator Author

As in, we should not include it?

Member

Measuring bytes will depend on some character encoding somewhere. Without an indication of the encoding (which this doesn't provide), there is no way to perform the measurement.

(FWIW, you're missing graphemes, which is another measurement (approximately "screen positions", but only approximately so).)

Lines depend on... font, font size, pixel width, line-breaking, hyphenation (insert more here) and are even harder to define than bytes.

Length limitations are a "fact of life" in localization, but badly defined mechanisms for them are not that helpful.
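
A small Python sketch of the encoding point, using only the standard library. The helper name `length_measures` is hypothetical. Grapheme counts are left out because the standard library has no grapheme segmenter.

```python
def length_measures(text: str) -> dict[str, int]:
    """Measure the same string in three different ways."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),
    }


# Precomposed i-with-diaeresis: one code point, two bytes in either encoding.
word = "na\u00efve"
# Two regional-indicator code points that render as a single flag.
flag = "\U0001F1EB\U0001F1EE"

print(length_measures(word))  # {'code_points': 5, 'utf8_bytes': 6, 'utf16_bytes': 10}
print(length_measures(flag))  # {'code_points': 2, 'utf8_bytes': 8, 'utf16_bytes': 8}
```

The same limit of, say, `8 bytes` accepts or rejects `word` depending on which encoding is assumed, and the flag counts as two `chars` here even though a user sees one symbol.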

Collaborator Author

One option would be to leave out the units, and to let the implementation figure out what the limit means, something in the overlap of characters/code points/graphemes.

Co-authored-by: Addison Phillips <[email protected]>
Collaborator Author

eemeli commented Sep 9, 2025

> Maybe a good question is: should these be directly incorporated? Or should all of these XLIFFy things be namespaced? Some of what XLIFF does doesn't apply to UMF messages and some of it would be much better on a message resource level (instead of cluttering up the message itself).

During yesterday's call, @mihnita also expressed concern regarding cluttering up a message with multiple attributes. His thought was that it would often be preferable to attach a u:id to an expression or markup, and refer to that from a separate message-level block to attach attribute-y metadata to the relevant placeholder(s).
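
A minimal Python sketch of that shape, to make the idea easier to discuss. The message text, the ids, the metadata keys, and the `attributes_for` helper are all hypothetical; no resource format is implied.

```python
# The message stays short: each placeholder carries only a u:id.
message = "You have {$count :number u:id=count} new {#b u:id=emph}messages{/b}"

# Message-level block, stored next to the message rather than inside it,
# that attaches attribute-like metadata to placeholders by their u:id.
placeholder_metadata: dict[str, dict[str, str]] = {
    "count": {"comment": "Number of unread messages", "example": "3"},
    "emph": {"can-delete": "no"},
}


def attributes_for(placeholder_id: str) -> dict[str, str]:
    """Look up the metadata attached to a placeholder id, if any."""
    return placeholder_metadata.get(placeholder_id, {})
```

A tool would resolve `attributes_for("count")` when presenting the placeholder to a translator, so the message pattern itself carries no comment or example text.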

To me, this speaks to a need to have that capability also be well defined, so that it can be done ergonomically across resource formats. In other words, I think we need a JavaDoc-y syntax for message-level attributes.

@eemeli eemeli requested review from aphillips and mihnita September 9, 2025 09:47

Empty _messages_ SHOULD be accompanied by an explanatory `@comment`.

#### @max-length
Collaborator

This is a can of worms :-)

One might want two kinds of length limitations:

  • storage
    For example, if you put the strings in a "traditional" database and you have a max size for the translations, then you need the encoding of the string.
    So you need something like "max 120 bytes as utf-8"

  • visual (for example using em)
    That is a can of worms.
    Because "m" is not the same width as "l" :-)
    And "AAAAVVVVV" is not the same width as "AVAVAVAV" (because of kerning).
    And ligatures, and complex scripts.
    To accurately measure anything you need the exact font, if it is monospaced or not, with the kerning table, ligatures, combining chars, etc.
    Even the font version might affect you.
    Then in some systems you can enable/disable opentype features.
    To measure multi-lines you need the max length of one line, if hyphenation is available, the exact hyphenation data + engine, if justification is set or not :-)


TLDR: I would leave it out for now


Identify the _functions_ and _markup_ supported by the _message_ formatter.

#### @source
Collaborator

It really does not belong here!


Indicates whether the _message_ is translatable or not.

Some _messages_ may be required to have the same value in all locales.
Collaborator

Then they are not messages that should be stored in resource bundles. They can very well be hard-coded.

A better use case is probably to encode info about locale-sensitive behavior. For example, the fact that the default order for a Contacts app should be first-name, except that for Japanese and a few others it should be last-name.

But that would not be MF2.

TLDR: I am not sure I see a good use case.


Some _messages_ may be required to have the same value in all locales.

#### @version
Collaborator

I've been in long debates about mechanisms like this one.
It is controversial, so I would leave it out for now.

@@ -0,0 +1,233 @@
## Expression, Markup, and Message Attributes
Collaborator

In general I am not happy with the idea of storing all of this in the message proper.
This belongs in the storage, outside the message.

janispritzkau commented Oct 13, 2025

I have a couple of questions and would like to share my understanding of the matter. In addition to the message resource standard, I’m considering how it could integrate with an in-context editing or translation tool.

Expression Attributes

These attributes may vary by locale, so it makes sense for them to be included within the message:

  • @comment
  • @term - Isn't @comment sufficient for this use case?
  • @example - This is really useful. It reminds me of OpenAPI, which lets you generate example queries.

For @translate, I think it should be consistent across locales. To me, it still makes more sense for it to be in the message rather than hardcoded. Perhaps it could be enforced through linters.

Markup Attributes

What exactly does the @comment attribute refer to in the markup context? Is it describing a particular use of the tag, or the type of tag itself, or the content between an opening/closing pair? If it's about the type of tag, perhaps the resource-level metadata (in Message Resource) would be a better place, or it should go into a schema/registry.

The same with @term. What is it referring to?

Personally, I would drop these from the spec because they seem too application-specific: Do these attributes define what translators can or shouldn’t do during translation?

  • @can-copy
  • @can-delete
  • @can-overlap
  • @can-reorder

Message Attributes

I understand the flexibility of having messages with @translate=no in the resource bundle. However, it feels odd to duplicate such messages across all locales. If a base locale or locale-independent bundle exists, then this attribute would make more sense.

I haven't had time to think about the other message-related attributes yet, so that's all for now.
