[Feature Request]: BOCU-1 compression for multibyte Unicode text #2551

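Hello! How about adding BOCU-1 compression for multilingual Unicode messages? UTF-8 spends two or more bytes on every non-ASCII character, while BOCU-1 usually needs only one byte per character for text that stays within a single small script.
BOCU-1 is described in Unicode Technical Note #6: [https://www.unicode.org/notes/tn6/](https://www.unicode.org/notes/tn6/)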

Here are a few examples comparing the UTF-8 and BOCU-1 sizes of real messages:

```
Это было моё первое принятое сообщение... 👩👨🧑👧🧒👨‍🦱👸🤶👮‍♀️🕵️‍♀️
input: 143 UTF-8 bytes
output: 90 BOCU-1 bytes
BOCU-1/UTF-8: 0.629371
```

```
Заменил антенну на 5-ягу. Если кто через нее ходит, дайте обратную связь
input: 129 UTF-8 bytes
output: 79 BOCU-1 bytes
BOCU-1/UTF-8: 0.612403
```

```
Доброго утра всем! 17,5 ° C и солнце 📡 )))
input: 68 UTF-8 bytes
output: 51 BOCU-1 bytes
BOCU-1/UTF-8: 0.750000
```

```
Погодка КАЙФ
input: 23 UTF-8 bytes
output: 13 BOCU-1 bytes
BOCU-1/UTF-8: 0.565217
```

```
Первый рабочий день после длинных выходных
input: 79 UTF-8 bytes
output: 43 BOCU-1 bytes
BOCU-1/UTF-8: 0.544304
```
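To reproduce the sizes above on any platform, here is a minimal Python sketch of a BOCU-1 encoder. It is written from my reading of the constants and the difference-packing scheme in the tech note's sample code, so treat it as an illustration rather than a reference implementation: the names `bocu1_encode`, `_pack_diff` and `_next_prev` are my own, it only encodes (no decoder, no reset byte, no signature), and I have only compared output lengths against the samples, not the exact bytes.

```python
"""Minimal BOCU-1 encoder sketch (encode only, no reset byte, no signature)."""

ASCII_PREV = 0x40    # initial "previous" state, restored after C0 controls
MIDDLE = 0x90        # single byte that encodes a difference of 0
TRAIL_COUNT = 243    # number of distinct trail-byte values
# Trail values 0..19 map to the C0 controls BOCU-1 allows as trail bytes;
# trail values 20..242 map to bytes 0x21..0xFF (value + 13).
TRAIL_CONTROLS = bytes([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x10, 0x11, 0x12, 0x13,
                        0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1C, 0x1D, 0x1E, 0x1F])

# Largest/smallest differences that fit in 1, 2 and 3 bytes.
REACH_POS_1, REACH_NEG_1 = 63, -64
REACH_POS_2 = REACH_POS_1 + 43 * TRAIL_COUNT
REACH_NEG_2 = REACH_NEG_1 - 43 * TRAIL_COUNT
REACH_POS_3 = REACH_POS_2 + 3 * TRAIL_COUNT ** 2
REACH_NEG_3 = REACH_NEG_2 - 3 * TRAIL_COUNT ** 2


def _pack_diff(diff: int) -> bytes:
    """Encode one code point difference as 1 to 4 bytes."""
    if REACH_NEG_1 <= diff <= REACH_POS_1:
        return bytes([MIDDLE + diff])
    if diff > 0:
        if diff <= REACH_POS_2:
            diff, lead, count = diff - (REACH_POS_1 + 1), 0xD0, 1
        elif diff <= REACH_POS_3:
            diff, lead, count = diff - (REACH_POS_2 + 1), 0xFB, 2
        else:
            diff, lead, count = diff - (REACH_POS_3 + 1), 0xFE, 3
    else:
        if diff >= REACH_NEG_2:
            diff, lead, count = diff - REACH_NEG_1, 0x50, 1
        elif diff >= REACH_NEG_3:
            diff, lead, count = diff - REACH_NEG_2, 0x25, 2
        else:
            diff, lead, count = diff - REACH_NEG_3, 0x22, 3
    trail = []
    for _ in range(count):
        # divmod floors, so the remainder is non-negative for negative diffs too.
        diff, m = divmod(diff, TRAIL_COUNT)
        trail.append(TRAIL_CONTROLS[m] if m < 20 else m + 13)
    return bytes([lead + diff] + trail[::-1])


def _next_prev(c: int) -> int:
    """State used as the base for the next difference."""
    if 0x3040 <= c <= 0x309F:            # Hiragana: middle of the block
        return 0x3070
    if 0x4E00 <= c <= 0x9FA5:            # CJK Unihan
        return 0x4E00 - REACH_NEG_2
    if 0xAC00 <= c <= 0xD7A3:            # Hangul syllables
        return (0xD7A3 + 0xAC00) // 2
    return (c & ~0x7F) + ASCII_PREV      # middle of the 128-code-point block


def bocu1_encode(text: str) -> bytes:
    out = bytearray()
    prev = ASCII_PREV
    for ch in text:
        c = ord(ch)
        if c <= 0x20:
            # Controls and space are written as-is; space keeps the state.
            if c != 0x20:
                prev = ASCII_PREV
            out.append(c)
        else:
            out += _pack_diff(c - prev)
            prev = _next_prev(c)
    return bytes(out)
```

For example, `len(bocu1_encode("Погодка КАЙФ"))` should give 13, against 23 for `len("Погодка КАЙФ".encode("utf-8"))`: the first Cyrillic letter costs two bytes, and every following letter and space costs one.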

You can download and test the Win32 console implementation here: [https://www.unicode.org/notes/tn6/bocu1.exe](https://www.unicode.org/notes/tn6/bocu1.exe)

Alternatively, there is the UCF encoding, which also addresses the inflated sizes caused by characters outside the a-z range: [https://github.com/hyoo-ru/mam_mol/tree/master/charset/ucf](https://github.com/hyoo-ru/mam_mol/tree/master/charset/ucf)
