
feat(query): add markov model function and feistel function for obfuscator #17437

Draft: wants to merge 17 commits into main
forsaken628 (Collaborator) commented Feb 11, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Migrate the obfuscator tool from ClickHouse.
Usage example: 02_0000_function_markov.test
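The title also mentions a Feistel function, but the PR text does not describe it. A Feistel network fits an obfuscator because it gives a keyed, reversible one-to-one mapping. The following is a minimal Python sketch, not the Databend implementation: a balanced 4-round network over 64-bit unsigned integers. The names `feistel_encrypt`/`feistel_decrypt`, the round count, and the use of keyed BLAKE2b as the round function are assumptions chosen for illustration.

```python
import hashlib

MASK32 = (1 << 32) - 1


def round_fn(half, seed, rnd):
    """Keyed pseudo-random function of a 32-bit half (keyed BLAKE2b is an assumption)."""
    data = half.to_bytes(4, "little") + bytes([rnd])
    key = seed.to_bytes(8, "little")
    return int.from_bytes(hashlib.blake2b(data, digest_size=4, key=key).digest(), "little")


def feistel_encrypt(x, seed, rounds=4):
    """Map a 64-bit unsigned integer to another one, bijectively for a fixed seed."""
    if not 0 <= x < 1 << 64:
        raise ValueError("x must fit in 64 unsigned bits")
    left, right = x >> 32, x & MASK32
    for r in range(rounds):
        # One Feistel round: swap the halves and mix the new right half.
        left, right = right, left ^ round_fn(right, seed, r)
    return (left << 32) | right


def feistel_decrypt(y, seed, rounds=4):
    """Undo feistel_encrypt by running the rounds in reverse order."""
    left, right = y >> 32, y & MASK32
    for r in reversed(range(rounds)):
        left, right = right ^ round_fn(left, seed, r), left
    return (left << 32) | right
```

Each round only XORs one half with a function of the other half, so it can be undone whatever the round function is. The whole network is therefore a bijection. Distinct inputs stay distinct, and for a given seed the same input always maps to the same output. That is presumably why an obfuscator would use it for values such as keys that must stay unique and joinable.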

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) on Feb 11, 2025
forsaken628 changed the title from "feat(query): add markov model function for string obfuscator" to "feat(query): add markov model function and feistel function for obfuscator" on Feb 13, 2025
forsaken628 (Collaborator, Author) commented Feb 13, 2025

Benchmark:

select count(*) from lineitem; -- 600037902
create or replace table lineitem_model as select markov_train(l_comment) as l_comment from lineitem; -- finished in 553 seconds
select data_size from system.tables where name = 'lineitem_model'; -- 416277

select
    lineitem.l_comment,
    markov_generate(lineitem_model.l_comment, '{"order":5,"sliding_window_size":8}', 0, lineitem.l_comment) as g
from (select l_comment from lineitem limit 10) lineitem, lineitem_model;

┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│                  l_comment                  │                         g                         │
│                    String                   │                  Nullable(String)                 │
├─────────────────────────────────────────────┼───────────────────────────────────────────────────┤
│ lar sheaves! sly asymptot                   │ aggle slyly final, bold pinto                     │
│ final pinto beans doze against the final, s │  fluffy the blithely unusual foxes. blithely      │
│ xpress theodolites. closely final t         │ s. fluffily even platelets. fluffily              │
│ haggle quickly. even theodolites wake caref │ encies! theodolites. depths wake. ironic dolphins │
│ . silent pinto bea                          │ nts. regular patterns                             │
│ uriously express                            │ c accounts cajole                                 │
│ e of the carefully unusu                    │ usly along the fluffily ironic                    │
│  realms. express instructions are beyond th │ s. furiously express solve. pending to beans      │
│ ckly alongside of the final m               │ gside of the unusual pinto beans                  │
│ y even ideas                                │  sleep. furiousl                                  │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
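The PR does not spell out how `markov_train` and `markov_generate` work internally. The call above suggests the following argument roles:

- The second argument appears to be a JSON config (`order`, `sliding_window_size`).
- The third appears to be a seed.
- The fourth appears to be a "determinator" string taken from the source value.

This mirrors ClickHouse's obfuscator, which hashes a sliding window over the source string to seed its random generator so that the same input always produces the same output. The following Python sketch shows an order-N character Markov model under those assumptions. `train`, `generate`, `step_rng`, the `BEGIN`/`END` markers, the back-off rule, the SHA-256 seeding and `max_len` are all hypothetical and are not Databend's implementation.

```python
import hashlib
import random
from collections import Counter, defaultdict

# Hypothetical control characters marking the start and end of a string.
BEGIN, END = "\x02", "\x03"


def train(texts, order):
    """Count which character follows each context of length 0..order."""
    model = defaultdict(Counter)
    for text in texts:
        padded = BEGIN * order + text + END
        for i in range(order, len(padded)):
            # Record every suffix of the preceding `order` chars so that
            # generation can back off to shorter contexts.
            for k in range(order + 1):
                model[padded[i - k:i]][padded[i]] += 1
    return model


def step_rng(seed, determinator, pos, window):
    """RNG seeded by the seed, the position, and determinator[pos:pos + window]."""
    chunk = determinator[pos:pos + window]
    digest = hashlib.sha256(f"{seed}|{pos}|{chunk}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "little"))


def generate(model, order, seed, determinator, window=8, max_len=256):
    """Sample one string; identical arguments always give identical output."""
    if not model:
        raise ValueError("model is empty")
    out = ""
    while len(out) < max_len:
        history = BEGIN * order + out
        context = history[len(history) - order:]
        while context not in model:  # back off to a shorter context
            context = context[1:]
        counts = model[context]
        rng = step_rng(seed, determinator, len(out), window)
        ch = rng.choices(list(counts), weights=list(counts.values()))[0]
        if ch == END:
            break
        out += ch
    return out
```

Under these assumptions, a larger `order` (the benchmark uses 5) makes the output read more like the source text. The trade-off is a bigger model and a higher chance of reproducing source fragments verbatim. With a single training string and enough context, the sketch simply regenerates that string.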
