GSoC 2025 - Using AI for Unicode Spoof Detection #6212
RaidedCluster
started this conversation in
General
Hello everyone!
I applied to GSoC 2024 for this same project idea.
I'm a sophomore and I've also been doing some AI safety stuff at Anthropic & the UK AISI.
The last time I applied, I was working on a paper that illustrated how confusables and non-standard Unicode characters impact LLMs.
The paper's done now, but the hunt for confusables is still on.
https://arxiv.org/abs/2405.14490
This was my proposal from last year:
https://drive.google.com/file/d/1pPbhmrMwegTn5kNJUzC0rRulUzAfis3X/view?usp=sharing
The past year has given me time to dwell on how vast Unicode is, and I've come across clusters of confusables that I had glossed over because of tofu 𓊪 𓊪 𓊪 (oops, spilled some).
Looking back, there were a lot of drawbacks in last year's approach.
I'll correct them in my new proposal.