Support Categorical Values directly #45
Comments
Attempting to construct the minimal object:
julia> using CategoricalArrays, OneHotArrays
julia> cv = CategoricalArrays.CategoricalValue('b', CategoricalArray('a':'z'))
CategoricalValue{Char, UInt32} 'b'
julia> dump(cv)
CategoricalValue{Char, UInt32}
pool: CategoricalPool{Char, UInt32, CategoricalValue{Char, UInt32}}
levels: Array{Char}((26,))
1: Char 'a'
2: Char 'b'
3: Char 'c'
4: Char 'd'
5: Char 'e'
...
22: Char 'v'
23: Char 'w'
24: Char 'x'
25: Char 'y'
26: Char 'z'
invindex: Dict{Char, UInt32}
slots: Memory{UInt8}
length: Int64 64
ptr: Ptr{Nothing} @0x0000000160607020
...
julia> cv.pool.levels
26-element Vector{Char}:
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
'd': ASCII/Unicode U+0064 (category Ll: Letter, lowercase)
...
julia> Int(cv.ref), length(cv.pool.levels)
(2, 26)
julia> OneHotArrays.onehot(cv::CategoricalValue) = OneHotVector(cv.ref, length(cv.pool.levels))
julia> onehot(cv)
26-element OneHotVector(::UInt32) with eltype Bool:
⋅
1
⋅
⋅
⋅
⋅
...
julia> dump(onehot(cv))
OneHotVector{UInt32}
indices: UInt32 0x00000002
nlabels: Int64 26
Are these two integers all that's required, or are there more complicated examples?
I think this is all, but I am not an expert on CategoricalArrays.
See #54 for a start. We probably need someone to come up with a list of CategoricalArrays examples worth testing.
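The same two-integer idea (level code plus number of levels) can be sketched for a whole vector. This is only an illustration, not the implementation proposed in #54: the input vector `v` is made up, and the sketch assumes the vector contains no missing values, since `levelcode` returns `missing` for those.

```julia
using CategoricalArrays, OneHotArrays

# Illustrative input (an assumption, not from the thread).
# Levels are sorted on construction: 'a', 'b', 'c'.
v = categorical(['b', 'a', 'c', 'b'])

# levelcode gives each element's integer position within levels(v);
# the label set is then just the range 1:nlevels.
# Assumes v has no missing values.
codes = levelcode.(v)
m = onehotbatch(codes, 1:length(levels(v)))

size(m)  # one row per level, one column per element
```

Working from the integer codes avoids looking up each value in the label list, but it ties the row order to the current level order of `v`, so reordering or dropping levels changes the encoding.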
Motivation and description
In data science, CategoricalArrays.CategoricalValue, CategoricalArrays.CategoricalVector and the like appear often (RDatasets loads DataFrames with columns of that type by default). It would be great if onehotbatch could simply be applied to them.
I am new to this package and still figuring out how to transform such a categorical value/vector into a one-hot vector/matrix. It is very possible that I missed something obvious.
Possible Implementation
No response