Hi,
As mentioned in the title, I get very different results when I guess the data type, space, dims, and chunks myself than when I pass an object directly to the create_dataset method. I tried to find my mistake by reading through the method's code, but without success.
For example, if I do the following:
x <- data.frame(a=factor(letters[runif(n = 1E5,1,10)]),b=factor(letters[runif(n = 1E5,1,10)]))
dims <- hdf5r::guess_dim(x)
dtype <- hdf5r::guess_dtype(x,scalar=FALSE,string_len = Inf,ds_dim = dims)
nelem <- hdf5r::guess_nelem(x,dtype = dtype)
chunk_dims <- hdf5r::guess_chunks(space_maxdims = dims, dtype_size = nelem)
space <- hdf5r::guess_space(x,dtype = dtype,chunked = TRUE)
I get
> dims
[1] 100000
> dtype
Class: H5T_COMPOUND
Datatype: H5T_COMPOUND {
H5T_ENUM {
undefined integer;
"a" 1;
"b" 2;
"c" 3;
"d" 4;
"e" 5;
"f" 6;
"g" 7;
"h" 8;
"i" 9;
} "a" : 0;
H5T_ENUM {
undefined integer;
"a" 1;
"b" 2;
"c" 3;
"d" 4;
"e" 5;
"f" 6;
"g" 7;
"h" 8;
"i" 9;
} "b" : 1;
}
> nelem
[1] 100000
> chunk_dims
[1] 0
> space
Class: H5S
Type: Simple
Dims: 100000
Maxdims: Inf
Clearly these results are going to cause trouble if I pass them to the method. In particular, shouldn't nelem be 2 rather than the number of rows of the data.frame?
By contrast, if I do:
test <- hdf5r.Extra::h5TryOpen(tempfile(),'w')
test$create_dataset('auto',x)
test[['auto']]
I get what looks like the correct result:
Datatype: H5T_COMPOUND {
H5T_ENUM {
H5T_STD_U8LE;
"a" 1;
"b" 2;
"c" 3;
"d" 4;
"e" 5;
"f" 6;
"g" 7;
"h" 8;
"i" 9;
} "a" : 0;
H5T_ENUM {
H5T_STD_U8LE;
"a" 1;
"b" 2;
"c" 3;
"d" 4;
"e" 5;
"f" 6;
"g" 7;
"h" 8;
"i" 9;
} "b" : 1;
}
Space: Type=Simple Dims=100000 Maxdims=Inf
Chunk: 4096
I would appreciate any suggestions on how to reproduce the results of the direct call using the guessing path instead.
Thank you in advance,
Luigi