Describe the bug
When loading a custom dataset that has already been split into three parts (not by RecBole), the data can be loaded through the config by setting the benchmark_filename parameter (see https://recbole.io/docs/user_guide/config/data_settings.html#benchmark-file). When I do so (on Windows at least), the files are not found.
I traced the issue to the configurator (see lib\site-packages\recbole\config\configurator.py), lines 335-338.
The current code checks whether a custom dataset is configured (yes, that's the case) and then continues into the else branch. There it joins the configured data_path with the dataset name and stores the result as the final "data_path" in the config dict.
    else:
        self.final_config_dict["data_path"] = os.path.join(
            self.final_config_dict["data_path"], self.dataset
        )
Later, in dataset.py (around line 310), the loader checks whether each benchmark file exists, but by then dataset_path has been set to the directory joined with the name of the dataset, which of course doesn't exist.
    for filename in self.benchmark_filename_list:
        file_path = os.path.join(dataset_path, f"{token}.{filename}.inter")
        # print(f"WRONG FILEPATH = {file_path}")
        if os.path.isfile(file_path):  # doesn't pass this line....!
            temp = self._load_feat(file_path, FeatureSource.INTERACTION)
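To make the lookup easier to check outside RecBole, here is a minimal sketch that rebuilds the paths the two snippets above would end up testing. The helper names are hypothetical (they are not part of RecBole), and it assumes that `token` in the loader is the dataset name, which the quoted snippet does not show.

```python
import os


def expected_benchmark_files(data_path, dataset, benchmark_filename_list):
    """Hypothetical helper: rebuild the paths the loader appears to look for.

    Assumption: `token` in dataset.py equals the dataset name.
    """
    # Mirrors configurator.py: data_path is joined with the dataset name.
    dataset_path = os.path.join(data_path, dataset)
    # Mirrors dataset.py: one "<token>.<filename>.inter" file per benchmark part.
    return [
        os.path.join(dataset_path, f"{dataset}.{name}.inter")
        for name in benchmark_filename_list
    ]


def missing_benchmark_files(data_path, dataset, benchmark_filename_list):
    """Return the expected paths that do not exist on disk."""
    return [
        path
        for path in expected_benchmark_files(data_path, dataset, benchmark_filename_list)
        if not os.path.isfile(path)
    ]


if __name__ == "__main__":
    # Values taken from the config in this report.
    for path in missing_benchmark_files("load_data", "CustomDataSet", ["train", "valid", "test"]):
        print("not found:", path)
```

Running this next to the load_data folder lists exactly which files the check would reject, which makes it easy to compare against where the files actually are.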
A similar issue arises with a normal dataset. I think lines 335-338 in configurator.py need to set the directory path instead of directory + dataset name.
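One possible shape for that change, as a sketch only: fall back to the plain directory when the joined directory does not exist. `resolve_data_path` is a hypothetical standalone function, not RecBole code, and the maintainers may well prefer a different behavior.

```python
import os


def resolve_data_path(data_path, dataset):
    """Hypothetical helper, not part of RecBole.

    Prefer "<data_path>/<dataset>" when that directory exists,
    otherwise keep the directory the user configured.
    """
    candidate = os.path.join(data_path, dataset)
    if os.path.isdir(candidate):
        return candidate
    return data_path
```

With this fallback, a data_path that already points at the folder containing the .inter files would be used as-is, while the existing "<data_path>/<dataset>" layout would keep working.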
To Reproduce
Steps to reproduce the behavior:
    from recbole.quick_start import run_recbole

    config_dict = {
        'model': 'UserKNN',
        'dataset': 'CustomDataSet',
        'data_path': 'load_data',  # tried multiple variants, e.g. the full path 'C:/<location of python env>/load_data'
        'benchmark_filename': ['train', 'valid', 'test'],
        'USER_ID_FIELD': 'user_id',
        'ITEM_ID_FIELD': 'item_id',
        'RATING_FIELD': 'rating',
        'TIME_FIELD': 'time_field',
        'load_col': {'inter': ['user_id', 'item_id', 'rating']},
        'eval_setting': 'RO_RS,split',
        'split_ratio': [0.8, 0.1, 0.1],
        'metrics': ['mrr', 'precision', 'recall', 'ndcg', 'map'],
        'valid_metrics': 'MRR@10',
        'n_neighbors': 2,
        'similarity_type': 'cosine',
        'normalization': 'z-score',
        'train_batch_size': 4096,
        'epochs': 1,
        'seed': 42,
    }
    run_recbole(model='BPR', config_dict=config_dict)
Desktop (please complete the following information):
- OS: Windows 11
- RecBole Version 1.2.1
- Python Version 9.9.23
- PyTorch Version 2.8.0