The task description is:
You are provided with a dataset containing the characteristics of different mushrooms (mushrooms.csv), and are tasked with discovering whether a mushroom is poisonous (class=p) or edible (class=e).
You also have a dataset (mushrooms_validation.csv) in which the mushrooms are not labeled: run your algorithm on this dataset and provide a predicted_labels.csv file (keeping the indexes from mushrooms_validation.csv).
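As a rough illustration of the workflow the task asks for, here is a minimal sketch. It is not the approach developed here: the function name `predict_labels`, the decision tree, the one-hot encoding, and the output column name `class` are all assumptions made for the example.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def predict_labels(train: pd.DataFrame, validation: pd.DataFrame) -> pd.DataFrame:
    """Fit on labeled mushrooms and predict 'e'/'p' for the unlabeled ones.

    Hypothetical helper: assumes `train` has a `class` column and that all
    other columns are categorical features shared with `validation`.
    """
    y = train["class"]
    X = pd.get_dummies(train.drop(columns="class"))
    # Align validation columns with the training columns: categories unseen
    # in training are dropped, categories absent from validation become 0.
    X_val = pd.get_dummies(validation).reindex(columns=X.columns, fill_value=0)
    model = DecisionTreeClassifier(random_state=0)  # assumed model choice
    model.fit(X, y)
    # Reuse the validation index so each prediction maps back to its row.
    return pd.DataFrame({"class": model.predict(X_val)}, index=validation.index)
```

Assuming the first column of each file holds the index, the frames could be loaded with `pd.read_csv("mushrooms.csv", index_col=0)` and `pd.read_csv("mushrooms_validation.csv", index_col=0)`, and the result written with `.to_csv("predicted_labels.csv")`.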
You will find all the data preparation, exploration, analysis, preprocessing, and modeling for the provided dataset, following two approaches:
- Without considering the validation set when finding the best model, useless features, etc.
- Applying an amazing trick to reach the best result (concatenating the two given datasets)
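One plausible reading of the second approach is that the two files are stacked before encoding, so that both end up with exactly the same feature columns. The sketch below illustrates that reading only; the helper name `encode_together` and the use of one-hot encoding are assumptions, not necessarily what is done later.

```python
import pandas as pd


def encode_together(train: pd.DataFrame, validation: pd.DataFrame):
    """One-hot encode both datasets over their combined set of categories.

    Hypothetical helper: assumes `train` has a `class` column and that the
    remaining columns match those of `validation`.
    """
    features = train.drop(columns="class")
    # Stack the rows, tagging each one with the dataset it came from.
    combined = pd.concat([features, validation], keys=["train", "validation"])
    encoded = pd.get_dummies(combined)
    # Split back; both parts now have identical columns and original indexes.
    return encoded.loc["train"], encoded.loc["validation"], train["class"]
```

Under this reading, only the unlabeled features of the validation set are used, so a category that appears only in mushrooms_validation.csv still gets its own column instead of being dropped.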