Feedback no. 3
One salient feature of the Backblaze dataset is that the distribution of vendors is neither uniform nor exhaustive. For example, Seagate accounts for ~70% of the data, HGST for ~15%, and Intel drive data is absent entirely. Our initial assumption was also that SMART metrics may behave differently for different vendors, so the current forecasting notebook trains models vendor-wise. However, the distribution of vendors across Ceph users is likely different, and we want to support all of those vendors.
As a data scientist, I want to explore how "transferable" forecasting models are across vendors. That is, how is performance affected when a model is trained on data from one vendor and evaluated on data from another?
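A minimal sketch of what such a transferability experiment could look like: train on each vendor's training split, evaluate on every vendor's held-out split, and collect the scores in a train-vendor by test-vendor matrix. Everything here is an assumption made for illustration: the data is synthetic (it is not Backblaze data), the vendor names `vendor_a` and `vendor_b` are placeholders, the nearest-centroid classifier stands in for whatever model the forecasting notebook uses, and recall on failing drives is just one possible metric.

```python
def make_vendor(n, offset):
    """Synthetic (features, label) rows for one hypothetical vendor.

    Every third drive is labeled failing (1). `offset` shifts the first
    feature to mimic SMART metrics behaving differently per vendor.
    """
    rows = []
    for i in range(n):
        failing = int(i % 3 == 0)
        noise = ((i * 37) % 11 - 5) / 10.0  # deterministic pseudo-noise in [-0.5, 0.5]
        features = [offset + 3.0 * failing + noise, 2.0 * failing - noise]
        rows.append((features, failing))
    return rows


def fit_centroids(rows):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    centroids = {}
    for label in {y for _, y in rows}:
        feats = [x for x, y in rows if y == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*feats)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


def recall_failing(centroids, rows):
    """Fraction of truly failing drives (label 1) that are predicted as failing."""
    failing = [x for x, y in rows if y == 1]
    if not failing:
        raise ValueError("no failing drives in evaluation split")
    hits = sum(1 for x in failing if predict(centroids, x) == 1)
    return hits / len(failing)


def transfer_matrix(splits):
    """Train on each vendor's train split, score on every vendor's test split."""
    results = {}
    for train_vendor, (train_rows, _) in splits.items():
        model = fit_centroids(train_rows)
        for test_vendor, (_, test_rows) in splits.items():
            results[(train_vendor, test_vendor)] = recall_failing(model, test_rows)
    return results


# Hypothetical vendors with different feature offsets.
splits = {}
for vendor, offset in {"vendor_a": 0.0, "vendor_b": 1.5}.items():
    rows = make_vendor(200, offset)
    splits[vendor] = (rows[:150], rows[150:])  # simple train / held-out split

matrix = transfer_matrix(splits)
for (train_vendor, test_vendor), score in sorted(matrix.items()):
    print(f"train={train_vendor} test={test_vendor} recall={score:.2f}")
```

Comparing the off-diagonal entries (train and test vendors differ) against the diagonal ones (same vendor) gives a rough measure of how much performance is lost when a model is transferred across vendors.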
Acceptance criteria: