Work Log Day 80
Data collection continues to be the focus of this project, as modifications to the network haven’t provided any meaningful returns. The increase in data has dropped the k-folds average to right around 100 MSE. There is still the issue where the predictions are banded towards 15 seconds and 35 seconds. This is definitely similar to the distribution of the data, where there is a significant lack of data on the upper end (50-60 seconds) and on the lower end (10-25 seconds). However, the amount of data available now has finally made it possible to have a test set.
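One rough way to check whether the banding in the predictions mirrors the gaps in the data is to bin the labels and the predictions with the same edges and compare the counts. The sketch below is only an illustration: the function name, the bin edges, and the sample values are all made up for the example and are not taken from the project.

```python
import numpy as np

def compare_distributions(labels, predictions, edges):
    """Count labels and predictions in the same bins so they can be compared.

    `edges` are hypothetical bin boundaries in seconds; values that fall
    outside the outermost edges are ignored by np.histogram.
    """
    label_counts, _ = np.histogram(labels, bins=edges)
    pred_counts, _ = np.histogram(predictions, bins=edges)
    return label_counts, pred_counts

if __name__ == "__main__":
    # Made-up numbers purely for illustration, not project data.
    labels = [12, 30, 31, 40, 55]
    predictions = [15, 33, 35, 36, 35]
    edges = [10, 25, 50, 60]  # assumed ranges: low, middle, high
    label_counts, pred_counts = compare_distributions(labels, predictions, edges)
    for lo, hi, n_lab, n_pred in zip(edges[:-1], edges[1:], label_counts, pred_counts):
        print(f"{lo}-{hi}s: labels={n_lab} predictions={n_pred}")
```

If the sparse label bins line up with the bins the predictions avoid, that would support the idea that the banding comes from the data distribution rather than the network.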
I have modified the k-folds logic in the training to hold back one of the folds as the test set. It is saddening to see data be removed from training, and it does seem to have raised the floor of the error (~85 at the very best). At this point, though, it is important to be sure the models being generated are actually generalizing. Since the model is saved at its best validation value, which effectively makes validation a secondary optimization process going on alongside the training, I need to compare it against something that wasn’t seen in the training process. This is a very basic aspect of ML, and also an incredibly important one.
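A minimal sketch of the hold-back idea, assuming the data can be addressed by index: shuffle the indices, cut them into folds, reserve one fold as the test set, and rotate the validation fold over the rest. The function name, the fold count, and the seed are assumptions for illustration, not the project's actual training code.

```python
import numpy as np

def kfold_with_holdout(n_samples, n_folds=5, test_fold=0, seed=0):
    """Yield (train_idx, val_idx, test_idx) for each validation rotation.

    One fold is reserved as the test set and never appears in training or
    validation; the remaining folds take turns as the validation set.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, n_folds)

    test_idx = folds[test_fold]
    remaining = [f for i, f in enumerate(folds) if i != test_fold]

    for i, val_idx in enumerate(remaining):
        train_idx = np.concatenate(
            [f for j, f in enumerate(remaining) if j != i]
        )
        yield train_idx, val_idx, test_idx

if __name__ == "__main__":
    for train_idx, val_idx, test_idx in kfold_with_holdout(100, n_folds=5):
        # Train on train_idx, checkpoint on the best val_idx score,
        # then report the error on test_idx, which was never used for either.
        print(len(train_idx), len(val_idx), len(test_idx))
```

With 5 folds this gives 4 train/validation rotations instead of 5, which is where the lost training data comes from; the test error is the only number that the checkpoint selection could not have influenced.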
On the other fronts of the project, I have begun tinkering with a web UI using OpenCV.js to see if I can make switching between prediction and collection easy. Nothing too sexy here, just trying to figure out how to integrate JS assets into the repo without bloating it; I intend to look at Jupyter for inspiration on this front. The other aspect is that I need to revisit my coffee prep stats project and see if I can’t learn something about stats and coffee prep in the process.