Work Log Day 34

Let's attempt to guess espresso pull time using images of coffee grinds. Saving coffee with CNN magic.


Despite staring intently at my screen for the past three days and trying all sorts of different things to improve the model, I am still stuck around an MSE of 140. I suspect something could be done with the fundamental model, if only the resources on CNNs didn’t deal almost entirely with classification.

I first played around with batch sizes, despite that being something I should tune later, in part because I was curious about its impact on my model, since it does affect performance. Larger batch sizes run faster (to a point), while smaller batches run slower. In my tests the models converge more slowly with a large batch size, and the batch size doesn’t seem to improve the validation loss either way. So off I went in search of another silver bullet to make this model as great as I want it to be.
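One possible contributing factor to the slower convergence (a guess on my part, not something I have measured) is simply that a larger batch means fewer gradient updates per epoch. A minimal sketch of the arithmetic, where the dataset size of 1000 images is a made-up number and `updates_per_epoch` is just a helper name for illustration:

```python
import math


def updates_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of gradient updates in one full pass over the data."""
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(num_samples / batch_size)


# Hypothetical dataset size, not my real image count.
NUM_IMAGES = 1000

for batch_size in (8, 32, 128):
    print(batch_size, updates_per_epoch(NUM_IMAGES, batch_size))
```

With the same number of epochs, a batch size of 128 gets far fewer weight updates than a batch size of 8, so comparing runs by epoch count alone can make the large batch look slower to converge.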

Reading through the Keras docs I noticed that you can provide sample weights. This intrigued me, as I obviously have very lopsided data, since most of the shots I drink pull at around 30s. My hope is that while it might not reduce the MSE overall, it might result in more ‘distinct’ outputs from the model. Right now the estimates barely differ: an image of grinds that would pull at 7s generally gets estimates between 16 and 35 seconds, while shots above 50 seconds often land in the 20 to 35 second range. This seems consistent across my models as well, so I don’t think creating an ensemble that takes the mean is going to do much either. Right now I am testing whether it is better to use the same weights for training and validation or different weights for each. I am going to train some models overnight and see what happens; fingers crossed that it will just work and I end up with similar error but predictions in a nice +/- 5 band around the actual pull times.
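For reference, here is a rough sketch of one way to build such weights: bin the pull times and weight each sample by the inverse of how crowded its bin is. This is an assumption about a reasonable scheme, not necessarily the weighting I will end up with; the function name `inverse_frequency_weights`, the 5 second bin width, and the pull times below are all made up for illustration.

```python
from collections import Counter


def inverse_frequency_weights(pull_times, bin_width=5.0):
    """Weight each sample by the inverse of how common its pull-time bin is."""
    # Assign every pull time to a bin, e.g. 30-34.99s share one bin.
    bins = [int(t // bin_width) for t in pull_times]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    # Rescale so the weights average to 1, keeping the loss on a familiar scale.
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


# Made-up pull times in seconds: mostly around 30s, with two rare extremes.
pull_times = [7, 28, 29, 30, 31, 32, 33, 55]
weights = inverse_frequency_weights(pull_times)

for t, w in zip(pull_times, weights):
    print(t, w)
```

The resulting list can be converted to a NumPy array and passed as `sample_weight` to `model.fit` in Keras, and `validation_data` accepts an optional third element for validation weights, which is how the same-weights versus different-weights comparison can be set up.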

The next thing to do, besides collect a ton more data, is to try implementing this. It seems like an effective way of exploring local minima and, since I need a small learning rate (1e-4) to get reasonable results, it should end up exploring quite a bit of space.