Work Log Day 96
The last ten days have been slower than I had hoped, what with starting a new job, but that was expected. With a new job, the bulk of the data collection is the coffee I actually drink, which is generally four cups a day. And the data ends up in the 30-second range because… it needs to be drinkable. In the future it will likely be necessary to collect images only in the 30 second range for ‘novel’ or poorly predicted coffees. Thankfully the model is improving, and I may finally be able to focus on the secondary model soon.
Sub-70 MSE Model
After my machine churned away for a few days, I was lucky enough to see my first sub-70 training result. Using the recently modified weighting, I was able to push the batch normalization changes I have been working on a bit further. Now it is only an improvement of ~4 compared to the unweighted variant, and the average k-folds performance is worse. It isn’t the specific performance that really matters so much as the continued improvement. I have begun to investigate my prep method data, and what I have been learning is worrying. The variation in pull times with the ‘same’ grind seems relatively large, making me concerned that the model performance would plateau. If it had, I would have pulled nearly 700 shots for a model that doesn’t seem to perform nearly well enough to be useful. Thankfully it seems to be improving, albeit slowly. It is worth pointing out, largely for myself, that this project began as a way to see if ML would do anything for this problem, not to actually build something usable. While I slowly plow through more coffee at the tails, I will continue to focus on preparation of the puck and how to compare the different methods.
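The weighted-versus-unweighted comparison above can be sketched as a k-fold evaluation with per-sample weights. To be clear, the features, targets, and weighting rule below are hypothetical stand-ins, not my actual model or data; this just shows the shape of the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.normal(size=(200, 4))]        # hypothetical shot features + intercept
y = X @ rng.normal(size=5) + rng.normal(scale=2.0, size=200)  # hypothetical pull times

# Hypothetical weighting rule: up-weight shots far from the mean pull time
w = 1.0 + np.abs(y - y.mean()) / y.std()

def kfold_mse(X, y, w=None, k=5):
    """Average held-out MSE over k folds using (optionally weighted) least squares."""
    idx = np.arange(len(y))
    mses = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        sw = np.ones(len(train)) if w is None else w[train]
        # Weighted least squares: scale rows by the square root of the weights
        sq = np.sqrt(sw)
        beta, *_ = np.linalg.lstsq(X[train] * sq[:, None], y[train] * sq, rcond=None)
        resid = y[fold] - X[fold] @ beta
        mses.append(np.mean(resid ** 2))
    return float(np.mean(mses))

print(f"unweighted k-fold MSE: {kfold_mse(X, y):.2f}")
print(f"weighted   k-fold MSE: {kfold_mse(X, y, w):.2f}")
```

The point of reporting the average across folds, rather than the best single run, is exactly the distinction above: one lucky fold can dip below a threshold while the average tells a less flattering story.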
Puck Preparation Evaluation
If it hasn’t already become apparent, I care about how long my shots pull. It isn’t the perfect measure of a good cup of espresso by any means; however, starting at a 1:2 ratio pulled in 30 seconds, I find most things drinkable, and I can modify the recipe with little waste. And while I now have an ML model that can predict the shot time decently (within about eight seconds), I wanted to know which preparation method is ‘objectively’ best at providing the smallest variation in shot pull time. Based on the small evaluations I have done, it seems that some methods (WDT or DT) can vary by up to 15 seconds, compared to the ‘null model’ of just tamping the coffee, which varies by 10 seconds. These numbers seem particularly high, which I attribute to the coffee: many coffees I typically drink have a much tighter band, but I care about the general case and not the specific.
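A minimal sketch of that spread comparison, assuming hypothetical shot-time samples for each method (the numbers below are simulated, not my measurements):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical pull times in seconds for each preparation method
shots = {
    "tamp only": rng.normal(loc=30, scale=3.0, size=40),
    "WDT":       rng.normal(loc=30, scale=4.5, size=40),
    "OCD v2":    rng.normal(loc=30, scale=2.5, size=40),
}

for method, times in shots.items():
    spread = np.ptp(times)                                 # max minus min pull time
    iqr = np.subtract(*np.percentile(times, [75, 25]))     # interquartile range
    print(f"{method:10s} range = {spread:4.1f}s  IQR = {iqr:4.1f}s")
```

The raw range is what a “varies by 15 seconds” claim measures, but it is dominated by the single worst shot; the IQR is a steadier summary of how tight each method’s band really is.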
The most interesting thing I have come away with is that the distribution doesn’t seem normal. This is both irritating and worrying. Normal distributions are easy: you take the mean and the standard deviation and you go to town making claims about your data. One test I have seen used previously is ANOVA, because it is simple and effective, and at the time it seemed reasonable because shots are normally distributed, right? Because the shot times don’t seem normally distributed, I will need to determine the best central tendency measure and then compare no prep to OCD (V2) set at a depth of 9 to determine which is ‘better’. ‘Better’ in this case likely means the shot variance differs by at least 1 second; otherwise I need a different metric by which to compare them.
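One way to check the normality assumption and still compare spread without it is a Shapiro–Wilk test per sample followed by a median-centered Levene test, which does not assume normality. The two samples here are simulated placeholders for the ‘no prep’ and ‘OCD (V2)’ data, just to show the workflow:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical pull times: a right-skewed 'no prep' sample vs a tighter 'OCD v2' sample
no_prep = 24 + rng.gamma(shape=2.0, scale=3.0, size=50)
ocd_v2 = rng.normal(loc=30, scale=2.0, size=50)

# Shapiro-Wilk: a small p-value suggests the sample is not normally distributed
for name, sample in [("no prep", no_prep), ("OCD v2", ocd_v2)]:
    _, p = stats.shapiro(sample)
    print(f"{name}: Shapiro-Wilk p = {p:.3f}")

# Levene's test (median-centered) compares variances without assuming normality
_, p = stats.levene(no_prep, ocd_v2, center="median")
print(f"Levene p = {p:.3f}")

# For skewed data the median is a more robust central tendency measure than the mean
print(f"medians: {np.median(no_prep):.1f}s vs {np.median(ocd_v2):.1f}s")
```

If the Levene test flags a genuine difference in spread, comparing medians (or IQRs) rather than means and standard deviations is the safer way to call one method ‘better’.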
Refer to the following Preparation Evaluation Notebook for all of the gory stats details. And thanks to Paul Hawkins for helping me with the stats and forcing me to learn this strange art.