Work Log Day 246

Let's attempt to guess espresso pull time using images of coffee grinds. Saving coffee with CNN magic.


Work Log Day 246

It has now been eight months since this began, and there are somewhere around 16000 images collected for almost 1000 shots (another 300 or so don’t have images for one reason or other) of espresso. The progress has inconsistent to say the least, with false starts (much denser networks, duplicate images in train/validation), disappointments (novel grinder destroyed performance) and general befuddlement (jump in prediction values around 25 seconds). The amount of time I have devoted to tweaking the model has fallen off dramatically, instead preferring to collect data on my daily consumption of espresso. After a few months I decided to take a look at how the most recent model (uploaded on July 4th) compares to the older models. And before I do that, there are some minor project updates I would like to cover.

New Things Make Things Worse

A few months ago a friend had upgraded from a Kafatek Monolith Flat to a Kafatek Monolith Conical V4, and in doing so was generous to sell me the flat. A titan class flat grinder had long been on my list of things to get as lots of excitement for flat grinders in the community, my Vario-W was wearing out (hard to invest a lot of time into fixing a rarely used grinder, and has also moved along) and to have images from a titan class flat grinder.

Very quickly I found the model to perform poorly on the grinds from the flat. Even by visual inspection I was surprised by how different the results were. The grinds were ‘fluffier’ than they were with the conical monolith. Perhaps this an indication that prediction of shot pull times from images of coffee grounds is not generalizable, though for now I take it as an indication that I need to collect data across a wider variety of grinders. Hopefully by adding this flat grinder I will cover the breadth of home grinders, though that is likely an exaggeration until there is a vastly larger number of images (100k or more) to train against. There is also the fact that WDT/RDT seems much more crucial with the flat, so I fear I may not be able to achieve longer pull times with the flat without significant channeling, which is bad data, not to mention messy.

Another addition I added to my personal setup that I haven’t incorporated into the data collection is a WDT device. This hasn’t been entered into the data collection as it has a definite impact on pull time, with some shots being 10 seconds longer when WDT was used. For now I am leaving this question aside, in part because I dislike the additional step WDT adds to my work flow. As much as possible I want to simplify my work flow to get reliable shots, though I know that is not necessarily where espresso as a whole is headed so I continue to think about how to incorporate this into the model. Perhaps simplify adding in some categories that could provide a constant bias to the predictions would be enough?

The grinder setup is getting out of control

Looking Back At Historical Models

For months now I have battled with getting the model to perform with a MSE under 70. Despite having built models that hit this target, they fail to generalize to new data. And so I began to wonder if I was making any progress or if I was just dancing around the same performance and treating noise in the training outcome as more meaningful than it is. I decided to look back at the models I had uploaded and compare them to the test set I have been using to evaluate my new models.

Thankfully it seems that there has been steady improvement in the models, appearing to develop more generalization as the amount of data increases. Now the test data I use in the following graph is sadly not something I have had across all of these models, so some of the models will have been trained on the test data. Despite this, none of the older models seem to perform fantastically well on the set, which is probably a good sign.

Early Model

This model was built off probably 100 data points, back when I could get fantastic values thanks to over fitting.

Good thing I didn't give up right then

First Model

Okay so it wasn’t really the first, it just happened to be the first I had some amount of faith (now I see that it was blind faith) in.

At least the bias in the model seems to be doing something

Fifth Model

A relatively small model that was trained on probably closer to 7k images. Start to see some actual trend on the lower end, but not much. Bias seems to be better off here, actually hitting something closer to the mid point of the data.

A tiny bit more signal

Ninth Model

The ninth model went for a significantly denser model, with a larger CNN layer at the end as well as more dense layers. This shows a much reduced error and improvement in the rank ordering (kendtall tau/R^2), though still noisy at MSE of 122. The increase in error I largely attribute to the flat grinder and the images being unlike data otherwise see in the training set.

Worse then random rank ordering, but not much worse

Tenth and Current Model

The current model that was trained at the start of this month is doing relatively well considering 64 shots since that cover almost the entire range of shot times. This model was made less dense with the idea that a less dense model should fit smaller datasets, which sadly the current training set still constitutes.

Seems much more reliable, but over fitting is a dangerous beast

Where Can This Take Us?

Based on some responses I have seen to this project by the community, I am in a minority in caring about predicting pull times. I understand the thought behind this, which is if you track your shots then it is easy to return to a reasonable grinder setting. As this was my first successful attempt to track data, I had not realized the value until now. I will say that I have found this handy regardless, being able pull coffees with seemingly (don’t have a great way to evaluate this) less waste. I must ask if this is only useful to me, whats the point?

Transfer learning, that is the point. Or so I hope. If this model is able to make relatively reliable predictions of pull time, the hope is that using transfer learning (reusing the weights of one model to create a new one) models for prediction of more ‘valuable’ values such as TDS, grind distribution or perhaps even recipe generation. While there are already reliable ways to do all of these, I don’t find the current solutions to be nearly as easy use as just taking a photo (Refractometer, coffee grind site, ‘expert knowledge’).