Work Log Day 3
I'm finally beginning to understand what someone once said about machine learning being a pile of statistics that you poke until it gives you the right answer. I have managed to poke my model a few times today, with minor success.
The first thing I looked at was my pooling layers. They simply reduce their input with a simple function. The code I had taken from here was using max pooling, which just means the maximum value of the cells that fall within the 2x2 'pool' gets passed on. The obvious alternative is average pooling, which emits the average value of the cells. Nothing I read here gave me compelling evidence that the correct type of pooling can be chosen up front (without experience?), so I was inclined to try both. Running the models with identical setups, apart from the type of pooling, for 200 epochs, I found that max pooling produced a lower error of about 10 using MSE as the metric. Whether or not that was actually ideal came into question as I dug deeper.
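In Keras the swap itself is a one-line change. Here is a minimal sketch of the comparison, with made-up layer sizes and input shape rather than the actual model:

```python
from tensorflow.keras import layers, models

def build_model(pooling="max"):
    """Toy convolutional regressor; only the pooling type differs between runs."""
    Pool = layers.MaxPooling2D if pooling == "max" else layers.AveragePooling2D
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 3)),
        Pool(pool_size=(2, 2)),  # each 2x2 window collapses to its max (or average)
        layers.Conv2D(64, (3, 3), activation="relu"),
        Pool(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1),  # single output: predicted pull time in seconds
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```

Training build_model("max") and build_model("average") against the same data is the whole experiment.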
The next step was to tinker with the filters in the convolution layers. After reading up on what the filters actually do, I scaled them up to 32, 64, 128 in order. This seems to be a common practice, though the rationale still escapes me. Playing around with that didn't seem to provide much value, which I believe comes down to a lack of data; more on that later.
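For reference, the change amounted to something like the following; the kernel sizes and padding below are guesses at typical values, not necessarily what's in the code I copied:

```python
from tensorflow.keras import layers

# Filter counts double each block as pooling halves the spatial dimensions:
# the 32 -> 64 -> 128 progression mentioned above.
conv_blocks = [
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
]
```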
Having copied this code from various examples, I was unclear on the purpose of the Dropout layer that came after the fully connected (dense) layer. Effectively it just disables random nodes within the dense layer during training to avoid overfitting. After removing it and running a few hundred epochs I was able to get well below the 20 MSE that had previously been my floor. Predictably, my test images did poorly, which ruined the initial joy at the tiny errors.
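Roughly what the tail of the network looks like; the dense width and the 50% dropout rate here are stand-ins for whatever the examples I copied actually used:

```python
from tensorflow.keras import layers

# Dropout zeroes a random fraction of the dense layer's activations during
# training only, which is what keeps the training error honest.
head = [
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),  # delete this line and training MSE drops, at the cost of generalization
    layers.Dense(1),      # regression output
]
```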
Up until now I had been training on my Mac laptop, which, while relatively fast, often left me with long stretches of watching epoch after epoch go by. I turned to my aging desktop, which happens to have an Nvidia GPU, and installed Ubuntu 18.04 to see if I could speed things up. Like every previous attempt to set up Nvidia drivers on a Linux box (that has a monitor attached), it went poorly. Following the TensorFlow instructions I ended up with CUDA 11.0 rather than 10.1, so lots of googling around until I found a magic incantation that worked. One day I will put together an Ansible script for this and never look back; until then I will slog through it.
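For anyone repeating this, a quick way to confirm TensorFlow actually sees the GPU (standard TensorFlow calls, nothing specific to this project):

```python
import tensorflow as tf

# An empty GPU list means TensorFlow silently fell back to the CPU, usually
# because the installed CUDA version doesn't match the one TF was built against.
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```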
Thankfully, at the end of it I came away with a machine that can train about an order of magnitude faster than my Mac, which has let me mess with different sizes of dense layers and with adding additional dense layers. Nothing conclusive has come out of it, including when trying different sizes of input images. I had hoped that larger images would carry more meaning than small ones, but that has not proven to be true.
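The kind of sweep the faster machine makes practical looks roughly like this; the builder below is a stand-in I'm sketching, not the real training code, and the specific sizes are arbitrary:

```python
from tensorflow.keras import layers, models

def candidate(image_size, dense_widths):
    """Hypothetical helper: same conv stack each time, with the input size and
    the dense layers behind it as the knobs being swept."""
    net = [layers.InputLayer(input_shape=(image_size, image_size, 3))]
    for filters in (32, 64, 128):
        net += [layers.Conv2D(filters, (3, 3), activation="relu", padding="same"),
                layers.MaxPooling2D((2, 2))]
    net.append(layers.Flatten())
    net += [layers.Dense(width, activation="relu") for width in dense_widths]
    net.append(layers.Dense(1))
    model = models.Sequential(net)
    model.compile(optimizer="adam", loss="mse")
    return model

# Each combination becomes one training run on the GPU box.
configs = [(64, (64,)), (128, (128,)), (128, (128, 64)), (256, (256, 128))]
candidates = [candidate(size, widths) for size, widths in configs]
```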
All of this leads me to my data. At the time of writing I have just shy of 200 images, all taken with an incredibly cheap "microscope" I found on Amazon for $20. Having looked at these images as I collected them, I admit to having a hard time seeing much of a difference (at least between, say, 26 seconds and 31 seconds). I even culled some of the images taken early on because of how blurry and meaningless they seemed. I intend to see if my model can pick up differences in grinds that I can't, but 200 images covering less than half of the 1-60 second range isn't nearly enough. I will see if I can train past it through ML magic, though I imagine it will just result in overfitting.
For now I eagerly await the pounds of cheap coffee I have ordered so that I can do another data generation day. Until then I should put together a more sophisticated testing framework for evaluating my model than running the optpresso predict command.
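A first pass might be nothing fancier than a script that runs the model over a held-out folder and reports the aggregate error. Everything below is hypothetical: the filename convention (pull time embedded like "shot_031s.jpg"), the model path, the 128x128 resize, and the /255 scaling are assumptions, not how optpresso actually stores things.

```python
import os
import re

import numpy as np
import tensorflow as tf

MODEL_PATH = "optpresso_model.h5"   # hypothetical saved model
TEST_DIR = "holdout_images"         # hypothetical folder of labeled test shots

model = tf.keras.models.load_model(MODEL_PATH)

errors = []
for name in sorted(os.listdir(TEST_DIR)):
    match = re.search(r"(\d+)s", name)   # pull time parsed from the filename
    if match is None:
        continue
    actual_seconds = float(match.group(1))
    img = tf.keras.preprocessing.image.load_img(
        os.path.join(TEST_DIR, name), target_size=(128, 128))
    batch = tf.keras.preprocessing.image.img_to_array(img)[np.newaxis] / 255.0
    predicted_seconds = float(model.predict(batch)[0][0])
    errors.append(predicted_seconds - actual_seconds)

errors = np.array(errors)
print(f"{len(errors)} images  MSE: {np.mean(errors ** 2):.1f}  "
      f"MAE: {np.mean(np.abs(errors)):.1f}")
```

Keeping the held-out folder out of training entirely is the point; the numbers it prints should track how the model actually behaves on new shots, unlike the training MSE above.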