One of the great perks of living in the San Francisco Bay Area is proximity to some amazing wine regions. Over the last couple years, I've visited vineyards in regions like Napa Valley, Sonoma Valley, Paso Robles, and even Malibu. I recently ran into a machine learning data set that has data on 6000 Portuguese wines that includes a 1-10 quality rating, which seems like a great excuse to build a neural network that can predict the 1-10 quality rating based on factors like residual sugar and alcohol content. Effectively, this neural network attempts to match the wine palate of whoever put this data set together.
Training a Neural Network with Brain.js
Brain.js is a simple npm module for building neural networks, a common machine learning model that you might see in an undergraduate AI class.
The wine data can be downloaded here. The file is a CSV that uses semi-colons (
;
) as a delimiter. The contents look like this:
The first 11 columns are various chemical properties of a given wine, and the 12th and final column is a "quality" score that represents how good this wine tastes according to the person who recorded this data.
"Training" is how you build a neural network. Given some training data, Brain.js builds a mathematical model for predicting the quality rating of a wine based on the chemical properties. Below is an example from the Brain.js docs about how to train and then use a neural network.
var net = new brain.NeuralNetwork();
net.train([{input: { r: 0.03, g: 0.7, b: 0.5 }, output: { black: 1 }},
{input: { r: 0.16, g: 0.09, b: 0.2 }, output: { white: 1 }},
{input: { r: 0.5, g: 0.5, b: 1.0 }, output: { white: 1 }}]);
var output = net.run({ r: 1, g: 0.4, b: 0 }); // { white: 0.99, black: 0.002 }
For the wine data, the input will be an object representing the chemical properties, and the output will contain one property, the quality. One key detail about Brain.js is that all inputs must be between 0 and 1, so you need to scale some of the inputs. Below is the first wine from the CSV converted into a format that Brain.js can use for training a neural network.
{ input:
{ 'fixed acidity': 0.7,
'volatile acidity': 0.027000000000000003,
'citric acid': 0.036,
'residual sugar': 0.0207,
chlorides: 0.0045,
'free sulfur dioxide': 0.045,
'total sulfur dioxide': 0.17,
density: 0.1001,
pH: 0.3,
sulphates: 0.045,
alcohol: 0.08800000000000001 },
output: { quality: 0.6 } }
Below is the code for training a neural network on the first 1000 wines in the CSV.
const { NeuralNetwork } = require('brain.js');
const _ = require('lodash');
const fs = require('fs');
const raw = fs.readFileSync('./winequality-white.csv', 'utf8').split('\n');
const headers = raw[0].split(';').map(header => header.replace(/"/g, ''));
// Convert the raw data from a string into an array of objects where property
// names match the column headers.
const data = raw.
slice(1).
map(line => line.split(';').
reduce((cur, v, i) => {
// Ensure that numberic values are between 0 and 1
// Admittedly this is a bit hacky, and I'd love to hear how machine
// learning experts handle this.
if (headers[i].includes('sulfur') || headers[i].includes('sugar')) {
cur[headers[i]] = parseFloat(v) / 1000;
} else if (headers[i].includes('alcohol')) {
cur[headers[i]] = parseFloat(v) / 100;
} else {
// Quality will be 0.1-1 rather than 1-10
cur[headers[i]] = parseFloat(v) / 10;
}
return cur;
}, {}));
const net = new NeuralNetwork();
const numTrainingData = 1000;
const trainingData = data.
slice(0, numTrainingData).
map(obj => ({
input: _.omit(obj, ['quality']),
output: _.pick(obj, ['quality'])
}));
console.log(trainingData[0]);
console.log('done training', net.train(trainingData));
Once you have trained a neural network, you can use it to estimate the quality of subsequent wines based on their chemical properties. Below is code that takes the neural network, runs it on the next 50 wines, and calculates the average difference between the neural network's prediction and the actual quality of the wine.
let error = 0;
for (let i = 0; i < 50; ++i) {
const { quality } = net.run(_.omit(data[numTrainingData + i], ['quality']));
error += Math.abs(quality - data[numTrainingData + i].quality);
console.log(i, quality, data[numTrainingData + i].quality);
}
console.log('Average error', error / 50);
console.log('done');
Below is the truncated output. This rudimentary neural network gets within about 0.6 of the actual quality rating on average.
45 0.602045476436615 0.5
46 0.5928407311439514 0.5
47 0.4441471993923187 0.5
48 0.449766606092453 0.5
49 0.7137854695320129 0.6
Average error 0.06042885661125182
Serializing the Neural Network
In practice you don't want to recompute the neural network every time, because even in this simple example training the neural network takes approximately 20 seconds. You can serialize the neural network using the
toJSON()
function:// Serialize the neural network as JSON to a file
fs.writeFileSync('./net.json', JSON.stringify(net.toJSON(), null, ' '));
Open up the
net.json
file to see what the neural network looks like. Neural networks consist of "nodes" or "neurons" that assign a weight to each input. When you train a neural network, brain.js searches to try to come up with weights that match the training data as closely as possible. Here's a sample node from the net.json
file that shows the weights for each parameter.{
"bias": -5.532558917999268,
"weights": {
"fixed acidity": 1.0129427909851074,
"volatile acidity": -3.8902039527893066,
"citric acid": -0.4018211364746094,
"residual sugar": -0.5149407386779785,
"chlorides": -3.0765116214752197,
"free sulfur dioxide": 2.4955267906188965,
"total sulfur dioxide": -0.5537568926811218,
"density": -1.1998544931411743,
"pH": 3.0909314155578613,
"sulphates": 2.17152738571167,
"alcohol": 9.936287879943848
}
}
You can then load the neural network from the JSON file and re-use it.
const net = new NeuralNetwork();
net.fromJSON(JSON.parse(fs.readFileSync('./net.json', 'utf8')));
// ...
let error = 0;
for (let i = 0; i < 50; ++i) {
const { quality } = net.run(_.omit(data[numTrainingData + i], ['quality']));
error += Math.abs(quality - data[numTrainingData + i].quality);
console.log(i, quality, data[numTrainingData + i].quality);
}
console.log('Average error', error / 50);
Moving On
There's an npm module for just about everything, even machine learning. Brain.js is one of the older libraries. There's also a newer one by Google that supposedly has better performance called deeplearn. If you're interested in the theory of machine learning, I highly recommend Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig. R&N is the standard textbook for undergraduate AI courses and serves as an excellent introduction.
留言
張貼留言