library(tidyverse)
library(magrittr)
Deep-learning frameworks have proliferated over the last years, ranging from the older Caffe and Theano to Google's TensorFlow and the newer PyTorch (which is increasingly trending in research). In this session we will work with Keras, a high-level deep-learning API. It has the following advantages:
* It runs on top of multiple backends, including TensorFlow, CNTK, or Theano.
* It integrates with a broad ecosystem of data and compute infrastructure (Google Cloud, Spark, HDF5, …).
* Trained models can be deployed in many contexts, e.g. via CoreML, the TensorFlow Android runtime, or an R or Python webapp backend (such as a Shiny or Flask app).
* It is widely adopted in academia and industry (Google, Netflix, Uber, CERN, Yelp, Square, etc.), and is also a popular framework on Kaggle, the machine-learning competition website, where almost every recent deep-learning competition has been won using Keras models.
While Google's TensorFlow is even more popular, keep in mind that Keras can use TensorFlow (and other popular DL frameworks) as its backend, while allowing less cumbersome and more high-level model definitions. Keras therefore represents a wonderful high-level starter: fast, easy to implement, and in most cases flexible enough to do whatever you feel like.
Sidenote: the weird name (Keras) means "horn" in Greek and is a reference to ancient Greek literature. E.g., in the Odyssey, supernatural dream spirits are divided between those who deceive men with false visions (arriving to Earth through a gate of ivory) and those who announce a future that will come to pass (arriving through a gate of horn). So, enough history lessons, let's run our first deep learning model!
# Load our main tool
library(keras)
As our first dataset, we use the classic MNIST dataset of handwritten digits.
# Load our data
mnist <- dataset_mnist()
mnist %>%
glimpse()
List of 2
$ train:List of 2
..$ x: int [1:60000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
..$ y: int [1:60000(1d)] 5 0 4 1 9 2 1 3 1 4 ...
$ test :List of 2
..$ x: int [1:10000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
..$ y: int [1:10000(1d)] 7 2 1 0 4 1 4 9 5 9 ...
# separate into train and test
train_images <- mnist$train$x
train_labels <- mnist$train$y
test_images <- mnist$test$x
test_labels <- mnist$test$y
glimpse(train_images)
int [1:60000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
glimpse(train_labels)
int [1:60000(1d)] 5 0 4 1 9 2 1 3 1 4 ...
digit <- train_images[5,,]
digit[,8:20] # I crop it a bit, otherwise the columns don't fit on one page
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,] 0 0 0 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 0 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0 0 0 0 0 0 0
[7,] 0 0 0 0 0 0 0 0 0 0 0 0 0
[8,] 0 0 0 0 0 55 148 210 253 253 113 87 148
[9,] 0 0 0 0 87 232 252 253 189 210 252 252 253
[10,] 0 0 4 57 242 252 190 65 5 12 182 252 253
[11,] 0 0 96 252 252 183 14 0 0 92 252 252 225
[12,] 0 132 253 252 146 14 0 0 0 215 252 252 79
[13,] 126 253 247 176 9 0 0 8 78 245 253 129 0
[14,] 232 252 176 0 0 0 36 201 252 252 169 11 0
[15,] 252 252 30 22 119 197 241 253 252 251 77 0 0
[16,] 231 252 253 252 252 252 226 227 252 231 0 0 0
[17,] 55 235 253 217 138 42 24 192 252 143 0 0 0
[18,] 0 0 0 0 0 0 62 255 253 109 0 0 0
[19,] 0 0 0 0 0 0 71 253 252 21 0 0 0
[20,] 0 0 0 0 0 0 0 253 252 21 0 0 0
[21,] 0 0 0 0 0 0 71 253 252 21 0 0 0
[22,] 0 0 0 0 0 0 106 253 252 21 0 0 0
[23,] 0 0 0 0 0 0 45 255 253 21 0 0 0
[24,] 0 0 0 0 0 0 0 218 252 56 0 0 0
[25,] 0 0 0 0 0 0 0 96 252 189 42 0 0
[26,] 0 0 0 0 0 0 0 14 184 252 170 11 0
[27,] 0 0 0 0 0 0 0 0 14 147 252 42 0
[28,] 0 0 0 0 0 0 0 0 0 0 0 0 0
To make it more tangible, let's plot one:
digit %>% as.raster(max = 255) %>% plot()
rm(digit)
Now let's build our first Keras model. The workflow will be as follows: First, we feed the network the training data, train_images and train_labels, and it learns to associate images and labels. Then, we ask the network to produce predictions for test_images, and we verify whether these predictions match the labels from test_labels.
Let's build the network; again, remember that you aren't expected to understand everything about this example yet.
Building a model in Keras that can be fitted on your data involves two steps: first defining the network architecture (the stack of layers), and then compiling it (picking a loss function, an optimizer, and metrics).
network <- keras_model_sequential() %>%
layer_dense(units = 512, activation = "relu", input_shape = c(28 * 28)) %>%
layer_dense(units = 10, activation = "softmax")
Notice that layer stacking in R is done via the well-known %>% pipe, while in Python it is done with the . operator. That's about the main difference between the two implementations.
The core building block of neural networks is the layer, a data-processing module that you can think of as a filter for data. Some data goes in, and it comes out in a more useful form.
Specifically, layers extract representations out of the data fed into them - hopefully, representations that are more meaningful for the problem at hand.
Most of deep learning consists of chaining together simple layers that will implement a form of progressive data distillation.
Here, our network consists of a sequence of two densely connected (layer_dense) neural layers.
The second (and last) layer is a 10-way softmax layer, which means it will return an array of 10 probability scores (summing to 1).
Each score will be the probability that the current digit image belongs to one of our 10 digit classes. So, we defined a network with 522 cells overall, consisting of a first hidden layer with 512 cells (relu activation) and an output layer with 10 cells (softmax activation).
To make the network ready for training, we need to pick three more things as part of the compilation step: a loss function, an optimizer, and the metrics to monitor during training.
While we are already familiar with defining metrics to optimize, defining an optimizer and a loss function is new; we will dig into both in more detail later. Notice that the compile() function modifies the network in place rather than returning a new object.
network %>% compile(
optimizer = "rmsprop",
loss = "categorical_crossentropy",
metrics = c("accuracy")
)
Let's inspect our final setup:
summary(network)
Model: "sequential_1"
___________________________________________________________________________________________________
Layer (type) Output Shape Param #
===================================================================================================
dense_2 (Dense) (None, 512) 401920
___________________________________________________________________________________________________
dense_3 (Dense) (None, 10) 5130
===================================================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
___________________________________________________________________________________________________
Well, we see that a network of this size already has quite a large number of trainable parameters: all edge weights plus one bias per cell (the arithmetic is checked below).
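As a quick sanity check, we can reproduce these counts by hand; this is plain arithmetic (a dense layer with n inputs and m units has n * m weights plus m biases), not a Keras feature:
784 * 512 + 512                      # first layer:  401,920 parameters
512 * 10 + 10                        # output layer:   5,130 parameters
(784 * 512 + 512) + (512 * 10 + 10)  # total:        407,050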
Before training, we preprocess the data by reshaping it into the shape the network expects, and scaling it so that all values fall in the [0, 1] interval. Previously, our training images were stored in an array of shape (60000, 28, 28) of type integer, with values in the [0, 255] interval. We transform them into a double array of shape (60000, 28 * 28) with values between 0 and 1.
.train_images <- array_reshape(train_images, c(60000, 28 * 28))
train_images <- train_images / 255 # To scale between 0 and 1
test_images <- array_reshape(test_images, c(10000, 28 * 28))
test_images <- test_images / 255 # To scale between 0 and 1
Note that we use the array_reshape() function rather than dim() to reshape the array; I explain why later, when we talk about tensor reshaping. We also need to categorically (one-hot) encode the labels:
train_labels <- to_categorical(train_labels)
test_labels <- to_categorical(test_labels)
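To see what to_categorical() is doing, here is a minimal sketch on a made-up toy label vector (the labels 0, 3, 1 are arbitrary):
to_categorical(c(0, 3, 1)) # every label becomes a one-hot row
#      [,1] [,2] [,3] [,4]
# [1,]    1    0    0    0
# [2,]    0    0    0    1
# [3,]    0    1    0    0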
We're now ready to train the network via Keras' fit() function. We save the output in an object we call history.net.
set.seed(1337) # Note: this seeds R's RNG; TensorFlow keeps its own RNG, so results may still vary slightly
history.net <- network %>% fit(x = train_images,
y = train_labels,
epochs = 10, # How often shall we re-run the model on the whole sample
batch_size = 128, # How many observations should be included in every batch
validation_split = 0.25 # Fraction of the training data held out for validation
)
Epoch 1/10
352/352 [==============================] - 3s 8ms/step - loss: 0.2928 - accuracy: 0.9142 - val_loss: 0.1599 - val_accuracy: 0.9533
Epoch 2/10
352/352 [==============================] - 2s 7ms/step - loss: 0.1205 - accuracy: 0.9648 - val_loss: 0.1126 - val_accuracy: 0.9659
Epoch 3/10
352/352 [==============================] - 2s 6ms/step - loss: 0.0775 - accuracy: 0.9772 - val_loss: 0.1008 - val_accuracy: 0.9710
Epoch 4/10
352/352 [==============================] - 2s 6ms/step - loss: 0.0555 - accuracy: 0.9838 - val_loss: 0.0989 - val_accuracy: 0.9717
Epoch 5/10
352/352 [==============================] - 2s 6ms/step - loss: 0.0411 - accuracy: 0.9878 - val_loss: 0.0870 - val_accuracy: 0.9747
Epoch 6/10
352/352 [==============================] - 2s 5ms/step - loss: 0.0303 - accuracy: 0.9910 - val_loss: 0.1008 - val_accuracy: 0.9727
Epoch 7/10
352/352 [==============================] - 2s 5ms/step - loss: 0.0229 - accuracy: 0.9933 - val_loss: 0.0875 - val_accuracy: 0.9768
Epoch 8/10
352/352 [==============================] - 2s 6ms/step - loss: 0.0176 - accuracy: 0.9952 - val_loss: 0.0888 - val_accuracy: 0.9761
Epoch 9/10
352/352 [==============================] - 2s 6ms/step - loss: 0.0128 - accuracy: 0.9967 - val_loss: 0.0889 - val_accuracy: 0.9783
Epoch 10/10
352/352 [==============================] - 2s 6ms/step - loss: 0.0101 - accuracy: 0.9972 - val_loss: 0.0924 - val_accuracy: 0.9765
fit() adjusts the weights of the network in place, without explicitly assigning the result to a new object. history.net therefore only contains the history of the model's performance metrics across the epochs, in case we would like to inspect it. Let's take a look:
history.net
Final epoch (plot to see history):
loss: 0.01011
accuracy: 0.9972
val_loss: 0.09245
val_accuracy: 0.9765
history.net %>% glimpse()
List of 2
$ params :List of 3
..$ verbose: int 1
..$ epochs : int 10
..$ steps : int 352
$ metrics:List of 4
..$ loss : num [1:10] 0.2928 0.1205 0.0775 0.0555 0.0411 ...
..$ accuracy : num [1:10] 0.914 0.965 0.977 0.984 0.988 ...
..$ val_loss : num [1:10] 0.1599 0.1126 0.1008 0.0989 0.087 ...
..$ val_accuracy: num [1:10] 0.953 0.966 0.971 0.972 0.975 ...
- attr(*, "class")= chr "keras_training_history"
We can also visualize these metrics across the epochs.
history.net %>% plot(smooth = TRUE)
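If you prefer a customized plot, the training history can also be converted to a data frame and handed to ggplot2; a minimal sketch, assuming the as.data.frame() method that the keras package provides for history objects:
history.net %>%
  as.data.frame() %>% # columns: epoch, value, metric, data (training/validation)
  ggplot(aes(x = epoch, y = value, color = data)) +
  geom_line() +
  facet_wrap(~ metric, scales = "free_y")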
Notice that the training accuracy keeps improving while the validation accuracy levels off: a typical sign of overfitting. Common remedies are to add a layer_dropout, or to tell the model to stop running further epochs as soon as the validation accuracy drops (an early-stopping sketch follows after the evaluation below). For now, however, we just move on and check how the network performs on the held-out test set:
metrics <- network %>% evaluate(test_images, test_labels)
313/313 [==============================] - 0s 2ms/step - loss: 0.0692 - accuracy: 0.9821
metrics
loss accuracy
0.06924154 0.98210001
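As promised above, here is a minimal sketch of early stopping; the patience value of 2 is an arbitrary illustration. Note that calling fit() again would continue training the already-fitted network in place, so you would normally attach this callback before the first training run:
network %>% fit(
  x = train_images, y = train_labels,
  epochs = 50, batch_size = 128, validation_split = 0.25,
  # stop as soon as the validation loss has not improved for 2 epochs
  callbacks = list(callback_early_stopping(monitor = "val_loss", patience = 2))
)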
Let's briefly talk about tensors, the data structures behind all of this. In R, vectors are used to create and manipulate 1D tensors, and matrices are used for 2D tensors. For higher dimensions, array objects (which support any number of dimensions) are used. A tensor is defined by three key attributes: its number of axes (its rank), its shape, and its data type.
Remember that earlier we used the array_reshape() function, rather than dim(), to manipulate our input tensors.
train_images <- array_reshape(train_images, c(60000, 28 * 28))
str(train_images)
num [1:60000, 1:784] 0 0 0 0 0 0 0 0 0 0 ...
dim(train_images)
[1] 60000 784
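To make the three attributes tangible, we can inspect them for our (already reshaped and rescaled) training tensor:
length(dim(train_images)) # number of axes (rank): 2
dim(train_images)         # shape: 60000 784
typeof(train_images)      # data type: "double" (after dividing by 255)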
This is an R-specific subtlety: array_reshape() reinterprets the data using row-major semantics (as opposed to R's default column-major semantics), which is compatible with the way the numerical libraries called by Keras (NumPy, TensorFlow, and so on) interpret array dimensions. You should therefore always use array_reshape(), not dim(), when reshaping R arrays that will be passed to Keras.
x <- matrix(c(0:5),
nrow = 3, ncol = 2, byrow = TRUE)
x
[,1] [,2]
[1,] 0 1
[2,] 2 3
[3,] 4 5
x <- array_reshape(x, dim = c(3, 2))
x
[,1] [,2]
[1,] 0 1
[2,] 2 3
[3,] 4 5
x <- array_reshape(x, dim = c(2, 3))
x
[,1] [,2] [,3]
[1,] 0 1 2
[2,] 3 4 5
A commonly needed reshaping operation is transposition: exchanging the rows and columns of a matrix, so that x[i, ] becomes x[, i]. The t() function can be used to transpose a matrix:
x <- t(x)
x
[,1] [,2]
[1,] 0 3
[2,] 1 4
[3,] 2 5
rm(x)
layer <- layer_dense(units = 32, input_shape = c(784))
model <- keras_model_sequential() %>%
layer_dense(units = 32, input_shape = c(784)) %>%
layer_dense(units = 32)
model
Model
Model: "sequential_2"
___________________________________________________________________________________________________
Layer (type) Output Shape Param #
===================================================================================================
dense_4 (Dense) (None, 32) 25120
___________________________________________________________________________________________________
dense_5 (Dense) (None, 32) 1056
===================================================================================================
Total params: 26,176
Trainable params: 26,176
Non-trainable params: 0
___________________________________________________________________________________________________
# devtools::install_github("andrie/deepviz")
library(deepviz)
plot_model(model)
The second layer didn't receive an input shape argument; instead, it automatically inferred its input shape as the output shape of the layer that came before.
Picking the right network architecture is more an art than a science; and although there are some best practices and principles you can rely on, only practice can help you become a proper neural-network architect.
Here, we will limit ourselves to a simple feed-forward network, where every layer is only connected to the following one. For now, there are two key architecture decisions to be made about such a stack of dense layers: how many layers to use, and how many hidden units to choose for each layer.
rm(layer, model)
relu is the most popular activation function in deep learning, but there are many other candidates, which all come with similarly strange names: prelu, elu, and so on. A relu (rectified linear unit) is a function meant to zero out negative values, and it is commonly used for intermediate layers (formerly, almost all layers were modeled with sigmoid, but relu has since proven to work better for intermediate layers in most cases).
Our output layer, however, should model the class choice. In a 2-class problem we would commonly use a sigmoid function, which we already know from logistic regression models. It "squashes" arbitrary values into the [0, 1] interval, outputting something that can be interpreted as a probability.
However, since we have a multi-class prediction problem, we choose softmax, which squashes the output of each unit to be between 0 and 1, just like a sigmoid, but additionally divides each output such that the outputs sum to 1. The result is equivalent to a categorical probability distribution: it tells you the probability that each of the classes is true.
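To make this concrete, here is a hand-rolled softmax on made-up scores (the numbers are arbitrary):
softmax <- function(z) exp(z) / sum(exp(z))
scores <- c(2.0, 1.0, 0.1)
softmax(scores)      # approx. 0.659 0.242 0.099: each in [0, 1]
sum(softmax(scores)) # ... and together they sum to 1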
If you are interested in the different types of layers in Keras, check the reference site listing all implemented layers. The different types of activation functions are discussed HERE.
Once the network architecture is defined, you still have to choose two more things:
Loss function (objective function): The quantity that will be minimized during training. It represents a measure of success for the task at hand.
Optimizer: Determines how the network will be updated based on the loss function. It implements a specific variant of stochastic gradient descent (SGD).
Choosing the right objective function for the right problem is extremely important: your network will take any shortcut it can to minimize the loss; so if the objective doesn't fully correlate with success for the task at hand, your network will end up doing things you may not have wanted.
HERE you can find a brief overview of different loss functions. Fortunately, when it comes to common problems such as classification, regression, and sequence prediction, there are simple guidelines you can follow to choose the correct loss.
Take this rule of thumb as a good starter:
* Binary classification: last-layer activation sigmoid, loss binary_crossentropy
* Multiclass, single-label classification: last-layer activation softmax, loss categorical_crossentropy
* Multiclass, multilabel classification: last-layer activation sigmoid, loss binary_crossentropy
* Regression to arbitrary values: no last-layer activation, loss mse (mean squared error)
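To build some intuition for the categorical_crossentropy we used above, here it is computed by hand on made-up predictions; with one-hot labels the formula reduces to the negative log-probability assigned to the true class:
cross_entropy <- function(y_true, y_pred) -sum(y_true * log(y_pred))
y_true <- c(0, 1, 0)                    # one-hot: the true class is class 2
cross_entropy(y_true, c(0.1, 0.8, 0.1)) # confident and right: approx. 0.22
cross_entropy(y_true, c(0.4, 0.2, 0.4)) # unsure and wrong:    approx. 1.61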
Regarding the optimizer: we will cover that in more detail later. There are a bunch of different ones around, most of them variants of stochastic gradient descent (SGD), batch ("vanilla") gradient descent, or mini-batch gradient descent. In most cases, rmsprop (an unpublished adaptive learning-rate method proposed by Geoff Hinton) with standard learning rates works just fine.
Let's go back to the first example and review each piece of it in the light of what we have learned up to now. This was the input data:
mnist <- dataset_mnist()
train_images <- mnist$train$x
train_images <- array_reshape(train_images, c(60000, 28 * 28))
train_images <- train_images / 255
test_images <- mnist$test$x
test_images <- array_reshape(test_images, c(10000, 28 * 28))
test_images <- test_images / 255
Now we understand that the input images are stored in tensors of shape (60000, 784) (training data) and (10000, 784) (test data), respectively. This was our network:
network <- keras_model_sequential() %>%
layer_dense(units = 512, activation = "relu", input_shape = c(28*28)) %>%
layer_dense(units = 10, activation = "softmax")
Now we understand that layer_dense() creates fully connected layers, meaning there is a weight between every cell of one layer and every cell of the following layer. The hidden layer consists of 512 cells, the final output layer of 10 (equal to the number of classes to predict). Finally, we know that every cell also applies a non-linear activation function, such as relu, sigmoid, or softmax.
This was the network-compilation step:
network %>% compile(
optimizer = "rmsprop",
loss = "categorical_crossentropy",
metrics = "accuracy"
)
Now we understand that categorical_crossentropy (a measure of how pure the predicted classes are) is the loss function used as a feedback signal for learning the weight tensors, and which the training phase will attempt to minimize. The loss is reduced via the rmsprop optimizer passed as the first argument.
Finally, this was the training loop:
network %>% fit(x = train_images,
y = train_labels,
epochs = 10,
batch_size = 128)
Now we understand what happens when we call fit(): the network starts to iterate on the training data in mini-batches of 128 samples, 10 times over (each iteration over all the training data is called an "epoch").
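As a small sanity check, we can relate the batch size to the "352/352" steps shown in the progress bars of our earlier run; this is plain arithmetic:
ceiling(0.75 * 60000 / 128) # with validation_split = 0.25: 352 gradient updates per epoch
ceiling(60000 / 128)        # without a validation split:   469 updates per epoch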
At this point, you already know most of what there is to know about the basics of neural networks. However, there is still some more stuff to come; for that, there will be other sessions.
As a second example, let's classify penguin species in the Palmer Penguins dataset, this time preparing the data with tidymodels recipes.
library(tidymodels)
data <- read_csv("https://github.com/allisonhorst/palmerpenguins/raw/5b5891f01b52ae26ad8cb9755ec93672f49328a8/data/penguins_size.csv")
data %>% glimpse()
Rows: 344
Columns: 7
$ species_short <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", …
$ island <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torg…
$ culmen_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, …
$ culmen_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2, 17.1, 17.3, …
$ flipper_length_mm <dbl> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186, 180, 182, 191, 1…
$ body_mass_g <dbl> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250, 3300, 3700, …
$ sex <chr> "MALE", "FEMALE", "FEMALE", NA, "FEMALE", "MALE", "FEMALE", "MALE", NA…
data %<>%
rename(y = species_short) %>%
relocate(y) %>%
drop_na()
data_split <- initial_split(data, prop = 0.75, strata = y)
data_train <- data_split %>% training()
data_test <- data_split %>% testing()
data_recipe <- data_train %>%
recipe(y ~.) %>%
step_center(all_numeric(), -all_outcomes()) %>% # Centers all numeric variables to mean = 0
step_scale(all_numeric(), -all_outcomes()) %>% # scales all numeric variables to sd = 1
step_dummy(all_nominal(), one_hot = TRUE) %>%
prep()
x_train <- juice(data_recipe) %>% select(-starts_with('y')) %>% as.matrix()
x_test <- bake(data_recipe, new_data = data_test) %>% select(-starts_with('y')) %>% as.matrix()
y_train <- juice(data_recipe) %>% select(starts_with('y')) %>% as.matrix()
y_test <- bake(data_recipe, new_data = data_test) %>% select(starts_with('y')) %>% as.matrix()
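Before building the model, a quick sanity check (just a sketch) that the recipe produced numeric matrices of matching dimensions, with one one-hot column per species:
dim(x_train)      # rows: training penguins; columns: scaled numeric + dummy features
dim(y_train)      # same rows; one column per species
colnames(y_train) # e.g. "y_Adelie", "y_Chinstrap", "y_Gentoo" (naming assumed from step_dummy)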
model_keras <- keras_model_sequential()
model_keras %>%
# First hidden layer
layer_dense(
units = 12,
activation = "relu",
input_shape = ncol(x_train)) %>%
# Dropout to prevent overfitting
layer_dropout(rate = 0.1) %>%
# Second hidden layer
layer_dense(
units = 12,
activation = "relu") %>%
# Dropout to prevent overfitting
layer_dropout(rate = 0.1) %>%
# Output layer
layer_dense(
units = ncol(y_train),
activation = "softmax")
model_keras %>%
compile(
optimizer = "adam",
loss = "categorical_crossentropy",
metrics = "accuracy"
)
model_keras_hist <- model_keras %>% fit(x = x_train,
y = y_train,
epochs = 10, # How often shall we re-run the model on the whole sample
batch_size = 12, # How many observations should be included in every batch
validation_split = 0.25 # Fraction of the training data held out for validation
)
Epoch 1/10
16/16 [==============================] - 1s 47ms/step - loss: 1.2027 - accuracy: 0.3085 - val_loss: 1.3124 - val_accuracy: 0.0000e+00
Epoch 2/10
16/16 [==============================] - 0s 12ms/step - loss: 1.1194 - accuracy: 0.3830 - val_loss: 1.2318 - val_accuracy: 0.0000e+00
Epoch 3/10
16/16 [==============================] - 0s 12ms/step - loss: 1.0029 - accuracy: 0.5532 - val_loss: 1.1818 - val_accuracy: 0.0159
Epoch 4/10
16/16 [==============================] - 0s 13ms/step - loss: 0.9466 - accuracy: 0.6170 - val_loss: 1.1483 - val_accuracy: 0.0794
Epoch 5/10
16/16 [==============================] - 0s 13ms/step - loss: 0.8769 - accuracy: 0.6755 - val_loss: 1.1026 - val_accuracy: 0.2063
Epoch 6/10
16/16 [==============================] - 0s 12ms/step - loss: 0.8050 - accuracy: 0.7660 - val_loss: 1.0505 - val_accuracy: 0.5873
Epoch 7/10
16/16 [==============================] - 0s 13ms/step - loss: 0.7296 - accuracy: 0.8298 - val_loss: 0.9910 - val_accuracy: 0.6825
Epoch 8/10
16/16 [==============================] - 0s 12ms/step - loss: 0.6871 - accuracy: 0.8404 - val_loss: 0.9175 - val_accuracy: 0.7460
Epoch 9/10
16/16 [==============================] - 0s 15ms/step - loss: 0.6088 - accuracy: 0.8617 - val_loss: 0.8506 - val_accuracy: 0.7937
Epoch 10/10
16/16 [==============================] - 0s 12ms/step - loss: 0.5699 - accuracy: 0.8777 - val_loss: 0.7716 - val_accuracy: 0.8571
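So far we only trained; to judge the penguin model the same way as before, here is a sketch of evaluating on the held-out test set and inspecting the predicted species (the column-name matching assumes the one-hot columns created above):
model_keras %>% evaluate(x_test, y_test)                # loss & accuracy on unseen penguins
pred_probs <- model_keras %>% predict(x_test)           # class probabilities per penguin
pred_class <- colnames(y_test)[apply(pred_probs, 1, which.max)]
true_class <- colnames(y_test)[apply(y_test, 1, which.max)]
table(predicted = pred_class, observed = true_class)    # confusion table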
Using the Keras API and the reference manual (Python or R), construct additional simple models (with appropriate metrics):
Also consider the further resources mentioned below.
You can find more info about:
* keras here: excellent documentation, tutorials, and resources regarding keras, maintained by RStudio
* Datacamp: Introduction to TensorFlow in R (a bit low-level, but a good intro for starters). Also follow the Python intros; they might still be helpful for you.
Others
Books
sessionInfo()