Day 3: Watery deep learning with H2O

by Danielle Navarro, 29 Apr 2018

Maybe I should try playing around with deep learning? All the cool kids are doing it. A few moments on google turns up this R-bloggers post comparing several tools. My first thought is to try mxnet. The package installs but then doesn’t load because of a namespace issue. Okay. I try the h2o package (documentation) instead. The installation is 114Mb, so I have a little time.

I make a cup of tea

I attempt to load the package using library(h2o). R tells me that all I need to do is type h2o.init() and then go to http://localhost:54321 to use the web-based UI. I think to myself that this seems easy enough. Oh, sweet summer child.

I type h2o.init() and it hangs. Some dots print on the screen but there’s nothing happening besides that. Eventually it stops, and says it was unable to connect. I open the error logs. It turns out that h2o is complaining that I have the wrong version of Java. I have version 9 and it wants version 1.8. I shiver: is that an absurdly old version of Java? Apparently not. Version 1.8 is actually version 8, for reasons that escape me entirely. I google some more. It turns out that other people have the same problem. I try following the instructions for uninstalling Java, but Java is too powerful. It takes more than a mere sudo rm -rf command or three to remove it. I sigh.

My tea is cold

On a whim I decide not to bother with uninstalling Java and try to install the older version anyway. Doing this from the Oracle website does not go well. Gr. More time passes on google. I discover some instructions on how to install Java version 8 using Homebrew. Oh, Homebrew. Is there anything you can’t do?

There is something growing in my tea

I forge on with the Homebrew approach:

brew tap caskroom/versions
brew cask install java8

This does what it’s supposed to, and a quick check

ls /Library/Java/JavaVirtualMachines/

reveals that now there are two. There are two Java Virtual Machines (jdk-9.0.4.jdk and jdk1.8.0_172.jdk). I pause to make a deeply unfunny Sesame Street joke about counting JVMs.

Next I suppose I need to set the JAVA_HOME environment variable? Ugh, shell commands. The internet comes to my rescue again when I find this useful trick. Setting an alias for version 8 and version 9 like this

alias setJdk8='export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)'
alias setJdk9='export JAVA_HOME=$(/usr/libexec/java_home -v 9)'

means that I can toggle the Java version at the terminal.

> setJdk8
> java -version

java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)

> setJdk9
> java -version
java version "9.0.4"
Java(TM) SE Runtime Environment (build 9.0.4+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)
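Aliases defined at the prompt evaporate when the terminal window closes, so if I want the toggle to survive I should probably stash it in my shell profile. A sketch, assuming the default bash shell on macOS at the time (zsh users would use ~/.zshrc instead):

```shell
# Append the JDK toggles to the bash profile so every new shell has them
cat >> ~/.bash_profile <<'EOF'
alias setJdk8='export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)'
alias setJdk9='export JAVA_HOME=$(/usr/libexec/java_home -v 9)'
EOF
```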

I have forgotten my purpose here. What was I doing? Why am I doing it? I am at war with Java. I have always been at war with Java. No, wait. Wasn’t I trying to do something with water? Or the depths? Oh, the h2o package. A million years ago when I started writing this post, the advice from the internet was that I’d need to start h2o from the system prompt, and then connect R to that running instance rather than letting h2o.init() spin up its own.

Something is moving in my tea cup

Okay so where is my h2o installation? I open R.

lib <- .libPaths()[1]               # first entry in my R library paths
path <- paste0(lib, "/h2o/java/")   # directory where the jar lives
list.files(path)                    # what's in there?
## [1] "h2o.jar"

My shell command ends up being

setJdk8; java -jar /Library/Frameworks/R.framework/Versions/3.4/Resources/library/h2o/java/h2o.jar

A wall of text cascades like rain down my terminal. From within R I call

h2o.init(startH2O = FALSE)

A browser window opens. It looks suspiciously like a Jupyter notebook. I am making progress!
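Just to be sure R really is talking to the instance I started from the terminal, it seems worth poking the cluster. A sketch; h2o.clusterInfo() prints details about whatever cluster R is currently attached to:

```r
library(h2o)

# Attach to the h2o instance already running at localhost:54321,
# rather than spawning a new one
h2o.init(startH2O = FALSE)

# Print version, number of nodes, and memory for the attached cluster
h2o.clusterInfo()
```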

The tea critters have tentacles and locomote independently

I’m sure I’m not going to get much further than this today. Before I close my eyes and drown my sorrows, it would be nice to check that h2o is working and that I do now have the ability to invoke the deep learning gods. The function call h2o.deeplearning() seems a little less than simple:

h2o.deeplearning(x, y, training_frame, model_id = NULL,
  validation_frame = NULL, nfolds = 0,
  keep_cross_validation_predictions = FALSE,
  keep_cross_validation_fold_assignment = FALSE, fold_assignment = c("AUTO",
  "Random", "Modulo", "Stratified"), fold_column = NULL,
  ignore_const_cols = TRUE, score_each_iteration = FALSE,
  weights_column = NULL, offset_column = NULL, balance_classes = FALSE,
  class_sampling_factors = NULL, max_after_balance_size = 5,
  max_hit_ratio_k = 0, checkpoint = NULL, pretrained_autoencoder = NULL,
  overwrite_with_best_model = TRUE, use_all_factor_levels = TRUE,
  standardize = TRUE, activation = c("Tanh", "TanhWithDropout", "Rectifier",
  "RectifierWithDropout", "Maxout", "MaxoutWithDropout"), hidden = c(200,
  200), epochs = 10, train_samples_per_iteration = -2,
  target_ratio_comm_to_comp = 0.05, seed = -1, adaptive_rate = TRUE,
  rho = 0.99, epsilon = 1e-08, rate = 0.005, rate_annealing = 1e-06,
  rate_decay = 1, momentum_start = 0, momentum_ramp = 1e+06,
  momentum_stable = 0, nesterov_accelerated_gradient = TRUE,
  input_dropout_ratio = 0, hidden_dropout_ratios = NULL, l1 = 0, l2 = 0,
  max_w2 = 3.4028235e+38, initial_weight_distribution = c("UniformAdaptive",
  "Uniform", "Normal"), initial_weight_scale = 1, initial_weights = NULL,
  initial_biases = NULL, loss = c("Automatic", "CrossEntropy", "Quadratic",
  "Huber", "Absolute", "Quantile"), distribution = c("AUTO", "bernoulli",
  "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace",
  "quantile", "huber"), quantile_alpha = 0.5, tweedie_power = 1.5,
  huber_alpha = 0.9, score_interval = 5, score_training_samples = 10000,
  score_validation_samples = 0, score_duty_cycle = 0.1,
  classification_stop = 0, regression_stop = 1e-06, stopping_rounds = 5,
  stopping_metric = c("AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE",
  "RMSLE", "AUC", "lift_top_group", "misclassification",
  "mean_per_class_error"), stopping_tolerance = 0, max_runtime_secs = 0,
  score_validation_sampling = c("Uniform", "Stratified"),
  diagnostics = TRUE, fast_mode = TRUE, force_load_balance = TRUE,
  variable_importances = TRUE, replicate_training_data = TRUE,
  single_node_mode = FALSE, shuffle_training_data = FALSE,
  missing_values_handling = c("MeanImputation", "Skip"), quiet_mode = FALSE,
  autoencoder = FALSE, sparse = FALSE, col_major = FALSE,
  average_activation = 0, sparsity_beta = 0,
  max_categorical_features = 2147483647, reproducible = FALSE,
  export_weights_and_biases = FALSE, mini_batch_size = 1,
  categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit",
  "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
  elastic_averaging = FALSE, elastic_averaging_moving_rate = 0.9,
  elastic_averaging_regularization = 0.001, verbose = FALSE)

Perhaps I will decipher this strange inscription another time. There is, however, a simple example using the iris data. I mindlessly cut and paste the code:

h2o.init(startH2O = FALSE)
iris.hex <- as.h2o(iris)
iris.dl <- h2o.deeplearning(x = 1:4, y = 5, training_frame = iris.hex, seed=123456)
predictions <- h2o.predict(iris.dl, iris.hex)
## <environment: 0x7fcd46f3e3f0>
## attr(,"class")
## [1] "H2OFrame"
## attr(,"op")
## [1] "predictions_99ac_DeepLearning_model_R_1524913327346_4_on_iris"
## attr(,"id")
## [1] "predictions_99ac_DeepLearning_model_R_1524913327346_4_on_iris"
## attr(,"eval")
## [1] FALSE
## attr(,"nrow")
## [1] 150
## attr(,"ncol")
## [1] 4
## attr(,"types")
## attr(,"types")[[1]]
## [1] "enum"
## attr(,"types")[[2]]
## [1] "real"
## attr(,"types")[[3]]
## [1] "real"
## attr(,"types")[[4]]
## [1] "real"
## attr(,"data")
##    predict    setosa   versicolor    virginica
## 1   setosa 0.9999972 2.813364e-06 1.141541e-22
## 2   setosa 0.9999558 4.422399e-05 1.026746e-20
## 3   setosa 0.9999975 2.508734e-06 3.222464e-22
## 4   setosa 0.9999946 5.424330e-06 2.625707e-21
## 5   setosa 0.9999991 8.562619e-07 3.776109e-23
## 6   setosa 0.9999971 2.911772e-06 2.107533e-21
## 7   setosa 0.9999995 5.057584e-07 2.707050e-22
## 8   setosa 0.9999957 4.287183e-06 3.881659e-22
## 9   setosa 0.9999925 7.534163e-06 9.412641e-21
## 10  setosa 0.9999786 2.138619e-05 1.336519e-21
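Those probabilities look plausible, but before declaring victory a quick sanity check on the predictions seems wise. A sketch, assuming the iris.dl model and iris.hex frame from above are still sitting in the session:

```r
# Pull the predictions back from the h2o cluster into an ordinary data frame
pred <- as.data.frame(predictions)

# Proportion of the 150 flowers classified correctly -- training accuracy
# only, mind you; an honest check would use held-out data
mean(pred$predict == iris$Species)

# h2o's own confusion matrix for the fitted model
h2o.confusionMatrix(iris.dl)
```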

Yessss. It’s up and running. I call that a success. I think tomorrow I’ll do something different and return to the sunless seas of deep learning another time.

The tea squid have evolved a rudimentary language.
I’m so proud of them.