Day 3: Watery deep learning with H2O
by Danielle Navarro, 29 Apr 2018
Maybe I should try playing around with deep learning? All the cool kids are doing it. A few moments on google turns up this R-bloggers post comparing several tools. My first thought is to try
mxnet. The package installs but then doesn’t load because of a namespace issue. Okay. I try the
h2o package (documentation) instead. The installation is 114 MB, so I have a little time.
I make a cup of tea
I attempt to load the package using
library(h2o). R tells me that all I need to do is type
h2o.init() and then go to http://localhost:54321 to use the web-based UI. I think to myself that this seems easy enough. Oh, sweet summer child.
I type h2o.init() and it hangs. Some dots print on the screen but there’s nothing happening besides that. Eventually it stops, and says it was unable to connect. I open the error logs. It turns out that h2o is complaining that I have the wrong version of Java. I have version 9 and it wants version 1.8. I shiver: is that an absurdly old version of Java? Apparently not. Version 1.8 is actually version 8, for reasons that escape me entirely. I google some more. It turns out that other people have the same problem. I try following the instructions for uninstalling Java, but Java is too powerful. It takes more than a mere
sudo rm -rf command or three to remove it. I sigh.
My tea is cold
On a whim I decide not to bother with uninstalling Java and try to install the older version anyway. Doing this from the Oracle website does not go well. Gr. More time passes on google. I discover some instructions on how to install Java version 8 using Homebrew. Oh, Homebrew. Is there anything you can’t do?
There is something growing in my tea
I forge on with the Homebrew approach:
brew tap caskroom/versions
brew cask install java8
This does what it’s supposed to, and a quick check
reveals that now there are two. There are two Java Virtual Machines (
jdk1.8.0_172.jdk). I pause to make a deeply unfunny Sesame Street joke about counting JVMs.
Next I suppose I need to set the JAVA_HOME environment variable? Ugh, shell commands. The internet comes to my rescue again when I find this useful trick. Setting an alias for version 8 and version 9 like this
alias setJdk8='export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)'
alias setJdk9='export JAVA_HOME=$(/usr/libexec/java_home -v 9)'
means that I can toggle the Java version at the terminal.
> setJdk8
> java -version
java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
> setJdk9
> java -version
java version "9.0.4"
Java(TM) SE Runtime Environment (build 9.0.4+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)
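The two outputs above also show the naming quirk from earlier: "1.8.0_172" means Java 8, while "9.0.4" means Java 9. A script that wants to confirm the right JVM is active before launching anything has to handle both shapes. Here is a minimal sketch, assuming version strings look like the ones above; the function names java_major and active_java_version are hypothetical and not part of any tool mentioned here.

```shell
# java_major: map a Java version string to its major version number.
# Legacy strings such as "1.8.0_172" mean major version 8;
# newer strings such as "9.0.4" mean major version 9.
java_major() {
  case "$1" in
    1.*)
      # Legacy scheme: drop the leading "1." and keep the next component.
      v=${1#1.}
      printf '%s\n' "${v%%.*}"
      ;;
    *)
      # Newer scheme: the first component is the major version.
      printf '%s\n' "${1%%.*}"
      ;;
  esac
}

# active_java_version: pull the quoted version string out of `java -version`.
# Note that `java -version` writes to stderr, hence the 2>&1.
active_java_version() {
  java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}
```

After toggling with one of the aliases, running java_major "$(active_java_version)" should report the major version of whichever JVM the shell currently resolves.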
I have forgotten my purpose here. What was I doing? Why am I doing it? I am at war with Java. I have always been at war with Java. No, wait. Wasn’t I trying to do something with water? Or the depths? Oh, the
h2o package. A million years ago when I started writing this post, the advice from the internet was that I’d need to start h2o from the system prompt, and then link R to h2o without starting it.
Something is moving in my tea cup
Okay so where is my h2o installation? I open R.
lib <- .libPaths() # rlibraries
path <- paste0(lib,"/h2o/java/") # directory where the jvm is
list.files(path)
## [1] "h2o.jar"
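One wrinkle: .libPaths() can return more than one directory, so the jar might live under any of them. The same lookup can be sketched on the shell side, assuming the h2o/java/h2o.jar layout shown above. The helper name find_h2o_jar is hypothetical, and the directories passed to it are whatever .libPaths() reports on the machine in question.

```shell
# find_h2o_jar: print the path of the first h2o.jar found under the given
# R library directories, and return non-zero if none of them contains it.
# Assumes the <library>/h2o/java/h2o.jar layout used by the h2o R package.
find_h2o_jar() {
  for lib in "$@"; do
    jar="$lib/h2o/java/h2o.jar"
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  return 1
}

# Example usage (commented out because it needs Java and an h2o install):
# jar=$(find_h2o_jar /Library/Frameworks/R.framework/Versions/3.4/Resources/library) \
#   && java -jar "$jar"
```

Checking that the file exists before launching means a wrong library path fails quietly up front instead of producing a Java error about a missing jar.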
My shell command ends up being
setJdk8; java -jar /Library/Frameworks/R.framework/Versions/3.4/Resources/library/h2o/java/h2o.jar
A wall of text cascades like rain down my terminal. From within R I call
h2o.init(startH2O = FALSE)
browseURL("http://localhost:54321")
A browser window opens. It looks suspiciously like a Jupyter notebook. I am making progress!
The tea critters have tentacles and locomote independently
I’m sure I’m not going to get much further than this today. Before I close my eyes and drown my sorrows, it would be nice to check that
h2o is working and that I do now have the ability to invoke the deep learning gods. The function call
h2o.deeplearning() seems a little less than simple:
h2o.deeplearning(x, y, training_frame, model_id = NULL,
  validation_frame = NULL, nfolds = 0,
  keep_cross_validation_predictions = FALSE,
  keep_cross_validation_fold_assignment = FALSE,
  fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"),
  fold_column = NULL, ignore_const_cols = TRUE,
  score_each_iteration = FALSE, weights_column = NULL,
  offset_column = NULL, balance_classes = FALSE,
  class_sampling_factors = NULL, max_after_balance_size = 5,
  max_hit_ratio_k = 0, checkpoint = NULL,
  pretrained_autoencoder = NULL, overwrite_with_best_model = TRUE,
  use_all_factor_levels = TRUE, standardize = TRUE,
  activation = c("Tanh", "TanhWithDropout", "Rectifier",
    "RectifierWithDropout", "Maxout", "MaxoutWithDropout"),
  hidden = c(200, 200), epochs = 10,
  train_samples_per_iteration = -2,
  target_ratio_comm_to_comp = 0.05, seed = -1,
  adaptive_rate = TRUE, rho = 0.99, epsilon = 1e-08, rate = 0.005,
  rate_annealing = 1e-06, rate_decay = 1, momentum_start = 0,
  momentum_ramp = 1e+06, momentum_stable = 0,
  nesterov_accelerated_gradient = TRUE, input_dropout_ratio = 0,
  hidden_dropout_ratios = NULL, l1 = 0, l2 = 0,
  max_w2 = 3.4028235e+38,
  initial_weight_distribution = c("UniformAdaptive", "Uniform", "Normal"),
  initial_weight_scale = 1, initial_weights = NULL,
  initial_biases = NULL,
  loss = c("Automatic", "CrossEntropy", "Quadratic", "Huber",
    "Absolute", "Quantile"),
  distribution = c("AUTO", "bernoulli", "multinomial", "gaussian",
    "poisson", "gamma", "tweedie", "laplace", "quantile", "huber"),
  quantile_alpha = 0.5, tweedie_power = 1.5, huber_alpha = 0.9,
  score_interval = 5, score_training_samples = 10000,
  score_validation_samples = 0, score_duty_cycle = 0.1,
  classification_stop = 0, regression_stop = 1e-06,
  stopping_rounds = 5,
  stopping_metric = c("AUTO", "deviance", "logloss", "MSE", "RMSE",
    "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification",
    "mean_per_class_error"),
  stopping_tolerance = 0, max_runtime_secs = 0,
  score_validation_sampling = c("Uniform", "Stratified"),
  diagnostics = TRUE, fast_mode = TRUE, force_load_balance = TRUE,
  variable_importances = TRUE, replicate_training_data = TRUE,
  single_node_mode = FALSE, shuffle_training_data = FALSE,
  missing_values_handling = c("MeanImputation", "Skip"),
  quiet_mode = FALSE, autoencoder = FALSE, sparse = FALSE,
  col_major = FALSE, average_activation = 0, sparsity_beta = 0,
  max_categorical_features = 2147483647, reproducible = FALSE,
  export_weights_and_biases = FALSE, mini_batch_size = 1,
  categorical_encoding = c("AUTO", "Enum", "OneHotInternal",
    "OneHotExplicit", "Binary", "Eigen", "LabelEncoder",
    "SortByResponse", "EnumLimited"),
  elastic_averaging = FALSE, elastic_averaging_moving_rate = 0.9,
  elastic_averaging_regularization = 0.001, verbose = FALSE)
Perhaps I will decipher this strange inscription another time. There is, however, a simple example using the
iris data. I mindlessly cut and paste the code:
library(h2o)
h2o.init(startH2O = FALSE)
iris.hex <- as.h2o(iris)
iris.dl <- h2o.deeplearning(x = 1:4, y = 5, training_frame = iris.hex, seed=123456)
predictions <- h2o.predict(iris.dl, iris.hex)
print(predictions)
## <environment: 0x7fcd46f3e3f0>
## attr(,"class")
## [1] "H2OFrame"
## attr(,"op")
## [1] "predictions_99ac_DeepLearning_model_R_1524913327346_4_on_iris"
## attr(,"id")
## [1] "predictions_99ac_DeepLearning_model_R_1524913327346_4_on_iris"
## attr(,"eval")
## [1] FALSE
## attr(,"nrow")
## [1] 150
## attr(,"ncol")
## [1] 4
## attr(,"types")
## attr(,"types")[[1]]
## [1] "enum"
##
## attr(,"types")[[2]]
## [1] "real"
##
## attr(,"types")[[3]]
## [1] "real"
##
## attr(,"types")[[4]]
## [1] "real"
##
## attr(,"data")
##    predict    setosa   versicolor    virginica
## 1   setosa 0.9999972 2.813364e-06 1.141541e-22
## 2   setosa 0.9999558 4.422399e-05 1.026746e-20
## 3   setosa 0.9999975 2.508734e-06 3.222464e-22
## 4   setosa 0.9999946 5.424330e-06 2.625707e-21
## 5   setosa 0.9999991 8.562619e-07 3.776109e-23
## 6   setosa 0.9999971 2.911772e-06 2.107533e-21
## 7   setosa 0.9999995 5.057584e-07 2.707050e-22
## 8   setosa 0.9999957 4.287183e-06 3.881659e-22
## 9   setosa 0.9999925 7.534163e-06 9.412641e-21
## 10  setosa 0.9999786 2.138619e-05 1.336519e-21
Yessss. It’s up and running. I call that a success. I think tomorrow I’ll do something different and return to the sunless seas of deep learning another time.
The tea squid have evolved a rudimentary language.
I’m so proud of them.