Porting MXNet's Neural-Style Python example to MXNet.jl

Posted on January 11, 2016 by Brian Cohen

The following was copied from https://github.com/dmlc/MXNet.jl/issues/56 to document what was needed to port Neural Style (which transfers the “style” of one image to another using an image-recognition neural network); it may be modified from the original to add context.

These notes intend to document what I’ve had to do to get a working implementation of A Neural Algorithm of Artistic Style from within Julia. These may or may not be applicable for other examples ported from Python to Julia, but are here for reference.

MXNet.jl

argparse to ArgParse.jl

This library plays the same role as Python’s argparse: it parses command-line options so the script can be run as a command-line program.
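For reference, here is a minimal sketch of what that looks like with ArgParse.jl; the option names below are illustrative, not the actual flags of the neural-style script.

using ArgParse

s = ArgParseSettings()
@add_arg_table s begin
    "--content-image"
        help = "path to the content image"
        required = true
    "--max-num-epochs"
        help = "number of optimization iterations"
        arg_type = Int
        default = 1000
end
args = parse_args(s)   # a Dict keyed by option name, analogous to argparse's Namespace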

Named tuples to Composite types

This translation is pretty easy, and Julia’s built-in composite types are very straightforward to work with compared to Python’s named tuples, which require from collections import namedtuple. For the ConvExecutor, it means going from

Executor = namedtuple('Executor', ['executor', 'data', 'data_grad'])

to

type SGExecutor
    executor :: mx.Executor
    data :: mx.NDArray
    data_grad :: mx.NDArray
end

I also renamed it just to avoid ambiguity. The field type annotations are optional, but I included them for type safety.

Row-major order to Column-major order arrays

If interested, read the Wikipedia article on row- and column-major order, but summarized: Julia tends to think of a matrix as an array of column vectors, while Python natively stores it as a list of row lists. The major difference when dealing with multi-dimensional arrays is that the ordering of the shape is reversed.

Example:

out.infer_shape(data=(1, 3, input_size[0], input_size[1]))

would become the following in Julia

mx.infer_shape(out, data=(input_size[1], input_size[2], 3, 1))

Memory leaks

As of Julia v0.4, updating operators (a ⊗= b for a binary operation ⊗ in [-, +, /, *]) cannot be overloaded directly: a ⊗= b simply lowers to a = a ⊗ b, which allocates a new NDArray. So for lines that take the form a[:] ⊗= ..., you can take one of several approaches to avoid unnecessary use of graphics memory, all of which are described in the NDArray source code:

  1. Copy to a Julia array with b = copy(a::NDArray), perform your operations on b, and then copy!(a, b) back into the NDArray.
  2. Use some combination of mx.mult_to!, mx.div_from!, etc. to modify the NDArray in place.
  3. Use the macro mx.@nd_as_jl to work as if you were using native Julia arrays.

The macro takes the argument ro for NDArrays that are only read and rw for those that are written to; it copies everything to native Julia arrays, runs the block, and then writes the rw arrays back into their NDArrays.
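As a concrete illustration, here is a minimal sketch of approaches 1 and 3. The variable names (data, grad, lr) are mine rather than from the original port; data and grad are assumed to be mx.NDArrays of the same shape and lr a scalar learning rate.

using MXNet

# Approach 1: round-trip through Julia arrays on the host.
b = copy(data)            # NDArray -> Julia Array
b -= lr * copy(grad)      # ordinary array arithmetic; no new NDArrays on the GPU
copy!(data, b)            # write the result back into the existing NDArray

# Approach 3: mx.@nd_as_jl exposes the NDArrays as Julia arrays inside the block.
mx.@nd_as_jl ro=grad rw=data begin
    data[:] -= lr * grad  # the rw array is written back to its NDArray afterwards
end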

Implement Factor-based Learning Rate

This required modifying MXNet.jl’s src/optimizer.jl to include a previously unimplemented subtype of AbstractLearningRateScheduler so that it can cooperate with the Stochastic Gradient Descent optimizer. There may be a better way to do this; I’m also not entirely comfortable with the robustness of my implementation as of the time of writing.
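The factor-based schedule itself is simple: the learning rate is multiplied by a constant factor every fixed number of updates. Here is a minimal sketch of the arithmetic, with illustrative names (FactorScheduler, get_lr) rather than the exact hook that src/optimizer.jl expects.

# lr_t = base_lr * factor^floor(t / step)
type FactorScheduler
    base_lr :: Float64   # initial learning rate
    step    :: Int       # decay every `step` updates
    factor  :: Float64   # multiplier applied at each decay
end

get_lr(s :: FactorScheduler, num_update :: Int) =
    s.base_lr * s.factor ^ div(num_update, s.step)

# e.g. get_lr(FactorScheduler(0.1, 80, 0.9), 160) gives 0.1 * 0.9^2 = 0.081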