Deep thoughts on neural networks

 

Taking off on our own

The emergence of GPT has made discussions on deep neural networks and LLMs (Large Language Models), which are themselves an application of deep neural networks, almost plain English; the topic is so common now that it is no longer limited to nerd talk. Against that backdrop, we noticed a stark difference in one of our customer engagements. We found ourselves perplexed by the choice of neural network architecture, particularly the layering. How many layers, what shape the input should take, and so on had never been given enough time in our heads, because we knew our own domain well. It was only when we got out of our comfort zone that we felt that pain. Recollecting our own past, we remembered that we did struggle when we started, but we never spent much time thinking it through. We fixed ourselves some tutorials, started cracking out code, and over time the heuristic knowledge served us well. That has come back to bite us today with this customer. Had we filled the holes in our understanding with first principles, we would have been in a better position to explain the phenomena, or to reason about why X layers and why the input should be shaped [x, y]. That is when we fired up our favourite text editor and set down some directions, or call them hacks, which we thought we would put across for our readers to lend some direction to their thoughts.
Assuming you are working in Keras (it could just as easily be any other framework, no worries), your modelling code will resemble:
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# load the dataset ([Your data file] is a placeholder for your CSV path)
windProfile = loadtxt([Your data file], delimiter=',')

# split into input (X) and output (y) variables
X = windProfile[:, 0:14]
y = windProfile[:, 14]

# define the model as a simple stack of fully connected (Dense) layers
model = Sequential()
model.add(Dense(18, input_shape=(14,), activation='relu'))
model.add(Dense(14, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# compile the model, then fit it on the data
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=80, batch_size=20, verbose=0)

# sanity-check predictions on a few samples
efficiency = (model.predict(X) > 0.5).astype(int)
for index in range(5):
    print('%s => %d (expected %d)' % (X[index].tolist(), efficiency[index][0], y[index]))
We can put this eloquently as:
1. Read data
2. Determine the type of model
3. Stack the layers for the model processing
4. Compile the designed model
5. Fit the model on the data
6. Test on sample data

This article throws more light on two of those steps – determining the type of model and stacking the layers for the model processing.
The template
Start simple
You will have come across this suggestion often, and it applies to modelling activities too. If a neural network is what you are building, start simple: start with Sequential. Keras also offers the Functional API, which allows reusing (sharing) layers and building non-linear topologies – a directed acyclic graph of layers rather than a single stack. Venture there only when you are well aware of its fitment to your problem statement, i.e., when you genuinely feel the need for a non-Sequential arrangement of layers; until then there is no harm in sticking with Sequential. Similarly, the fully connected layer serves a multitude of purposes, so sticking with the Dense layer makes sense.
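To make the contrast concrete, here is a minimal sketch of our own (not part of the snippet above) showing the same stack written both ways; the Functional API only starts paying off when the graph of layers stops being a straight line.
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Input

# Sequential: a straight stack of layers
seq_model = Sequential([
    Dense(18, input_shape=(14,), activation='relu'),
    Dense(14, activation='relu'),
    Dense(1, activation='sigmoid'),
])

# Functional API: the same stack, but each layer is wired explicitly,
# which is what later lets you branch, merge, or reuse layers
inputs = Input(shape=(14,))
hidden = Dense(18, activation='relu')(inputs)
hidden = Dense(14, activation='relu')(hidden)
outputs = Dense(1, activation='sigmoid')(hidden)
func_model = Model(inputs=inputs, outputs=outputs)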
Answering the why: do we need multiple layers?
The answer in the present age is "Yes, duh!", but we mostly get silence when we pose the follow-up question – why? We solved this for ourselves by asking – can we model the problem linearly? So, if we are doing classification, the screening question is – do we have only two classes? If we pass that filter, we then ask – can we draw a line through the data and separate the classes? You will concur that the answer to these two questions is almost always no. Almost all real-life problems go beyond binary classes and are not linearly separable. Though this serves as a yardstick, the curious mind asks – what is the relation of a layer to classes or linear separability? Recall that a layer is nothing but a grouping of nodes. Each node performs a mathematical function on its input and passes the result on if an activation function's criterion is satisfied, otherwise not. If there is only one layer, this operation is performed only once, and so the problem space we model must also be describable by that one operation. The exact mathematical function matters less than the number of times such a transformation is applied.
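The classic illustration is XOR, which is not linearly separable. A toy sketch of our own along these lines (not part of the snippet above) shows that a model with no hidden layer – a single linear decision boundary – cannot classify all four XOR points correctly, while adding one hidden layer typically can.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# XOR: no single straight line separates the two classes
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# one layer only: a single sigmoid unit, i.e. a linear decision boundary
shallow = Sequential([Dense(1, input_shape=(2,), activation='sigmoid')])
shallow.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
shallow.fit(X, y, epochs=2000, verbose=0)

# add one hidden layer: the extra transformation bends the boundary
deeper = Sequential([
    Dense(8, input_shape=(2,), activation='relu'),
    Dense(1, activation='sigmoid'),
])
deeper.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
deeper.fit(X, y, epochs=2000, verbose=0)

print('no hidden layer :', shallow.evaluate(X, y, verbose=0)[1])
print('one hidden layer:', deeper.evaluate(X, y, verbose=0)[1])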
You will have realised by now that there is no single recipe for determining the number of layers. Each problem, depending on its training data, can be modelled in multiple ways, and thus model.add will be called a different number of times from problem space to problem space.
Determining the count
In the example code above, we created a model with 3 layers; they can be spotted by the calls to model.add. Now, for any given real-life problem, how do you determine how many are sufficient? To begin with it appears a daunting task, but there is a method to the madness.
Trial and error
This is the best-known approach: try for yourself, and if you see that the model fitness is not satisfactory, keep trying more. The flip side is that there is no method to it; it is heuristic, and only time and your understanding of the domain will help you shorten the cycles.
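As a rough illustration (our own sketch, assuming the X and y from the snippet above, with illustrative epoch and layer-width settings), the trial-and-error loop can be as plain as sweeping the number of hidden layers and watching a held-out validation score:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def val_accuracy_for_depth(num_hidden_layers, X, y):
    # build a stack with the requested number of hidden layers
    model = Sequential()
    model.add(Dense(18, input_shape=(14,), activation='relu'))
    for _ in range(num_hidden_layers):
        model.add(Dense(14, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # hold back 20% of the data to judge the fit
    history = model.fit(X, y, epochs=80, batch_size=20, validation_split=0.2, verbose=0)
    return history.history['val_accuracy'][-1]

for depth in range(1, 5):
    print('hidden layers:', depth, 'validation accuracy:', val_accuracy_for_depth(depth, X, y))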
Art of reduction
A pragmatic approach is to start deep. Add as many layers as possible.
But wait.
How do I know how many layers can be added?
Remember, layers contain nodes and nodes perform a mathematical function, so "how many layers?" can be answered by flipping the question to "how many can you afford computationally?" Can you afford to train for days and weeks? Can you afford to model at that cost (the cost of running the compute resources)? Those are practical ways to find how deep you can go.
Once you hit the ceiling (or better, the floor) of depth, you keep removing layers as long as doing so does not detrimentally impact your model metrics. Does this sound like trial and error? We beg to differ; this is a calculated approach where an economic will drives the modelling exercise.
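Continuing the earlier sketch (and reusing the hypothetical val_accuracy_for_depth helper defined above; the budget, tolerance and starting depth are illustrative assumptions), the reduction pass might look like this: start at the deepest configuration you can afford and peel layers off while the validation score holds within a tolerance.
# start as deep as the budget allows, then remove layers one at a time
max_affordable_depth = 8   # assumption: set by your compute budget
tolerance = 0.01           # acceptable drop in validation accuracy

baseline = val_accuracy_for_depth(max_affordable_depth, X, y)
chosen_depth = max_affordable_depth
for depth in range(max_affordable_depth - 1, 0, -1):
    score = val_accuracy_for_depth(depth, X, y)
    if score >= baseline - tolerance:
        chosen_depth = depth   # shallower model is just as good, keep shrinking
    else:
        break                  # removing this layer hurt the metric, stop here
print('chosen number of hidden layers:', chosen_depth)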

If you have good sponsors who are willing to spend on computing when it is justified, consider a search, which can make the art of reduction even more scientific. Search algorithms run multiple layer configurations, mathematically minimise the losses, and suggest the best network architecture.
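One way to run such a search in the Keras ecosystem is KerasTuner; the sketch below is our own illustration (the parameter names, ranges and trial counts are assumptions, and X and y are from the snippet above), letting a random search pick the number of hidden layers and their widths.
import keras_tuner as kt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_model(hp):
    model = Sequential()
    model.add(Dense(18, input_shape=(14,), activation='relu'))
    # let the tuner choose how many hidden layers and how wide each one is
    for i in range(hp.Int('num_layers', 1, 6)):
        model.add(Dense(hp.Int('units_%d' % i, 8, 64, step=8), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

tuner = kt.RandomSearch(build_model, objective='val_accuracy', max_trials=10, overwrite=True)
tuner.search(X, y, epochs=30, validation_split=0.2, verbose=0)
print(tuner.get_best_hyperparameters(1)[0].values)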
Read literature
If you have the time and the cost of going wrong is high, it is appropriate to spend time reading up on the domain of your problem statement. What others have done to solve problems similar to yours in that domain, and what kind of layer architecture they used, will help you make educated guesses, and from there you can work your way towards better modelling metrics.