Last reviewed / updated 2023-07-26
.NET 7 introduced Generic Math; this allows code to perform mathematical operations on abstract numeric interface types (such as INumber&lt;T&gt;) instead of concrete types such as double- and single-precision IEEE floating-point numbers.
SharpNEAT currently employs double-precision floating-point arithmetic throughout, e.g., for connection weights and neural net computations. With the introduction of Generic Math in .NET 7, it should become much simpler to support single-precision floats, without having to duplicate large amounts of code solely to change the floating-point type.
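As an illustration of what this enables, here is a minimal sketch; the NeuralMaths class and WeightedSum method are hypothetical names invented for the example, not part of SharpNEAT:

```csharp
using System;
using System.Numerics;

public static class NeuralMaths
{
    // One generic method body serves both float and double. T is resolved
    // at compile time, so the JIT emits specialized native code for each
    // precision; there is no boxing or virtual dispatch in the loop.
    public static T WeightedSum<T>(ReadOnlySpan<T> inputs, ReadOnlySpan<T> weights)
        where T : struct, IFloatingPointIeee754<T>
    {
        T sum = T.Zero;
        for (int i = 0; i < inputs.Length; i++)
            sum += inputs[i] * weights[i];
        return sum;
    }
}
```

The same call site then works at either precision, e.g., WeightedSum&lt;float&gt;(...) or WeightedSum&lt;double&gt;(...), rather than requiring two hand-maintained copies of the code.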
Using single-precision floats should offer performance improvements. Each single-precision value needs only half the memory compared to double-precision floats. This not only improves transfer speeds between main memory and the CPU but also doubles the number of values that can be accommodated in the same CPU cache space or stored in a CPU SIMD vector register.
SharpNEAT already contains some code that uses SIMD CPU instructions via the Vector&lt;T&gt; struct. However, further performance gains may be possible by using hardware intrinsics (the System.Runtime.Intrinsics APIs introduced in .NET Core 3.0), as these avoid the abstraction layer provided by Vector&lt;T&gt; and expose more of the capabilities of the underlying CPU hardware. For instance, the neural net code may benefit from leveraging scatter/gather SIMD instructions, or vector dot product instructions.
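For illustration, here is a sketch of a single-precision dot product written directly against the AVX/FMA intrinsics; this is not SharpNEAT code, and it assumes Avx.IsSupported and Fma.IsSupported are both true (a real implementation would fall back to Vector&lt;T&gt; or scalar code otherwise), plus AllowUnsafeBlocks in the project file:

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class IntrinsicsSketch
{
    public static unsafe float DotProduct(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        Vector256<float> acc = Vector256<float>.Zero;
        int i = 0;
        fixed (float* pa = a, pb = b)
        {
            // Process 8 floats per iteration (a 256-bit register holds
            // 8 singles, versus 4 doubles).
            for (; i <= a.Length - 8; i += 8)
            {
                // acc += a[i..i+8] * b[i..i+8], as a single fused multiply-add.
                acc = Fma.MultiplyAdd(Avx.LoadVector256(pa + i),
                                      Avx.LoadVector256(pb + i),
                                      acc);
            }
        }
        float sum = Vector256.Sum(acc); // horizontal sum of the 8 lanes
        for (; i < a.Length; i++)       // scalar tail
            sum += a[i] * b[i];
        return sum;
    }
}
```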
Speciation based on comparing genomes is possibly flawed in some significant ways; this fits the narrative around novelty search research, i.e., that following an objective function may not lead you to the desired objective.
For now the ideas under this heading are best covered by a number of fairly rambling blog posts, which I hope to condense into something more concrete at some future time...
The relative merits of single- versus double-precision floats.
There is an open question regarding how much numerical precision is required in neuro-evolution methods. In gradient descent learning, for example, additional precision certainly has benefits: it allows more accurate representation of small values, and reduces rounding/numerical error generally. There is clearly a trade-off between improved precision and the extra storage space and memory transfer overhead it incurs. In NEAT a very similar question arises: do the performance benefits offered by lower-precision values outweigh the detrimental effects of the reduced numeric precision?
A single-precision float is 32 bits (4 bytes); double precision is 64 bits (8 bytes). There is therefore a potential speed improvement to be gained by using the lower precision: more weights fit into the CPU caches, memory bandwidth is used more efficiently, and SIMD instructions can be applied to twice as many weights in one operation (at the time of writing, 256-bit SIMD instructions are common in consumer CPUs, i.e., 8 single-precision floats per operation versus 4 doubles).
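The lane-count difference is easy to demonstrate; this snippet prints the per-register counts on whatever machine it runs on:

```csharp
using System;
using System.Numerics;

// On a CPU with 256-bit SIMD registers (e.g. AVX2), this prints 8 and 4.
Console.WriteLine($"floats per Vector<T>:  {Vector<float>.Count}");
Console.WriteLine($"doubles per Vector<T>: {Vector<double>.Count}");
Console.WriteLine($"sizeof(float) = {sizeof(float)} bytes; sizeof(double) = {sizeof(double)} bytes");
```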
A broader question might be whether the precision could be reduced even further, given the existence of half-precision (16-bit) floats, e.g., .NET's System.Half type.
Fwiw, I seem to recall productively reducing at least some of the weights in Inkwell (Apple's handwriting recognition engine) to 8-bit logarithmic values (expanded on the fly via a small lookup table). We'd periodically discretize them this way during training, so that over time the weights could compensate for each other's discretization errors (otherwise, if you just quantize blindly at the end, the odds are too high of systematic shifts that add up over large numbers of weights). In general I think the more abstract the representation, the less granularity you need. That is, whether a pixel is brighter than its neighbor or not can require quite a precise measurement, and this can matter, as can precise averages over a large number of noisy pixels; but usually whether something is a dog or a tree, and the relative implications thereof, is relatively high contrast; it's not that borderline cases never exist, just that they become increasingly rare as you ascend the hierarchy of abstraction. You also generally have far fewer examples of more abstract concepts, so there's often little basis for a high-precision tally thereof.
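For illustration, here is a sketch of that kind of scheme; the weight range, table layout and names are assumptions made for the example, not details taken from Inkwell or SharpNEAT:

```csharp
using System;

// 8-bit logarithmic weight quantization, expanded on the fly via a
// 256-entry lookup table. Code 128 decodes to zero; codes 129..255 are
// logarithmically spaced positive magnitudes, codes 127..1 their negatives.
public static class LogQuantizer
{
    const double MinAbs = 1e-3; // smallest non-zero magnitude (assumed)
    const double MaxAbs = 8.0;  // largest magnitude (assumed)

    static readonly float[] Table = BuildTable();

    static float[] BuildTable()
    {
        var t = new float[256];
        double logMin = Math.Log(MinAbs), logMax = Math.Log(MaxAbs);
        for (int i = 1; i <= 127; i++)
        {
            double mag = Math.Exp(logMin + (logMax - logMin) * (i - 1) / 126.0);
            t[128 + i] = (float)mag;
            t[128 - i] = (float)-mag;
        }
        return t; // entries 0 and 128 remain zero
    }

    public static float Decode(byte code) => Table[code];

    // Encode to the code with the nearest decoded value; a linear scan
    // over 256 entries is cheap enough for a periodic re-quantization pass.
    public static byte Encode(float w)
    {
        byte best = 128;
        float bestErr = Math.Abs(w); // error of the zero code
        for (int c = 0; c < 256; c++)
        {
            float err = Math.Abs(w - Table[c]);
            if (err < bestErr) { bestErr = err; best = (byte)c; }
        }
        return best;
    }
}
```

Re-encoding the weights periodically during training, rather than once at the end, is what lets subsequent weight updates compensate for the accumulated discretization error, as described above.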
SharpNEAT is able to parallelize evaluation of genomes across multiple CPU cores. However, the efficiency of this approach likely falls as core counts increase. This is because the evolutionary algorithm operates on the basis of distinct generations; towards the end of each generation there will typically be CPU cores with no work left to do, waiting on the slowest evaluations, if the work has not been evenly distributed. The extent of this depends on the evaluation scheme, i.e., if a single evaluation is long-running and highly variable in duration, then it is much harder to distribute work evenly across the CPU cores.
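To make the synchronization point concrete, here is a minimal sketch; the Genome type and Evaluate method are hypothetical stand-ins, not SharpNEAT's actual types:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Stand-in for a genome with a task-specific fitness evaluation.
public sealed class Genome
{
    public double Fitness;
    public double Evaluate() => 0.0; // placeholder for real evaluation work
}

public static class GenerationLoop
{
    public static void EvaluateGeneration(IReadOnlyList<Genome> population)
    {
        // Evaluations run in parallel across the available cores...
        Parallel.ForEach(population, genome =>
        {
            genome.Fitness = genome.Evaluate();
        });
        // ...but Parallel.ForEach does not return until every evaluation has
        // completed, so cores that finish early sit idle here, waiting on the
        // slowest evaluation, before selection/reproduction can begin.
    }
}
```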
The current version of SharpNEAT (4.x) does not have any tasks with a 2D rendered visualization, but in SharpNEAT 2.x the box2dx library was used for some of the tasks. There is a detailed overview of the situation regarding use of this 2D physics engine, and the rendering library used with it, at github.com/colgreen/box2dx.
Ultimately it may be wise to switch to another 2D physics engine, such as the Box2D .NET Standard port. For 3D physics and video rendering, see bepuphysics v2.