Last reviewed / updated 2023-07-26
.NET 7 introduced Generic Math; this allows code to perform mathematical operations on abstract numeric interface types (such as INumber&lt;T&gt;) instead of concrete types such as double- and single-precision IEEE floating-point numbers.
SharpNEAT currently employs double-precision floating-point arithmetic throughout, e.g., for connection weights and neural net computations. With the introduction of Generic Math in .NET 7, it should become much simpler to support single-precision floats, without having to duplicate large amounts of code solely to change the floating-point type.
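As an illustration of what this enables, here is a minimal sketch; the NeuralMaths class and WeightedSum method are hypothetical names invented for the example, not part of SharpNEAT:

```csharp
using System;
using System.Numerics;

public static class NeuralMaths
{
    // One generic method body serves both float and double. T is resolved
    // at compile time, so the JIT emits specialized native code for each
    // precision; there is no boxing or virtual dispatch in the loop.
    public static T WeightedSum<T>(ReadOnlySpan<T> inputs, ReadOnlySpan<T> weights)
        where T : struct, IFloatingPointIeee754<T>
    {
        T sum = T.Zero;
        for (int i = 0; i < inputs.Length; i++)
            sum += inputs[i] * weights[i];
        return sum;
    }
}
```

The same call site then works at either precision, e.g., WeightedSum&lt;float&gt;(...) or WeightedSum&lt;double&gt;(...), rather than requiring two hand-maintained copies of the code.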
Using single-precision floats should offer performance improvements. Each single-precision value needs only half the memory compared to double-precision floats. This not only improves transfer speeds between main memory and the CPU but also doubles the number of values that can be accommodated in the same CPU cache space or stored in a CPU SIMD vector register.
SharpNEAT already contains some code that uses SIMD CPU instructions via the Vector&lt;T&gt; struct. However, further performance gains may be possible by using hardware intrinsics (the System.Runtime.Intrinsics APIs introduced in .NET Core 3.0), as these avoid the abstraction layer provided by Vector&lt;T&gt; and expose more of the capabilities of the underlying CPU hardware. For instance, the neural net code may benefit from leveraging scatter/gather SIMD instructions, or vector dot product instructions.
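For illustration, here is a sketch of a single-precision dot product written directly against the AVX/FMA intrinsics; this is not SharpNEAT code, and it assumes Avx.IsSupported and Fma.IsSupported are both true (a real implementation would fall back to Vector&lt;T&gt; or scalar code otherwise), plus AllowUnsafeBlocks in the project file:

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class IntrinsicsSketch
{
    public static unsafe float DotProduct(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        Vector256<float> acc = Vector256<float>.Zero;
        int i = 0;
        fixed (float* pa = a, pb = b)
        {
            // Process 8 floats per iteration (a 256-bit register holds
            // 8 singles, versus 4 doubles).
            for (; i <= a.Length - 8; i += 8)
            {
                // acc += a[i..i+8] * b[i..i+8], as a single fused multiply-add.
                acc = Fma.MultiplyAdd(Avx.LoadVector256(pa + i),
                                      Avx.LoadVector256(pb + i),
                                      acc);
            }
        }
        float sum = Vector256.Sum(acc); // horizontal sum of the 8 lanes
        for (; i < a.Length; i++)       // scalar tail
            sum += a[i] * b[i];
        return sum;
    }
}
```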
Speciation based on comparing genomes is possibly flawed in some significant ways; this fits the narrative around novelty search research, i.e., that following an objective function may not lead you to the desired objective.
For now the ideas under this heading are best covered by a number of fairly rambling blog posts, which I hope to condense into something more concrete at some future time...
The relative merits of single- versus double-precision floats.
There is an open question regarding how much numerical precision is required in neuro-evolution methods. In gradient descent learning, for example, additional precision certainly has benefits: it allows more accurate representation of small values, and reduces rounding/numerical error generally. There is clearly a trade-off between improved precision and the extra storage space and memory transfer overhead it incurs. In NEAT a very similar question arises: do the performance benefits offered by lower-precision values outweigh the detrimental effects of the reduced numeric precision?
A single-precision float is 32 bits (4 bytes); double precision is 64 bits (8 bytes). There is therefore a potential speed improvement to be gained by using the lower precision: more weights fit into the CPU caches, memory bandwidth is used more efficiently, and SIMD instructions can be applied to twice as many weights in one operation (at the time of writing, 256-bit SIMD instructions are common in consumer CPUs, i.e., 8 single-precision floats per operation versus 4 doubles).
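The lane-count difference is easy to demonstrate; this snippet prints the per-register counts on whatever machine it runs on:

```csharp
using System;
using System.Numerics;

// On a CPU with 256-bit SIMD registers (e.g. AVX2), this prints 8 and 4.
Console.WriteLine($"floats per Vector<T>:  {Vector<float>.Count}");
Console.WriteLine($"doubles per Vector<T>: {Vector<double>.Count}");
Console.WriteLine($"sizeof(float) = {sizeof(float)} bytes; sizeof(double) = {sizeof(double)} bytes");
```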
A broader question might be whether the precision could be reduced even further, given the existence of half-precision (16-bit) floats, e.g., .NET's System.Half type.
Fwiw, I seem to recall productively reducing at least some of the weights in Inkwell (Apple's handwriting recognition engine) to 8-bit logarithmic values (expanded on the fly via a small lookup table). We'd periodically discretize them this way during training, so that over time the weights could compensate for each other's discretization errors (otherwise, if you just quantize blindly at the end, the odds are too high of systematic shifts that add up over large numbers of weights). In general I think the more abstract the representation, the less granularity you need. That is, whether a pixel is brighter than its neighbor or not can require quite a precise measurement, and this can matter, as can precise averages over a large number of noisy pixels; but usually whether something is a dog or a tree, and the relative implications thereof, is relatively high contrast; it's not that borderline cases never exist, just that they become increasingly rare as you ascend the hierarchy of abstraction. You also generally have far fewer examples of more abstract concepts, so there's often little basis for a high-precision tally thereof.
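For illustration, here is a sketch of that kind of scheme; the weight range, table layout and names are assumptions made for the example, not details taken from Inkwell or SharpNEAT:

```csharp
using System;

// 8-bit logarithmic weight quantization, expanded on the fly via a
// 256-entry lookup table. Code 128 decodes to zero; codes 129..255 are
// logarithmically spaced positive magnitudes, codes 127..1 their negatives.
public static class LogQuantizer
{
    const double MinAbs = 1e-3; // smallest non-zero magnitude (assumed)
    const double MaxAbs = 8.0;  // largest magnitude (assumed)

    static readonly float[] Table = BuildTable();

    static float[] BuildTable()
    {
        var t = new float[256];
        double logMin = Math.Log(MinAbs), logMax = Math.Log(MaxAbs);
        for (int i = 1; i <= 127; i++)
        {
            double mag = Math.Exp(logMin + (logMax - logMin) * (i - 1) / 126.0);
            t[128 + i] = (float)mag;
            t[128 - i] = (float)-mag;
        }
        return t; // entries 0 and 128 remain zero
    }

    public static float Decode(byte code) => Table[code];

    // Encode to the code with the nearest decoded value; a linear scan
    // over 256 entries is cheap enough for a periodic re-quantization pass.
    public static byte Encode(float w)
    {
        byte best = 128;
        float bestErr = Math.Abs(w); // error of the zero code
        for (int c = 0; c < 256; c++)
        {
            float err = Math.Abs(w - Table[c]);
            if (err < bestErr) { bestErr = err; best = (byte)c; }
        }
        return best;
    }
}
```

Re-encoding the weights periodically during training, rather than once at the end, is what lets subsequent weight updates compensate for the accumulated discretization error, as described above.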
SharpNEAT is able to parallelize evaluation of genomes across multiple CPU cores. However, the efficiency of this approach likely falls as core counts increase. This is because the evolutionary algorithm operates on the basis of distinct generations; towards the end of each generation there will typically be CPU cores with no work left to do, waiting on the slowest evaluations, if the work has not been evenly distributed. The extent of this depends on the evaluation scheme, i.e., if a single evaluation is long-running and highly variable in duration, then it is much harder to distribute work evenly across the CPU cores.
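To make the synchronization point concrete, here is a minimal sketch; the Genome type and Evaluate method are hypothetical stand-ins, not SharpNEAT's actual types:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Stand-in for a genome with a task-specific fitness evaluation.
public sealed class Genome
{
    public double Fitness;
    public double Evaluate() => 0.0; // placeholder for real evaluation work
}

public static class GenerationLoop
{
    public static void EvaluateGeneration(IReadOnlyList<Genome> population)
    {
        // Evaluations run in parallel across the available cores...
        Parallel.ForEach(population, genome =>
        {
            genome.Fitness = genome.Evaluate();
        });
        // ...but Parallel.ForEach does not return until every evaluation has
        // completed, so cores that finish early sit idle here, waiting on the
        // slowest evaluation, before selection/reproduction can begin.
    }
}
```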
The current version of SharpNEAT (4.x) does not have any tasks with a 2D rendered visualization, but in SharpNEAT 2.x the box2dx library was used for some of the tasks. There is a detailed overview of the situation regarding use of this 2D physics engine, and the rendering library used with it, at github.com/colgreen/box2dx.
Ultimately it may be wise to switch to another 2D physics engine, such as the Box2D .NET Standard port. For 3D physics and video rendering, see bepuphysics v2.