A scalable shared memory multiprocessing (SSMM) system (Fig. 4) is a good target architecture for parallelizing my ANN.
Figure 4: Example of a Scalable Shared Memory Parallel Computer
As the name suggests, an SSMM is a shared memory parallel computer with a hierarchical configuration that allows it to scale. SSMM architectures of this kind are described in  and , for example.
In the example architecture shown in Fig. 4, the system consists of N processor clusters and a global memory system connected by a global network. Each processor cluster consists of n processors and a local memory system connected by a local network. The global network should be an interconnection network, while the local network can be a simple bus.
Every processor in the system can access any part of the global memory, while each local memory can be accessed only by the processors in the same processor cluster. I also assume that each processor is a SuperScalar RISC processor with m function units.
Three levels of parallelism can be identified in my ANN from the viewpoint of granularity.
The coarse level can be viewed as layer-oriented parallelism in a pipelining fashion. The middle level is neuron parallelism, which exploits the fact that each neuron in the same layer can be computed independently. The fine level, synapse-level parallelism, is based on the central computational kernel of ANNs: the dot product.
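To make the synapse-level view concrete, the following minimal sketch computes one neuron's output as a dot product followed by an activation function (the weight values and the choice of a sigmoid activation here are illustrative assumptions, not taken from the text):

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    # Synapse-level work: each product inputs[i] * weights[i] is independent
    # of the others, so the multiply-accumulate steps of this dot product
    # can be spread across a processor's function units.
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-s))  # sigmoid activation (assumed)

out = neuron_output([1.0, 0.5, -0.5], [0.2, 0.4, 0.1])
```

Because the partial products carry no dependence on one another, only the final accumulation orders them, which is exactly what instruction-level parallelism can exploit.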
The coarse-grain parallelism maps to the processor clusters and global memory connected by the global network: each processor cluster corresponds to one layer of my ANN. The middle level maps to the processors and local memory within a cluster: each processor performs the independent computation of one neuron. The fine level can be expressed as Instruction Level Parallelism (ILP), which makes efficient use of the function units of the SuperScalar processor.
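The neuron-level mapping can be sketched as follows, with a thread pool standing in for the n processors of one cluster (the function names and the use of Python threads are my own illustrative assumptions; on the real hardware each processor would read the shared input vector from the cluster's local memory):

```python
from concurrent.futures import ThreadPoolExecutor
import math

def neuron(inputs, weights):
    # One neuron: dot product plus sigmoid activation (assumed).
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-s))

def layer_forward(inputs, weight_rows, n_processors=4):
    # Neuron-level parallelism: every neuron of the layer is independent,
    # so each worker (processor) computes one neuron's output in parallel.
    with ThreadPoolExecutor(max_workers=n_processors) as pool:
        return list(pool.map(lambda w: neuron(inputs, w), weight_rows))

# Example: a layer of two neurons fed by a two-element input vector.
outputs = layer_forward([1.0, -1.0], [[0.5, 0.5], [1.0, 0.0]])
```

Pipelining the layers across clusters would then correspond to feeding each cluster the outputs of the previous one while it already works on the next input pattern.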