That's branching and then coalescing, right? It selects a path that is weighted as being most beneficial to the input?
Given you pointed out how even the vertical part of the architecture allows for skipping layers anyway, isn't that essentially the same thing?
That's branching and then coalescing, right? It selects a path that is weighted as being most beneficial to the input?
Given you pointed out how even the vertical part of the architecture allows for skipping layers anyway, isn't that essentially the same thing?