ML 2: A Discussion of Action Spaces
27 June, 2019 - 2 min read
|Discrete action space||Continuous action space|
|finite number of actions which can be taken||infinite number of actions which can be taken|
|e.g. or||e.g. to apply to a wheel|
|easier to conceptualize and evaluate, as the action set is finite and therefore iterable||the action space can be differentiated, which is advantageous because it allows us to identify similarities between actions|
|cannot be differentiated, therefore actions like may be both adjacent and highly dissimilar||e.g. can be grouped by trend|
|is easier to find, as there is an exhaustible set of actions and policies to be evaluated||continuous action-spaces are superior because, theoretically, there is an action which immediately solves the given problem. With the stipulation that most will be so downright wank that you'll want to terminate the simulation, the infinite size of dictates that|
|we can compensate for the limitations of the discrete action space by identifying the region about the global maximum at any time of and discretizing it via some function gamma, s.t. where is some factor that creates a distribution of actions rather than a set of identical actions, such that we end up with a dense action space, each member of which is better, on average, than a random
see Fig. 1
This solution, however elegant, is also constrained by because no matter how "global" a theoretical maximum , . We can resolve this caveat and potential resource leak (forever searching for a global maximum in an infinite space) by defining a such that the system is satisfied if a new global maximum has not been found within an arbitrary number of time steps.
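
The two ideas above, discretizing the region around the best action found so far and giving up after a fixed number of steps without improvement, can be sketched roughly as follows. This is a minimal illustration, not the method from the post: the one-dimensional action, the Gaussian noise standing in for the factor that spreads actions out, and every name here (`discretize_around`, `search`, `patience`, `toy_reward`) are assumptions made for the example.

```python
import random


def discretize_around(best_action, spread, n_candidates, rng):
    """Hypothetical stand-in for gamma: sample a small, dense set of
    candidate actions near the current best action.

    Gaussian noise is an assumption; it gives a distribution of actions
    rather than n_candidates copies of the same action.
    """
    return [best_action + rng.gauss(0.0, spread) for _ in range(n_candidates)]


def search(reward, initial_action, spread=0.5, n_candidates=16,
           patience=20, max_steps=10_000, seed=0):
    """Local search over a continuous action space with a patience cutoff.

    Stops once `patience` consecutive steps have failed to improve on the
    best reward seen, so the search cannot run forever.
    """
    rng = random.Random(seed)
    best_action = initial_action
    best_reward = reward(best_action)
    steps_without_improvement = 0
    steps = 0

    while steps_without_improvement < patience and steps < max_steps:
        candidates = discretize_around(best_action, spread, n_candidates, rng)
        candidate = max(candidates, key=reward)  # best of the discrete set
        candidate_reward = reward(candidate)

        if candidate_reward > best_reward:
            best_action, best_reward = candidate, candidate_reward
            steps_without_improvement = 0
        else:
            steps_without_improvement += 1
        steps += 1

    return best_action, best_reward, steps


def toy_reward(action):
    """Toy reward with a single maximum at action = 3.0 (assumed for the demo)."""
    return -(action - 3.0) ** 2


best_action, best_reward, steps = search(toy_reward, initial_action=0.0)
print(f"best action ~ {best_action:.3f} after {steps} steps")
```

A larger `patience` trades extra compute for a better chance of escaping a plateau, while `spread` controls how far the discretized region reaches from the current best action. Both would need tuning for any real problem.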