Using the simulation with ``gym.Env``
=====================================
``gym.Env`` provides a framework for producing a live simulation that can be stepped in parallel.
It is most commonly used in RL applications, such as those built on RLlib or Stable Baselines, whose high computational cost benefits from parallelization.
To turn the simulation into a ``gym.Env``, create a class that subclasses both ``gym.Env`` and ``DeepSurveySim.Survey``.
It is recommended to check your work with gym's Environment Checker before using it in any application.

Initialization
---------------
The initialization method of the subclass has to initialize both the ``Survey`` and ``Env`` superclasses.
In the case of the ``Survey``, this just means calling the ``super().__init__()`` method with the necessary configuration files.
For ``Env``, this means including the ``self.action_space`` and ``self.observation_space`` attributes.
The bare minimum for this initialization is as follows:

.. code-block:: python

    def __init__(self, obs_config, survey_config):
        super().__init__(observatory_config=obs_config, survey_config=survey_config)
        # Both attributes are required by ``gym.Env``.
        self.action_space = self.generate_action_space()
        self.observation_space = self.generate_observation_space()

Action Space
--------------------------
These ``spaces`` must be either ``spaces.Box`` or ``spaces.Discrete`` instances (for actions that are single variables), or ``spaces.Dict`` instances (for actions made up of multiple variables).
View the spaces documentation for further details.
For this specific program, the logical action space is made up of the variables that are input into ``Survey.update()``: the continuous ``ra`` and ``decl``, and the discrete ``band``.
This would produce the action space:

.. code-block:: python

    spaces.Dict(
        {
            "ra": spaces.Box(low=0, high=360, shape=(1,), dtype=np.float32),
            "decl": spaces.Box(low=-90.0, high=90.0, shape=(1,), dtype=np.float32),
            "band": spaces.Discrete(len(self.observator.band_wavelengths))
        }
    )

However, some of these actions can be held constant, and thus the action space becomes:

.. code-block:: python

    spaces.Box(low=0, high=360, shape=(1,), dtype=np.float32)

Or, if you wish to discretize the action space (the example below assumes the ``step`` function maps the actions to 10 unique ra/decl options; it would also be useful to store this map in the environment's ``__init__``):

.. code-block:: python

    spaces.Dict(
        {
            "ra": spaces.Discrete(10),
            "decl": spaces.Discrete(10),
            "band": spaces.Discrete(len(self.observator.band_wavelengths))
        }
    )

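
As a minimal sketch of such a map, the discrete indices could be translated into coordinates with lookup tables that are built once and reused by ``step``. The helper names ``build_grid`` and ``decode_action``, and the choice of evenly spaced values that exclude the upper bound, are assumptions made for illustration; they are not part of ``DeepSurveySim``.

```python
def build_grid(low, high, n):
    """Hypothetical helper: n evenly spaced values covering [low, high)."""
    step = (high - low) / n
    return [low + i * step for i in range(n)]


# In a real environment these tables would be stored in ``__init__``.
RA_MAP = build_grid(0.0, 360.0, 10)     # 0, 36, ..., 324 degrees
DECL_MAP = build_grid(-90.0, 90.0, 10)  # -90, -72, ..., 72 degrees


def decode_action(action):
    """Translate discrete indices into the coordinates they stand for."""
    return {
        "ra": RA_MAP[int(action["ra"])],
        "decl": DECL_MAP[int(action["decl"])],
    }


decoded = decode_action({"ra": 3, "decl": 0})
```

The decoded values can then be placed in the ``location`` entry that is passed on to the simulation; the ``band`` index could be handled in the same way if needed.
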
Observation Space
-------------------
The observation space behaves very similarly to the action space in terms of programming.
The only difference is that this defines the format of the expected output of the ``step`` method.
Consequently, it is encouraged to define the observation only in terms of the variables in ``Survey.observatory_variables``.
For example, a configuration file that contains the line

.. code-block:: yaml

    variables: ["airmass", "ha"]

would logically have the ``observation_space``:

.. code-block:: python

    spaces.Dict(
        {
            "airmass": spaces.Box(low=-100000, high=100000, shape=(1,), dtype=np.float32),
            "ha": spaces.Box(low=-100000, high=100000, shape=(1,), dtype=np.float32),
        }
    )

This space is much larger than is strictly required for these variables, but if you wish to define the spaces automatically, using this wider range is encouraged.

Step and Reset
---------------
``step`` and ``reset`` are also required by ``gym.Env``, and are the core of the program.
``step`` defines how the simulation is updated and what is returned (and in what format), and ``reset`` returns the program to its initial condition.
``super().step(action)``, inherited from ``DeepSurveySim.Survey``, already handles updating the simulation, so all that is required of the subclass is formatting.
The ``action`` argument of ``super().step()`` expects a dictionary containing ``time``, ``location``, and ``band``.
This can be achieved by formatting the passed action into:

.. code-block:: python

    {
        "time": self.time,
        "location": {"ra": action["ra"], "decl": action["decl"]},
        "band": action["band"]
    }

``time`` is the only required parameter, so if ra, declination, or band are held constant, they need not be passed.
The ``action`` given to the ``step`` function depends on the variables defined in the ``action_space``.
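
One way to handle optional entries is a small formatting helper that only forwards the keys present in the incoming action. This is a sketch under stated assumptions: ``format_action`` is a hypothetical name, the action is assumed to be a dictionary (as with a ``spaces.Dict`` action space), ``location`` is only sent when both ``ra`` and ``decl`` are present, and the time value used below is a placeholder.

```python
def format_action(time, action):
    """Build the dictionary handed to ``super().step()``.

    ``time`` is always included; ``location`` and ``band`` are added
    only when the action space actually provides them.
    """
    formatted = {"time": time}
    if "ra" in action and "decl" in action:
        formatted["location"] = {"ra": action["ra"], "decl": action["decl"]}
    if "band" in action:
        formatted["band"] = action["band"]
    return formatted


# Placeholder time value, for illustration only.
full = format_action(60000.0, {"ra": 10.0, "decl": -30.0, "band": 2})
position_only = format_action(60000.0, {"ra": 10.0, "decl": -30.0})
```

Inside an environment, the first argument would be ``self.time`` and the result would be passed straight to ``super().step()``.
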
``super().step()`` returns four items: the calculated observation (containing the variables from ``Survey.observation_variables``) as a dictionary of arrays, the reward as an array, the stop condition as an array, and a 'log' (a dictionary with possible diagnostic data).
The format that ``step`` needs to return depends on the framework being used and on the specifics of your ``observation_space``.
For example, a discrete observation space will require:

.. code-block:: python

    new_observation = {
        key: mapping_rule(observation[key]) for key in self.observation_space
    }

Here, ``mapping_rule`` defines how the variable ``observation[key]`` is discretized.
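
A ``mapping_rule`` could, for instance, assign each value to a bin defined by a list of edges. The sketch below is one possible rule, not the one used by any particular environment: the edge values are hypothetical, and the input is assumed to be a single number rather than an array.

```python
from bisect import bisect_right

# Hypothetical bin edges for airmass; three edges give four bins (0 to 3).
AIRMASS_EDGES = [1.2, 1.5, 2.0]


def mapping_rule(value, edges=AIRMASS_EDGES):
    """Return the index of the bin that ``value`` falls into."""
    return bisect_right(edges, value)


bins = [mapping_rule(v) for v in (1.0, 1.2, 1.7, 3.5)]
```

With this rule, the matching entry in the observation space would be a ``spaces.Discrete`` with ``len(edges) + 1`` options.
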
A continuous ``np.ndarray`` observation space will instead require:

.. code-block:: python

    new_observation = {
        key: np.array(
            np.nan_to_num(observation[key], copy=True).ravel(),
            dtype=np.float32,
        )
        for key in self.observation_space
    }

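
The reward and stop condition returned by ``super().step()`` are arrays as well, and some frameworks expect plain scalars instead. A minimal sketch of that conversion follows, assuming both arrays hold a single element; ``to_scalars`` is a hypothetical helper, and replacing a NaN reward with 0 is an illustrative choice rather than documented behavior.

```python
import numpy as np


def to_scalars(reward, stop):
    """Reduce one-element arrays to a plain float and a plain bool."""
    # nan_to_num replaces NaN with 0.0 so the reward is always finite.
    reward = float(np.nan_to_num(np.asarray(reward)).ravel()[0])
    stop = bool(np.asarray(stop).ravel()[0])
    return reward, stop


reward, stop = to_scalars(np.array([[np.nan]]), np.array([False]))
```
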
``reset`` must also return an observation formatted as defined in ``self.observation_space``, and must call the ``super().reset()`` method.
The state of the simulation can then be accessed with ``self._observation_calculation()``:

.. code-block:: python

    def reset(self, *, seed=None, options=None):
        super().reset()
        observation = self._observation_calculation()
        # gymnasium expects an (observation, info) pair from ``reset``.
        return observation, {}

Example
--------
The example below shows a bare-bones environment with outputs designed for an algorithm trained with RLlib to interact with.

.. code-block:: python

    import numpy as np
    from gymnasium import spaces, Env

    from DeepSurveySim.Survey import Survey
    from DeepSurveySim.IO import ReadConfig


    class GymSurvey(Survey, Env):
        def __init__(self, kwarg):
            obs_config = ReadConfig(kwarg["observatory_config"])()
            survey_config = ReadConfig(kwarg["survey_config"], survey=True)()
            super().__init__(observatory_config=obs_config, survey_config=survey_config)

            self.action_space = spaces.Dict(
                {
                    "ra": spaces.Box(low=0, high=360, shape=(1,), dtype=np.float32),
                    "decl": spaces.Box(
                        low=-90.0, high=90.0, shape=(1,), dtype=np.float32
                    ),
                }
            )

            self.observation_space = spaces.Dict(
                {
                    "airmass": spaces.Box(
                        low=-100000, high=100000, shape=(1,), dtype=np.float32
                    ),
                    "alt": spaces.Box(
                        low=-100000, high=100000, shape=(1,), dtype=np.float32
                    ),
                    "sky_magnitude": spaces.Box(
                        low=-100000, high=100000, shape=(1,), dtype=np.float32
                    ),
                    "teff": spaces.Box(
                        low=-100000, high=100000, shape=(1,), dtype=np.float32
                    ),
                }
            )

        def reset(self, *, seed=None, options=None):
            super().reset()
            observation = self._observation_calculation()
            observation = {
                key: np.nan_to_num(observation[key], copy=True)
                for key in self.observation_space
            }
            return observation, {}

        def step(self, action: dict):
            new_action = {
                "time": self.time,
                "location": {"ra": action["ra"], "decl": action["decl"]},
            }
            observation, reward, stop, log = super().step(new_action)
            truncated = False  # Additional truncated flag required by RLlib
            observation = {
                key: np.array(
                    np.nan_to_num(observation[key], copy=True).ravel()[0],
                    dtype=np.float32,
                ).reshape(
                    1,
                )
                for key in self.observation_space
            }
            reward = reward.ravel()[0]
            return observation, reward, stop, truncated, log