Task 3: Reinforcement Learning
For this task, you'll be designing a reinforcement learning environment and training a spacecraft to perform a transfer autonomously.
To accomplish this, modify your orbit propagator code to accept a control input \(\mathbf{u}\) as an additional acceleration term, and construct a custom environment for Stable Baselines 3 (SB3).
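As a minimal sketch of what a control input in the acceleration can look like, the snippet below adds a control term to two-body gravity, \(\ddot{\mathbf{r}} = -\mu \mathbf{r} / \lVert \mathbf{r} \rVert^3 + \mathbf{u}\), and advances the state with a fixed-step RK4 update. It assumes a planar, point-mass Earth model. The names `MU_EARTH`, `dynamics`, and `rk4_step` are illustrative and will not match your own propagator.

```python
import math

MU_EARTH = 398600.4418  # km^3/s^2, Earth's gravitational parameter


def dynamics(state, u):
    """Two-body dynamics plus a control acceleration u = (ux, uy) in km/s^2."""
    x, y, vx, vy = state
    r3 = math.hypot(x, y) ** 3
    ax = -MU_EARTH * x / r3 + u[0]
    ay = -MU_EARTH * y / r3 + u[1]
    return (vx, vy, ax, ay)


def rk4_step(state, u, dt):
    """Advance the state by dt seconds, holding u constant over the step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = dynamics(state, u)
    k2 = dynamics(shifted(state, k1, dt / 2), u)
    k3 = dynamics(shifted(state, k2, dt / 2), u)
    k4 = dynamics(shifted(state, k3, dt), u)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


# Sanity check: a circular orbit at 7000 km radius with u = 0
# should keep its radius (up to integration error).
r0 = 7000.0
state = (r0, 0.0, 0.0, math.sqrt(MU_EARTH / r0))
for _ in range(100):
    state = rk4_step(state, (0.0, 0.0), dt=10.0)
radius_after = math.hypot(state[0], state[1])
```

Holding \(\mathbf{u}\) constant over each step is one simple choice. An impulsive \(\Delta V\) applied to the velocity before propagating is another, and it fits the discrete-action requirement just as well.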
Minimum Requirements
- A custom environment in SB3 that resets when the spacecraft impacts the Earth or travels too far past GEO
- A DQN agent with discrete actions \(\mathbf{u} \in \{+\Delta V, -\Delta V\}\)
- A spacecraft agent that can autonomously perform a transfer from LEO to GEO.
- A reward function that penalizes distance to the target altitude
Stretch Goals
- Modify your environment to accept continuous actions \(\mathbf{u} \in \mathbb{R}^3\)
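For the continuous case, one common pattern is to let the policy output a normalized vector in \([-1, 1]^3\) and scale it to a physical acceleration inside the environment. The sketch below shows only that mapping. The limit `A_MAX` is a hypothetical value and `action_to_accel` is an illustrative name. In a Gymnasium environment the matching action space would be `spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)`. Note that SB3's DQN supports only discrete action spaces, so this stretch goal also means switching to a continuous-action algorithm such as SAC, TD3, or PPO.

```python
import math

A_MAX = 1e-4  # km/s^2, assumed maximum thrust acceleration (hypothetical value)


def action_to_accel(action, a_max=A_MAX):
    """Map a raw policy output in R^3 to an acceleration with norm <= a_max."""
    # Clip each component to the normalized range [-1, 1].
    clipped = [max(-1.0, min(1.0, float(a))) for a in action]
    norm = math.sqrt(sum(c * c for c in clipped))
    # Rescale so that corners of the cube [-1, 1]^3 cannot exceed a_max.
    scale = a_max / max(norm, 1.0)
    return tuple(scale * c for c in clipped)


example = action_to_accel((2.0, -0.5, 0.0))
example_norm = math.sqrt(sum(c * c for c in example))
```

Capping the norm rather than each component keeps the available thrust magnitude independent of direction, which is closer to how a single gimbaled thruster behaves.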
Recommendations
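- Make your environment as fast as possible, and choose the time step between environment steps deliberately: it should be small relative to the orbital period so the integration stays accurate, but large enough that an episode does not need an excessive number of steps.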
- Preload the replay / experience buffer with data from a heuristic policy that you know will give decent performance.
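One way to build such a heuristic for the two-action case is to compare the spacecraft's specific orbital energy, \(\varepsilon = v^2/2 - \mu/r\), with that of a circular GEO orbit, \(\varepsilon_{GEO} = -\mu / (2 r_{GEO})\), and burn prograde while the energy is too low. The sketch below assumes the observation is the unscaled planar state `(x, y, vx, vy)` in km and km/s and that action `0` means \(+\Delta V\). Both are assumptions, as are the function names. The collector works with any environment that follows the Gymnasium `reset`/`step` convention.

```python
import math

MU = 398600.4418                   # km^3/s^2, Earth's gravitational parameter
R_GEO = 42164.0                    # km, GEO radius measured from Earth's center
TARGET_ENERGY = -MU / (2 * R_GEO)  # specific energy of a circular GEO orbit


def heuristic_action(obs):
    """Return 0 (+dV) while orbital energy is below the GEO value, else 1 (-dV)."""
    x, y, vx, vy = obs
    energy = 0.5 * (vx * vx + vy * vy) - MU / math.hypot(x, y)
    return 0 if energy < TARGET_ENERGY else 1


def collect_transitions(env, policy, n_steps):
    """Roll out `policy` and return (obs, next_obs, action, reward, done) tuples."""
    transitions = []
    obs, _ = env.reset()
    for _ in range(n_steps):
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        transitions.append((obs, next_obs, action, reward, terminated))
        obs = next_obs
        if terminated or truncated:
            obs, _ = env.reset()
    return transitions
```

With an SB3 off-policy model, each collected tuple could then be pushed into the buffer through `model.replay_buffer.add(obs, next_obs, action, reward, done, infos)`. Check the expected array shapes and the signature against your installed SB3 version before relying on it.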