Task 3: Reinforcement Learning

For this task, you'll be designing a reinforcement learning environment and training a spacecraft to perform a transfer autonomously.

To accomplish this, modify your orbit propagator code to accept a control input \(u\) as an additional acceleration term, and construct a custom environment for use with Stable Baselines3 (SB3).
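As a rough illustration of what "a control input in the acceleration" can look like, the sketch below adds a control acceleration to planar two-body dynamics and advances the state with a fixed-step RK4 integrator. The planar state layout `(x, y, vx, vy)`, the units (km, s), the function names, and the choice of RK4 are all assumptions made for this example; your own propagator may be structured differently.

```python
import math

MU_EARTH = 398600.4418  # km^3/s^2, Earth's standard gravitational parameter


def dynamics(state, u):
    """Planar two-body dynamics plus a control acceleration u = (ux, uy) in km/s^2."""
    x, y, vx, vy = state
    r3 = math.hypot(x, y) ** 3
    return (vx, vy, -MU_EARTH * x / r3 + u[0], -MU_EARTH * y / r3 + u[1])


def rk4_step(state, u, dt):
    """Advance the state by dt seconds, holding u constant over the step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = dynamics(state, u)
    k2 = dynamics(shifted(state, k1, dt / 2), u)
    k3 = dynamics(shifted(state, k2, dt / 2), u)
    k4 = dynamics(shifted(state, k3, dt), u)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


# Usage: one 10 s step from a circular LEO with a small thrust along +y (the velocity direction).
r0 = 6778.0  # km, hypothetical starting radius
leo = (r0, 0.0, 0.0, math.sqrt(MU_EARTH / r0))
next_state = rk4_step(leo, (0.0, 1e-4), 10.0)
```

Setting `u = (0.0, 0.0)` recovers the uncontrolled propagator, which is a convenient check that the modification did not change the original dynamics.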

Minimum Requirements

A custom environment in SB3 that resets when the spacecraft impacts the Earth or travels too far past GEO
A DQN agent with discrete actions \(u \in \{+\Delta V, -\Delta V\}\)
A spacecraft agent that can autonomously perform a transfer from LEO to GEO.
A reward function that penalizes distance to the target altitude
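The pieces listed above (two discrete actions, a distance-based reward, and a reset condition) can be sketched as plain functions before wiring them into an environment class. Everything numeric here is a placeholder assumption: the impulse size `DELTA_V`, the outer cutoff `R_MAX`, the planar state layout `(x, y, vx, vy)`, and the choice to apply each burn along the velocity direction. In an SB3 setup, logic like this would typically be called from the `step` method of a Gymnasium environment with a two-element discrete action space.

```python
import math

R_EARTH = 6378.137   # km, Earth's equatorial radius
R_GEO = 42164.0      # km, geostationary orbit radius
R_MAX = 1.5 * R_GEO  # km, hypothetical "too far past GEO" cutoff
DELTA_V = 0.01       # km/s, hypothetical impulse magnitude per action

ACTIONS = (+DELTA_V, -DELTA_V)  # index 0: prograde burn, index 1: retrograde burn


def apply_action(state, action_index):
    """Apply an impulsive burn along the velocity direction of a planar state."""
    x, y, vx, vy = state
    speed = math.hypot(vx, vy)
    dv = ACTIONS[action_index]
    return (x, y, vx + dv * vx / speed, vy + dv * vy / speed)


def reward(state):
    """Negative normalized distance between the current radius and GEO (0 is best)."""
    r = math.hypot(state[0], state[1])
    return -abs(r - R_GEO) / R_GEO


def terminated(state):
    """True when the spacecraft has hit the Earth or drifted too far past GEO."""
    r = math.hypot(state[0], state[1])
    return r <= R_EARTH or r >= R_MAX
```

Normalizing the reward by `R_GEO` keeps its magnitude of order one, which is a design choice for this sketch rather than a requirement of the task.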

Stretch Goals

Modify your environment to accept continuous actions \(\mathbf{u} \in \mathbb{R}^3\)
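For the continuous case, one common pattern is to let the agent output a normalized action in \([-1, 1]^3\) and rescale it to a physical acceleration inside the environment. The sketch below shows only that rescaling step; the bound `A_MAX` and the per-axis clipping are assumptions for illustration. Note that DQN handles only discrete action spaces, so this stretch goal implies switching to an algorithm that supports continuous actions (SB3 provides several, such as SAC, TD3, and PPO).

```python
A_MAX = 1e-4  # km/s^2, hypothetical maximum control acceleration per axis


def scale_action(raw_action, a_max=A_MAX):
    """Map a normalized action in [-1, 1]^3 to a control acceleration in km/s^2."""
    # Clip first so an out-of-range policy output cannot exceed the physical limit.
    clipped = [max(-1.0, min(1.0, float(a))) for a in raw_action]
    return tuple(a_max * a for a in clipped)


# Usage: the second component is out of range and gets clipped to -1 before scaling.
u = scale_action((0.5, -2.0, 0.0))
```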

Recommendations

Make your environment as fast as possible, and decide how large a time step to use between environment steps.
Preload the replay / experience buffer with data from a heuristic policy that you know will give decent performance.
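The preloading idea can be sketched as a rollout loop that records transitions from a hand-written policy. The heuristic below (burn prograde when below the target radius, retrograde when above) is a deliberately naive assumption, and `reset_fn` and `step_fn` are hypothetical stand-ins for your environment's reset and step logic. Loading the collected tuples into the agent's replay buffer depends on the SB3 buffer API and is not shown here.

```python
import random


def heuristic_action(radius, target_radius):
    """Naive rule: burn prograde (0) below the target radius, retrograde (1) above it."""
    return 0 if radius < target_radius else 1


def collect_transitions(reset_fn, step_fn, policy_fn, n_steps, epsilon=0.1, seed=0):
    """Roll out a policy and return (obs, action, reward, next_obs, done) tuples.

    reset_fn() -> obs and step_fn(obs, action) -> (next_obs, reward, done) are
    hypothetical callables standing in for the real environment.
    """
    rng = random.Random(seed)
    transitions = []
    obs = reset_fn()
    for _ in range(n_steps):
        # Occasionally act randomly so the buffer also covers states
        # that the heuristic alone would never visit.
        action = rng.randrange(2) if rng.random() < epsilon else policy_fn(obs)
        next_obs, reward, done = step_fn(obs, action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = reset_fn() if done else next_obs
    return transitions
```

Here the observation passed to `policy_fn` is whatever `reset_fn` and `step_fn` return; with a full state vector, the policy would first compute the radius from the position components.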