Reinforcement Learning Approach to Nonequilibrium Quantum Thermodynamics