Curriculum learning posits that you get better results if you gradually increase the training "difficulty" — learn to walk before you run. So you'd teach "additions and multiplications" first, and only then "now draw the rest of the integral" :)
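A minimal sketch of that idea (all names and the difficulty proxy are made up for illustration): sort examples by some difficulty measure and split them into stages you train on in order.

```python
# Curriculum-learning sketch: easy examples first, harder ones later.
# The difficulty proxy here (string length) is a stand-in for a real metric.

def difficulty(example):
    # Hypothetical proxy: longer problems are "harder".
    return len(example)

def make_curriculum(dataset, n_stages=3):
    """Split a dataset into stages of increasing difficulty."""
    ordered = sorted(dataset, key=difficulty)
    stage_size = len(ordered) // n_stages
    stages = [ordered[i * stage_size:(i + 1) * stage_size]
              for i in range(n_stages - 1)]
    stages.append(ordered[(n_stages - 1) * stage_size:])  # remainder -> last stage
    return stages

dataset = ["2+2", "3*4", "7*8+1", "integral of x^2",
           "solve the Riemann hypothesis"]
for stage, examples in enumerate(make_curriculum(dataset)):
    print(f"stage {stage}: {examples}")
```

A real training loop would then run more epochs (or a full training phase) on each stage in order, rather than sampling uniformly from the whole dataset from the start.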
RL - Reinforcement Learning. You have a carrot and a stick. You run a model through iterations (with LLMs you generate n completions), you score each of them with some reward function, and if the result is correct you give it a carrot (positive reward); if the result is incorrect you give it a stick (negative or zero reward). (simplified, ofc)
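The carrot-and-stick loop described above can be sketched like this (a toy illustration, not any particular RL algorithm; the function names and the stand-in "model" are hypothetical):

```python
# Toy sketch of the LLM-style RL loop: generate n completions,
# score each with a reward function (carrot = 1.0, stick = 0.0).

import random

def reward_fn(completion, correct_answer):
    # Simplest possible scheme: carrot if correct, stick otherwise.
    return 1.0 if completion == correct_answer else 0.0

def rl_step(generate, prompt, correct_answer, n=4):
    """One simplified RL iteration: sample n completions and score them."""
    completions = [generate(prompt) for _ in range(n)]
    rewards = [reward_fn(c, correct_answer) for c in completions]
    return list(zip(completions, rewards))

random.seed(0)
generate = lambda prompt: random.choice(["4", "5"])  # stand-in for a model
scored = rl_step(generate, "2+2=", "4")
print(scored)
```

In actual LLM training, the `(completion, reward)` pairs would then be fed to an optimizer (PPO, GRPO, etc.) that nudges the model toward the completions that earned carrots.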
OpenAI Gym is (was?) a toolkit that let you simulate "AI agents" in various environments. You could for example play games, solve puzzles, or things like that. Gym was a "wrapper" over those environments with a standardised API (observe, step (provide action), reward; rinse and repeat). You could for example have an agent that learned to land a lunar lander in a simple game. Or play chess. Or control a 3D stick figure in a maze.
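The observe/step/reward loop looks roughly like this. The toy environment below is invented so the sketch runs without installing Gym; the real Gym/Gymnasium API is the same shape (`env.reset()` returns an observation, `env.step(action)` returns observation, reward, done-flag, plus extras in the real library).

```python
# Gym-style loop with a toy environment (no gym dependency).

class CliffWalk:
    """Hypothetical 1-D environment: start at 0, reach position 5 to win."""
    def reset(self):
        self.pos = 0
        return self.pos  # initial observation

    def step(self, action):  # action: +1 (right) or -1 (left)
        self.pos = max(0, self.pos + action)
        done = self.pos >= 5
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

env = CliffWalk()
obs = env.reset()
done = False
total_reward = 0.0
while not done:             # observe -> act -> step; rinse and repeat
    action = 1              # a trivial "policy": always go right
    obs, reward, done = env.step(action)
    total_reward += reward
print(total_reward)  # 1.0
```

The point of the standardised API is that the same agent loop works unchanged whether the environment is a lunar lander, a chess board, or a 3D physics sim — only the observation/action spaces differ.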