A review of "Investigating the consequences of accidentally grading CoT during RL"