This study presents a deep reinforcement learning-based control strategy for a two-wheeled inverted pendulum robot with a longitudinally extended body. Although this structure offers a large top surface suitable for carrying payloads, it also makes balance harder to maintain because of greater inertial effects under external disturbances, especially on rough terrain. To address these challenges, we train robust control policies that leverage a sliding mechanism to actively regulate the robot’s center of mass. The policy is trained with the Proximal Policy Optimization (PPO) algorithm in a simulation environment built with NVIDIA Isaac Sim. Training follows a curriculum of balance maintenance, flat-terrain driving, and rough-terrain traversal, with observation noise and reward shaping incorporated to enhance robustness. Simulation results demonstrate that the proposed controller maintains stable posture and driving performance under disturbances, achieving a pitch angle RMSE of 0.295°, less than one-tenth that of an OSQP-based model predictive controller.
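For readers unfamiliar with the reported metric, the pitch angle RMSE can be sketched as follows. This is a minimal illustration, not the study's evaluation code: it assumes the error is measured against an upright reference pitch of 0°, which the abstract does not state, and the function name `pitch_rmse_deg` and the sample values are hypothetical.

```python
import math


def pitch_rmse_deg(pitch_deg, reference_deg=0.0):
    """Root-mean-square error of pitch samples (degrees) about a reference pitch.

    Assumption: the reference is the upright posture (0 degrees) unless given.
    """
    if not pitch_deg:
        raise ValueError("need at least one pitch sample")
    squared_errors = [(p - reference_deg) ** 2 for p in pitch_deg]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical pitch log in degrees (illustrative values, not data from the study).
example_log = [0.3, -0.3, 0.3, -0.3]
example_rmse = pitch_rmse_deg(example_log)
```

Because the errors are squared before averaging, brief large pitch excursions raise the metric more than a persistent small offset does, which makes it a reasonable single-number summary of posture stability under disturbances.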