In this study, we propose an action recognition model whose performance generalizes across camera locations and camera-to-subject distances. The model consists of two stages: human detection and action recognition. It operates on video frames that are resized by a new zoom-in method, which uses a pretrained YOLOv3 detector. To exploit temporal information, which is regarded as a critical factor in action recognition, we adopt the R(2+1)D model, which factorizes 3D convolutions into separate spatial and temporal convolutions and can thereby represent more complex functions. The zoom-in method keeps performance consistent regardless of distance: in experiments on our datasets, the proposed method achieved accuracies of 96.07%, 96.61%, and 94.55% at short, medium, and long range, respectively.
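The zoom-in step can be illustrated with a minimal sketch. It assumes that "zoom-in" means cropping each frame around the detected person and resizing the crop to a fixed resolution; the abstract does not spell out the exact procedure, so this is only one plausible reading. The helper names `zoom_in` and `zoom_clip`, the `margin` and `out_size` parameters, and the nearest-neighbor resampling are all hypothetical choices made for illustration. The bounding box is taken as given here; in the described pipeline it would come from the pretrained YOLOv3 detector.

```python
import numpy as np

def zoom_in(frame, box, out_size=112, margin=0.2):
    """Crop `frame` (H, W, C) around `box` = (x1, y1, x2, y2) and resize it
    to out_size x out_size with nearest-neighbor sampling.

    Hypothetical helper: `margin` and `out_size` are illustrative values,
    not settings taken from the paper.
    """
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    # Use a square window around the box center so the person is not distorted.
    side = max(x2 - x1, y2 - y1) * (1.0 + margin)
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Source coordinate for each output pixel center.
    offsets = (np.arange(out_size) + 0.5) * side / out_size - 0.5
    xs = np.round(cx - side / 2.0 + offsets).astype(int)
    ys = np.round(cy - side / 2.0 + offsets).astype(int)
    # Clamp to the frame, which repeats edge pixels if the window leaves it.
    xs = np.clip(xs, 0, w - 1)
    ys = np.clip(ys, 0, h - 1)
    return frame[ys[:, None], xs[None, :]]

def zoom_clip(frames, boxes, out_size=112, margin=0.2):
    """Apply zoom_in to every frame and stack the result as (T, H, W, C)."""
    return np.stack([zoom_in(f, b, out_size, margin)
                     for f, b in zip(frames, boxes)])
```

Under this reading, a person who occupies a small box at long range and a large box at short range both end up filling a crop of the same size, which is the property the abstract attributes to the zoom-in method. The stacked clip could then be passed to a video model such as R(2+1)D after the usual normalization and axis reordering.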