Visual analysis of drone imagery is growing across many domains because drones can observe objects from different perspectives and from a distance. Small object detection in drone images must cope with environmental challenges such as changes in illumination and motion. Combining thermal infrared with RGB images (RGBT) appears to address these challenges, but how best to exploit this combination for object detection remains an open and attractive research problem. In this study, the performance of the state-of-the-art object detector You Only Look Once (YOLO) v4 is evaluated under three training scenarios on both RGB and thermal infrared (TIR) image datasets captured from an unmanned aerial vehicle (UAV) perspective. The dataset was manually annotated in YOLO format, and the annotations will be released to the community. Pre-trained YOLO weights with minimal fine-tuning were also used to determine the influence of transfer learning on the new aerial-perspective dataset. The experimental results show that transfer learning from a model pre-trained on the MS COCO dataset improves YOLOv4 human detection, with Average Precision (AP) of up to 91.18% and 78.24% on the RGB and TIR datasets, respectively.
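As an illustration of what a YOLO-format annotation looks like, the sketch below converts a pixel-space bounding box into the plain-text label line commonly used by Darknet-style YOLO tools: a class index followed by the box centre, width, and height, each normalised by the image size. The function name `to_yolo_line`, the example box, and the use of class index 0 for the human class are assumptions made for illustration only; the annotation tooling actually used in this study is not described here.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to a YOLO label line (hypothetical helper).

    The output is: class x_center y_center width height,
    with all four box values normalised to the range [0, 1].
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("box must have positive width and height")

    # Box centre and size, normalised by the image dimensions.
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h

    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Example: one person (assumed class 0) in a 1000x800 image.
print(to_yolo_line(0, 100, 200, 300, 400, 1000, 800))
```

Each image would typically have one such text file with one line per annotated object; normalising by image size keeps the labels valid if the image is resized for training.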