Falls are the second leading cause of accidental injury and death worldwide. They occur most often among the elderly, and their frequency is increasing every year. A reliable fall detection system can reduce the risk of the resulting injuries. Because falls are unwanted and occur suddenly, actual fall data are difficult to collect. Detection is also difficult because falls resemble some everyday activities, such as squatting and picking up objects from the floor. In addition, few fall datasets have been made publicly available in recent years. Therefore, in 2019, researchers created a comprehensive fall dataset that simulates actual events using camera and sensor devices, producing the multimodal UP-Fall dataset. Using this dataset, this work detects falls with Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) approaches. The CNN extracts spatial information from image data, while the LSTM exploits temporal information from signal data. The results of the two models are then combined with a majority voting strategy. In the evaluation, the CNN obtained an accuracy of 98.49% and the LSTM 98.88%. Both models contribute to the performance of the majority voting strategy, whose accuracy (98.31%) exceeds the baseline accuracy (96.4%). Other evaluation metrics also improved over the baseline, with gains of up to 11% in precision, 14% in recall, and 12% in F1-score.
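The fusion step can be illustrated with a minimal sketch of hard majority voting over per-sample class predictions. The function name `majority_vote`, the example labels, and the tie-breaking rule (prefer the label of the earliest-listed model) are assumptions made for illustration and are not taken from this work. With only two voters, a tie arises whenever the models disagree, so some rule of this kind is needed.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-model label sequences by hard majority voting.

    predictions: list of sequences, one per model, all of equal length.
    Ties are broken in favour of the earliest-listed model (an assumption).
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    fused = []
    for votes in zip(*predictions):  # votes for one sample, in model order
        counts = Counter(votes)
        top = max(counts.values())
        # First label, in model order, that reaches the highest vote count.
        fused.append(next(label for label in votes if counts[label] == top))
    return fused


# Hypothetical predictions for three samples from three classifiers.
model_a = ["fall", "walk", "fall"]
model_b = ["fall", "fall", "walk"]
model_c = ["walk", "fall", "walk"]
print(majority_vote([model_a, model_b, model_c]))  # ['fall', 'fall', 'walk']
```

Listing the model trusted most first makes its prediction win every tie, which is one simple way to resolve disagreements between an even number of voters.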