This research develops end-to-end deep learning-based text-to-speech (TTS) in Indonesian, Javanese, and Sundanese. While end-to-end neural TTS, such as Tacotron-2, has made remarkable progress recently, it still suffers from a data scarcity problem for low-resource languages such as Javanese and Sundanese. Our preliminary study shows that Tacotron-2-based TTS needs a large amount of training data; a minimum of 10 hours of training data is required for the model to be able to synthesize acceptable quality and intelligible speech. To solve this low-resource problem, our work proposes a hierarchical transfer learning to train TTS for Javanese and Sundanese, by taking advantage of a dissimilar high-resource language of English domain and a similar intermediate-resource language of Indonesian domain. We report that the evaluation of synthesized speech using the mean opinion score (MOS) reaches 4.27 for Indonesian, and 4.08 for Javanese, and 3.92 for Sundanese. The word accuracy (WAcc) evaluation on semantically unpredicted sentences (SUS) reaches 98.26% for Indonesian, 95.02% for Javanese, and 95.43% for Sundanese. The subjective evaluations of the synthetic speech quality demonstrate that our transfer learning scheme is successfully applied to TTS model for low-resource target domain. Using less than one hour of training data, 38 minutes for Indonesian, 16 minutes for Javanese, and 19 minutes for Sundanese, TTS models can learn fast and achieve adequate performance.