Financial assets are a source of income for retirees, but deciding how to manage them is difficult. One way to manage a financial portfolio is reinforcement learning, a machine learning method in which a learning agent receives rewards and punishments as feedback and learns to maximize its cumulative reward. The method requires a Markov Decision Process (MDP) environment that describes the problem to the learning agent. We use Actor Critic using Kronecker-Factored Trust Region (ACKTR), which reduces the number of samples needed for learning by using Kronecker-Factored Approximate Curvature (K-FAC) and a trust region to approximate the natural gradient for the agent's updates. In this research, we construct an MDP that describes a personal retirement portfolio with securities, then implement ACKTR on it and analyze the results. The MDP is constructed as a model-free MDP, and the total (cumulative) reward is influenced by the discount rate and the agent's actions. The results show that lower discount rates yield a higher average cumulative reward than higher discount rates, and that higher discount rates lead to riskier agents.
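
To illustrate how a discount setting can influence the cumulative reward, the following is a minimal sketch and not the implementation used in this research. It assumes the standard reinforcement learning convention in which the reward received t steps ahead is weighted by gamma to the power t; the function name `discounted_return`, the parameter `gamma`, and the reward values are hypothetical and chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum rewards, weighting the reward at step t by gamma ** t.

    Assumption: gamma is a multiplicative discount factor in [0, 1].
    The mapping between this factor and the "discount rate" studied
    in the paper is not specified here and is assumed for illustration.
    """
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequence, for illustration only.
rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, 0.5))  # 1 + 0.5 + 0.25 = 1.75
print(discounted_return(rewards, 1.0))  # no discounting: 3.0
```

Under this convention, a smaller gamma shrinks the contribution of later rewards, so the same sequence of actions can produce different cumulative rewards depending on the discount setting.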