We propose a bottom-up, Reinforcement Learning-based approach to the design of a chain achieving efficient excitation transfer. We assume distance-dependent interactions among the particles, which are arranged in a chain under tight-binding conditions. Starting from two particles and a localised excitation, we gradually increase the number of constituents of the system so as to improve the transfer probability. We formulate the problem of finding the optimal number and locations of the particles as a Markov Decision Process, and use Proximal Policy Optimization to find the optimal chain-building policies and chain configurations under different scenarios. We consider both the case in which the target is a sink connected to the end of the chain and the case in which it is the rightmost particle in the chain. We also address disorder in the chain induced by particle-positioning errors. We achieve extremely high excitation-transfer probabilities in all cases, with chain configurations and properties that depend on the specific conditions.
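The figure of merit above, the excitation-transfer probability of a tight-binding chain with distance-dependent couplings, can be sketched as follows. This is an illustrative toy computation, not the paper's method: the dipole-like coupling law $J_{ij} = 1/|x_i - x_j|^\alpha$ and the exponent `alpha` are assumptions, and the actual interaction model may differ.

```python
import numpy as np
from scipy.linalg import expm

def transfer_probability(positions, t, alpha=3.0):
    """Probability that an excitation localised on site 0 at time 0
    is found on the last site of the chain at time t.

    Assumed tight-binding Hamiltonian with distance-dependent
    couplings J_ij = 1/|x_i - x_j|**alpha (illustrative choice).
    """
    n = len(positions)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            J = 1.0 / abs(positions[i] - positions[j]) ** alpha
            H[i, j] = H[j, i] = J
    # Evolve |psi(0)> = |site 0> under the unitary U(t) = exp(-i H t)
    psi0 = np.zeros(n, dtype=complex)
    psi0[0] = 1.0
    psi_t = expm(-1j * H * t) @ psi0
    # Population of the rightmost site = transfer probability
    return abs(psi_t[-1]) ** 2
```

For two sites at unit distance the coupling is $J = 1$ and the population oscillates between the sites, reaching full transfer at $t = \pi/2$; a chain-building agent would score candidate particle placements with a quantity of this kind.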