Fluid EDL Performance and Availability

](https://1.bp.blogspot.com/-sp_sVZvhMbU/WiYgXMLQKuI/AAAAAAAAAIM/uc_3iT9BZmAtQGiGGSErgueHK71uWMBCACEwYBhgL/s1600/figure-1.png) | | _Figure 1. Fluid EDL evenly distributes resource among jobs._ | In the second test, each experiment ran 400 Nginx pods, which has higher priority than the six PaddlePaddle jobs. Initially, each PaddlePaddle job had 15 trainers and 10 parameter servers. We killed 100 Nginx pods every 90 seconds until 100 left, and then we started to increase the number of Nginx jobs by 100 every 90 seconds. The upper part of Figure 2 shows this process. The middle of the diagram shows that Fluid EDL automatically started some PaddlePaddle processes by decreasing Nginx pods, and killed PaddlePaddle processes by increasing Nginx pods later on. As a result, the cluster maintains around 90% utilization as shown in the bottom of the figure. When Fluid EDL was turned off, there were no PaddlePaddle processes autoincrement, and the utilization fluctuated with the varying number of Nginx pods. | [![](https://4.bp.blogspot.com/-gOMFfnaygSU/WiYgXO_KJ0I/AAAAAAAAAII/lMLjTGNGYhsovwKornCzMZBhEdMdPI5HACLcBGAs/s640/figure-2.png)](https://4.bp.blogspot.com/-gOMFfnaygSU/WiYgXO_KJ0I/AAAAAAAAAII/lMLjTGNGYhsovwKornCzMZBhEdMdPI5HACLcBGAs/s1600/figure-2.png) | | _Figure 2. Fluid changes PaddlePaddle processes with the change of Nginx processes._ | We continue to work on FluidEDL and welcome comments and contributions. Visit the [PaddlePaddle repo](https://github.com/PaddlePaddle/cloud), where you can find the [design doc](https://github.com/PaddlePaddle/cloud/tree/develop/doc/design), a [simple tutorial](https://github.com/PaddlePaddle/cloud/blob/develop/doc/autoscale/example/autoscale.md), and [experiment details](https://github.com/PaddlePaddle/cloud/tree/develop/doc/edl/experiment).

The second test involved running Nginx pods (high priority) and PaddlePaddle jobs. Fluid EDL dynamically adjusted PaddlePaddle processes based on Nginx pod changes to maintain cluster utilization around 90%. Without Fluid EDL, utilization fluctuated. Further work is ongoing, and contributions are welcomed; resources including the design document, tutorial, and experiment details are available at the PaddlePaddle repository.