From your answer to my comment, it seems that the first time you run this query a large number of physical reads (and read-aheads) are performed, which means a lot of physical IO is needed to bring the required pages into the buffer pool to satisfy the query.
Once the pages have been read into the buffer pool (memory) they generally stay there, so physical IO is not needed to read them again (you can see this happening because, as you noted, the physical reads turn into logical reads on the second execution). Memory access is orders of magnitude faster than disk IO, hence the difference in speed for this query.
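If it helps, you can watch this happen yourself. A minimal sketch follows; the table and column names are made-up placeholders standing in for your actual query:

```sql
-- Report physical vs. logical reads for each statement in this session.
SET STATISTICS IO ON;

-- Hypothetical query standing in for yours: on a cold cache you should see
-- high "physical reads" / "read-ahead reads"; run it a second time and the
-- same work shows up as "logical reads" served from the buffer pool.
SELECT OrderID, OrderDate, TotalDue
FROM   dbo.Orders
WHERE  CustomerID = 42;
```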
Looking at the plan, I can see that every read operation is performed against the clustered index of the relevant table. Because the clustered index contains every column of a row, you are potentially reading more data per row than the query actually needs.
If you are not selecting every column from every table, I would suggest creating non-clustered covering indexes that satisfy this query (and that are as narrow as possible); this will reduce the IO requirement of the query and make that first, cold-cache execution cheaper (see the sketch below).
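As a rough sketch of what I mean (again, the table and column names are hypothetical, substitute the columns your query actually filters on and returns):

```sql
-- Hypothetical covering index: key on the column(s) used for seeking, with the
-- remaining columns the query returns in INCLUDE, so the query can be answered
-- entirely from this narrower index instead of scanning the clustered index.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
ON dbo.Orders (CustomerID)
INCLUDE (OrderID, OrderDate, TotalDue);
```

Keeping the key and included columns down to only what the query needs keeps the index pages narrow, so fewer pages have to be read (physically or logically) to satisfy the query.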
Of course, this may not be possible or viable for you, in which case you should either accept the hit on the first execution, avoid emptying the caches, or rewrite the query itself so that it is more efficient and performs fewer reads.
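For reference, I'm assuming that by "emptying the caches" you mean something along these lines (only do this on a test server, since it forces every subsequent query back to physical IO):

```sql
-- Flush dirty pages to disk, then drop all clean pages from the buffer pool,
-- so subsequent queries have to perform physical reads again (test servers only).
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
```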