I see; that LSU type was added in newer versions of the compiler and did not exist in the older ones. However, it seems to be something the compiler infers from the characteristics of the memory accesses, rather than something the programmer/user can explicitly control. Furthermore, the compiler never analyzes global memory access dependencies between two separate kernels, so it will never create such an LSU for your case. Based on the example in the guide, this LSU is created when a write-after-write dependency exists in the code. Needless to say, such a dependency is a false dependency, and as long as nothing reads the location between the two writes, any sane compiler will optimize out the first write and keep only the second one. I fail to see why Intel even needed to add support for this LSU type...
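To make the write-after-write point concrete, here is a minimal sketch in plain C. It is not the example from the guide and not OpenCL kernel code; the function names `store_twice` and `store_once` are made up for illustration, and it assumes nothing else reads `out[i]` between the two stores.

```c
/* Hypothetical helper names; plain C, not the example from the guide. */

/* Two back-to-back stores to the same address: a write-after-write pair. */
void store_twice(int *out, int i, int a, int b)
{
    out[i] = a;   /* first store: overwritten before anything reads it */
    out[i] = b;   /* second store: the only value that can be observed */
}

/* What an optimizer is free to reduce the function above to. */
void store_once(int *out, int i, int a, int b)
{
    (void)a;      /* the first value is never needed */
    out[i] = b;
}
```

Calling either function with the same arguments leaves `out[i]` holding `b`, which is why the first store can be dropped without changing the result.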
What you are looking for is likely the atomic memory read/write I mentioned earlier.