The Best Practices Guide says:
"The Kernel Execution tab also displays information on memory transfers between the host and your devices"
I don't think "memory transfers" here refers to global memory accesses made by the kernel. The quoted sentence is about transfers between the host and the device, and I highly doubt the profiler implements separate counters for global memory traffic. Furthermore, memory and compute operations generally overlap within a kernel, so it would not be easy to separate them with run-time counters. The guide does not address this topic very clearly, so I am not sure how the information you obtained from the profiler should be interpreted.
Regarding the low occupancy, it could simply be caused by your code performing memory operations less frequently than compute operations. Section 4.3.4.2, "Low Occupancy Percentage", of the Best Practices Guide contains the official guidelines for improving occupancy.
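
To make the occupancy point concrete, here is a rough sketch in C. It assumes (my assumption, not something quoted from the guide) that occupancy is the fraction of profiled cycles in which a memory instruction is actually issued. The function `occupancy_percent` and its parameters are hypothetical names for illustration only, not part of any profiler API.

```c
/* Hypothetical model, not a profiler API:
 * occupancy = cycles in which a memory instruction was issued
 *             divided by the total number of profiled cycles. */
static double occupancy_percent(unsigned long mem_issue_cycles,
                                unsigned long total_cycles)
{
    /* Avoid dividing by zero when nothing was profiled. */
    if (total_cycles == 0) {
        return 0.0;
    }
    return 100.0 * (double)mem_issue_cycles / (double)total_cycles;
}
```

Under this model, `occupancy_percent(25, 100)` returns 25.0: a kernel that issues a memory operation in only one of every four profiled cycles reports low occupancy simply because it spends most of its time computing.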