The closest option available today is __constant memory, where on-chip memory is used as a cache for read-only data. If you also need write access, it is not appropriate.
For fast read/write memory today, you need to pair __local and __global memory and perform the scratch-pad copies explicitly in your code, which is the typical OpenCL way to use memory more efficiently. The limitation is that __local memory is visible only to the work-items of a single work-group.
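As a rough illustration of that scratch-pad pattern, here is a minimal OpenCL C kernel sketch. The kernel name, the argument names (data, tile) and the neighbor-sum computation are hypothetical and chosen only for illustration. It assumes a 1-D NDRange whose global size is a multiple of the work-group size, and a buffer with one float per work-item.

```c
// Hypothetical example: all names here are illustrative, not from any vendor sample.
__kernel void scratch_pad_example(__global float *data,  // slow, visible to all work-groups
                                  __local  float *tile)  // fast, visible to one work-group only
{
    const size_t gid = get_global_id(0);
    const size_t lid = get_local_id(0);
    const size_t lsz = get_local_size(0);

    // 1. Explicitly copy this work-group's slice from __global into __local.
    tile[lid] = data[gid];
    barrier(CLK_LOCAL_MEM_FENCE);  // wait until the whole tile is loaded

    // 2. Read the fast copy. Each work-item adds its right-hand neighbor's
    //    value, wrapping around inside the work-group's tile.
    const float result = tile[lid] + tile[(lid + 1) % lsz];
    barrier(CLK_LOCAL_MEM_FENCE);  // all reads must finish before anyone overwrites

    // 3. Write to the fast copy, then copy the result back out to __global.
    tile[lid] = result;
    data[gid] = tile[lid];
}
```

On the host side, the size of the __local argument would be set with clSetKernelArg(kernel, 1, local_size * sizeof(float), NULL), where kernel and local_size are again placeholder names. Note that the neighbor lookup never crosses a work-group boundary, which is exactly the visibility limitation mentioned above.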
Stay tuned to future releases; as the compiler evolves, the feature you are looking for may appear.