What part are you targeting? Yes, this is where it's generally easier to use the SRL LUT, and where I can eat some crow. That being said, I think you can still build something smaller like so:
1) Create a 1Kx1 RAM. The steps below assume its output registers are off; step 5 covers turning them on for performance.
2) Create a free-running 10-bit counter that powers up to 1 (basically, have your asynchronous clear reset it to this value). This will be your write pointer: the incoming shift register bit is always written to the address it points to. So the writes will occur like so:
SR bit : Memory Location
0 : 1
1 : 2
2 : 3
3 : 4
4 : 5
etc
3) Whatever tap you want to pull off will be your read address. So if you tap address 4, the memory takes one cycle to access, and by then what was in that location (data bit 3) will have been shifted once, i.e. it will now be the 4th bit.
4) You may need to have a bypass register that is always being written to. It is only read if your tap value is 1.
5) Finally, this is without the memory output registered, which will be slightly slower since memory accesses are slower. I don't know what device or speed grade you're using, or what logic this SR is feeding, so I don't know if registering the output is necessary for 250MHz performance. If it is, then edit the memory, turn on the output registers, and change your write pointer to begin writing at memory location 2. You will then need two bypass registers, for when you tap 1 and 2.
You'll probably want to throw down a quick simulation, as this is off the top of my head and I may be off on something. Hopefully the whole thing doesn't take more than an hour to code up and simulate. The net result is that it should take ~12 logic cells and a memory (assuming your tap select is already encoded).
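Along those lines, here is a minimal behavioral sketch of the scheme in Python (not HDL), meant only as a quick sanity check before writing the real thing. The names `RamShiftRegister`, `clock` and `matches_reference` are made up for illustration. It assumes the RAM powers up to zeros, that a read of the address being written returns the old data, and that the tap select is held constant. One further assumption goes beyond step 3: because the write pointer free-runs, the sketch forms the read address relative to it (write pointer minus tap, plus the read latency) rather than using the tap value directly, which in hardware would likely cost some logic beyond the counter.

```python
import random
from collections import deque

DEPTH = 1024  # 1Kx1 RAM, so addresses are 10 bits wide


class RamShiftRegister:
    """Cycle-level model of a RAM-based shift register (hypothetical names)."""

    def __init__(self, registered_output=False):
        self.registered = registered_output
        # Read latency in clocks: 1 without output registers, 2 with them.
        self.latency = 2 if registered_output else 1
        self.mem = [0] * DEPTH  # assumption: RAM powers up to all zeros
        # Write pointer powers up to 1 (or 2 with output registers), as in
        # the post. With a relative read address the exact start value does
        # not change the behavior of this model.
        self.wr_ptr = self.latency
        # One bypass register per cycle of RAM latency; index 0 is newest.
        self.bypass = deque([0] * self.latency, maxlen=self.latency)
        self.q_reg = 0  # RAM output register (only used when registered)

    def clock(self, din, tap):
        """Apply one clock edge and return the bit seen at `tap` (1..DEPTH)."""
        # Assumption: read address is an offset from the write pointer.
        rd_addr = (self.wr_ptr - tap + self.latency) % DEPTH
        ram_q = self.mem[rd_addr]  # read sees the contents before this write
        self.mem[self.wr_ptr] = din
        self.wr_ptr = (self.wr_ptr + 1) % DEPTH
        self.bypass.appendleft(din)  # always written; oldest entry drops off
        if self.registered:
            # Output register adds one more cycle of delay to the RAM path.
            ram_q, self.q_reg = self.q_reg, ram_q
        # Taps the RAM cannot reach in time come from the bypass registers.
        return self.bypass[tap - 1] if tap <= self.latency else ram_q


def matches_reference(tap, registered_output, cycles=3000, seed=0):
    """Compare the RAM model against a plain DEPTH-stage shift register."""
    rng = random.Random(seed)
    dut = RamShiftRegister(registered_output)
    ref = deque([0] * DEPTH, maxlen=DEPTH)  # ref[0] is the newest bit
    for _ in range(cycles):
        din = rng.randint(0, 1)
        ref.appendleft(din)
        if dut.clock(din, tap) != ref[tap - 1]:
            return False
    return True
```

Calling `matches_reference(4, False)` compares tap 4 against a plain 1024-deep shift register over a few thousand random bits; passing `True` as the second argument models the output-registered variant with its two bypass registers.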