My first point would be to note that 4 GB is tight to store 20M sorted sets. A quick try shows that 20M users, each of them with 20 tags, take about 8 GB on a 64-bit box (and this already accounts for the sorted set ziplist memory optimization shipped with Redis 2.4; don't even try this with earlier versions).
Sorted sets are the ideal data structure to support your use case. I would use them exactly as you described.
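For instance, assuming each user key is a sorted set of tags scored by a usage count (the key and tag names below are just illustrative), it could be maintained with the redis-py client like this:

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Bump the score of a few tags for a given user
# (redis-py >= 3.0 signature: zincrby(key, amount, member))
r.zincrby("user:1000", 1, "tag:redis")
r.zincrby("user:1000", 1, "tag:python")
r.zincrby("user:1000", 1, "tag:redis")

# Retrieve the 5 highest scored tags of this user
print(r.zrevrange("user:1000", 0, 4, withscores=True))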
As you pointed out, KEYS cannot be used to iterate over keys; it is rather meant to be used as a debug command. To support key iteration, you need to add a data structure providing this access path. The only structures in Redis that can support iteration are the list and the sorted set (through range methods). However, they tend to turn O(n) iteration algorithms into O(n^2) (for the list) or O(n log n) (for the zset). A list is also a poor choice to store keys, since it is hard to maintain as keys are added and deleted.
A more efficient solution is to add an index composed of regular sets. You need to use a hash function to associate a given user with a bucket, and add the user id to the set corresponding to this bucket. If the user id is a numeric value, a simple modulo function is enough. If it is not, a simple string hashing function will do the trick.
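As a sketch, the bucket assignment could be written as follows (bucket_for and NBUCKETS are illustrative names, not from the original; CRC32 is just one possible string hash):

import zlib

NBUCKETS = 1000  # number of index buckets

def bucket_for(user_id):
    # Numeric ids: a plain modulo is enough.
    if isinstance(user_id, int):
        return user_id % NBUCKETS
    # Non-numeric ids: any cheap string hash (here CRC32) spreads them across buckets.
    return zlib.crc32(str(user_id).encode("utf-8")) % NBUCKETS

# Registering a user in the index then boils down to:
#   r.sadd("index:%d" % bucket_for(user_id), user_id)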
So, to support iteration on user:1000, user:2000 and user:1001, let's choose a modulo 1000 hash function. user:1000 and user:2000 will be put in bucket index:0, while user:1001 will be put in bucket index:1.
So, on top of the zsets, we now have the following keys:
index:0 => set[ 1000, 2000 ]
index:1 => set[ 1001 ]
The key prefix is not needed in the sets, and leaving it out allows Redis to optimize memory consumption by serializing the sets, provided they are kept small enough (this is the integer set optimization proposed by Sripathi Krishnan).
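To check that a bucket actually benefits from this compact encoding, you can inspect it with the OBJECT ENCODING command (a small sketch; the size threshold is controlled by the set-max-intset-entries parameter in redis.conf):

import redis

r = redis.Redis(host="localhost", port=6379, db=0)
r.sadd("index:0", 1000, 2000)

# Small sets containing only integers are reported as "intset"; once they grow
# past the configured limit they fall back to the regular "hashtable" encoding.
print(r.object("encoding", "index:0"))  # e.g. b'intset'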
The global iteration consists of a simple loop on the buckets from 0 to 1000 (excluded). For each bucket, the SMEMBERS command retrieves the corresponding set, and the client can then iterate over the individual items.
Here is an example in Python:
#!/usr/bin/env python
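# A sketch of such a program, assuming the redis-py client (>= 3.0) and a
# local Redis instance. Constants, key names and helpers are illustrative,
# not reproduced from the original script.

import random
import redis

NUSERS = 10000    # number of users created for the test
NTAGS = 500       # size of the tag vocabulary
NBUCKETS = 1000   # number of index buckets

def fill(r):
    # Create the users, their tag zsets and the bucket index.
    p = r.pipeline()
    for uid in range(NUSERS):
        # Index the user: a simple modulo assigns it to a bucket.
        p.sadd("index:%d" % (uid % NBUCKETS), uid)
        # Give the user 20 random tags, scored by usage count.
        for _ in range(20):
            p.zincrby("user:%d" % uid, 1, "tag:%d" % random.randint(0, NTAGS))
        # Flush the pipeline from time to time.
        if uid % 1000 == 0:
            p.execute()
    p.execute()

def iterate(r):
    # Walk all the users through the bucket index, printing their top 5 tags.
    for b in range(NBUCKETS):
        for uid in r.smembers("index:%d" % b):
            user = "user:%d" % int(uid)
            print(user, r.zrevrange(user, 0, 4, withscores=True))

def main():
    r = redis.Redis(host="localhost", port=6379, db=0)
    r.flushall()
    before = r.info()["used_memory"]
    fill(r)
    print("Memory used:", r.info()["used_memory"] - before)
    iterate(r)

if __name__ == "__main__":
    main()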
By tweaking the constants, you can also use this program to evaluate the global memory consumption of this data structure.
IMO this strategy is simple and efficient, because it offers O(1) complexity to add or remove users, and true O(n) complexity to iterate over all items. The only drawback is that the keys are iterated in random order.