Identify Duplicates in CouchDB

I am new to CouchDB and document-oriented databases.

I have been playing with CouchDB and have become familiar with creating documents (with perl) and with using the Map/Reduce functions in Futon to query data and create views.

One of the things I'm still trying to figure out is how to identify duplicate values across documents using Futon's Map/Reduce.

For example, if I have the following documents:

{
  "_id": "123",
  "name": "carl",
  "timestamp": "2012-01-27T17:06:03Z"
}

{
  "_id": "124",
  "name": "carl",
  "timestamp": "2012-01-27T17:07:03Z"
}

If I wanted to get a list of the document identifiers that have duplicate "name" values, is that something I could do with Futon's Map/Reduce?

The result I was hoping to achieve:

{
  "name": "carl",
  "dupes": [ "123", "124" ]
}

.. or..

{
  "carl": [ "123", "124" ]
}

.. that is, the duplicated value and the identifiers of the documents that contain it.

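I have been reading about Map/Reduce, but I have not worked out how to do this with Map and Reduce functions/views.

I know I could fetch the documents and look for duplicates in perl, but I would prefer to do this within CouchDB if possible.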

I also considered restructuring the data into a single document, closer to what I would do in an RDBMS:

{
  "_id": "names",
  "rec1": {
    "_id": "123",
    "name": "carl",
    "timestamp": "2012-01-27T17:06:03Z"
  },
  "rec2": {
    "_id": "124",
    "name": "carl",
    "timestamp": "2012-01-27T17:07:03Z"
  }
}

.. so that Map/Reduce could operate on all the records within one document.



Yes, this is possible with a view.

Map function:

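function (doc) {
   // Key each row by name; the value is unused because _count only counts rows.
   if (doc.name) {
      emit(doc.name, null);
   }
}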

For the reduce function, use the built-in _count.
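For reference, a map function and a built-in reduce are stored together as a view inside a design document. The following is a minimal sketch of such a document; the design document name `_design/dupes` and the view name `by_name` are hypothetical, chosen only for illustration.

```javascript
// Hypothetical names: "_design/dupes" and "by_name" are illustrative only.
const designDoc = {
    _id: "_design/dupes",
    views: {
        by_name: {
            // CouchDB stores view functions as source strings.
            map: "function (doc) { if (doc.name) { emit(doc.name, null); } }",
            // "_count" is one of CouchDB's built-in reduce functions.
            reduce: "_count"
        }
    }
};

// Path (relative to the database) for querying the view grouped by key.
const queryPath = "/" + designDoc._id + "/_view/by_name?group=true";

console.log(JSON.stringify(designDoc, null, 2));
console.log(queryPath);
```

Saving a view in Futon creates a document of this shape for you, so writing it by hand is only needed when managing design documents outside Futon.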

Result, queried with group=true (carl appears 2 times):

{
    "rows": [
        { "key": "carl", "value": 2 }
    ]
}
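To see where a result like this comes from, here is a small local simulation in plain JavaScript, run against the two example documents from the question. It is only a sketch: the helper names `runMap` and `countByKey` are made up for illustration, and CouchDB's actual view engine builds and sorts its index differently.

```javascript
// Local simulation only; this is not how CouchDB is implemented internally.
const docs = [
    { _id: "123", name: "carl", timestamp: "2012-01-27T17:06:03Z" },
    { _id: "124", name: "carl", timestamp: "2012-01-27T17:07:03Z" }
];

// Same logic as the view's map function, with emit passed in explicitly.
function mapByName(doc, emit) {
    emit(doc.name);
}

// Collect the rows the map function emits for every document.
function runMap(mapFn, allDocs) {
    const rows = [];
    for (const doc of allDocs) {
        mapFn(doc, (key, value) => {
            rows.push({ id: doc._id, key: key, value: value === undefined ? null : value });
        });
    }
    return rows;
}

// Rough equivalent of reduce=_count with group=true: count rows per distinct key.
function countByKey(rows) {
    const counts = new Map();
    for (const row of rows) {
        counts.set(row.key, (counts.get(row.key) || 0) + 1);
    }
    return { rows: Array.from(counts, ([key, value]) => ({ key, value })) };
}

const result = countByKey(runMap(mapByName, docs));
console.log(JSON.stringify(result));
// prints {"rows":[{"key":"carl","value":2}]}
```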

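This gives a count for every name, not only the duplicated ones. To keep only the names that occur more than once, filter the rows with a _list function: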

function (head, req) {
    // Keep only the grouped rows whose count shows a repeated name.
    var row, duplicates = [];
    while ((row = getRow())) {
        if (row.value > 1) {
            duplicates.push(row);
        }
    }
    send(JSON.stringify(duplicates));
}

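Querying the view through this _list function (with group=true) returns only the names that have duplicates.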
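The list function above returns names and counts, not the document identifiers the question asks for. One possible way to get those, sketched here under assumptions, is to query the same view with `reduce=false`, so that each row carries the `id` of the document that emitted it, and then group the rows on the client. The function name `duplicateIdsByKey` and the third sample row are hypothetical.

```javascript
// Rows shaped like the output of a view queried with reduce=false.
const viewRows = [
    { id: "123", key: "carl", value: null },
    { id: "124", key: "carl", value: null },
    { id: "125", key: "dana", value: null } // hypothetical extra document
];

// Group document ids by key and keep only keys that occur more than once.
function duplicateIdsByKey(rows) {
    const idsByKey = new Map();
    for (const row of rows) {
        if (!idsByKey.has(row.key)) {
            idsByKey.set(row.key, []);
        }
        idsByKey.get(row.key).push(row.id);
    }
    const dupes = {};
    for (const [key, ids] of idsByKey) {
        if (ids.length > 1) {
            dupes[key] = ids;
        }
    }
    return dupes;
}

console.log(JSON.stringify(duplicateIdsByKey(viewRows)));
// prints {"carl":["123","124"]}
```

Grouping on the client keeps the reduce function small, at the cost of transferring every row; for a large database it may be better to fetch rows only for the names the list function reported.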
