How to implement this schema in MongoDB?

I am trying to write a tracking script and I am having trouble figuring out how the database should work.

In MySQL, I would create tables that look like this:

 User:
   username_name: string
 Campaign:
   title: string
   description: string
   link: string
 UserCampaign:
   user_id: integer
   camp_id: integer
 Click:
   os: text
   referer: text
   camp_id: integer
   user_id: integer

I need to be able to:

  • See information from every click, such as IP, Referer, OS, etc.
  • See how often clicks come from X IP, X Referer, X OS
  • Associate each click with a user and campaign

If I do something along the lines of

 User { Campaigns: [ { Clicks: [] } ] } 

I have two problems:

  • It creates a new campaign object for each user, which is a problem: if I need to update a campaign, I have to update the object for every user.
  • I expect the Clicks array to contain a LOT of data, and I suspect that keeping it inside the User object will make queries very slow.
mongodb database-design
Jan 11 2018-11-11T00:
3 answers

Well, I think you need to break this down into the basic "kinds" of objects.

You have two entity objects:

  • User
  • Campaign

You have one "mapping" object:

  • UserCampaign

You have one "transactional"-style object:

  • Click

Step 1: Entities

Let's start with the simple ones: User and Campaign. These are truly two separate objects; neither depends on the other for its existence. There is also no implied hierarchy between the two: Users do not belong to Campaigns, and Campaigns do not belong to Users.

When you have two top-level objects like this, they generally earn their own collection. So you will want a Users collection and a Campaigns collection.

Step 2: Mapping

UserCampaign is currently used to represent an N-to-M mapping. In general, when you have an N-to-1 mapping, you can put the N inside the 1. However, when you have an N-to-M mapping, you generally have to "pick a side."

In theory, you can do one of the following:

  • Put a list of Campaign IDs inside each User
  • Put a list of User IDs inside each Campaign

Personally, I would do #1. You probably have far more users than campaigns, and you want to put the array where it will be shorter.
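As a rough illustration of option #1, here is a minimal in-memory sketch in plain JavaScript. The sample data and the field name campaign_ids are assumptions made for this example, not something from the question; the comment shows roughly what the equivalent shell query might look like.

```javascript
// Hypothetical sample documents; field names such as campaign_ids are assumptions.
const campaigns = [
  { _id: "c1", title: "Spring sale", description: "Seasonal promo", link: "https://example.com/spring" },
  { _id: "c2", title: "Newsletter", description: "Signup drive", link: "https://example.com/news" },
];

const users = [
  { _id: "u1", username: "alice", campaign_ids: ["c1", "c2"] },
  { _id: "u2", username: "bob", campaign_ids: ["c1"] },
];

// Resolve a user's campaigns by ID. Against a real database this would be
// roughly db.campaigns.find({ _id: { $in: user.campaign_ids } }).
function campaignsForUser(user, allCampaigns) {
  const wanted = new Set(user.campaign_ids);
  return allCampaigns.filter((c) => wanted.has(c._id));
}

// Updating a campaign touches exactly one document,
// no matter how many users reference it.
campaigns[0].title = "Spring sale (extended)";

const aliceTitles = campaignsForUser(users[0], campaigns).map((c) => c.title);
console.log(aliceTitles);
```

Because users only hold IDs, the "update the campaign for every user" problem from the question does not arise in this layout.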

Step 3: Transactional

Clicks are a completely different beast. In object terms, you might think of it this way: a Click "belongs to" a User, and a Click "refers to" a Campaign. So, in theory, you could store clicks as part of either of these objects. It is easy to think that clicks belong to users or campaigns.

But if you dig deeper, that simplification is really wrong. In your system, Clicks are the central focus. In fact, you could even say that Users and Campaigns are really just "associated with" a click.

Take a look at the questions / queries that you are asking. All of them are centered around clicks. Users and Campaigns are not central to your data; Clicks are.

In addition, clicks will be the most abundant data in your system. You will have more clicks than anything else.

This is the biggest hurdle in designing a schema for this kind of data. Sometimes you need to push aside the "parent" objects when they are not the most important thing. Imagine building a simple e-commerce system. It is clear that Orders "belong to" Users, but Orders are so central to the system that they become a top-level object.

Wrapping it up

You will probably want three collections:

  • Users → each has a list of campaign._id
  • Campaigns
  • Clicks → each contains user._id, campaign._id
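A click document under this layout might look something like the sketch below. The helper makeClick and all field names (user_id, campaign_id, ip, referer, os, created_at) are hypothetical, chosen to match the attributes listed in the question; the array filter is only an in-memory stand-in for a query against a real clicks collection.

```javascript
// Hypothetical helper: builds one click document. Field names are assumptions
// based on the attributes listed in the question (IP, referer, OS).
function makeClick({ userId, campaignId, ip, referer, os }) {
  return {
    user_id: userId,         // reference to the user's _id
    campaign_id: campaignId, // reference to the campaign's _id
    ip,
    referer,
    os,
    created_at: new Date(),
  };
}

const clicks = [
  makeClick({ userId: "u1", campaignId: "c1", ip: "203.0.113.7", referer: "google.com", os: "Windows" }),
  makeClick({ userId: "u1", campaignId: "c2", ip: "203.0.113.7", referer: "example.org", os: "Windows" }),
  makeClick({ userId: "u2", campaignId: "c1", ip: "198.51.100.4", referer: "google.com", os: "Linux" }),
];

// In-memory stand-in for db.clicks.find({ user_id: "u1" }).
const clicksForU1 = clicks.filter((c) => c.user_id === "u1");

// In-memory stand-in for db.clicks.find({ campaign_id: "c1" }).
const clicksForC1 = clicks.filter((c) => c.campaign_id === "c1");

console.log(clicksForU1.length, clicksForC1.length);
```

Each click carries both references, so it can be looked up from either side without embedding it in a user or a campaign.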

This should satisfy all your needs:

See information from every click, such as IP, Referer, OS, etc.

 db.clicks.find() 

See how often clicks come from X IP, X Referer, X OS

Use db.clicks.group() or run a Map-Reduce.
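Note that db.collection.group() was removed in MongoDB 4.2; on current versions the aggregation pipeline with a $group stage is the usual way to get these counts. The sketch below shows the counting logic in plain JavaScript, plus the pipeline one might pass to db.clicks.aggregate(). The sample data, the function names, and the field names are assumptions for illustration.

```javascript
// Hypothetical sample clicks; field names are assumptions.
const sampleClicks = [
  { ip: "203.0.113.7", referer: "google.com", os: "Windows" },
  { ip: "203.0.113.7", referer: "example.org", os: "Windows" },
  { ip: "198.51.100.4", referer: "google.com", os: "Linux" },
];

// Count clicks per distinct value of one field (in-memory equivalent of a
// $group stage with a $sum: 1 accumulator).
function countBy(clicks, field) {
  const counts = new Map();
  for (const click of clicks) {
    const key = click[field];
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Builds the pipeline one might pass to db.clicks.aggregate(...) to get the
// same counts from the database, most frequent value first.
function countPipeline(field) {
  return [
    { $group: { _id: "$" + field, count: { $sum: 1 } } },
    { $sort: { count: -1 } },
  ];
}

const byOs = countBy(sampleClicks, "os");
console.log(byOs);
console.log(JSON.stringify(countPipeline("os")));
```

The same two functions work for ip or referer by passing a different field name.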

Associate each click with a user and campaign

db.clicks.find({user_id : blah}) You can also push click IDs into users and campaigns (if that makes sense).

Please note that if you have lots and lots of clicks, you will really have to analyze the queries you run most. You can't index on every field, so you will often want to run Map-Reduces to "roll up" the data for those queries.
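One way to picture the "rolled up" data is a small summary document per campaign per day. The sketch below does the roll-up in plain JavaScript; the summary shape (total, by_os) and the function name rollUp are assumptions made for this example. Against a real database, similar summaries could be produced by a periodic batch job, or maintained incrementally with an upsert that uses $inc.

```javascript
// Hypothetical sample clicks with fixed UTC timestamps.
const rawClicks = [
  { campaign_id: "c1", os: "Windows", created_at: new Date("2018-01-11T10:00:00Z") },
  { campaign_id: "c1", os: "Linux",   created_at: new Date("2018-01-11T15:30:00Z") },
  { campaign_id: "c1", os: "Windows", created_at: new Date("2018-01-12T09:00:00Z") },
  { campaign_id: "c2", os: "Windows", created_at: new Date("2018-01-11T11:00:00Z") },
];

// Collapse raw clicks into one summary per (campaign, UTC day).
function rollUp(clicks) {
  const summaries = new Map();
  for (const click of clicks) {
    const day = click.created_at.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
    const key = click.campaign_id + "|" + day;
    let summary = summaries.get(key);
    if (!summary) {
      summary = { campaign_id: click.campaign_id, day, total: 0, by_os: {} };
      summaries.set(key, summary);
    }
    summary.total += 1;
    summary.by_os[click.os] = (summary.by_os[click.os] || 0) + 1;
  }
  return [...summaries.values()];
}

const dailySummaries = rollUp(rawClicks);
console.log(dailySummaries);
```

Reports can then read a handful of summary documents instead of scanning every click.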

Jan 13 2018-11-11T00:

The main problem I see here is that you are trying to apply relational database concepts to a document-oriented database. The main difference between the two is that in a NoSQL database you don't worry about schema or structure, but about collections and documents.

It is very important to understand that many NoSQL implementations have no concept of joins as SQL does. This means that if you spread your data across collections, you will have to do a lot of work to stitch it back together later. There is also no other benefit to spreading your data across collections, as there is when normalizing a SQL db. You need to think about what data is part of your document and which collection it belongs to, and not worry about the implementation underneath the NoSQL db. So for your problem, the answer could be ... and it would support everything you asked for ...

db.trackclicks ==> collection
trackclick = {OS: "XP", User: "John Doe", Campaign: {title: "test", desc: "test", link: "url"}, Referrer: "google.com"}

Jan 12 2018-11-12T00:
  • It is not a problem for MongoDB to update a large number of documents if something in a campaign has changed.

  • Whether to use a nested collection or not really depends on how much data the collection holds. In your case, if you know that the Clicks collection will contain a LOT of data, you need to create a separate collection, because for clicks you will need paging, filtering, etc., while the user will remain a "light" collection.

Therefore, I suggest the following:

 User { Campaigns: [] } Clicks { user_id, camp_id } 
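To make the paging and filtering point concrete, here is a small in-memory sketch. The function pageOfClicks, the created_at field, and the sample data are assumptions for illustration; with a separate clicks collection the same idea would be roughly db.clicks.find(filter).sort({created_at: -1}).skip(page * pageSize).limit(pageSize).

```javascript
// Hypothetical sample clicks; created_at is an assumed timestamp field.
const allClicks = [
  { user_id: "u1", camp_id: "c1", created_at: new Date("2018-01-01T00:00:00Z") },
  { user_id: "u2", camp_id: "c1", created_at: new Date("2018-01-02T00:00:00Z") },
  { user_id: "u1", camp_id: "c2", created_at: new Date("2018-01-03T00:00:00Z") },
  { user_id: "u1", camp_id: "c1", created_at: new Date("2018-01-04T00:00:00Z") },
];

// Return one page of clicks matching an equality filter, newest first.
// page is zero-based.
function pageOfClicks(clicks, filter, page, pageSize) {
  const matches = clicks.filter((click) =>
    Object.entries(filter).every(([field, value]) => click[field] === value)
  );
  matches.sort((a, b) => b.created_at - a.created_at); // newest first
  return matches.slice(page * pageSize, (page + 1) * pageSize);
}

const firstPage = pageOfClicks(allClicks, { user_id: "u1" }, 0, 2);
const secondPage = pageOfClicks(allClicks, { user_id: "u1" }, 1, 2);
console.log(firstPage.length, secondPage.length);
```

With clicks embedded inside a user document, this kind of paging would mean loading the whole user first, which is the cost the separate collection avoids.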
Jan 11 2018-11-21T00: