Introduction

We are going to explore a way to handle content hydration with Node.js and Mongo.

This approach is useful when sending and displaying a big list of documents.

It balances initial and subsequent load times while also allowing per-document optimizations.

It also provides some offline usability and a solid base that we can extend later.

A disclaimer

I will only talk about the case where you are the “single source of truth” e.g. the list of Facebook’s pages you like or your Amazon wishlist.

I make this distinction because in this case we won’t have any problems with data synchronization, since the user is the only one who can modify the documents. When multiple users share the same list, we need a strategy to merge documents if two or more users edited them at the same time.

Stay tuned for the follow-up post where I cover that!

The idea is to display the list of Mongo documents as a whole, because it offers a better user experience than pagination. This approach is not a perfect fit for every problem, though: if you have millions of documents in the list, you will need to break it down into pieces.

The problem to solve

In one of my projects I had to develop a customer list view where clicking on one of the items would redirect you to another view displaying more details: a pretty straightforward and common requirement.

The initial approach was the most naive one I could think of: once you entered the list view, it would fetch the customers from the Node.js server and iterate through all of them to construct the view, something like:

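{customers.map(customer => (
  // Pass a callback; calling openCustomerDetail directly would run it on every render
  <item key={customer._id} onClick={() => this.openCustomerDetail(customer)} label={customer.name} />
))}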

That initial implementation worked fine with the first 10 or 20 customers. The problem was that this wasn’t a realistic use case…

Reality check for boomers

One week before launching the project, our client asked us if we could migrate their current database (DBF 😖) to our new shiny system in Node.js and Mongo.

Since their DB was basically an Excel file, I just wrote a script to transform it into JSON, mapped the properties, and inserted them into Mongo. Some extra money charged for little effort, and happy customers.

The problem was that they had more than 500,000 records, each with 5 or more fields of information.

So, after the migration I tried to load the view and well… it hung.

Two problems appeared:

  • The getAllCustomers request was taking around 30 to 45 seconds to finish.
  • The scrolling was extremely laggy.

The cause was pretty obvious, so I started looking for solutions.

First, instead of returning every customer with populated data, I sent only the unique Mongo identifier, _id, plus the name property to the frontend.

Instead of:

openCustomerDetail(customer) {
  this.router.navigate({ path: '/customerDetails', props: customer });
}

I changed it to:

async openCustomerDetail(customer) {
  const hydratedCustomer = await this.customerService.getCustomer(customer._id); // Request it from the Node.js server
  this.router.navigate({ path: '/customerDetails', props: hydratedCustomer });
}

When I say hydrated, I’m referring to a Mongo document in which every property that references another collection is present and populated (instead of holding just ObjectIds). A dry document, on the other hand, is a minimal version of itself: not only unpopulated, but also missing properties because of a projection.
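To make the distinction concrete, here is a minimal sketch of what each shape could look like. The `CustomerAddress` type, the `age` and `address` fields, and the `toDry` helper are assumptions for illustration, not part of the original project; on the server the same effect would normally come from a projection in the query rather than from a helper like this.

```typescript
// Hypothetical referenced document (it would live in its own collection).
interface CustomerAddress {
  _id: string;
  street: string;
  city: string;
}

// Dry: only what the list view needs.
interface DryCustomer {
  _id: string;
  name: string;
}

// Hydrated: every property present, references populated.
interface HydratedCustomer extends DryCustomer {
  age: number;
  address: CustomerAddress; // a populated object instead of a bare id
}

// Strips a hydrated customer down to its dry form.
function toDry(customer: HydratedCustomer): DryCustomer {
  return { _id: customer._id, name: customer.name };
}

const full: HydratedCustomer = {
  _id: 'c1',
  name: 'Ada',
  age: 36,
  address: { _id: 'a1', street: '1 Main St', city: 'London' },
};

const dry = toDry(full);
console.log(Object.keys(dry).length); // 2
```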

This was by far the biggest performance gain, but the caveat with this approach is that now users have to wait for the hydration to finish every time they access a customer’s details.

Annoyingly, if you were to load a customer, exit and enter again, you would have to load it once again. (Yep, network caching can help but there is still a better solution)

Flux architecture to the rescue, sort of

Before tackling that caveat, I’ll mention another improvement we can make, one that lays the foundation for what comes later.

Currently, every time a user enters the view, it fires the getAllCustomers request, most of the time needlessly.

This is where the flux architecture comes in handy: it allows us to load everything once and store it, so the view just has to consume it from the store.

The following example is taken from my project, where I used Angular + Ngxs.

In the view:

<div>
  <item 
    *ngFor="let customer of customers$ | async"
    (click)="openCustomerDetail(customer)"
  >{{ customer.name }}</item>
</div>

In the controller:

...

@Select('customers') customers$: Observable<Customer[]>; // Slice of the store

...

ngOnInit() {
  // The callback has to be async, otherwise `await` is a syntax error
  this.customers$.subscribe(async storedCustomers => {
    if (storedCustomers.length === 0) {
      const customers = await this.customerService.getAll();
      this.store.dispatch(new StoreCustomers(customers));
    }
  });
}

Extra note: This will also give us basic offline capabilities, because once at least one request has finished, we can populate the store and display that data, which is much better than nothing.

I’ll be writing about strategies to enable the user to work offline and then resync the data with the cloud, stay tuned!

In sync

The solution that we crafted is pretty nice, but it has a big problem.

How are we going to keep the store in sync with the database in the cloud?

As I mentioned at the beginning, in this post I’m only going to cover the case where you are the only one who can change the list of elements.

We could go the naive way and return the full list of Mongo documents on every create, update or delete, then replace the store.

But we are better than that, aren’t we?

So we will do atomic updates on the store instead.

Basically:

  • Do the request
  • On a create/update, from the node.js server we return the new/edited document. (Don’t forget the new: true option if you are using mongoose!)
  • Once the response arrives, dispatch an action to update the store
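The three steps above can be sketched as follows. This is a minimal sketch under assumptions: `CustomerApi`, `createCustomer`, and the plain-object action shape are hypothetical names (the real project uses Ngxs action classes), and the in-memory `fakeApi` merely stands in for the real request to the Node.js server.

```typescript
interface Customer {
  _id: string;
  name: string;
}

// Hypothetical action shape, for illustration only.
type CustomerAction = { type: 'NEW_CUSTOMER'; customer: Customer };
type Dispatch = (action: CustomerAction) => void;

// Hypothetical API surface: the server answers with the saved document.
interface CustomerApi {
  create(input: { name: string }): Promise<Customer>;
}

async function createCustomer(
  api: CustomerApi,
  dispatch: Dispatch,
  input: { name: string },
): Promise<Customer> {
  // 1. Do the request.
  // 2. The server returns the document as it was stored (including its _id).
  const saved = await api.create(input);

  // 3. Update the store with the server's version, not the local input.
  dispatch({ type: 'NEW_CUSTOMER', customer: saved });
  return saved;
}

// Stand-in for the server, for demonstration only.
const fakeApi: CustomerApi = {
  create: async (input) => ({ _id: 'generated-id', name: input.name.trim() }),
};

const dispatched: CustomerAction[] = [];
const pending = createCustomer(fakeApi, (a) => dispatched.push(a), { name: ' Ada ' });
```

Dispatching the server's response rather than the local input means the store picks up anything the server added or normalized, such as the generated `_id`.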

In this example, on a create we do something like:

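case NEW_CUSTOMER: {
  // Keep on reading to know about this `hydrated` flag!
  const newCustomer = { ...action.customer, hydrated: true };

  // Copy the array too: spreading `state` alone would still share (and mutate) the old one
  return { ...state, customers: [...state.customers, newCustomer] };
}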

For an update:

case UPDATE_CUSTOMER: {
  const updated = { ...action.customer, hydrated: true };

  // map() returns a new array and leaves the list untouched if the _id is not found
  return {
    ...state,
    customers: state.customers.map(x => (x._id === action.customer._id ? updated : x)),
  };
}

Last but not least, for a deletion:

case DELETE_CUSTOMER: {
  // filter() avoids the findIndex() === -1 pitfall, where splice(-1, 1) would drop the last item
  return {
    ...state,
    customers: state.customers.filter(x => x._id !== action.customerId),
  };
}

Mixing things a little bit

With our current solution in place, we can now tackle the caveat of loading the same customer over and over again and, along the way, make sense of the title.

I thought about having a mixed data structure in the store, more easily explained with the following flowchart and TypeScript interface:

Flowchart with the logic about the mix data structure

interface DryCustomer {
  _id: string;
  name: string;
  hydrated: boolean;
}

interface HydratedCustomer extends DryCustomer {
  ... // Some other non-critical for the list view data like age, address, etc
}

export class CustomerStateInterface {
  customers: Array<DryCustomer | HydratedCustomer>; // or the alternative syntax: (DryCustomer | HydratedCustomer)[]
}

So, let’s update the openCustomerDetail function:

async openCustomerDetail(customer) {
  // A different name avoids redeclaring the `customer` parameter
  const hydratedCustomer = await this.customerService.getCustomer(customer._id);
  this.router.navigate({ path: '/customerDetails', props: hydratedCustomer });
}

And the service:

async getCustomer(customerId: string): Promise<HydratedCustomer> {
  const customerFromStore = await this.getCustomerFromStore(customerId); // Looking it up in the store

  if (customerFromStore.hydrated) {
    // The flag tells us the full document is already in the store
    return customerFromStore as HydratedCustomer;
  }

  const customer = await fetch(...); // Request it from the node.js server

  // Next time we look for it we won't have to fetch it from the node.js server
  dispatch({ type: UPDATE_CUSTOMER, customer: { ...customer, _id: customerId } });

  return customer;
}

Smart pre-hydration for Mongo documents

Now that we have a solid foundation to build on, what if we send already hydrated the Mongo documents that we think are most likely to be used? Some strategies to figure that out could be:

  • The most frequently accessed users/customers/products.
  • If it’s a social-media-like project, friends, family and liked pages are more likely to get visited than another document that you don’t have a direct connection to.
  • Adding a counter to see how many times a given document was edited, so we hydrate the top X most used ones. Check the __v key for Mongoose ODM*.
  • The most recently edited ones. Use timestamps: true in Mongoose.
  • Maybe you never pre-hydrate on the initial load but use something like guess.js to fetch data on hover.
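As a rough illustration of the "most recently edited" strategy, the sketch below picks which documents to send hydrated. It assumes each document carries an `updatedAt` date (the field Mongoose adds with `timestamps: true`); `CustomerMeta`, `pickIdsToPreHydrate`, and its `limit` parameter are hypothetical names. On a large collection this selection would more likely be done as a sorted, limited query on the server.

```typescript
interface CustomerMeta {
  _id: string;
  updatedAt: Date;
}

// Returns the _ids of the `limit` most recently edited documents.
function pickIdsToPreHydrate(docs: CustomerMeta[], limit: number): string[] {
  return [...docs] // copy so the caller's array is not reordered
    .sort((a, b) => b.updatedAt.getTime() - a.updatedAt.getTime())
    .slice(0, Math.max(0, limit))
    .map((doc) => doc._id);
}

const ids = pickIdsToPreHydrate(
  [
    { _id: 'a', updatedAt: new Date('2020-01-01') },
    { _id: 'b', updatedAt: new Date('2020-03-01') },
    { _id: 'c', updatedAt: new Date('2020-02-01') },
  ],
  2,
);
console.log(ids); // [ 'b', 'c' ]
```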

Everything depends on your product and business logic.

This also gives us granular control over hydration on a per-document basis.

For example, we can easily implement a data-saving feature where we send just the bare minimum. Or the complete opposite: if the user is on a computer (and most likely doesn’t have limited bandwidth), you can enable a more aggressive configuration.

Maybe you can combine both, sending the minimum for fast initial load time and in the background populating the remaining data. And who says that we can only have two states? You could have a ‘partial state’, get creative!
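One way to model more than two states is to replace the boolean flag with an ordered level. A minimal sketch, assuming three levels; `HydrationLevel`, `LEVEL_ORDER`, and `needsFetch` are hypothetical names, not something from the original project.

```typescript
type HydrationLevel = 'dry' | 'partial' | 'hydrated';

// A higher number means more data is already in the store.
const LEVEL_ORDER: Record<HydrationLevel, number> = {
  dry: 0,
  partial: 1,
  hydrated: 2,
};

// True when the stored document has less data than the view requires.
function needsFetch(stored: HydrationLevel, required: HydrationLevel): boolean {
  return LEVEL_ORDER[stored] < LEVEL_ORDER[required];
}

console.log(needsFetch('dry', 'partial'));      // true
console.log(needsFetch('hydrated', 'partial')); // false
```

Each view can then declare the level it requires, and the service only goes to the server when the stored level falls short.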

Misc, other minor optimizations

  • Since the customer list was the most used view in my project, I fired the getAllCustomers request to the Node.js server in the background as soon as I could: on app initialization if the user was already logged in, or right after a successful login otherwise.
  • I used a compression utility for Node.js. Since I was using the Fastify framework, I chose fastify-compress.
  • I was transforming the customer’s name before returning the data (trimming spaces, fixing casing, etc.). Instead of continuing to do that on every request, I ran a script to migrate those properties once, skipping this step entirely. This will help you a lot if you do complex transformations and/or have a big number of documents.

This may not apply to every case, though. Also keep in mind that if you save a computed property, let’s say fullName, which is firstName + lastName, you will have to recompute it whenever the user changes one of those properties.
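A minimal sketch of keeping a stored computed property consistent: every write that touches a source field goes through one function that recomputes it. `StoredCustomer` and `applyNameChange` are hypothetical names used only for illustration.

```typescript
interface StoredCustomer {
  firstName: string;
  lastName: string;
  fullName: string; // computed once and saved, not derived on every read
}

// Applies a change to the name fields and recomputes the saved fullName.
function applyNameChange(
  customer: StoredCustomer,
  changes: { firstName?: string; lastName?: string },
): StoredCustomer {
  const firstName = changes.firstName ?? customer.firstName;
  const lastName = changes.lastName ?? customer.lastName;
  return { ...customer, firstName, lastName, fullName: `${firstName} ${lastName}` };
}

const before: StoredCustomer = { firstName: 'Ada', lastName: 'Byron', fullName: 'Ada Byron' };
const after = applyNameChange(before, { lastName: 'Lovelace' });
console.log(after.fullName); // Ada Lovelace
```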

  • What about infinite scroll/pagination? I didn’t like the user experience, especially since in my particular use case they needed quick and easy access to the full list. Also, that view has a search bar that filters by name; with pagination I would have had to change how the filtering worked.
  • Use a virtual scroll component (it only renders what is visible in the viewport).
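The core of a virtual scroll is a small calculation: given the scroll position, which slice of the list is on screen? Below is a minimal sketch assuming fixed-height rows; `RowRange`, `visibleRange`, and its parameters are hypothetical, and a real project would more likely use an existing component (for Angular, the CDK ships a virtual scroll viewport).

```typescript
interface RowRange {
  start: number; // index of the first row to render
  end: number;   // index one past the last row to render
}

function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  totalRows: number,
  overscan = 2, // extra rows above and below to avoid flicker while scrolling
): RowRange {
  const first = Math.floor(scrollTop / rowHeight);
  const count = Math.ceil(viewportHeight / rowHeight);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(totalRows, first + count + overscan),
  };
}

// 500,000 rows of 40px in a 600px viewport, scrolled down 4,000px
console.log(visibleRange(4000, 600, 40, 500000)); // { start: 98, end: 117 }
```

With these numbers only 19 rows are in the DOM at any time, regardless of how long the list is.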

TL;DR:

Balanced Thanos meme