Caching and Sharing Firestore Data with RxJS and ShareReplay

Josh Morony

October 05, 2021 7 min read

Architecture Performance Backend angular firestore angularfire rxjs caching

Originally published October 05, 2021

Recently, I've been posting more about reactive/declarative programming and the benefits of building your applications with this approach.

RxJS and observables are the perfect match for building reactive/declarative applications and you can typically find an RxJS operator to do the job you need. However, getting observable streams and operators to behave the way you want can be confusing and complex even if the end results are simple and beautiful.

In a client application I am working on right now, I am heavily relying on Firestore (AngularFire) and RxJS (I will be documenting the entire application build in an Elite Ionic Pro module soon, so keep an eye out for that). Of course, since I'm all about that Reactive programming lately, I supply one of my Firestore collections to my application as a stream that can be accessed through a ClientsService:

public getClients() {
  const clientsCollection = collection(this.firestore, 'clients');
  // Return the stream so that callers (and the | async pipe) can subscribe to it
  return collectionData(clientsCollection, { idField: 'id' }) as Observable<Client[]>;
}

Then I can just easily display this data using the | async pipe and I never even have to worry about subscribing to or unsubscribing from the observable:

<ion-list>
  <ion-item
    button
    routerLink="/clients/{{client.id}}"
    routerDirection="forward"
    *ngFor="let client of clients$ | async"
  >
    {{client.name.first}} {{client.name.last}}
  </ion-item>
</ion-list>

Nice. But... notice how I am linking to the following page:

routerLink="/clients/{{client.id}}"

As is typically the case, I also have a detail page to view the specific details of an individual client. I pass the id into this page, and then I can use that id to grab the specific client's data. How should I actually get access to data on that detail page though? There is more going on here than you might initially think.

Our ultimate saviour will be the RxJS shareReplay operator, so if you want to get right to the solution you can skip to the final section. However, there is some important stuff to understand here so I would recommend reading through the shortcomings of the other approaches.

Outline

What if we just call getClients again?
What if we just grab a single document stream?
What if we use shareReplay?
What happens if nobody is listening?
Summary

What if we just call getClients again?

We already have a getClients method that returns an observable stream of all clients, so why not return a modified version of that stream that emits just one specific client? That's not a bad idea on the surface. We could just add an additional method to our ClientsService that looks like this:

  public getClients() {
    const clientsCollection = collection(this.firestore, 'clients');
    // Return the stream so that getClient() below can pipe from it
    return collectionData(clientsCollection, { idField: 'id' }) as Observable<Client[]>;
  }

  public getClient(id: string) {
    return this.getClients().pipe(map((clients) => clients.find((client) => client.id === id)));
  }

This is quite a simple and neat solution. However, since it uses the getClients method, it is going to create a new observable every time it is called, and each subscription is therefore going to make collectionData pull in the entire clients collection from Firestore. If we are paying per document read with Firestore, this isn't ideal. We would have to do this every time we wanted data on an individual client, which might happen many times whilst using the application.

NOTE: I'm not 100% sure on the exact document read/pricing structure of Firestore here. I think Firestore is probably caching some data locally and so it wouldn't necessarily always require reading a document from the remote Firestore database every time collectionData is called. However, the approach we are about to discuss is a good practice in general - you can swap out collectionData here with any kind of expensive operation you might want to perform like requests to any other kind of API or just anything that is slow/computationally expensive.
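To make the cost concrete, here is a minimal sketch that swaps collectionData for a hypothetical fakeCollectionData function that counts how many times the underlying source is set up. The Client shape, the function names and the sample data are all assumptions for illustration, not part of the real application:

```typescript
import { Observable } from 'rxjs';
import { map } from 'rxjs/operators';

// Hypothetical client shape, for illustration only
interface Client {
  id: string;
  name: string;
}

// Counts how many times the "expensive" source has been set up
let sourceExecutions = 0;

// Stand-in for collectionData: a cold observable that does its work on every subscribe
function fakeCollectionData(): Observable<Client[]> {
  return new Observable<Client[]>((subscriber) => {
    sourceExecutions++;
    subscriber.next([
      { id: 'a', name: 'Ada' },
      { id: 'b', name: 'Grace' },
    ]);
  });
}

function getClients(): Observable<Client[]> {
  return fakeCollectionData();
}

function getClient(id: string): Observable<Client | undefined> {
  return getClients().pipe(map((clients) => clients.find((client) => client.id === id)));
}

const seen: (Client | undefined)[] = [];
getClients().subscribe(); // list page
getClient('a').subscribe((client) => seen.push(client)); // detail page, first visit
getClient('b').subscribe((client) => seen.push(client)); // detail page, second visit

console.log(sourceExecutions); // 3: the whole collection was loaded once per subscription
```

Each detail page visit gets the right client, but only by loading the entire collection again.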

What if we just grab a single document stream?

If pulling down the entire collection of clients to view a single client isn't a great idea, what if we just create a stream of one individual client document from Firestore?

We can use the onSnapshot function from AngularFire to get notified every time a document updates, e.g.:

const clientDocReference = doc(this.firestore, `clients/${id}`);
// The handler receives a snapshot of the document each time it changes
onSnapshot(clientDocReference, (snapshot) => {
  // do something
});

But to use onSnapshot we need to pass it a handler function. It will pass each document snapshot to that handler, and it will return a function that we can call to stop receiving document updates. We can't just return onSnapshot directly because it isn't an observable stream.

We could get really funky here and use an RxJS creation function called fromEventPattern. The basic idea is that it will allow us to convert this handler-based API to an observable stream. Let's take a look at what the code would look like and then we will discuss how it works:

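  public watchClient(id: string): Observable<Client> {
    const clientDocReference = doc(this.firestore, `clients/${id}`);
    return fromEventPattern<DocumentSnapshot>(
      // Register the handler and return the unsubscribe function that onSnapshot gives us
      (handler) => onSnapshot(clientDocReference, handler),
      // Whatever the first function returned is passed in here as the second argument
      (handler, unsubscribe) => {
        unsubscribe();
      }
    ).pipe(
      // The handler receives a DocumentSnapshot, so convert it into our Client shape
      map((snapshot) => ({ id: snapshot.id, ...snapshot.data() } as Client))
    );
  }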

We supply fromEventPattern with two functions. The first function allows us to call whatever code it is that accepts the callback/handler function (in this case, what we want to do with the returned document). We need to make sure we return onSnapshot in this function because that is how we get access to the function that allows us to unsubscribe from document updates, e.g. this is how we "unsubscribe" from onSnapshot:

const unsubscribe = onSnapshot(clientDocReference, handler);
unsubscribe(); // stop receiving updates

In the second function, we receive whatever the first function returned (in this case, the unsubscribe function from onSnapshot) as the unsubscribe parameter, and then we invoke it to stop receiving updates. It is a little complex, but the end result is that we just get a nice stream of data for a single client like we wanted. The reason for the complexity here is that we are converting something that uses a handler-based event pattern (i.e. it is not an observable stream) into an observable stream.
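If you want to see that hand-off in isolation, here is a small sketch that runs without Firestore. The watchValue and emit functions are hypothetical stand-ins for onSnapshot and for the database pushing updates, so treat this as an illustration of the pattern rather than real Firestore code:

```typescript
import { fromEventPattern } from 'rxjs';

type Unsubscribe = () => void;

// Hypothetical handler-based API standing in for onSnapshot:
// it registers a listener and returns a function that removes it again
const listeners = new Set<(value: string) => void>();

function watchValue(handler: (value: string) => void): Unsubscribe {
  listeners.add(handler);
  return () => {
    listeners.delete(handler);
  };
}

// Simulates the data source notifying every registered listener
function emit(value: string): void {
  listeners.forEach((listener) => listener(value));
}

const value$ = fromEventPattern<string>(
  // First function: whatever this returns is handed to the second function later
  (handler) => watchValue(handler),
  // Second function: the second argument is the value returned by the first function
  (_handler, unsubscribe: Unsubscribe) => unsubscribe()
);

const received: string[] = [];
const subscription = value$.subscribe((value) => received.push(value));

emit('first');
emit('second');
subscription.unsubscribe(); // runs the second function, which removes our listener
emit('third'); // nobody is listening any more

console.log(received); // ['first', 'second']
console.log(listeners.size); // 0
```

Unsubscribing from the observable is what triggers the clean up, so consumers never need to know about the original unsubscribe function.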

The end result here is fine, and we are only reading one document so it is much better than the original solution. But it is complex, and technically we don't even need to trigger that single document read (although that hardly matters, the complexity here is the biggest downside).

What if we use shareReplay?

We already have the data we need loaded into the application. We call getClients() to display a list of clients on the initial page, why can't we just reuse that data? To reuse the data from that stream on the detail page we will need to get access to it somehow.

An imperative way of thinking about this might be to just subscribe to getClients() and store the result in a member variable in the ClientsService, e.g. this.clients. Then we can just access that member variable whenever we need clients. However, that isn't really a reactive/declarative approach as we are sort of manually moving data around our application instead of just reacting to changing streams. It also has the downside of not being a stream itself, so it is harder to react to change in this.clients (we could turn this.clients back into an observable stream, but now things are getting a bit messy).

This is where shareReplay comes in to save the day. We are going to take a similar approach to the one we just discussed where we would cache the result of the initial call to getClients() on a member variable in the ClientsService. Except we are going to cache the observable stream itself, not the data contained within it.

We might initially try something like this (but this will not work how we want it to):

  private clients$: Observable<Client[]>;

  public getClients() {
    if (!this.clients$) {
      const clientsCollection = collection(this.firestore, 'clients');
      this.clients$ = collectionData(clientsCollection, { idField: 'id' }) as Observable<Client[]>;
    }

    return this.clients$;
  }

The first time getClients() is called we create the observable stream by calling collectionData which will pull in the collection from Firestore. This is then stored on this.clients$ and any subsequent requests to getClients() will just directly receive this.clients$ instead.

However, the observable returned by collectionData is "cold": every subscription to this.clients$ sets up the underlying Firestore listener again. So if our list template and our detail page both subscribe to this.clients$, we are in the same predicament as before. We can fix this by using the share operator, e.g.:

  public getClients() {
    if (!this.clients$) {
      const clientsCollection = collection(this.firestore, 'clients');
      this.clients$ = collectionData(clientsCollection, { idField: 'id' }).pipe(
        share()
      ) as Observable<Client[]>;
    }

    return this.clients$;
  }

The basic idea with the share() operator is that it will share the data from the source observable rather than re-subscribing to it for each new subscription (i.e. in this case the collectionData observable would only be set up for the first subscriber, not for any subsequent subscribers). Now we would just have a single stream of clients that we could listen to from anywhere in our application, and we only ever need to execute collectionData once.

This is close, but it still won't work how we want. Let's paint a picture of where things would go wrong:

User loads the list of clients
User clicks an individual client to view its details

The first step would work fine: the clients data would be returned appropriately. However, for the second step, the | async pipe in our detail page template is going to try to subscribe to this.clients$. This subscription will be successful but no data will be returned. That is because this stream is already active and it has already emitted its initial data, which was delivered to the home page to display the list of clients. Everything is sharing the same stream of data now, and our home page will have rudely used the first data emission just for itself. Latecomers to the party don't get any data. Now, our second subscription will only receive data if a new emission on the stream is triggered by one of the client documents being updated. In this case, both the home page and the client detail page will get this data at the same time, and can both use it because they are both subscribed at the time it is emitted. However, we don't want to wait around for an update to the database to get our initial set of data.
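The latecomer problem is easy to reproduce in isolation. This is a rough sketch where source$ is a hypothetical stand-in for collectionData (it emits an initial list when subscribed to) and pushUpdate is an assumed helper that simulates a document changing later on:

```typescript
import { Observable } from 'rxjs';
import { share } from 'rxjs/operators';

let sourceSubscriptions = 0;

// Lets the example push an "update" into the source after the initial emission
let pushUpdate: (clients: string[]) => void = () => {};

// Stand-in for collectionData: emits an initial list on subscribe, then updates on demand
const source$ = new Observable<string[]>((subscriber) => {
  sourceSubscriptions++;
  pushUpdate = (clients) => subscriber.next(clients);
  subscriber.next(['Ada']);
});

const clients$ = source$.pipe(share());

const listPage: string[][] = [];
const detailPage: string[][] = [];

clients$.subscribe((clients) => listPage.push(clients)); // receives ['Ada'] straight away
clients$.subscribe((clients) => detailPage.push(clients)); // too late, nothing to receive yet

const detailPageBeforeUpdate = detailPage.length;

pushUpdate(['Ada', 'Grace']); // a document changes, so both subscribers get the new list

console.log(sourceSubscriptions); // 1
console.log(detailPageBeforeUpdate); // 0
console.log(listPage.length); // 2
console.log(detailPage.length); // 1
```

The source is only set up once, which is what we wanted, but the detail page sits empty until something changes.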

That is why we use shareReplay:

  public getClients() {
    if (!this.clients$) {
      const clientsCollection = collection(this.firestore, 'clients');
      this.clients$ = collectionData(clientsCollection, { idField: 'id' }).pipe(
        shareReplay(1)
      ) as Observable<Client[]>;
    }

    return this.clients$;
  }

This is basically the exact same concept, except now when the second (or subsequent) subscriber comes along it is going to "replay" the latest value that was emitted on the stream. Since we use shareReplay(1) it will emit one previous value when we subscribe to it. This allows us to still get an initial set of client data without having to wait for an update to occur.
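To tie this back to the detail page, here is a minimal sketch of a getClient method built on top of the shared stream. Again, source$ is a hypothetical stand-in for collectionData, the Client shape and sample data are made up, and plain functions are used in place of the ClientsService so that the example runs on its own:

```typescript
import { Observable } from 'rxjs';
import { map, shareReplay } from 'rxjs/operators';

// Hypothetical client shape, for illustration only
interface Client {
  id: string;
  name: string;
}

let sourceSubscriptions = 0;

// Stand-in for collectionData: emits the whole collection when subscribed to
const source$ = new Observable<Client[]>((subscriber) => {
  sourceSubscriptions++;
  subscriber.next([
    { id: 'a', name: 'Ada' },
    { id: 'b', name: 'Grace' },
  ]);
});

// Cached stream, created on the first call to getClients()
let clients$: Observable<Client[]> | undefined;

function getClients(): Observable<Client[]> {
  if (!clients$) {
    clients$ = source$.pipe(shareReplay(1));
  }
  return clients$;
}

// Derives a single client from the shared stream instead of hitting the source again
function getClient(id: string): Observable<Client | undefined> {
  return getClients().pipe(map((clients) => clients.find((client) => client.id === id)));
}

const listPage: Client[][] = [];
const detailPage: (Client | undefined)[] = [];

getClients().subscribe((clients) => listPage.push(clients)); // home page
getClient('b').subscribe((client) => detailPage.push(client)); // detail page, arrives late

console.log(sourceSubscriptions); // 1
console.log(detailPage[0]?.name); // 'Grace'
```

The detail page gets its client immediately from the replayed value, and the source is still only subscribed to once.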

What happens if nobody is listening?

An important thing to understand about shareReplay is what happens when nobody is listening? We have this "hot" observable that just emits its data whenever it has something to emit regardless of when something subscribes to it (except for the first subscriber). This is different to a more typical "cold" observable that is executed and returns its data whenever it is subscribed to.

We have used a value of 1 as a parameter to shareReplay:

shareReplay(1);

This is because we want a bufferSize of 1, i.e. we want just the one previous value to be "replayed" for each new subscriber to our shared stream. However, shareReplay also optionally accepts a ShareReplayConfig:

shareReplay({ bufferSize: 1, refCount: true });

This allows us to supply our bufferSize along with a refCount boolean value of true (it is false by default). If refCount is true, our shareReplay stream will unsubscribe from the source observable when there are no longer any subscribers to it, e.g. if both our home page and the client detail page unsubscribed from getClients(). Our "hot" observable will become "cold" again. If someone else comes along later and subscribes to it again, a new subscription to the source observable will be created (which would trigger our collectionData again).

If we keep the default value of false, our shareReplay stream will remain subscribed to our source observable and just keep running forever even if nobody is subscribed to it. This probably makes sense in this case, we can just keep this single stream of clients consistently available throughout the entire life of the application. In other cases though it might cause problems. Make sure you consider what the appropriate behaviour for your specific circumstances would be.
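The difference between the two settings can be observed with a source that counts its own subscriptions and teardowns. This is only a sketch: createSource is a hypothetical helper, and the source it builds is a stand-in for collectionData rather than a real Firestore stream:

```typescript
import { Observable } from 'rxjs';
import { shareReplay } from 'rxjs/operators';

// Builds a fresh stand-in source and records how often it is subscribed to and torn down
function createSource() {
  const stats = { subscriptions: 0, teardowns: 0 };
  const source$ = new Observable<string>((subscriber) => {
    stats.subscriptions++;
    subscriber.next('clients');
    // Teardown logic: runs when the source is unsubscribed from
    return () => {
      stats.teardowns++;
    };
  });
  return { source$, stats };
}

// refCount: false (the default): the source stays subscribed even with no listeners
const kept = createSource();
const kept$ = kept.source$.pipe(shareReplay({ bufferSize: 1, refCount: false }));
kept$.subscribe().unsubscribe(); // subscribe, then leave straight away
kept$.subscribe().unsubscribe(); // served from the replay buffer

// refCount: true: the source is released once the last listener leaves
const released = createSource();
const released$ = released.source$.pipe(shareReplay({ bufferSize: 1, refCount: true }));
released$.subscribe().unsubscribe();
released$.subscribe().unsubscribe(); // has to subscribe to the source again

console.log(kept.stats); // { subscriptions: 1, teardowns: 0 }
console.log(released.stats); // { subscriptions: 2, teardowns: 2 }
```

With refCount set to false the source is never torn down here, which is the "keeps running forever" behaviour described above. With refCount set to true you trade that for a fresh subscription to the source whenever listeners come back.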

Summary

One of the hard things about RxJS and reactive/declarative programming is that there are a lot of complexities that, if overlooked, can cause you problems. Understanding the implications of subscribing to something with and without shareReplay is certainly not intuitive, and you might not even be aware of the existence of such operators.

However, once you do grasp how to do it, the solution is very elegant. The end result here is that we can get a stream of clients anywhere in our application just by calling the getClients() method and it is going to handle automatically caching/sharing data for us wherever possible, and we can instantly react to the most up to date data without having to do any additional manual handling.