Paging and Delta Updates in Graph API

This is a quick blog post to summarise how to work through paging and delta updates in Graph API. Examples are in TypeScript, but you can adapt them to the other SDKs Microsoft provides.

Paging

If you’ve called an API before, you will most likely have encountered paging. Paging is where you receive a “page” of data instead of all of it in one go (think a page in a book as opposed to the whole book). You request each page separately from the API and, if you need more data, you request further pages.

In most scenarios you won’t want to receive all the data from a single API call, and even if you did, Graph API will most likely not provide it. With paging, however, you can still retrieve everything across several calls.

Iterating through pages

There is a built-in limit on the number of items that can be returned in a single page (API call). So, for example, if the limit per page for the endpoint https://graph.microsoft.com/v1.0/users is 100 and you have 1000 items you want to retrieve, you will be able to request all 1000 items across 10 separate API calls/pages.

This is called iteration and below is an example of using the Graph SDK to iterate through each page of the /users API until complete:

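import { PageCollection, PageIterator, PageIteratorCallback } from "@microsoft/microsoft-graph-client";
import { User } from "@microsoft/microsoft-graph-types";

// `client` is assumed to be an already-authenticated Graph client
const users: User[] = [];
// Request the first page of users
const query: PageCollection = await client
    .api("/users")
    .select("id,displayName,userPrincipalName,usageLocation,accountEnabled")
    .get();
// Called once for each user; returning true keeps the iteration going
const callback: PageIteratorCallback = (data) => {
    users.push(data);
    return true;
};
const pageIterator = new PageIterator(client, query, callback);
// Works through the first page, then requests further pages until none are left
await pageIterator.iterate()
    .catch(error => {
        throw error;
    });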

What is happening here is we are creating a query to retrieve the first page of users. We then create a callback function, which will be called for each user, and then we create a PageIterator class which will work through each page of users and pass every user on it to the callback function.

The callback is what you use to handle each item as it arrives. Returning true will tell the class to continue iterating; returning false will pause it.

How this works underneath, and you might have noticed this when working with Graph, is that an @odata.nextLink is provided in a Graph response if there are more items. This is the query to run to get the next page of items.
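
To make the nextLink mechanics concrete, here is a minimal sketch of following @odata.nextLink by hand, without the SDK's PageIterator. The fetchPage function, the GraphPage type and the fake URLs are all assumptions made for illustration; in real code the fetcher would be an authenticated request to Graph.

```typescript
// Only the fields this sketch relies on from a Graph collection response.
interface GraphPage<T> {
    value: T[];
    "@odata.nextLink"?: string;
}

// Hypothetical fetcher: resolves a URL to one page of results.
type FetchPage<T> = (url: string) => Promise<GraphPage<T>>;

// Keep requesting pages until one arrives without an @odata.nextLink.
async function fetchAllPages<T>(firstUrl: string, fetchPage: FetchPage<T>): Promise<T[]> {
    const items: T[] = [];
    let url: string | undefined = firstUrl;
    while (url !== undefined) {
        const page: GraphPage<T> = await fetchPage(url);
        items.push(...page.value);
        url = page["@odata.nextLink"];
    }
    return items;
}

// In-memory stand-in for Graph: three pages chained by nextLink (made-up URLs).
const fakePages = new Map<string, GraphPage<string>>([
    ["/users", { value: ["ann", "bob"], "@odata.nextLink": "/users?page=2" }],
    ["/users?page=2", { value: ["cat", "dan"], "@odata.nextLink": "/users?page=3" }],
    ["/users?page=3", { value: ["eve"] }],
]);

const fakeFetch: FetchPage<string> = async (url) => {
    const page = fakePages.get(url);
    if (page === undefined) {
        throw new Error(`Unknown URL: ${url}`);
    }
    return page;
};

// prints "ann, bob, cat, dan, eve"
fetchAllPages("/users", fakeFetch).then((all) => console.log(all.join(", ")));
```

The last page is simply the one with no @odata.nextLink, which is what ends the loop.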

Example nextLink

Batching pages

Whilst iterating through all pages one after the other is the quickest way to retrieve the data you require, it might not always be suitable. For example, if you want to retrieve 100,000 emails from Graph, and then do something with them e.g. put them in a database, or save them somewhere, it could become quite intensive and degrade performance.

You may be better off iterating through a set number of items, pausing the iteration, processing the current batch of items, and then resuming the iteration (rinse and repeat until done):

const maxBatchSize = 10;
const users: User[] = [];
const currentBatch: User[] = [];
const query: PageCollection = await client
    .api("/users")
    .select("id,displayName,userPrincipalName,usageLocation,accountEnabled")
    .get();
const callback: PageIteratorCallback = (data) => {
    currentBatch.push(data);
    // Once the batch is full, return false to pause iterating
    return currentBatch.length < maxBatchSize;
}
const pageIterator = new PageIterator(client, query, callback);
await pageIterator.iterate()
    .catch(error => {
        throw error;
    });
// Iterate over the current batch
while (currentBatch.length > 0) {
    // Do something with the current batch
    // ...
    users.push(...currentBatch);
    // Reset batch
    currentBatch.length = 0;
    // Resume iteration
    if (!pageIterator.isComplete()) {
        await pageIterator.resume();
    }
}

This is a similar approach to the previous code snippet, but instead of always returning true to continue iterating, we return false to pause the iteration once the amount of data we want to work with is reached. Once the batch is processed, we can then resume the iteration.

Note: Another reason to use this approach is if you need to perform async operations as you iterate through pages. The PageIterator class does not accept an async callback, so you would pause the iteration, run the async operation and then resume. If the work you do on each item is synchronous, you can do it directly in the callback of the PageIterator class.
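
If pausing and resuming feels awkward for async work, a different shape is to skip PageIterator and follow @odata.nextLink yourself, awaiting a handler each time a batch fills up. This is only a sketch of that alternative, not how the SDK does it: processInBatches, fetchPage, saveBatch and the fake URLs are hypothetical names introduced here.

```typescript
interface GraphPage<T> {
    value: T[];
    "@odata.nextLink"?: string;
}

// Hypothetical fetcher: resolves a URL to one page of results.
type FetchPage<T> = (url: string) => Promise<GraphPage<T>>;

// Collect items into fixed-size batches and await the handler for each one
// before pulling any more data.
async function processInBatches<T>(
    firstUrl: string,
    fetchPage: FetchPage<T>,
    batchSize: number,
    handleBatch: (batch: T[]) => Promise<void>,
): Promise<void> {
    let batch: T[] = [];
    let url: string | undefined = firstUrl;
    while (url !== undefined) {
        const page: GraphPage<T> = await fetchPage(url);
        for (const item of page.value) {
            batch.push(item);
            if (batch.length === batchSize) {
                await handleBatch(batch);
                batch = [];
            }
        }
        url = page["@odata.nextLink"];
    }
    // Flush whatever is left over once the last page has been read.
    if (batch.length > 0) {
        await handleBatch(batch);
    }
}

// In-memory stand-in for Graph: two pages of made-up data.
const fakePages = new Map<string, GraphPage<number>>([
    ["/items", { value: [1, 2, 3], "@odata.nextLink": "/items?page=2" }],
    ["/items?page=2", { value: [4, 5] }],
]);

const fakeFetch: FetchPage<number> = async (url) => {
    const page = fakePages.get(url);
    if (page === undefined) {
        throw new Error(`Unknown URL: ${url}`);
    }
    return page;
};

// Stand-in for real async work such as a database write; it only records the batch.
const seen: string[] = [];
const saveBatch = async (batch: number[]): Promise<void> => {
    seen.push(batch.join(","));
};
```

Note that batches here are independent of page boundaries: with a batch size of 2, the third item of the first page waits for the first item of the second page.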

Specific page

What about if you want to retrieve a specific page of data? Imagine creating a table of items, but you cannot show all items on the table at the same time. For this you would use pages. If a user clicked on page “3”, they would expect the table to show the data for the third page. So, how do we do this?

Going back to the example of 1000 items with a page size of 100, to request the third page we want items 201-300 to be returned (the first two pages cover items 1-200). The way this is done is by effectively setting a starting point and the size of the page to return. You can do this by setting the $skip and $top OData query parameters:

const messages = await client
    .api("/users/{id}/messages")
    .skip(5)
    .top(5)
    .get();

In this example, we are skipping the first 5 items and then returning the next 5 items. This is effectively items 6-10.
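
For a table with numbered pages, the $skip and $top values can be derived from the page number. A small sketch of that arithmetic follows; skipTopForPage and SkipTop are hypothetical names, and page numbers are assumed to start at 1.

```typescript
interface SkipTop {
    skip: number;
    top: number;
}

// pageNumber is 1-based, matching what a user would click in a table.
function skipTopForPage(pageNumber: number, pageSize: number): SkipTop {
    if (!Number.isInteger(pageNumber) || pageNumber < 1) {
        throw new RangeError("pageNumber must be a positive integer");
    }
    if (!Number.isInteger(pageSize) || pageSize < 1) {
        throw new RangeError("pageSize must be a positive integer");
    }
    // Skip every item that belongs to the pages before the requested one.
    return { skip: (pageNumber - 1) * pageSize, top: pageSize };
}

const thirdPage = skipTopForPage(3, 100);
console.log(thirdPage.skip, thirdPage.top); // prints "200 100"
```

The two values would then be passed to .skip() and .top() on the request, as in the snippet above.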

Note: Not all OData query parameters, such as $skip and $top, work with all Graph API endpoints, so please check what is supported for the endpoint you want to use.

Delta updates

Delta updates build on top of paging. They are a way to retrieve only the changes that have occurred since the last time you retrieved the data. This is useful if you want to keep a local copy of data in sync with Graph without having to retrieve all the data every time.

This is achieved by using specific delta endpoints, which are different to the normal endpoints. For example, the normal endpoint for retrieving users is https://graph.microsoft.com/v1.0/users. The delta endpoint for retrieving users is https://graph.microsoft.com/v1.0/users/delta.

Note: Not all endpoints have a delta endpoint, so please check if the endpoint you want to use has a delta endpoint.

How delta updates work

Similar to paging and its @odata.nextLink, delta updates have an @odata.deltaLink, which is used to retrieve the next set of changes.

The first time you call the delta endpoint, you will have to iterate through all the pages of data to get a baseline. Once you have paged through all the data, you will receive a @odata.deltaLink in the response. This is the link you will use to retrieve any changes.

You can request the @odata.deltaLink at any time to retrieve the changes since it was returned. Think of the @odata.deltaLink as a point in time: requesting it will return all changes since that point in time, whether that be 1 minute ago, 1 hour ago, or 1 day ago.

Using delta updates

With the knowledge of paging and delta updates, this is the rough workflow you would follow:

  1. Retrieve all data from the delta endpoint (using paging)
  2. On the last page of data, retrieve the @odata.deltaLink and store it somewhere
  3. When you want to retrieve the changes since the last time you retrieved the data, use the @odata.deltaLink to retrieve the changes
  4. Once all the changes have been retrieved, store the new @odata.deltaLink that is returned
  5. Repeat steps 3 and 4
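
The steps above can be sketched without the SDK as a single function that is run once per round: page until an @odata.deltaLink appears, then hand back both the items and the link to store. Everything named here (runDeltaRound, fetchPage, the fake URLs and tokens) is made up for illustration, and real delta links are opaque URLs that you should store and reuse exactly as returned.

```typescript
interface DeltaPage<T> {
    value: T[];
    "@odata.nextLink"?: string;  // present while the round has more pages
    "@odata.deltaLink"?: string; // present on the last page of the round
}

// Hypothetical fetcher: resolves a URL to one page of results.
type FetchPage<T> = (url: string) => Promise<DeltaPage<T>>;

interface DeltaRound<T> {
    changes: T[];
    deltaLink: string;
}

// Steps 1-2 on the first run (startUrl is the delta endpoint),
// steps 3-4 on later runs (startUrl is the stored deltaLink).
async function runDeltaRound<T>(startUrl: string, fetchPage: FetchPage<T>): Promise<DeltaRound<T>> {
    const changes: T[] = [];
    let url = startUrl;
    for (;;) {
        const page: DeltaPage<T> = await fetchPage(url);
        changes.push(...page.value);
        const next = page["@odata.nextLink"];
        if (next !== undefined) {
            url = next;
            continue;
        }
        const deltaLink = page["@odata.deltaLink"];
        if (deltaLink === undefined) {
            throw new Error("Last page had neither a nextLink nor a deltaLink");
        }
        return { changes, deltaLink };
    }
}

// In-memory stand-in for Graph with made-up URLs and tokens.
const fakePages = new Map<string, DeltaPage<string>>([
    ["/users/delta", { value: ["ann", "bob"], "@odata.nextLink": "/users/delta?skip=1" }],
    ["/users/delta?skip=1", { value: ["cat"], "@odata.deltaLink": "/users/delta?token=1" }],
    ["/users/delta?token=1", { value: ["dan"], "@odata.deltaLink": "/users/delta?token=2" }],
]);

const fakeFetch: FetchPage<string> = async (url) => {
    const page = fakePages.get(url);
    if (page === undefined) {
        throw new Error(`Unknown URL: ${url}`);
    }
    return page;
};

async function demo(): Promise<string[]> {
    const baseline = await runDeltaRound("/users/delta", fakeFetch);
    const update = await runDeltaRound(baseline.deltaLink, fakeFetch);
    return [baseline.changes.join(","), update.changes.join(","), update.deltaLink];
}
```

In this fake data the baseline round returns three users across two pages, and the second round, started from the stored link, returns only the one change plus a fresh link for next time.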

Below is an example of how to use the SDK to retrieve the changes since the last time you retrieved the data (if you have not retrieved the data before, you can just use the delta endpoint without the @odata.deltaLink):

Initial retrieval of data:

const users: User[] = [];
const query: PageCollection = await client
    .api("/users/delta")
    .select("id,displayName,userPrincipalName,usageLocation,accountEnabled")
    .get();
const callback: PageIteratorCallback = (data) => {
    users.push(data);
    return true
}
const pageIterator = new PageIterator(client, query, callback);
await pageIterator.iterate()
    .catch(error => {
        throw error;
    });
const deltaLink = pageIterator.getDeltaLink();

Retrieval of changes since last time:

// `deltaLink` is the value stored after the previous retrieval
const changedUsers: User[] = [];
// The deltaLink already encodes the original query (including $select), so request it as-is
const query: PageCollection = await client
    .api(deltaLink)
    .get();
const callback: PageIteratorCallback = (data) => {
    changedUsers.push(data);
    return true;
};
const pageIterator = new PageIterator(client, query, callback);
await pageIterator.iterate()
    .catch(error => {
        throw error;
    });
// Store this in place of the previous deltaLink, ready for the next retrieval
const newDeltaLink = pageIterator.getDeltaLink();

In this basic example, we are retrieving all the users and storing the @odata.deltaLink in the deltaLink variable. We can then use this deltaLink variable instead of the /users/delta endpoint to retrieve the changes since the last time we retrieved the data.
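
Once you have a set of changes, you still need to fold them into your local copy. Below is a rough sketch of one way to do that with an in-memory Map keyed by id. It assumes, as I understand the delta documentation, that removed items are flagged with an @removed annotation and that changed items may carry only some of their properties; DeltaUser and applyChanges are hypothetical names, so check the behaviour of the endpoint you are using.

```typescript
// Only the fields this sketch cares about.
interface DeltaUser {
    id: string;
    displayName?: string;
    accountEnabled?: boolean;
    "@removed"?: { reason: string };
}

// Apply one round of delta changes to a local copy keyed by user id.
function applyChanges(local: Map<string, DeltaUser>, changes: DeltaUser[]): void {
    for (const change of changes) {
        if (change["@removed"] !== undefined) {
            // The item was removed, so drop it from the local copy.
            local.delete(change.id);
        } else {
            // New or updated item: merge so properties missing from the
            // change keep their previous local values.
            const existing = local.get(change.id);
            local.set(change.id, { ...existing, ...change });
        }
    }
}

const localUsers = new Map<string, DeltaUser>([
    ["1", { id: "1", displayName: "Ann", accountEnabled: true }],
    ["2", { id: "2", displayName: "Bob", accountEnabled: true }],
]);

applyChanges(localUsers, [
    { id: "1", displayName: "Ann Smith" },         // updated
    { id: "2", "@removed": { reason: "changed" } }, // removed
    { id: "3", displayName: "Cat" },                // added
]);
```

After this runs, the local copy holds users 1 and 3, with user 1 renamed and still enabled.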

Wrapping up

I hope this has helped you understand how to work with paging and delta updates in Graph API. I think it's a pretty neat and elegant solution to working with large sets of data. If you have any questions, please feel free to reach out to me on Twitter @lee_ford or leave a comment below.