Click on your picture in the top right corner, go to Preferences and select your notification preferences.
Please keep in mind:
1. To only comment on the topic where the discussion is about. Do you have a new question or topic? Please start a new discussion.
2. Be kind to other users!
Excel performance testing
Tygron's Custom Solutions team is currently working on a project in which performance is a key factor, with many calculations running in parallel in an excel file. Although calculations may seem trivial in construction and duration, as you scale up the amount of data in your project and thus the amount of calculations performed, small inefficiencies can begin to add up.
Specifically, some calculations require a "lookup", consulting a list of available values to find the one appropriate to the situation. For an example, you can take a look at the MKP indicators available on our store. Each has a reference table for properties and targets which different types of areas should adhere to. Each Area which defines such an area type has its own entry in the indicator, calculated on a single row. And on each for, the appropriate values are "looked up" from that reference table, for use in that Area's calculation.
For the users who have already dealt with large-scale excel calculations (on a scale not of hundreds but of thousands of calculations) it is already known that lookup operations are especially tricky. The built-in VLOOKUP and HLOOKUP functions, but also the more flexible INDEX-MATCH construction can take more than a few milliseconds each. Not much to worry about in most situations, but as you perform more and more of them it starts to add up. How can you approach a performance challenge such as this?
Is there really something to worry about?
First off, know whether it's an issue. It is not neccesary to optimize if the calculation is fast enough. If the calculations are simple, and the amount of calculations to perform is low, chances are whatever you have is already fast enough. However, the opposite of that is that there are differences in efficiency between Excel running on your own computer and the interpretation made by the Tygron Platform. Some functions may be run slower, while others may be handled more quickly.
What are your options?
Secondly, know the tools you have at your disposal.
On the one hand, there are multiple ways of performing the same task, and some creativity will provide you with a range of options. In my case:
- HLOOKUP and VLOOKUP allow a first row of column to be searched, and when a desired value is found, a value on the same column or row can be returned.
- INDEX-MATCH does the same thing, but provides more freedom in the lsit of values to search in, and the list of values to return from.
- INDIRECT-ADDRESS allows you to dynamically calculate the address of the cell and retrieve the value from that exact cell.
- SUMIF adds together all the values found, provided that certain conditions are met for a different value.
On the other hand, there are also a number of tools available to analyse how performant the calculations are.
- In any project, when you perform a calculation in the editor, you can hover over the recalculation icon in the top-left, and there select the Performance Overview. This will how much time every step of the recalculation took. Most importantly in this case are the queries and the excel itself, though it will also provide insight into other faccets which take up precious calculation time.
- For more detailed insight into what takes up most time in an individual excel, a calltree recording can be started. In the Indicators dropdown menu in the editor ribbon, select Calltree Recording. A prompt will open to ask for a threshold time to report for. 20 milliseconds can provide sufficient insight for most situations. The next time the calculation is run a log is kept of how long it takes for specific cells in the excel to calculate a result. The log can be downloaded by selecting the Excel Panel or Indicator in the editor, and then clicking on "Download calltree recording".
How to test?
Third, make sure you are testing the right thing. Testing whether the excel works in a reasonable amount of time for your purposes should be done in a project which is representative for your actual use-case. If you have a calculation which you intent to run on 5000 Areas, make sure you perform your test in a project with 5000 Areas. At the same time, look into what element of the project may muddy the water and explicitly aren't relevant for your performance. I.e., if your project contains multiple excel indicators, (temporarily) remove all excels from the project which do not interact directly with the excel you are trying to analyse. Otherwise, you will be left waiting until all calculations have completed instead of just the one you are truely interested in.
What does that mean?
Fourth, your milage may vary. Even in simple projects or simple Excels, there may be minor details which affect the whole in significant ways. Testing calculation speed for one Excel with 5000 Areas does not mean an Excel with 2500 Areas will take exactly half the time. Subtle things like what cells are or aren't used in output, what kind of queries feed them, whether it's dealing with numerical or text data, and more. In addition, some implementations may lend themselves for certain data while other's don't. For example, you can't use SUMs to get retrieve text values.
My process in brief
I've created a version of my project which I was able to use to try different implementations. It had around 8000 of the items I was trying to test, which in my basic implementation took between 40 minutes to over an hour. I used a calltree recording and found out that what was taking the most time in the excel's calculation was a lookup structured as an INDIRECT-ADDRESS. Interestingly enough, because in previous excercises where this was tested that implementation was fast enough. But for thise use-case, that was appearently no longer the case.
Since an hour is far too long to see whether the calculation works or not I reduced the amount of formulas to a range of 400 items rahther thatn 8000. That was small enough that I got to see in a reasonable amount of time whether the calculation completes fast enough, but large enough so I can still get some impression on which implementations take long.
I then created a number of variations of the same excel, in which I replaced specifically that lookup with different functions to accomplish the same goal. I tried (and found) the following:
- INDIRECT-ADDRESS, as a reference, removing everyting from it that was't related to the lookup without breaking the reference betweent the output cells and the lookups in question.
- OFFSET, a different way to compute a reference, but which turned out to be about equally slow
- Stack of IFs, in which I just strung a few "IF" formulas together. This was faster than the other options so far, but would mean no longer having a reference table to look values up in, but "hard-coding" the values in an less-readable format.
- SUMIFS, add together all the values where a certain key value was found. This was actually surprisingly fast, and reduced the calculation time from 40+ minutes to something in the order of seconds.
With that fourth result I have my solution. So in my actual excel I have replaced the INDIRECT-ADDRESS function with a SUMIFS function and can now continue on with my excel calculating as it should.
Takeaway
Should you find yourself in a similar situation, take a step back, inventory the situation and your options, and slowly and carefully test which works best for you.
Sprawling spreadsheets so intricate Alexander the Great cuts them in half.