Caché Dynamic SQL Pagination
I would like to know if there is an alternative or better way to paginate through a dataset using dynamic SQL than what I am using below. The problem is that as the potential pool of data gets larger, this code slows down to the point of not being usable. In analyzing each line of code below, it appears the slowdown is related to the initial rset.%Next() iteration. Is there anything available that does not require a subquery/%VID, such as a simple LIMIT/OFFSET?
My code is similar to:
s sql=##class(%SQL.Statement).%New()
s query="SELECT *,%VID FROM (SELECT prop FROM table WHERE prop=x) WHERE %VID BETWEEN 1 AND 100"
s sc=sql.%Prepare(query)
s rset=sql.%Execute()
while rset.%Next() {.....
See:
PS: need to replace the heading on "Speedup SQL pagination"
Thanks for the suggestions, but I'm afraid none of these will fix my problem.
- ScrollableResultSet has the most promise, but still does not satisfy what I need. The data is constantly being updated, so using it would not be viable.
- The second suggestion is what I am doing above. Using %VID with a subquery is just too slow when dealing with large datasets. For example, the following query has a potential pool of 1 million results: SELECT *,%VID FROM (SELECT prop FROM table WHERE prop=x) WHERE %VID BETWEEN 1 AND 100. Even though I am only asking for the first 100, there is still a major performance hit when executing the first rset.%Next(), due to, I assume, the code having to work through the 1 million records to find the 100 I am requesting.
- The third article mentions a few options, some of which are not available in Caché, such as LIMIT and OFFSET. Also, using cursors or keysets would not be viable, as my application allows jump-to, sorting, and filtering functionality.
- And the fourth talks about using LIMIT and OFFSET, which are, again, not available in Caché.
Any other thoughts?
So, in order:
Caché has analogues, TOP N and %VID, which more than replace LIMIT/OFFSET. This is discussed in detail in the second link.
The essence of the article is to replace the query
with
In this case, the following query is sufficient:
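Roughly, the pattern is the following (a sketch only, using the table and prop names from your post; I'm not reproducing the article's exact queries here):
; first page: TOP alone is enough
s q1="SELECT TOP 100 prop FROM table WHERE prop = ?"
; an arbitrary page: wrap the filtered query and slice it with %VID, Caché's stand-in for OFFSET
s q2="SELECT *, %VID FROM (SELECT prop FROM table WHERE prop = ?) WHERE %VID BETWEEN 101 AND 200"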
Is there an index for "prop"?
Have you tuned the table?
For me, everything works quickly:
Result:
I'm sure I'm missing something, but so far, I'm just not seeing how this would speed up my query.
My reason for thinking the fourth article wouldn't work for me is that I allow the user to jump to a page, and with filtering and sorting also available (as well as a constantly changing database), I don't see how I could determine the keyval/ID to use in the query.
Also, a query such as SELECT TOP 100 prop FROM table WHERE prop=? will work when grabbing the first page, but not for subsequent pages.
TOP N and %VID are very slow when dealing with a large result set.
The example you give is great, but I don't think it has 1 million possible matches being filtered down to 100 results (not 100% sure about that statement :)). This is the problem I am having with TOP N and %VID.
For example, say I want to grab all data for the month of July that matches some specific search parameters (which happens to result in 1,000,000 matched items), and since I am currently on page 5 (and each page displays 100 results), I want to offset the results by 400 items. The first %Next() takes a tremendous performance hit.
s sql=##class(%SQL.Statement).%New()
s query="SELECT *,%VID FROM (SELECT prop FROM table WHERE prop=val) WHERE %VID BETWEEN 401 AND 500"
s sc=sql.%Prepare(query)
s rset=sql.%Execute()
s t=$zh d rset.%Next() w $zh-t
6.105871
Could you provide a complete example similar to mine (code + data + queries with sorting and filtering) that you have an issue with?
How many rows are in the table in total?
How much RAM?
Which version ($zv)?
Pardon the formatting, but please see below. Conclusion: every time I add 100,000 rows to the table, it takes longer to perform the initial %Next(), even though I'm still only asking for the first 5 results. This is a simple table with a simple query. The one I'm dealing with in my app is a much more complex table and query with an ORDER BY clause, which slows it down even more, but the example below gives an idea of what I am talking about.
Class mp.test Extends (%Persistent)
{
Property prop1;
Property prop2;
Property prop3;
Index idx3 On (prop1, prop2, prop3);
ClassMethod Fill(total)
{
    ;d DISABLE^%NOJRN
    ;d ..%KillExtent()
    s prop1="name"
    s prop2="prop2"
    f i=1:1:total {
        s prop3=$r(30)
        ; write the data row and the idx3 index entry directly to the globals for fast loading
        s id=$g(^mp.testD)+1
        s ^mp.testD(id)=$lb(""," "_prop1," "_prop2," "_prop3)
        s ^mp.testI("idx3",$$$SQLUPPER(prop1),$$$SQLUPPER(prop2),$$$SQLUPPER(prop3),id)=""
        s ^mp.testD=$g(^mp.testD)+1
    }
}
ClassMethod Query()
{
    s sql=##class(%SQL.Statement).%New()
    s query = "SELECT *,%vid FROM (SELECT %ID,prop3 FROM mp.test WHERE prop1='name' AND prop3>='1' AND prop3<='30' AND prop2='prop2') WHERE %VID BETWEEN 1 AND 5"
    s sc=sql.%Prepare(query)
    i $$$ISOK(sc) {
        s rset=sql.%Execute()
        i 'rset.%SQLCODE {
            ; time only the first %Next(), which is where the slowdown appears
            s t1=$zh
            d rset.%Next()
            w $zh-t1,!
            d rset.%Display()
        }
    }
}
}
TEST 1 - Loading 100,000 items into table. Query takes .314 secs
user>D ##class(mp.test).Fill(100000)
user>D ##class(mp.test).Query()
.314167 secs
ID prop3 Literal_3
2 19 2
3 19 3
4 19 4
5 15 5
5 Rows(s) Affected
------------------------------------------------
TEST 2 - adding another 100,000 to table. Same query takes .64secs
user>D ##class(mp.test).Fill(100000)
user>D ##class(mp.test).Query()
.640992 secs
ID prop3 Literal_3
2 15 2
3 23 3
4 26 4
5 19 5
5 Rows(s) Affected
--------------------------------------------------------
TEST 3 - another 100,000. query takes .865 secs
user>D ##class(mp.test).Fill(100000)
user>D ##class(mp.test).Query()
.865654 secs
ID prop3 Literal_3
2 15 2
3 23 3
4 26 4
5 19 5
5 Rows(s) Affected
----------------------------------------
TEST 4 - another 100,000 takes 1.16secs
user>D ##class(mp.test).Fill(100000)
user>D ##class(mp.test).Query()
1.169932 secs
ID prop3 Literal_3
2 15 2
3 23 3
4 26 4
5 19 5
--------------------------------------------
TEST 5 - another 100,000. query takes 1.44 secs
user>D ##class(mp.test).Fill(100000)
user>D ##class(mp.test).Query()
1.44849 secs
ID prop3 Literal_3
2 15 2
3 23 3
4 26 4
5 19 5
5 Rows(s) Affected
And what if you try it like this?
Whoa!!! Upon initial testing, that is lightning fast!!! It went from over 1 second down to .001. I'm going to play around with this some more, but so far this is exactly what I was looking for. Thank you so much!!!!
Is there a way to ORDER this? Say, by prop3 DESC?
I am having similar issues.
The speed is better, but the ORDER BY is necessary.
Our table holds around 5M entries; with the SELECT done as above, it still takes 7 to 8 seconds to load each page, which is unacceptable for my client.
Have you tried this?
Yes. The TOP ALL kills the performance.
59.516 seconds, up from 7.8 seconds without the ORDER BY and TOP ALL.
Source code
USER>d ##class(mp.test).Fill(5000000)
USER>d ##class(mp.test).Query()
count=3833346
first 5 [1:5]
ID prop3
3 3
4 3
24 3
30 3
97 3
5 Rows(s) Affected
.000328 secs
last 5 [3833342:3833346]
ID prop3
4999798 1
4999817 1
4999836 1
4999866 1
4999947 1
5 Rows(s) Affected
2.884304 secs
PS: for those who downvoted, may I ask why?
Have you tried that with an ORDER BY on one of the fields?
When doing the ORDER BY, you need to select TOP ALL in the inner query so that it can sort.
The sorting needs to happen before the %VID filters are applied.
Example: filter on a property that has a bitmap index and matches many records,
then order by another property which is a non-unique alphanumeric value.
I don't think I quite understand you.
That is, you need to:
If not, provide a ready-made example with data similar to the example by @MARK PONGONIS
Let's change the data and then test it with an ORDER BY:
Prop1 is a %Integer with a VALUELIST of 1 to 10, populated randomly, with 60% of the values being 2.
Prop2 can stay as is; it is not relevant to this test.
Prop3 is a %String of length 20, populated with 4M random values.
Prop1 has a bitmap index.
Prop2 has a bitmap index.
Prop3 has a standard index.
To use ORDER BY, the query needs to look like this, and this is a lot slower:
select %ID, prop3
from mp.test
where %ID in
    (SELECT * FROM
        (SELECT TOP ALL %ID
         FROM mp.test
         WHERE prop1 = 2
         ORDER BY prop3)
     WHERE %VID BETWEEN 3000000 AND 30000200)
Class mp.test Extends (%Persistent, %Populate)
{
Property prop1 As %Integer(POPSPEC = ".PopulateProp1()", VALUELIST = ",1,2,3,4,5,6,7,8,9,10");
Property prop2 As %Boolean(POPSPEC = "Integer(0,1)");
Property prop3 As %String(MAXLEN = "", POPSPEC = "LastName()");
Index prop1Index On prop1 [ Type = bitmap ];
Index prop2Index On prop2 [ Type = bitmap ];
Index prop3Index On prop3;
ClassMethod Fill(total)
{
;d DISABLE^%NOJRN
;d ..%KillExtent()
;do ..Populate(5000000)
}
Method PopulateProp1() As %Integer
{
set tInt = $random(19) + 1
set:(tInt > 10) tInt = 2
return tInt
}
ClassMethod Query()
{
    s sql=##class(%SQL.Statement).%New()
    s query = "select %ID,prop3 from mp.test where %ID in (SELECT * FROM (SELECT TOP ALL %ID FROM mp.test WHERE prop1 = 2 ORDER BY prop3) WHERE %VID BETWEEN 3000000 AND 30000005)"
    w !,"Query Before Prepare: ",$zh
    s sc=sql.%Prepare(query)
    w !,"Query After Prepare: ",$zh
    i $$$ISOK(sc) {
        s rset=sql.%Execute()
        w !,"Query After Execute: ",$zh
        i 'rset.%SQLCODE {
            s t1=$zh
            d rset.%Next()
            w $zh-t1,!
            d rset.%Display()
        }
        w !,"Query All Processed: ",$zh
    } else {
        w $System.Status.GetErrorText(sc)
    }
}
/// d ##class(mp.test).Test()
ClassMethod Test()
{
w !,"Before fill: ",$zh
d ..Fill()
w !,"After fill: ",$zh
do ..Query()
}
}
Terminal Output:
DEV>d ##class(mp.test).Test()
Before fill: 84807.575139
After fill: 84807.575144
Query Before Prepare: 84807.575158
Query After Prepare: 84807.666853
Query After Execute: 84807.6669688.009005
ID prop3
0 Rows(s) Affected
Query All Processed: 84815.676129
Try adding a new index, and don't forget to rebuild the indices, run TuneTable, and recompile the class.
We may also need the help of @Kyle.Baxter here.
PS: by the way, check
the values inserted in the code.
Because of this, you get:
Try it with BETWEEN 1 AND 5 on the %VID. It is just as slow. It is not about the 0-row result, but about the time it takes.
The count on prop1=2 is around 2.7M.
Regarding the index:
Index idx On (prop1, prop3) [ Type = bitmap ];
I can't create an index for every possible filter and order-by combination. The client selects what to filter on and what to order by.
We use single-field indices to cater for this, and combined-field indexes for uniqueness constraints.
I ran your code with the new index.
count(prop1=2) = 2629833
%VID BETWEEN 1 AND 5
.000293 secs
%VID BETWEEN 2629829 AND 2629833
3.63472 secs
Waiting for @Kyle.Baxter response.
We really need to stop narrowing this thread :). Filed.
So isn't there a solution for linear-cost pagination with customizable filtering and ordering?
Thanks for the question, @Rodrigo Werneck.
Calling on @Benjamin De Boe and @Bob Kuszewski to help.
I'm not sure what you mean by "linear cost pagination", but as soon as your query plan gets more complex than reading a single stream (and especially when sorting), there is upfront work before you can start reading the first row, so any pagination of that final result simply cannot be linear.
That small disclaimer aside, "customizable filtering and sorting" sounds like application-level features you'd apply on top of the result set returned by SQL, and the %ScrollableResultSet class offers something along those lines. If you're looking for server-side support, IRIS SQL currently only supports the TOP clause, but we have support for LIMIT / OFFSET semantics queued for development.
Here is a link to a solution for IRIS 2021.1+ that uses the new window (OVER) functions:
https://community.intersystems.com/post/scrollable-resultset-pagination-...
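For reference, a rough sketch of that approach, assuming IRIS 2021.1+ and the second mp.test class from this thread (where prop1 is an integer); the exact query in the linked post may differ:
s sql=##class(%SQL.Statement).%New()
; number the filtered and sorted rows once with ROW_NUMBER(), then slice out one page
s query="SELECT id, prop3 FROM "
s query=query_"(SELECT %ID AS id, prop3, ROW_NUMBER() OVER (ORDER BY prop3 DESC) AS rn "
s query=query_"FROM mp.test WHERE prop1 = ?) "
s query=query_"WHERE rn BETWEEN 401 AND 500"
s sc=sql.%Prepare(query)
i $$$ISOK(sc) {
    s rset=sql.%Execute(2)
    while rset.%Next() { w rset.%Get("id")," ",rset.%Get("prop3"),! }
}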
I would go with an (old) approach for pagination:
1. Store only the IDs per page in a temporary table.
2. For any specific page, get the IDs and query the data from the main table.
The pagination class:
Class DB.TempSQL Extends %Persistent
{
Property Session As %String;
Property QueryName As %String;
Property PageNo As %Integer;
Property IDs As list Of %String(TRUNCATE = 1);
Index Main On (Session, QueryName, PageNo) [ IdKey, Unique ];
Query Pages(session As %String, queryname As %String) As %SQLQuery
{ SELECT PageNo FROM DB.TempSQL WHERE (Session = :session) AND (QueryName = :queryname) }
}
The function to populate the pagination class:
{
 N (SQL,session,queryname,PageSize,Special,%session)
 &sql(DELETE FROM DB.TempSQL WHERE Session = :session AND QueryName = :queryname)
 S rs=##class(%ResultSet).%New()
 D rs.Prepare(SQL),rs.Execute()
 S (count,page,SQLCODE)=0,TimeStart=$P($ZTS,",",2)
 S entry=##class(DB.TempSQL).%New()
 S entry.Session=session,entry.QueryName=queryname
 F {
    I (rs.Next()'=1) {
        ; end of the result set: save the last, possibly partial, page
        I (entry.IDs.Count()>0) {
            S page=$I(page),entry.PageNo=page D entry.%Save()
            K entry
        }
        Quit  ; last one!
    } Else {
        I queryname'="Search"||'##class(Utils.Lists).CheckIBlack(rs.GetData(1)) {
            S count=$I(count) D entry.IDs.Insert(rs.GetData(1))
        }
        ; the page is full: save it and start collecting the next one
        I (count=PageSize) {
            S page=$I(page),entry.PageNo=page
            D entry.%Save() K entry
            S count=0,entry=##class(DB.TempSQL).%New()
            S entry.Session=session,entry.QueryName=queryname
        }
    }
 }
 S TimeEnd=$P($ZTS,",",2)
 S %session.Data("MaxPageNo"_$S(Special:queryname,1:""),1)=page
 S %session.Data("Page",1)=$S(page>0:1,1:0)
 S %session.Data("SqlSearchTime",1)=$FN(TimeEnd-TimeStart,",",3)
 D rs.Close() K rs
}
Code for retrieving a specific page's data:
F i=1:1:entry.IDs.Count() {
&sql(
}
}
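The embedded SQL above is cut off in the post; a minimal sketch of that retrieval step, where DB.Data and its Name and City columns are placeholders for the real main table:
 ; open the stored page for (session, queryname, pageno), then load each saved ID
 s entry=##class(DB.TempSQL).%OpenId(session_"||"_queryname_"||"_pageno)
 i $isobject(entry) {
    f i=1:1:entry.IDs.Count() {
        s id=entry.IDs.GetAt(i)
        ; DB.Data, Name and City stand in for the real table and display columns
        &sql(SELECT Name, City INTO :name, :city FROM DB.Data WHERE %ID = :id)
        i 'SQLCODE { w name," ",city,! }
    }
 }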
Congrats:
I like your code example; it demonstrates a deep understanding of DB concepts.
💪
Thanks Robert. Working with InterSystems (and other M technologies) since 1991...
Looking at your example class, and maybe this isn't practical since we aren't seeing the entire context of your real-world usage, but might you see better results if, instead of a composite index on prop1, prop2, prop3, you used a bitmap index on prop1 (since it appears to hold a name-like value with a sufficiently small number of distinct values), a bitmap index on prop2, and a traditional index on prop3?
If you were to show the query plan, that would also provide some insight.
My view on this solution started as "it will not work practically", but the more I thought through the scenarios, the more feasible it looks. My thinking was as follows.
When is the temp table built, when the page is opened? Because that will cause a waiting time for the first load, especially if no filters have been applied yet. But in that scenario you can just select, say, the top 200 or so, as long as it is not all the rows.
Then also, every time a filter or the page-size selection changes, you will have to rebuild the temp table. This means the temp table's lookup needs to include the filters and page size to determine that they have changed for the session. It is not in the solution above, but it is not difficult to implement, as sketched below.
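For example (illustrative only; filterProp, filterValue, orderBy, and pageSize stand for the user's current settings and are not part of the solution above), the filter, sort, and page-size settings could be folded into the QueryName key, so that any change misses the cached pages and triggers a rebuild:
 ; build a key that changes whenever the user's filter/sort/page-size settings change
 s filterSpec=filterProp_"="_filterValue_"|"_orderBy_"|"_pageSize
 s queryname="Search_"_$zcrc(filterSpec,7)
 ; rebuild the temp pages only if none exist yet for this (session, queryname) combination
 &sql(SELECT COUNT(*) INTO :pages FROM DB.TempSQL WHERE Session = :session AND QueryName = :queryname)
 i 'SQLCODE,'pages {
    ; call the populate routine shown earlier with this queryname
 }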
If you hit the indexes correctly and get a small result, this may be useful.
What about REST APIs, which are usually built to end the session after each request?
This will not work as-is for REST APIs requiring pagination. It can be worked around by letting the front end pass in a value for the session that is tied to the front-end session instead.
You will also need to build some cleanup mechanism to purge rows after a session has ended. The front end can send you an instruction on logout, but not if the browser is simply closed. It will have to be a task that runs at night, or at some other quiet time, and truncates the table; a sketch follows below.
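A sketch of such a purge, assuming a CreatedAt timestamp property is added to DB.TempSQL (it is not in the class above) and the method is scheduled from the Task Manager:
ClassMethod Purge() As %Integer
{
    ; delete temporary pages older than one day; CreatedAt is an assumed extra property
    &sql(DELETE FROM DB.TempSQL WHERE CreatedAt < DATEADD('day', -1, CURRENT_TIMESTAMP))
    Quit $Get(%ROWCOUNT,0)
}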