Google BigQuery - Random sampling of complete rows


I know that one can do random sampling with RAND():

SELECT * FROM [table] WHERE RAND() < percentage

But that requires a full table scan and incurs the equivalent cost. I'm wondering if there are more efficient ways?

I'm experimenting with the tabledata.list API, but I get java.net.SocketTimeoutException: Read timed out when the start index is large (i.e. > 10000000). Is the operation not O(1)?

bigquery.tabledata()
    .list(tableRef.getProjectId(), tableRef.getDatasetId(), tableRef.getTableId())
    .setStartIndex(index)
    .setMaxResults(1L)
    .execute();

I recommend paging through tabledata.list with pageToken and collecting sample rows from each page. That should scale better.
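
A minimal sketch of that approach, assuming the same com.google.api.services.bigquery client shown in the question; the TableSampler class name, the page size, and the sampling probability are placeholders you would tune:

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.TableDataList;
import com.google.api.services.bigquery.model.TableRow;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class TableSampler {

    // Pages through tabledata.list and keeps each row with the given probability.
    public static List<TableRow> sampleRows(Bigquery bigquery, String projectId,
                                            String datasetId, String tableId,
                                            double probability) throws IOException {
        List<TableRow> sample = new ArrayList<>();
        Random random = new Random();
        String pageToken = null;
        do {
            TableDataList page = bigquery.tabledata()
                .list(projectId, datasetId, tableId)
                .setMaxResults(10000L)       // page size; tune as needed
                .setPageToken(pageToken)     // null on the first request
                .execute();
            if (page.getRows() != null) {
                for (TableRow row : page.getRows()) {
                    if (random.nextDouble() < probability) {
                        sample.add(row);
                    }
                }
            }
            pageToken = page.getPageToken(); // null once the last page is reached
        } while (pageToken != null);
        return sample;
    }
}

Because each request only carries the pageToken from the previous response forward, no single call needs a huge start index, which avoids the timeouts you are seeing with setStartIndex.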

Another (totally different) option is to use table decorators.
You can, in a loop, programmatically generate a random time (for a snapshot decorator) or time frame (for a range decorator) and query only those portions of the table, extracting the data you need, as sketched below.
Note the limitation: this only lets you sample data that is less than 7 days old.
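
A rough sketch of that idea, assuming legacy SQL and a hypothetical my_dataset.my_table; range decorator times are milliseconds since the epoch and must fall within the last 7 days:

import java.util.Random;
import java.util.concurrent.TimeUnit;

public class DecoratorSampler {

    // Builds a legacy-SQL query over a random time range within the last 7 days.
    public static String randomRangeQuery(String datasetTable, long windowMillis) {
        Random random = new Random();
        long now = System.currentTimeMillis();
        long sevenDays = TimeUnit.DAYS.toMillis(7);
        // Pick a random window start, leaving room for the window before "now".
        long start = now - sevenDays
                + (long) (random.nextDouble() * (sevenDays - windowMillis));
        long end = start + windowMillis;
        return "SELECT * FROM [" + datasetTable + "@" + start + "-" + end + "]";
    }

    public static void main(String[] args) {
        // e.g. sample a random one-hour slice of data from the last week
        System.out.println(randomRangeQuery("my_dataset.my_table",
                TimeUnit.HOURS.toMillis(1)));
    }
}

Each loop iteration then only scans the randomly chosen slice instead of the whole table, which is where the cost saving comes from.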

