R - Kappa statistic for an extremely large/sparse matrix
I have a large sparse matrix (mat):

    138493 x 17694 sparse Matrix of class "dgCMatrix", 10000132 entries

I want to investigate inter-rater agreement using kappa statistics, but when I run Fleiss' kappa:

    kappam.fleiss(mat)

I am shown the following error:

    Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

Is this due to the matrix being too large? Are there any other methods I can use to calculate kappa statistics (irr) on a matrix this large?
The best answer I can offer is that this is not possible, due to the extreme sparsity of your matrix. The problem: you have 10,000,132 entries in a 138,493 × 17,694 = 2,450,495,142-cell matrix, so 99.59% of the values are missing. The irr package allows missing values, but here you are placing extreme demands on your system by asking it to compare ratings from users whose rated films do not overlap.
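A quick check of the numbers above, plus a small simulation of how rarely two users overlap. This is a sketch under assumptions that are not in the question: ratings are placed uniformly at random (no viewing pattern), and only 5,000 hypothetical users are simulated against the full 17,694 movies at the same density. All variable names are illustrative.

```r
library(Matrix)

# Dimensions reported in the question
n_users   <- 138493
n_movies  <- 17694
n_ratings <- 10000132

n_cells  <- n_users * n_movies        # double arithmetic, so no integer overflow
missing  <- 1 - n_ratings / n_cells   # share of empty cells (about 0.9959)
dense_gb <- n_cells * 8 / 1e9         # memory if densified as doubles (8 bytes each)
# More cells than a 32-bit int can index; likely related to CHOLMOD's
# 'problem too large' when the sparse matrix is coerced to dense (assumption)
too_many_for_int <- n_cells > .Machine$integer.max

# Simulation at the same density (hypothetical, uniformly random placement)
set.seed(42)
sim_users  <- 5000
sim_movies <- n_movies
sim_n <- round((1 - missing) * sim_users * sim_movies)
cell  <- sample(sim_users * sim_movies, sim_n)        # distinct cells
rated <- sparseMatrix(i = (cell - 1) %% sim_users + 1,
                      j = (cell - 1) %/% sim_users + 1,
                      x = 1, dims = c(sim_users, sim_movies))

# Count co-rated movies for 1,000 random user pairs, without densifying
a <- sample(sim_users, 1000)
b <- sample(sim_users, 1000)
keep <- a != b
shared <- rowSums(rated[a[keep], , drop = FALSE] * rated[b[keep], , drop = FALSE])
no_overlap <- mean(shared == 0)   # fraction of pairs with no movie in common
```

Under these assumptions most random user pairs share no rated movie at all; real viewing data is more clustered, so the true fraction will differ.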
This is compounded by the fact that the methods in the irr package (a) require dense matrices as input, and (b) (at least in kripp.alpha()) loop over columns, making them slow.
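To see what the dense-input requirement costs, this sketch builds a small hypothetical sparse ratings matrix and compares its size with as.matrix(). It assumes, as is usual for dgCMatrix rating data, that an unstored cell means "not rated": as.matrix() turns those cells into 0, so they have to be recoded to NA before being passed to kripp.alpha(), which treats NA as missing.

```r
library(Matrix)

set.seed(1)
n_rows <- 2000   # hypothetical number of users
n_cols <- 500    # hypothetical number of movies
nnz    <- 4000   # 0.4% of cells filled, similar to the question's density

cell <- sample(n_rows * n_cols, nnz)   # distinct (user, movie) cells
sp <- sparseMatrix(
  i = (cell - 1) %% n_rows + 1,
  j = (cell - 1) %/% n_rows + 1,
  x = sample(seq(0.5, 5, by = 0.5), nnz, replace = TRUE),  # nonzero ratings
  dims = c(n_rows, n_cols)
)

sparse_bytes <- as.numeric(object.size(sp))  # stores only the nnz values + indices
dense <- as.matrix(sp)                       # materializes every cell
dense_bytes <- as.numeric(object.size(dense))

# Unstored cells became 0 in the dense copy; recode them as missing
# (assumes no real rating is 0)
dense[dense == 0] <- NA
```

At the question's full size the dense copy alone would be roughly 2.45 billion doubles, about 19.6 GB, before any agreement statistic is computed.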
Here is an illustration, constructing a matrix similar in nature to yours (but with no pattern; in reality your situation is better, because viewers tend to rate similar sets of movies).
Note that I used Krippendorff's alpha here, since it allows ordinal or interval ratings (as your data suggests) and handles missing data fine.
    require(Matrix)
    require(irr)
    set.seed(100)

    (sparseness <- 1 - 10000132 / (138493 * 17694))
    ## [1] 0.9959191
    138493 / 17694  # multiple of movies to users
    ## [1] 7.827117

    # nraters <- 17694
    # nusers  <- 138493
    nmovies <- 100
    nusers  <- 783
    # 11 possible ratings (0, 0.5, ..., 5) plus NA at the observed sparseness
    ratermatrix <- matrix(sample(c(NA, seq(0, 5, by = .5)), nmovies * nusers,
                                 replace = TRUE,
                                 prob = c(sparseness, rep((1 - sparseness) / 11, 11))),
                          nrow = nmovies, ncol = nusers)
    # kripp.alpha() expects raters in rows and subjects in columns
    kripp.alpha(t(as.matrix(ratermatrix)), method = "interval")
    ##  Krippendorff's alpha
    ##
    ##  Subjects = 100
    ##    Raters = 783
    ##     alpha = -0.0237
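For reference, this is the standard textbook definition of Krippendorff's alpha (not taken from the irr source). Let $m_u$ be the number of ratings in unit $u$, $P_u(c,k)$ the number of ordered pairs of values $(c,k)$ within unit $u$, $n_c = \sum_k o_{ck}$, and $n$ the total number of pairable values:

$$
o_{ck} = \sum_{u} \frac{P_u(c,k)}{m_u - 1}, \qquad
\alpha = 1 - \frac{D_o}{D_e} = 1 - (n - 1)\,\frac{\sum_{c}\sum_{k} o_{ck}\,\delta^2_{ck}}{\sum_{c}\sum_{k} n_c\,n_k\,\delta^2_{ck}}, \qquad
\delta^2_{ck} = (c - k)^2 \ \text{(interval metric)}
$$

Units with $m_u < 2$ contribute no pairs, which is how missing ratings are accommodated: only values that can be paired within a unit enter the computation.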
This worked for a matrix of this size (100 × 783), but when I increased it 100× (10× on each dimension), keeping the same proportions as in your reported dataset, it failed to produce an answer after 30 minutes and I killed the process.
What I conclude is that you are not asking the right question of your data. The issue is not how many users agreed, but what sorts of dimensions exist in the data, in terms of clusters of viewing and clusters of preferences. You probably want to use association rules or dimensionality reduction methods that don't balk at the sparsity in your dataset.
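As one example of a dimensionality reduction method that works on the sparse matrix directly, here is a truncated SVD with the irlba package (an assumption: the answer does not name a package), followed by k-means on the user factors to look for viewing clusters. The data are a small random stand-in, and plain SVD treats unstored cells as 0, which conflates "unrated" with a low rating; that is a known limitation of this approach on rating data.

```r
library(Matrix)
library(irlba)   # assumption: installed from CRAN

set.seed(7)
n_users <- 5000; n_movies <- 2000                  # hypothetical sizes
n_ratings <- round(0.0041 * n_users * n_movies)    # ~0.41% filled
cell <- sample(n_users * n_movies, n_ratings)      # distinct cells
ratings <- sparseMatrix(
  i = (cell - 1) %% n_users + 1,
  j = (cell - 1) %/% n_users + 1,
  x = sample(seq(0.5, 5, by = 0.5), n_ratings, replace = TRUE),
  dims = c(n_users, n_movies)
)

# Truncated SVD: only sparse matrix-vector products, never a dense copy
k <- 10
fit <- irlba(ratings, nv = k)
user_factors  <- fit$u %*% diag(fit$d)   # n_users x k
movie_factors <- fit$v                    # n_movies x k

# Group users by their position in the reduced space
clusters <- kmeans(user_factors, centers = 5, nstart = 5)$cluster
```

On real data the leading factors and the resulting clusters are what you would inspect, rather than a single agreement coefficient.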