R - Kappa statistic for an extremely large/sparse matrix

- July 15, 2015

I have a large sparse matrix (mat):

138493 x 17694 sparse Matrix of class "dgCMatrix", with 10000132 entries

I want to investigate inter-rater agreement using kappa statistics, but when I run Fleiss' kappa:

kappam.fleiss(mat)

I am shown the following error:

Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

Is this due to my matrix being too large?

Are there other methods I can use to calculate kappa statistics (irr) on a matrix this large?

The best answer I can offer is that this is not really possible, due to the extreme sparsity of your matrix. The problem: 10,000,132 entries in a 138,493 * 17,694 = 2,450,495,142 cell matrix means that 99.59% of the values are missing. The irr package allows for missing values, but here you are placing extreme demands on the system, asking it to compare the ratings of users whose films do not overlap.
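The arithmetic above can be checked in a few lines of base R. This is a minimal sketch; the variable names are illustrative, and the link between the integer limit and the CHOLMOD message is an assumption, not something the error text confirms.

```r
n_users   <- 138493
n_movies  <- 17694
n_ratings <- 10000132

# Numeric literals are doubles in R, so this product does not overflow.
n_cells <- n_users * n_movies

# Share of cells with no rating.
missing <- 1 - n_ratings / n_cells

# Memory a dense double matrix of this shape would need (8 bytes per cell).
dense_gb <- n_cells * 8 / 1e9

# Assumption: a cell count above the 32-bit integer limit is a plausible
# reason for the "problem too large" error when the matrix is densified.
exceeds_int <- n_cells > .Machine$integer.max

print(format(n_cells, big.mark = ","))
print(round(100 * missing, 2))
print(round(dense_gb, 1))
print(exceeds_int)
```

Even on a machine with enough RAM for roughly 20 GB of doubles, almost all of that memory would hold NA.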

This is compounded by the problem that the methods in the irr package a) require dense matrices as input, and b) (at least in kripp.alpha()) loop over columns, which makes them very slow.
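One way to see how little overlap there is, without densifying the ratings, is to count co-rated movies per pair of users with a sparse cross-product. This is only a sketch on hypothetical toy data (4 users, 5 movies); `user`, `movie`, `rated` and `overlap` are made-up names, and on the full data the user-by-user product may itself be too large, so a random sample of users would probably be needed.

```r
library(Matrix)

# Hypothetical toy data: which user (row) rated which movie (column).
user  <- c(1, 1, 2, 2, 3, 4)
movie <- c(1, 2, 2, 3, 4, 5)

# Sparse 0/1 indicator of "user rated movie"; the rating values are not needed here.
rated <- sparseMatrix(i = user, j = movie, x = 1, dims = c(4, 5))

# users x users matrix: entry [a, b] is the number of movies rated by both a and b.
overlap <- tcrossprod(rated)

# Fraction of user pairs with no movie in common (dense copy is fine for a toy case).
ov    <- as.matrix(overlap)
pairs <- ov[upper.tri(ov)]
share_without_overlap <- mean(pairs == 0)
print(share_without_overlap)
```

Pairs with zero overlap contribute nothing to any agreement statistic, which is the core of the problem described above.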

Here is an illustration, constructing a matrix similar in nature to yours (but with no pattern - in reality your situation will be better, because viewers tend to rate similar sets of movies).

Note that I used Krippendorff's alpha here, since it allows ordinal or interval ratings (as your data suggests), and handles missing data fine.
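To show what "handles missing data" means in practice, here is a tiny sketch with made-up ratings. kripp.alpha() expects raters in rows and subjects in columns; the three raters below each skip one subject and agree wherever they overlap.

```r
library(irr)

# Hypothetical ratings: 3 raters (rows) x 4 subjects (columns), NA = not rated.
ratings <- rbind(
  rater1 = c(1,  2,  3, NA),
  rater2 = c(1,  2, NA,  4),
  rater3 = c(NA, 2,  3,  4)
)

# Interval-level alpha; subjects with missing ratings are still used
# as long as at least two raters rated them.
res <- kripp.alpha(ratings, method = "interval")
print(res$value)
```

Because every pair of ratings on the same subject is identical, the observed disagreement is zero, so alpha should come out at its maximum despite the NAs.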

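    require(Matrix)
    require(irr)
    set.seed(100)  # makes the simulated matrix reproducible

    (sparseness <- 1 - 10000132 / (138493 * 17694))
    ## [1] 0.9959191
    138493 / 17694  # ratio of users to movies
    ## [1] 7.827117

    # full-size dimensions, not used here:
    # nmovies <- 17694
    # nusers <- 138493
    nmovies <- 100
    nusers <- 783

    # random ratings from 0 to 5 in steps of 0.5, with NA at the observed rate
    raterMatrix <-
        Matrix(sample(c(NA, seq(0, 5, by = .5)), nmovies * nusers, replace = TRUE,
                      prob = c(sparseness, rep((1 - sparseness) / 11, 11))),
               nrow = nmovies, ncol = nusers)

    # kripp.alpha() wants raters in rows, so transpose;
    # alpha will be close to 0, the exact value depends on the random draw
    kripp.alpha(t(as.matrix(raterMatrix)), method = "interval")
    ## Krippendorff's alpha
    ##
    ## Subjects = 100
    ##   Raters = 783
    ##    alpha = -0.0237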

This worked for a matrix of this size, but it fails if I increase the size 100x (10x on each dimension), keeping the same proportions as in your reported dataset: it had not produced an answer after 30 minutes, at which point I killed the process.
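Much of that running time comes from nested loops over rater pairs rather than from the statistic itself. For the interval metric, the squared differences reduce to per-subject sums, so alpha can be computed from the non-missing (subject, rating) pairs alone. The function below is a hedged sketch written for this post, not part of irr; `kripp_alpha_interval` is a hypothetical name, and it does not change the point made below that agreement is probably the wrong question for this data.

```r
# Interval-metric Krippendorff's alpha from (unit, value) pairs only.
# unit  : id of the rated subject (e.g. movie) for each rating
# value : the numeric rating
kripp_alpha_interval <- function(unit, value) {
  unit  <- as.integer(factor(unit))
  keep  <- tabulate(unit)[unit] >= 2          # only units with >= 2 ratings are pairable
  unit  <- as.integer(factor(unit[keep]))
  value <- value[keep] - mean(value[keep])    # centring leaves differences unchanged
  m  <- tabulate(unit)                        # ratings per unit
  n  <- length(value)                         # total pairable ratings
  s1 <- rowsum(value, unit)[, 1]              # per-unit sum
  s2 <- rowsum(value^2, unit)[, 1]            # per-unit sum of squares
  # Sum over ordered pairs within a unit of (x_i - x_j)^2 equals 2*m*s2 - 2*s1^2.
  d_obs <- sum((2 * m * s2 - 2 * s1^2) / (m - 1)) / n
  # The same identity over all pairable ratings gives the expected disagreement.
  d_exp <- (2 * n * sum(value^2) - 2 * sum(value)^2) / (n * (n - 1))
  1 - d_obs / d_exp
}

# Toy check: subject 1 rated 1 and 2, subject 2 rated 3 and 3.
a <- kripp_alpha_interval(unit = c(1, 1, 2, 2), value = c(1, 2, 3, 3))
print(a)
```

With a sparse Matrix, the pairs could be taken from summary(mat), which returns the row index, column index and value of each stored entry; note that ratings stored as explicit zeros need care in that representation.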

What I conclude: you are not asking the right question of this data. It's not an issue of how many users agreed, but probably of what sort of dimensions exist in the data, in terms of clusters of viewing and clusters of preferences. You probably want to use association rules or some dimensional reduction methods that don't balk at the sparsity of your dataset.
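As a very small illustration of looking for clusters of viewing, the sketch below groups movies by who watched them. Plain hierarchical clustering on cosine similarity stands in here for the association-rule or dimension-reduction methods mentioned above; it is a simpler technique chosen for brevity, and the toy data and names (`viewed`, `co`, `sim`) are hypothetical.

```r
library(Matrix)

# Hypothetical toy data: users 1-3 watched movies 1 and 2, users 4-6 watched 3 and 4.
user  <- rep(1:6, each = 2)
movie <- c(1, 2,  1, 2,  1, 2,  3, 4,  3, 4,  3, 4)
viewed <- sparseMatrix(i = user, j = movie, x = 1, dims = c(6, 4))

# movies x movies co-viewing counts, computed without densifying the user dimension.
co <- as.matrix(crossprod(viewed))

# Cosine similarity between movies, based on shared viewers.
norms <- sqrt(diag(co))
sim   <- co / outer(norms, norms)

# Cluster movies on 1 - similarity and cut the tree into two groups.
clusters <- cutree(hclust(as.dist(1 - sim)), k = 2)
print(clusters)
```

The movie-by-movie matrix is far smaller than the user-by-movie one, though at 17,694 movies its dense form is still a few gigabytes, so a real analysis would likely use a method that keeps it sparse.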
