string - Split rows according to text in two columns (Python, Pandas) -


this dataframe (with many more letters , length of ~35.5k) , stuff – other relevant strings). variables strings , ['c1','c2'] multiindex.

tmp  c1    c2     c3    c4    c5    start    end     c8     1      -      -     -    12       14      -     2      -      -     -    1,4,7    3,6,10  -     3      -      -     -    16,19    17,21   -     4      -      -     -    22       24      - 

i need become (split every row contains commas maintaining else):

c1    c2     c3    c4    c5    start  end   c8   appearance     1      -      -     -    12     14    -    1     2      -      -     -    1      3     -    1     2      -      -     -    4      6     -    2     2      -      -     -    7      10    -    3     3      -      -     -    16     17    -    1     3      -      -     -    19     21    -    2     4      -      -     -    22     24    -    1 

i tried script pandas: how split text in column multiple rows?

as

s = tmp['start'].str.split(',').apply(series, 1).stack() s.index = s.index.droplevel(-1) s.name = 'start del tmp['start'] final = tmp.join(s) 

but result larger should! thousands of repeats , trying split 'start'. can't imagine trying both start , end (every comma in 'start' implies comma in 'end'.

lengths: tmp   = 35568 s     = 35676 final = 293408 

you can create new df s1 , s2 , join. better use parameter expand=true in str.split , delete multiple columns drop:

for creating column appearance use groupby index cumcount.

s1 = tmp['start'].str.split(',', expand=true).stack() s1.index = s1.index.droplevel(-1) s1.name = 'start'  s2 = tmp['end'].str.split(',', expand=true).stack() s2.index = s2.index.droplevel(-1) s2.name = 'end' tmp.drop(['start', 'end'], inplace=true, axis=1)  df = pd.dataframe({'s1':s1, 's2':s2}, index=s1.index) final = tmp.join(df)  final['appearance'] = final.groupby(final.index).cumcount() + 1 print (final)   c1  c2 c3 c4 c5 c8  s1  s2  appearance 0    1  -  -  -  -  12  14           1 1    2  -  -  -  -   1   3           1 1    2  -  -  -  -   4   6           2 1    2  -  -  -  -   7  10           3 2    3  -  -  -  -  16  17           1 2    3  -  -  -  -  19  21           2 3    4  -  -  -  -  22  24           1 

edit comment:

you can try reset_index first:

print (tmp)       c3 c4 c5  start     end c8 c1 c2                             1   -  -  -     12      14  -    2   -  -  -  1,4,7  3,6,10  -    3   -  -  -  16,19   17,21  -    4   -  -  -     22      24  -  tmp.reset_index(inplace=true) print (tmp)   c1  c2 c3 c4 c5  start     end c8 0    1  -  -  -     12      14  - 1    2  -  -  -  1,4,7  3,6,10  - 2    3  -  -  -  16,19   17,21  - 3    4  -  -  -     22      24  - 

Comments

Popular posts from this blog

javascript - Slick Slider width recalculation -

jsf - PrimeFaces Datatable - What is f:facet actually doing? -

angular2 services - Angular 2 RC 4 Http post not firing -