uweziegenhagen.de


Aggregating data with pandas

2016-02-28, 21:29

I recently came across a "challenge" where I needed to combine several rows. Each row was identified by Key1 and Key2 and had two columns of interest, Foo and Bar. Each Key1 can have several Key2 values, and each Key1/Key2 pair can have any number of Foo/Bar entries. While the Foo values are distinct within each Key1/Key2 combination, the same Bar value may appear several times.

The goal was to get a list of unique Bar items for each Key1/Key2 combination.

	Key1	Key2	Foo	Bar
0	C1	T1	a1	rc-1
1	C1	T1	a2	rc-1
2	C1	T1	a3	rc-1
3	C1	T1	a4	rc-1
4	C2	T2	b1	rc-1
5	C2	T2	b2	rc-2
6	C3	T3	c1	rc-3
7	C4	T4	d1	rc-4
8	C4	T4	d2	rc-5
9	C4	T4	d3	rc-4
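To make the target result concrete before bringing pandas in, here is a minimal stdlib-only sketch. It hard-codes the rows of the table above and collects the distinct Bar values per Key1/Key2 pair in a set; the names `rows` and `unique_bars` are illustrative and not part of the original post.

```python
from collections import defaultdict

# Rows of the example table as (Key1, Key2, Foo, Bar) tuples
rows = [
    ("C1", "T1", "a1", "rc-1"),
    ("C1", "T1", "a2", "rc-1"),
    ("C1", "T1", "a3", "rc-1"),
    ("C1", "T1", "a4", "rc-1"),
    ("C2", "T2", "b1", "rc-1"),
    ("C2", "T2", "b2", "rc-2"),
    ("C3", "T3", "c1", "rc-3"),
    ("C4", "T4", "d1", "rc-4"),
    ("C4", "T4", "d2", "rc-5"),
    ("C4", "T4", "d3", "rc-4"),
]

def unique_bars(rows):
    """Map each (Key1, Key2) pair to a sorted list of its distinct Bar values."""
    groups = defaultdict(set)
    for key1, key2, _foo, bar in rows:
        # A set silently ignores repeated Bar values within a group
        groups[(key1, key2)].add(bar)
    # Sort the groups by key and the Bar values within each group
    # so that the output order is deterministic
    return {key: sorted(bars) for key, bars in sorted(groups.items())}

for (key1, key2), bars in unique_bars(rows).items():
    print(key1, key2, ",".join(bars))
```

Note that the sorting is lexicographic on strings, so a value like `rc-10` would sort before `rc-2`.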

The following Python code did the job nicely, thanks to this Stack Overflow thread: http://stackoverflow.com/questions/17841149/pandas-groupby-how-to-get-a-union-of-strings

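# -*- coding: utf-8 -*-
import pandas as pd

def unique(text):
    """Take a string of comma-separated items and return a sorted,
    comma-separated string of the unique items."""
    items = text.split(',')
    return ','.join(sorted(set(items)))

# Read the example data (columns Key1, Key2, Foo, Bar)
df = pd.read_excel('groupb_Beispiel.xlsx')
print(df)

# Join all Bar values of each Key1/Key2 group into one comma-separated string;
# reset_index() turns Key1 and Key2 back into regular columns of a DataFrame
grouped = df.groupby(['Key1', 'Key2'])['Bar'].agg(lambda col: ','.join(col)).reset_index()

# Reduce each joined string to its sorted unique items
grouped['Unique'] = grouped['Bar'].apply(unique)

print(grouped)

# Save the result (including the Bar and Unique columns) to Excel
grouped.to_excel('result.xlsx')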
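The detour through a comma-joined string that is split again afterwards is not strictly necessary. As an alternative sketch (not the approach from the post), the aggregation function can discard duplicates itself. Here the DataFrame is built inline from the example table instead of being read from `groupb_Beispiel.xlsx`; the column name `Unique` simply mirrors the original.

```python
import pandas as pd

# Example data from the table above, built inline instead of read from Excel
df = pd.DataFrame({
    "Key1": ["C1", "C1", "C1", "C1", "C2", "C2", "C3", "C4", "C4", "C4"],
    "Key2": ["T1", "T1", "T1", "T1", "T2", "T2", "T3", "T4", "T4", "T4"],
    "Foo":  ["a1", "a2", "a3", "a4", "b1", "b2", "c1", "d1", "d2", "d3"],
    "Bar":  ["rc-1", "rc-1", "rc-1", "rc-1", "rc-1", "rc-2",
             "rc-3", "rc-4", "rc-5", "rc-4"],
})

# One pass per group: take the distinct Bar values, sort them, join with commas;
# reset_index(name=...) turns the grouped Series into a DataFrame
grouped = (
    df.groupby(["Key1", "Key2"])["Bar"]
      .agg(lambda col: ",".join(sorted(set(col))))
      .reset_index(name="Unique")
)
print(grouped)
```

Because nothing is split on commas here, this variant would also cope with Bar values that themselves contain a comma. Keep the original two-step version if you also want the concatenated raw Bar values in the output.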

	Key1	Key2	Bar	Unique
0	C1	T1	rc-1,rc-1,rc-1,rc-1	rc-1
1	C2	T2	rc-1,rc-2	rc-1,rc-2
2	C3	T3	rc-3	rc-3
3	C4	T4	rc-4,rc-5,rc-4	rc-4,rc-5
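The stated goal was a list of unique Bar items, while the output above stores them as comma-separated strings. If real Python lists are more convenient for further processing, one possible sketch (again an alternative, with the data built inline) first removes repeated Key1/Key2/Bar rows and then collects each group with `agg(list)`:

```python
import pandas as pd

# Example data from the table above
df = pd.DataFrame({
    "Key1": ["C1", "C1", "C1", "C1", "C2", "C2", "C3", "C4", "C4", "C4"],
    "Key2": ["T1", "T1", "T1", "T1", "T2", "T2", "T3", "T4", "T4", "T4"],
    "Foo":  ["a1", "a2", "a3", "a4", "b1", "b2", "c1", "d1", "d2", "d3"],
    "Bar":  ["rc-1", "rc-1", "rc-1", "rc-1", "rc-1", "rc-2",
             "rc-3", "rc-4", "rc-5", "rc-4"],
})

unique_lists = (
    # Keep only the first row for each Key1/Key2/Bar combination
    df.drop_duplicates(subset=["Key1", "Key2", "Bar"])
      # Sort so that the Bar values inside each list come out ordered
      .sort_values(["Key1", "Key2", "Bar"])
      # Collect the remaining Bar values of every group into a Python list
      .groupby(["Key1", "Key2"])["Bar"]
      .agg(list)
      .reset_index(name="Unique")
)
print(unique_lists)
```

Note that Excel cannot store list objects in a cell directly, so for `to_excel` the joined-string form of the original is the more practical choice.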

Uwe

Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined. Do you like my content and would you like to thank me for it? Consider making a small donation to my local fablab, the Dingfabrik Köln. Details on how to donate can be found here: Spenden für die Dingfabrik.
