uweziegenhagen.de » Blog Archive » Aggregating data with pandas

Aggregating data with pandas

2016-02-28, 21:29

I recently came across a "challenge" where I needed to combine several rows. Each row is identified by Key1 and Key2 and has two interesting columns, Foo and Bar. Each Key1 can have a few Key2 values, and each Key1/Key2 pair can have n Foo/Bar entries. While the Foo values are distinct within each Key1/Key2 combination, the same Bar value may appear several times.

The goal was to get a list of unique Bar items for each Key1/Key2 combination.

	Key1	Key2	Foo	Bar
0	C1	T1	a1	rc-1
1	C1	T1	a2	rc-1
2	C1	T1	a3	rc-1
3	C1	T1	a4	rc-1
4	C2	T2	b1	rc-1
5	C2	T2	b2	rc-2
6	C3	T3	c1	rc-3
7	C4	T4	d1	rc-4
8	C4	T4	d2	rc-5
9	C4	T4	d3	rc-4
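To follow along without the Excel file, the sample data above can be built directly as a DataFrame. This is a minimal sketch: constructing the frame in code is an assumption that stands in for reading `groupb_Beispiel.xlsx`, and the name `df` simply matches the script below.

```python
import pandas as pd

# Sample data from the table above, built in code instead of read from Excel
df = pd.DataFrame({
    'Key1': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C3', 'C4', 'C4', 'C4'],
    'Key2': ['T1', 'T1', 'T1', 'T1', 'T2', 'T2', 'T3', 'T4', 'T4', 'T4'],
    'Foo':  ['a1', 'a2', 'a3', 'a4', 'b1', 'b2', 'c1', 'd1', 'd2', 'd3'],
    'Bar':  ['rc-1', 'rc-1', 'rc-1', 'rc-1', 'rc-1', 'rc-2',
             'rc-3', 'rc-4', 'rc-5', 'rc-4'],
})
print(df)
```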

The following Python code nicely did the job, thanks to http://stackoverflow.com/questions/17841149/pandas-groupby-how-to-get-a-union-of-strings

# -*- coding: utf-8 -*-
import pandas as pd
 
def unique(liste):
    """ takes a string of items separated by commas and returns a
    comma-separated string of the sorted unique items """
    a = liste.split(',')
    b = sorted(set(a))
    return ','.join(b)
 
# read the example data (columns Key1, Key2, Foo, Bar)
df = pd.read_excel('groupb_Beispiel.xlsx')
print(df)
 
# join all Bar values of each Key1/Key2 group into one comma-separated string;
# as_index=False keeps Key1 and Key2 as regular columns
grouped = df.groupby(['Key1','Key2'],as_index=False)['Bar'].agg(lambda col: ','.join(col))
grouped = pd.DataFrame(grouped)
 
# reduce each joined string to its sorted unique items
grouped['Unique'] = grouped['Bar'].apply(unique)
 
print(grouped)
 
grouped.to_excel('result.xlsx')

	Key1	Key2	Bar	Unique
0	C1	T1	rc-1,rc-1,rc-1,rc-1	rc-1
1	C2	T2	rc-1,rc-2	rc-1,rc-2
2	C3	T3	rc-3	rc-3
3	C4	T4	rc-4,rc-5,rc-4	rc-4,rc-5
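The joining step and the `unique()` helper can also be folded into a single aggregation. The sketch below is an alternative to the two-step approach above (it is not taken from the linked Stack Overflow answer): for each group it deduplicates and sorts the Bar values first and only then joins them. The sample data is rebuilt inline so the snippet runs on its own; the column name `Unique` is kept from the script above.

```python
import pandas as pd

df = pd.DataFrame({
    'Key1': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C3', 'C4', 'C4', 'C4'],
    'Key2': ['T1', 'T1', 'T1', 'T1', 'T2', 'T2', 'T3', 'T4', 'T4', 'T4'],
    'Bar':  ['rc-1', 'rc-1', 'rc-1', 'rc-1', 'rc-1', 'rc-2',
             'rc-3', 'rc-4', 'rc-5', 'rc-4'],
})

# For each Key1/Key2 group: drop duplicate Bar values, sort them,
# then join them with commas in one step
result = (
    df.groupby(['Key1', 'Key2'])['Bar']
      .agg(lambda s: ','.join(sorted(set(s))))
      .reset_index(name='Unique')  # turn the group keys back into columns
)
print(result)
```

Because the values are never concatenated before deduplication, no comma splitting is needed, which also avoids trouble if a Bar value itself ever contains a comma.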

Uwe

Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined. Do you like my content and would like to thank me for it? Consider making a small donation to my local fablab, the Dingfabrik Köln. Details on how to donate can be found here Spenden für die Dingfabrik.


Tags: Python, Pandas, aggregate, group by
Category: General, Python / SciPy / pandas
