
Stop pretending studying arrow theory improves your apping

Name: Anonymous 2015-03-11 7:12

Given two functors S, T : C → B, a natural transformation τ : S → T is a function which assigns to each object c of C an arrow τc : Sc → Tc of B in such a way that every arrow f : c → c' in C yields a diagram

                τc
 c        Sc ------→ Tc
 |         |          |
 |f      Sf|          |Tf
 ↓         ↓    τc'   ↓
 c'       Sc' -----→ Tc'


which is commutative. When this holds, we also say that τc : Sc → Tc is natural in c. If we think of the functor S as giving a picture in B of (all the objects and arrows of) C, then a natural transformation τ is the set of arrows mapping (or translating) the picture S to the picture T, with all squares (and parallelograms!) like that above commutative:

 a           Sa -----τa-----→ Ta
 | ╲ f        | ╲ Sf           | ╲ Tf
 |   ↘        |   ↘            |   ↘
 |    b       |    Sb -----τb-----→ Tb
 |   ╱        |   ╱            |   ╱
 ↓ ↙ g        ↓ ↙ Sg           ↓ ↙ Tg
 c           Sc -----τc-----→ Tc

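For what it's worth, in Haskell terms this is just a polymorphic function between two Functors, and the commuting square above is the statement that it commutes with fmap. A throwaway illustration (safeHead is only an example, not anything from the quote):

-- An illustrative natural transformation from the list functor to Maybe.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

-- Naturality (the square above), for any f :: a -> b:
--   fmap f . safeHead == safeHead . fmap f
-- e.g. fmap show (safeHead [1, 2, 3]) == safeHead (map show [1, 2, 3])
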
Name: Anonymous 2015-03-14 14:59

Following a discussion on this thread telling me that I was abusing tuples, I decided to post a separate question about giant tuples, or an alternative to the (R) data frame. When I say giant tuples, I mean anything from 5-tuples to 50-tuples and beyond ;-).

The general consensus seems to be that tuples should be avoided (especially long ones) and that the (R) data.frame smells of bad design, so any related question is usually not considered "real world" and as such doesn't get a "real world" answer. However, I have a real-world scenario where a giant tuple/record is a necessity: generating corporate reports using Word (or equivalent).

An easy way to generate nice reports is to use Word and its mail-merge feature. For people who don't know it: you create a template document with fields and feed Word a csv. Word instantiates the template for each row, each field being a column of the csv.

So far so good: most of the reports are just map-reduce problems, which seems like an easy job for Haskell. The problem is that a simple report has a minimum of 20 fields, whereas a big one can have 50-80 fields, meaning as many columns in the csv.

How do you deal with that in Haskell? I know people will just answer "create a big record with all your fields and the job is done", and that's probably the answer, however the workflow I'm used to (which is probably ) needs more flexibility.

I started writing those reports in Excel. The workflow is basically: load a csv, and each time you need to compute something, create a new column (even for intermediate results). Keeping intermediate results (as columns) is good practice because it means the formulas are shorter, and if something goes wrong you can easily find the problem and/or rewrite the formula.

However, doing things manually quickly becomes a pain, so I then used R. R with data.frame is great and allows me to replicate this workflow. For people who don't know R: a data.frame is basically a table (as in SQL or an Excel spreadsheet). It has rows and (named) columns, and you can do everything you need with it (add rows, add columns, join data frames, etc.).

You can do things like this:

data1 <- read.csv(file1)              # read a csv
data2 <- read.csv(file2)              # read another csv
data <- merge(data1, data2)           # do a join on the common columns
data$q6 <- roundTo(data$quantity, 6)  # round quantities up to a multiple of 6
data$amount <- data$q6 * data$price   # create a new column: amount = price * q6

It's damn quick to write and you can't really beat it. However, six months later you have no idea how many columns data has and, worse, you don't know whether quantity originally came from file1 or file2.

I'm trying to do the same in Haskell. Reading a csv is easy. Merging them is too (thanks to the these package). The main problem is the data representation.

What is hard to do in Haskell (but easy in R) is that each intermediate calculation (or step) creates a new shape by extending the previous one.

For example

data1 would be (String, Int) (item code, quantity).
data2 would be (String, Double) (item code, price).
The result of merge(data1, data2) would be (String, Int, Double).
After adding q6: (String, Int, Double, Int).
After adding amount: (String, Int, Double, Int, Double).

etc ...

This type extension ((String, Int, Double) => (String, Int, Double, Int) => (String, Int, Double, Int, Double)) is needed for different reasons and is the problem I would like to solve.
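To make the extension problem concrete, here is roughly what that pipeline looks like with plain tuples; the Data.Map join, the field layout and the roundTo helper are just my own sketch, not any library:

import qualified Data.Map as Map

type ItemCode = String

-- Inner join on item code: a quantity map and a price map become (quantity, price).
merge :: Map.Map ItemCode Int -> Map.Map ItemCode Double -> Map.Map ItemCode (Int, Double)
merge = Map.intersectionWith (,)

-- Round a quantity up to the next multiple of n.
roundTo :: Int -> Int -> Int
roundTo n q = ((q + n - 1) `div` n) * n

-- Each step widens the tuple, so each step has a new, wider type.
addQ6 :: (Int, Double) -> (Int, Double, Int)
addQ6 (qty, price) = (qty, price, roundTo 6 qty)

addAmount :: (Int, Double, Int) -> (Int, Double, Int, Double)
addAmount (qty, price, q6) = (qty, price, q6, fromIntegral q6 * price)

report :: Map.Map ItemCode Int -> Map.Map ItemCode Double -> Map.Map ItemCode (Int, Double, Int, Double)
report quantities prices = fmap (addAmount . addQ6) (merge quantities prices)

Here the item code lives in the Map key rather than in the tuple, but the point is the same: every new column means a new tuple type and a new signature.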

The different approaches I found are tuples, records, HList.Record, and dataframe.
Tuples

Tuples are great mainly because you don't need to create a type, but also because they come batteries included, i.e. they instantiate all the standard typeclasses (Monoid, csv reader, etc.). However, they don't compose well (how do you transform ((a, b), c) into (a, b, c)?), they are a pain to get/set values from and, worse, the common instances have a limit and each package has its own limit. For example, the morph package instantiates HFoldable for 13-tuples whereas Monoid stops at 5-tuples.
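The re-association itself is a one-liner; the pain is that you need a separate (hypothetical) helper for every shape:

-- Hypothetical helpers: each nesting and arity needs its own function.
flatten3 :: ((a, b), c) -> (a, b, c)
flatten3 ((a, b), c) = (a, b, c)

flatten4 :: ((a, b, c), d) -> (a, b, c, d)
flatten4 ((a, b, c), d) = (a, b, c, d)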
Record

I've been told that types are cheap in Haskell, so I could indeed create a new type for each step. The main problems here are the instantiation of the common typeclasses (Monoid, fromCSV, etc.) and finding a good name for each type variation (and its accessors).
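A minimal sketch of what "one record per step" looks like; all the names are made up, and each type would still need its own CSV and Monoid instances:

-- Made-up record types, one per pipeline step, each with its own accessors.
data Merged = Merged
  { mItem     :: String
  , mQuantity :: Int
  , mPrice    :: Double
  }

data WithQ6 = WithQ6
  { qItem     :: String
  , qQuantity :: Int
  , qPrice    :: Double
  , qQ6       :: Int
  }

addQ6 :: Merged -> WithQ6
addQ6 (Merged item qty price) = WithQ6 item qty price (((qty + 5) `div` 6) * 6)  -- round up to a multiple of 6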
Record with polymorphic tail

It's probably equivalent to HList.Record ...
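"Polymorphic tail" meaning something along these lines (just a sketch of the idea, not any library):

-- A record whose last field is a polymorphic "rest of the columns";
-- every new column nests one more layer into the tail.
data Row tail = Row
  { item     :: String
  , quantity :: Int
  , extra    :: tail
  }

type MergedRow = Row Double          -- tail = price
type Q6Row     = Row (Double, Int)   -- tail = (price, q6)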
HList.Record

HList.Record seems to tick all the boxes: it is extensible, easy to access, etc. However, I don't know how well it performs with LOTS of fields. And my main concern at the moment is the type signature: the type signature of an HList.Record with 10 fields would probably not fit on a page.
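To illustrate the signature-growth worry without relying on the HList package's actual API, here is a hand-rolled version of the same idea (labelled fields pushed onto a type-level list); even with four fields the type spells out every label and payload:

{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}

import GHC.TypeLits (Symbol)

-- A hand-rolled extensible record, only to show how the type grows;
-- this is NOT the HList package's real API.
newtype Field (name :: Symbol) a = Field a

data Rec (fields :: [*]) where
  RNil :: Rec '[]
  (:&) :: Field name a -> Rec fields -> Rec (Field name a ': fields)
infixr 5 :&

row :: Rec '[ Field "item" String, Field "quantity" Int
            , Field "price" Double, Field "q6" Int ]
row = Field "X42" :& Field 10 :& Field 1.5 :& Field 12 :& RNil

With 50-80 fields, that signature is exactly the "doesn't fit on a page" problem.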
DataFrame

I've heard of a dataframe package, but I don't know whether it's ready and whether it is really different from the HList.Record approach.

Any thoughts are welcome :-)
