[Haskell-beginners] Is Haskell for me?

Discussion:

Luis P. Mendes

2009-11-06 13:58:16 UTC

Hi,

I'd like to have some points of view on this subject.

I need to start a project on neural analysis and genetic algorithms
that will process many millions of records in a Linux X86_64 box (some
development also in a X32 computer). Maybe I'll use fuzzy logic, or
case-based reasoning, or decision trees, too.

Recently, I finished a small project on genetic algorithms with
Python, but since run-time is very important, my next project will
need a fast language. Either I'll go to C++ or similar language or
I'll try a functiional language - Haskell.
Apart from some features in Python, I've never programmed thinking in
a functional way.
I'd like to avoid having to learn C++. I'd like to concentrate in
getting the work done and Haskell seems like a solution. But...

My questions are:
- Is Haskell able to read (also write to a point) data from databases
in a fast and reliable way? (MySql or PostgreSQL)

- how could I program something like this in Haskell:
.. generate random population
.. for each one of the population:
.. for time period 1 to ten million:
.. evaluate method 1, 2, 3, 4, 5, 6, ....
..evaluate fitness of each one
.. generate new population based on results of previous generation

It seems relatively intuitive for me to program this in an imperative
language. But what about in Haskell?

- Is Haskell suitable to process data like this in a fast way
(aproximate to C++?)

- In order for Haskell to be fast, coding is done in a 'natural' way
or with use of special hidden details of the language?

- Although I always liked math, I no longer have the knowledge I used
to have several years ago. Is this important to help program in this
funcional language?

- Are there graphical packages available to plot results or is it easy
to connect it to a Python (or C) library?

- Is code easily reusable in different future projects? Since it has
no objects... how can it be done?

Sorry for all these questions, but I really need to know about this
and that's why I want to read answers from knowledgeable people.

Luis

Luca Ciciriello

2009-11-06 14:33:41 UTC

Permalink

Hi.

Why not?

A friend of mine is a resercher in AI and is an expert in Neural network (solution spaces, matrices, etc). His programming language is Haskell. He is very happy to use Haskell (on MacOS X) in its reserach.

Personally I use Haskell with Postgres using HDBC or calling the postgress function in pqlib using the Haskell FFI (may favourite way).

I'm also an experienced C++ programmer (12 years) and I can say that C++ is a very complex language to learn from scratch and master it. Now I'm finding a little bit hard to switch from an imperative language to a functional language.

Anyway Haskell is a very beautiful and elegant and powerful language that you can use for everything.

About using a GUI, I've never used one on Mac. Sorry. I know that exist a Haskell extension called GTK2HS (see: http://www.haskell.org/gtk2hs/).

If you want to use GHC (the de-facto compiler for Haskell), the code produced is very fast and you have a very powerful support for concurrency and parallelism.

For further information go to: http://www.haskell.org/ghc/ where you can find a lot of useful documentation and answers to ur questions.

Luca.

Date: Fri, 6 Nov 2009 13:58:16 +0000
Subject: [Haskell-beginners] Is Haskell for me?
Hi,
I'd like to have some points of view on this subject.
I need to start a project on neural analysis and genetic algorithms
that will process many millions of records in a Linux X86_64 box (some
development also in a X32 computer). Maybe I'll use fuzzy logic, or
case-based reasoning, or decision trees, too.
Recently, I finished a small project on genetic algorithms with
Python, but since run-time is very important, my next project will
need a fast language. Either I'll go to C++ or similar language or
I'll try a functiional language - Haskell.
Apart from some features in Python, I've never programmed thinking in
a functional way.
I'd like to avoid having to learn C++. I'd like to concentrate in
getting the work done and Haskell seems like a solution. But...
- Is Haskell able to read (also write to a point) data from databases
in a fast and reliable way? (MySql or PostgreSQL)
.. generate random population
.. evaluate method 1, 2, 3, 4, 5, 6, ....
..evaluate fitness of each one
.. generate new population based on results of previous generation
It seems relatively intuitive for me to program this in an imperative
language. But what about in Haskell?
- Is Haskell suitable to process data like this in a fast way
(aproximate to C++?)
- In order for Haskell to be fast, coding is done in a 'natural' way
or with use of special hidden details of the language?
- Although I always liked math, I no longer have the knowledge I used
to have several years ago. Is this important to help program in this
funcional language?
- Are there graphical packages available to plot results or is it easy
to connect it to a Python (or C) library?
- Is code easily reusable in different future projects? Since it has
no objects... how can it be done?
Sorry for all these questions, but I really need to know about this
and that's why I want to read answers from knowledgeable people.
Luis
_______________________________________________
Beginners mailing list
http://www.haskell.org/mailman/listinfo/beginners

_________________________________________________________________
New Windows 7: Simplify what you do everyday. Find the right PC for you.
http://www.microsoft.com/uk/windows/buy/

Maurício CA

2009-11-06 15:28:41 UTC

Permalink

- how could I program something like this in Haskell: ..
generate random population

Since Haskell is a "pure" language, after you attribute a value
to a variable they are glued together forever. So, the language
itself can't have such a thing as a random number. (Here, someone
with technical knowledge will probably correct me, but you get
the poing.) Dealing with this (and other "outside world" stuff)
requires learning how to deal with a special type construct
called "Monad". It's the major step you will have to as a Haskell
begginer. It's, though, extremely interesting, and extremely
powerfull after you understand it.

- Although I always liked math, I no longer have the knowledge
I used to have several years ago. Is this important to help
program in this funcional language?

No. But if you like math, you're probably going to find links to
really interesting math while you learn.

(...) is it easy to connect it to a Python (or C) library?

To C libraries the answer is yes. Actually, if I want to do C, I
prefer to do it in Haskell :) A friend of mine, who uses neural
networks in his MSc work, was impressed that in half an hour I
could get better results using a low-level binding from libfann to
Haskell (link below) than his own hand written code.

http://hackage.haskell.org/package/bindings-fann

- Is code easily reusable in different future projects? Since it
has no objects... how can it be done?

Yes, to a point you may find too radical at first. Even the most
basic constructs are usually combinations of other pieces. After
some time, you will start any code you write by searching pieces
to join together instead of writing everything yourself. It is,
though, usually easier to write code from scratch in Haskell than
reuse code in imperative languages :)

Biased opinions, of course, but I hope they help. Best,
Maurício

Cory Knapp

2009-11-06 16:52:48 UTC

Permalink

I'm certainly not an expert on Haskell, and my studies over the last
year have prevented me from really working in Haskell (or any language)
much, but...

Post by MaurÃÂcio CA

- how could I program something like this in Haskell: ..
generate random population

I don't think you need that deep of an understanding of monads in order
to write random code. You need a basic understanding of "monads as
computations", and you need to know how to use do notation, which should
feel very much like programming in an imperative language.
On the other hand, the rest of the algorithm Luis mentioned is *very*
natural in Haskell once you learn how to deal with lists.

Post by MaurÃÂcio CA

- Although I always liked math, I no longer have the knowledge
I used to have several years ago. Is this important to help
program in this funcional language?

No. But if you like math, you're probably going to find links to
really interesting math while you learn.

I Definitely agree. Being a math student, one of the things I like about
Haskell is how it's a "down to earth" example of some very high level
concepts showing up-- without necessarily needing to learn all the math
that's there. (I spend more time learning the math than the Haskell, but
that's a result of my biases, not need.)

Now, your other (related) questions:

- Is Haskell suitable to process data like this in a fast way
(aproximate to C++?)

- In order for Haskell to be fast, coding is done in a 'natural' way
or with use of special hidden details of the language?

Idiomatic Haskell won't be as fast as idiomatic C++, but it will blow Python away. Since I assume you'll be working with large data sets, lazy evaluation may actually make your life much easier. If you really are pushing for lightening fast code, you can do some Don Stewart style hacking, or you can link to C using the foreign function interface. I have no experience with either, but I've heard nothing but the best about Haskell/C interaction.

Cheers,
Cory

Shawn Willden

2009-11-06 17:19:50 UTC

Permalink

Post by Cory Knapp
Idiomatic Haskell won't be as fast as idiomatic C++, but it will blow
Python away.

Based on the little bit of stuff I've done, I think I'd characterize it this
way: C++ will be maybe twice as fast as Haskell. Maybe a little more, maybe
a little less, depending on a lot of details. For heavy computation, Python
will be a couple orders of magnitude slower than both.

IOW, Haskell is slower than C++ but it's in the same ballpark.

Would anyone disagree?

Shawn.

Rafael Gustavo da Cunha Pereira Pinto

2009-11-06 18:08:32 UTC

Permalink

WARNING: THIS IS A VERY LONG REPLY

Luis,

If you are in a hurry skip to the point where I give advices ;-)

----------------------------------------SKIP IF YOU LIKE
TO----------------------------------------------------

I'll ask some more questions!! :-P

- Nothing beats Assembly in speed.

Why isn't everyone programming in Assembly??

- Python gives me the flexibility for fast prototyping, because it easily
connects to C and Java libraries

Why isn't every enterprise service bus (ESB) implementation written in
Python??

- Ruby has Rails!! One of most successful MVC frameworks ever.

Why isn't every system written in Ruby on Rails?

Basically, what I am trying to say is that even though Haskell is not
blazing fast, it can be made fast enough for you, given you use the right
algorithms and optimizations.

For this to happen, you must first understand, the functional paradigm, the
language and how the compiler optimizes your code.

For the rest of your questions:

- Is Haskell able to read (also write to a point) data from databases in a
fast and reliable way? (MySql or PostgreSQL)

Yes, there is a lot of ways. Look at Hackage
(HackageDB)<http://hackage.haskell.org/packages/hackage.html>for
database related packages

- how could I program something like this in Haskell:
.. generate random population
.. for each one of the population:
.. for time period 1 to ten million:
.. evaluate method 1, 2, 3, 4, 5, 6, ....
..evaluate fitness of each one
.. generate new population based on results of previous generation

It seems relatively intuitive for me to program this in an imperative
language. But what about in Haskell?

It looks rather imperative to me, either. You can implement imperative
things on the IO and State monads, but I would really suggest you rethink
the algorithm.

- Is Haskell suitable to process data like this in a fast way
(aproximate to C++?)

How fast, again, depends on the algorithm.
Haskell programs look like a collection of mathematical functions.

One could write a program like a composition of functions:

main= count . words . readfile

and then define functions individually...

- In order for Haskell to be fast, coding is done in a 'natural' way
or with use of special hidden details of the language?

The natural way makes you code right, but you have to think every step you
do, since a badly organized recursion, or the excess of lazyness can make
your program hog on memory and/or go dead slow.

- Although I always liked math, I no longer have the knowledge I used
to have several years ago. Is this important to help program in this
funcional language?

The math needed in your first steps is not hard.

After you started learning Haskell you will most certainly step on Monads.
These might require some abstract algebra, if you want to understand what's
behind the curtains.

Refrain yourself from trying to understand them using math and take a look
on Philip Wadler's "Monads for functional
programming"<http://homepages.inf.ed.ac.uk/wadler/papers/marktoberdorf/baastad.pdf>

- Are there graphical packages available to plot results or is it easy
to connect it to a Python (or C) library?

Hackage is the way to go.

- Is code easily reusable in different future projects? Since it has
no objects... how can it be done?

Like all other languages, it all depends on YOU.

Haskell has no objects, but it has polymorphic functions, which are just as
powerful.

It also has type classes, which are almost like C++ object classes, but
completely different. Type classes can restrict or increase polymorphism,
depending on how you use them.

There are also MANY extensions implemented on GHC that expand the type
system to its limit.

---------------------------------------- END OF SKIP
BLOCK----------------------------------------------------

Is Haskell for you? I can't answer, but I can give some advice:

1) If you are in a hurry, go for what you know (C++, Java, Python,
Assembly...)

2) If you really want to dig in, dig in. Learning Haskell is a wonderful
experience and can dramatically change your way of programming. Think of a
mind altering experience!!!

3) Even though it is hard to write great programs at first, in the end it is
very rewarding. As your programming style improves, you'll see how elegant
algorithms are implemented in functional programming.

I still consider myself a beginner, but I can assure you Haskell is a great
language for functional programming.

Best regards,

Rafael Gustavo da Cunha Pereira Pinto

Daniel Fischer

2009-11-06 18:44:39 UTC

Permalink

Post by Shawn Willden

Post by Cory Knapp
Idiomatic Haskell won't be as fast as idiomatic C++, but it will blow
Python away.

Based on the little bit of stuff I've done, I think I'd characterize it
this way: C++ will be maybe twice as fast as Haskell. Maybe a little
more, maybe a little less, depending on a lot of details.

Difficult waters. Depending on a lot of things, (rather idiomatic) Haskell is between a
little faster (that's rare, however) and *much* slower than C (or C++, but I can only
strongly advise against using that; pick C, C# or Java if you want to use a fast, halfway
sensible imperative language. C++ consistently chose what I find the worst parts of both
worlds - YMMV).

Two things should be noted:
- it's easy, especially if one isn't yet experienced, to write Haskell code that is much
slower than C because of the wrong choice of data structures or not making things strict
in the right places.
- but if that happens, you have a good chance of quickly getting help to get up to speed
here or on #haskell.

That said, a factor of 2-3 relative to C is usually achievable without low-level tweaking,
sometimes better, sometimes worse.

Post by Shawn Willden
For heavy
computation, Python will be a couple orders of magnitude slower than both.

For not very large values of 'couple': don't expect a factor of more than 100 - that's
extremely rare.

Post by Shawn Willden
IOW, Haskell is slower than C++ but it's in the same ballpark.

Yes, as a general rule, that's it.

Post by Shawn Willden
Would anyone disagree?

Bulat?

Post by Shawn Willden
Shawn.

Gaius Hammond

2009-11-06 20:48:53 UTC

Permalink

Post by Shawn Willden
Based on the little bit of stuff I've done, I think I'd characterize it this
way: C++ will be maybe twice as fast as Haskell. Maybe a little more, maybe
a little less, depending on a lot of details. For heavy
computation, Python
will be a couple orders of magnitude slower than both.

To be fair, Python offloads its heavy lifting to C libraries - NumPy
and SciPy run at very close to full C speed on large datasets. This is
also how Matlab works. Unladen Swallow is an upcoming JIT compiler for
Python.

Where Haskell shines for computation is when you can leverage lazy
evaluation.

Cheers,

G

Felipe Lessa

2009-11-06 21:58:30 UTC

Permalink

Post by Gaius Hammond
To be fair, Python offloads its heavy lifting to C libraries - NumPy
and SciPy run at very close to full C speed on large datasets. This is
also how Matlab works. Unladen Swallow is an upcoming JIT compiler for
Python.
Where Haskell shines for computation is when you can leverage lazy
evaluation.

*If* you can offload most of your work to SciPy. Depending on what you
do this is at least difficult. I don't know how much of a neural
network can be represented as big fat matrix :).

--
Felipe.

geremy condra

2009-11-06 23:37:30 UTC

Permalink

Post by Felipe Lessa

*If* you can offload most of your work to SciPy. Depending on what you
do this is at least difficult. I don't know how much of a neural
network can be represented as big fat matrix :).
--
Felipe.

Neural networking can be easily handled in both Python and C.
I've used my own Graphine graph theory package for it and
found that it was quite easy and reasonably fast, and certainly
LEDA is a very high performance, pretty easy-to-use library.
There's no need to coerce ANNs into a matrix form if you don't
want to.

Geremy Condra

Jon Harrop

2009-11-21 04:36:42 UTC

Permalink

Post by Shawn Willden

Post by Cory Knapp
Idiomatic Haskell won't be as fast as idiomatic C++, but it will blow
Python away.

Based on the little bit of stuff I've done, I think I'd characterize it
this way: C++ will be maybe twice as fast as Haskell. Maybe a little
more, maybe a little less, depending on a lot of details. For heavy
computation, Python will be a couple orders of magnitude slower than both.
IOW, Haskell is slower than C++ but it's in the same ballpark.
Would anyone disagree?

The long-standing bug in GHC's garbage collector that makes writing a single
boxed value into an array O(n) instead of O(1), because the GC dirties and
retraverses the entire array, makes it impossible to write efficient Haskell
code to solve many basic problems.

Here is a simple dictionary benchmark where Python is 4x faster than Haskell
because this bug means it is impossible to implement a competitively-
performant dictionary:

http://flyingfrogblog.blogspot.com/2009/04/f-vs-ocaml-vs-haskell-hash-table.html

Haskell's celebrated idiomatic quicksort is actually 6,000x slower than a
traditional implementation on this machine and consumes asymptotically more
memory (making it useless for quite mundane problems):

http://www.haskell.org/haskellwiki/Introduction#Quicksort_in_Haskell

Although people have created array-based quicksorts in Haskell the same perf
bug in the GC means that a generic quicksort in Haskell would be
asymptotically slower if it were given an array of boxed values.

As an aside, purity complicates the creation of a parallel generic quicksort
because it is necessary to mutate different parts of the same array in
parallel. AFAICT, writing an efficient parallel generic quicksort is an
unsolved problem in Haskell.

Suffice to say, I cannot agree that Haskell is in the same ballpark as C++ in
terms of performance. :-)

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Ben Lippmeier

2009-11-21 06:38:09 UTC

Permalink

Post by Jon Harrop
The long-standing bug in GHC's garbage collector that makes writing a single
boxed value into an array O(n) instead of O(1), because the GC dirties and
retraverses the entire array, makes it impossible to write efficient Haskell
code to solve many basic problems.

Are you talking about http://hackage.haskell.org/trac/ghc/ticket/650 ?

The way I read that ticket is that writing a boxed value to a mutable array is still O(1), but the garbage collector traverses the whole array during scanning. That could certainly slow down GC cycles, but how does it make array update O(n)?

(update of the standard Haskell "pure" arrays being a separate issue, of course).

Ben.

Nicolas Pouillard

2009-11-21 10:13:22 UTC

Permalink

Post by Ben Lippmeier

Are you talking about http://hackage.haskell.org/trac/ghc/ticket/650 ?
The way I read that ticket is that writing a boxed value to a mutable array is still O(1), but the garbage collector traverses the whole array during
scanning. That could certainly slow down GC cycles, but how does it make array update O(n)?

He means in worst case. Writing once, will cause O(n) during the next GC. Of
course if you do a lot of updates between two GCs their is no extra penalty.
So if you make 'k' updates between 2 GCs it costs you k*O(1)+O(n) which is
still O(n) but practically nicer than k*O(n).

--
Nicolas Pouillard
http://nicolaspouillard.fr

Ben Lippmeier

2009-11-21 11:56:09 UTC

Permalink

Post by Nicolas Pouillard

Post by Ben Lippmeier

Are you talking about http://hackage.haskell.org/trac/ghc/ticket/650 ?
The way I read that ticket is that writing a boxed value to a mutable array is still O(1), but the garbage collector traverses the whole array during
scanning. That could certainly slow down GC cycles, but how does it make array update O(n)?

Hmm. I'd be careful about conflating algorithmic complexity with memory management issues. By the above reasoning, if I were to run any program using arrays on a system with a two space garbage collector (which copies all live objects during each GC) I could say the worst case algorithmic complexity was O(n). That doesn't sound right.

I could take this further and say that in a virtual memory system, there is a chance that the whole heap gets copied to the disk and back between each array update. I might again say this has O(n) complexity, but it wouldn't be very helpful...

Ben.

Nicolas Pouillard

2009-11-21 14:10:05 UTC

Permalink

Post by Ben Lippmeier

Post by Nicolas Pouillard

Post by Ben Lippmeier

Are you talking about http://hackage.haskell.org/trac/ghc/ticket/650 ?
The way I read that ticket is that writing a boxed value to a mutable array is still O(1), but the garbage collector traverses the whole array during
scanning. That could certainly slow down GC cycles, but how does it make array update O(n)?

Hmm. I'd be careful about conflating algorithmic complexity with memory management issues. By the above reasoning, if I were to run any program using
arrays on a system with a two space garbage collector (which copies all live objects during each GC) I could say the worst case algorithmic complexity was
O(n). That doesn't sound right.
I could take this further and say that in a virtual memory system, there is a chance that the whole heap gets copied to the disk and back between each
array update. I might again say this has O(n) complexity, but it wouldn't be very helpful...

Your algorithm is O(1) but the current run-time system can take a time closer to O(n). This could be the case with a two spaces GC or with swapping.

--
Nicolas Pouillard
http://nicolaspouillard.fr

Jon Harrop

2009-11-21 18:02:33 UTC

Permalink

Post by Ben Lippmeier
Hmm. I'd be careful about conflating algorithmic complexity with memory
management issues.

No need. Just look at how badly the performance scales for Haskell vs other
languages. For example, inserting 1-16 million floating point key/values into
a hash table:

Haskell OCaml F#
1M: 3.198s 1.0x 1.129s 1.0x 0.080s 1.0x
2M: 8.498s 2.7x 2.313s 2.0x 0.138s 1.7x
4M: 25.697s 8.0x 4.567s 4.0x 0.281s 3.5x
8M: 97.994s 30.6x 10.450s 9.3x 0.637s 8.0x
16M: 388.080s 121.4x 23.261s 20.6x 1.338s 16.7x

Note that Haskell is 290x slower than F# on that last test.

In practice, you would turn to a purely functional dictionary in Haskell based
upon balanced binary trees in order to work around this long-standing bug in
the GC but those trees incur O(log n) indirections and typically run orders
of magnitude slower than a decent hash table.

Suffice to say, Haskell is nowhere near being in the ballpark of C++'s
performance for basic functionality like dictionaries and sorting.

Post by Ben Lippmeier
By the above reasoning, if I were to run any program
using arrays on a system with a two space garbage collector (which copies
all live objects during each GC) I could say the worst case algorithmic
complexity was O(n). That doesn't sound right.

Can you write a program that demonstrates this effect as I did?

Post by Ben Lippmeier
I could take this further and say that in a virtual memory system, there is
a chance that the whole heap gets copied to the disk and back between each
array update.

Can you write a program that demonstrates this effect as I did?

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Ben Lippmeier

2009-11-21 23:52:51 UTC

Permalink

Post by Jon Harrop

Can you write a program that demonstrates this effect as I did?

I believe your numbers, but I don't have a F# install to test against ATM.

Post by Jon Harrop
Suffice to say, Haskell is nowhere near being in the ballpark of C++'s
performance for basic functionality like dictionaries and sorting.

Sorry, I wasn't trying to start a pissing match. I just wanted to know what the issue was so I could avoid it in my own programs.

Thanks!

Ben.

Bernhard Lehnert

2009-11-06 16:58:23 UTC

Permalink

Hi!

Post by Luis P. Mendes
- Is Haskell able to read (also write to a point) data from databases
in a fast and reliable way? (MySql or PostgreSQL)

You might want to check out these links:
http://software.complete.org/software/projects/show/hdbc
http://software.complete.org/static/hdbc-odbc/doc//HDBC-odbc/Database-HDBC-ODBC.html
http://sites.google.com/site/haskell/notes/connecting-to-mysql-with-haskell
http://www.volker-wysk.de/mysql-hs/

Post by Luis P. Mendes
.. generate random population
.. evaluate method 1, 2, 3, 4, 5, 6, ....
..evaluate fitness of each one
.. generate new population based on results of previous generation

You will have to change your thinking a lot. Forget about "for each one
of the population". Think "Map a function to the list which contains the
population". When I started I thought I knew map from Python - and in
fact I did underestimate "map" all the time in Python.
For time 1 to ten million evaluate a method? No - map "evaluate fitness"
to the list [1..10000000] (Don't worry about big numbers - it's all
lazily evaluated.
"Generate new population based on previous generation"? It's just
recursion and where will you find better recursion than in a functional
language?

Post by Luis P. Mendes
- Are there graphical packages available to plot results or is it easy
to connect it to a Python (or C) library?

There are some graphical packages, but if everything else fails: Do fast
computing in Haskell and plot the results using Python.

Post by Luis P. Mendes
- Is code easily reusable in different future projects? Since it has
no objects... how can it be done?

You do not reuse objects, you reuse functions - that's why they call it
functional and yes

Post by Luis P. Mendes
Sorry for all these questions, but I really need to know about this
and that's why I want to read answers from knowledgeable people.

Sorry, I'm a beginner myself and would not actually call myself
"knowledgeable". However, the learning curve is steep and lots of things
that seem to be very complicated are soon very logic!

You'll need some time to get the new ideas. If you got that, it's really
worthwhile!

Greets,
Bernhard

Nathan M. Holden

2009-11-21 18:57:21 UTC

Permalink

I'm not a Haskell expert-- in fact, I'm a beginner, and that's why I'm here,
but I read your blog post on the subject of this hash table program, and it
seems to me from reading the comments in reply that you're just here trolling,
because you're using an algorithm that is fundamentally not purely functional,
and of course that's going to be slower, just like asking Joel Zumaya to pitch
left handed.

If you were being honest about your complaint, you'd make an apples-to-apples
comparison, and a number of commenters on your blog have proposed
implementations that perform much better than your example.

Sorry if I'm just being a jerk,
Nathan M. Holden,
Haskell Beginner

Message: 4
Date: Sat, 21 Nov 2009 18:02:33 +0000
Subject: Re: [Haskell-beginners] Re: Is Haskell for me?
Content-Type: text/plain; charset="iso-8859-1"

Post by Ben Lippmeier
Hmm. I'd be careful about conflating algorithmic complexity with memory
management issues.

No need. Just look at how badly the performance scales for Haskell vs other
languages. For example, inserting 1-16 million floating point key/values
Haskell OCaml F#
1M: 3.198s 1.0x 1.129s 1.0x 0.080s 1.0x
2M: 8.498s 2.7x 2.313s 2.0x 0.138s 1.7x
4M: 25.697s 8.0x 4.567s 4.0x 0.281s 3.5x
8M: 97.994s 30.6x 10.450s 9.3x 0.637s 8.0x
16M: 388.080s 121.4x 23.261s 20.6x 1.338s 16.7x
Note that Haskell is 290x slower than F# on that last test.
In practice, you would turn to a purely functional dictionary in Haskell
based upon balanced binary trees in order to work around this
long-standing bug in the GC but those trees incur O(log n) indirections
and typically run orders of magnitude slower than a decent hash table.
Suffice to say, Haskell is nowhere near being in the ballpark of C++'s
performance for basic functionality like dictionaries and sorting.

Can you write a program that demonstrates this effect as I did?

Post by Ben Lippmeier
I could take this further and say that in a virtual memory system, there
is a chance that the whole heap gets copied to the disk and back between
each array update.

Can you write a program that demonstrates this effect as I did?

Jon Harrop

2009-11-21 20:52:30 UTC

Permalink

Post by Nathan M. Holden
I'm not a Haskell expert-- in fact, I'm a beginner, and that's why I'm
here, but I read your blog post on the subject of this hash table program,
and it seems to me from reading the comments in reply that you're just here
trolling, because you're using an algorithm that is fundamentally not
purely functional, and of course that's going to be slower, just like
asking Joel Zumaya to pitch left handed.
If you were being honest about your complaint, you'd make an
apples-to-apples comparison, and a number of commenters on your blog have
proposed implementations that perform much better than your example.
Sorry if I'm just being a jerk,

Not at all, that is a perfectly reasonable concern but I did already try to

Post by Nathan M. Holden

Post by Jon Harrop
In practice, you would turn to a purely functional dictionary in Haskell
based upon balanced binary trees in order to work around this
long-standing bug in the GC but those trees incur O(log n) indirections
and typically run orders of magnitude slower than a decent hash table.

For example, the following Haskell program builds a purely functional
Data.Map:

module Main where

import Prelude hiding (lookup)
import Data.Map (empty, insert, lookup, size)

n = 1000000

build m 0 = m
build m n = build (insert x (1.0 / x) m) (n-1)
where x = fromIntegral n :: Double

main = do
let
m = build empty n
(Just v) = lookup 100 m

print $ size m
print v

Running this program with different "n" gives:

Data.Map
1M: 2.797s 1.0x
2M: 6.090s 2.2x
4M: 14.226s 5.1x
8M: 28.449s 10.2x
16M: 83.171s 29.7x

This is several times faster than Haskell's Data.Hashtable (because of the
long-standing bug in their GC that I described) and is scaling better.
However, if you compare with the timings I gave before:

Data.Hashtable OCaml F#
1M: 3.198s 1.0x 1.129s 1.0x 0.080s 1.0x
2M: 8.498s 2.7x 2.313s 2.0x 0.138s 1.7x
4M: 25.697s 8.0x 4.567s 4.0x 0.281s 3.5x
8M: 97.994s 30.6x 10.450s 9.3x 0.637s 8.0x
16M: 388.080s 121.4x 23.261s 20.6x 1.338s 16.7x

you'll see that the absolute performance of this idiomatic Haskell solution is
still absolutely awful: consistently about 50x slower than the F#.

This is also true in the context of sorting: Haskell's standard library
routines for sorting are orders of magnitude slower than those found in most
other compiled languages.

Suffice to say, idiomatic Haskell is also nowhere near being in the same
ballpark as C++ with respect to performance. Realistically, with enough
expertise you should be able to optimize most of your Haskell programs to
beat Python's performance but there are some important cases where you will
not even be able to do that.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e