This page is no longer maintained — Please continue to the home page at www.scala-lang.org

was Foreach[+A]: [OT] a passion of mine

7 replies
David Pollak
Joined: 2008-12-16,
User offline. Last seen 42 years 45 weeks ago.


On Fri, Mar 26, 2010 at 7:21 AM, Paul Phillips <paulp@improving.org> wrote:
Leading with a quote from the man himself[1]:

"Two aspects of software systems tend to accelerate bit rot: lack of
explicit design and code duplication."

Both from a development perspective and a legal perspective, I've always wanted a tool that would analyze parse trees to find code distances.  It would be very interesting to have a tool that finds the code duplication in a code base and (given the awesomeness of Git) an analysis of how the code duplication evolved in the code base.

Is there any research on this kind of thing?  Would it make an interesting GSoC project?
 

Since truer words were never spoken, last night I decided to follow my
inclinations and find out if there was some obvious reason we couldn't
do what I wanted to the collections.  And one mostly red diff later I
never found the reason.  We create this interface:

trait Foreach[+A] {
 def foreach[U](f: A => U): Unit
 // there are a few more convenient methods here but only foreach is mandatory
}

Both Iterator and Traversable implement it.  Now we can perform this
transformation many times:

-  override def ++[B >: A, That](that: Traversable[B])(implicit bf: CanBuildFrom[Traversable[A], B, That]): That
-  override def ++[B >: A, That](that: Iterator[B])(implicit bf: CanBuildFrom[Traversable[A], B, That]): That
+  override def ++[B >: A, That](xs: Foreach[B])(implicit bf: CanBuildFrom[Traversable[A], B, That]): That

And that's not all! (NOW how much would you pay?) Here is the signature
of scala.util.Randomm.shuffle:

 def shuffle[T, CC[X] <: Traversable[X]](coll: CC[T])(implicit bf: CanBuildFrom[CC[T], T, CC[T]]): CC[T]

I can now change this to:

 def shuffle[T, CC[X] <: Foreach[X]](coll: CC[T])(implicit bf: CanBuildFrom[CC[T], T, CC[T]]): CC[T]

Don't even have to touch the implementation of the method[2], and now
this works:

scala> scala.util.Random.shuffle(1 to 5 iterator)
res0: Iterator[Int] = non-empty iterator

scala> res0 foreach println
4
5
1
2
3

So to me this seems like such a slam dunk that I'm already going into a
crouch to fend off the blow I don't see coming.  Why shouldn't we do
this? The whole branch can be seen here:

 http://github.com/paulp/scala/tree/foreach

There are probably more spots I can apply this, I was just going enough
for proof of concept.  The diff is here:

 http://github.com/paulp/scala/commit/072934073c0675d02111cb3f1f61e5be2368ceee

All tests pass, this thing is ready to fire.

[1] http://lampwww.epfl.ch/~odersky/papers/fsttcs2009.pdf
[2] ...but I did have to give Iterator a CanBuildFrom.

--
Paul Phillips      | A Sunday school is a prison in which children do
Vivid              | penance for the evil conscience of their parents.
Empiricist         |     -- H. L. Mencken
pull his pi pal!   |----------* http://www.improving.org/paulp/ *----------



--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Surf the harmonics
Miguel Garcia
Joined: 2009-06-10,
User offline. Last seen 42 years 45 weeks ago.
queryable source code (even better ASTs) repositories
   A search for "code clones" in scholar.google.com reveals a load of recent papers with prototypes.   Same happens when googling for "mining source code repositories". There's in fact a dedicated conference for just that topic:
http://msr.uwaterloo.ca/msr2010/index.html   Coming back to Scala, the only way I know to explore ASTs is from inside a compiler plugin (there's no external representation for ASTs, "external" as in a queryable database, as e.g. this one for Java
http://semmle.com/semmlecode/documentation/ql/ )   I guess such infrastructure would allow not just clone detection but additionally other use cases.     Miguel http://www.sts.tu-harburg.de/people/mi.garcia    
----- Original Message ----- From: David Pollak
Both from a development perspective and a legal perspective, I've always wanted a tool that would analyze parse trees to find code distances.  It would be very interesting to have a tool that finds the code duplication in a code base and (given the awesomeness of Git) an analysis of how the code duplication evolved in the code base.

Is there any research on this kind of thing?  Would it make an interesting GSoC project?
Tiark Rompf
Joined: 2009-02-18,
User offline. Last seen 42 years 45 weeks ago.
Re: was Foreach[+A]: [OT] a passion of mine
On Mar 26, 2010, at 4:33 PM, David Pollak wrote:

On Fri, Mar 26, 2010 at 7:21 AM, Paul Phillips <paulp@improving.org> wrote:
Leading with a quote from the man himself[1]:

"Two aspects of software systems tend to accelerate bit rot: lack of
explicit design and code duplication."

Both from a development perspective and a legal perspective, I've always wanted a tool that would analyze parse trees to find code distances.  It would be very interesting to have a tool that finds the code duplication in a code base and (given the awesomeness of Git) an analysis of how the code duplication evolved in the code base.

Is there any research on this kind of thing?  Would it make an interesting GSoC project?

At least for finding duplicates, there's quite a bit of research out there. I recall that one of my undergrad professors was very much into this topic (Rainer Koschke, http://www.informatik.uni-bremen.de/~koschke/).
Here's a collection of papers that might give you some orientation:
http://drops.dagstuhl.de/opus/volltexte/2007/972/pdf/06301_abstracts_collection.972.pdf
Cheers,- Tiark
David Pollak
Joined: 2008-12-16,
User offline. Last seen 42 years 45 weeks ago.
Re: queryable source code (even better ASTs) repositories


On Fri, Mar 26, 2010 at 8:54 AM, Miguel Garcia <miguel.garcia@tuhh.de> wrote:
  A search for "code clones" in scholar.google.com reveals a load of recent papers with prototypes.   Same happens when googling for "mining source code repositories". There's in fact a dedicated conference for just that topic:
http://msr.uwaterloo.ca/msr2010/index.html   Coming back to Scala, the only way I know to explore ASTs is from inside a compiler plugin (there's no external representation for ASTs, "external" as in a queryable database, as e.g. this one for Java
http://semmle.com/semmlecode/documentation/ql/ )   I guess such infrastructure would allow not just clone detection but additionally other use cases.

Awesome!
 
    Miguel http://www.sts.tu-harburg.de/people/mi.garcia    
----- Original Message ----- From: David Pollak
Both from a development perspective and a legal perspective, I've always wanted a tool that would analyze parse trees to find code distances.  It would be very interesting to have a tool that finds the code duplication in a code base and (given the awesomeness of Git) an analysis of how the code duplication evolved in the code base.

Is there any research on this kind of thing?  Would it make an interesting GSoC project?



--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Surf the harmonics
David Pollak
Joined: 2008-12-16,
User offline. Last seen 42 years 45 weeks ago.
Re: was Foreach[+A]: [OT] a passion of mine


On Fri, Mar 26, 2010 at 9:16 AM, Tiark Rompf <tiark.rompf@epfl.ch> wrote:
On Mar 26, 2010, at 4:33 PM, David Pollak wrote:

On Fri, Mar 26, 2010 at 7:21 AM, Paul Phillips <paulp@improving.org> wrote:
Leading with a quote from the man himself[1]:

"Two aspects of software systems tend to accelerate bit rot: lack of
explicit design and code duplication."

Both from a development perspective and a legal perspective, I've always wanted a tool that would analyze parse trees to find code distances.  It would be very interesting to have a tool that finds the code duplication in a code base and (given the awesomeness of Git) an analysis of how the code duplication evolved in the code base.

Is there any research on this kind of thing?  Would it make an interesting GSoC project?

At least for finding duplicates, there's quite a bit of research out there. I recall that one of my undergrad professors was very much into this topic (Rainer Koschke, http://www.informatik.uni-bremen.de/~koschke/).
Here's a collection of papers that might give you some orientation:
http://drops.dagstuhl.de/opus/volltexte/2007/972/pdf/06301_abstracts_collection.972.pdf

Cool.  I just added it to my reading list!
 

Cheers,- Tiark



--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Surf the harmonics
Grand, Mark D.
Joined: 2009-12-24,
User offline. Last seen 42 years 45 weeks ago.
RE: was Foreach[+A]: [OT] a passion of mine

I used to work for a consulting arm of Hewlet-Packard that did migrations of a company’s code from platform to platform or language to language.  One of the tools that we used to help estimate the effort needed for these projects was a tool to find code duplication.   The idea was that the total effort would be based on number of lines of code after subtracting duplicates.

 

There was a related tool that provided a visualization of duplicate code by representing files as nodes in a graph and common pieces of code as lines connecting the nodes.  A view was created the clustered nodes by how strongly they were connected, using number of common lines of code as a weighting factor.  The result was generally interesting and allowed us to be aware of relationships between pieces of code that our customers were sometimes unaware of.

 

From: David Pollak [mailto:feeder.of.the.bears@gmail.com]
Sent: Friday, March 26, 2010 11:33 AM
To: Paul Phillips
Cc: martin odersky; scala-internals@listes.epfl.ch
Subject: [scala-internals] was Foreach[+A]: [OT] a passion of mine

 

 

On Fri, Mar 26, 2010 at 7:21 AM, Paul Phillips <paulp@improving.org> wrote:

Leading with a quote from the man himself[1]:

"Two aspects of software systems tend to accelerate bit rot: lack of
explicit design and code duplication."


Both from a development perspective and a legal perspective, I've always wanted a tool that would analyze parse trees to find code distances.  It would be very interesting to have a tool that finds the code duplication in a code base and (given the awesomeness of Git) an analysis of how the code duplication evolved in the code base.

Is there any research on this kind of thing?  Would it make an interesting GSoC project?
 


Since truer words were never spoken, last night I decided to follow my
inclinations and find out if there was some obvious reason we couldn't
do what I wanted to the collections.  And one mostly red diff later I
never found the reason.  We create this interface:

trait Foreach[+A] {
 def foreach[U](f: A => U): Unit
 // there are a few more convenient methods here but only foreach is mandatory
}

Both Iterator and Traversable implement it.  Now we can perform this
transformation many times:

-  override def ++[B >: A, That](that: Traversable[B])(implicit bf: CanBuildFrom[Traversable[A], B, That]): That
-  override def ++[B >: A, That](that: Iterator[B])(implicit bf: CanBuildFrom[Traversable[A], B, That]): That
+  override def ++[B >: A, That](xs: Foreach[B])(implicit bf: CanBuildFrom[Traversable[A], B, That]): That

And that's not all! (NOW how much would you pay?) Here is the signature
of scala.util.Randomm.shuffle:

 def shuffle[T, CC[X] <: Traversable[X]](coll: CC[T])(implicit bf: CanBuildFrom[CC[T], T, CC[T]]): CC[T]

I can now change this to:

 def shuffle[T, CC[X] <: Foreach[X]](coll: CC[T])(implicit bf: CanBuildFrom[CC[T], T, CC[T]]): CC[T]

Don't even have to touch the implementation of the method[2], and now
this works:

scala> scala.util.Random.shuffle(1 to 5 iterator)
res0: Iterator[Int] = non-empty iterator

scala> res0 foreach println
4
5
1
2
3

So to me this seems like such a slam dunk that I'm already going into a
crouch to fend off the blow I don't see coming.  Why shouldn't we do
this? The whole branch can be seen here:

 http://github.com/paulp/scala/tree/foreach

There are probably more spots I can apply this, I was just going enough
for proof of concept.  The diff is here:

 http://github.com/paulp/scala/commit/072934073c0675d02111cb3f1f61e5be2368ceee

All tests pass, this thing is ready to fire.

[1] http://lampwww.epfl.ch/~odersky/papers/fsttcs2009.pdf
[2] ...but I did have to give Iterator a CanBuildFrom.

--
Paul Phillips      | A Sunday school is a prison in which children do
Vivid              | penance for the evil conscience of their parents.
Empiricist         |     -- H. L. Mencken
pull his pi pal!   |----------* http://www.improving.org/paulp/ *----------




Miguel Garcia
Joined: 2009-06-10,
User offline. Last seen 42 years 45 weeks ago.
Re: queryable source code (even better ASTs) repositories

I refined a bit the google search and found the

Workshop Query Technologies and Applications for Program Comprehension
as part of the
International Conference on Program Comprehension
http://www.program-comprehension.org/

There's no standard "queryable AST repository" and the papers showcase
different approaches. For example,

Tiago L. Alves, Peter Rademaker, and Jurriaan Hage,
Comparative Study of Code Query Technologies,
Draft December 2009,
(For the full queries used in the paper please contact the authors).
http://wiki.di.uminho.pt/twiki/pub/Personal/Tiago/Publications/Alves09b-...

Miguel
http://www.sts.tu-harburg.de/people/mi.garcia

Chris Twiner
Joined: 2008-12-17,
User offline. Last seen 42 years 45 weeks ago.
Re: was Foreach[+A]: [OT] a passion of mine

http://pmd.sourceforge.net/cpd.html ?

Personally I find it does the duplicate trick.  Might be a good starting point for further tooling.

On Mar 26, 2010 4:33 PM, "David Pollak" <feeder.of.the.bears@gmail.com> wrote:



On Fri, Mar 26, 2010 at 7:21 AM, Paul Phillips <paulp@improving.org> wrote:
Leading with a quote from the man himself[1]:

"Two aspects of software systems tend to accelerate bit rot: lack of
explicit design and code duplication."

Both from a development perspective and a legal perspective, I've always wanted a tool that would analyze parse trees to find code distances.  It would be very interesting to have a tool that finds the code duplication in a code base and (given the awesomeness of Git) an analysis of how the code duplication evolved in the code base.

Is there any research on this kind of thing?  Would it make an interesting GSoC project?
 

Since truer words were never spoken, last night I decided to follow my
inclinations and find out if there was some obvious reason we couldn't
do what I wanted to the collections.  And one mostly red diff later I
never found the reason.  We create this interface:

trait Foreach[+A] {
 def foreach[U](f: A => U): Unit
 // there are a few more convenient methods here but only foreach is mandatory
}

Both Iterator and Traversable implement it.  Now we can perform this
transformation many times:

-  override def ++[B >: A, That](that: Traversable[B])(implicit bf: CanBuildFrom[Traversable[A], B, That]): That
-  override def ++[B >: A, That](that: Iterator[B])(implicit bf: CanBuildFrom[Traversable[A], B, That]): That
+  override def ++[B >: A, That](xs: Foreach[B])(implicit bf: CanBuildFrom[Traversable[A], B, That]): That

And that's not all! (NOW how much would you pay?) Here is the signature
of scala.util.Randomm.shuffle:

 def shuffle[T, CC[X] <: Traversable[X]](coll: CC[T])(implicit bf: CanBuildFrom[CC[T], T, CC[T]]): CC[T]

I can now change this to:

 def shuffle[T, CC[X] <: Foreach[X]](coll: CC[T])(implicit bf: CanBuildFrom[CC[T], T, CC[T]]): CC[T]

Don't even have to touch the implementation of the method[2], and now
this works:

scala> scala.util.Random.shuffle(1 to 5 iterator)
res0: Iterator[Int] = non-empty iterator

scala> res0 foreach println
4
5
1
2
3

So to me this seems like such a slam dunk that I'm already going into a
crouch to fend off the blow I don't see coming.  Why shouldn't we do
this? The whole branch can be seen here:

 http://github.com/paulp/scala/tree/foreach

There are probably more spots I can apply this, I was just going enough
for proof of concept.  The diff is here:

 http://github.com/paulp/scala/commit/072934073c0675d02111cb3f1f61e5be2368ceee

All tests pass, this thing is ready to fire.

[1] http://lampwww.epfl.ch/~odersky/papers/fsttcs2009.pdf
[2] ...but I did have to give Iterator a CanBuildFrom.

--
Paul Phillips      | A Sunday school is a prison in which children do
Vivid              | penance for the evil conscience of their parents.
Empiricist         |     -- H. L. Mencken
pull his pi pal!   |----------* http://www.improving.org/paulp/ *----------



--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Surf the harmonics

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland