2012-08-29

GROUP BY with MongoDB (and Scala and Morphia)

This is an example of how to run a query similar to GROUP BY from the SQL world, but in MongoDB. MongoDB doesn't have GROUP BY, but it has a very similar group command.

In my example I want to return the number of songs per artist, therefore I'm grouping by artist name.

db.MongoDbSong.group(
    {key: {artistName: true},
    initial: {totalSongs: 0},
    reduce: function(obj, prev) { prev.totalSongs++;}
})

This example is from my MongoDB + Scala + Gradle + Morphia app. It turns out that Morphia does not help with that at the moment, so we have to run the query raw and then process the results.

class SongPerArtistMining @Inject() (private val persistenceProvider: MorphiaMongoDbPersistenceProvider) {

 private val query = "db.MongoDbSong.group("+
    "{key: {artistName: true},"+
    "initial: {totalSongs: 0},"+
    "reduce: function(obj, prev) { prev.totalSongs++;}"+
    "})";
 
 private val formatter = NumberFormat.getInstance(Locale.ENGLISH)


 def mine(maxResults: Int): Option[List[(String, Int)]] = {
   val mongo = persistenceProvider.getMongo
   val result = mongo.getDB("audiolicious").eval(query)
   result match {
     case list: BasicDBList => return Some(processAndRetrieveResults(list, maxResults))
     case _ => println("Error while obtaining stats. Result of unknown type [%s].".format(result))
         return None;
   }
 }

Essentially, this does it:

val result = mongo.getDB("audiolicious").eval(query)

Processing results

And here's how to process results.

private def processAndRetrieveResults(list: BasicDBList, maxResults: Int): List[(String, Int)] = {
   return processResults(list).sortWith(compareSecondValueInteger).slice(0, maxResults)
 }

 def compareSecondValueInteger(e1: (String, Int), e2: (String, Int)) = e1._2 > e2._2
 
 private def processResults(list: BasicDBList): List[(String, Int)] = {
   var results : List[(String, Int)] = List()
   for (i <- 0 until list.size()) {
     val item = list.get(i)
     item match {
       case dbObject: BasicDBObject => results ::= processResult(dbObject)
       case _ => println("Error while obtaining stats. Result item of unknown type [%s].".format(item.getClass()))
     }
   }
   return results
 }
 
 private def processResult(dbObject: BasicDBObject): (String, Int) = {
   val artistName = dbObject.get("artistName").toString()
   val totalSongs = parseDouble(dbObject.get("totalSongs").toString(), formatter).intValue()
   return (artistName, totalSongs)
 }

 // http://stackoverflow.com/a/9542323
 def parseDouble(s: String, nf: NumberFormat) = {
   val pp = new ParsePosition(0)
   val d = nf.parse(s, pp)
   if (pp.getErrorIndex == -1) d.doubleValue else 0
 }

1 comment:

  1. Using GROUP BY in MongoDB with Scala and Morphia provides a powerful way to aggregate data efficiently. The integration of Morphia, a Java-based ORM for MongoDB, allows seamless mapping of Scala objects to database collections. It simplifies aggregation queries, making it easy to perform complex group operations and analysis.
    divorce process new york

    ReplyDelete