Many-to-many relationships with PetaPoco

February 11th, 2012 / Comments

An example of a many-to-many relationship in a database are blog post tags. A blog post can have many tags and a tag can have many blog posts. So how do you do this in PetaPoco? I’ve added tag support to my PetaPoco A Simple Web App project on GitHub so I’ll explain what I did.

To start with you need to create a link table in your database which will stitch the article and tag tables together. So I created the articleTag table to contain lists of paired article ids and tag ids. This is all the info we need to persist this relationship in the database.

So if we have an article with 5 tags you’ll end up with 5 records in articleTag like this :

Creating these records is a one-liner if you have the article and tag ids :

	_database.Execute(
	"insert into articleTag values(@0, @1)",
		article.Id, tagId);
	

There is a bit more complexity when it comes to reading article pocos and their tags. There are two ways of doing it :

The naughty+1 way

You could do this :

  1. Load the articles without the tags.
  2. Loop through each article and retrieve its tags
  3. Assign the list of tags to each article in the loop.

This is the N+1 problem and it doesn’t scale well. The more articles you’re loading the more database round trips you’ll make and the slower your app will run. Nobody should recommend this approach, but let’s see it anyway :

	public Article RetrieveById( int articleId)
{
    var article = _database.Query(
        "select * from article " +
        "join author on author.id = article.author_id " +
        "where article.id=@0 " +
        "order by article.date desc", articleId)
            .Single
<article>();
 
 var tags = _database.Fetch(
 "select * from tag " +
 "join articleTag on articleTag.tagId = tag.id " +
 "and articleTag.articleId=@0", articleId);
 
 if( tags != null) article.Tags = tags;
 
 return article;
}
	

This example is just for one article. Imagine it if we were pulling back 50 articles. That would be 51 database round trips when all we really need is 1.

The right way

	public Article RetrieveById( int articleId)
{
    return _database.Fetch(
        new ArticleRelator().Map,
        "select * from article " +
        "join author on author.id = article.author_id " +
        "left outer join articleTag on " +
                "articleTag.articleId = article.id " +
        "left outer join tag on tag.id=articleTag.tagId " +
        "where article.id=@0 ", articleId).Single();
}
	

Here I am pulling back a dataset that includes the article record, the author record and the tag records. You can tell this is a many-to-many relationship by the double joining first on the articleTag then on tag itself. The results that come back to PetaPoco look like this :

As you can see there is a fair bit of duplication here and this is a trade off you will want to think carefully about. The trade off is between the number of database round trips (number of queries) and result set efficiency (network traffic from the sql server to the web server (or service layer server)). It is best to have as few database round trips as possible. But on the other hand it is better to have lean result sets too. I’m sticking with the right way.

If I was writing a real blogging app I would think long and hard about a single joined query like this because the body content, which could be thousands of bytes, would be returned as many times as there are tags against the blog post. I would almost certainly use a stored procedure to return multiple result sets so there is only one database round trip. However typical non-blogging datasets won’t contain such unlimited text data eg; orders and order lines. So there’s no problem.

Document databases suit this type of arrangement. The tags would be embedded in the article document and would still be indexable.

Stitching the pocos together

See that new ArticleRelator().Map line above? PetaPoco can utilise a relator helper function for multi-poco queries so that each poco is correctly assigned in the data hierarchy. Having the function wrapped in a class instance means it can remember the previous poco in the results.

If you’re using multi-poco queries I urge you to read the PetaPoco documentation on the subject and experiment for an hour or two. All the relator class does is take the pocos coming in from each row in the resultset and stitch them together. It allows me to add each tag to the article as well as assign the author to the article.

Turn around sir

And what about from the other angle? Where we have a tag and we want to know the articles using the tag? This is the essence of many-to-many, there are two contexts.

	public Tag RetrieveByTag( string tagName)
{
    return _database.Fetch(
        new TagRelator().Map,
        "select * from tag " +
        "left outer join articleTag on articleTag.tagId = tag.id " +
        "left outer join article on " +
        "article.id = articleTag.articleId " +
        "where tag.tagName=@0 order by tag.tagName asc", tagName)
        .SingleOrDefault();
}
	

This is an identical process as with retrieving articles but the tables are reversed. One tag has many articles in this context.

More responsibilities

Having many-to-many relationships does add more responsibilities to your app. For example when deleting an author it’s no longer sufficient to just delete the author’s articles followed by the author record. I have to remove the author’s articles’ tags from the articleTag table too otherwise they become data orphans pointing to articles that no longer exist. Add because we’re performing multiple database calls that are all required to succeed (or not at all), we need a transaction. Like this :

	public bool Delete( int authorId)
{
    using( var scope = _database.GetTransaction())
    {
        _database.Execute(
            "delete from articleTag where articleTag.articleId in " +
            "(select id from article where article.author_id=@0)",
            authorId);
        _database.Execute( "delete from article where author_id=@0", authorId);
        _database.Execute( "delete from author where id=@0", authorId);
 
        scope.Complete();
    }
 
    return true;
}
	

Adding many-to-many support to PetaPoco – A Simple Web App was fairly painless and should fill a small hole in the internet. I’ve had a few people ask about it and it seemed the natural next step for the project.

Am I doing many-to-many wrongly? Would you do it differently? Let me know so I can learn from you.

ASP.NET MVC , Data , Petapoco