Entity Framework Gotchas – Strategies for Orphaned Child Objects

Entity Framework provides a powerful framework for expressing relations between objects mapped to the database, but it is not without its shortcomings. One particular theme does keep coming around – removing child objects from parents.

All code examples are available on GitHub

Let’s take a pretty basic set of relationships:

    public class Parent
    {
        public virtual int Id { get; set; }

        private ICollection<Child> _children; 
        public virtual ICollection<Child> Children
        {
            get { return _children ?? (_children = new Collection<Child>()); }
            set { _children = value; }
        } 
    }

    public class Child
    {
        public virtual int Id { get; set; }

        public virtual int ParentId { get; set; }
        public virtual Parent Parent { get; set; }
    }

    public class Context : DbContext
    {
        public DbSet<Parent> Parents { get; set; }
        public DbSet<Child> Children { get; set; }
    }

The above code defines two classes, Parent and Child, with Parent containing a collection of Child objects called Children. New instances of Child can be created and added to the Children collection of Parent:

        _context = new Context();

        // Setting up the scenario.
        var parent = new Parent();
        var child = new Child();
        parent.Children.Add(child);

        _context.Parents.Add(parent);
        _context.SaveChanges();

But, trying to remove the child from the parent causes an exception (I note the exception in EF5 is significantly friendlier than the exception in EF4):

        // Causing the exception
        var parent = _context.Parents.First();
        var child = parent.Children.First();

        parent.Children.Remove(child);

        _context.SaveChanges();

When the above is run, the child is “orphaned” from the parent. That is, it is removed from its parent’s collection, the Parent is set to null, but the object itself is not marked for deletion. On executing SaveChanges, the null Parent reference can’t be saved (since the foreign key ParentId is not null) and an exception is thrown. The scenario makes perfect sense, but is very aggravating.

So how do we get the above lump of code to work as expected?

Solution 1 – Explicitly Remove The Child

The naive implementation of this problem is to explicitly remove the child object from the context, marking it for deletion.

        var parent = _context.Parents.First();
        var child = parent.Children.First();

        parent.Children.Remove(child);
        _context.Children.Remove(child);

        _context.SaveChanges();

However, the call to remove the child from the context needs to be done on every instance the child is removed from the Parent’s collection. This is tedious at best, but asking for a single call to be missed somewhere in your code. It also assumes that at the point of removing the child object, your code has direct access to the context. In these situations, you may see this as a reasonable overhead. However, if the removal occurs in one of your objects, it’s unreasonable to ask it to have access to the context. In this case, we use a tracking list to return a collection of objects to be deleted as a side-effect of the method. It’s still the programmers responsibility to remove these items from the context, but it keeps the POCO relatively vanilla (free from context).

    partial class Parent
    {
        public bool RemoveChild(Child child, out IEnumerable<Child> toRemove)
        {
            var removeTracker = new List<Child>();

            Children.Remove(child);
            removeTracker.Add(child);

            // Side-effects occuring as of business logic.

            toRemove = removeTracker;
            return true;
        }
    }

This still makes explicit removal the responsibility of the programmer, but does keep the object clean. Keeping the tracking list as an output parameter also helps to avoid cluttering the meaning of the return type. The downside to this approach is it needs to be used everywhere a Child object is removed – it’s a far from perfect solution.

Solution 2 – Identifying Relationships

An Identifying relationship is a relationship which is part of the primary key, making it identifying. Currently, our ParentId is non-identifying. That is, whilst our object requires a ParentId to exist, there is nothing identifying that the ParentId is part of the identification of the object. Entity Framework provides the attribute [Key] for marking properties as part of the Key, and has a built in mechanism for handling the removal of child objects in identifying relationships.

    public class Child
    {
        [Key, Column(Order = 1)]
        public virtual int Id { get; set; }

        [Key, Column(Order = 2)]
        public virtual int ParentId { get; set; }
        public virtual Parent Parent { get; set; }
    }

The Key attribute marks both Id and ParentId as part of the primary key. Column order needs to be determined for composite keys, hence the Column attribute. With the change in key, our original code now works fine with no additional work, works in every instance where a child object is removed, and incurs no additional overhead. When the child object is removed from the parent, EF realises that the object can no longer exist, and marks it for deletion.

If you’ve got control over your database schema, this (to me) looks like the cleanest option to you. It’s simple, and once you’ve handled the change, there’s no need to mess about with any additional hacks.

Solution 3 – SaveChanges

Our final solution is a variant of our first solution. In our first solution, we kept a list of the objects we intended to remove, and then removed them later. We can defer this process even further, by looking for orphaned objects and removing them when we call SaveChanges.

    partial class Context
    {
        public override int SaveChanges()
        {
            var toRemove = Children.Local.Where(x => x.Parent == null).ToList();
            foreach (var child in toRemove)
                Children.Remove(child);

            return base.SaveChanges();
        }
    }

In the above code, on calling SaveChanges, we look through all local Child objects looking for orphans, and then remove these from the context. On calling the base SaveChanges context, our save is successful since all orphans have been handled.

This solution works, but is not without its problems. Primarily, this needs to be done for each entity which is a child of another entity, and I’ve not (as yet) found a nice way to do this using generics. It does work well where you’re working with an existing database schema, or composite keys are not an option. There is also a performance penalty to pay in executing the query to check for null objects, you are searching every object in the local collection. If there are 200,000 objects in the local Children collection, this may take a second to execute, and that may, or may not, be acceptable to you.

  • http://twitter.com/julielerman Julie Lerman

    This one definitely confuses people a lot. Removing something from a collection is totally different than calling REMOVE on a DbSet. The collection has nothing to do with EF. It’s just a .NET collection and has no clue about the mysteries of EF’s relationship management. I wish they had left the DbSet method name “DELETE” like it was for ObjectSet. That way there is an obvious difference between “DbSet.DELETE” and “Collection.REMOVE”. FWIW, if you can call context.Children.Remove(child), then entity framework will also fix the relationship problem and delete at the same time. If you are using aggregate root, however, you may only have context.Parents and not even have a context.Children property and would then have to go even deeper to context.Entry(child).State=EntityState.Deleted. So the real problem is that REMOVE is on the .NET collection. Now I’m wondering if there is a way to make the savechanges smarter… like checking for null but required FK properties and deleting the entities before EF sees the p roblem and throws an exception?

  • kianryan

    Well, using reflection it should be possible to check which attributes are marked as required, and call Delete on them.

    However, I see this as being a little bit nasty, and potentially quite expensive on large data sets.

    MS have identified their preffered stragey (Identifying Relationships) and it does make some sense, but only really works if you have control over the schema.

  • Lambolez Paul

    This doesn’t work as the field is marqued as required when you use [Key, Column(Order = 2)] …

  • Pingback: EF and DeleteOrphan | tapez@555

  • Raffaele Garofalo

    This is such a stupid behaviour that breaks all the DDD rules. First we have to options here: 1) make the Domain Entity aware of the ORM (so bad) 2) Encapsulate the BL inside a third component that removes the item from the collection and then mark the child object as deleted. With other ORM like NHibernate we don’t need to track this, the proxy is smart enough to understand that a child item has been removed from the collection and mark it as deleted. With EF we must

    Person.Child.Remove(child) context.Entry(child).State = EntityState.Deleted

    what does it mean? It means that as always happens, MS implemented an ORM which is not fully domain agnostic …

  • kianryan

    It’s not great, I admit. I have also now got another solution, that involves making a change to the context. This way when you mark the entity as deleted, you stop caring until SaveChanges kicks in and it checks for the orphaned object itself and cascades the delete state. I was pretty sure I had written it up, but apparently not…