SQLAlchemy merge and relations

In which an oddity in SQLAlchemy is spotted, and it turns out to be a bug not a misunderstanding.

Background

The merge function in SQLAlchemy lets an object seem to get pushed to the database, but actually stores and returns a copy of the object. This is handy for when you are mixing persistent and non-persistent objects. Unfortunately, as of version 4.2, there's a weird bug that can result if you use related objects. This page serves to document this bug, in case anyone else runs across it.

Symptoms

Merged objects with other object joined by a relation using backrefs, show those 'child' objects twice. A condensed demonstration - the names of the mapped objects and tables are unimportant, but a Biosequence has a one-to-many with a set of Annotations, established by a backref:

from sqlalchemy import *
from sqlalchemy.orm import *

metadata = MetaData()

class Biosequence (object):
   pass

class Annotation (object):
   pass

table_biosequences = Table ('biosequences',
   metadata,
   Column ('identifier', Integer(), primary_key=True),
   Column ('dummy', String(16)),
)

table_seqannotations = Table ('seqannotations',
   metadata,
   Column ('identifier', Integer(), primary_key=True),
   Column ('biosequence_id', Integer(), ForeignKey ('biosequences.identifier')),
)

mapper (Annotation, table_seqannotations)

mapper (Biosequence, table_biosequences,
   properties={
      'annotations':relation ( Annotation,
         backref='biosequence_ref',
         lazy=False,
         cascade="all, delete, delete-orphan",
      ),
   },
)

engine = create_engine (
   'sqlite:///',
   convert_unicode=True,
)
sessionfactory = sessionmaker (
   bind=engine,
   autoflush=False,
   transactional=False
)
session = sessionfactory()
metadata.create_all (bind=engine, checkfirst=True)

# make an object with 3 'children'
bseq = Biosequence()
anns = [Annotation() for x in range (3)]
bseq.annotations = anns

print "Original num of anns:", len (bseq.annotations)

# the merged copy has 6 children
merge_bseq = session.merge (bseq)
session.flush()
print "Now num of anns still:", len (bseq.annotations)
print "Ids:", [x.identifier for x in bseq.annotations]
print "Merged copy num of anns:", len (merge_bseq.annotations)
print "Ids:", [x.identifier for x in merge_bseq.annotations]

# as does the return!
results = session.query (Biosequence)
print "Number of results", results.count()
returned_bseq = results.one()
print "Returned num of anns:", len (returned_bseq.annotations)
print "Ids:", [x.identifier for x in returned_bseq.annotations]

# make an new object with 3 children
bseq2 = Biosequence()
bseq2.annotations = [Annotation() for x in range (3)]

print "New obj num of anns:", len (bseq.annotations)
session.save_or_update (bseq2)
session.flush()
# and it works as expected
results = session.query (Biosequence)
print "Number of retreived objects", results.count()
returned_bseqs = results.all()
for item in returned_bseqs:
   print "Returned num of anns:", len (item.annotations)
   print "Ids:", [x.identifier for x in item.annotations]

Which returns:

Original num of anns: 3
Now num of anns still: 3
Ids: [None, None, None]
Merged copy num of anns: 6
Ids: [1, 1, 2, 2, 3, 3]
Number of results 1
Returned num of anns: 6
Ids: [1, 1, 2, 2, 3, 3]
New obj num of anns: 3
Number of retreived objects 2
Returned num of anns: 6
Ids: [1, 1, 2, 2, 3, 3]
Returned num of anns: 3
Ids: [4, 5, 6]

In summary: if an object is stored using save_or_update, when later retrieved, it has the correct number of related children. If merged, the returned merge copy has each child twice. If this object is retrieved later, it still has every child twice. Looking at the db, the correct number of child rows is being stored and the relations are correct.

To illustrate, this is the original Biosequence's annotations:

  • Annotation, id 1
  • Annotation, id 2
  • Annotation, id 3

This is what the annotations of the merged copy and later retrieved merges look like:

  • Annotation, id 1
  • Annotation, id 1
  • Annotation, id 2
  • Annotation, id 2
  • Annotation, id 3
  • Annotation, id 3

Technical details

Python 2.5, SqlAlchemy 4.2-ish, Mac OSX.

Result

Thanks to the super-efficient Michael Bayer, this bug will be fixed in 4.3 and apparently can be avoided by just not using the backref.