Ever catch yourself staring at your editor? Then switching over and staring at the content types overview of your new Drupal site? Then back to the client requirements? And then doing it all over again and again while you face a decision? Drupal content types and fields vs. custom entities. That's the tough decision. The choices you make here will affect almost every part of the site building to come, but it's so hard to know which to choose.
On one hand, Drupal fields are getting better, faster and more flexible all the time, with better integration with contributed modules like Views. But at the same time, the new entity API in Drupal 7 is more flexible than ever, allowing you to add fields even to your own custom entities. So which route should you go? How can you make a decision like this without second-guessing yourself all the time?
Why getting it right is important
While lots of projects can go either way, the downsides can make the difference between a successful web project and a maintenance or performance nightmare. Using content types and fields for a complex data model that requires lots of complex queries can hurt performance, but using custom entities when you don't need them can add a maintenance burden when it comes time to upgrade your site or make changes. So how do you make that decision?
What factors matter
I've been down that road many times, trying to make a decision, and along the way I've come up with a list of questions and criteria that I use to help shape the decision on new projects. This post shares the rough workflow that I use when trying to decide which path to take.
While none of these are hard-and-fast criteria either way, they represent the best indicators I've found for success with different approaches.
1. Is the data you're representing public or private?
Many times I've come to a site that uses a node type to store some sort of admin- or editor-only private information. This wouldn't be a huge issue on its own, but then you have to make sure that the data is truly hidden. Does it still show up in search results? Does it leak into RSS feeds?
Are you using a node access module just to restrict access to this data? Node access can often add significant overhead to every database query for content on the site, just to restrict access to a few pieces of information that should always be hidden from non-administrators. Further, node access modules add complexity to your configuration that makes it easy to misconfigure a checkbox and accidentally hide content you didn't intend to.
While data privacy is a strong indicator for using custom entities, it's important to note that the opposite isn't necessarily true. Just because you're modeling data that should always be publicly visible doesn't mean you should automatically use content types and fields.
2. How complex are the relationships between the data? (and how do you need to query them?)
Data by itself is easy to represent in almost any system – it's the relationships that add complications. Whether it's a simple parent/child style relationship or something more complex makes a big difference when choosing between content types with fields and custom entities.
One of the classic examples of relationships is the record company, with artists that have many albums that in turn have many songs. This type of relationship is very easy to model with content types and fields. For the song content type, you could add an entity reference field to point at albums, and on albums you could use an entity reference to point at artists. This makes it easy to show the artist on each album or even each song, and to list all the albums when viewing an artist.
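The shape of this hierarchy can be sketched in plain SQL. This is a minimal illustration using Python's built-in sqlite3 module, not Drupal's actual field storage: it assumes each album row carries an artist_id column and each song row an album_id column, standing in for single-value entity reference fields. The table names, column names, and sample data are all hypothetical.

```python
import sqlite3

# Illustrative schema only: these are not Drupal's real field storage tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Each album references exactly one artist, like a single-value entity reference.
CREATE TABLE album (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    artist_id INTEGER NOT NULL REFERENCES artist(id)
);
-- Each song references exactly one album.
CREATE TABLE song (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    album_id INTEGER NOT NULL REFERENCES album(id)
);
""")
conn.executemany("INSERT INTO artist VALUES (?, ?)", [(1, "Artist A"), (2, "Artist B")])
conn.executemany("INSERT INTO album VALUES (?, ?, ?)",
                 [(10, "First Album", 1), (11, "Second Album", 1), (12, "Other Album", 2)])
conn.executemany("INSERT INTO song VALUES (?, ?, ?)",
                 [(100, "Track 1", 10), (101, "Track 2", 11)])


def albums_by_artist(conn, artist_name):
    """List an artist's albums by following album.artist_id back to the artist."""
    rows = conn.execute(
        """SELECT album.title FROM album
           JOIN artist ON artist.id = album.artist_id
           WHERE artist.name = ?
           ORDER BY album.title""",
        (artist_name,),
    )
    return [title for (title,) in rows]


def artist_for_song(conn, song_title):
    """Walk song -> album -> artist to show the artist on a track."""
    row = conn.execute(
        """SELECT artist.name FROM song
           JOIN album ON album.id = song.album_id
           JOIN artist ON artist.id = album.artist_id
           WHERE song.title = ?""",
        (song_title,),
    ).fetchone()
    return row[0] if row else None


print(albums_by_artist(conn, "Artist A"))  # ['First Album', 'Second Album']
print(artist_for_song(conn, "Track 2"))    # Artist A
```

Each reference points one way, up the hierarchy, so there's never any question about which side of the relationship owns it.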
On the other hand, when you move to a less hierarchical relationship, such as many-to-many, things get more complicated. Take the use case of a directory of colleges and universities. Each university might offer many degrees, and it might have many locations where it offers them. The relationship between locations and degrees is a complex many-to-many relationship. If you're using content types and fields, the tough part is deciding which side of the relationship gets the field. I've seen sites try to work around this by maintaining entity reference fields in both directions and using insert/update hooks to keep them in sync, but this can be unreliable and duplicates your relationship in two places. By using custom entities in such a situation, you can instead use a join table that models the relationship between the two entities.
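Here is a minimal sketch of that join-table approach, again in generic SQL through Python's sqlite3 rather than Drupal's entity API. The degree, location, and degree_location tables and their sample rows are hypothetical; the point is that each degree/location pair is stored once, and the same table answers the question from either side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE degree (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- The join table owns the relationship, so neither side needs a reference field.
CREATE TABLE degree_location (
    degree_id INTEGER NOT NULL REFERENCES degree(id),
    location_id INTEGER NOT NULL REFERENCES location(id),
    PRIMARY KEY (degree_id, location_id)
);
""")
conn.executemany("INSERT INTO degree VALUES (?, ?)", [(1, "Biology"), (2, "History")])
conn.executemany("INSERT INTO location VALUES (?, ?)", [(1, "Main Campus"), (2, "Downtown")])
# Each pair is stored exactly once; there is no second copy to keep in sync.
conn.executemany("INSERT INTO degree_location VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])


def locations_for_degree(conn, degree_name):
    """Query the relationship from the degree side."""
    rows = conn.execute(
        """SELECT location.name FROM location
           JOIN degree_location dl ON dl.location_id = location.id
           JOIN degree ON degree.id = dl.degree_id
           WHERE degree.name = ?
           ORDER BY location.name""",
        (degree_name,),
    )
    return [name for (name,) in rows]


def degrees_at_location(conn, location_name):
    """Query the same relationship from the location side."""
    rows = conn.execute(
        """SELECT degree.name FROM degree
           JOIN degree_location dl ON dl.degree_id = degree.id
           JOIN location ON location.id = dl.location_id
           WHERE location.name = ?
           ORDER BY degree.name""",
        (location_name,),
    )
    return [name for (name,) in rows]


print(locations_for_degree(conn, "Biology"))     # ['Downtown', 'Main Campus']
print(degrees_at_location(conn, "Main Campus"))  # ['Biology', 'History']
```

The composite primary key also stops the same pairing from being recorded twice, which is the kind of consistency the dual-field workaround has to enforce by hand.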
Another scenario that lends itself to a join table is when the relationship between two entities is more complicated than a simple 'belongs to' or 'has one.' Imagine a relationship between an organization and its staff, where staff members have different roles. This is easy until you hit the condition that a staff member can have different roles in different organizations. Suddenly the role is tied directly to the relationship between the two entities. This is another situation where a join table lets you store not only the organization ID and staff ID, but also the role for that relationship.
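Extending the same idea, here is a minimal sketch of a join table that carries a role column. As before, this is generic SQL via Python's sqlite3, and the organization, staff, and organization_staff tables are hypothetical names chosen for illustration. The sketch assumes one role per staff member per organization; if someone could hold several roles in the same organization, the role would also need to be part of the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organization (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- The role belongs to the (organization, staff) pair, not to the staff row.
CREATE TABLE organization_staff (
    organization_id INTEGER NOT NULL REFERENCES organization(id),
    staff_id INTEGER NOT NULL REFERENCES staff(id),
    role TEXT NOT NULL,
    PRIMARY KEY (organization_id, staff_id)
);
""")
conn.executemany("INSERT INTO organization VALUES (?, ?)", [(1, "Food Bank"), (2, "Library Board")])
conn.executemany("INSERT INTO staff VALUES (?, ?)", [(1, "Sam"), (2, "Alex")])
# Sam holds a different role in each organization.
conn.executemany("INSERT INTO organization_staff VALUES (?, ?, ?)",
                 [(1, 1, "Director"), (2, 1, "Volunteer"), (1, 2, "Treasurer")])


def role_of(conn, staff_name, org_name):
    """Return the role a staff member holds in one organization, or None."""
    row = conn.execute(
        """SELECT os.role FROM organization_staff os
           JOIN staff ON staff.id = os.staff_id
           JOIN organization ON organization.id = os.organization_id
           WHERE staff.name = ? AND organization.name = ?""",
        (staff_name, org_name),
    ).fetchone()
    return row[0] if row else None


def roles_for_staff(conn, staff_name):
    """List (organization, role) pairs for one staff member."""
    return conn.execute(
        """SELECT organization.name, os.role FROM organization_staff os
           JOIN staff ON staff.id = os.staff_id
           JOIN organization ON organization.id = os.organization_id
           WHERE staff.name = ?
           ORDER BY organization.name""",
        (staff_name,),
    ).fetchall()


print(roles_for_staff(conn, "Sam"))        # [('Food Bank', 'Director'), ('Library Board', 'Volunteer')]
print(role_of(conn, "Alex", "Food Bank"))  # Treasurer
```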
Worried about Views support? This type of relationship is easy to model with Views through relationships!
3. How many fields are you trying to represent?
If the other criteria are a little vague, this one is even worse. Drupal doesn't impose a maximum number of fields on a content type.
That being said, there may be times when you want to think about moving to a custom entity because you have a large number of fields. This is a little more arbitrary, but what I look for is a large number of simple fields, like integers or booleans. This is especially true if you aren't using revisions or translations: in that case, using custom entities will really reduce the load on your database server and its storage requirements (but keep in mind that storage is very cheap these days, and paying for more storage is almost always cheaper than the time spent trying to optimize for it).
My rough rule of thumb is that if I'm looking at 80-100 or more fields and no translations or revisions are required, I'm likely to look into custom entities. If translations or revisions are important, the field count may still be a factor, but I'm inclined to raise the cutoff at which I think the benefits of custom entities outweigh the downsides.
4. What other features do you need?
Translation? Different backend storage? Revisions? Complex access rules? All of these things can make a difference too. If you need revisions or translations and don't want to write the code to support them yourself, you may want to stick with content types. You may also find that Drupal's node access system and modules do a better job of access control than anything you have the time to code up.
What doesn't really matter
A lot of times you may feel like you need the specific feature provided by a field, such as Address (with its multi-country support) or Geofield. That doesn't mean you can't use custom entities if that's what would otherwise make sense for your project. I've actually used Address and Geofield on custom entities before: the relationships between the entities were easier to model with join tables, but the complex structured data was easier to store in a field. This hybrid approach really shows off the power of Drupal.
Want to learn more
Not knowing how to build custom entities shouldn't stop you from using such a powerful feature on your Drupal site, which is why I am writing Model Your Data with Drupal. It's on sale now at 35% off at http://modelyourdatawithdrupal.com. That special offer has been extended through next week, when the discount drops to 25%, so don't wait too long to grab your pre-release copy of Model Your Data with Drupal.