New Fun Schema

From GCD
Jump to navigation Jump to search

This page documents the proposed database schema for the "New Fun" release, i.e. the first release of the new web site that will go into production. It is not set in stone, but is nearing its final form. At some point, we will declare a soft schema freeze, after which only minor changes (such as adding a column that doesn't affect table relationships or major display options) will be allowed.

A few notes on database vs code names:

Table names are lower case with underscores. Model class names are capitalized with no spaces or underscores. It should be obvious which model fits which table.

All foreign keys end in _id in the database, but do not have that suffix in the code. So wherever you see something like issue_id in this table, expect to see issue in the code. This is because in the code you get an object, not just an ID.

Tables are prefixed with an application name, which does not appear in the model class name. The main app name is currently usually core, which is arguably a lousy name. I'm contemplating changing it to data but I'm not sure about that. It can't be gcd because that is the name of the overall project. Currently, the only other application is migration, which is for tables that exist only to assist in migration of data from old to new forms.

Publisher (core_publisher)

The Publisher table is unusual in that it stores several distinct types of entities, all of which fall under the general umbrella of "Publisher". Several of them may play multiple roles, which is the primary reason for keeping them in one table. Each type of publishing entity is represented by a proxy model in the Django code. Except as noted, publishers are connected to each other via the PublisherRelationship table described further down. Before going through the fields, here is the list of proxy models and their definitions:

MasterPublisher

This is the legacy notion of a top-level publisher, linked at the series. Defined by setting is_master = 1. The nature and future role of master publishers is very much a matter of debate.

Imprint

This is the legacy notion of an imprint, effectively a hodge-podge of anything and everything that did not fit as a MasterPublisher. These are linked at the series level, and will be retained only until they can be migrated to the other fields, after which they will be dropped from the application. Something is an imprint if its parent field is not NULL (relationships between new publisher entries do not use the parent field).

PublishingCompany

An actual corporate entity of some sort involved in publication. These are linked at the issue level and defined as having is_company = 1.

Brand

A marketing brand involving a particular logo or set of closely related logos. These are linked at the issue level and defined as having is_brand = 1.

The Publisher Table

core_publisher
Column Type Description Usage Notes
Core publisher data fields:
id int(11) auto_increment DB-generated primary key.
name varchar(255) NOT NULL Name of the publishing entity. Different for each type. Master publishers use names chosen for easy recognition. Brand names are based on the logos that denote them. Publishing company names should be exactly as found in the indicia that reference them (minor variations, such as "Co." vs "Company" may be harmonized). Imprints have arbitrarily complicated garbage for names that mash together all of the above. Don't even try to figure it out.
country_id foreign key to core_country The country from which the publisher operates.
year_began int(11) Year (by cover date) of first publication.
year_ended int(11) Year (by cover date) of last publication, if any.
url varchar(255) The URL for the publisher's website. Should be a complete URL, including the protocol (i.e. "http://")
notes longtext Space for arbitrary notes.
Type flags. A row may have more than one set to true.
is_master tinyint(1) Defines which rows appear in the MasterPublisher proxy model.
is_company tinyint(1) Defines which rows appear in the PublishingCompany proxy model.
is_brand tinyint(1) Defines which rows appear in the Brand proxy model.
Change tracking fields:
created date Row creation date.
modified date Last modified date.


Series (core_series)

core_series_name
Column Type Description Usage Notes
id int(11) auto_increment DB-generated primary key.
name varchar(255) NOT NULL A name used for a series. The name may come from the cover, the indicia, common usage or any other source.
core_series
Column Type Description Usage Notes
Core series data fields:
id int(11) auto_increment DB-generated primary key.
language_id foreign key to core_country The language in which the series is written. Currently there is no facility to record multi-lingual comics.
format varchar(255) legacy format field This field will be retained only until its contents have been migrated to the issue and sequence-level format fields. It's contents are a mess of at least five different sorts of data (including size, binding, paper stock, coloration and frequency of publication).
year_began int(11) Year (by cover date) of first publication.
year_ended int(11) Year (by cover date) of last publication, if any.
notes longtext Space for arbitrary notes.
tracking_notes mediumtext Notes about how a given series is continued in or continued from another series. Largely replaced by the core_series_tracking table, after most data is migrated this field will only be used for tracking notes where one end of the relationship is not and cannot be in the database (tracking from a newspaper or web comic, for instance).
has_gallery tinyint(1) True if there's a cover gallery to display for this series.
open_reserve int(11) Integer field. How many issues of this series are open for reservation.
Change tracking fields:
created date Row creation date.
modified date Last modified date.
modification_time time Last modified time.

Issues

Many columns one might expect to find in the issue table are in the table mapping issues to series.

core_issue
DB Column Type Description Usage Notes
id int(11) auto_increment DB-generated primary key.
publication_date varchar(255) Publication date of the issue as given in the indicia, but fully spelled out. Published Date
price varchar(255) Legacy price field. A decimal price followed by the ISO code, i.e. 0.10 USD See Cover Price, but superseded by core_issue_price
story_count int(11) Number of stories linked to this issue.
page_count varchar(10) Page count including covers but not inserts or dust jackets. Should be decimal, but needs migration work.
editing mediumtext Editor for the issue as a whole.
notes mediumtext Notes for the issue.
Change tracking fields:
created date Row creation date.
modified date Last modified date.
modification_time time Last modified time.
core_issue_series
DB Column Type Description Usage Notes
id int(11) auto_increment DB-generated primary key.
volume_label_id foreign key to core_label Name of the field stored as "volume" for the given issue. It may be "volume" or some other term.
volume varchar(255) Volume number from the indicia.
descriptor_label_id foreign key to core_label Name of the field stored as "descriptor" for the given issue. It is most often "issue" but may be something else.
descriptor varchar(255) Issue number or other identifier from the indicia.

Stories

The stories table has a number of unused columns that are duplicates of columns from the series and issues tables (they're known to be unused because in some cases the data does not match, and the versions shown on the Lasso site are the ones in the issue and series tables, not these). They were initially used for caching data to avoid expensive joins in the FileMaker days. Also, sequence 0 (which represents the front cover) is treated specially, with its page count, notes, reprints and editor fields being applied to the whole issue by the current UI implementation.


stories [Story class in new site]
DB Column Name in New Code Description Usage Notes
Core story data fields:
ID id DB-generated primary key. Since there is no story page, this field does not show up in any URLs in the Lasso implementation.
Seq_No sequence_number Order of the story within the issue. By convention, sequence 0 is the front cover, and some of its fields are considered to apply to the whole issue. Sequence Number
Feature feature The name of the feature, if any. Often but not always the name of the primary character. Some stories do not have a feature.
Pg_Cnt page_count Number of pages. Pages
Notes notes
Reprints reprints Reprint information (both reprint from and reprinted in).
Type type Type of "story", or more accurately sequence. Actual comic story vs text story vs ad vs letters page, etc.
Fields for creator credits:
Script script
Pencils pencils
Inks inks
Colors colors
Letters letters
Editing editor Rarely used except for sequence 0, as the editor for the book is typically assumed to have edited the whole book. However, this field can be used if individual stories have separate editors.
JobNo job_number Job Number
Fields related to content:
Title title
Char_App characters
Synopsis synopsis
Genre genre
Field related to flat file management / distribution. See series/issues:
InitDist initial_distribution Not Used
Fields relating to or copied from the issues table:
IssueID issue Foreign key into the issues table.
Issue issue_number Not Used Earlier indexes may have data in this field, but it is ignored.
Pub_Date publication_date Not Used Earlier indexes may have data in this field, but it is ignored.
Key_Date key_date Not Used Earlier indexes may have data in this field, but it is ignored.
Price price Not Used
rel_year rel_year Not Used
Fields relating to or copied from the series table:
SeriesID series Foreign key into the series table, duplicated from the issues table presumably to allow for direct stories <-> series joins. Actually, it was used to help troubleshoot and QA the initial building of a relational database way back. Also use for statistics.
Yr_Began year_began Not Used Earlier indexes may have data in this field, but it is ignored.
Pub_Name publisher_name Not Used Earlier indexes may have data in this field, but it is ignored.
Change tracking fields:
created created Row creation timestamp.
Modified modified Date of last row change.
ModTime modification_time Time of last row change (24 hour).
LstChang last_change Date of some sort, but apparently unused? Column has numerous values, but they do not line up with the values in the Modified column. Data got butchered through various ins and outs of flat files and between systems changing the date format.