New Fun Schema: Difference between revisions

From GCD
Jump to navigation Jump to search
Line 78: Line 78:
|-
|-
| is_master
| is_master
| tinyint(1)
| boolean
| Defines which rows appear in the MasterPublisher proxy model.
| Defines which rows appear in the MasterPublisher proxy model.
|  
|  
|-
|-
| is_company
| is_company
| tinyint(1)
| boolean
| Defines which rows appear in the PublishingCompany proxy model.
| Defines which rows appear in the PublishingCompany proxy model.
|
|
|-
|-
| is_brand
| is_brand
| tinyint(1)
| boolean
| Defines which rows appear in the Brand proxy model.
| Defines which rows appear in the Brand proxy model.
|
|
Line 105: Line 105:
|-
|-
|}
|}


==Series==
==Series==

Revision as of 06:51, 21 July 2009

This page documents the proposed database schema for the "New Fun" release, i.e. the first release of the new web site that will go into production. It is not set in stone, but is nearing its final form. At some point, we will declare a soft schema freeze, after which only minor changes (such as adding a column that doesn't affect table relationships or major display options) will be allowed.

A few notes on database vs code names:

Table names are lower case with underscores. Model class names are capitalized with no spaces or underscores. It should be obvious which model fits which table.

All foreign keys end in _id in the database, but do not have that suffix in the code. So wherever you see something like issue_id in this table, expect to see issue in the code. This is because in the code you get an object, not just an ID.

Tables are prefixed with an application name, which does not appear in the model class name. The main app name is currently usually core, which is arguably a lousy name. I'm contemplating changing it to data but I'm not sure about that. It can't be gcd because that is the name of the overall project. Currently, the only other application is migration, which is for tables that exist only to assist in migration of data from old to new forms.

Publishers

The Publisher table is unusual in that it stores several distinct types of entities, all of which fall under the general umbrella of "Publisher". Several of them may play multiple roles, which is the primary reason for keeping them in one table. Each type of publishing entity is represented by a proxy model in the Django code. Except as noted, publishers are connected to each other via the PublisherRelationship table described further down. Before going through the fields, here is the list of proxy models and their definitions:

MasterPublisher

This is the legacy notion of a top-level publisher, linked at the series. Defined by setting is_master = 1. The nature and future role of master publishers is very much a matter of debate.

Imprint

This is the legacy notion of an imprint, effectively a hodge-podge of anything and everything that did not fit as a MasterPublisher. These are linked at the series level, and will be retained only until they can be migrated to the other fields, after which they will be dropped from the application. Something is an imprint if its parent field is not NULL (relationships between new publisher entries do not use the parent field).

PublishingCompany

An actual corporate entity of some sort involved in publication. These are linked at the issue level and defined as having is_company = 1.

Brand

A marketing brand involving a particular logo or set of closely related logos. These are linked at the issue level and defined as having is_brand = 1.

The Publisher Table

core_publisher
Column Type Description Usage Notes
Core publisher data fields:
id int(11) auto_increment DB-generated primary key.
name varchar(255) NOT NULL Name of the publishing entity. Different for each type. Master publishers use names chosen for easy recognition. Brand names are based on the logos that denote them. Publishing company names should be exactly as found in the indicia that reference them (minor variations, such as "Co." vs "Company" may be harmonized). Imprints have arbitrarily complicated garbage for names that mash together all of the above. Don't even try to figure it out.
country_id foreign key to core_country The country from which the publisher operates.
year_began int(11) Year (by cover date) of first publication.
year_ended int(11) Year (by cover date) of last publication, if any.
url varchar(255) The URL for the publisher's website. Should be a complete URL, including the protocol (i.e. "http://")
notes longtext Space for arbitrary notes.
Type flags. A row may have more than one set to true.
is_master boolean Defines which rows appear in the MasterPublisher proxy model.
is_company boolean Defines which rows appear in the PublishingCompany proxy model.
is_brand boolean Defines which rows appear in the Brand proxy model.
Change tracking fields:
created date Row creation date.
modified date Last modified date.

Series

Series is fairly straightforward, except that series names are held in a separate table to allow multiple names to be assigned to a given series. See the code for details.

core_series_name
Column Type Description Usage Notes
id int(11) auto_increment DB-generated primary key.
name varchar(255) NOT NULL A name used for a series. The name may come from the cover, the indicia, common usage or any other source.
core_series
Column Type Description Usage Notes
Core series data fields:
id int(11) auto_increment DB-generated primary key.
language_id foreign key to core_country The language in which the series is written. Currently there is no facility to record multi-lingual comics.
format varchar(255) legacy format field This field will be retained only until its contents have been migrated to the issue and sequence-level format fields. It's contents are a mess of at least five different sorts of data (including size, binding, paper stock, coloration and frequency of publication).
year_began int(11) Year (by cover date) of first publication.
year_ended int(11) Year (by cover date) of last publication, if any.
notes longtext Space for arbitrary notes.
tracking_notes mediumtext Notes about how a given series is continued in or continued from another series. Largely replaced by the core_series_tracking table, after most data is migrated this field will only be used for tracking notes where one end of the relationship is not and cannot be in the database (tracking from a newspaper or web comic, for instance).
has_gallery tinyint(1) True if there's a cover gallery to display for this series.
open_reserve int(11) Integer field. How many issues of this series are open for reservation.
Change tracking fields:
created date Row creation date.
modified date Last modified date.
modification_time time Last modified time.

Items (a.k.a. Issues)

As recent discussions delved more into the problems created by trying to treat books as issues, the terminology got quite confusing. For that reason, and to better reflect the requirements of the project, I've renamed the "issues" to "items" at the database level. Of course, they will still be "issues" (or in some cases "books") in the UI. In general, this section has a lot of techy terminology that will not be at all visible in the UI, or even in the middle layers of the web application. This is intentional, with the aim to break faulty associations between objects as we conceive of them visually and the database-level storage requirements. Within the web application there will be classes such as "Issue" that make sense out of this very complex sub-area of the schema.

For instance, it's difficult to see from looking at the next few tables how to figure out something as conceptually simple as "what is the number of this issue?" However, in the typical case an indexer will just enter the issue number into a very obvious issue number field, or edit it there if they need to change something later. The point being, unless you really care about database design, don't get hung up on how much this doesn't look anything like a comic book issue anymore. Unless you squint really hard.

core_item_descriptor
DB Column Type Description Usage Notes
id int(11) auto_increment DB-generated primary key.
scope enum('item', 'group') Indicates whether this descriptor applies to one item or a group of items. A single item is typically an issue or a book, or more technically, one row in the core_item table. A group is a set of items within the series (possibly the entire series, and possibly a set with only one member, but at least theoretically multiple items). Volume numbers in periodicals are the most typical example of descriptors at group scope. Indexers will never directly select or see this field as it will be handled behind the scenes. But it will, for instance, determine whether the UI should display a descriptor of type "volume" as the main number for an item (as in a trade paperback collection) or as a prefix to the main item (as in a a periodical issue).
label_id Foreign key to core_item_descriptor_label The display label for this descriptor, such as "issue" or "volume".
source_id Foreign key to core_item_descriptor_source Where the descriptor originated. Typical values include indicia, cover, title page, spine, etc. as well as "inferred", meaning the indexer made it up or got it from somewhere other than the actual item (long standing fan convention, for instance).
value varchar(255) The actual issue/volume/book/etc. number/name/title/whatever. Note that the special no-number value "[nn]" should never be used (nor should the older "nn"). An item with no number should simply not have an entry in this table. Or should only have an entry that is marked as "inferred", meaning that it was assigned by the indexer and does not appear on the item. The UI will translate this condition to "[nn]" or whatever else is appropriate.


core_item_descriptor_label
DB Column Type Description Usage Notes
id int(11) auto_increment DB-generated primary key.
name varchar(255) The human-readable display name for this kind of descriptor. Note that unlike most literal data, these labels may not all be in English and therefore may have unusual translation rules. The exact strategy here is still TBD, but part of the need for this feature is dealing with different names in various countries that don't directly translate outside of that country/language.
for_item boolean If true, may be used at the item level. i.e. descriptors where scope='item'
for_group boolean If true may be used at the group level. i.e. descriptors where scope='group'


core_item_descriptor_source
DB Column Type Description Usage Notes
id int(11) auto_increment DB-generated primary key.
name varchar(255) The human-readable display name for the source location.
is_inferred boolean If true, the descriptor marked with this source did not come from the item at hand, and should be displayed accordingly.


core_item
DB Column Type Description Usage Notes
id int(11) auto_increment DB-generated primary key.
Basic item information
price varchar(255) Legacy price field. A decimal price followed by the ISO code, i.e. 0.10 USD See Cover Price, but superseded by core_item_price
page_count varchar(10) Page count including covers but not inserts or dust jackets. Should be decimal, but needs migration work.
editing mediumtext Editor for the item as a whole.
notes mediumtext Notes for the item.
Date fields:
publication_date varchar(255) Legacy field. Publication date of the item as given in the indicia, but fully spelled out. Published Date, but superseded by the other publication fields in this table.
publication_year int(11) Year of publication.
publication_year_inferred boolean True if the indexer guessed or got the year from somewhere other than the item.
publication_month_id foreign key to core_publication_month The month or vaguely month-like part of the publication date. May be two months, may be a season, may be a value like "Holiday"
publication_month_modifier enum('early', 'mid', 'late') Can be applied to the selected month. This cuts down substantially on the size of the drop down menu of "months".
publication_month_inferred boolean True if the indexer guessed or got the month from somewhere other than the item.
publication_day int(11) Day of the month of the publication date, if any.
publication_day_inferred int(11) True if the indexer guessed or got the day for somewhere other than the items. This is not meant to be used for monthlies or other publications that should not have a publication day, only for weeklies or similar items that simply omitted the day from the cover or publication information.
Format fields:
size_id Foreign key to core_size References a standard size (e.g. U.S. Golden Age) from a pre-defined set.
height decimal(10,3) Height in either inches or centimeters
width decimal(10,3) Width in either inches or centimeters
size_in_metric boolean This does not affect the standard sizes in the other table, which have their own metric flag.
interior_paper_id Foreign key to core_paper Paper stock of the interior of the publication.
cover_paper_id Foreign key to core_paper Paper stock of the cover of the publication.
binding_id Foreign key to core_binding How the publication is bound, although what that means remains a matter of some debate.
Field relating to sequences:
sequence_count int(11) Number of stories linked to this issue.
Change tracking fields:
created date Row creation date.
modified date Last modified date.
modification_time time Last modified time.


core_item_series
DB Column Type Description Usage Notes
id int(11) auto_increment DB-generated primary key.
series_id Foreign key into core_series
item_id Foreign key into core_item

Sequences

The core_sequence table is the new name of the old stories table, using a more generic term.

core_sequence
DB Column Type Description Usage Notes
id int(11) auto_increment DB-generated primary key.
title varchar(255) The story title. In quotes if taken from the first line of the story.
title_inferred boolean True if the title was chosen by the indexer (either made up or quoted from the script). This replaces the use of brackets, which should no longer appear in the title field.
feature varchar(255) The name of the feature, if any. Should match the logo on the splash page, or possibly the bolded text in the intro blurb (for stories that don't get fancy splash pages and logos).
type_id Foreign key to core_type Type of "story", or more accurately sequence. Actual comic story vs text story vs ad vs letters page, etc.
page_count varchar(10) Number of pages. Pages. Should be decimal but needs migration work.
sort_code int(11) Order of the story or other content within the issue. Formerly Sequence Number but that name is a bit inaccurate given how we currently handle variant covers, yielding sequences that occupy the same sequential spot and should therefore have the same sequence number.
issue_id Foreign key to the core_issue table. The issue in which this sequence appears.
notes mediumtext General notes about the sequence.
reprint_notes mediumtext Reprint information that can't be stored as links because the other end can't be put in the database. Most of the original purpose of this field has been superseded by the various reprint join tables.
Fields for creator credits:
script mediumtext Script, writer, plotter, etc.
pencils mediumtext Pencils, painted art, layouts, photographs, etc.
inks mediumtext Inks, painted art, photographs
colors mediumtext Colors, painted art, color photographs
letters mediumtext Lettering credit
editing mediumtext Editor credit for the sequence. Rarely used at the sequence level, but particularly useful for anthologies or compilations where individual sequences are drawn from original publications with various editors.
job_number varchar(25) Job Number
Fields related to content:
characters mediumtext
synopsis mediumtext
genre varchar(255)
Change tracking fields:
created date Row creation date.
modified date Last modified date.
modification_time time Last modified time.