New Fun Schema: Difference between revisions
mNo edit summary |
(Document the new series table, including series name.) |
||
Line 41: | Line 41: | ||
|- | |- | ||
| id | | id | ||
| int(11) | | int(11) auto_increment | ||
| DB-generated primary key. | | DB-generated primary key. | ||
| | | | ||
Line 107: | Line 107: | ||
==Series== | ==Series (core_series)== | ||
{| border="1" | |||
|+ '''core_series_name''' | |||
! Column | |||
! Type | |||
! Description | |||
! Usage Notes | |||
|- | |||
| id | |||
| int(11) auto_increment | |||
| DB-generated primary key. | |||
| | |||
|- | |||
| name | |||
| varchar(255) NOT NULL | |||
| A name used for a series. | |||
| The name may come from the cover, the indicia, common usage or any other source. | |||
|- | |||
|} | |||
{| border="1" | {| border="1" | ||
|+ ''' | |+ '''core_series''' | ||
! | ! Column | ||
! | ! Type | ||
! Description | ! Description | ||
! Usage Notes | ! Usage Notes | ||
Line 120: | Line 137: | ||
! colspan="4" | Core series data fields: | ! colspan="4" | Core series data fields: | ||
|- | |- | ||
| id | | id | ||
| int(11) auto_increment | |||
| DB-generated primary key. | | DB-generated primary key. | ||
| | | | ||
|- | |- | ||
| | | language_id | ||
| | | foreign key to '''core_country''' | ||
| | | The language in which the series is written. | ||
| | | Currently there is no facility to record multi-lingual comics. | ||
|- | |- | ||
| format | | format | ||
| | | varchar(255) | ||
| | | legacy format field | ||
| This field will be retained only until its contents have been migrated to the issue and sequence-level format fields. It's contents are a mess of at least five different sorts of data (including size, binding, paper stock, coloration and frequency of publication). | |||
|- | |- | ||
| year_began | | year_began | ||
| | | int(11) | ||
| Year (by cover date) of first publication. | |||
| | | | ||
|- | |- | ||
| year_ended | | year_ended | ||
| | | int(11) | ||
| | | Year (by cover date) of last publication, if any. | ||
| | |||
|- | |- | ||
| | | notes | ||
| | | longtext | ||
| | | Space for arbitrary notes. | ||
| | | | ||
|- | |- | ||
| tracking_notes | | tracking_notes | ||
| mediumtext | |||
| Notes about how a given series is continued in or continued from another series. | | Notes about how a given series is continued in or continued from another series. | ||
| Largely replaced by the '''core_series_tracking''' table, after most data is migrated this field will only be used for tracking notes where one end of the relationship is not and cannot be in the database (tracking from a newspaper or web comic, for instance). | |||
|- | |||
| has_gallery | |||
| tinyint(1) | |||
| True if there's a cover gallery to display for this series. | |||
| | | | ||
|- | |- | ||
| open_reserve | | open_reserve | ||
| int(11) | |||
| Integer field. | | Integer field. | ||
| How many issues of this series are open for reservation. | | How many issues of this series are open for reservation. | ||
|- | |- | ||
! colspan="4" | Change tracking fields: | ! colspan="4" | Change tracking fields: | ||
|- | |- | ||
| created | | created | ||
| Row creation | | date | ||
| Row creation date. | |||
| | | | ||
|- | |- | ||
| modified | | modified | ||
| | | date | ||
| Last modified date. | |||
| | | | ||
|- | |- | ||
| modification_time | | modification_time | ||
| | | time | ||
| Last modified time. | |||
| | |||
| | | | ||
|- | |- | ||
|} | |} |
Revision as of 01:04, 18 July 2009
This page documents the proposed database schema for the "New Fun" release, i.e. the first release of the new web site that will go into production. It is not set in stone, but is nearing its final form. At some point, we will declare a soft schema freeze, after which only minor changes (such as adding a column that doesn't affect table relationships or major display options) will be allowed.
A few notes on database vs code names:
Table names are lower case with underscores. Model class names are capitalized with no spaces or underscores. It should be obvious which model fits which table.
All foreign keys end in _id in the database, but do not have that suffix in the code. So wherever you see something like issue_id in this table, expect to see issue in the code. This is because in the code you get an object, not just an ID.
Tables are prefixed with an application name, which does not appear in the model class name. The main app name is currently usually core, which is arguably a lousy name. I'm contemplating changing it to data but I'm not sure about that. It can't be gcd because that is the name of the overall project. Currently, the only other application is migration, which is for tables that exist only to assist in migration of data from old to new forms.
Publisher (core_publisher)
The Publisher table is unusual in that it stores several distinct types of entities, all of which fall under the general umbrella of "Publisher". Several of them may play multiple roles, which is the primary reason for keeping them in one table. Each type of publishing entity is represented by a proxy model in the Django code. Except as noted, publishers are connected to each other via the PublisherRelationship table described further down. Before going through the fields, here is the list of proxy models and their definitions:
MasterPublisher
This is the legacy notion of a top-level publisher, linked at the series. Defined by setting is_master = 1. The nature and future role of master publishers is very much a matter of debate.
Imprint
This is the legacy notion of an imprint, effectively a hodge-podge of anything and everything that did not fit as a MasterPublisher. These are linked at the series level, and will be retained only until they can be migrated to the other fields, after which they will be dropped from the application. Something is an imprint if its parent field is not NULL (relationships between new publisher entries do not use the parent field).
PublishingCompany
An actual corporate entity of some sort involved in publication. These are linked at the issue level and defined as having is_company = 1.
Brand
A marketing brand involving a particular logo or set of closely related logos. These are linked at the issue level and defined as having is_brand = 1.
The Publisher Table
Column | Type | Description | Usage Notes |
---|---|---|---|
Core publisher data fields: | |||
id | int(11) auto_increment | DB-generated primary key. | |
name | varchar(255) NOT NULL | Name of the publishing entity. Different for each type. | Master publishers use names chosen for easy recognition. Brand names are based on the logos that denote them. Publishing company names should be exactly as found in the indicia that reference them (minor variations, such as "Co." vs "Company" may be harmonized). Imprints have arbitrarily complicated garbage for names that mash together all of the above. Don't even try to figure it out. |
country_id | foreign key to core_country | The country from which the publisher operates. | |
year_began | int(11) | Year (by cover date) of first publication. | |
year_ended | int(11) | Year (by cover date) of last publication, if any. | |
url | varchar(255) | The URL for the publisher's website. | Should be a complete URL, including the protocol (i.e. "http://") |
notes | longtext | Space for arbitrary notes. | |
Type flags. A row may have more than one set to true. | |||
is_master | tinyint(1) | Defines which rows appear in the MasterPublisher proxy model. | |
is_company | tinyint(1) | Defines which rows appear in the PublishingCompany proxy model. | |
is_brand | tinyint(1) | Defines which rows appear in the Brand proxy model. | |
Change tracking fields: | |||
created | date | Row creation date. | |
modified | date | Last modified date. |
Series (core_series)
Column | Type | Description | Usage Notes |
---|---|---|---|
id | int(11) auto_increment | DB-generated primary key. | |
name | varchar(255) NOT NULL | A name used for a series. | The name may come from the cover, the indicia, common usage or any other source. |
Column | Type | Description | Usage Notes |
---|---|---|---|
Core series data fields: | |||
id | int(11) auto_increment | DB-generated primary key. | |
language_id | foreign key to core_country | The language in which the series is written. | Currently there is no facility to record multi-lingual comics. |
format | varchar(255) | legacy format field | This field will be retained only until its contents have been migrated to the issue and sequence-level format fields. It's contents are a mess of at least five different sorts of data (including size, binding, paper stock, coloration and frequency of publication). |
year_began | int(11) | Year (by cover date) of first publication. | |
year_ended | int(11) | Year (by cover date) of last publication, if any. | |
notes | longtext | Space for arbitrary notes. | |
tracking_notes | mediumtext | Notes about how a given series is continued in or continued from another series. | Largely replaced by the core_series_tracking table, after most data is migrated this field will only be used for tracking notes where one end of the relationship is not and cannot be in the database (tracking from a newspaper or web comic, for instance). |
has_gallery | tinyint(1) | True if there's a cover gallery to display for this series. | |
open_reserve | int(11) | Integer field. | How many issues of this series are open for reservation. |
Change tracking fields: | |||
created | date | Row creation date. | |
modified | date | Last modified date. | |
modification_time | time | Last modified time. |
Issues
The issues table contains large swaths of unused columns, due to it initally being a cloning of the sequence 0 story records when moving from flat file to a semi-relational database (FileMaker 4). Some fields stayed aboard instead of being kicked out due to a number of flat-file based files pending inclusion.
DB Column | Name in New Code | Description | Usage Notes |
---|---|---|---|
Core issue data fields: | |||
ID | id | DB-generated primary key. | In the lasso-based site, this is the number you see in URLs like http://www.comics.org/details.lasso?id=293 It is generated by the database and has no inherent meaning. |
VolumeNum | volume | Volume number from the indicia. | Not consistently used, and according to recent editor list comments, not planned for migration into the new schema. Where especially relevant, volume numbers are typically included in the Issue (number) field. NOTE: This should actually be used for proper indexing of some foreign titles which restart numbering every year. |
Issue | number | Issue number or other identifier from the indicia. | Use the value nn to indicate "no number" when there is no identifier. For issues where the volume number is important, the form v1#1 is used to specify both the volume and issue numbers. When the cover number or commonly assumed number is different from the indicia number, it goes in brackets after the indicia number: v3#2 [26]. The same format is used for disambiguation of identically numbered issues: 1 [1946] or nn [1947]. |
Pub_Date | publication_date | Publication date of the issue as given in the indicia, but fully spelled out. | Published Date |
Price | price | A decimal price followed by the ISO code, i.e. 0.10 USD | Cover Price |
storycount | story_count | Number of stories linked to this issue. | May not be 100% accurate for all issues, but is updated whenever the issue itself is updated. |
Key_Date | key_date | Specially formatted date string that (in theory) determines the ordering of issues within a series. | In practice, many issues are missing this field, meaning that if their issue numbers are not standard, there is no reliable programmatic way to sort the issues in the series. In practice, the cover sort codes are slightly more reliable (although not always), but they are not included in the public data dumps. Also see Keydate |
Fields that would be core issue data but aren't actually used: | |||
Pg_Cnt | page_count | Not Used | Page count for the whole issue is taken from story sequence 0 (which otherwise represents the front cover). |
Editing | editor | Not Used | Editor(s) for the whole issue are taken from story sequence 0 (which otherwise represents the front cover). |
Notes | notes | Not Used | Notes for the whole issue are taken from story sequence 0 (which otherwise represents the front cover). |
Fields related to indexing: | |||
IndexStatus | index_status | Whether or not an issue is being indexed, and where it is in the process. | 0=not (white); 1=reserved (red); 2=pending (orange); 3=approved (green). Colors noted are how the issues appear in various index status grids. |
ReserveCheck | reserve_check | Not Used | The intention of this field is unknown. |
ReserveStatus | reserve_status | 1=indexed, pending or reserved; 0=not reserved | The intended relationship between this and IndexStatus/index_status is not known. from Jon: "0 = not reserved, 1 = reserved, 2 = reserved for editing. The value 2 has been used to a small extent to reserve books for re-editing or adding data after it's been indexed and approved." |
Fields related to cover images: | |||
CoverCheck | has_cover | Not Used | |
CoverCount | num_covers | Not Used | Intended for variants |
Fields related to the series, including unused cache fields: | |||
SeriesID | series | Foreign key into the series table. | |
Bk_Name | series_name | Not Used (deprecated) | |
Yr_Began | year_began | Not Used | |
Pub_Name | publisher_name | Not Used | |
Fields apparently related to flat file management / distribution: | |||
InitDist | initial_distribution | Not Used | |
UpdateDist | update_distribution | Not Used | |
isUpdated | is_updated | Deprecated. Used for tracking updated indexes before processing each distribution. | |
Change tracking fields: | |||
created | created | Row creation timestamp. | |
Modified | modified | Date of last row change. | |
ModTime | modification_time | Time of last row change (24 hour). | |
LstChang | last_change | Date of some sort, but apparently unused? | Column has numerous values, but they do not line up with the values in the Modified column. NOTE: This field was butchered data-wise and then replaced by the modified/modtime combo. |
Miscellaneous unused and/or mysterious fields: | |||
rel_year | rel_year | Not Used | Intended for released year, caching of keydate year. |
SelfCount | self_count | Not Used | Historically used for duplcation checking. |
SeriesLink | series_link | Not Used | Deprecated. Early non-id based series link used to build the initial foreign keys into series. |
Unused fields that would apply only to the cover. Stemming from seq 0 clone.: | |||
Feature | cover_feature | Not Used | |
Char_App | cover_characters | Not Used | |
Script | cover_script | Not Used | |
Pencils | cover_pencils | Not Used | |
Inks | cover_inks | Not Used | |
Colors | cover_colors | Not Used | |
Letters | cover_letters | Not Used | |
Synopsis | cover_synopsis | Not Used | Yes, I realize cover_synopsis does not make a whole lot of sense. It's not used anyway. |
Reprints | cover_reprints | Not Used | Arguably this could go in the "whole issue" section and be used to indicates issues that reprint entire other issues instead of doing story-by-story reprint links. |
Title | cover_title | Not Used | Arguably this could go in the "whole issue" section. |
Genre | cover_genre | Not Used | Arguably this could go in the "whole issue" section. |
Type | cover_type | Not Used | Since the type of a cover is "cover", this also doesn't make much sense. But it doesn't make sense as applied to an entire issue either, as far as I can tell. |
Seq_No | cover_sequence_number | Not Used | Makes even less sense that the rest of these. Stemming from the cloning of the stories table... |
Stories
The stories table has a number of unused columns that are duplicates of columns from the series and issues tables (they're known to be unused because in some cases the data does not match, and the versions shown on the Lasso site are the ones in the issue and series tables, not these). They were initially used for caching data to avoid expensive joins in the FileMaker days. Also, sequence 0 (which represents the front cover) is treated specially, with its page count, notes, reprints and editor fields being applied to the whole issue by the current UI implementation.
DB Column | Name in New Code | Description | Usage Notes |
---|---|---|---|
Core story data fields: | |||
ID | id | DB-generated primary key. | Since there is no story page, this field does not show up in any URLs in the Lasso implementation. |
Seq_No | sequence_number | Order of the story within the issue. | By convention, sequence 0 is the front cover, and some of its fields are considered to apply to the whole issue. Sequence Number |
Feature | feature | The name of the feature, if any. | Often but not always the name of the primary character. Some stories do not have a feature. |
Pg_Cnt | page_count | Number of pages. | Pages |
Notes | notes | ||
Reprints | reprints | Reprint information (both reprint from and reprinted in). | |
Type | type | Type of "story", or more accurately sequence. | Actual comic story vs text story vs ad vs letters page, etc. |
Fields for creator credits: | |||
Script | script | ||
Pencils | pencils | ||
Inks | inks | ||
Colors | colors | ||
Letters | letters | ||
Editing | editor | Rarely used except for sequence 0, as the editor for the book is typically assumed to have edited the whole book. However, this field can be used if individual stories have separate editors. | |
JobNo | job_number | Job Number | |
Fields related to content: | |||
Title | title | ||
Char_App | characters | ||
Synopsis | synopsis | ||
Genre | genre | ||
Field related to flat file management / distribution. See series/issues: | |||
InitDist | initial_distribution | Not Used | |
Fields relating to or copied from the issues table: | |||
IssueID | issue | Foreign key into the issues table. | |
Issue | issue_number | Not Used | Earlier indexes may have data in this field, but it is ignored. |
Pub_Date | publication_date | Not Used | Earlier indexes may have data in this field, but it is ignored. |
Key_Date | key_date | Not Used | Earlier indexes may have data in this field, but it is ignored. |
Price | price | Not Used | |
rel_year | rel_year | Not Used | |
Fields relating to or copied from the series table: | |||
SeriesID | series | Foreign key into the series table, duplicated from the issues table presumably to allow for direct stories <-> series joins. | Actually, it was used to help troubleshoot and QA the initial building of a relational database way back. Also use for statistics. |
Yr_Began | year_began | Not Used | Earlier indexes may have data in this field, but it is ignored. |
Pub_Name | publisher_name | Not Used | Earlier indexes may have data in this field, but it is ignored. |
Change tracking fields: | |||
created | created | Row creation timestamp. | |
Modified | modified | Date of last row change. | |
ModTime | modification_time | Time of last row change (24 hour). | |
LstChang | last_change | Date of some sort, but apparently unused? | Column has numerous values, but they do not line up with the values in the Modified column. Data got butchered through various ins and outs of flat files and between systems changing the date format. |