Current Schema: Difference between revisions

From GCD
Jump to navigation Jump to search
(Clean up some imprint stuff.)
(Convert all docs to the 0.2 Vieux Bois schema. Still need to add brand and indicia publisher tables.)
Line 1: Line 1:
This wiki page attempts to document the current schema for the GCD.  The public schema consists of four principle data tables as well as several supporting tables.  The supporting tables are '''Countries''', '''Languages''' (lists of codes for each of those things), '''Indexers''' (a list of indexers, but not of full account information) and '''IndexCredit''' (maps indexers to series for credit, but there are also credit notes in the series table).  The four main tables are described below.
== Introduction ==


The original organization of the GCD data was into two files--or, more accurately, into a Series file (later two and then four Series files) and several dozen detail filesThe two types of files were normalized by Lionel English in the mid-1990s in order to put them in to a real database, and then the two normalized files were further normalized in order to create the '''Publishers''' table from the '''Series''' table, and an '''Issues''' table from the '''Details''' table.  The supporting tables were mostly created in the same fashion.  Note in particular that normalization occurred after-the-fact and organization efforts were applied retroactively to a large existing data-set upon which no strict data type-enforcement had been been applied.  Many, if not most, of the current schema's anomalies are a direct result of the database's free-form beginnings.
This wiki page attempts to documents the current schema for the GCD.  As of October 2009, the schema is being actively maintained and enhanced, and as of December 2009 all efforts are being made to keep this page up to date.


The "Name in New Code" column is for those who might be looking at the new Python/Django implementation.  Since the existing table and column names are not even remotely consistent in capitalization, formatting or abbreviation, they were standardized in the Django model classes to make working with them easier (Django provides a convenient name mapping system for exactly this purpose).
=== Public and Private Tables ===


Note that all foreign keys in this database are nullable.  The current data set includes stories with no issue, issues with no series, and series with no publisherAnd other stranger thingsThese are all errors; resulting from unenforced relational integrity and data formatting errors during early imports.
The GCD schema consists of public and private tables.  The public tables contain the visible data records (except for cover information) and are available to anyone in the [http://dev.comics.org/data/ public dumps].  No account or membership of any sort is required to access the data dumps.  The GCD requires that any use of our data from the dumps be credited to us.  On a web site, this should be done on each page where the data is used, and a link should be provided to our siteCommercial use is welcome as long as credit and (on the web) links are providedFor questions about using the dump please contact the Board.


Please note that much of the '''issue''' and '''story''' table information on this page is based on earlier documentation by Peter Croome, with the remaining initial data contributed by Henry AndrewsLionel English has attempted to provide clarification where possible. Jon Løvstad has added some comments per April 1st.
The private tables consist of three types of tables:
* account data (private for the obvious reasons)
* data editing tables (private to avoid sending out unapproved data- we'd like to make the change history public at some point if folks want it)
* the covers table (private for legal reasons involving the distributions of cover scans and copyright law.  Please contact the Board if you have concerns about this policy.  Please do not contact the tech team as this is not a technical policy and complaining to them only wastes both your time and theirs)
 
=== A Bit of History ===
 
The original organization of the GCD data was into two files--or, more accurately, into a Series file (later two and then four Series files) and several dozen detail files.  The two types of files were normalized by Lionel English in the mid-1990s in order to put them in to a real database, and then the two normalized files were further normalized in order to create the '''Publishers''' table from the '''Series''' table, and an '''Issues''' table from the '''Details''' table.  During the server migration and site rewrite the took place during August-December 2009, Henry Andrews, Jochen Garcke and the rest of the tech team changed the database to a system supporting true foreign keys and transactions, and imposed a consistent naming scheme on the tables and columns.  Several new public tables, as well as an entirely new set of data editing tables, were added at this time.  The documentation for the pre-2009-migration schema is [PreOctober2009_Schema still available] for those curious about it.
 
=== Future Plans ===
 
We're progressing towards our real "1.0" release of this generation of the site, code-named "New Fun".  Much discussion on the lists references the "New Fun" schema which supports a much more complex and realistic view of the data.  Please join the [Email_Lists_FAQ  gcd-tech group] if you would like to participate in that discussion.  For the curious, the December 2009 release was 0.2 "Vieux Bois", for
 
== Supporting Tables ==
 
There are several supporting tables necessary to use the data which lay out fixed sets of options for certain fields.  They are:
 
* '''gcd_country'''
** Has columns '''id''', '''code''' and '''name'', and provides the set of countries used by the various publisher and series tables.  The '''code''' column is the ISO 3166 code for the country, and should arguably be used as the foreign key, but due to the very rushed development timeline of the December 2009 release (as the prior server was no longer viable), we went with the standard numeric '''id''' column for reasons which seemed like a good idea at the time, but involved a great deal of sleep deprivation :-)  We'll probably revisit this eventually.
* '''gcd_language'''
** Has columns '''id''', '''code''' and '''name''', and provides the set of languages used by the series table.  The '''code''' column is the ISO 639 code for the language.  The two-letter ISO 639-1 code is used if available, three letter code from 639-2 or 639-3 otherwise.  This might get standardized into 3-letter codes or 4-letter codes if we need to take ISO 639-6 into account once its code list is publishedThis table has the same foreign key oddness about '''code''' vs '''id''' as the '''gcd_country''' table, although the possibility of changing the codes makes it more sensible to stick with '''id''' for now.
* '''gcd_story_type'''
** Has columns '''id''' and '''name''', and provides the set of types used by the story table.  The type field is somewhat controversial in that its values actually cover various different attributes of story sequences.  However, until late 2009 there was no way for the GCD to change the field (and it was in fact a plain text field within the story table) so discussions on changing it were set aside.  This will be one of many things revisited along the path to the 1.0 New Fun release.


==Publishers==
==Publishers==


The publisher table is mostly straightforward. The largest area of confusion is the notion of imprints and of master publishers.  Many of the larger publishers have had many names and/or many concurrent corporate identities during their life time (particularly true of Golden Age publishers, where multiple corporation names were used to reduce risk and hide ownership)In order to group related publishing companies together, the GCD uses what we have recently been calling "house" names, or names that have been established in fandom as umbrella names to refer to a particular, identifiable group of related companies (e.g. Detective Comics, Inc; National Periodicals; DC Comics Inc).  In an effort to provide a place to note the actual publishers of given series, a "Publisher Notes" field was later added, which was used both to list actual company names and publishing imprints, eventually resulting in a rather ugly messMuch attention was focused on correcting this mess in the new schema.
The publisher table currently holds two sorts of records: "master publishers" and "imprints"Both of these concepts are less precise than one might expect.
 
"Master publishers" are the common names of publishers as typically grouped by comic book researchers.  As this vague definition suggests, there is considerable room for disagreement over what constitutes a master publisher and how such entities should be namedThis field is often referred to as the "argue 'till we're blue in the face" field.  Part of the current definition of a series in the GCD is that all issues within the series are published by the same master publisher.  This has led to the grouping some companies together under a single master publisher because one took over several series from the other.  Debate on the proper arrangement and definition of master publishers is ongoing.  A master publisher is a record that has the '''is_master''' field set to 1. The name of the publisher is chosen primarily based on what is most likely to be expected by a user running a search.
 
As of late 2009, "imprints" are considered a deprecated concept and are being replaced by indicia publishers and brands (covered in their own tables below).  The concept of an imprint may return in a more well-defined form in the future for formal imprints of publishing houses.  The data records in this table that are considered "imprints" serve as a catch-all for several different publishing concepts, often requiring long semi-colon separated lists as the field is set per-series but the data often changes per-issue.  An imprint is a record that has a non-NULL '''parent_id''' field.
 
Note that there are a very small number of records that are both master publishers ''and'' imprints.  Explaining those records is beyond the scope of this documentation.


{| border="1"
{| border="1"
|+ '''publishers''' ['''Publisher''' class in new site]
|+ '''gcd_publisher''' ['''apps.gcd.models.Publisher''' class in the code]
! DB Column
! Column Name
! Name in New Code
! Description
! Description
! Usage Notes
! Usage Notes
|-
|-
! colspan="4" | Core publisher data fields:
! colspan="3" | Core publisher data fields:
|-
|-
| ID
| id
| id
| DB-generated primary key.
| DB-generated primary key.
| In the lasso-based site, this is the number you see in URLs like http://www.comics.org/publisher_details.lasso?id=4066  It is generated by the database and has no inherent meaning.
|  
|-
|-
| PubName
| name
| name
| Name of the publishing "family", publisher (legal entity), imprint, or set of imprints
| Name of the master publishing "family", imprint, or set of imprints
| Because imprints are not stored per-issue, in some cases a semicolon-separated list of imprints or corporate names corresponding to all imprints used for a given series has been given an entry in this table.  Also, not all things that we term "Publishers" or "Imprints" strictly fall into those categories, and we do not accurately reflect the evolution of company names over time (for instance, "DC" was not until fairly recently the official name of the company, although it was long an informal name.  It is still not exactly correct- "DC Comics" appears to be the current legal name.  But "DC" is what everyone knows, so that is what appears in the database).
| See the discussion above for how this name is chosen
|-
|-
| YearBegan
| year_began
| year_began
| Year (by cover date) of first publication.
| Year (by cover date) of first publication.
|
|
|-
|-
| YearEnded
| year_ended
| year_ended
| Year (by cover date) of last publication, if any.
| Year (by cover date) of last publication, if any.
|  
|  
|-
|-
| CountryID
| country_id
| country
| The country from which the publisher operates.
| The country from which the publisher operates.
| Foreign key into the '''Countries''' table.
| Foreign key into the '''gcd_country''' table.
|-
|-
| Notes
| notes
| notes
| Space for arbitrary notes.
| Space for arbitrary notes.
|
|
|-
|-
| web
| url
| url
| The URL for the publisher's website.
| The URL for the publisher's website.
| While this column sometimes has data, it does not appear to be used by the Lasso-based site.
|
|-
|-
| AlphaSortCode
! colspan="3" | Relations to other tables and counts of related objects:
| alpha_sort_code
| A single character.
| Appears to be used to sort related publishers near each other even if the more obvious alphabetical sort would not do so.  It's a touch unclear. Intended to strip out "The" etc. in an earlier version of the database.
|-
|-
! colspan="4" | Publisher / imprint relations and publication counts:
|-
| ImprintCount
| imprint_count
| imprint_count
| Count of how many imprints a given publisher has.
| Count of how many imprints a given master publisher has.
| Does not appear to be uniformly accurate. Should be recalculated on updates.
| Recalculated by the code on data updates.
|-
|-
| BookCount
| series_count
| series_count
| Count of how many series this publisher publishes.
| Count of how many series this publisher publishes.
| Does not appear to be uniformly accurate. Should be recalculated on updates.
| Recalculated by the code on data updates.
|-
|-
| IssueCount
| issue_count
| issue_count
| Total number of issues in all series published by the publisher.
| Total number of issues in all series published by the publisher.
| Does not appear to be uniformly accurate. Should be recalculated on updates.
| Recalculated by the code on data updates.
|-
|-
| Master
| is_master
| is_master
| Boolean field.
| Boolean field (1 or 0 in MySQL).
| This field indicated the "master" or top level in the hierarchy of publisher/imprint in the days where both publishers and imprints were stored in the same, self-joined table in FileMaker.
| See discussion of master publishers above.
|-
|-
| Connection
| parent_id
| connection
| Self-join key for parent/child relationships between master publishers and imprints.
| Not used.
| NULL for non-imprints.
| Was supposed to explain the nature of each publisher-to-publisher relationship.
|-
|-
| ParentID
! colspan="3" | Change tracking fields:
| parent
| Self-join key for parent/child relationships.
| This is how the imprints for a publisher are stored.
|-
|-
| NextID
| reserved
| next
| If true, there is an active change being made in the data editing tables.
| Self-join key, was intended to provide links from one incarnation of a publisher to the next incarnation (e.g. National Periodical Publications -> DC Comics Inc).
|
| Never implemented.
|-
! colspan="4" | Change tracking fields:
|-
|-
| created
| created
| created
|
| Row creation timestamp.
|
|
|-
|-
| Modified
| modified
| modified
| One of two change timestamps.
|  
| Actively used by the Lasso site.
|
|-
| Updated
| updated
| One of two change timestamps.
| Deprecated.
|-
|-
|}
|}


==Brands==
==Indicia Publishers==


==Series==
==Series==


The series table includes a number of fields that are now obsolete, or perhaps represent possible new features that were abandonedIt also has fields related that may be related to cover scans, which are unclear as the cover tables are not included in the public schema.
A series in the GCD is officially defined by the indicia title and publisher- when either of these changes a new series is created.  Minor changes in the indicia title (such as the presence or absence of an article, i.e. "The") may be allowed within one series, especially if the article comes and goesSome exceptions exist, such as Dell's Four Color (2nd series) which is commonly grouped in guides but actually uses a different indicia title for nearly every issue.
 
A series may be split due to a particularly long lapse in its publication schedule (on the order of years) or due to certain kinds of renumbering, but the exact rules in this area are still being debated.


{| border="1"
{| border="1"
|+ '''series''' ['''Series''' class in new site]
|+ '''gcd_series''' ['''apps.gcd.models.Series''' class in the code]
! DB Column
! Column Name
! Name in New Code
! Description
! Description
! Usage Notes
! Usage Notes
|-
|-
! colspan="4" | Core series data fields:
! colspan="3" | Core series data fields:
|-
|-
| ID
| id
| id
| DB-generated primary key.
| DB-generated primary key.
| In the lasso-based site, this is the number you see in URLs like http://www.comics.org/series.lasso?seriesid=72  It is generated by the database and has no inherent meaning.
|  
|-
|-
| Bk_Name
| name
| name
| Series title.
| Series title, as it appears in the indicia (if any, as on the cover if there is no indicia), with any leading article moved to the end after a comma.
|
| The article-moving rule applies only if the title is in the language in which the series is written.
|-
|-
| Format
| format
| format
| Physical format of the series (size, color presence or absence, paper stock, etc.).
| Physical format of the series (size, color presence or absence, paper stock, etc.).
| Since format is really an attribute of the issue rather than the whole series, this field may contain a fairly long description of how the format changed over the life of the series.  In the new schema, this is being replace by a number of distinct fields, which individually describe the constituent attributes commonly listed here.
| Since format is really an attribute of the issue rather than the whole series, this field may contain a fairly long description of how the format changed over the life of the series.  In the "New Fun" schema, this is being replace by a number of distinct fields, which individually describe the constituent attributes commonly listed here.
|-
| Notes
| notes
| Space for arbitrary notes.
|
|-
|-
| Yr_Began
| year_began
| year_began
| First year (by cover date) of publication.
| First year (by cover date) of publication.
|
|  
|-
|-
| Yr_Ended
| year_ended
| year_ended
| Last year (by cover date) of publication, if any.
| Last year (by cover date) of publication, if any.
|
|
|-
|-
| PubDates
| is_current
| Set to true if new issues are still being produced for this series.
| This is a new field and may not be properly set for all series yet.
|-
| first_issue_id
| Foreign key to the first issue of this series in the gcd_issue table.
| Updated by the code when issues are added, deleted or moved.  This is a new field and is not always properly set, nor is it yet used in the UI.
|-
| last_issue_id
| Foreign key to the last issue of this series in the gcd_issue table.
| Updated by the code when issues are added, deleted or moved.  This is a new field and is not always properly set, nor is it yet used in the UI.
|-
| publication_dates
| publication_dates
| First and last full cover publication dates of the series, separated by a '-'.  No last date if still ongoing.
| First and last full cover publication dates of the series, separated by a '-'.  No last date if still ongoing.
|
| This field is being replaced by the '''first_issue_id''' and '''last_issue_id''' fields.  It is not currently possible to edit it from the OI, so errors are not being actively corrected.  Once all data is migrated to the new field, this one will be dropped.
|-
| tracking_notes
| Notes about how a given series is continued in or continued from another series.
| See the [Formatting Documentation] for how these notes should be structured.  This will eventually be replaced with foreign key fields.
|-
|-
| Pub_Note
| publication_note
| publication_note
| Arbitrary publication / publisher-related notesSome records in the current DB include addresses, imprint information and/or general reprint information here.
| Formerly the place where arbitrary publication details could be recordedNow considered deprecated.
| Legacy field.  Not used by the Lasso implementationReplaced by the imprint_id foreign key into the Publisher table that identified the imprint record.
| This predates the imprint field, which predates the issue-level indicia publisher and brand fieldsNotes here that are not migrated to those fields will most likely be merged into the general notes field and this field will be eventually dropped.
|-
|-
| Frst_Iss
| notes
| first_issue
| Space for arbitrary notes.
| Issue # (or other identifier) of the first issue in the series.
|  
| Note that it's not currently reliably possible to determine this from the issue table, since the series may only be partially indexed.
|-
| Last_Iss
| last_issue
| Issue # (or other identifier) of the last issue in the series, if any.
| Note that it's not currently reliably possible to determine this from the issue table, since the series may only be partially indexed.
|-
| Issuecount
| issue_count
| Number of issues associated with the series.
| Does not appear to be reliably accurate.
|-
| PubID
| publisher
| Foreign key into the '''publishers''' table.
|
|-
|-
| Pub_Name
| publisher_id
| publisher_name
| Foreign key into the '''gcd_publisher''' table.
| Cached Publisher name and stems historically from the flat file format.
| Should only ever reference a master publisher record (enforced at the code level).
| Reliability of this field has not been verified.
|-
|-
| imprint_id
| imprint_id
| imprint
| Foreign key into the '''publishers''' table for the depreated "imprint" concept.
| Foreign key into the '''publishers''' table for the "imprint" concept. Which bears only occasional resemblance to any normal definition of the term.
| This is being set to NULL once its information has been migrated to the '''brand_id''' and '''indicia_publisher_id''' on the issues for the series.
|
|-
|-
| CounCode
| country_id
| country_code
| Foreign key into the '''gcd_country''' table.
| '''Not''' a foreign key into the Country table.  Instead the country code is duplicated here, and does not necessarily match properly.
| This is properly an attribute of the indicia publisher (linked to the issue table) rather than the series, and will eventually be dropped once that field is sufficiently populated.
| Historically it was done this way to save some joining time.
|-
|-
| LangCode
| language_id
| language_code
| Foreign key into the '''gcd_language''' table.
| '''Not''' a foreign key into the Languages table. Instead the language code is duplicated here, and does not necessarily match properly.
| The language in which the comic is written.  For comics that are purely visual, there is an ISO code for "no linguistic content".
| Historically it was done this way to save some joining time.
|-
|-
! colspan="4" | Fields relating series to each other (continuations, etc.):
| issue_count
|-
| Number of issues associated with the series.
| Tracking
| Recalculated by the code when the data is updated.
| tracking_notes
| Notes about how a given series is continued in or continued from another series.
|
|-
|-
| Crossref
| has_gallery
| crossref
| If true, the series has at least one cover image.
| Not in use.  
| Updated by the code as covers are added/changed/removed.
| Intended to indicate a cross reference to replace tracking
|-
|-
| CrossrefID
! colspan="3" | Change tracking fields:
| crossref_id
| Not in use.
| Intended to be a foreign key to a table of one-to-many references to other series to replace tracking.
|-
|-
! colspan="4" | Field related to the (cover?) gallery:
|-
| HasGallery
| gallery_present
| 3-character field, but used as boolean.
| Gallery flag. Initially used for caching display of the gallery indicator. 3-char field stems from very early "YES/NO" values.
|-
| Included
| gallery_includes
| Obsolete.
| Legacy field that used to be used to indicate which issues had been indexed.  Superceded by various matrices on the current site.
|-
! colspan="4" | Fields related to indexing:
|-
| Indexers
| indexers
| Semicolon-separated string of indexers.
| Originally a free-form text field recording who had worked on a given series.  Used to create initial version of '''Indexers''' table, and then used to create the '''IndexCredit''' join table.  Later updated periodically *from* the '''IndexCredit''' join table so that it could be exported into legacy format.  Presumably no longer completely up-to-date, now that the lasso-site tracks indexing at the issue level.  Not clear how the current site builds the display of indexers on a given page.
|-
| OpenReserve
| open_reserve
| open_reserve
| Integer field.
| Obsolete field that we forgot to drop during schema migration.
| How many issues of this series are open for reservation.
| This will be dropped next time we change the schema, and should be ignored.
|-
|-
! colspan="4" | Fields apparently related to flat file management / distribution:
| reserved
|-
| If true, there is an active change being made in the data editing tables.
| File
| file
| Obsolete. The name of the file used for flat-file distribution. They were sized for floppies IIRC.
|
|
|-
|-
| InitDist
| initial_distribution
| Obsolete.
| Indicated the first distribution set the record was contained in (flat-file).
|-
| UpdateDist
| update_distribution
| Obsolete.
| Indicated the last distribution set the record was updated in (flat-file).
|-
! colspan="4" | Change tracking fields:
|-
| Created
| created
| created
| Row creation timestamp.
|
|
|
|-
|-
| Modified
| modified
| modified
| Date of last row change.
|  
|
|-
| ModTime
| modification_time
| Time of last row change (24 hour).
|
|-
| LstChang
| last_change
| Date of some sort, but apparently unused.
| Only five non-NULL, non-zero values in the current data, all in the future (nearest is in 2010).
|-
| oldID
| old_id
| The ID from a prior organization of the data.
| Long deprecated.
|-
! colspan="4" | Miscellaneous unused and/or mysterious fields:
|-
| Themes
| themes
|
|
| Does not appear to be used (always NULL). Was intended as an early incarnation of keywords.
|-
| SelfCount
| self_count
| Obsolete.
| Stems from previous FileMaker incarnation of the database and used as a way to track duplicated during imports.
|-
|-
|}
|}
Line 313: Line 227:
=Issues=
=Issues=


The issues table contains large swaths of unused columns, due to it initally being a cloning of the sequence 0 story records when moving from flat file to a semi-relational database (FileMaker 4). Some fields stayed aboard instead of being kicked out due to a number of flat-file based files pending inclusion.  
Issues are what you would expect.  They include both periodical issue (pamphlets) and bound volumes / collections, currently with no distinctions between the two.  The "New Fun" schema will add classifications so that we can treat these items differently.  As it is books are sometimes awkwardly wedged into fields designed for periodicals, but it mostly works.
 
One notable exception to the sensibility of this table is that issue variants do *not* each get separate records at this time.  A future schema (possibly "New Fun" but possibly a release some time after that) will address this inconsistency.  For now, variant covers are recorded by adding multiple cover sequences to the story table for the issue. There are some exceptions to this arrangement when more than just the cover varies.


{| border="1"
{| border="1"
|+ '''issues''' ['''Issue''' class in new site]
|+ '''gcd_issue''' ['''apps.gcd.models.Issue''' class in the code]
! DB Column
! Column Name
! Name in New Code
! Description
! Description
! Usage Notes
! Usage Notes
|-
|-
! colspan="4" | Core issue data fields:
! colspan="3" | Core issue data fields:
|-
|-
| ID
| id
| id
| DB-generated primary key.
| DB-generated primary key.
| In the lasso-based site, this is the number you see in URLs like http://www.comics.org/details.lasso?id=293 It is generated by the database and has no inherent meaning.
|  
|-
| number
| Issue number or other identifier from the indicia (if there is one) or cover (if there is no indicia) or in some cases both (if there are both and they disagree).
| See [[Issue Numbers]] for numerous special cases of missing / dual / annual / per-volume or other sorts of "numbers"Note that this is a text field rather than a numeric field because many issue "numbers" are partially or entirely non-numeric, i.e. "Summer Special".
|-
|-
| VolumeNum
| volume
| volume
| Volume number from the indicia.
| Volume number from the indicia (if there is one) or cover (if there is no indicia).
| Not consistently used, and according to recent editor list comments, not planned for migration into the new schemaWhere especially relevant, volume numbers are typically included in the Issue (number) field. NOTE: This should actually be used for proper indexing of some foreign titles which restart numbering every year.
| See [[Volume]] While this field has been present for a long time, it has only recently been displayed on the site.  Therefore the data is not present as often as for other fields.
|-
| display_volume_with_number
| If true, combine the volume number with the issue number in most places that the issue number would be displayedThis produces numbers like "v2#1" (although in many indexes that notation is still used directly in the issue number field- this will eventually be corrected).  See [[Volume]] and [[Issue Numbers]].
|-
| no_volume
| If true, there is no volume number for the issue. This is used to confirm the meaning of an empty volume field since that field was ignored for many years and therefore is often empty when it should not be.  See [[Volume]] and [[Issue Numbers]].
|-
|-
| Issue
| sort_code
| number
| Numeric field that determines how issues sort within each series. There is a uniqueness constraint on the combination of this column and '''series_id'''.
| Issue number or other identifier from the indicia.
| This replaces the old cover sort code and is now the sole source of ordering for issues within a series.  '''key_date''' (which is not always known) is now only used for chronological search results.
| Use the value ''nn'' to indicate "no number" when there is no identifier. For issues where the volume number is important, the form ''v1#1'' is used to specify both the volume and issue numbersWhen the cover number or commonly assumed number is different from the indicia number, it goes in brackets after the indicia number: ''v3#2 [26]''The same format is used for disambiguation of identically numbered issues''1 [1946]'' or ''nn [1947]''.
|-
| key_date
| Specially formatted date string that (in theory) determines the ordering of issues in search results that are organized by date.
| In practice, many issues are missing this field, meaning that they will just sort to the beginning of any chronological listThis is a data problem, not a coding errorFor reliably sorting issues within a series, see '''sort_code'''.  Also see [[Keydate]]
|-
|-
| Pub_Date
| publication_date
| publication_date
| Publication date of the issue as given in the indicia, but fully spelled out.
| Publication date of the issue as given in the indicia, but fully spelled out.
| [[Published Date]]
| See [[Published Date]]
|-
| indicia_frequency
| If the issue has an indicia and it specified a publication frequency (monthly, quarterly, one-shot, etc.) that is recorded here.
| This is a text field rather than a foreign key into a set of options because of frequencies that are totally non-standard (common in alternative/underground publications) or irregular in such ways as "monthly except in September and December".  This is a new field so it is sparsely filled out at this time.
|-
|-
| Price
| price
| price
| A decimal price followed by the ISO code, i.e. ''0.10 USD''
| A decimal price followed by the ISO code, i.e. ''0.10 USD''
| [[Cover Price]]
| See [[Cover Price]] for details, including multiple prices and non-decimal currencies.
|-
|-
| storycount
| page_count
| story_count
| Overall page count for the issue, counting covers but not inserts or dust jackets.
| Number of stories linked to this issue.
| A single sheet with one fold (the smallest possible "comic book") would count as 4 pagesSee [[Page Count]].
| May not be 100% accurate for all issues, but is updated whenever the issue itself is updated.
|-
| Key_Date
| key_date
| Specially formatted date string that (in theory) determines the ordering of issues within a series.
| In practice, many issues are missing this field, meaning that if their issue numbers are not standard, there is no reliable programmatic way to sort the issues in the series.  In practice, the cover sort codes are slightly more reliable (although not always), but they are not included in the public data dumpsAlso see [[Keydate]]
|-
|-
! colspan="4" | Fields that would be core issue data but aren't actually used:
| page_count_uncertain
| If true, the page count is unknown (if page_count is NULL) or could not be reliably determined due to second-hand information or missing pages.
| This field allows the page_count field to be of numeric type.  Previously it was text so that a "?" could be appended for this purpose.
|-
|-
| Pg_Cnt
| editing
| page_count
| Editor for the whole issue, or for the overall anthology if there are different editors for each story in a collection.  Also used for other issue-level credits such as assistant/associate editor, publisher, production, special thanks, etc.
| Not Used
| See [[Credits]] and [[Editing]] for how credits are structured.
| Page count for the whole issue is taken from story sequence 0 (which otherwise represents the front cover).
|-
|-
| Editing
| no_editing
| editor
| If there is no editor (or similar credit) at the issue level, this field should be set to true.  Distinguishes from an unknown editor or from the field simply not having been filled out.
| Not Used
| Editor(s) for the whole issue are taken from story sequence 0 (which otherwise represents the front cover).
|-
|-
| Notes
| notes
| notes
| Not Used
| Space for arbitrary notes about the entire issue.
| Notes for the whole issue are taken from story sequence 0 (which otherwise represents the front cover).
| Because the issue-level notes were originally (by convention) placed in the notes field of the first cover sequence (in the story table), the only reasonable option was to duplicate those notes into the issue table during migration.  We are working through these duplicates, but for now it is very common to see identical notes in the issue record and in the sequence number 0 story record.
|-
|-
! colspan="4" | Fields related to indexing:
! colspan="3" | Fields relating to the other tables:
|-
|-
| IndexStatus
| indicia_publisher_id
| index_status
| Foreign key into the '''gcd_indicia_publisher''' table (nullable).
| Whether or not an issue is being indexed, and where it is in the process.
| Indicates the actual company that published the issue as shown in the indicia or other formal publication dataThis is a new field and therefore is sparsely filled out at this time.
| 0=not (white); 1=reserved (red); 2=pending (orange); 3=approved (green)Colors noted are how the issues appear in various index status grids.
|-
|-
| ReserveCheck
| brand_id
| reserve_check
| Foreign key into the '''gcd_brand''' table (nullable).
| Not Used
| Indicates the publisher's logo / name/ tagline on the exterior (front cover, back cover or spine) of the issue.  There is still a fair amount of debate about what constitutes a separate brand (how much change in the logo appearance and/or text).  This is a new field and therefore it is sparsely filled out at this time.
| The intention of this field is unknown.
|-
|-
| ReserveStatus
| series_id
| reserve_status
| Foreign key into the '''gcd_series''' table.
| 1=indexed, pending or reserved; 0=not reserved
| The intended relationship between this and IndexStatus/index_status is not known. from Jon: "0 = not reserved, 1 = reserved, 2 = reserved for editing. The value 2 has been used to a small extent to reserve books for re-editing or adding data after it's been indexed and approved."
|-
! colspan="4" | Fields related to cover images:
|-
| CoverCheck
| has_cover
| Not Used
|
|
|-
|-
| CoverCount
| story_type_count
| num_covers
| Number of stories of type "story" linked to this issue.
| Not Used
| Used to determine whether an issue is "indexed" or not.  Currently, an issue is indexed if there is at least one entry in the story table of type "story" (to prevent the existence of a story of type "cover" or other type from making the issue look complete).  In the future we may also implement tracking of the expected number of stories to make this measurement more precise.  Calculated whenever an issue's stories are updated.
| Intended for variants
|-
|-
! colspan="4" | Fields related to the series, including unused cache fields:
! colspan="3" | Change tracking fields:
|-
|-
| SeriesID
| index_status
| series
| Obsolete field kept to help debug potential migration problems (so far, so good...).
| Foreign key into the '''series''' table.
| This field will be dropped the next time we update the schema, and should be ignored.
|
|-
|-
| Bk_Name
| reserve_status
| series_name
| Obsolete field kept to help debug potential migration problems (so far, so good...).
| Not Used (deprecated)
| This field will be dropped the next time we update the schema, and should be ignored.
|  
|-
|-
| Yr_Began
| reserve_check
| year_began
| Obsolete field kept to help debug potential migration problems (so far, so good...).
| Not Used
| This field will be dropped the next time we update the schema, and should be ignored.
|
|-
|-
| Pub_Name
| reserved
| publisher_name
| If true, there is an active change being made in the data editing tables.
| Not Used
|
|
|-
|-
! colspan="4" | Fields apparently related to flat file management / distribution:
| created
|-
| InitDist
| initial_distribution
| Not Used
|
|-
| UpdateDist
| update_distribution
| Not Used
|
|
|-
| isUpdated
| is_updated
| Deprecated. Used for tracking updated indexes before processing each distribution.
|
|
|-
|-
! colspan="4" | Change tracking fields:
|-
| created
| created
| Row creation timestamp.
|
|-
| Modified
| modified
| modified
| Date of last row change.
|  
|
|-
| ModTime
| modification_time
| Time of last row change (24 hour).
|
|-
| LstChang
| last_change
| Date of some sort, but apparently unused?
| Column has numerous values, but they do not line up with the values in the Modified column. NOTE: This field was butchered data-wise and then replaced by the modified/modtime combo.
|-
! colspan="4" | Miscellaneous unused and/or mysterious fields:
|-
| rel_year
| rel_year
| Not Used
| Intended for released year, caching of keydate year.
|-
| SelfCount
| self_count
| Not Used
| Historically used for duplcation checking.
|-
| SeriesLink
| series_link
| Not Used
| Deprecated. Early non-id based series link used to build the initial foreign keys into series.
|-
! colspan="4" | Unused fields that would apply only to the cover. Stemming from seq 0 clone.:
|-
| Feature
| cover_feature
| Not Used
|
|-
| Char_App
| cover_characters
| Not Used
|
|-
| Script
| cover_script
| Not Used
|
|-
| Pencils
| cover_pencils
| Not Used
|
|-
| Inks
| cover_inks
| Not Used
|
|-
| Colors
| cover_colors
| Not Used
|
|
|-
| Letters
| cover_letters
| Not Used
|
|-
| Synopsis
| cover_synopsis
| Not Used
| Yes, I realize cover_synopsis does not make a whole lot of sense.  It's not used anyway.
|-
| Reprints
| cover_reprints
| Not Used
| Arguably this could go in the "whole issue" section and be used to indicates issues that reprint entire other issues instead of doing story-by-story reprint links.
|-
| Title
| cover_title
| Not Used
| Arguably this could go in the "whole issue" section.
|-
| Genre
| cover_genre
| Not Used
| Arguably this could go in the "whole issue" section.
|-
| Type
| cover_type
| Not Used
| Since the type of a cover is "cover", this also doesn't make much sense.  But it doesn't make sense as applied to an entire issue either, as far as I can tell.
|-
| Seq_No
| cover_sequence_number
| Not Used
| Makes even less sense that the rest of these. Stemming from the cloning of the stories table...
|-
|-
|}
|}
Line 554: Line 344:
=Stories=
=Stories=


The stories table has a number of unused columns that are duplicates of columns from the series and issues tables (they're known to be unused because in some cases the data does not match, and the versions shown on the Lasso site are the ones in the issue and series tables, not these). They were initially used for caching data to avoid expensive joins in the FileMaker days.  Also, sequence 0 (which represents the front cover) is treated specially, with its page count, notes, reprints and editor fields being applied to the whole issue by the current UI implementation.
The story table would more accurately be called the sequence table, and we plan to eventually rename it.  It records the contents of an issue, whether they are actual sequential art stories, text stories, covers, advertisements or any number of other sequence types.  Only sequential art stories, covers and text stories are technically required to "complete" an index, but cover-to-cover indexes are appreciated.
 
 


{| border="1"
{| border="1"
|+ '''stories''' ['''Story''' class in new site]
|+ '''gcd_story''' ['''apps.gcd.models.Story''' class in the code]
! DB Column
! Column Name
! Name in New Code
! Description
! Description
! Usage Notes
! Usage Notes
|-
|-
! colspan="4" | Core story data fields:
! colspan="3" | Core story data fields:
|-
|-
| ID
| id
| id
| DB-generated primary key.
| DB-generated primary key.
| Since there is no story page, this field does not show up in any URLs in the Lasso implementation.
|
|-
|-
| Seq_No
| sequence_number
| sequence_number
| Order of the story within the issue.
| Order of the story within the issue.
| By convention, sequence 0 is the front cover, and some of its fields are considered to apply to the whole issue. [[Sequence Number]]
| See [[Sequence Number]].  Sequence zero is no longer treated specially (although it still represents the front cover, or at least one variant of it).
|-
| title
| The title of the story, or a quote from the beginning of the script if there is no title.
| See [[Title]]
|-
| title_inferred
| True if the indexer chose the title either directly or by using the first line of the script.
| This causes the UI to display the title with [square brackets] around it.
|-
|-
| Feature
| feature
| feature
| The name of the feature, if any.
| The name of the feature, if any.  Typically as from the splash or title page, but with some exceptions.
| Often but not always the name of the primary character.  Some stories do not have a feature.
| Often but not always the name of the primary character.  Some stories do not have a feature. See [[Feature]]
|-
|-
| Pg_Cnt
| page_count
| page_count
| Number of pages.
| Number of pages.  One side counts as one page.
| [[Pages]]
| [[Pages]]
|-
|-
| Notes
| page_count_uncertain
| True if the page count is unknown or uncertain.
| This is a separate field so that page_count could be a numeric field instead of varchar.
|-
| notes
| notes
|
| Arbitrary notes from the indexer.
|
| See [[Notes]]
|-
|-
| Reprints
| reprint_notes
| reprints
| Reprint information (both reprint from and reprinted in).
| Reprint information (both reprint from and reprinted in).
|
| See [[Reprints]] for the extremely detailed formatting requirements for these notes.  This field will be replaced with database links in release 0.5 "Kin-der-kids".
|-
|-
| Type
| type_id
| type
| Foreign key to the '''data_story_type''' table.  Type of "story", or more accurately sequence.
| Type of "story", or more accurately sequence.
| Actual comic story vs text story vs ad vs letters page, etc.
| Actual comic story vs text story vs ad vs letters page, etc.
|-
|-
! colspan="4" | Fields for creator credits:
| job_number
| [[Job Number]]
|
|-
! colspan="3" | Fields for creator credits:
|-
|-
| Script
| script
| script
|
| Writer, scripter and/or (co-)plotter credits.
|
| See [[Credits]] and [[Script]].
|-
| no_script
| True if there is no writing, such as for a single panel illustration with no text (pin-up) or a simple promo that shows the next issue's cover with no significant text (Lines like "Continued next issue" do not count as significant text).
| Note that the absence of dialog or narration does not mean there is no script.  If there is a sequence of events depicted in the artwork, that constitutes a "script".
|-
|-
| Pencils
| pencils
| pencils
|
| Pencils or other primary artwork (painting, photography, etc.) or layouts / breakdowns and finishing pencils.
| See [[Credits]] and [[Pencils]].
|-
| no_pencils
| True if there is no artwork, for instance a text story with no illustrations or a plain text recap page with limited visual elements (such as a solid color background, which is more of a production choice than the sort of artwork credited in these fields.
|
|
|-
|-
| Inks
| inks
| inks
|
| Inks or other artwork that completes the primary artwork.  Painting / photography should be credited both here and in pencils an possibly colors.
| See [[Credits]] and [[Inks]].
|-
| no_inks
| True if there is no inking or similar finishing task that applies to the artwork, or there is no artwork.
|
|
|-
|-
| Colors
| colors
| colors
|
| Color added to non-colored artwork, or similar image processing.  Painting / color photography should be credited both here and in pencils and inks.
| See [[Credits]] and [[Colors]].
|-
| no_colors
| True if there is no colorist to credit, for instance because the sequence is printed in black and white, or if there is no artwork.
|
|
|-
|-
| Letters
| letters
| letters
|
| Lettering.  For text not designed/produced by a letterer, use "typeset" (for instance typewritten text stories word processor output not done in word balloons or captions).
|
| See [[Credits]] and [[Letters]]
|-
|-
| Editing
| no_letters
| editor
| True if there is no letterer (or "typeset") for the sequence.  Covers with no significant text beyond the title and standard promotional taglines should set this to true.
|
|
| Rarely used except for sequence 0, as the editor for the book is typically assumed to have edited the whole book.  However, this field can be used if individual stories have separate editors.
|-
|-
| JobNo
| editing
| job_number
| [[Job Number]]
|
|
| Rarely used except for sequence 0, as the editor for the book is typically assumed to have edited the whole book.  However, this field can be used if individual stories have separate editors such as certain original anthologies or collections covering diverse material.
|-
|-
! colspan="4" | Fields related to content:
! colspan="3" | Fields related to content:
|-
| Title
| title
|
|
|-
|-
| Char_App
| characters
| characters
|
| Characters appearing in the field.  Sometimes only recurring characters, but sometimes including things like "unnamed boy" or "Nazis".
|
| See [[Character Appearances]].  The formatting rules for this field are extensive, but have been changed several times and multiple sets of rules are currently in active use.
|-
|-
| Synopsis
| synopsis
| synopsis
|
| A ''brief'' synopsis of the content.  The purpose of the field is to aid in identifying the story, not to provide a plot summary to replace reading the whole thing.  Do not use text from other sites or sources as it may be assumed to be under copyright.
|
| See [[Synopsis]].
|-
|-
| Genre
| genre
| genre
|
| One or more genres, separated by semicolons.
|
| Free-form, although there is an [[Official Genres List]]. See also [[Genre]].
|-
! colspan="4" | Field related to flat file management / distribution. See series/issues:
|-
| InitDist
| initial_distribution
| Not Used
|
|-
! colspan="4" | Fields relating to or copied from the '''issues''' table:
|-
|-
| IssueID
| issue_id
| issue
| Foreign key into the '''issues''' table.
| Foreign key into the '''issues''' table.
|
|
|-
|-
| Issue
! colspan="3" | Change tracking fields:
| issue_number
| Not Used
| Earlier indexes may have data in this field, but it is ignored.
|-
|-
| Pub_Date
| reserved
| publication_date
| If true, there is an active change being made in the data editing tables.
| Not Used
| Earlier indexes may have data in this field, but it is ignored.
|-
| Key_Date
| key_date
| Not Used
| Earlier indexes may have data in this field, but it is ignored.
|-
| Price
| price
| Not Used
|
|
|-
|-
| rel_year
| created
| rel_year
| Not Used
|
|
|-
! colspan="4" | Fields relating to or copied from the '''series''' table:
|-
| SeriesID
| series
| Foreign key into the '''series''' table, duplicated from the '''issues''' table presumably to allow for direct stories <-> series joins.
| Actually, it was used to help troubleshoot and QA the initial building of a relational database way back. Also use for statistics.
|-
| Yr_Began
| year_began
| Not Used
| Earlier indexes may have data in this field, but it is ignored.
|-
| Pub_Name
| publisher_name
| Not Used
| Earlier indexes may have data in this field, but it is ignored.
|-
! colspan="4" | Change tracking fields:
|-
| created
| created
| Row creation timestamp.
|
|
|-
|-
| Modified
| modified
| modified
| Date of last row change.
|  
|
||}
|-
| ModTime
| modification_time
| Time of last row change (24 hour).
|
|-
| LstChang
| last_change
| Date of some sort, but apparently unused?
| Column has numerous values, but they do not line up with the values in the Modified column. Data got butchered through various ins and outs of flat files and between systems changing the date format.
|-
|}

Revision as of 01:44, 21 December 2009

Introduction

This wiki page attempts to documents the current schema for the GCD. As of October 2009, the schema is being actively maintained and enhanced, and as of December 2009 all efforts are being made to keep this page up to date.

Public and Private Tables

The GCD schema consists of public and private tables. The public tables contain the visible data records (except for cover information) and are available to anyone in the public dumps. No account or membership of any sort is required to access the data dumps. The GCD requires that any use of our data from the dumps be credited to us. On a web site, this should be done on each page where the data is used, and a link should be provided to our site. Commercial use is welcome as long as credit and (on the web) links are provided. For questions about using the dump please contact the Board.

The private tables consist of three types of tables:

  • account data (private for the obvious reasons)
  • data editing tables (private to avoid sending out unapproved data- we'd like to make the change history public at some point if folks want it)
  • the covers table (private for legal reasons involving the distributions of cover scans and copyright law. Please contact the Board if you have concerns about this policy. Please do not contact the tech team as this is not a technical policy and complaining to them only wastes both your time and theirs)

A Bit of History

The original organization of the GCD data was into two files--or, more accurately, into a Series file (later two and then four Series files) and several dozen detail files. The two types of files were normalized by Lionel English in the mid-1990s in order to put them in to a real database, and then the two normalized files were further normalized in order to create the Publishers table from the Series table, and an Issues table from the Details table. During the server migration and site rewrite the took place during August-December 2009, Henry Andrews, Jochen Garcke and the rest of the tech team changed the database to a system supporting true foreign keys and transactions, and imposed a consistent naming scheme on the tables and columns. Several new public tables, as well as an entirely new set of data editing tables, were added at this time. The documentation for the pre-2009-migration schema is [PreOctober2009_Schema still available] for those curious about it.

Future Plans

We're progressing towards our real "1.0" release of this generation of the site, code-named "New Fun". Much discussion on the lists references the "New Fun" schema which supports a much more complex and realistic view of the data. Please join the [Email_Lists_FAQ gcd-tech group] if you would like to participate in that discussion. For the curious, the December 2009 release was 0.2 "Vieux Bois", for

Supporting Tables

There are several supporting tables necessary to use the data which lay out fixed sets of options for certain fields. They are:

  • gcd_country
    • Has columns id', code and name, and provides the set of countries used by the various publisher and series tables. The code column is the ISO 3166 code for the country, and should arguably be used as the foreign key, but due to the very rushed development timeline of the December 2009 release (as the prior server was no longer viable), we went with the standard numeric id column for reasons which seemed like a good idea at the time, but involved a great deal of sleep deprivation :-) We'll probably revisit this eventually.
  • gcd_language
    • Has columns id, code and name, and provides the set of languages used by the series table. The code column is the ISO 639 code for the language. The two-letter ISO 639-1 code is used if available, three letter code from 639-2 or 639-3 otherwise. This might get standardized into 3-letter codes or 4-letter codes if we need to take ISO 639-6 into account once its code list is published. This table has the same foreign key oddness about code vs id as the gcd_country table, although the possibility of changing the codes makes it more sensible to stick with id for now.
  • gcd_story_type
    • Has columns id and name, and provides the set of types used by the story table. The type field is somewhat controversial in that its values actually cover various different attributes of story sequences. However, until late 2009 there was no way for the GCD to change the field (and it was in fact a plain text field within the story table) so discussions on changing it were set aside. This will be one of many things revisited along the path to the 1.0 New Fun release.

Publishers

The publisher table currently holds two sorts of records: "master publishers" and "imprints". Both of these concepts are less precise than one might expect.

"Master publishers" are the common names of publishers as typically grouped by comic book researchers. As this vague definition suggests, there is considerable room for disagreement over what constitutes a master publisher and how such entities should be named. This field is often referred to as the "argue 'till we're blue in the face" field. Part of the current definition of a series in the GCD is that all issues within the series are published by the same master publisher. This has led to the grouping some companies together under a single master publisher because one took over several series from the other. Debate on the proper arrangement and definition of master publishers is ongoing. A master publisher is a record that has the is_master field set to 1. The name of the publisher is chosen primarily based on what is most likely to be expected by a user running a search.

As of late 2009, "imprints" are considered a deprecated concept and are being replaced by indicia publishers and brands (covered in their own tables below). The concept of an imprint may return in a more well-defined form in the future for formal imprints of publishing houses. The data records in this table that are considered "imprints" serve as a catch-all for several different publishing concepts, often requiring long semi-colon separated lists as the field is set per-series but the data often changes per-issue. An imprint is a record that has a non-NULL parent_id field.

Note that there are a very small number of records that are both master publishers and imprints. Explaining those records is beyond the scope of this documentation.

gcd_publisher [apps.gcd.models.Publisher class in the code]
Column Name Description Usage Notes
Core publisher data fields:
id DB-generated primary key.
name Name of the master publishing "family", imprint, or set of imprints See the discussion above for how this name is chosen
year_began Year (by cover date) of first publication.
year_ended Year (by cover date) of last publication, if any.
country_id The country from which the publisher operates. Foreign key into the gcd_country table.
notes Space for arbitrary notes.
url The URL for the publisher's website.
Relations to other tables and counts of related objects:
imprint_count Count of how many imprints a given master publisher has. Recalculated by the code on data updates.
series_count Count of how many series this publisher publishes. Recalculated by the code on data updates.
issue_count Total number of issues in all series published by the publisher. Recalculated by the code on data updates.
is_master Boolean field (1 or 0 in MySQL). See discussion of master publishers above.
parent_id Self-join key for parent/child relationships between master publishers and imprints. NULL for non-imprints.
Change tracking fields:
reserved If true, there is an active change being made in the data editing tables.
created
modified

Brands

Indicia Publishers

Series

A series in the GCD is officially defined by the indicia title and publisher- when either of these changes a new series is created. Minor changes in the indicia title (such as the presence or absence of an article, i.e. "The") may be allowed within one series, especially if the article comes and goes. Some exceptions exist, such as Dell's Four Color (2nd series) which is commonly grouped in guides but actually uses a different indicia title for nearly every issue.

A series may be split due to a particularly long lapse in its publication schedule (on the order of years) or due to certain kinds of renumbering, but the exact rules in this area are still being debated.

gcd_series [apps.gcd.models.Series class in the code]
Column Name Description Usage Notes
Core series data fields:
id DB-generated primary key.
name Series title, as it appears in the indicia (if any, as on the cover if there is no indicia), with any leading article moved to the end after a comma. The article-moving rule applies only if the title is in the language in which the series is written.
format Physical format of the series (size, color presence or absence, paper stock, etc.). Since format is really an attribute of the issue rather than the whole series, this field may contain a fairly long description of how the format changed over the life of the series. In the "New Fun" schema, this is being replace by a number of distinct fields, which individually describe the constituent attributes commonly listed here.
year_began First year (by cover date) of publication.
year_ended Last year (by cover date) of publication, if any.
is_current Set to true if new issues are still being produced for this series. This is a new field and may not be properly set for all series yet.
first_issue_id Foreign key to the first issue of this series in the gcd_issue table. Updated by the code when issues are added, deleted or moved. This is a new field and is not always properly set, nor is it yet used in the UI.
last_issue_id Foreign key to the last issue of this series in the gcd_issue table. Updated by the code when issues are added, deleted or moved. This is a new field and is not always properly set, nor is it yet used in the UI.
publication_dates First and last full cover publication dates of the series, separated by a '-'. No last date if still ongoing. This field is being replaced by the first_issue_id and last_issue_id fields. It is not currently possible to edit it from the OI, so errors are not being actively corrected. Once all data is migrated to the new field, this one will be dropped.
tracking_notes Notes about how a given series is continued in or continued from another series. See the [Formatting Documentation] for how these notes should be structured. This will eventually be replaced with foreign key fields.
publication_note Formerly the place where arbitrary publication details could be recorded. Now considered deprecated. This predates the imprint field, which predates the issue-level indicia publisher and brand fields. Notes here that are not migrated to those fields will most likely be merged into the general notes field and this field will be eventually dropped.
notes Space for arbitrary notes.
publisher_id Foreign key into the gcd_publisher table. Should only ever reference a master publisher record (enforced at the code level).
imprint_id Foreign key into the publishers table for the depreated "imprint" concept. This is being set to NULL once its information has been migrated to the brand_id and indicia_publisher_id on the issues for the series.
country_id Foreign key into the gcd_country table. This is properly an attribute of the indicia publisher (linked to the issue table) rather than the series, and will eventually be dropped once that field is sufficiently populated.
language_id Foreign key into the gcd_language table. The language in which the comic is written. For comics that are purely visual, there is an ISO code for "no linguistic content".
issue_count Number of issues associated with the series. Recalculated by the code when the data is updated.
has_gallery If true, the series has at least one cover image. Updated by the code as covers are added/changed/removed.
Change tracking fields:
open_reserve Obsolete field that we forgot to drop during schema migration. This will be dropped next time we change the schema, and should be ignored.
reserved If true, there is an active change being made in the data editing tables.
created
modified

Issues

Issues are what you would expect. They include both periodical issue (pamphlets) and bound volumes / collections, currently with no distinctions between the two. The "New Fun" schema will add classifications so that we can treat these items differently. As it is books are sometimes awkwardly wedged into fields designed for periodicals, but it mostly works.

One notable exception to the sensibility of this table is that issue variants do *not* each get separate records at this time. A future schema (possibly "New Fun" but possibly a release some time after that) will address this inconsistency. For now, variant covers are recorded by adding multiple cover sequences to the story table for the issue. There are some exceptions to this arrangement when more than just the cover varies.

gcd_issue [apps.gcd.models.Issue class in the code]
Column Name Description Usage Notes
Core issue data fields:
id DB-generated primary key.
number Issue number or other identifier from the indicia (if there is one) or cover (if there is no indicia) or in some cases both (if there are both and they disagree). See Issue Numbers for numerous special cases of missing / dual / annual / per-volume or other sorts of "numbers". Note that this is a text field rather than a numeric field because many issue "numbers" are partially or entirely non-numeric, i.e. "Summer Special".
volume Volume number from the indicia (if there is one) or cover (if there is no indicia). See Volume While this field has been present for a long time, it has only recently been displayed on the site. Therefore the data is not present as often as for other fields.
display_volume_with_number If true, combine the volume number with the issue number in most places that the issue number would be displayed. This produces numbers like "v2#1" (although in many indexes that notation is still used directly in the issue number field- this will eventually be corrected). See Volume and Issue Numbers.
no_volume If true, there is no volume number for the issue. This is used to confirm the meaning of an empty volume field since that field was ignored for many years and therefore is often empty when it should not be. See Volume and Issue Numbers.
sort_code Numeric field that determines how issues sort within each series. There is a uniqueness constraint on the combination of this column and series_id. This replaces the old cover sort code and is now the sole source of ordering for issues within a series. key_date (which is not always known) is now only used for chronological search results.
key_date Specially formatted date string that (in theory) determines the ordering of issues in search results that are organized by date. In practice, many issues are missing this field, meaning that they will just sort to the beginning of any chronological list. This is a data problem, not a coding error. For reliably sorting issues within a series, see sort_code. Also see Keydate
publication_date Publication date of the issue as given in the indicia, but fully spelled out. See Published Date
indicia_frequency If the issue has an indicia and it specified a publication frequency (monthly, quarterly, one-shot, etc.) that is recorded here. This is a text field rather than a foreign key into a set of options because of frequencies that are totally non-standard (common in alternative/underground publications) or irregular in such ways as "monthly except in September and December". This is a new field so it is sparsely filled out at this time.
price A decimal price followed by the ISO code, i.e. 0.10 USD See Cover Price for details, including multiple prices and non-decimal currencies.
page_count Overall page count for the issue, counting covers but not inserts or dust jackets. A single sheet with one fold (the smallest possible "comic book") would count as 4 pages. See Page Count.
page_count_uncertain If true, the page count is unknown (if page_count is NULL) or could not be reliably determined due to second-hand information or missing pages. This field allows the page_count field to be of numeric type. Previously it was text so that a "?" could be appended for this purpose.
editing Editor for the whole issue, or for the overall anthology if there are different editors for each story in a collection. Also used for other issue-level credits such as assistant/associate editor, publisher, production, special thanks, etc. See Credits and Editing for how credits are structured.
no_editing If there is no editor (or similar credit) at the issue level, this field should be set to true. Distinguishes from an unknown editor or from the field simply not having been filled out.
notes Space for arbitrary notes about the entire issue. Because the issue-level notes were originally (by convention) placed in the notes field of the first cover sequence (in the story table), the only reasonable option was to duplicate those notes into the issue table during migration. We are working through these duplicates, but for now it is very common to see identical notes in the issue record and in the sequence number 0 story record.
Fields relating to the other tables:
indicia_publisher_id Foreign key into the gcd_indicia_publisher table (nullable). Indicates the actual company that published the issue as shown in the indicia or other formal publication data. This is a new field and therefore is sparsely filled out at this time.
brand_id Foreign key into the gcd_brand table (nullable). Indicates the publisher's logo / name/ tagline on the exterior (front cover, back cover or spine) of the issue. There is still a fair amount of debate about what constitutes a separate brand (how much change in the logo appearance and/or text). This is a new field and therefore it is sparsely filled out at this time.
series_id Foreign key into the gcd_series table.
story_type_count Number of stories of type "story" linked to this issue. Used to determine whether an issue is "indexed" or not. Currently, an issue is indexed if there is at least one entry in the story table of type "story" (to prevent the existence of a story of type "cover" or other type from making the issue look complete). In the future we may also implement tracking of the expected number of stories to make this measurement more precise. Calculated whenever an issue's stories are updated.
Change tracking fields:
index_status Obsolete field kept to help debug potential migration problems (so far, so good...). This field will be dropped the next time we update the schema, and should be ignored.
reserve_status Obsolete field kept to help debug potential migration problems (so far, so good...). This field will be dropped the next time we update the schema, and should be ignored.
reserve_check Obsolete field kept to help debug potential migration problems (so far, so good...). This field will be dropped the next time we update the schema, and should be ignored.
reserved If true, there is an active change being made in the data editing tables.
created
modified

Stories

The story table would more accurately be called the sequence table, and we plan to eventually rename it. It records the contents of an issue, whether they are actual sequential art stories, text stories, covers, advertisements or any number of other sequence types. Only sequential art stories, covers and text stories are technically required to "complete" an index, but cover-to-cover indexes are appreciated.

gcd_story [apps.gcd.models.Story class in the code]
Column Name Description Usage Notes
Core story data fields:
id DB-generated primary key.
sequence_number Order of the story within the issue. See Sequence Number. Sequence zero is no longer treated specially (although it still represents the front cover, or at least one variant of it).
title The title of the story, or a quote from the beginning of the script if there is no title. See Title
title_inferred True if the indexer chose the title either directly or by using the first line of the script. This causes the UI to display the title with [square brackets] around it.
feature The name of the feature, if any. Typically as from the splash or title page, but with some exceptions. Often but not always the name of the primary character. Some stories do not have a feature. See Feature
page_count Number of pages. One side counts as one page. Pages
page_count_uncertain True if the page count is unknown or uncertain. This is a separate field so that page_count could be a numeric field instead of varchar.
notes Arbitrary notes from the indexer. See Notes
reprint_notes Reprint information (both reprint from and reprinted in). See Reprints for the extremely detailed formatting requirements for these notes. This field will be replaced with database links in release 0.5 "Kin-der-kids".
type_id Foreign key to the data_story_type table. Type of "story", or more accurately sequence. Actual comic story vs text story vs ad vs letters page, etc.
job_number Job Number
Fields for creator credits:
script Writer, scripter and/or (co-)plotter credits. See Credits and Script.
no_script True if there is no writing, such as for a single panel illustration with no text (pin-up) or a simple promo that shows the next issue's cover with no significant text (Lines like "Continued next issue" do not count as significant text). Note that the absence of dialog or narration does not mean there is no script. If there is a sequence of events depicted in the artwork, that constitutes a "script".
pencils Pencils or other primary artwork (painting, photography, etc.) or layouts / breakdowns and finishing pencils. See Credits and Pencils.
no_pencils True if there is no artwork, for instance a text story with no illustrations or a plain text recap page with limited visual elements (such as a solid color background, which is more of a production choice than the sort of artwork credited in these fields.
inks Inks or other artwork that completes the primary artwork. Painting / photography should be credited both here and in pencils an possibly colors. See Credits and Inks.
no_inks True if there is no inking or similar finishing task that applies to the artwork, or there is no artwork.
colors Color added to non-colored artwork, or similar image processing. Painting / color photography should be credited both here and in pencils and inks. See Credits and Colors.
no_colors True if there is no colorist to credit, for instance because the sequence is printed in black and white, or if there is no artwork.
letters Lettering. For text not designed/produced by a letterer, use "typeset" (for instance typewritten text stories word processor output not done in word balloons or captions). See Credits and Letters
no_letters True if there is no letterer (or "typeset") for the sequence. Covers with no significant text beyond the title and standard promotional taglines should set this to true.
editing Rarely used except for sequence 0, as the editor for the book is typically assumed to have edited the whole book. However, this field can be used if individual stories have separate editors such as certain original anthologies or collections covering diverse material.
Fields related to content:
characters Characters appearing in the field. Sometimes only recurring characters, but sometimes including things like "unnamed boy" or "Nazis". See Character Appearances. The formatting rules for this field are extensive, but have been changed several times and multiple sets of rules are currently in active use.
synopsis A brief synopsis of the content. The purpose of the field is to aid in identifying the story, not to provide a plot summary to replace reading the whole thing. Do not use text from other sites or sources as it may be assumed to be under copyright. See Synopsis.
genre One or more genres, separated by semicolons. Free-form, although there is an Official Genres List. See also Genre.
issue_id Foreign key into the issues table.
Change tracking fields:
reserved If true, there is an active change being made in the data editing tables.
created
modified }