Current Schema: Difference between revisions
(Finished series table (first round anyway).) |
(Added section heading rows. Also checkpointing work on issues table.) |
||
Line 11: | Line 11: | ||
! Description | ! Description | ||
! Usage Notes | ! Usage Notes | ||
|- | |||
! colspan="4" | Core publisher data fields: | |||
|- | |- | ||
| ID | | ID | ||
Line 51: | Line 53: | ||
| A single character. | | A single character. | ||
| Appears to be used to sort related publishers near each other even if the more obvious alphabetical sort would not do so. It's a touch unclear. | | Appears to be used to sort related publishers near each other even if the more obvious alphabetical sort would not do so. It's a touch unclear. | ||
|- | |||
! colspan="4" | Publisher / imprint relations and publication counts: | |||
|- | |- | ||
| ImprintCount | | ImprintCount | ||
Line 86: | Line 90: | ||
| Self-join key, purpose unknown. | | Self-join key, purpose unknown. | ||
| Does not appear to be in use. | | Does not appear to be in use. | ||
|- | |||
! colspan="4" | Change tracking fields: | |||
|- | |- | ||
| created | | created | ||
Line 112: | Line 118: | ||
! Description | ! Description | ||
! Usage Notes | ! Usage Notes | ||
|- | |||
! colspan="4" | Core series data fields: | |||
|- | |- | ||
| ID | | ID | ||
Line 153: | Line 161: | ||
| Does not appear to be used by the Lasso implementation. | | Does not appear to be used by the Lasso implementation. | ||
|- | |- | ||
| | | Frst_Iss | ||
| | | first_issue | ||
| | | Issue # (or other identifier) of the first issue in the series. | ||
| | | Note that it's not currently reliably possible to determine this from the issue table. | ||
|- | |||
| Last_Iss | |||
| last_issue | |||
| Issue # (or other identifier) of the last issue in the series, if any. | |||
| Note that it's not currently reliably possible to determine this from the issue table. | |||
|- | |- | ||
| Issuecount | | Issuecount | ||
Line 162: | Line 175: | ||
| Number of issues associated with the series. | | Number of issues associated with the series. | ||
| Does not appear to be reliably accurate. | | Does not appear to be reliably accurate. | ||
|- | |||
| PubID | |||
| publisher | |||
| Foreign key into the '''publishers''' table. | |||
| | |||
|- | |||
| Pub_Name | |||
| publisher_name | |||
| Copied from publishers table, presumably to save a join. | |||
| Reliability of this field has not been verified. | |||
|- | |||
| CounCode | |||
| country_code | |||
| Inexplicably '''not''' a foreign key into the Country table. Instead the country code is duplicated here, and does not necessarily match properly. | |||
| | |||
|- | |||
| LangCode | |||
| language_code | |||
| Inexplicably '''not''' a foreign key into the Languages table. Instead the language code is duplicated here, and does not necessarily match properly. | |||
| | |||
|- | |||
! colspan="4" | Fields relating series to each other (continuations, etc.): | |||
|- | |- | ||
| Tracking | | Tracking | ||
Line 177: | Line 212: | ||
| Apparently a self-join key for the cross-referenced series. | | Apparently a self-join key for the cross-referenced series. | ||
| Does not appear to be used by the Lasso implementation. Only one record where this is not NULL or 0. | | Does not appear to be used by the Lasso implementation. Only one record where this is not NULL or 0. | ||
|- | |||
! colspan="4" | Field related to the (cover?) gallery: | |||
|- | |- | ||
| HasGallery | | HasGallery | ||
Line 186: | Line 223: | ||
| gallery_includes | | gallery_includes | ||
| varchar field, seems to be comma-separated list (with hyphens for ranges) of issues included in something? I speculate the gallery, but it's not clear how this is used, if at all, by the current implementation. It is '''not''' the string that shows up in the "Issues published:" space on the series page. | | varchar field, seems to be comma-separated list (with hyphens for ranges) of issues included in something? I speculate the gallery, but it's not clear how this is used, if at all, by the current implementation. It is '''not''' the string that shows up in the "Issues published:" space on the series page. | ||
|- | |||
! colspan="4" | Fields related to indexing: | |||
|- | |- | ||
| Indexers | | Indexers | ||
Line 197: | Line 236: | ||
| Presumably something to do with online indexing, but since reservations are per-issue, it's not clear quite what. | | Presumably something to do with online indexing, but since reservations are per-issue, it's not clear quite what. | ||
|- | |- | ||
| | ! colspan="4" | Fields apparently related to flat file management / distribution. I think. | ||
|- | |- | ||
| File | | File | ||
Line 232: | Line 253: | ||
| | | | ||
|- | |- | ||
| | ! colspan="4" | Change tracking fields: | ||
|- | |- | ||
| Created | | Created | ||
Line 266: | Line 279: | ||
| Not clear. Presumably the ID from a prior organization of the data? | | Not clear. Presumably the ID from a prior organization of the data? | ||
| Does not appear to be in active use currently. | | Does not appear to be in active use currently. | ||
|- | |||
! colspan="4" | Miscellaneous unused and/or mysterious fields: | |||
|- | |||
| Themes | |||
| themes | |||
| | |||
| Does not appear to be used (always NULL). | |||
|- | |- | ||
| SelfCount | | SelfCount | ||
| self_count | | self_count | ||
| Absolutely no clue. | | Absolutely no clue. | ||
| | |||
|- | |||
|} | |||
{| border="1" | |||
|+ '''issues''' ['''Issue''' class in new site] | |||
! DB Column | |||
! Name in New Code | |||
! Description | |||
! Usage Notes | |||
|- | |||
! colspan="4" | Core issue data fields: | |||
|- | |||
| ID | |||
| id | |||
| DB-generated primary key. | |||
| In the lasso-based site, this is the number you see in URLs like http://www.comics.org/details.lasso?id=293 It is generated by the database and has no inherent meaning. | |||
|- | |||
| VolumeNum | |||
| volume | |||
| Volume number from the indicia. | |||
| Not consistently used, and according to recent editor list comments, not planned for migration into the new schema. Where especially relevant, volume numbers are typically included in the Issue (number) field. | |||
|- | |||
| Issue | |||
| number | |||
| Issue number or other identifier from the indicia. | |||
| Use the value ''nn'' to indicate "no number" when there is no identifier. For issues where the volume number is important, the form ''v1#1'' is used to specify both the volume and issue numbers. When the cover number or commonly assumed number is different from the indicia number, it goes in brackets after the indicia number: ''v3#2 [26]''. The same format is used for disambiguation of identically numbered issues: ''1 [1946]'' or ''nn [1947]''. | |||
|- | |||
| Pub_Date | |||
| publication_date | |||
| Publication date of the issue as given in the indicia, but fully spelled out. | |||
| Issues that list a publication date of ''December-January'', ''Winter'' or ''Holiday'' should specify both years involved, i.e. ''Winter 1942-1943''. | |||
|- | |||
| Price | |||
| price | |||
| ISO code followed by decimal price, i.e. ''USD 0.10'' | |||
| Most documentation of this field is out of date. ISO code followed by a decimal number is the correct format, contrary to what other pages might say, as per recent (early 2008) discussions on the editor list. | |||
|- | |||
| storycount | |||
| story_count | |||
| Number of stories linked to this issue. | |||
| Not 100% convinced this is always accurate. | |||
|- | |||
| Key_Date | |||
| key_date | |||
| Specially formatted date string that (in theory) determines the ordering of issues within a series. | |||
| In practice, many issues are missing this field, meaning that if their issue numbers are not standard, there is no reliable programmatic way to sort the issues in the series. In practice, the cover sort codes are slightly more reliable (although not always), but they are not included in the public data dumps. | |||
|- | |||
! colspan="4" | Fields that would be core issue data but aren't actually used: | |||
|- | |||
| Pg_Cnt | |||
| page_count | |||
| Not Used | |||
| Page count for the whole issue is taken from story sequence 0 (which otherwise represents the front cover). | |||
|- | |||
| Editing | |||
| editor | |||
| Not Used | |||
| Editor(s) for the whole issue are taken from story sequence 0 (which otherwise represents the front cover). | |||
|- | |||
| Notes | |||
| notes | |||
| Not Used | |||
| Notes for the whole issue are taken from story sequence 0 (which otherwise represents the front cover). | |||
|- | |||
! colspan="4" | Fields related to indexing: | |||
|- | |||
| IndexStatus | |||
| index_status | |||
| Whether or not an issue is being indexed, and where it is in the process. | |||
| 0=not (white); 1=reserved (red); 2=pending (orange); 3=approved (green). Colors noted are how the issues appear in various index status grids. | |||
|- | |||
| ReserveCheck | |||
| reserve_check | |||
| Not Used | |||
| The intention of this field is unknown. | |||
|- | |||
| ReserveStatus | |||
| reserve_status | |||
| 1=indexed, pending or reserved; 0=not reserved | |||
| The intended relationship between this and IndexStatus/index_status is not known. | |||
|- | |||
! colspan="4" | Fields related to cover images: | |||
|- | |||
| CoverCheck | |||
| has_cover | |||
| Not Used | |||
| | |||
|- | |||
| CoverCount | |||
| num_covers | |||
| Not Used | |||
| | |||
|- | |||
! colspan="4" | Fields related to the series, including unused cache fields: | |||
|- | |||
| SeriesID | |||
| series | |||
| Foreign key into the '''series''' table. | |||
| | |||
|- | |||
| Bk_Name | |||
| series_name | |||
| Not Used | |||
| | |||
|- | |||
| Yr_Began | |||
| year_began | |||
| Not Used | |||
| | |||
|- | |||
| Pub_Name | |||
| publisher_name | |||
| Not Used | |||
| | |||
|- | |||
! colspan="4" | Fields apparently related to flat file management / distribution. I think. | |||
|- | |||
| InitDist | |||
| initial_distribution | |||
| Not Used | |||
| | |||
|- | |||
| UpdateDist | |||
| update_distribution | |||
| Not Used | |||
| | |||
|- | |||
| isUpdated | |||
| is_updated | |||
| Not Used (? uncertain) | |||
| | | | ||
|- | |- | ||
|} | |} |
Revision as of 01:18, 29 March 2008
This wiki page attempts to document the current schema for the GCD. The public schema consists of four principle data tables as well as several supporting tables. The supporting tables are Countries, Languages (lists of codes for each of those things), Indexers (a list of indexers, but not of full account information) and IndexCredit (maps indexers to series for credit, but there are also credit notes int he series table). The four main tables are described below.
The "Name in New Code" column is for those who might be looking at the new Python/Django implementation. Since the existing table and column names are not even remotely consistent in capitalization, formatting or abbreviation, they were standardized in the Django model classes to make working with them easier (Django provides a convenient name mapping system for exactly this purpose).
Please note that much of the issue and story table information on this page is based on earlier documentation by Peter Croome.
DB Column | Name in New Code | Description | Usage Notes |
---|---|---|---|
Core publisher data fields: | |||
ID | id | DB-generated primary key. | In the lasso-based site, this is the number you see in URLs like http://www.comics.org/publisher_details.lasso?id=4066 It is generated by the database and has no inherent meaning. |
PubName | name | Name of the publisher, imprint, or set of imprints | Because imprints are not stored per-issue, in some cases a semicolon-separated list of imprints corresponding to all imprints used for a given series has been given an entry in this table. Also, not all things that we term "Publishers" or "Imprints" strictly fall into those categories, and we do not accurately reflect the evolution of company names over time (for instance, "DC" was not until fairly recently the official name of the company, although it was long an informal name. It is still not exactly correct- "DC Comics" appears to be the current legal name. But "DC" is what everyone knows, so that is what appears in the database). |
YearBegan | year_began | Year (by cover date) of first publication. | |
YearEnded | year_ended | Year (by cover date) of last publication, if any. | |
CountryID | country | The country from which the publisher operates. | Foreign key into the Countries table. |
Notes | notes | Space for arbitrary notes. | |
web | url | The URL for the publisher's website. | While this column sometimes has data, it does not appear to be used by the Lasso-based site. |
AlphaSortCode | alpha_sort_code | A single character. | Appears to be used to sort related publishers near each other even if the more obvious alphabetical sort would not do so. It's a touch unclear. |
Publisher / imprint relations and publication counts: | |||
ImprintCount | imprint_count | Count of how many imprints a given publisher has. | Does not appear to be uniformly accurate. |
BookCount | series_count | Count of how many series this publisher publishes. | Does not appear to be uniformly accurate. |
IssueCount | issue_count | Total number of issues in all series published by the publisher. | Does not appear to be uniformly accurate. |
Master | is_master | Boolean field. | Appears to indicate if something is a publisher or an imprint. Or more generally, a first-class publishing entity or one that is somehow part of a larger entity (because it was bought long ago, or is part of a group of companies commonly referred to by another name, etc). The new site mostly ignores this in favor of checking for rows with a matching parent id. |
Connection | connection | No clue. It's a varchar, though. Perhaps it's supposed to explain the nature of each publisher-to-publisher relationship? | Always null or in one case an empty string in the current data dump, so pretty clearly not used. |
ParentID | parent | Self-join key for parent/child relationships. | This is how the imprints for a publisher are stored. |
NextID | next | Self-join key, purpose unknown. | Does not appear to be in use. |
Change tracking fields: | |||
created | created | Row creation timestamp. | |
Modified | modified | One of two change timestamps. | No idea which one the Lasso site actually uses. |
Updated | updated | One of two change timestamps. | No idea which one the Lasso site actually uses. |
DB Column | Name in New Code | Description | Usage Notes |
---|---|---|---|
Core series data fields: | |||
ID | id | DB-generated primary key. | In the lasso-based site, this is the number you see in URLs like http://www.comics.org/series.lasso?seriesid=72 It is generated by the database and has no inherent meaning. |
Bk_Name | name | Series title. | |
Format | format | Physical format of the series (size, color presence or absence, paper stock, etc.). | Since format is really an attribute of the issue rather than the whole series, this field may contain a fairly long description of how the format changed over the life of the series. |
Notes | notes | Space for arbitrary notes. | |
Yr_Began | year_began | First year (by cover date) of publication. | |
Yr_Ended | year_ended | Last year (by cover date) of publication, if any. | |
PubDates | publication_dates | First and last full cover publication dates of the series, separated by a '-'. No last date if still ongoing. | |
Pub_Note | publication_note | Arbitrary publication / publisher-related notes. Some records in the current DB include addresses, imprint information and/or general reprint information here. | Does not appear to be used by the Lasso implementation. |
Frst_Iss | first_issue | Issue # (or other identifier) of the first issue in the series. | Note that it's not currently reliably possible to determine this from the issue table. |
Last_Iss | last_issue | Issue # (or other identifier) of the last issue in the series, if any. | Note that it's not currently reliably possible to determine this from the issue table. |
Issuecount | issue_count | Number of issues associated with the series. | Does not appear to be reliably accurate. |
PubID | publisher | Foreign key into the publishers table. | |
Pub_Name | publisher_name | Copied from publishers table, presumably to save a join. | Reliability of this field has not been verified. |
CounCode | country_code | Inexplicably not a foreign key into the Country table. Instead the country code is duplicated here, and does not necessarily match properly. | |
LangCode | language_code | Inexplicably not a foreign key into the Languages table. Instead the language code is duplicated here, and does not necessarily match properly. | |
Fields relating series to each other (continuations, etc.): | |||
Tracking | tracking_notes | Notes about how a given series is continued in or continued from another series. | |
Crossref | crossref | Appears to be a boolean field flagging if there is a crossreference. | Does not appear to be used by the Lasso implementation. Only one record where this is not NULL or 0. |
CrossrefID | crossref_id | Apparently a self-join key for the cross-referenced series. | Does not appear to be used by the Lasso implementation. Only one record where this is not NULL or 0. |
Field related to the (cover?) gallery: | |||
HasGallery | gallery_present | 3-character field, but used as boolean. | If true, there is a cover gallery. I think. |
Included | gallery_includes | varchar field, seems to be comma-separated list (with hyphens for ranges) of issues included in something? I speculate the gallery, but it's not clear how this is used, if at all, by the current implementation. It is not the string that shows up in the "Issues published:" space on the series page. | |
Fields related to indexing: | |||
Indexers | indexers | Semicolon-separated string of indexers. | This does not appear to be quite what gets displayed in the credits on the page in the Lasso site. The exact relationship between this and the IndexCredit table is unclear. |
OpenReserve | open_reserve | Integer field. | Presumably something to do with online indexing, but since reservations are per-issue, it's not clear quite what. |
Fields apparently related to flat file management / distribution. I think. | |||
File | file | Presumably the name of the file used for flat-file indexing. Or possibly distribution. | |
InitDist | initial_distribution | Presumably something to do with flat-file distribution of the data. | |
UpdateDist | update_distribution | Presumably something to do with flat-file distribution of the data. | |
Change tracking fields: | |||
Created | created | Row creation timestamp. | |
Modified | modified | Date of last row change. | |
ModTime | modification_time | Time of last row change (24 hour). | |
LstChang | last_change | Date of some sort, but apparently unused. | Only five non-NULL, non-zero values in the current data, all in the future (nearest is in 2010). |
oldID | old_id | Not clear. Presumably the ID from a prior organization of the data? | Does not appear to be in active use currently. |
Miscellaneous unused and/or mysterious fields: | |||
Themes | themes | Does not appear to be used (always NULL). | |
SelfCount | self_count | Absolutely no clue. |
DB Column | Name in New Code | Description | Usage Notes |
---|---|---|---|
Core issue data fields: | |||
ID | id | DB-generated primary key. | In the lasso-based site, this is the number you see in URLs like http://www.comics.org/details.lasso?id=293 It is generated by the database and has no inherent meaning. |
VolumeNum | volume | Volume number from the indicia. | Not consistently used, and according to recent editor list comments, not planned for migration into the new schema. Where especially relevant, volume numbers are typically included in the Issue (number) field. |
Issue | number | Issue number or other identifier from the indicia. | Use the value nn to indicate "no number" when there is no identifier. For issues where the volume number is important, the form v1#1 is used to specify both the volume and issue numbers. When the cover number or commonly assumed number is different from the indicia number, it goes in brackets after the indicia number: v3#2 [26]. The same format is used for disambiguation of identically numbered issues: 1 [1946] or nn [1947]. |
Pub_Date | publication_date | Publication date of the issue as given in the indicia, but fully spelled out. | Issues that list a publication date of December-January, Winter or Holiday should specify both years involved, i.e. Winter 1942-1943. |
Price | price | ISO code followed by decimal price, i.e. USD 0.10 | Most documentation of this field is out of date. ISO code followed by a decimal number is the correct format, contrary to what other pages might say, as per recent (early 2008) discussions on the editor list. |
storycount | story_count | Number of stories linked to this issue. | Not 100% convinced this is always accurate. |
Key_Date | key_date | Specially formatted date string that (in theory) determines the ordering of issues within a series. | In practice, many issues are missing this field, meaning that if their issue numbers are not standard, there is no reliable programmatic way to sort the issues in the series. In practice, the cover sort codes are slightly more reliable (although not always), but they are not included in the public data dumps. |
Fields that would be core issue data but aren't actually used: | |||
Pg_Cnt | page_count | Not Used | Page count for the whole issue is taken from story sequence 0 (which otherwise represents the front cover). |
Editing | editor | Not Used | Editor(s) for the whole issue are taken from story sequence 0 (which otherwise represents the front cover). |
Notes | notes | Not Used | Notes for the whole issue are taken from story sequence 0 (which otherwise represents the front cover). |
Fields related to indexing: | |||
IndexStatus | index_status | Whether or not an issue is being indexed, and where it is in the process. | 0=not (white); 1=reserved (red); 2=pending (orange); 3=approved (green). Colors noted are how the issues appear in various index status grids. |
ReserveCheck | reserve_check | Not Used | The intention of this field is unknown. |
ReserveStatus | reserve_status | 1=indexed, pending or reserved; 0=not reserved | The intended relationship between this and IndexStatus/index_status is not known. |
Fields related to cover images: | |||
CoverCheck | has_cover | Not Used | |
CoverCount | num_covers | Not Used | |
Fields related to the series, including unused cache fields: | |||
SeriesID | series | Foreign key into the series table. | |
Bk_Name | series_name | Not Used | |
Yr_Began | year_began | Not Used | |
Pub_Name | publisher_name | Not Used | |
Fields apparently related to flat file management / distribution. I think. | |||
InitDist | initial_distribution | Not Used | |
UpdateDist | update_distribution | Not Used | |
isUpdated | is_updated | Not Used (? uncertain) |