Chapter 6 Data Sources

6.1 Internal Sources

The following data sources have been consolidated into our internal database, ojodb.

The data is collected via a web scraper maintained by Asemio and hosted on Google Cloud Platform. It is scraped from various web and PDF sources, detailed below. For further information on the data collection specification, refer to the documentation by Asemio. For more information on maintenance procedures, see here.

6.1.1 OSCN: Oklahoma District Court Records

The Oklahoma State Courts Network (OSCN) holds information on all types of criminal and civil cases filed in District Courts across Oklahoma. The format and content of court records available on OSCN differ according to the records management system used by the county. OSCN-sourced data can be found in the "public" schema of the database. For example, you can list all tables in the schema with ojo_list_tables(schema = "public").

We divide counties into two categories: OSCN counties and ODCR counties. For 13 counties, including the 6 largest by population, the information available is extensive and structured consistently. The relative ease of using data collected from OSCN allows us to perform more reliable and granular analysis. We often refer to these 13 counties as OSCN counties.

We refer to the other 64 counties as ODCR counties. The records for these counties contain less detail, so getting data on bonds, dispositions, and other important aspects of a case takes more guesswork and gives more uncertain results. Note also that two ODCR counties have more than one court: Creek, which has courts in Bristow, Drumright, and Sapulpa, and Okmulgee, which has courts in Okmulgee and Henryetta.

Table 6.1: Data Source by Court District
district source
ADAIR OSCN
ALFALFA ODCR
ATOKA ODCR
BEAVER ODCR
BECKHAM ODCR
BLAINE ODCR
BRISTOW ODCR
BRYAN ODCR
CADDO ODCR
CANADIAN OSCN
CARTER ODCR
CHEROKEE ODCR
CHOCTAW ODCR
CIMARRON ODCR
CLEVELAND OSCN
COAL ODCR
COMANCHE OSCN
COTTON ODCR
CRAIG ODCR
CREEK ODCR
CUSTER ODCR
DELAWARE ODCR
DEWEY ODCR
DRUMWRIGHT ODCR
ELLIS OSCN
GARFIELD OSCN
GARVIN ODCR
GRADY ODCR
GRANT ODCR
GREER ODCR
HARMON ODCR
HARPER ODCR
HASKELL ODCR
HENRYETTA ODCR
HUGHES ODCR
JACKSON ODCR
JEFFERSON ODCR
JOHNSTON ODCR
KAY ODCR
KINGFISHER ODCR
KIOWA ODCR
LATIMER ODCR
LEFLORE ODCR
LINCOLN ODCR
LOGAN OSCN
LOVE ODCR
MAJOR ODCR
MARSHALL ODCR
MAYES ODCR
MCCLAIN ODCR
MCCURTAIN ODCR
MCINTOSH ODCR
MURRAY ODCR
MUSKOGEE ODCR
NOBLE ODCR
NOWATA ODCR
OKFUSKEE ODCR
OKLAHOMA OSCN
OKMULGEE ODCR
OSAGE ODCR
OTTAWA ODCR
PAWNEE ODCR
PAYNE OSCN
PITTSBURG ODCR
PONTOTOC ODCR
POTTAWATOMIE ODCR
PUSHMATAHA OSCN
ROGERMILLS OSCN
ROGERS OSCN
SEMINOLE ODCR
SEQUOYAH ODCR
STEPHENS ODCR
TEXAS ODCR
TILLMAN ODCR
TULSA OSCN
WAGONER ODCR
WASHINGTON ODCR
WASHITA ODCR
WOODS ODCR
WOODWARD ODCR
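Table 6.1 can be used to restrict an analysis to OSCN districts. The following is a minimal sketch in base R, not an established OJO workflow. The `cases` data frame is a toy stand-in for the case table, and it assumes that the district column holds the same uppercase names shown in Table 6.1.

```r
# The 13 OSCN districts listed in Table 6.1.
oscn_districts <- c(
  "ADAIR", "CANADIAN", "CLEVELAND", "COMANCHE", "ELLIS", "GARFIELD",
  "LOGAN", "OKLAHOMA", "PAYNE", "PUSHMATAHA", "ROGERMILLS", "ROGERS", "TULSA"
)

# Toy stand-in for the case table (assumption: real rows come from the database).
cases <- data.frame(
  id = 1:4,
  district = c("TULSA", "CREEK", "OKLAHOMA", "BRISTOW"),
  stringsAsFactors = FALSE
)

# Label each case by source, then keep only the OSCN rows.
cases$source <- ifelse(cases$district %in% oscn_districts, "OSCN", "ODCR")
oscn_cases <- cases[cases$source == "OSCN", ]
print(oscn_cases)
```

Keeping the district list in one vector makes it easy to apply the same filter to any table that carries a district column.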

6.1.1.1 Uses

OSCN data is the most common resource for OJO projects about the justice system. There is an abundance of information on criminal and civil cases, including parties, criminal charges and civil case issues, case resolutions (called dispositions), court appearances, and fines and fees. Some questions we might answer using OSCN records include:

  • What are the most common charges in criminal cases?
  • How much in fines and fees are levied against people in criminal cases? How much is collected?
  • What percentage of criminal felony cases are dismissed?
  • How many evictions are filed and granted each year?

There are millions of cases stored in our database, so the range of possible research questions is very wide, but it takes considerable effort and subject matter expertise to ensure that we are drawing valid conclusions. The [Research Methodology] section contains details on how we do that.

6.1.1.2 Tables

6.1.1.2.1 case

The case table contains basic information about the case, as well as the IDs of associated records that are stored in other tables.

Table Variable Description
case id Case ID number
case title Title of the case (e.g., State of Oklahoma v. Roman Roy)
case district Name of district court
case case_type Abbreviation of case type (e.g., “CF” for felony)
case year Year of case filing
case case_number Case number assigned by court, consisting of case type abbreviation, year of filing, and number in filing sequence (e.g. CF-2021-1234)
case date_filed Date of case filing
case date_closed Date of case close (not always updated at the end of a case)
case status Current status of case
case judge Judge assigned to case. In many cases, this is not an individual’s name but the docket the case appears on.
case appealed_from This field contains no data as of 2021-11-30
case attorneys Nested list of attorney IDs associated with parties to the case
case parties Nested list of party IDs associated with the case
case events Nested list of event IDs associated with the case
case citation_information Nested list of citation IDs associated with the case
case minutes Nested list of minute IDs associated with the case
case counts For criminal cases, nested list of count IDs associated with the case (OSCN counties)
case issues For civil cases, nested list of issue IDs associated with the case
case created_at Date and time that case information was first collected
case updated_at Date and time that case information was last collected
case open_counts Nested list of charge descriptions associated with the case (ODCR counties only)
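Because case_number combines the case type abbreviation, the filing year, and the filing sequence number, it can be split back into those parts. The sketch below uses base R only. `parse_case_number` is a hypothetical helper written for illustration, and it assumes every case number follows the three-part, hyphen-separated pattern shown in the table.

```r
# Split case numbers such as "CF-2021-1234" into type, year, and sequence.
# parse_case_number() is a hypothetical helper, not part of any OJO package.
parse_case_number <- function(case_number) {
  parts <- strsplit(case_number, "-", fixed = TRUE)
  data.frame(
    case_type = vapply(parts, `[`, character(1), 1),
    year      = as.integer(vapply(parts, `[`, character(1), 2)),
    sequence  = as.integer(vapply(parts, `[`, character(1), 3)),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_case_number(c("CF-2021-1234", "CF-2019-57"))
print(parsed)
```

In practice the case_type and year columns already exist in the case table, so a helper like this is mainly useful when only a case number is available, for example from an external list.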

6.1.1.2.2 case_type

The case_type table is a lookup table that matches each case_type abbreviation (as found in the case table) to a corresponding text label, e.g. "CF" = "Criminal Felony".
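A lookup table like this is typically applied with a left join. The sketch below uses toy data frames in place of the real tables. The column names `case_type` and `label` in the lookup are assumptions, since the actual column names are not documented here, and "XX" is a made-up abbreviation used to show an unmatched row.

```r
# Toy stand-in for the case table (illustrative values only).
case <- data.frame(
  id = c(1, 2, 3),
  case_type = c("CF", "CF", "XX"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the case_type lookup; column names are assumed.
case_type <- data.frame(
  case_type = "CF",
  label = "Criminal Felony",
  stringsAsFactors = FALSE
)

# Left join: keep every case, and attach a label where one exists.
labelled <- merge(case, case_type, by = "case_type", all.x = TRUE)
print(labelled)
```

Using `all.x = TRUE` keeps cases whose abbreviation has no match in the lookup, so unmatched types show up as NA labels instead of silently disappearing.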

6.1.1.2.3 party

The party table contains the name and role of persons or organizations involved in a case, e.g. Tulsa Public Schools, Defendant. There are normally multiple rows per case. If the party of interest is an individual, there may be an associated ID in the person_record column. Use this ID to link party records to information in the person_record table.

6.1.1.2.4 count

The count table contains details on the reasons for a criminal case, i.e. the count(s) brought against a defendant. See the issue table to obtain the reason(s) for a civil case, i.e. issues. There can be multiple rows per defendant per case. Not all counts remain the same throughout the criminal case. Some may be dropped or modified by the time the case is disposed. Compare the count_as_filed column to the count_as_disposed column. Use the disposition and disposition_date columns to determine the outcome of a count, and when it was disposed.
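The comparison between count_as_filed and count_as_disposed can be sketched as follows. The data frame is a toy stand-in for the count table. The charge and disposition values are placeholders rather than real OSCN values, and `case_id` is an assumed name for the column that links a count to its case.

```r
# Toy stand-in for the count table (placeholder values, assumed case_id column).
count <- data.frame(
  case_id = c(1, 1, 2),
  count_as_filed = c("CHARGE A", "CHARGE B", "CHARGE C"),
  count_as_disposed = c("CHARGE A", "CHARGE B (AMENDED)", NA),
  disposition = c("DISPOSITION X", "DISPOSITION Y", NA),
  disposition_date = as.Date(c("2021-03-01", "2021-03-01", NA)),
  stringsAsFactors = FALSE
)

# A count changed if it has a disposed description that differs from the filed one.
count$changed <- !is.na(count$count_as_disposed) &
  count$count_as_disposed != count$count_as_filed

# A count is treated as disposed here if it has a disposition date.
count$disposed <- !is.na(count$disposition_date)

print(count[, c("case_id", "count_as_filed", "changed", "disposed")])
```

An exact string comparison will also flag purely cosmetic differences in wording, so real analyses may need to normalize the descriptions first.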

6.1.1.2.5 issue

The issue table contains details on the reasons for a civil case. Unlike in the count table, issues are contained in only the description column and do not change throughout the case. Use the disposition and disposition_date columns to determine the outcome of an issue, and when it was disposed.

6.1.1.2.6 minute

Each case has a number of minutes associated with it. The minute table stores the code, description, and other information for each record. If the minute has an associated fine or fee, you can use the amount column to determine the cost.
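As an illustration of working with the amount column, the sketch below totals fines and fees per case. The data frame is a toy stand-in for the minute table. The codes and descriptions are placeholders, and `case_id` is an assumed name for the column that links a minute to its case.

```r
# Toy stand-in for the minute table (placeholder values, assumed case_id column).
minute <- data.frame(
  case_id = c(1, 1, 2, 2),
  code = c("CODE1", "CODE2", "CODE1", "CODE3"),
  description = c("Fee assessed", "Hearing held", "Fee assessed", "Fine assessed"),
  amount = c(100, NA, 50.5, 25),
  stringsAsFactors = FALSE
)

# Sum amounts per case; rows with a missing amount are dropped by the formula interface.
fees <- aggregate(amount ~ case_id, data = minute, FUN = sum)
print(fees)
```

Minutes without an associated fine or fee are assumed to have a missing amount here; if the real table stores zeros instead, the same code still gives the correct totals.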

6.1.2 ODOC: State Prison Records

ODOC-sourced data can be found in the "odoc" schema of the database. The Oklahoma Department of Corrections records data on each person who enters its system, including information on their offenses and sentence.

6.1.2.1 Uses

Some questions we might answer using ODOC records include:

  • How many people are being held in prisons around the state?
  • How many people in the prison system were sentenced for violent crimes?
  • What is the average sentence length for violent vs. non-violent sentences?

Linking individuals from ODOC records to those sourced from OSCN, we might ask:

  • Of the people released from DOC custody last year, how many have since been charged with a criminal count in Oklahoma?

6.1.2.2 Tables

6.1.3 OCDC: Oklahoma County Jail Records

The Oklahoma County Detention Center provides us access to their internal data via JailTracker, a jail management platform.

6.1.3.1 Uses

6.1.3.2 Tables

6.1.4 IIC: Tulsa County Jail Records

The source data can be viewed here.

6.1.4.1 Uses

6.1.4.2 Tables

6.2 External Sources

6.2.1 Census

In most cases, census data is best obtained using the {tidycensus} package. You will need to obtain an API key from the Census Bureau website to utilize the package. See the {tidycensus} documentation for complete instructions.

6.2.1.1 Population Estimates

When you need county-level population estimates that can be compared from year to year, do NOT use the Decennial or ACS census releases. Instead, use the Population Estimates provided by tidycensus::get_estimates(product = "population").
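A call along these lines might look like the sketch below. `get_ok_county_population` is a hypothetical wrapper, and the choice of Oklahoma and the 2019 vintage are assumptions made for illustration. Argument names in get_estimates() have changed between {tidycensus} releases, and the "population" product may not be available for every vintage, so check the documentation for your installed version.

```r
# Hypothetical wrapper around tidycensus::get_estimates(); a sketch, not a tested workflow.
# Assumes {tidycensus} is installed and a Census API key is stored in CENSUS_API_KEY.
get_ok_county_population <- function(vintage = 2019) {
  tidycensus::get_estimates(
    geography = "county",
    product = "population",
    state = "OK",
    vintage = vintage
  )
}

# Only contact the Census API when the package and a key are available.
if (requireNamespace("tidycensus", quietly = TRUE) &&
    nzchar(Sys.getenv("CENSUS_API_KEY"))) {
  ok_pop <- get_ok_county_population()
  print(head(ok_pop))
}
```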

6.2.1.2 Decennial

6.2.1.3 ACS

6.2.2 Geography

In the {tidycensus} package, you can specify geometry = TRUE to return the geographic shapes alongside the data, for use in geographic plots. In {ggplot2}, use geom_sf() to create such maps.
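The two steps can be combined as in the sketch below. `plot_ok_income_map` is a hypothetical helper, and the choice of ACS variable B19013_001 (median household income) and of Oklahoma counties is an assumption made for illustration. It requires the {tidycensus}, {ggplot2}, and {sf} packages, network access, and a Census API key.

```r
# Hypothetical helper: fetch county data with geometry and draw a choropleth map.
plot_ok_income_map <- function() {
  ok <- tidycensus::get_acs(
    geography = "county",
    variables = "B19013_001",  # ACS median household income
    state = "OK",
    geometry = TRUE            # return an sf object with a geometry column
  )
  ggplot2::ggplot(ok, ggplot2::aes(fill = estimate)) +
    ggplot2::geom_sf()
}

# Only contact the Census API when the packages and a key are available.
if (requireNamespace("tidycensus", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("sf", quietly = TRUE) &&
    nzchar(Sys.getenv("CENSUS_API_KEY"))) {
  print(plot_ok_income_map())
}
```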