PROJECT SUMMARY

Each year, regulatory agencies issue more than 4,000 new rules. Many of these must be created through a complex process known as notice and comment (N&C) rulemaking: the agency drafts a proposed rule and then exposes the proposal, underlying data and studies, and its legal and policy rationale to public comment. N&C rulemaking is one of the most important methods of contemporary public policy making; it is also one of the slowest and most expensive. An agency may receive hundreds of thousands of comments on a single proposed rule, yet it is legally obligated to review and respond to every significant one. Furthermore, Congress and the President have imposed an increasing number of mandates on rulemaking over the last 25 years. As these requirements to consult, study, and/or certify have proliferated, rule writers have found it increasingly difficult to keep track of them and to recognize which, if any, are relevant in a particular rulemaking.

Electronic rulemaking (eRulemaking) has the potential to radically transform the N&C process. It could make the process more transparent and accessible to the public, and more substantively reliable and cost-effective for the agency. So far, though, E-docket systems and eRulemaking "workbenches" make only rudimentary use of available technology.

The Cornell team proposes to use well-developed and emerging methods of natural language processing (NLP) to develop tools to aid agency rule writers in: (1) organizing, analyzing, and managing the comments, studies, and other supporting documents associated with a proposed rule; and (2) analyzing proposed rules to flag possibly relevant legal mandates from among the large number of statutes and Executive Orders that potentially require analyses, consultations, or certifications during rulemaking. The Departments of Transportation and Commerce, with whom the team will continue to collaborate, have identified both tasks as high priorities. The team will focus, in particular, on the use of information extraction, text categorization, and opinion-oriented text analysis techniques in both supervised and weakly supervised machine learning frameworks.

Evaluation will involve: accepted technical measures of NLP performance (e.g., recall and precision); a combination of qualitative and quantitative social science methods to assess how well the tools integrate into the rulewriting process, as perceived by staff at various levels of the agency hierarchy; and observation by legally trained researchers with expert understanding of the rulemaking process.
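
As a rough illustration of the two technical measures named above, the following minimal sketch computes precision and recall for a tool that flags comments as significant. The function name, the comment identifiers, and the example sets are hypothetical and are not taken from the project; the sketch only shows the standard definitions.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for a set of predicted items.

    predicted: items the tool flagged (hypothetical comment IDs here)
    relevant:  items a human reviewer judged to be truly significant
    """
    predicted = set(predicted)
    relevant = set(relevant)
    true_positives = len(predicted & relevant)
    # Precision: share of flagged items that were actually relevant.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of relevant items that the tool managed to flag.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the tool flags four comments, three are truly significant.
flagged = {"c1", "c2", "c3", "c4"}
significant = {"c2", "c3", "c5"}
p, r = precision_recall(flagged, significant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case, two of the four flagged comments are significant and two of the three significant comments were found, so a rule writer would see both wasted review effort (lower precision) and one missed comment (lower recall).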

Intellectual Merit

The research will help realize the positive potential of eRulemaking, advance the state-of-the-art in NLP, and improve our understanding of the effects of technology on rulemaking. Because of its interdisciplinary composition — combining expertise in NLP, expert knowledge about regulatory law and legal information systems, and social science experience in the effect of technology on organizations — the Cornell team is uniquely situated to generate both qualitative and quantitative data about the crucial, but still largely under-studied, rulemaking process.

Broader Impacts

The project provides a unique opportunity for interdisciplinary education and research for PhD, master's, and undergraduate students in Cornell's Information Science Program. All data sets and tools will be made available to other researchers. The NLP methods to be developed are general-purpose techniques, trainable for any domain or genre, and useful in any context that requires managing, organizing, and analyzing large volumes of text. Finally, many of the same techniques that help agency rule writers can be used to design agency websites that help the public search, sort, and otherwise selectively access materials in the rulemaking process; hence, the project may ultimately contribute to the E-Government Act of 2002 goal of making rulemaking "more transparent and accountable" and more "citizen-centric."

This project is supported by the National Science Foundation under Grant No. IIS-0535099. Any opinions, findings, conclusions or recommendations are those of the researchers and do not necessarily reflect the views of the National Science Foundation.