PROJECT SUMMARY
Each year, regulatory agencies issue more than 4,000 new rules. Many of these must be created through a complex process known as notice and comment (N&C) rulemaking: The agency drafts a proposed rule and then exposes the proposal, underlying data and studies, and its legal and policy rationale to public comment. N&C rulemaking is one of the most important methods of contemporary public policy making; it is also one of the slowest and most expensive. An agency may receive hundreds of thousands of comments on a single proposed rule, yet it is legally obligated to review and respond to every significant one. Furthermore, Congress and the President have imposed an increasing number of mandates on rulemaking over the last 25 years. As these requirements to consult, study, and/or certify have proliferated, rule writers have found it increasingly difficult to keep track of them and to recognize which, if any, are relevant in a particular rulemaking.
Electronic rulemaking (eRulemaking) has the potential to radically transform the N&C process. It could make the process more transparent and accessible to the public, and more substantively reliable and cost-effective for the agency. So far, though, e-docket systems and eRulemaking "workbenches" make only rudimentary use of available technology.
The Cornell team proposes to use well-developed and emerging methods of natural language processing (NLP) to develop tools to aid agency rule writers in: (1) organizing, analyzing, and managing the comments, studies, and other supporting documents associated with a proposed rule; and (2) analyzing proposed rules to flag possibly relevant legal mandates from among the large number of statutes and Executive Orders that potentially require analyses, consultations, or certifications during rulemaking. The Departments of Transportation and Commerce, with whom the team will continue to collaborate, have identified both tasks as high priorities. The team will focus, in particular, on the use of information extraction, text categorization, and opinion-oriented text analysis techniques in both supervised and weakly supervised machine learning frameworks.
