Workshop on Data Management for Molecular and Cell Biology

Feb. 2-3, 2003

Lister Hill Center, NLM, NIH Campus, Bethesda, MD


The aim of the workshop is to define a research agenda for data management technology in support of bioinformatics applications - specifically to support basic and applied research in molecular and cell biology, genomics, functional genomics, structural biology, biochemistry, genetics, molecular phylogeny, pharmacology, pharmacogenomics, chemoinformatics, systems biology of the cell, industrial microbiology, etc.

The rationale for this workshop is the observation that current data management systems are often not well suited to support bioinformatics applications. For the past decade, the vast bulk of federal research and development funding for biological, genomic, genetic, and structural biological databases has gone into the tasks of database development, creation, curation, and maintenance. These efforts have primarily employed conventional database technology rather than novel database technology targeted at bioinformatics applications. The result is that biological database developers are forced to write and maintain considerable amounts of ad hoc code, which may also be less efficient. This slows the development of biological databases, increases development and maintenance costs, and often limits the expressiveness of query facilities for end users. We believe that additional investments in the underlying database technology are required if bioinformatics data management is to be effectively supported in the 21st century. This workshop is intended to articulate such a research agenda.

We focus our attention on molecular and cell biology, broadly defined, as the central area of application for new data management technology. Specifically, we will exclude from this workshop discussions of medical informatics, anatomical databases, ecological databases, and non-molecular phyloinformatics.

Important Dates

Jan. 12, 2003 Hotel reservation deadline
Jan. 17, 2003 Whitepapers due
Feb. 2-3, 2003 Workshop
Feb. 4, 2003 Report Writing Committee


  • National Science Foundation
  • National Library of Medicine, NIH
  • Department of Energy

Organizing Committee

Suggested Topics

  • Non-standard data
    • sequence data (DNA, RNA, protein)
    • shape data (protein, protein complexes, protein-ligand complexes, carbohydrates)
    • graph data (metabolic pathways, signaling pathways, genetic control networks, physical and genetic maps, concept lattices, taxonomies, phylogenetic trees, genealogies, experimental protocols)
    • array and matrix data (microarray data)
  • Non-standard queries:
    • similarity-based queries (sequence similarity, shape similarity, complementary shapes)
    • pattern matching queries
    • recursive queries (over graph data)
    • graph queries (subgraph isomorphism, graph homomorphism)
    • matrix operations (multiplication, inverse)
  • Data modeling and representation
    • metadata management
    • curation (quality control, merging, annotation, distributed annotation)
    • uncertain and inconsistent data
    • incomplete data
    • schema flexibility and evolution
    • model management (statistical and mathematical models, systems biology, etc.): model topology, model parameterization, parameter estimates, model outputs
    • workflow management (both laboratory (LIMS) workflows and computational workflows)
  • Data integration
    • terminology management
    • seamless access
    • decision support

More details about suggested topics can be found in the NSF workshop proposal.

Workshop Report

The result of the workshop is to be a report setting out a research agenda for data management in support of molecular and cell biology. The report will be drafted by the report writing committee, which will stay on in Bethesda for one extra day following the main workshop. The draft report will then be circulated among the participants via email for comments and revisions. See the section below on the Report Writing Committee for membership.

The report will be delivered to the funders and other DB and bioinformatics research agencies, posted to the web, and published. It is available here in PDF format.


Attendance is by invitation only. We are anticipating 50-60 attendees from the database research community, bioinformatics community, major DBMS vendors, pharmaceutical and biotech firms, and various bioinformatics funding agencies (NSF, NIH, DOE, DARPA). A list of confirmed participants (accepted invitations as of Dec. 17, 2002) is available here. Some invitations are still outstanding. If you have accepted an invitation and are not shown on the list of confirmed participants, contact Frank Olken to make a correction.

White Papers

All participants are asked to submit short whitepapers (preferably 2-3 pages, as HTML files) identifying major research challenge(s) for data management for molecular and cell biology by Friday, Jan. 17, 2003, two weeks prior to the workshop. They will be posted to a workshop web page for attendees to read prior to the workshop. White papers should include a title, author, affiliation, date, and contact information (email, address).


The main workshop will be held on Sunday, Feb. 2, 2003 and Monday, Feb. 3, 2003. A report writing committee will stay on for one additional day to commence writing the workshop report.


The workshop will commence at 9:00 AM on Sunday, Feb. 2, 2003 and end by 4:00 PM on Monday, Feb. 3, 2003. Nonlocal attendees should plan to fly in on Saturday, Feb. 1. Most domestic attendees should be able to make departing evening flights from National or Dulles airports on Monday, Feb. 3.


The workshop will be held on the NIH campus in Bethesda, MD, at the Lister Hill Center for Biomedical Communications, in the Auditorium located on the first floor of Building 38A. The Lister Hill Center is located at the southeast corner of the NIH campus (see NIH campus map).


Here is the link to the preliminary agenda. We will have a mix of plenary and breakout sessions.

Breakout groups will be formed based on the suggested topics and whitepapers. Attendees will be assigned to breakout groups based on their whitepapers.

Detailed Workshop Proposal

Follow this link for the text of the NSF workshop proposal. This document contains a more detailed description of the issues we hope to address.


We have concluded our hotel negotiations (Dec. 17) and have contracted with the Bethesda Hyatt Regency Hotel for a discounted block of rooms. The room rate is $99/night + tax, single or double occupancy, for Fri. (Jan. 31), Sat. (Feb. 1), and Sun. (Feb. 2) nights. For Monday (Feb. 3) and Tuesday (Feb. 4) nights, the rate is $199/night + tax for single occupancy, plus $25 for each additional person (up to 4 total). Attendees must make their own hotel reservations. To get the group rate, you must supply the group name: "University of Michigan: Workshop on Data Management for Biology". Reservations will be accepted commencing Wed., Dec. 18, and must be made by 5 PM Eastern Time on Sunday, January 12, 2003. Please be certain to ask for the group rate.

We anticipate that most out-of-town attendees will arrive on Saturday afternoon or evening and depart on Monday evening. Friday accommodations will be provided for Sabbath-observant attendees. We anticipate that members of the writing committee will mostly depart on Tuesday evening. We expect that a few persons who are unable to obtain evening return flights will fly out the following day.

Hotel: Hyatt Regency Bethesda
One Bethesda Metro Center
Bethesda, Maryland, USA. 20814
Telephone: +1 301 657 1234
Toll Free Tel: 1-800-633-7313
Fax: +1 301 657 6453
Group Name: "University of Michigan: Workshop on Data Management for Biology"

Night                  Conference Hotel Rate   Occupancy
Fri, Sat, Sun nights   $99/night + tax         Single or Double
Mon., Tue. night       $199/night + tax        Single
Mon., Tue. night       $25/night + tax         Additional persons (max 4)

The hotel is located in downtown Bethesda, MD at the intersection of Wisconsin Ave (Route 355, running north/south) and Old Georgetown Road (Route 187, running NW to SE). It is one block north of Montgomery Ave / Montgomery Lane (which run east/west). See the map. Note that the hotel is located atop the Bethesda Metro station on the Metro Red Line, one stop before the NIH Medical Center station. There are Metro stations at National Airport and Union Station (Amtrak).

The hotel does not have DSL lines to the Internet in the rooms, only conventional phone lines. We have been told that they do have DSL lines to the Internet at the hotel concourse level (bring your own Ethernet cable and laptop and ask at the concierge's desk).

Workshop Dinner

All workshop participants are invited to a dinner (probably in downtown Bethesda) on the first night of the workshop (Sunday, Feb. 2). The restaurant has yet to be decided. Location and directions will be announced here. Participants are asked to RSVP, indicating whether they will attend the dinner and any special dietary requirements (e.g., vegetarian, kosher, allergies, ...). The dinner is being hosted by IBM. Some government personnel (e.g., those involved in procurements) may need to reimburse IBM for the cost of dinner. Dinner is currently planned for 6:30 PM on Sunday.

Travel Reimbursements

NSF has provided funds to reimburse travel and lodging expenses for academics attending the workshop. Industrial invitees are expected to pay for their own travel and lodging.

We will reimburse academics for coach air fare (or train fare from the East Coast), hotel (i.e., for Saturday, Feb. 1, and Sunday, Feb. 2), local transportation, and federal per diem for meals. We are meeting on Sunday and Monday to permit attendees to obtain inexpensive air fares with Saturday night stayovers. We assume that most attendees will fly in on Saturday and back out on Monday evening.

Travel reimbursements will be handled via the Univ. of Michigan. Further instructions will be posted to this web site in the first week of December. For further information or problems, contact:

Janet M. Quaine
The University of Michigan
EECS Department
2224 EECS Building
1301 Beal Avenue
Ann Arbor, MI 48109-2122
Tel: 734-647-8221
Fax: 734-763-8094

Report Writing Committee

The report writing committee will stay on for Tuesday, Feb. 4th, to commence writing the workshop report. This small group consists of the organizing committee and a few other participants. Others will be named later.


The workshop has been supported by the National Science Foundation, Directorate for Computer and Information Science and Engineering (CISE) under grant EIA-0239993, by the National Library of Medicine at the National Institutes of Health, and by the U.S. Department of Energy under Contract No. DE-AC03-76SF00098. IBM Corp. is hosting the workshop dinner.

We would like to thank Milton Corn (NLM), Sylvia J. Spengler (NSF), Gary Strong (NSF), Bhavani Thuraisingham (NSF), and Maria Zemankova (NSF) for their assistance.

This page is maintained by Frank Olken. Last update: Wed., Jan. 15, 2003