Recently I attended the ASIS&T annual meeting, held in Austin, TX. It was enjoyable and informative as usual, though I was sick most of the time. I gave a poster about Wikipedia, which was well-received and which I'll feature in a later post.
I found two sets of sessions at ASIS&T most interesting. I'll come back & insert names & links later.
The first set was all about scientific data and its management. There was a mixed group of presenters, including some working scientists, who talked about the problems of managing enormous amounts of data. One presenter, a data manager with a team of polar (ice & snow) researchers, was particularly interesting, talking about how having someone in the field with the scientists, entering data on the spot or on the same day, was helpful in reducing error. Having spent a little time transcribing other people's field data long after the point of collection, I can well imagine that this is true; it's hard to interpret other people's scrawls and question marks long after they've forgotten what they were trying to convey.
Another scientist was from the NASA Goddard Space Flight Center, and talked about the explosion of data that is likely to come out of space science in the next few years, as NASA and other agencies move away from a single collector (e.g. Hubble) & towards a multiple-sensor network (e.g. up to a thousand smaller probes launched into various parts of the atmosphere). There's going to be a real problem with what to do with all this data: how to tag it, archive it, etc. Currently NASA apparently does accessible backups on disk and dark archive backups on magnetic tape, with a media refresh every 5-10 years. Still, this doesn't solve the problem of how to get data out to the scientists who need it.
Finally, there was an information science researcher who talked about his work with physical chemists and how they used data. He covered a) the difficulty of tagging and organizing chemical data (something long-known in the field); and b) how scientists could use data to do work even if they didn't collect the data in the first place -- that is, large datasets provide a new way of doing science, as they afford an opportunity to do analysis without having to collect the data oneself. This, of course, assumes that the data is properly tagged and managed (so people know what they're looking at, irregularities are noted in the metadata, etc.) -- which is the role of the information scientist. Interestingly, in this context, data is viewed as a resource -- not all scientists wanted to share their data because they didn't want to get scooped in publication.