Monday, 25 June 2007
====================

Present:

- Eric Deutsch
- Jim Schofstahl
- Puneet Souda
- Pierre-Alain Binz
- Luisa Montecchi-Palazzi
- Ruth McNally
- Lennart Martens

Initial overview discussion mzML
--------------------------------

- Luisa: CV needs to be expanded.
- Eric: But some people don't like the idea of having everything in the CV.
- Luisa: But it allows flexibility, and we can perhaps obsolete some of the terms that are a bit weird or superfluous.
- Eric: What's the history of the CV?
- Pete: Well, I just got it from Randy; many definitions and terms are from IUPAC. Quite a few aren't, however (like resolution).
- Eric: But the current CV is used by mzData, isn't it? So maybe we shouldn't obsolete too many terms in case they are still used?
- Luisa: We can try to remap them?
- Eric: So can we put them in an 'inactive' part of the CV and then clean up the active part?
- Luisa: Yes, that's the idea. And the active part will become a lot cleaner.
- Eric: Should existing mzData documents work with the latest version of the ontology? Or should we freeze their CV at 1.1.2 or something?
- Luisa: In PSI-MI we went to a new version. But a mapping table was made there as well.
- Eric: So we could create a version 2.0 after we're done with the new CV, and then it's clear that there is a difference.
- Eric: What about the validator?
- Lennart: Sam Kerrien ran JAXB on the validator, and came up with some issues (e.g.: a complexType and an element have the same name), as well as a lot of recursion.
- Luisa: The validator will first of all be configured using an XML mapping file (as explained for instance in Lyon).
- Lennart: And at the next level, this generic validator (which can do 'per-element' semantic validation) needs to be extended for specific, 'complex' semantic validation (e.g.: counts).
- Eric: Let's have a look at the mailing list discussions of the last weeks.
- Eric: CV things, probably best to have Luisa and Pete discuss these.
  Jimmy Eng commented on mzXML encoding MRM as tiny MS2 scans with scanType="MRM".
- Jim: mzML can currently do both MRM as a single peak (peak parking) and MRM as a small fragmentation spectrum.
- Eric: Scan numbers should be in ascending order (validator!), but need not be consecutive.
- Eric: How about multiple runs per file? This would require revision of the index.
- Eric: 'run' should have a better definition: start/stop time, absolute time, and dataProcessingList order issues.
- Eric: Randy mail exchange: suggestions to completely revise the format (e.g., external binary data).
- Lennart: In my opinion this is a big step back, and it would be a major redesign of the schema.
- Luisa: Could you not have external pointers (URI) and keep the door open to external binary files?
- Lennart: This would lead to massive confusion.
- Eric: I tend to agree, and Randy also said we need a format that is stable over time and doesn't allow 'many ways to do that'.
- Pierre-Alain: How about the precision of the data, will it be sufficient?
- Eric: Jimmy Eng also asked for an encoding at the top that tells you about the kind of data to expect.
- Lennart: Yes, so we could have an element 'contents' with one or more cvParams that each contain terms related to analyses.
- Jim: MS, MSn, MRM, Chromatogram would be good terms.
- Eric: OK, we'll go with these.

MRM encoding
------------

- Eric: MRM as three arrays.
- Jim: So what you're showing is all of the transitions at once? But are they not different scans then? Because for Thermo they are. And I think it is the same for Applied.
- Eric: And if you have a profile mode, then you would do it like that as well.
- Eric: Do we not get data bloat? So many transitions many times leads to really many spectra (all transitions all the time).
- Jim: I don't think so, I thought we scan them when we find them, and that's that.
- Eric: mzXML does it like I explained (so many times many transitions).
  You thus essentially keep scanning for transitions, and sometimes the peaks will show up.
- Pierre-Alain: For some situations, we would have a profile for both precursors AND products.
- Jim: Single-centroid (peak parking) is the most common situation that people work in.
- Pierre-Alain: What about the time parameter for the MRM?
- Eric: In the example we have here, it is the total time of the scan. Some people might want a time array.
- Eric: CV people, here are some terms I have added, which should probably be represented in the CV. By the way, how should we write these terms?
- Luisa: In general, you would use the English spelling, so that it is human readable. Everything else can be in the synonyms.
- Pierre-Alain: The nomenclature of Q1 and Q3 should be made generic.
- Lennart: You could use 'mass analyzer 1' and 'mass analyzer 2', or 'mass selector 1', 'mass selector 2'.
- Luisa: The original Q1/Q3 can obviously still be non-exact synonyms (related synonyms) in the CV.
- Jim: Showing MRM scans in the Thermo software.
- Jim: We only show detected transitions.
- Eric: So chromatogram reconstruction requires zero-filling for absent data then? As you haven't recorded non-detected transitions.
- Jim: We actually do record them!
- Lennart: So we can expect the size explosion Eric was talking about.
- Eric: Does this file have any confirmatory peptide CID in it?
- Jim: No, not this instance.
- Jim, Lennart, Pierre-Alain: So what we actually have here is two transitions for each precursor, sometimes even three. Essentially, they are quadrupole mass range scans, but with the focus on the two (or three) target product ions.
- Pierre-Alain: So the question now is: do we store this as two (three) transitions, or do we store this as one product ion spectrum?
- Eric: And they are all one scan?
- Jim: I found one with four product ions.
- Jim, Pierre-Alain, Eric, Lennart: So we have two different ways of storing the data: in separate, consecutive scans, each containing a precursor and a fragment ion spectrum (for MRM, this can be one fragment peak only, for instance); or as a table with a time series for each transition.
- Lennart: The table approach would mean splitting the format into two subformats: MRM has the table but not the scans, non-MRM has the scans but not the table. This would be difficult to enforce (MRM can still be scans only) and would make the format quite complicated.
- Pierre-Alain: Yes, and the table is essentially only a convenience feature.
- Lennart: And the downside to the scan mechanism (the upside is that we can do just about everything in one format) is that MRM might become quite bloated.
- Pierre-Alain, Eric, Jim, Lennart: So do the downsides of the 'convenience' table outweigh its benefits?
- We think so.
- Eric: So I guess the conclusion is that we use the scan-like format.
- Eric: What about a TIC chromatogram?
- Jim: That would be big. Imagine 10,000 scans for 200 transitions.
- Pierre-Alain: Well, it would be big, but not too big, especially when compared with profiling traces. It will be big compared to centroided data.
- Pierre-Alain: We store all the scans, with an optional MRM table (multiple arrays over time). We can do the same thing for the chromatogram. And then we can make the scans optional as well. But then we are close to having them as outside binaries.
- Eric: Currently, I'd like to keep it simple; i.e.: scans are the focus and are mandatory, and the rest we'll simply exclude for now.
- Eric: So the conclusion is to keep the focus where it is now.
- Eric: Encoding the transition list would then be in a wrapper schema.
- Lennart: Yes, we would allow a wrapper, similar to the index. Essentially the schema would have a wrapper that has optional 'plug-ins' and a mandatory 'mzML'.
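
Editorial illustration (not part of the meeting discussion): the point that the per-transition table is "essentially only a convenience feature" can be sketched in a few lines of Python. The sketch assumes a hypothetical in-memory record layout (the `scans` list and its keys are invented for illustration and are not mzML elements). It checks the "ascending but not necessarily consecutive" scan-number rule and rebuilds a time series per transition from scan-like records, zero-filling time points at which a transition was not recorded.

```python
from collections import defaultdict

# Hypothetical scan-like MRM records (assumed layout, not the mzML schema):
# one entry per scan, with a time, a precursor m/z, and the product ions
# measured in that scan as {product m/z: intensity}.
scans = [
    {"scan": 1, "time": 0.10, "precursor_mz": 500.3,
     "products": {620.4: 120.0, 733.5: 80.0}},
    {"scan": 2, "time": 0.10, "precursor_mz": 612.8,
     "products": {455.2: 0.0}},
    {"scan": 4, "time": 0.20, "precursor_mz": 500.3,
     "products": {620.4: 340.0}},
]


def scan_numbers_ascending(scans):
    """Return True if scan numbers strictly increase; gaps are allowed."""
    numbers = [s["scan"] for s in scans]
    return all(a < b for a, b in zip(numbers, numbers[1:]))


def transition_table(scans):
    """Rebuild a per-transition time series from scan-like records.

    Returns (times, table), where table maps (precursor m/z, product m/z)
    to a list of intensities aligned with times. Time points at which a
    transition was not recorded are filled with 0.0.
    """
    times = sorted({s["time"] for s in scans})
    recorded = defaultdict(dict)
    for s in scans:
        for product_mz, intensity in s["products"].items():
            recorded[(s["precursor_mz"], product_mz)][s["time"]] = intensity
    table = {
        transition: [by_time.get(t, 0.0) for t in times]
        for transition, by_time in recorded.items()
    }
    return times, table


times, table = transition_table(scans)
print(scan_numbers_ascending(scans))
for transition, intensities in table.items():
    print(transition, intensities)
```

Because the table can be derived from the scans in this way, keeping the scans mandatory and leaving the table out of the core format loses no information; the cost is the extra size of storing every transition as its own scan.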

Tentative release schedule (Pierre-Alain, Eric, Lennart)
--------------------------

- Try to have the public review phase overlap with HUPO (6 October 2007).
- Should have something in DocProc by end of July or mid-August.
- Internal review of DocProc would include vendors to save time, and it would be good to have vendor 'support' and 'approval' by HUPO.

CV discussion
-------------

- Luisa, Pete: Should polarity not be an attribute rather than a CV term?
- Eric: So the attribute value would be validated in the schema.
- Luisa: Yes.
- Eric: Still, I think it would be good to keep it in the CV.
- Luisa: Should it be on all of the source, analyzers, detector? Can they actually be somewhere else?
- Jim: Put it on instrument level and scan.
- Eric: Can we not skip the instrument level then?
- Luisa: OK, we'll put it on the scan level only. ParamGroups can be used to group (compress) it anyway.
- Luisa, Pete: What about proportional resolution?
- Jim: I would think it is not that important. But if we remove it, people might argue?
- Luisa: We'll just move it to the 'discussion' section.
- Luisa, Pete: SIM/MRM and Mass Scan: should they be analyzer settings or scan settings?
- Lennart, Jim, Eric: Scan level.
- Luisa, Pete: Scan Law, is that a scan level thing?
- Eric: Yes.
- Luisa, Pete: Analyzer params. Going through the list; what belongs here, what does not.
- Luisa, Pete: Now let's go through the detector branch; total ion current is moved to scan.
- Luisa, Pete: Source is next.
- Luisa, Pete: Instrument; vendor will group models.
- Pierre-Alain: What about the 'mass spectrometer' parameter? Should it not include source?
- Lennart: I think it is redundant, and doesn't validate easily. So it only allows people to mess the schema up.
- Pierre-Alain: It is not mandated by MIAPE.
- Eric: So the entire mass spectrometer part goes.
- Jim: Group source, analyzer, detector terms under instrument.
- Luisa, Pete: The m/z separation method is next.
- Eric: The whole thing should be moved to discussion.
- Luisa, Pete: There are two 'isotope' terms here?
- Eric: To 'discussion' with the whole thing.
- Luisa, Pete: Scan then.
- Luisa, Pete, Eric: Full scan, zoom scan, what do we do with this?
- Pierre-Alain: What about scan and spectrum? How do they work?
- Luisa: In the schema, in spectrumHeader, there is an 'instrumentSetting' element.
- Lennart: This should be 'scan'. (Lennart updated local copy of the schema in this way).
- Eric: msLevel should be a scan attribute. (Lennart updated local copy of the schema in this way).
- Luisa: 'Spectrum' term should go, as the spectrum element does not allow CV params.
- Lennart: 'spectrumHeader' has all of this.
- Eric: 'SpectrumDescription' is a better term. (Lennart updated local copy of the schema in this way).
- Lennart: So what about 'Peak' and 'Base peak'? I vote to put them in the Spectrum Description.
- Luisa: Me too.
- Luisa: So we create some 'spectrum attributes': Base Peak Intensity, Base Peak m/z.
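
Editorial illustration (not part of the meeting discussion): the 'spectrum attributes' named above, and the total ion current that was moved to the scan, are simple summaries of a peak list. The sketch below assumes a spectrum is given as two parallel Python lists of m/z values and intensities; the function names are hypothetical and chosen only for this example. It uses the usual definitions: the base peak is the most intense peak, and the total ion current is the sum of all intensities.

```python
def base_peak(mz_values, intensities):
    """Return (base peak m/z, base peak intensity), or None if empty.

    Assumes mz_values and intensities are parallel lists of equal length.
    If several peaks share the maximum intensity, the first one is used.
    """
    if len(mz_values) != len(intensities):
        raise ValueError("m/z and intensity arrays must have equal length")
    if not mz_values:
        return None
    index = max(range(len(intensities)), key=intensities.__getitem__)
    return mz_values[index], intensities[index]


def total_ion_current(intensities):
    """Return the summed intensity of all peaks in the spectrum."""
    return sum(intensities)


# Hypothetical three-peak spectrum, for illustration only.
mz_values = [100.1, 200.2, 300.3]
intensities = [10.0, 55.0, 20.0]

print(base_peak(mz_values, intensities))
print(total_ion_current(intensities))
```

Since both values can be recomputed from the peak list, storing them as spectrum-level attributes mainly saves readers from decoding the binary arrays to obtain them.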